RNAfeature provides a common set of conserved features for ncRNAs across multiple species.
The models in RNAfeature were trained on canonical ncRNAs (e.g., tRNAs, rRNAs, miRNAs, snRNAs, snoRNAs, 7SK RNAs, Y RNAs).
Nucl. Acids Res. (January 2015) 43 (1): 104-114.
Long Hu, Chao Di, Mingxuan Kai, Yu-Cheng T. Yang, Yang Li, Yunjiang Qiu, Xihao Hu, Kevin Y. Yip, Michael Q. Zhang and Zhi John Lu
To find signature features shared by various ncRNA sub-types and characterize novel ncRNAs, we have developed a method, RNAfeature, to investigate >600 sets of genomic and epigenomic data with various evolutionary and biophysical scores. RNAfeature utilizes a fine-tuned intra-species wrapper algorithm that is followed by a novel feature selection strategy across species. It considers the long-distance effect of certain features (e.g., histone modification at the promoter region). We finally narrow down to 10 informative features (including sequences, structures, expression profiles and epigenetic signals). These features are complementary to each other and as a whole can accurately distinguish canonical ncRNAs from CDSs and UTRs (accuracies: >92% in human, mouse, worm and fly). Moreover, the feature pattern is conserved across multiple species. For instance, the supervised 10-feature model derived from animal species can predict ncRNAs in Arabidopsis (accuracy: 82%). Subsequently, we integrate the 10 features to define a set of noncoding potential scores, which can identify, evaluate and characterize novel noncoding RNAs. The score covers all transcribed regions (including unconserved ncRNAs), without requiring assembly of the full-length transcript. Importantly, the noncoding potential allows us to identify and characterize potential functional domains with feature patterns similar to canonical ncRNAs (e.g., tRNA, snRNA, miRNA) on ~70% of human long ncRNAs (lncRNAs).
Predicted Noncoding Potential
The whole-genome noncoding potential calculated by RNAfeature is provided for every genomic bin (bin size: 100 nt, step size: 50 nt).
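The binning scheme above (100 nt bins advanced in 50 nt steps, so adjacent bins overlap by half) can be illustrated with a minimal Python sketch. The function names `sliding_bins` and `bins_covering`, the 0-based half-open coordinates, and the choice to drop a trailing partial bin are all assumptions made for illustration. They are not taken from the RNAfeature code or file format, so check the downloaded files for the actual coordinate convention.

```python
def sliding_bins(seq_length, bin_size=100, step=50):
    """Yield (start, end) bins over a sequence of seq_length nucleotides.

    Assumption: coordinates are 0-based and half-open, and a trailing
    partial bin (shorter than bin_size) is dropped.
    """
    start = 0
    while start + bin_size <= seq_length:
        yield (start, start + bin_size)
        start += step


def bins_covering(pos, seq_length, bin_size=100, step=50):
    """Return every bin that contains the 0-based position pos."""
    return [
        (start, end)
        for start, end in sliding_bins(seq_length, bin_size, step)
        if start <= pos < end
    ]


# A 300 nt toy sequence yields five overlapping bins.
example_bins = list(sliding_bins(300))
# example_bins == [(0, 100), (50, 150), (100, 200), (150, 250), (200, 300)]

# Away from the sequence ends, each position falls into two bins.
covering = bins_covering(120, 300)
# covering == [(50, 150), (100, 200)]
```

Because the step is half the bin size, most positions are scored by two bins. How to combine the two scores for a position (for example, by taking the maximum or the mean) is left to the user; the source does not prescribe a rule.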
lncRNA feature matrices
The feature matrices for the lncRNA (exon) regions are provided.
All code and files in this directory are subject to the GNU General Public License, Version 3 or any later version published. Permission is granted to copy, distribute and/or modify them under the terms of the License.