User Guide

I. Obtaining RIscoper

Instructions for obtaining RIscoper are available on the download page.




II. Launching RIscoper

To begin using RIscoper, follow the instructions on the download page to make sure Java is installed on your computer and to download RIscoper.
When RIscoper starts, the main window appears. The main components of the user interface are shown below:



Reference:
1. The navigation bar, which provides quick access to common RIscoper operations.
2. The upload bar, which is used to upload new RNA names and positive RRI sentences to the system. RIscoper then automatically integrates them into the entity repository and/or the RRI corpus for your analysis.
3. The processes panel, which shows the status of your analyses.
4. The main panel, which displays dialogs and results. When you start RIscoper, the main panel shows the Home page.


III. Quick start with example files

Once you have downloaded and started RIscoper, ① click the “Input” icon in the navigation bar to open the Input interface shown below. Literature can be input in several ways (see Input for details); here we use a PMID list (example PMIDs.txt) as an example. ② Select the “PMID list” icon and click the “browser” button to upload the example PMIDs file from your computer. ③ Then click the “Confirm” button to download the article abstracts. RIscoper downloads the abstracts from PubMed automatically; this may take some time, depending on the number of articles and your internet speed.




Once the download finishes, the window switches to the Parameter setting interface shown below, where you can set the parameters for your analysis (see Parameter setting for details). Here we run the analysis with the default parameters: ① click the “Confirm” button to go to the Run RIscoper interface.



The Run RIscoper interface is shown below. ① Click the “Run” button to start your analysis. The analysis may take a few minutes, depending on the number of articles.

When the analysis is complete, the results are displayed in the main panel as shown below. The results table lists the input sentences with their PMIDs, scores and entity names (see RIscoper execution and presentation for details). ① You can browse the results in RIscoper or download them to your computer by clicking the “Download” button.




IV. Input

Click the “Input” icon in the navigation bar to open the Input interface shown below. RIscoper accepts user-provided abstracts or full-text papers, extracting text from PDF and TXT files, and it can also retrieve articles from PubMed online by PMID or by keyword. This makes it practical for biologists.



Reference:
1. Load example: a built-in example dataset for a quick start.
2. Single PMID: enter a single PMID in the box (e.g., 24984703).
3. PMID list: submit a list of PMIDs in a plain-text file (e.g., example PMID.txt).
4. Single text: submit articles in a plain-text file (e.g., example articles.txt).
5. PDF directory: submit a folder of PDF documents (e.g., example PDF file.zip).
6. Keywords: enter a keyword in the box (e.g., hsa-mir-221 or liver cancer).





V. Parameter setting

Click the “Parameter setting” icon in the navigation bar to open the Parameter setting interface shown below. Three parameters can be configured.



Reference:

1. RNA entity filter:
   An optional filter that removes sentences without RNA names. Only sentences containing at least one RNA name proceed to the sentence scoring step; all other sentences are filtered out with a score of 0.
   In most cases, the RNA entity filter improves both the performance and the speed of RIscoper. However, because the preset entity repository has limited coverage, some positive sentences may be wrongly removed when the RNA names they contain (usually new or unconventional names) are not yet in the repository. Whether to enable the filter is therefore a trade-off that depends on your requirements.

2. FDR:
   The FDR threshold is determined from the score ranking of the negative RRI corpus (13377 negative sentences that contain no RRI information).
   For instance, the score at the top 5% of the negative RRI corpus ranking is defined as the threshold score for FDR 0.05, and the score at the top 1% as the threshold score for FDR 0.01. There is no single optimal FDR threshold, because the proportion of true positive sentences varies greatly between sets of input articles.

3. Presentation:
   Entity show: choose whether to display the entity names found in each sentence.
   Sentence show: display all input sentences, or only the sentences that pass the FDR threshold.
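The parameter logic above can be sketched in Java, RIscoper's own language. This is a minimal, hypothetical illustration, not RIscoper's source code: the class, record and method names are invented, the negative-corpus scores are made up, and treating a score equal to the threshold as passing is an assumption. It shows how an FDR threshold follows from the ranked negative-corpus scores, how the RNA entity filter sets a score to 0, and how "Sentence show" can keep only the sentences that pass the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class FdrFilterSketch {

    // Hypothetical container for one scored sentence (not RIscoper's own type).
    record Scored(String id, Set<String> entities, double score) {}

    // Threshold for a given FDR: the score at the top `fdr` fraction of the
    // negative-corpus ranking (e.g. the top 5% for FDR 0.05).
    static double fdrThreshold(double[] negativeScores, double fdr) {
        if (negativeScores.length == 0 || fdr <= 0 || fdr > 1) {
            throw new IllegalArgumentException("need scores and 0 < fdr <= 1");
        }
        double[] sorted = negativeScores.clone();
        Arrays.sort(sorted);                                  // ascending order
        int n = sorted.length;
        int rank = Math.max(1, (int) Math.ceil(fdr * n));    // 1-based rank from the top
        return sorted[n - rank];                              // rank-th highest score
    }

    // RNA entity filter: when enabled, a sentence with no recognized RNA name scores 0.
    static double applyEntityFilter(Set<String> entities, double rawScore, boolean filterOn) {
        return (filterOn && entities.isEmpty()) ? 0.0 : rawScore;
    }

    // "Sentence show" restricted to sentences that pass the threshold
    // (assumption: a score equal to the threshold counts as passing).
    static List<Scored> passing(List<Scored> rows, double threshold) {
        List<Scored> kept = new ArrayList<>();
        for (Scored r : rows) {
            if (r.score() >= threshold) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up negative-corpus scores, for illustration only.
        double[] negatives = {0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95};
        double threshold = fdrThreshold(negatives, 0.2);     // top 20% of 10 -> 2nd highest
        System.out.println("threshold = " + threshold);      // prints threshold = 0.9

        List<Scored> rows = List.of(
            new Scored("24635082_1", Set.of("mir-221"), 0.93),
            new Scored("24635082_2", Set.of(), applyEntityFilter(Set.of(), 0.97, true)),
            new Scored("24635082_3", Set.of("hotair"), 0.42));
        for (Scored r : passing(rows, threshold)) {
            System.out.println(r.id() + "\t" + r.score());   // prints 24635082_1, 0.93
        }
    }
}
```

With 13377 negative sentences and FDR 0.05, this rule would pick the 669th highest negative score as the threshold.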





VI. RIscoper execution and presentation

Running RIscoper is simple. ① Click the “Run” button to start the analysis; you can follow its status in the processes panel in the bottom left corner. The computational procedure consists of three steps:
(1) Sentence standardization. The input articles are segmented into sentences with the OpenNLP tool, and words are lemmatized with the BioLemmatizer tool. RIscoper then converts all letters to lowercase, deletes all words in brackets, and strips whitespace and punctuation except commas and periods.
(2) Named entity recognition. A large preset entity repository, compiled from RNA names in various databases, is used to recognize the RNA names in each sentence.
(3) Sentence scoring. Sentences are scored with an N-gram statistical model built on a manually curated RRI corpus of 13377 sentences that definitely contain RNA-RNA interaction information. The Katz smoothing algorithm and the geometric mean are then used to smooth and normalize the score, respectively.
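The three steps can be approximated in a short, self-contained Java sketch. It is not RIscoper's implementation and makes several simplifications: OpenNLP sentence splitting and BioLemmatizer lemmatization are omitted; hyphens are kept during normalization (an assumption, so that names such as mir-221 survive); entity recognition is single-word exact matching against a tiny hypothetical dictionary; and add-one (Laplace) smoothing of a bigram model is swapped in for Katz smoothing. The geometric mean of the per-bigram probabilities then gives a normalized score between 0 and 1. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PipelineSketch {

    // Step 1 (simplified): lowercase, delete bracketed text, strip punctuation except
    // commas and periods (hyphens are also kept, an assumption), collapse whitespace.
    // OpenNLP sentence splitting and BioLemmatizer lemmatization are NOT reproduced.
    static String normalize(String sentence) {
        String s = sentence.toLowerCase(Locale.ROOT);
        s = s.replaceAll("\\([^()]*\\)|\\[[^\\[\\]]*\\]", " "); // (...) and [...], not nested
        s = s.replaceAll("[^a-z0-9,.\\s-]", " ");               // other punctuation -> space
        return s.replaceAll("\\s+", " ").trim();
    }

    // Splits a normalized sentence into words, dropping trailing commas/periods.
    static List<String> words(String normalized) {
        List<String> out = new ArrayList<>();
        for (String tok : normalized.split(" ")) {
            String w = tok.replaceAll("[,.]+$", "");
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Step 2 (simplified): exact single-word lookup in a hypothetical name dictionary.
    static Set<String> recognize(String normalized, Set<String> dictionary) {
        Set<String> found = new TreeSet<>();
        for (String w : words(normalized)) {
            if (dictionary.contains(w)) {
                found.add(w);
            }
        }
        return found;
    }

    // Step 3 (simplified): bigram model trained on positive RRI sentences.
    // Add-one (Laplace) smoothing is used here in place of Katz smoothing.
    static final class BigramModel {
        private final Map<String, Integer> historyCounts = new HashMap<>();
        private final Map<String, Integer> bigramCounts = new HashMap<>();
        private final Set<String> vocabulary = new HashSet<>();

        BigramModel(List<String> corpus) {
            for (String sentence : corpus) {
                List<String> t = tokens(sentence);
                vocabulary.addAll(t);
                for (int i = 0; i + 1 < t.size(); i++) {
                    historyCounts.merge(t.get(i), 1, Integer::sum);
                    bigramCounts.merge(t.get(i) + " " + t.get(i + 1), 1, Integer::sum);
                }
            }
        }

        // Sentence-boundary markers around the normalized words.
        private static List<String> tokens(String sentence) {
            List<String> t = new ArrayList<>();
            t.add("<s>");
            t.addAll(words(normalize(sentence)));
            t.add("</s>");
            return t;
        }

        // Geometric mean of the smoothed bigram probabilities, a value in (0, 1].
        double score(String sentence) {
            List<String> t = tokens(sentence);
            int v = vocabulary.size() + 1;          // +1 reserves mass for unseen words
            double logSum = 0.0;
            for (int i = 0; i + 1 < t.size(); i++) {
                int pair = bigramCounts.getOrDefault(t.get(i) + " " + t.get(i + 1), 0);
                int history = historyCounts.getOrDefault(t.get(i), 0);
                logSum += Math.log((pair + 1.0) / (history + v));
            }
            return Math.exp(logSum / (t.size() - 1));
        }
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("mir-221", "hotair");  // hypothetical entries
        BigramModel model = new BigramModel(List.of(           // made-up training sentences
            "miR-221 directly targets HOTAIR.",
            "HOTAIR binds to miR-221 in cancer cells."));
        String s = normalize("MiR-221 directly targets HOTAIR (see Fig. 2) in HCC cells!");
        System.out.println(s);                        // prints mir-221 directly targets hotair in hcc cells
        System.out.println(recognize(s, dictionary)); // prints [hotair, mir-221]
        System.out.println(model.score(s));
    }
}
```

The geometric mean is what makes scores comparable across sentences of different lengths: a raw product of probabilities would shrink with every extra word.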



When the analysis is complete, RIscoper shows the results in a table, as shown below.



Reference:

1. The first column contains the PMID and the position of the sentence in the article (e.g., 24635082_1 means the first sentence of the article with PMID 24635082).

2. The second column contains the sentence analyzed by RIscoper.

3. The third column contains the entity names (RNA names) found in the sentence.

4. The fourth column contains the sentence score, which ranges from 0 to 1; higher scores indicate a higher likelihood that the sentence contains RRI information.
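If you process the results in your own code, the ID in the first column can be split back into a PMID and a sentence number, and rows can be ranked by score. The sketch below is hypothetical: the ResultRow record and its methods are not part of RIscoper, and it assumes only the ID format described above (PMID, underscore, sentence number).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ResultRowSketch {

    // Hypothetical in-memory form of one row of the results table.
    record ResultRow(String id, String sentence, List<String> entities, double score) {

        // "24635082_1" -> "24635082"
        String pmid() {
            return id.substring(0, id.lastIndexOf('_'));
        }

        // "24635082_1" -> 1 (position of the sentence in the article)
        int sentenceNumber() {
            return Integer.parseInt(id.substring(id.lastIndexOf('_') + 1));
        }
    }

    // Highest-scoring rows first, i.e. the sentences most likely to contain RRI information.
    static List<ResultRow> rankByScore(List<ResultRow> rows) {
        List<ResultRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble(ResultRow::score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<ResultRow> rows = List.of(                        // made-up rows
            new ResultRow("24635082_1", "Example sentence one.", List.of(), 0.12),
            new ResultRow("24635082_2", "Example sentence two.", List.of("mir-221", "hotair"), 0.91));
        for (ResultRow r : rankByScore(rows)) {
            // prints "24635082 #2 score=0.91", then "24635082 #1 score=0.12"
            System.out.println(r.pmid() + " #" + r.sentenceNumber() + " score=" + r.score());
        }
    }
}
```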


FAQ

Q: What is the purpose of RIscoper?
A: RIscoper is user-friendly software written in Java: a fast, simple and practical tool that extracts RNA-RNA interactions (RRIs) from the literature with high performance.

Q: How do I use RIscoper?
A: For basic instructions on using RIscoper, see the User Guide.

Q: What are the system requirements for RIscoper?
A: RIscoper runs on Windows, Mac and Linux and requires Java. We recommend allocating at least 1 GB of RAM to RIscoper. For Windows and Linux, we provide both 32-bit and 64-bit versions.

Q: What does RIscoper take as input?
A: RIscoper accepts user-provided abstracts or full-text papers, extracting text from PDF and TXT files, and it can retrieve articles from PubMed online by PMID or by keyword (see the User Guide for details). This makes it practical for biologists.

Q: Can I use RIscoper without an internet connection?
A: Without an internet connection, the PMID and keyword inputs cannot be used, because they need to connect to PubMed to download articles. You can still use RIscoper by submitting article files stored on your local computer (TXT or PDF), either individually or as a folder.

Q: What should I do if articles download too slowly when I use the PMID or keyword input?
A: You can download the articles you need directly from PubMed yourself, save them as plain text, and submit them using the plain-text input.
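One way to download abstracts yourself is NCBI's E-utilities efetch service, which can return PubMed abstracts as plain text. The Java sketch below is a hypothetical helper, not part of RIscoper: it assumes a PMID list file with one PMID per line and writes all abstracts to a file named abstracts.txt (both assumptions). NCBI limits request rates and recommends HTTP POST for very long ID lists, so split large lists into batches. This guide does not say whether RIscoper expects a particular layout inside the TXT file.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

class PubMedDownloadSketch {

    private static final String EFETCH =
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Reads a PMID list file, assuming one PMID per line; blank lines are skipped.
    static List<String> readPmids(Path file) throws IOException {
        return Files.readAllLines(file).stream()
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }

    // Builds one efetch URL that asks for plain-text abstracts of all given PMIDs.
    static String efetchUrl(List<String> pmids) {
        return EFETCH + "?db=pubmed&id=" + String.join(",", pmids)
            + "&rettype=abstract&retmode=text";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java PubMedDownloadSketch <pmid-list-file>");
            return;
        }
        List<String> pmids = readPmids(Path.of(args[0]));
        HttpRequest request = HttpRequest.newBuilder(URI.create(efetchUrl(pmids))).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("efetch returned HTTP " + response.statusCode());
        }
        // Hypothetical output file, to be submitted through the plain-text input.
        Files.writeString(Path.of("abstracts.txt"), response.body());
    }
}
```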

Q: What is the RNA entity filter?
A: It is an optional filter that removes sentences without RNA names. Only sentences containing at least one RNA name proceed to the sentence scoring step; all other sentences are filtered out with a score of 0. In most cases, the RNA entity filter improves both the performance and the speed of RIscoper. However, because the preset entity repository has limited coverage, some positive sentences may be wrongly removed when the RNA names they contain (usually new or unconventional names) are not yet in the repository. Whether to enable the filter is therefore a trade-off that depends on your requirements.

Q: What is the FDR threshold in RIscoper?
A: The FDR threshold is determined from the score ranking of the negative RRI corpus (13377 negative sentences that contain no RRI information). For instance, the score at the top 5% of the negative RRI corpus ranking is defined as the threshold score for FDR 0.05, and the score at the top 1% as the threshold score for FDR 0.01. There is no single optimal FDR threshold, because the proportion of true positive sentences varies greatly between sets of input articles.

Q: How can I upload new RNA names or RRI sentences to RIscoper?
A: Use the upload bar to add new RNA names and positive RRI sentences to the system. RIscoper then automatically integrates them into the entity repository and/or the RRI corpus for your own analysis.

Q: How can I download the RRI corpus?
A: All sentences in the RRI corpus were carefully confirmed by hand. As the first dataset describing RNA-RNA interactions, it provides a valuable resource for ongoing text-mining studies of RNA interactions and for future machine learning work in natural language processing. The corpus can be downloaded at http://www.rna-society.org/riscoper/download.html.

Q: I discovered a software bug in RIscoper. What should I do?
A: Please report the bug to us by email. Every bug report will be addressed and answered as soon as possible. Contact address: wangdong@ems.hrbmu.edu.cn.

© College of Bioinformatics Science and Technology, Harbin Medical University