Tutorial

User Guide

I. Obtaining RIscoper

Instructions for obtaining RIscoper are available on the download page.




II. Launching RIscoper

To begin using RIscoper, follow the instructions on the download page to check that Java is installed on your computer and to download RIscoper. We recommend Java version 1.8 or later and at least 2 gigabytes of RAM.
When RIscoper starts, the main window appears. The main components of the user interface are shown below:



1. The navigation bar, which provides quick access to common RIscoper operations.
2. The upload bar, which is used to upload new RNA names and positive RRI sentences to the system. See Sentence and RNA name upload for details.
3. The processes panel, which provides information about the status of your analyses.
4. The main panel, which is used to display dialogs and results. When you start RIscoper, the main panel displays the Home page.


III. Quick start with example files

Once you have downloaded and started RIscoper:
① Click the "Input" icon in the navigation bar to open the Input interface, shown below. You can input literature in several ways (see Input for details). Here, we use a PMID list (example PMIDs.txt) as an example.
② Select the "PMID list" icon and click the "browser" button to upload the example PMIDs file from your computer (or simply click the "example" button instead).
③ Click the "Confirm" button to download the article abstracts. RIscoper may take some time to download the abstracts from PubMed, depending on the number of articles and your internet speed.




Once the abstracts are downloaded, the window switches to the Parameter setting interface, shown below. You can set the parameters for your analysis on this interface (see Parameter setting for details). Here, we start the analysis with the default parameters.
① Click the "Confirm" button; the window switches to the Run RIscoper interface.



The Run RIscoper interface is shown below.
① Click the "Run" button to execute your analysis. RIscoper may take a few minutes to complete the analysis, depending on the number of articles.



When the analysis is complete, the results are displayed in the main panel, as shown below. The results table contains the input sentences and their corresponding PMIDs, scores and entity names (see RIscoper execution and presentation for details).
① You can browse the results in RIscoper or download them to your computer by clicking the "Download" button.




IV. Input

Click the "Input" icon in the navigation bar to open the Input interface, shown below. RIscoper supports user-provided abstracts or full-text papers (extracted from PDF and TXT files) as well as online retrieval from PubMed by PMID or keyword, which makes it practical for biologists. RIscoper also provides an "example" button for each input method.


1. Single PMID: a single PMID can be entered in the box (e.g., PMID: 24984703).
2. PMID list: a list of PMIDs in plain text can be submitted (e.g., example PMID.txt).
3. Single text: literature in plain text can be submitted (e.g., example articles.txt).
4. PDF directory: a folder of PDF documents can be submitted (e.g., example PDF file.zip).
5. Keywords: a keyword can be entered in the box (e.g., mir-34a-5p).
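
Before uploading a PMID list, it can help to check the file locally. The following Python sketch is not part of RIscoper. It assumes the list holds one numeric PMID per line (the layout of the example file is not described in this guide), and the function name parse_pmid_list is hypothetical.

```python
def parse_pmid_list(text):
    """Return the unique PMIDs found in `text`, in order of first appearance.

    Assumption: one PMID per line, digits only. Blank lines are ignored.
    """
    pmids = []
    seen = set()
    for line_number, line in enumerate(text.splitlines(), start=1):
        token = line.strip()
        if not token:
            continue  # skip blank lines
        if not (token.isascii() and token.isdigit()):
            raise ValueError(f"line {line_number}: not a numeric PMID: {token!r}")
        if token not in seen:
            seen.add(token)
            pmids.append(token)
    return pmids


# Example with the two PMIDs mentioned in this guide, one of them repeated.
example_text = "24984703\n\n 24635082 \n24984703\n"
example_pmids = parse_pmid_list(example_text)
```

Running such a check first reports a malformed line with its line number, so the list can be corrected before it is submitted.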


V. Parameter setting

Click the "Parameter setting" icon in the navigation bar to open the Parameter setting interface, shown below. Three parameters can be configured.


1. RNA entity filter:

  • This option filters out sentences that contain no RNA name. Only sentences containing at least one RNA name proceed to the sentence scoring process.
  • In most cases, the RNA entity filter improves the performance and speed of RIscoper. However, because the preset entity repository has limited coverage, some positive sentences may be mistakenly removed when the RNA names they contain (usually new or unconventional names) are not yet included in the repository. Whether to enable this filter is therefore a trade-off that depends on user requirements.

2. FDR:

  • The FDR threshold is estimated from the negative RRI corpus (13377 negative sentences without RRI information).
  • For instance, the score at the top 5% of the negative RRI corpus is defined as the threshold score for FDR 0.05, and the score at the top 1% is defined as the threshold score for FDR 0.01. No single FDR threshold is optimal for all users, because the rate of true positive sentences varies greatly between sets of input articles.

3. Presentation:

  • Entity show: lets users choose whether to show the entity names found in each sentence.
  • Sentence show: show all input sentences, or only the sentences that pass the FDR threshold.
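
The way the FDR threshold is read off the negative RRI corpus can be sketched in Python as follows. This is an illustration under stated assumptions, not RIscoper's actual code: "top 5%" is taken to mean the score reached by the highest-scoring 5% of negative sentences, a sentence is assumed to pass when its score is at or above the threshold, and the names fdr_threshold and passes as well as the toy scores are made up for the example.

```python
def fdr_threshold(negative_scores, fdr):
    """Return the score sitting at the top `fdr` fraction of the negative scores."""
    if not negative_scores:
        raise ValueError("negative_scores must not be empty")
    if not 0 < fdr < 1:
        raise ValueError("fdr must be between 0 and 1")
    ranked = sorted(negative_scores, reverse=True)  # highest score first
    # Number of negative sentences allowed at or above the threshold (at least one).
    k = max(1, round(fdr * len(ranked)))
    return ranked[k - 1]


def passes(score, threshold):
    # Assumption: a sentence passes when its score reaches the threshold.
    return score >= threshold


# Made-up stand-in for the negative corpus scores: 0.00, 0.01, ..., 0.99.
toy_negative_scores = [i / 100 for i in range(100)]
threshold_005 = fdr_threshold(toy_negative_scores, 0.05)
threshold_001 = fdr_threshold(toy_negative_scores, 0.01)
```

A stricter FDR (0.01 instead of 0.05) yields a higher threshold, so fewer input sentences are shown when only sentences passing the threshold are displayed.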




VI. RIscoper execution and presentation

Running RIscoper is straightforward.
① Click the "Run" button in the navigation bar to start the analysis. You can then follow the status of the analysis in the processes panel in the bottom left corner. The calculation speed of RIscoper was tested on a PC with a 2.8 GHz CPU and 8 GB of RAM; the results show that RIscoper runs quickly on a PC (about 23 s for 100 sentences, 70 s for 1000 sentences, 105 s for 5000 sentences and 148 s for 10000 sentences).
The computational procedure consists of three steps:

  • (a). Sentence standardization: the input articles are segmented into sentences with the OpenNLP tool, and all words are lemmatized with the BioLemmatizer tool. RIscoper then deletes all words in brackets, strips whitespace and punctuation except commas and periods, and converts all words to lowercase.
  • (b). Named Entity Recognition: a preset entity repository (a large collection of RNA names gathered from various databases and corpora) is used to recognize the RNA names in each sentence.
  • (c). Sentence scoring: each sentence is scored with an N-gram statistical model and a manually curated RRI corpus (containing 13377 sentences with RRI information). Katz smoothing and the geometric mean are then used to smooth and normalize the score, respectively.
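
Two parts of this procedure can be illustrated with a short Python sketch: the text clean-up in step (a) and the geometric-mean normalization in step (c). It is a rough approximation, not RIscoper's implementation. Sentence segmentation (OpenNLP), lemmatization (BioLemmatizer), the N-gram model and Katz smoothing are all left out. Replacing removed punctuation with a space, collapsing repeated whitespace, and the function names standardize and geometric_mean are assumptions made for the example.

```python
import math
import re


def standardize(sentence):
    """Approximate the clean-up in step (a) on one already-segmented sentence."""
    # Delete bracketed text: (...), [...] and {...}. Nested brackets are not handled.
    text = re.sub(r"\([^()]*\)|\[[^\[\]]*\]|\{[^{}]*\}", " ", sentence)
    # Remove punctuation except commas and periods (assumption: replace with a space).
    text = re.sub(r"[^A-Za-z0-9\s,.]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def geometric_mean(probabilities):
    """Geometric mean of per-N-gram probabilities, so long sentences are not penalized."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    if any(p <= 0 for p in probabilities):
        return 0.0  # a zero probability drives the whole product to zero
    # exp(mean(log p)) equals the n-th root of the product, without underflow.
    return math.exp(sum(math.log(p) for p in probabilities) / len(probabilities))


cleaned = standardize("MiR-21 (a microRNA) targets PTEN; see [1].")
normalized = geometric_mean([0.25, 1.0])
```

Note that under these assumptions a hyphenated name such as "MiR-21" is split into "mir 21"; how RIscoper itself treats hyphens inside RNA names is not stated in this guide.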




When the analysis is complete, RIscoper displays the results in table format, as shown below.
① You can browse the results in RIscoper or download them to your computer by clicking the "Download" button.


1. The first column of the table contains the PMID and the ordinal number of the sentence (e.g., 24635082_1 means the first sentence in the article with PMID 24635082).
2. The second column of the table contains the analyzed sentences.
3. The third column of the table contains the entity names (RNA names) found in each sentence.
4. The fourth column of the table contains the score of each sentence, which ranges from 0 to 1; a higher score indicates a higher likelihood that the sentence contains RRI information.
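
If you post-process a downloaded results table, the identifier in the first column can be split back into its PMID and sentence number. The Python sketch below assumes only the PMID_number pattern described above; the function name parse_sentence_id is hypothetical.

```python
def parse_sentence_id(sentence_id):
    """Split an identifier such as '24635082_1' into (PMID, sentence number)."""
    # Split on the last underscore: the left part is the PMID, the right part the number.
    pmid, separator, index = sentence_id.rpartition("_")
    if not separator or not pmid.isdigit() or not index.isdigit():
        raise ValueError(f"unexpected sentence identifier: {sentence_id!r}")
    return pmid, int(index)


pmid, sentence_number = parse_sentence_id("24635082_1")
```

The PMID is kept as a string so that it can be reused as an identifier, while the sentence number becomes an integer that sorts numerically.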






VII. Sentence and RNA name upload

RIscoper provides options for uploading new positive RRI sentences and RNA names to extend the RRI corpus and/or entity repository. You can upload your own RRI sentences and RNA names with the upload buttons; RIscoper then automatically integrates them into the pre-existing RRI corpus and/or entity repository for your analysis.


1. Pos-sentence upload: for uploading new RRI sentences (pos-sentence.txt).
2. ncRNA upload: for uploading new ncRNA names (ncRNA-entity.txt).
3. mRNA upload: for uploading new mRNA names (mRNA-entity.txt).

FAQ

Q: What is the purpose of RIscoper?
A: RIscoper is user-friendly software written in Java. It is a fast and simple tool that extracts RRIs from the literature with high performance and practicality.

Q: How do I use RIscoper?
A: For basic instructions on using RIscoper, see the User Guide.

Q: What are the computer requirements for RIscoper?
A: RIscoper works on Windows (32-bit and 64-bit) and Mac systems. We recommend Java version 1.8 or later and at least 2 gigabytes of RAM. Versions of the software bundled with a JRE are also provided for Windows and Mac, and these work on systems without a JRE installed.

Q: What does RIscoper take as input?
A: RIscoper supports user-provided abstracts or full-text papers (extracted from PDF and TXT files) as well as online retrieval from PubMed by PMID or keyword (see User Guide for details), which makes it practical for biologists.

Q: Can I use RIscoper without internet connection?
A: Without an internet connection, the PMID and keyword inputs of RIscoper are unavailable, because they need to connect to PubMed to download articles. However, you can still use RIscoper by providing articles as local files (TXT or PDF) or as a folder of such files on your computer.

Q: What should I do if articles download too slowly when I use the PMID or keyword input?
A: You can download the articles you need directly from PubMed yourself, save them as plain text, and then submit them using the plain text input.

Q: What is the RNA entity filter?
A: This option filters out sentences that contain no RNA name. Only sentences containing at least one RNA name proceed to the sentence scoring process. In most cases, the RNA entity filter improves the performance and speed of RIscoper. However, because the preset entity repository has limited coverage, some positive sentences may be mistakenly removed when the RNA names they contain (usually new or unconventional names) are not yet included in the repository. Whether to enable this filter is therefore a trade-off that depends on user requirements.
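
The trade-off can be seen in a small Python sketch of such a filter. It is only an illustration: the tiny repository, the case-insensitive whole-token matching rule, the RNA name "lncRNA-XYZ9" and the function names are all hypothetical, and RIscoper's actual matching may differ.

```python
import re


def has_rna_name(sentence, repository):
    """True if any repository name appears as a whole token (case-insensitive)."""
    tokens = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
    return any(name.lower() in tokens for name in repository)


def entity_filter(sentences, repository):
    # Keep only the sentences that mention at least one known RNA name.
    return [s for s in sentences if has_rna_name(s, repository)]


# Hypothetical, deliberately tiny entity repository.
toy_repository = {"miR-21", "PTEN"}
toy_sentences = [
    "miR-21 represses PTEN mRNA.",    # known names: kept
    "lncRNA-XYZ9 binds its target.",  # made-up new name, not in the repository: dropped
    "The weather was fine.",          # no RNA name: dropped
]
kept = entity_filter(toy_sentences, toy_repository)
```

The second sentence shows the cost of the filter: a sentence that may describe an interaction is discarded because its RNA name is missing from the repository. Uploading new RNA names to the entity repository is one way to reduce such losses.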

Q: What is the FDR threshold in the RIscoper?
A: The FDR threshold is estimated from the negative RRI corpus (13377 negative sentences without RRI information). For instance, the score at the top 5% of the negative RRI corpus is defined as the threshold score for FDR 0.05, and the score at the top 1% is defined as the threshold score for FDR 0.01. No single FDR threshold is optimal for all users, because the rate of true positive sentences varies greatly between sets of input articles.

Q: How can I upload the new RNA names or RRI sentences to RIscoper?
A: RIscoper lets users upload new positive RRI sentences and RNA names to extend the RRI corpus and/or entity repository. You can upload your own RRI sentences and RNA names with the upload buttons; RIscoper then automatically integrates them into the pre-existing RRI corpus and/or entity repository for your analysis.

Q: How can I download the RRI corpus?
A: All the sentences in the RRI corpus were carefully confirmed by hand. As a first dataset describing RNA-RNA interactions, it provides a valuable resource for ongoing text mining studies of RNA interactions and will be useful in future machine learning work related to natural language processing. The corpus can be downloaded at http://www.rna-society.org/riscoper/download.html.

Q: I discovered a software bug in RIscoper. What should I do?
A: Please report the bug to us by email right away. Every bug report will be addressed and answered as soon as possible. Contact address: wangdong@ems.hrbmu.edu.cn.

© College of Bioinformatics Science and Technology, Harbin Medical University