
Extract data from text file

 
sae0203
Ranch Hand
Posts: 34
I need help with some file processing.
For example, take this file:
vir.txt
//contents of the text file
source 1..2607
/organism="Bovine adenovirus 1"
/mol_type="genomic DNA"
/strain="10"
/db_xref="taxon:10546"
/map="73.3-80.8 map units"
CDS 1..666
/codon_start=1
/product="hexon-associated structural protein pVIII
precursor"
/protein_id="AAC40820.1"
/db_xref="GI:3135468"
/translation="MSKDIPTPYVWTFQPQLGGCGASQDYSTRMNWLSAGPSMINQVN
SVRADRNRILLRQAAVSETPRLVRNPPTWPAQYLFQPIGAPQTFELPRNESLEVAMSN
SGMQLAGGGRRTKDIKPEDIVGRGLELNSDIPSASFLRPDGVFQLAGGSRSSFNPGLS
TLLTVQPASSLPRSGGIGEVQFVHEFVPSVYFQPFSGPPGTYPDEFIYNYDIVSDSVD
GYD"
enhancer 266..273
/note="putative lymphoid 1"
TATA_signal 348..354
/note="putative E3"
mRNA 374..2298
/product="E3"
/note="AraC resistant transcript"
Below is the code I have to extract the contents of the portion I want:
public void extrCDS(String aLine) {
    int exist = aLine.indexOf(_CDS);
    if (exist == 5) { // the position of the first letter
        // _Pos2 is the start position of the string I want to display
        CDS = aLine.substring(_Pos2, aLine.length() - 1);
        try {
            VDesc.write("CDS " + CDS + "\n");
            System.out.print("CDS " + CDS + "\n");
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Error=" + e);
        }
    }
}
output:
CDS 909..162
How can I get it to also display the rest of the CDS portion from the following lines?
e.g. CDS 909..162
Codon_start=1
product ="hexon-associated structural protein pVIII
precursor"
protein_id="AAC40820.1" etc...
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
If I understand this right, the lines starting with "/" are continuation lines, or sub-property lines. You could just grab these when you read the file -- i.e., read a line, and if it doesn't start with "/" create an ArrayList. Then read more lines, and as long as the lines start with a "/", add them to the ArrayList. When you find a line that doesn't start with "/", then add the ArrayList to a HashMap with the very first line you read as the key; then start the process over again.
Now, you've read the whole file; to find all the lines in the CDS section, you just find "CDS" in the HashMap, and you've got a list of them.
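A minimal sketch of that grouping idea, not a definitive implementation. The class and method names (FeatureParser, parse, linesFor) are hypothetical. One adjustment is an assumption on my part: because the sample file wraps some values onto lines that do not start with "/" (for example precursor" and the translation text), the sketch starts a new group only when a line looks like a feature header of the form name start..end, and treats every other line as belonging to the current group. It also uses a LinkedHashMap instead of a plain HashMap so the file order is preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative and not from the original posts.
final class FeatureParser {

    // Assumption: a header is "<name> <start>..<end>", e.g. "CDS 1..666".
    // Other location formats would need a broader pattern.
    static boolean isHeader(String line) {
        return !line.startsWith("/") && line.matches("\\S+\\s+\\d+\\.\\.\\d+");
    }

    // Groups the lines under the header line that precedes them.
    // The full header line is the key; a repeated header replaces the earlier entry.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> features = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (isHeader(line)) {
                current = new ArrayList<>();
                features.put(line, current);
            } else if (current != null) {
                // "/" property lines and wrapped value lines both land here
                current.add(line);
            }
        }
        return features;
    }

    // Returns the lines of the first feature whose header starts with the given name.
    static List<String> linesFor(Map<String, List<String>> features, String name) {
        for (Map.Entry<String, List<String>> e : features.entrySet()) {
            if (e.getKey().startsWith(name + " ")) {
                return e.getValue();
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "source 1..2607",
                "/mol_type=\"genomic DNA\"",
                "CDS 1..666",
                "/codon_start=1",
                "/product=\"hexon-associated structural protein pVIII",
                "precursor\"",
                "enhancer 266..273",
                "/note=\"putative lymphoid 1\"");
        Map<String, List<String>> features = parse(lines);
        for (String line : linesFor(features, "CDS")) {
            System.out.println(line);
        }
    }
}
```

To run this against vir.txt, the lines could be loaded with java.nio.file.Files.readAllLines and passed to parse.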
Make sense?
 
sae0203
Ranch Hand
Posts: 34
Thanks... I will work on it now.
 