extracting substrings from a file
Tim Williams
tim at tdw.net
Mon Sep 11 10:33:58 EDT 2006
More information about the Python-list mailing list
On 11 Sep 2006 05:29:17 -0700, sofiafig at gmail.com <sofiafig at gmail.com> wrote:
> Hi,
>
> I have a file with several entries in the form:
>
> AFFX-BioB-5_at E. coli /GEN=bioB /gb:J04423.1 NOTE=SIF
> corresponding to nucleotides 2032-2305 of /gb:J04423.1 DEF=E.coli
> 7,8-diamino-pelargonic acid (bioA), biotin synthetase (bioB),
> 7-keto-8-amino-pelargonic acid synthetase (bioF), bioC protein, and
> dethiobiotin synthetase (bioD), complete cds.
>
> 1415785_a_at /gb:NM_009840.1 /DB_XREF=gi:6753327 /GEN=Cct8 /FEA=FLmRNA
> /CNT=482 /TID=Mm.17989.1 /TIER=FL+Stack /STK=281 /UG=Mm.17989 /LL=12469
> /DEF=Mus musculus chaperonin subunit 8 (theta) (Cct8), mRNA.
> /PROD=chaperonin subunit 8 (theta) /FL=/gb:NM_009840.1 /gb:BC009007.1
>
> and I would like to create a file that has only the following:
>
> AFFX-BioB-5_at /GEN=bioB /gb:J04423.1
>
> 1415785_a_at /gb:NM_009840.1 /GEN=Cct8
>
> Could anyone please tell me how can I do it?

If each entry is a single line, then the following should give you
some ideas. It is not robust enough for "production" use, though: it
assumes s holds the file contents as one string with no blank lines.
The 2nd input line has 2 /gb fields, and your script would need some
way of knowing which one to pick.

>>> for x in s.splitlines():
...     data = x.split()
...     output = [ data[0] ]
...     for z in data[1:]:
...         if (z.startswith('/GEN') or z.startswith('/gb')) and z not in output:
...             output.append(z)
...     print(' '.join(output))
...
AFFX-BioB-5_at /GEN=bioB /gb:J04423.1
1415785_a_at /gb:NM_009840.1 /GEN=Cct8 /gb:BC009007.1

HTH :)
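One possible way to handle the duplicate /gb question is to keep only the first field seen for each prefix. This is only a sketch under that assumption (the original post does not say which /gb field is the right one), and the function name extract_fields and its prefixes parameter are made up for illustration. It also assumes each entry sits on a single line, and it skips blank lines.

```python
def extract_fields(line, prefixes=("/GEN", "/gb")):
    """Return the first token plus the first field found for each prefix.

    Assumption: when a prefix occurs more than once (e.g. two /gb
    fields), the first occurrence is the one to keep.
    """
    tokens = line.split()
    if not tokens:
        return ""  # blank line: nothing to extract
    kept = [tokens[0]]  # the probe ID always comes first
    seen = set()        # prefixes that have already been matched
    for token in tokens[1:]:
        for prefix in prefixes:
            if token.startswith(prefix) and prefix not in seen:
                seen.add(prefix)
                kept.append(token)
                break
    return " ".join(kept)


sample = (
    "1415785_a_at /gb:NM_009840.1 /DB_XREF=gi:6753327 /GEN=Cct8 "
    "/FEA=FLmRNA /FL=/gb:NM_009840.1 /gb:BC009007.1"
)
print(extract_fields(sample))  # 1415785_a_at /gb:NM_009840.1 /GEN=Cct8
```

Fields are kept in the order they appear on the line, which matches the output requested in the question. To process a whole file, call extract_fields on each line and write out the non-empty results.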