Split individual FASTA files into multiple FASTA files based on protein fragment

#bash #fasta

Question:

I have many .faa files (FASTA format) that are output by VIBRANT and contain many phage proteins, separated by their headers. Here is an example of one output file (file name = plate11.A10.faa):

 >122_fragment_2_95  (102956..103258)    1   K04764  "ihfA, himA; integration host factor subunit alpha"
MGALTKAEMAERLYEELGLNKREAKELVELFFEEIRHALEDNEQVKLSGFGNFDLRDKRQRPGRNPKTGEEIPITARRVVTFRPGQKLKARVEAYAGTKS*
>122_fragment_2_96  (103239..103595)    1   PF13411.6   MerR HTH family regulatory protein
MLEPSHNDELPPIPGKRYFTIGEVSELCAVKPHVLRYWEQEFPQLNPVKRRGNRRYYQRQDVLMIRQIRGLLYDQGFTIGGARLRLTNGEVKDDTQQYKQMIRQMIAELEDVLVMLKS*
>122_fragment_2_97  (103843..105027)    -1  VOG00041    sp|O10330|VLF1_NPVOP Very late expression factor 1
MAQKAITGLQKMPNGIWKIDKKYRGERIQESTGTCDRAEAEQYLIHMLEKLRQRKVYGVRQVRTWREASIRFLMEVKNQPSIHISATYMSQLDPFIGHMPITHVDDDALAPYIRSKLEPENGKPVTNRTVNIALQRVIRVLNLCARKWRDEERRPWLDVVPMISLLDEKTNCRKPYPLSWEEQSILFAELPAHLQTMAMFKVNTGCREQEVCKLQWDWEIAVPELGTSVFLIPAGFGGRSAKAGVKNRDERLVVLNDVAKSVVEQQRGKHKLFVFPFGKPDGEGNETTVHRMNDSAWKKARIRAAKKWQEKYLRPAHDGFLRIRIHDLKHSFGRRLRAAGVTEEDRKALLGHKNGSITSHYSAAELDQLIAAANKVSATDSRAPALTILKRRQL*
>122_fragment_2_98  (105051..105527)    1   None    hypothetical protein
MTLAAGLIVVLIGCLFNRLTLDVGIRPRIQLQPIKADALFSNGKFPHVWAHGLVEFVTTHAQIAVGITCPDEPGQDWRYLGGRFVCHRVTAPGRAGRRKGLFPVVTEQWVDRRIALKRRQIQQREMPIHPACCIDVVDVSEDGRLSKRRMPYDQRLDT*
>122_fragment_2_99  (105351..106085)    -1  K07313  pphA; serine/threonine protein phosphatase 1 [EC:3.1.3.16]
MLETIEVVRIKRFAENTAGRDFAVGDIHGHFTRLQVALDAAGFNPAVDRLFSVGDLVDRGPECEDVIKWLNKPWFHPVRGNHDDYVCRFDTCDIGNWMYNGGTWFVGLPLDEQKNYQVMFDELPIAIEVETAGGLVGIVHADCPFPSWDELRAELESPQTRKRLKLVHNTCMWSRSRIQDADASGVSGIKALVVGHTPLRQPAILGNVYHIDTAGWMDGHFTLLDLATLQCNPPINPLLSHDWE*
>122_fragment_2_100 (106135..106302)    -1  None    hypothetical protein
MNEIQLDLIPRKITLPHPPRHTGFLEPDRQVPGYTLEQMIEYGKACATEAVKQSK*
>122_fragment_2_101 (106299..106451)    -1  None    hypothetical protein
MSREAFEQAYAEDNNCDLEWCQGQRLDNGSYRDRYMARAWHWWCRAKEAT*
>122_fragment_2_102 (106451..106882)    -1  None    hypothetical protein
MEKNKLGPDHYRYVDELDPKGLEVTCKRYVVIGETDQCWYIVSEFHDKLFGGSQRESLLKQYRKRVLKDGGEHGRRFAYTDKALALRSYKQRKSWQVRHAQLSLERAKAAIAYFGDTRTESTVPPDNLMIPCEYIQDMNWSEC*
>122_fragment_2_103 (106946..107464)    -1  None    hypothetical protein
MGLTNNKPNDVRVVPVELLERIVSLNNSFADHRQAQHDLVKFLAQPADQQGEPVVWGAPETVGQLIRQLQTLDPALETVALYRLPDHIPGVGGKVKQGHISTSYERMEGIWLGPYKGDGRKVLAFWTKLDPRPVPDGEFLMQTPPPDRELVTRRCECPVCWPDHPAHANRPR*
>122_fragment_2_104 (107455..107670)    -1  None    hypothetical protein
MSKVRRCAASPVQPEFSLCGEAFDAFDEKLTSEPYEIAEPDQSITCPMCCEAIREIKSIRNPLRPRSRSWA*
>122_fragment_2_105 (107667..108203)    -1  None    hypothetical protein
MSKHPIPDWLRSQFSLIEDEVRKLGPCGVFTQMRTVTQTYFEQLAAAPQRPALGGEPDILGKVVSFGEGPKEISWAKGKIPEFGVELIDRAHLAPLQAEIERLDDELDERGQWDTAQLKVIHGLNDDVARLKARCNEMESGLRIIATAGTKTRAPDLRLKAKQTLIRALSKPAGSEQV*
>122_fragment_2_106 (108200..108529)    -1  None    hypothetical protein
MSNELKKCSVLKDANVSYAASTAVNAAALGAQNVNSAMSTIDALRQQLADVSNERDGFRVQVEEAERFVEYLINNCVGQVVSEGKIKYWMACSIERHAKEKSTEQRAQS*
>122_fragment_2_107 (108693..109934)    -1  K00590  E2.1.1.113; site-specific DNA-methyltransferase (cytosine-N4-specific) [EC:2.1.1.113]
MSQLHQILVGDCIDMMRTLPDESVHTCVTSPPYYGLRDYGVEGQIGLEETPAEFIARLVDVFREVRRVLRSDGTIWVNMGDSYAGSWGAHGREDMGVGVSTLSQRQVMASQRKSKAITHAEYKPKDLMGMPWRLAFALQDDGWYLRQDIIWHKPNPMPESTRDRCTKAHEYLFLLSKSRRYHYDSDAIREPANLTGKGNANGYRGGAYVNGSTFDNAEGGKRTSSGNTVPNNGVGWGHGTDKASRNRPRVTVPTGWDTSTGEGGHGVFHKDGAERKRRDSFKREDSKREQAIPGQSKGTHRPDRDESTHDTATRNKRSVWTVATHAFKEAHFATFPPDLIRPCILAGAPRGGVVLDPFGGAGTTSLVSMQEGRRSIICELNPEYAALARARIDAAWLDGAAQMDVFRDSVPAA*
>122_fragment_2_108 (110007..110387)    -1  None    hypothetical protein
MSKRKDILDELSKEELLAWVRTQFFSRLPKRSEILYLRWEKQSSEALEEMRLENLKGPGVDLKERDLLAVRFNESTDAAEKLRLLELMEPYGAALNAHIKRSQAISRKLKRVDALYEQIDIERQKE*
>122_fragment_2_109 (110495..110710)    1   None    hypothetical protein
MSAEHRKLIGIPDDHGLKHTGSKSEQRKGRDTDIDFYDETDAQGNVIAQYEVRDSMSIYPPQGTTLSFRKL*
>122_fragment_2_110 (110799..111032)    -1  None    hypothetical protein
MALTQKQRDERTALKRHKAGEEELRLRVRPGTKQALKELMEWAKIEEQGEALTLMIQHLHSLGRAARCRCLKSRATK*
>122_fragment_2_111 (111176..111703)    -1  None    hypothetical protein
MMTTELSAIRRNSIESNRLAQAMAEFTSKGGTVEVIEGFVSKPRPEPKAYGRDFPAEQAPKPVKRERKRTPTQVRNSSSGRTQVNDALVQRILEMARRPARARLKRKRGSAATCSIDWQMSMGFSSSSTTHARTCGPRRSTQLRTLCTSSESKSCETKAWLESRPLPRWASAIRW*
>122_fragment_2_112 (111714..112385)    -1  K10857  exoX; exodeoxyribonuclease X [EC:3.1.11.-]
MTAYIFDSETTGFKEPQLVEAAWLKLGATVGLPVTDEYLARFKPSKAIELGALSTSHILDEDLVDCPCHTSFQMPPDTEYLIGHNVDYDWGVIGQPDLKRICTAALSRRLWPEADSHSQSAMIYLHYREQATGLLRNAHAALDDVKNCRLLLSKILDALAVKLGRPVEGWEELWSISEEARVPTVISFGKHKGSLIANLPSDYKRWLLNQADLDPFVRKALSK*
>122_fragment_2_113 (112382..113248)    -1  VOG01757    REFSEQ AAA ATPase
MFKKAERKQAKLRLALAGPSGSGKTYSALLMAKGLGGRIAVIDTEQGSASLYSDIADFDVLELQAPFSPERYVDAIAAAEAAGYNVLIIDSYSHEWTGPGGCLESNEALAHQKFRGNTWAAWNETTPRHRQLTNKILTSTLHVICTMRSKTETVQGEGKKIVKLGMKSEQRDGTDYEFTVVLDLTHDAHTALASKDRTKLFTQPELIDESTGRKLLDWLNSGVNPEERAKELLIDAIADIASAKDMVGLQAAFNAAKVIAIGYDDLVNRVVAAKDKRKTELTPLEQSA*
>122_fragment_2_114 (113259..113402)    -1  None    hypothetical protein
MSNPRMSAQLDWMTVGSFSPERFTGEERKEYEAEQARIEREWDQQPN*
>122_fragment_2_115 (113410..114534)    -1  VOG11477    REFSEQ hypothetical protein
MSVEKELAVVPPKEKALQIFQTPKGLDPYLQIVRDKIDAFVPDVTTRKGRDAIASIAYTVARSKTALDNRGKELVAELKEIPKLIDAERKRMRDTLDTWQEEVRRPLNEWQAKEDARVEYHNSMIRHIEDCGIGLIGGQPQPFGLLFRELEEKIIVDEKYQEFEAEAHRVKAAALAKLRASFDEHQKREAEQAELARLRSEAEARAKAERDAEIARAAAEKARFEAEQKAQAEREAAAKREQELIEQAAQAKRDAEQKQRDADAAAANQALQLKLAAEREERQKLQAEQDRIAAEQRQAAAVERARLDEISRQEQEAAEASRIAEAREADKAHIKSVCLAAQQAMVNLGIDEACAKAVIILIHQKKIPAITIAY*
>122_fragment_2_116 (114704..114874)    -1  None    hypothetical protein
MTTPIVKSLIDEQLADIERSLSIVSAGLPREIPVSALPPKLVEAVKTGRLAVRPRQ*
>122_fragment_2_117 (114871..115122)    -1  None    hypothetical protein
MSDLERYQDSAQGRRSIRQATGLYDDLGNLKSALVDYFHDYADPVDYAAVRAAERDYRKKLARRISVAITKMEMVCPPKGASA*
>122_fragment_2_118 (115119..115337)    -1  None    hypothetical protein
MSRHDTAKRFIERALAEYATAPCPDMTAAAVQMAVELAYAQGDISCVEHTHYTERRNRMVARHRIEPVRACA*
>122_fragment_2_119 (115358..115579)    -1  VOG00405    REFSEQ hypothetical protein
MSNDQSEPAFPVPGSEYGGTGTCFGMTLRDYFAAHAPNAPDDFGWNNGEATQCERLARWSYHYADAMLAARTA*
>122_fragment_2_120 (116347..116631)    -1  None    hypothetical protein
MQCECVTRVKERIDGKLREQMPEGANSLEWSFPQIRFGLTNDGVVHLPVFDIKGEFQAPKKAGGFKRVKVDTFLAATYCPFCGMKCKADEQKAA*
>122_fragment_2_121 (117059..117718)    -1  K01356  lexA; repressor LexA [EC:3.4.21.88]
MEFKDRLRARMTDLKLSATDLSAMIRVSKATITFWRNGTNGATGSNLMELAKALRCSPEWLETGKGEPGGVSGGEASNFELVEAPDRLYRYPVVSWVAAGAWAEAVEPLPSGFSDRYEVSEYKAKGPAFWLEVRGDSMTALSGTSIPEGMMILVDTEADVRPGKFVVAKLPNSEEATFKKLVEDAGRRYLKPLNPAYAMIECSDDCRIIGVAVRMTGTL*
>122_fragment_2_122 (117808..118017)    1   None    hypothetical protein
MTYEEALKHFGTQRAIGDALGVTTSRVSQCRTAGGFSYPMQCVLEKESSGTLIANRQDDPAQAPRMTAA*
>122_fragment_2_123 (118047..118235)    1   VOG24653    REFSEQ gp11
MHFDPSHMHDKPTKVRLDEVADDLLTAMARYQRTQKAVLAREILERGLNQMMEELNAKTDVA*
>122_fragment_2_124 (118254..118445)    1   None    hypothetical protein
MPERKQLDVQLDGIGVSNLELLAKREGITPEELAAKIINKELDRMSRPPPSRGKVRSIGRRAD*
>122_fragment_2_125 (118570..119409)    1   VOG02593    REFSEQ hypothetical protein
MPISQHVVNSDSPRHEIAPSQNVAHSQLITLIGGEAFTTTLAISAGCELDHASVIKMVRTYQADLEEFGLLDFKSESTGGRPTELALLNEQQSTLLLTYMRNTPIVREFKKRLVKEFWRLAHSAPAFDIASLNDPKVLLALLTDNVRKVVHLEADNTELTQENHLLEQKVVADAPKVDFFNAVITSTSIHSVREVAQSIGTGQNRLFAFMRQQRWVDRHNTPYQGRVESGYLVAEPHSYICPETGERKTKFTCKVTGKGFTKLQALWAGRDTAILGGAA*
>122_fragment_2_126 (119406..120155)    1   None    hypothetical protein
MNDVPRQFKGVWIPAEVWLDRSLSITEKVMIVEIGSLQDPVRGCYASNNHFGRFFGLSNSRVSEIISSLTSKGLLRVELIRDGRQVVERRVRLTDLFGKSNTYSENASTLFGKGGDPYSEKAQESNTKSNSTTEGEKRGSAKASPSASRKASKFDPLTARPSNVSESTWADWSQHRLEIKKPLTATTCAKQAKTLAGHHDADAVINQSISNGWTGLFPEKVLPGAKASGKAQGPDFYDKSWRTDTSDDL*
>122_fragment_2_127 (120152..120883)    1   VOG11468    sp|P03689|VRPP_LAMBD Replication protein P
MKNVTQMIPGAARALGTAAPYQAPAQTGTQLGVVDDATGEVVERLFRQLQAIFPAHKQAWPDDKAKAAAMRNWTMGFMAAGIRTLEQIRYGIEQCRKSGSPFAPSVGQFIGWCTPGPEAFGLPASADAWMEALMAVYSHEGVKIAAIATGLFDLRSAKQEDKGLRQRFDHNYTIVIRRAQSGQPLDGKILTGIGHDSQKTELELAEEQAEQAVQARIIQQGIPVDAASARALLLARIGRRAGQ*
>122_fragment_2_128 (120880..121305)    1   PF09397.10  Ftsk gamma domain
MSNDKKRGELLAAFEAHKSRIATAIIGGGEGARVRQVLERVSLSDFEAGWQASREALEQSAISPEVQAMLSQFEADEAEEIRMAEAYVRESERCSISALQRKFKIGYNRACRLMDRLVALKVVSPIDAEGRRTVLPEQVKP*
>122_fragment_2_129 (121302..121490)    1   VOG11464    REFSEQ hypothetical protein
MNRANPAQLRQALALANAYTKAGIRFVCMPVVDEADGMNLKDQAQQRLERMALIAESAERLA*
>122_fragment_2_130 (121487..122185)    1   None    hypothetical protein
MNVDIEKIEALAKGCRDEVIRSHGWTGMIADAGLLRRDSEFLKECSPEAVLDLIASGKRMASRLMYCPACQGEGEVYSGRNSYEGYNQPPEPIMNKCGECDGDGALGDTAECISILDEVETLRAENAGLKTGYEAYERVNAELRAECEKLRAKILSGAARAKKLVWIASNHRRDALVLRKDAERREWEGFNNGLTLAGNLRPSITHGDDGGRVLNDIRCQIDEAMRKDQSHD*
>122_fragment_2_131 (122178..122648)    1   None    hypothetical protein
MTDKISVNCQSKLTEAVTRMSAMFREKKFVVVSLRPGKDRTLDQNALWFAFYKRISEMTQIGDASEARKYCKLHHGVQILINEDEDYRAAWHRTTKHLTYEEKLGLMGDSKLLGPDGFPVTSMFNRAQGVAYTDRILTEFSALGVFFGDLIGEVAA*
>122_fragment_2_132 (122645..122959)    1   VOG01140    REFSEQ hypothetical protein
MTHQFKSGDLALIVGAHTTPENVGKVCELVELLAPEQISTWRDPADGQRIQNGDVGAAWVVIGDGLTSWCGSSGWVMADPIHLMPLRGDFAPEQQEVKEAEPCA*
>122_fragment_2_133 (122950..123540)    1   PF05766.12  Bacteriophage Lambda NinG protein
MRLSLQAKTPKTKKCRVPDCGASFVPQKLGQAVCSPACAIIDAPRNQAKARKALAQVERREIKIRKEALKSRSDHMKDTQQAFNEWVRNRDAALPCVSCGRHHEGKYDAGHYRTVGSNPALRFEPLNCHKQCVPCNQHKSGNVVEYRIELVRRIGMLNVEWLEGPHEPQKYTIEELKALTAKYRALTRELKKGEAA*
>122_fragment_2_134 (123540..124082)    1   VOG12651    REFSEQ hypothetical protein
MYRNVVAAVVRALAAETINSAGGCDFEPKVQCAKQKGEIVGKEAAFLTDCWVFGRLHKALSNEHWRALVAKYSTHTERKHAAITEITRQYRSPAPERFRHCAIVTWAMPKLPGVDGKRSTNVLPSAWYEMDNWSDEPHPIKTQERWRRDIRKGLESMVDVALTEAQHILEAEGILIADCA*
>122_fragment_2_135 (124425..124796)    1   None    hypothetical protein
MDPTDLGPGTATWLGGTGTILLGGFLWLRKFLSKDAADRAMDNADIGTVRRLNELLDSERVARKEAEARADQFAKERNELAAAVGRMEGKIEALTSHIVQLTDKVTSQSAEIARLRAQLGGNN*
>122_fragment_2_136 (124796..125134)    1   None    hypothetical protein
MDRCAINFVARHWWRRVEVWLIAILLLAGGAMLGFQVAQWSLASWYVAQVAEVRQAYDEATIQRDMRLNKLAKSATEAAVKVEGAAGKATEAAEVASKAADKVNEAVERQSP*
>122_fragment_2_137 (125180..125314)    1   None    hypothetical protein
MTKKNWYVTTPGHKPFPMILLESALDHAGALAFARSIWPNCTVE*
>122_fragment_2_138 (125319..125942)    1   VOG12618    REFSEQ hypothetical protein
MIRPTPPAELLRESEDSDVFMRLVPAKDVWDWIQAEILADTGSIHNEDHAHLIDADICIMWASSAFTKQGRTVLGQAEQVAFRAGGWQKARMEQQMRDWFGYVPSYIITLAADYCSQCPDDDFCALVEHELYHIAQATDQYGAPKFTQDGLPKLEMRGHDVEEFVGVVRRYGASPDVQLLVDAANKPAEVGKLNISRACGTCLLKLA*
>122_fragment_2_139 (125974..126432)    1   VOG12619    REFSEQ hypothetical protein
MAALTPDVKAYIVQALACFDTPSQVVDAVQREYGITVSRQQVETHDPNKTSGKGLAKRWVALFEDTRKRFREDAAAIPIANRSYRLRVLDRMAVRAEGMKNIALAAQLIEQAAKETGGIYTNKQQVDHTSSDGSMSPKGKSLDDFYNGDVPA*
>122_fragment_2_140 (126419..127762)    1   K06909  xtmB; phage terminase large subunit
MYQLNPNLREFWRIRKPYKLLKGGRFSSKTQDAGGMAAFLARNYTVKFLCIRQFQNRIADSVYTVIKEKINQAGWADEFDIGVSSIKHRKTGSEFLFYGIARNLNDIKGTEGVDVCWIEEGEGLTEEQWSVIDPTIRKQGSEIWILWNPDLMTDFVQAKLPRLLGDDCVIKHINYADNPFLSDTARSKAERLKEADEESYNHIYLGQPRTNDDAAVIKFSWVEACVNAHLKLGMSLSGAKAVGYDVADSGEDSNACALFDGAICFDMDDWKAGEDELNESAMRAWSHVRGGRLIYDSIGNGAHVGSTLKAARIHGGYFKFNAAGAIVNPDKEYAPKIKNKDKFENLKAQAWQDVADRMRNTFNAVTKGHKFKASDLISISGDLQKIEQLKIELSTPRKRYSKRGLDMVETKDELARRSVASPNLADAFVMGACPHLVANSRPIRDLL*
>122_fragment_2_141 (127779..129155)    1   VOG02778    sp|P44183|Y1409_HAEIN Uncharacterized protein HI_1409
MSKKGLVPADKKLGKALVRAAHKYEAQIKSSSDGLVNVVSGLGTQKAKRSHNQFQYGFLNDFQQLDAAYQTSWLARAIVDYPAEDMTREWRTLKCDDADVIRAEEDRLNLPAMVSEATSWARLYGGAGILMLTNQDLTKPLKPEKIKKGDLYRLLVIDRFDMTAMNLNQTNILAANYLQPEFYTISAGAQQIHWTHFARFAGAKLPRRQRAQTQGWGDSELRKCLDDVMDIVASKDGIAELMQEANVDIIKRVGLSDELASDQDDAITARYALFSMMKSSINLALLDDQETYDRKTLDLSGVAPVLDLLMTWISGAAGVPVTRLFGESAKGLGNNGEGDNTNYHNQLSSKRLTQIDPGLRQLDEVMVRSATGRWIDDFNYTWNPFKQPDLVQIAQANKANAETDIAYKDAGVITTSQIQRKLQAQELYQFDDEKIEALEAEEDLTMFNDPVGDDDKVE*
>122_fragment_2_142 (129158..129973)    1   VOG01506    sp|P71385|Y1407_HAEIN Uncharacterized protein HI_1407
MDMIGIQYNARLQRLVKQVKADIAKEVMPLVRQLAPEYTQDAVVTTDAWSDLIIAAMRRLTSKWASFGVDAGADRIAGEFVQSALKKSERDLKKSMGIDVFSGSKTLQDYLKASAQQNAQLIKSIPAKYLDEVQTLVMANMRSGMRPGFIEKALQEQFGVSQRRAKVIARDQTGKINGELAEKQQIGAGFEYFQWIDSDDRRVRHRHSEIANKVTAYGKGIYRWDDLPLSDSGVPIKPGSDYQCRCIARPVSAREVKANQDAGRTAPGVLR*
>122_fragment_2_143 (129998..130225)    1   None    hypothetical protein
MKIKIKHIYTGNESEFDTDNYHIAVQMTPTDLENIKSLPDTEDGRSIEGNEHRTYACIRPADDTESDALFAWAKQ*
>122_fragment_2_144 (130225..131385)    1   VOG00976    sp|P44180|Y1405_HAEIN Uncharacterized protein HI_1405
MSRQTVFDRVGYRITQREYTDEGFLKVPGRVARTGIQEYLARELGLDGDPMRVVKVYRPEEEVFKDESLSTYDASAVTNNHPHGLVTAANYKGLTVGVVRGSGRRDGDFVVCDLIVKDKATIDDITSGKCELSAGYTAVYDDTPGVTDDGEDYHYIQRDIRINHVAVVDRARAGANARFFDHNPGGNTMPVLITTDSGRSVDVADPANAQVVADSFDRLMKRATEAEAKADKAQASADSAAEKLGDALKASSDEAISTRVTAISSAHALARKVAGDSFTCDSMDVTEIKRAALAVALPKRDWAGKSAGYVEAAFDAESDKDEDEDDKDEDGKPKVKKPTGDAATLLAQLTQLALDGAKPAAVADGKPTPYQAHKQSLSGAHKSKGA*
>122_fragment_2_145 (131388..131882)    1   VOG01175    REFSEQ hypothetical protein
MPVQGGNAINHGVAYAGMVADGEVSNGVSKVNKGTVNIAYGLGVVTDGDDGAKLPVAASTAAQFIGVVKRELNRAYTQSEVFGAVAKRDMTVETMAPIWVTARVAVAKDDPVYLVVGDGTGAFQGQFSNVVGAAATLAVLIPNAKWVSTAAAGALAKISLKIGG*
>122_fragment_2_146 (131888..132913)    1   VOG00793    sp|A0A0U5AF03|CAPSD_BPK22 Major capsid protein
MKLKKIVVAIDAAIAYQIGRDAHEVTFNDGLPTIDDGLAFYISQLASLEARIYEAKYAAINYMELIPVDTSLPEWVDQWDYISYDGVTIGKFIGASADDLPDVAVNANKSVVPIGYAGNKYSYSLDELRKSQALRIPLDTTKAKLAFRGAQEHTQRVAYFGDAARNMTGLFNNPNLALSNSTLDWYNAATTGDQIVADLNKILVDVYINSATVHVPDTIILDATRFAFISNKRMGTITDKTILEYFRTNNQFTALTGRPINIFSRLQLSAAQLAAAGVSNANKDRIVAYELNDENLGMQVPIPWRSLAPQMWNLKVNVPCEYKISGVEFRYPFSGAYRDQF*
>122_fragment_2_147 (132982..133287)    1   VOG02622    REFSEQ hypothetical protein
MFLKNEAARLITINHLVGEKETSYPILPGENPAVEVPDAVVKIDFVKALLSNGDLRRVGADEIENDDGEEDLFAEAEALGLKPEKSWNEDRLRAAIAKAKK*
>122_fragment_2_148 (133330..133749)    1   VOG00306    REFSEQ hypothetical protein
MIITPEMIAAFRSNPVLKAFTDATKWPDEYIVEALCEAGTETGSSRWGALELTCDNFKWRGMQYFAAHWLATNFATLGANGTPNSEARLNVAQKSVGDESIAYRVPQMMDAGTDWLTYTNYGQQFYRLKKRAGMGAKAV*
>122_fragment_2_149 (133750..134349)    1   VOG01950    REFSEQ hypothetical protein
MINIDLTGFQDLQDELSRELAALRTNKIVTVGIHEEAGDVESGDLTMASLGAINEFGADIKHPGGTSYGYANQASAERGEVRFLKKGAGYMELGVTKPHDIKIPARPWLEPGVASATPEVLLTIQDGMEAGHSMDQILEMVGLVAAGAVKIYMTDLKTPPNAASTVRKKKSSNPLIDTGAMRASVTHKVSIGPSEEGLE*
>122_fragment_2_150 (134346..134717)    1   VOG00454    REFSEQ hypothetical protein
MSLNMEGQIDLVFVSVEASRTVDVGGQWVDGIWTPGTPDTKPYVVNIQPASDREVDFIRQGGERITDVRRIYINQGEMQLIDQTGTWAFLGQQWKTVKCDNRYWRNYCKVLVMRIDDQSGGPA*
>122_fragment_2_151 (134714..135280)    1   VOG01012    REFSEQ hypothetical protein
MTNEELFKKLRPIVMLATGVPECLLADQAGPGSMPAPQGAYATITPRQSISERGQANVVSRNVPGEQVEVQVRAQIMCSCSVNFYRGEAVMFAELLKQANKRPDISIMLFKSKIGWNSTDAVNNLTSLQSANFEQRAQITIRLMYETVSLPVINNILSASVAVENEESQVLQTFSVEIDPTQPMERSL*
>122_fragment_2_152 (135277..136419)    1   VOG01167    REFSEQ hypothetical protein
MSYPATNIIRVNARISPAGLGNANFASAMLFAPQTALPVGFAPDTYRTYSSLPELSEDFDDTTDVYKAAQRWLGGTPATRELKVWGAATADATRTASLNKARNTVWWYWTMWTAPVLAVIGDVLNIAQWCEDNGSMFIDNQTGAAVEDIRDPAVTDDIASQLTTFGFRHAFTAAHASDAYAGSALAKHFAAVNYSATRSTITGEFKKSPGVAAESLLTTEYSAMQSAGKKAVFYTAVDNQGSVDVGRWLNTFTHSSFGEYIDDVVNLDACINYLTTSLYNTIANQPTKLAQSPVGQAVLIGAAGAIMQLFIDNGYLGPRNYIDPDDGIEKYTKGFEILTKPEDILDLSDADRAARKSAPLRIRLFRAGAIHIVEADLDVY*
>122_fragment_2_153 (136502..136909)    1   VOG01803    sp|D6RRG7|ORF10_BPKPP Structural protein ORF10
MALSNFSTDLTVVTINGRQIQDWGDTATPYTDAPIDATSQLRRGQGGGGIRLDRINPGREVNIFLNPGSADAAYVQGLFNSRANVTLTYTQIGTLDGAIGTEGVIVNDGQRGRAGSTINDDQFTMQFNIWDGTRG*
>122_fragment_2_154 (136913..137314)    1   VOG00124    REFSEQ hypothetical protein
MSVRAFTVGGVQYNAAMASAVDQDRLMSLLSAAVIERFAVAAREDQVVSADVLCSMFMSMRQDAKAQVAQMLMGKVVVNGSDRAVTVADFGGKMVHYNQLLTELLLWNLTDFFEWLPSGASGDRQRETGSPAQ*
>122_fragment_2_155 (137471..139528)    1   VOG21899    REFSEQ putative tape measure protein
MASKVLKSFLIGIGYDTRSLEAGDKKINASLNGIRSGALGISGALIGAFGAAAGSIAGVANRVDKLAMSTQNLRTSQAAVYSFGNAVKLMGGDAVDALDAIKRFEEIQNNLRLKGDAGPISDLATAGIDVSSLYETKTGEEFMRALADMLPKLDEGQSNQVQSALGLSDGVFRTLKGGADQLDEAMKRASGLTGSVDQLTEDARKLAENASEFGLIIDGVTNEIAEKFLPSLVGAGTALNDFLKESRGKISNVIDYSADNPEATAVLGLSSVTAMAGAVMAKLGLSTIGGAVSKAGTAGLAVTGGAVGANVLNKTLDEKVPGYKGASEGFDEFLKSVTGLDRIKSPIEVLFGKPISKRSDEGEASPAAPAEVGKWPDIEHEAFGSGEAKEVKSQSVDDMVKAIQSAKRSVNGPPAEAAATASVAVPEIMPMARDDKPETSLEAPPAARIIPESPEVAASSEQKHTDKREPAPPIDAAPRISANEAYGDPPEPVVTILRENDETPPVVVAPEPTQAKVDAPIDDRRDRRIELIDVGARGLSRGEDSAKRAPAKAPEVSSPRQGVRIEDQSLGGPAQMPGHDKDDREAPPQSPWTDAPLFKQIFGERSSSDMMPPIHQVDTGQGSKETSAQGVDDIVRALQAAKLKVENNQSFTIQLDGQAIEAKITQVNERLNYETLNDLKTTTER*
>122_fragment_2_156 (139531..140187)    1   VOG00491    REFSEQ hypothetical protein
MSIINIFTRKAPTIAGYAFDAVLEDTFEATVTITSVPVESGVRISDHRILNPFKWTMSGAISNNPVKVQLTDFLGGALSNLTDNPIVSTVAGLSAGWLAGSDDTRASSTLDFLVWLMKSADPFDIDAGDILLKNMAITRLARTKEPRNEGGLEFVVELQEVIDLSRIQRSLQCTPDQLREGDPSKSALSRAINRGQAIAKEAADNVSDAVNGILDGVI*
>122_fragment_2_157 (140187..140495)    1   VOG00111    REFSEQ hypothetical protein
MLVIPLRAGSSNAHQRFGVQLGENLIDFEVDYVSYLDEPAWSMNLLRDGSRIVSGAMLEPGSDIIQIYRTGIGQMVFTGKNVTLDNLGVDNFLVWIAPVVDI*
>122_fragment_2_158 (140492..141907)    1   VOG00777    REFSEQ hypothetical protein
MRERVWSIDVDGQPYIEPQTGRRQFRIQFNIDISPGDAISFADIRLFNLQKGSSIPQKSGIVLRAGYDDNVDAIFTGYVTNTMRERPPGAPEVITRLICRSGQPIADRASAQLSFGVGTRVEEVLRALARAWPLPIDIDNAQFADDKPLSSGLIVDGDIPTAISDLAYAYKFEWVQDRGRIVITKTGMPRTVTPIRVDMFSGMIGIPEVSRGPDGLGVFVAVQLNPSLRINGKIDVESEFSTFNTGNLFVSELSGDATANGEYNIFAMKHSGDSHTDLWRTEIDGLRSGTTPKKDDVATPENGKLIWGARVDQAFRVKVREIGDRLSMDPNWLMAVMGFETGYTFSPAARNPGSTATGLIQFLEASARQVGTSTAQLARMTAVKQLDYVEEYYRPYSGRIRNLGDAYLAVLWPIAVGRPDSYVMWSRDSGPYQREYAANSGLDVSRDGVITRGEAVASVNTSYLRGQQFVR*
>122_fragment_2_159 (141975..142682)    1   VOG00751    sp|P31340|SPIKE_BPP2 Spike protein
MLETEGRAKQAKLIRDAFRELMKGVCTSIPGHILTFDPGTQLAQVQVGITRVDINDAEFTLKSIIEVPVYFPGGDYCIEYQIDPGCEGDILFSQRCIDGWIQSGGVAQNPIGRFHNMQDAMFLPGFRSRPGAISGFQNNGVRLRNRDGVQTVWLKNDNTISSSNGEVRFDLNPDGSTVMKNESGSFQLLADGSFLINGLKITTDGDVITAAGISLNQHRTSGVTGGNQISGVPVI*
>122_fragment_2_160 (142679..143026)    1   VOG00195    REFSEQ hypothetical protein
MTVRRLDENGDIVTQGQQFVNGREEVRQTVLTRLRLFLGEYFRDITDGTPWYEQILGKFSNLSAAEAALRARIANTPGVIRLTSFNADFDIETRRYSITAGILTEFGTDEVTLNG*
>122_fragment_2_161 (143019..144233)    1   VOG00243    sp|P51767|BPJ_BPP2 Baseplate protein J
MASLTSTGYVLLTQNEWFASERQFYLDIDPLWNLDPSTPDGLKMAHDAEIFYALDETLQRAYNSKDPNKAKGIDLDIICSITGSIRSKGSPSSVQLTLTATPGTQVLQGNRFESSTTGSRWSIDQTVTAPGTGIVSVNATCTVVGPTQADINTITKIVDVVAGLSGVTNAAPATPGADGQRDEQLRVTRATSVGRPGNNQIDSMIGELFSVFGVRRVKVYENDTGSSAVSTSNPYGLPKNSIAPIIDGGSDADIAMAIYVKKNPGAGLYQAGTPFEVLVTSPKYPANQKLVKASRPIYVDMILVINIKNDGTLPTNADQLIKEAVMEYAAGDLIPADVGFKISGFDIGESVPYSTMFTPVNKVIGEYGNSYVTLLQLNGAQANTAIAYNQMSRWTESNITVVIS*
>122_fragment_2_162 (144230..144874)    1   VOG01943    REFSEQ hypothetical protein
MMNIPNRVYAQYWDKPKAVDWYAIARKLGGSIEDAAEAVRKSYDIDTVVGEQLSVIGRIVVAPRSFVGAIPMTPGLFDLTDGDEFGNDDAMFSALTIDQDDQLSDELYRLVIKAKIIKNNGDATIENILDGMNYLLPTADVLRVTDGEDMTFSIEFYGQISNLERFALLNAGLVPKPQSVRFNGFLEGFEMFEFGDVDAEFGDEGAEFIGFIGA*
>122_fragment_2_163 (144876..145787)    1   None    hypothetical protein
MALKLNERYPGRFDNPSAGYPQGSFKNRTSPTAKDGSYLERDWANDKEGFFQSLLSSSGFVANGTVDKVGASQYFDSLVSAIRSKATGRVISQQLFTSTATYTPTPGIAFAIVECVGGGGGSAHVLATGASQYATTAGGQAGHYTRSRFTAAQLAGGVLCTIGAAGVGANAGGITPSSNGGSTTFGALVTAGGGNRSSVGILSSGTYLSAPSGAPTTVFGSFQYSISGQPGSWGIYSSGSGQLGGNGGSSMFGGGGIGVGIGAAAQPGSGYGAGGGASSLGPNSSSIGGAAGSGGLIVITEYI*
>122_fragment_2_164 (145787..146161)    1   VOG11448    REFSEQ hypothetical protein
MRTYARVTGGEVAELLSTDQDIDKNFPEDFVATLVDVTDVKPSPAQGWIATKKKSKWSFQEPDHNSYAMTPDGVKDMRLAAYRQFSDHLKLEAEFDAISSGKEPDYSAWLAKVEEIKALYPMPE*
>122_fragment_2_165 (146167..147342)    1   VOG01432    REFSEQ hypothetical protein
MLPFIYDDNVSTMRDEGVDTSRWTLSTPTAGNIMVVGSSLKISTAGVGVNYSQPVNMPPENDDFIIYVKLKAEYAQGKASVVHFNGLDGKPRIGFALGYSYVSQTLSLGQLSVINKDGSAKSDYASINYSESWCDLAIHGCRSLGTYRVYLRDANSEWLSVFSGDISQIADISSVVVGSQFSLSQPSHLYLDHILICRPNIVSIGDSICAGYAVPADPYVGWQKYAKLYSWLRNDLIVNLGVPGNSSQQISDRIVSSSFAGARLVFLHASSNDFRLGVSASDRTQITQRSITAINAFGAKCVLINGIYPNSRYVNADYQAETAYQKQWWESSAITLTGLSGMIDIMLCLKGVMGAYIAEALARLEDGKHPNMPGTVLMGRLIKSLGTISTG*
>122_fragment_2_166 (147401..148063)    1   K10804  tesA; acyl-CoA thioesterase I [EC:3.1.2.- 3.1.2.2 3.1.1.2 3.1.1.5]
MLKRSLLKGILVSAAALIIGCSAGVHEKPKVLIVGDSISIGYTPYVKGSLEGRAVVTHNAGNAQDSNNGVSNIDAWIGGGRWDVISFNFGLWDLCYRLPGPITATNRDKIHGTISVPVEQYRANLRIIATKLKATGARIVYQTTTVVPAAEPGRYSSDVAIYNDAAKSVMRDLGIPVNDLQAVSAALPDSMRESNTDVHYTEAGYSEIAKSVTASINGLL*
>122_fragment_2_167 (148123..148668)    1   K03791  K03791; putative chitinase
MPITAQQLLQMLPNAGQRAGVFVPALNTAMGKYQIITRERIAAFIAQIGHESGQLRYVRELGGSEYLSKYDTGKLAERLGNTPEADGDGQFYRGRGLIQITGRANYAECGEALGLDLIHHPELLEQPEHAAMSAAWYWGSRGLNSLADKGDFLQITRRINGGTNGLADRQALYDRALKVLA*
>122_fragment_2_168 (148665..149183)    1   PF10721.9   Protein of unknown function (DUF2514)
MTGLYARIGGVLLILLAVAGALYGAYRHGVSVTDSKWQVKWAEQVSTQAQAVATTTTEYRTEEQRRQKAANQVANDARQEQAVAIADAAGADAAGDRLRSEAGKLAASVSCVPSDPGIADRGKNATRAAMVLSDLLGRADARAGELARYADRLTVSLQACEAFNVSISPSSH*
>56_fragment_1_18   (15411..15920)  -1  VOG01059    "sp|P00726|SPAN1_LAMBD Spanin, inner membrane subunit"
MNGLDLRFVLLAVVVGSGLGGWLAWEWQATRYEQQLSEQAMACLQERELASRAVSDWQTAEQARRRALEVRLQNSDTTLHKELSDAQTSQVRLRDRLATADLRLSVLLATPSSGAGVSAATDSGRVVHGGPRAELDPTAAQRIVAITGDGDQGLIALKACQAYVREIAF*
>56_fragment_1_19   (15917..16462)  -1  K03791  K03791; putative chitinase
MPVNQQQLLHILPNAGLKAGVFVPALNIAMTRYCIDTRLRVAAFIAQIGHESGQLRYVRELGNDSYLAKYDTGQLALRLGNTPDADGDGQLYRGRGLIQVTGRANYEACGEALGLDLLRQPELLERPDHAAMSAAWFWDRANLNALADKGDFLMITRRINGGINGLADRQALYQRALEVLP*
>56_fragment_1_20   (16486..17490)  -1  None    hypothetical protein
MKTVSSARNPGWADQAHTTLNLWVIFEENKDSGREEGISISANDQDPQVVALFNRAVAGEFGVISEPSEQMVRIAVMMQRGNYSADASRKIDALTNDLSVLQNAVASGTATQAQIESLPALQAELDAYMAYRADLAHLEDIPGFPMSFVWPVPPASPFVYVKPPEVPTPPTGVSDDELPWVMSSIRNPRWADQSHNAIVLLVVFEQTKDTRGEEAVTVSFNDPRPQARKLFDRAIFSEFGPVLEPLEPLEPVVTVDGRVQRDRYAAMATAKIEALNHTLSTLQSAIEAQLKSLPALQAERDAYWLYRVQLAQLDALPGFPVSFEWPVAPATTFV*
>56_fragment_1_21   (17626..18138)  -1  VOG03347    sp|P03740|TFA_LAMBD Tail fiber assembly protein
MHTVLSARDPRWSDLAHTSIEMWVLFEEMKDIYGEVPFAASPKDSEPHGVDLFNRAVAGEFGEVLEPTEQTVLTLVTLQREAFSATATARINELVAELDTLQDATALKMETESQVNSLPAIQAELNAFRLYRVQLAQLETLEGYPAKVDWPVAPAKPFVYVQPVEEAVSD*
>56_fragment_1_22   (18149..19531)  -1  None    hypothetical protein
MDYPKSVPGVGLVSGKFVDENPATGTPGSLIPAQWGNSVTQEILNVILGAGLVPSEADVTQLHRAILGLAASDYKKSVRCATTMAIGLSGLQTIDDVTLVAGDRVLVKNQDNPAQNWIYLAAADAWTRAQDANESTECTPGHLVPVQAGTKNGGTVWQLTNTTAPVLGTTGLVFERALGRSGVAAGSYSRVKVNRYGQVEEGSSPTTLAGYGVTDAFTKAEVDLRDAARPLRDSITHVGMANNQPDAPYMRRESDNGVYYLQSRLGFTPVQQGGGAGQLANPIKIGWSGSSLKAAVDSTDLGNLWYSRNFNPDDKANRGTTLEAYGITNAYTKAEVDLRDMQRPLADSINILGFASNNPLYPYMRRSSDGQVYNLVSEQGLAARIAALGLSEVGSYAFARVINSIGPINQGGLVAGNNLIYSSTSGSDGGTNNSGTIGIGTWRAHGAFSSGERTLFQRAS*
>56_fragment_1_23   (19542..20141)  -1  PF10076.9   Uncharacterised protein conserved in bacteria (DUF2313)
MVVIRTAEHYAEQLQALLPPGPAWDPERVPEVQQLIAGLSHEFARIDGRAFDLLNEMDPASVSELVPDWERVMNLPDPCLGLKPLFEDRRLSVRQRLVAVGGQNAAFYVGIAVSQGYPDASVTEFRTPRMGRSRFGQAHFGTWNAQFMWTLNTGGRQRLGRRFGASYWGERFGVNPGTAIECLIRRAAPAHGIEFVNFN*
>56_fragment_1_24   (20129..21169)  -1  PF04865.14  Baseplate J-like protein
MPFETPTLPALVNRTQVDLAGDALRQSDARVLSRAHSGAAYGLYGYQDWIADQILPDTADEETLERQAILRLRQPRKPAQPATGTVRFVAAAGAVLDVDTILQFSDGRFYRVTQGVTTVAGNNTTTVEAVDAGALGNADAGQVMTVVQPVEGIDSSFTVIADGLTGGIARESTESLRARVVRSYRVIPHGGNQDDYVTWALDVPGVTRAWCVRRYMGPGTVAVFFMRDDDATPTPDAEQLAQVAAYIEPLRPVTAELYVLAPVQKPVTYTISLTPDTTAVRAAVQAQLADLHNREAGLGETLLLTHIAEAISRAAGETDHVLISPTANVTAAANQLLTFGGILWSS*
>56_fragment_1_25   (21159..21557)  -1  PF07409.12  Phage protein GP46
MIIEGSLQASLLRSVIISLFTWRRAEADDPFDDAERFGWWGDTYPAVANDRIGSRLWLLRRVKLTAQTQRDAEFYAREALSWLIDDGHVQRINIFTEQVQSNRLNLGVELVVPDGQVVRFNPSEQWQVIYAV*
>56_fragment_1_26   (21557..22066)  -1  PF06890.12  Bacteriophage Mu Gp45 protein
MSLFNRMLVRGTVVLVRASSKMQALQMRLTAGEVRGDMEHFEPYGFTSNPLAGAEGIAAFIGGDRSHGLLLVVADRRYRLQGLESGEVAIYTDEGDKVHFKRGKVIDIETNTLNINAATAVNFDTPQITQTGKIVSQGDQVAGGISQITHLHGSVRSGPDQSGPPVGGG*
>56_fragment_1_27   (22063..23169)  -1  VOG00534    sp|P10312|BPD_BPP2 Probable baseplate hub protein
MIDPNVVTLTVDDKDYAGWKTVEISAGIERQARSFDISLTWQWPGTDMVRPVRAGARCAVSIGGELILTGRVFATPVSYDDKQITLKISGRSLTADLIDCSAINKPGEWNDVSALTIVRELAAPYNVKVLSEIPETSRKSKHTIEPGETVFKSIDRLLTVFRIFSTDDEYGNVVLARPGSMGNAVDAVELGRNVLSAVAPLDFSGLFSEYQVIGQQAGNDKTFGKAASEVSASVTDSSVTPARVLVIHEESPITPALALSRAKWERGHRQGKTRLTTYKVQGWRQANGALWRHNTLVRVVDSILDLDQEMLISAITYSLSDKGTTTTLVVGPIEGFEAEPGDPEKRSKVPVNKDAYSYAQPNNEGTLA*
>56_fragment_1_28   (23173..24666)  -1  VOG01137    sp|P71389|VPN_HAEIN Mu-like prophage FluMu DNA circularization protein
MSTWRDSLLPASFRGVGFFISSAVVPIGRKGQLHEFPQRDEPYFESLGKQSQVHTVTAFIVGPECFEQRDKLLQALETSGAGELVHPWLGRMQVRVGDCDMTHSLAEGGIVRLNLKFYPDQPLKFPTSTLNTGRQLMQASDGLLGSALRRYRAVMATVDAVRINIQALRSTLSGVFATIQRQFSSFMTVYSDATALVHSLVNAPYTVSTMFSTFFASFQGDSRRSSRERGSSNVGAGGTGSGSGSAGGGSTGSGSGSSGSGNTGGSSGSGSGSSGSGGSSGSSGSNGTGSVGSVASGASVSRNASGVEAVPYRSIISDATQQAQAVSSINQVNQGGGLDTGVTAQATADLVQDALLVKVARVVASMPVAVSTTPILVVPSLDQQRVQPLQRADVPVADDVIELRDTLSSAIWDASLKADPEHYLALNTLRHALISHLNAVAASGVRLQDMKVSEPLPALVLAYRRFGDASRSREVVQRNRIPHPGFVPPGTLKIAQE*
>56_fragment_1_29   (24663..27191)  -1  VOG03144    REFSEQ hypothetical protein
MADTIKTLITGVDQLSPTLATVSKNVKGFTEGLESSGLAEVPLKEMIGESTLAQPLIDAVKAAMGFETSMAGVKRSVTFETPQQFQAMSRDILDLSERLPESASGLAAIVTEGAKANVPRAELTGFATDAVKMGIAFDQTAAQSGEMMGKWRSSFEMTQPQVAALSEKINVLGGNNLEKQIATMVTAMGPLGPVAGRASGEIAAMGATLAGVDVPTDVAAKGIKSFMQSITEGGAAKAGAFEALQLDINQLTQGMQQDPSGTIEKVLKAISTVDPGTQSAVITQLFGEESLGAITPLLKNLDVLRSNLAKVGEGVQSAGTIEQEFQANSETTAVALKEMENRVDRLSINIGSMFLPAMNETMAVIGPMISQVAALAAEHPGVIKGVVGAAIAFGVLQVAVIAATNASRVLSSVLGMSPVGIVVRALALAAGLLIANWSTVAPYFQAVWAAIREPVMALWDVFKRVFGWSAIGLIISNWQPLSAFFVGLWDGIKVLAAPVFDAMKTLFSWKPIDTIVSSWQSLSTFFVGLWDDINVLAAPVFDAMKTLFGWTPIGMIISNWQPLSEFFSALWGVIQALASPVIGYFQSMFDWSPMDTISSAWQPVSSFFSGLWETIKAEAAPVTDALASLFDVSPMELISAAWQPVSGFFTGIWEGIKAETAPLMEALTGLFNWSPMDSINEKWAPIKTFFSGLFTDIKPFIDPILNWFGIGSDDKSLLQKATQKLNEFAEERRVDNAGPGGGKGAFLTADAVQVSQLKQQQINQAMGIPATSQLLSAPNLPAPGSLLLQQGTGVGSRLEGELNIRFENAPPGMRTEQMQTNQPGLTISPSVGYRTLGAGAAS*
>56_fragment_1_30   (27322..27618)  -1  PF10109.9   "Phage tail assembly chaperone proteins, E, or 41 or 14"
MSEIIELARPIEAHGETVSQLTFRRPTAQEARAIKALPYRIDKNEDVSLDLDVAAKYIAVCAGIPPSSVNQMDLCDINTLSWKVASFFMAAASATLKA*
>56_fragment_1_31   (27615..27962)  -1  PF10618.9   Phage tail tube protein
MAQKVAGTCYIKVDGTQLTISGGGEAPLMNVKRDTVVPGYFKEVDKAAWVKFKAVHTPDMPLKLLTTGVDMTITCEFNNGKTYVLSGAYLVEEPSSKADDGTIDLKFEGSQGSWQ*
>56_fragment_1_32   (28030..29526)  -1  VOG02914    sp|P44233|VPL_HAEIN Mu-like prophage FluMu tail sheath protein
MAISFNNIPSDVRVPLFYAEMDNSAANSASAGMRRLIVAQVNDDVTGPEIGSLVLVPSVALAKNLGGQGSMLAAMYETWRKADPTGEVWCLPLLNTEGAKAGATVTVAGAATETGLLNLYVGGVRVQATVVNGATAAQAANALSVKINATPDLPVRSVVDAGVLTLSCKWSGVSGNDIRLEFNRLGKTNGEAIPAGLTAEVTAMTGGVGTPDQVQALAALGDEPFEFLCLPWTDTTTLDAWKAAMDDSTGRWSWARQLYGHVYSAKRGTVGTLVAAGQLRNDQHITIQAVEMAAPQPVWLQAAALAARTAVFISADASRPTQSGTMPGLDPAPASQRFTLTERESLLRYGIATAYYEGGYVRIQRSITTYQKNAYGQADNSYLDSETMHQSAFIIRRLQGVITSKYGRHKLASDGTRFGAGQPIITPSTIRGELIAQYARLEEEGHVENAEVFAQHLIVERDGNDPSRVNVMFPPDYINGLRVFALLNQFRLQYDEAA*
>56_fragment_1_33   (29545..29730)  -1  PF10948.8   Protein of unknown function (DUF2635)
MTQRITVLPGEGRTVPDPEAGDLLPAEGRVVTFNAWWQRRHNDGDITLQTEQSPTQSAETA*
>56_fragment_1_34   (29727..30317)  -1  VOG10943    REFSEQ hypothetical protein
MKITPIVAHLQATCPTFAGRISAGIDWAAVALGDQLAHPSAYVIATGDLALANDLQNVVRQIITDRLDVVVVLDGGDKRGQEASEQLHAIRAELWRALVGWSPEREYDPMQYQGGALVQISGDRVTYRFGFAAQFQLGRNLESQPAETWHEAYLDGLPGFTGATFEMDSIDPADPNLKYPGPDGRIEVKFSGDVKP*
>56_fragment_1_35   (30404..30742)  -1  None    hypothetical protein
MHRCVIDFIARRAWRQIEVWVIAALMIAGCLMLGFQAGQWSANAEHTQQLAEVRKAYDAALGRRDLRLDRLAESTTEAAGKVESAATVASEAAHTASRAADKANKVLDKATQ*
>56_fragment_1_36   (30723..31112)  -1  None    hypothetical protein
MDPTDLGPGTATWLGGTGTILLGGFLWLRKFLSRDAADRAMDNADIGTVRRLNELLDSERQIRKEAEARADQFAKERNELAAAVGRMEGKIEALTSHIVQLTDKVTTQSAEIARLRSQLGGANDAQMRN*
>56_fragment_1_37   (31417..31854)  -1  None    hypothetical protein
MIEEVEAVMVHWGEQRNRIGLSGGLSSPMAGIMEWGAYIPRSTPGSRSLIGNGSGMDYISSEVEAAVAEMARSPARSRGPELAQLAALRYVESLPVREQMRLLGINEGADRTYRNWVDKLHQSVLAALSARSASRSNRKEVARRA*
>56_fragment_1_38   (31985..32641)  1   K01356  lexA; repressor LexA [EC:3.4.21.88]
MIRRMNKWYEVARQVMDTQQISQEEMAERMGVTPGAVGHWLNGKREPKIEVINRLLGELGLPILTTSIPWNEPGQQNVAPTEQPSRFYRYPVISWVEAGGWNEAVEPYPVGYSDTFELSDYKAKGRAFWLVVRGDSMTAPAGQSIPEGMLILVDTGIEPTPGKLVIAKLPESNEATFKKLVEDAGRYFLKPLNPAYPTIAISEECKLIGVIRQMTMRL*
>56_fragment_1_39   (32789..33046)  1   VOG04172    REFSEQ hypothetical protein
MQNLRPEASQHDAYLALAQRIQDLITSPKAQIEHQVLLVREPGESPVHWEQIVEQISEAEGINVTRNFENGSVNVSWYVESADAY*
 

As you can see, these records are extracted from several places in the genome. I need to create a separate file for each of these locations based on the first number in the header (i.e. 56_fragment.. versus 122_fragment..).
So, essentially, this one file needs to be split into two, grouping records by the first number after the >. I need a systematic way to do this, since each of my 1534 files, formatted as above, contains different output.

Any help would be greatly appreciated.
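
One way the split described above could be approached is sketched here, assuming awk is available and that every header starts with > followed by the number and an underscore, as in the example. The function name split_faa, the output directory name split, and the output naming scheme (input name, then the number, then .faa) are illustrative choices, not anything VIBRANT produces.

```shell
# split_faa FILE [OUTDIR]
# Writes each FASTA record of FILE to OUTDIR/<name>.<number>.faa, where
# <number> is the text between ">" and the first "_" in the record header.
# The names split_faa and OUTDIR (default "split") are hypothetical.
split_faa() (
    input=$1
    outdir=${2:-split}
    base=$(basename "$input" .faa)
    mkdir -p "$outdir" || exit 1
    awk -v prefix="$outdir/$base" '
        /^[ \t]*>/ {
            # Header line: drop stray leading whitespace, then take the
            # text after ">" up to the first underscore as the group key.
            sub(/^[ \t]*/, "")
            split(substr($0, 2), parts, "_")
            out = prefix "." parts[1] ".faa"
        }
        # Header and sequence lines go to the file of the current group;
        # anything before the first header is skipped.
        out != "" { print > out }
    ' "$input"
)

# Example for a whole directory of inputs:
#   for f in *.faa; do split_faa "$f" split; done
```

For plate11.A10.faa this would give split/plate11.A10.122.faa and split/plate11.A10.56.faa. Writing to a separate directory keeps the results from being picked up as inputs if the loop is run again.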

Comments:

1. Are you looking for help with some code you have already written, or do you want to hire someone to write the code for you?