#bash #fasta
Question:
I have many files in .faa format (FASTA) that are output by VIBRANT and contain many phage proteins, each preceded by its own header. Here is an example output file (file name = plate11.A10.faa):
>122_fragment_2_95 (102956..103258) 1 K04764 "ihfA, himA; integration host factor subunit alpha"
MGALTKAEMAERLYEELGLNKREAKELVELFFEEIRHALEDNEQVKLSGFGNFDLRDKRQRPGRNPKTGEEIPITARRVVTFRPGQKLKARVEAYAGTKS*
>122_fragment_2_96 (103239..103595) 1 PF13411.6 MerR HTH family regulatory protein
MLEPSHNDELPPIPGKRYFTIGEVSELCAVKPHVLRYWEQEFPQLNPVKRRGNRRYYQRQDVLMIRQIRGLLYDQGFTIGGARLRLTNGEVKDDTQQYKQMIRQMIAELEDVLVMLKS*
>122_fragment_2_97 (103843..105027) -1 VOG00041 sp|O10330|VLF1_NPVOP Very late expression factor 1
MAQKAITGLQKMPNGIWKIDKKYRGERIQESTGTCDRAEAEQYLIHMLEKLRQRKVYGVRQVRTWREASIRFLMEVKNQPSIHISATYMSQLDPFIGHMPITHVDDDALAPYIRSKLEPENGKPVTNRTVNIALQRVIRVLNLCARKWRDEERRPWLDVVPMISLLDEKTNCRKPYPLSWEEQSILFAELPAHLQTMAMFKVNTGCREQEVCKLQWDWEIAVPELGTSVFLIPAGFGGRSAKAGVKNRDERLVVLNDVAKSVVEQQRGKHKLFVFPFGKPDGEGNETTVHRMNDSAWKKARIRAAKKWQEKYLRPAHDGFLRIRIHDLKHSFGRRLRAAGVTEEDRKALLGHKNGSITSHYSAAELDQLIAAANKVSATDSRAPALTILKRRQL*
>122_fragment_2_98 (105051..105527) 1 None hypothetical protein
MTLAAGLIVVLIGCLFNRLTLDVGIRPRIQLQPIKADALFSNGKFPHVWAHGLVEFVTTHAQIAVGITCPDEPGQDWRYLGGRFVCHRVTAPGRAGRRKGLFPVVTEQWVDRRIALKRRQIQQREMPIHPACCIDVVDVSEDGRLSKRRMPYDQRLDT*
>122_fragment_2_99 (105351..106085) -1 K07313 pphA; serine/threonine protein phosphatase 1 [EC:3.1.3.16]
MLETIEVVRIKRFAENTAGRDFAVGDIHGHFTRLQVALDAAGFNPAVDRLFSVGDLVDRGPECEDVIKWLNKPWFHPVRGNHDDYVCRFDTCDIGNWMYNGGTWFVGLPLDEQKNYQVMFDELPIAIEVETAGGLVGIVHADCPFPSWDELRAELESPQTRKRLKLVHNTCMWSRSRIQDADASGVSGIKALVVGHTPLRQPAILGNVYHIDTAGWMDGHFTLLDLATLQCNPPINPLLSHDWE*
>122_fragment_2_100 (106135..106302) -1 None hypothetical protein
MNEIQLDLIPRKITLPHPPRHTGFLEPDRQVPGYTLEQMIEYGKACATEAVKQSK*
>122_fragment_2_101 (106299..106451) -1 None hypothetical protein
MSREAFEQAYAEDNNCDLEWCQGQRLDNGSYRDRYMARAWHWWCRAKEAT*
>122_fragment_2_102 (106451..106882) -1 None hypothetical protein
MEKNKLGPDHYRYVDELDPKGLEVTCKRYVVIGETDQCWYIVSEFHDKLFGGSQRESLLKQYRKRVLKDGGEHGRRFAYTDKALALRSYKQRKSWQVRHAQLSLERAKAAIAYFGDTRTESTVPPDNLMIPCEYIQDMNWSEC*
>122_fragment_2_103 (106946..107464) -1 None hypothetical protein
MGLTNNKPNDVRVVPVELLERIVSLNNSFADHRQAQHDLVKFLAQPADQQGEPVVWGAPETVGQLIRQLQTLDPALETVALYRLPDHIPGVGGKVKQGHISTSYERMEGIWLGPYKGDGRKVLAFWTKLDPRPVPDGEFLMQTPPPDRELVTRRCECPVCWPDHPAHANRPR*
>122_fragment_2_104 (107455..107670) -1 None hypothetical protein
MSKVRRCAASPVQPEFSLCGEAFDAFDEKLTSEPYEIAEPDQSITCPMCCEAIREIKSIRNPLRPRSRSWA*
>122_fragment_2_105 (107667..108203) -1 None hypothetical protein
MSKHPIPDWLRSQFSLIEDEVRKLGPCGVFTQMRTVTQTYFEQLAAAPQRPALGGEPDILGKVVSFGEGPKEISWAKGKIPEFGVELIDRAHLAPLQAEIERLDDELDERGQWDTAQLKVIHGLNDDVARLKARCNEMESGLRIIATAGTKTRAPDLRLKAKQTLIRALSKPAGSEQV*
>122_fragment_2_106 (108200..108529) -1 None hypothetical protein
MSNELKKCSVLKDANVSYAASTAVNAAALGAQNVNSAMSTIDALRQQLADVSNERDGFRVQVEEAERFVEYLINNCVGQVVSEGKIKYWMACSIERHAKEKSTEQRAQS*
>122_fragment_2_107 (108693..109934) -1 K00590 E2.1.1.113; site-specific DNA-methyltransferase (cytosine-N4-specific) [EC:2.1.1.113]
MSQLHQILVGDCIDMMRTLPDESVHTCVTSPPYYGLRDYGVEGQIGLEETPAEFIARLVDVFREVRRVLRSDGTIWVNMGDSYAGSWGAHGREDMGVGVSTLSQRQVMASQRKSKAITHAEYKPKDLMGMPWRLAFALQDDGWYLRQDIIWHKPNPMPESTRDRCTKAHEYLFLLSKSRRYHYDSDAIREPANLTGKGNANGYRGGAYVNGSTFDNAEGGKRTSSGNTVPNNGVGWGHGTDKASRNRPRVTVPTGWDTSTGEGGHGVFHKDGAERKRRDSFKREDSKREQAIPGQSKGTHRPDRDESTHDTATRNKRSVWTVATHAFKEAHFATFPPDLIRPCILAGAPRGGVVLDPFGGAGTTSLVSMQEGRRSIICELNPEYAALARARIDAAWLDGAAQMDVFRDSVPAA*
>122_fragment_2_108 (110007..110387) -1 None hypothetical protein
MSKRKDILDELSKEELLAWVRTQFFSRLPKRSEILYLRWEKQSSEALEEMRLENLKGPGVDLKERDLLAVRFNESTDAAEKLRLLELMEPYGAALNAHIKRSQAISRKLKRVDALYEQIDIERQKE*
>122_fragment_2_109 (110495..110710) 1 None hypothetical protein
MSAEHRKLIGIPDDHGLKHTGSKSEQRKGRDTDIDFYDETDAQGNVIAQYEVRDSMSIYPPQGTTLSFRKL*
>122_fragment_2_110 (110799..111032) -1 None hypothetical protein
MALTQKQRDERTALKRHKAGEEELRLRVRPGTKQALKELMEWAKIEEQGEALTLMIQHLHSLGRAARCRCLKSRATK*
>122_fragment_2_111 (111176..111703) -1 None hypothetical protein
MMTTELSAIRRNSIESNRLAQAMAEFTSKGGTVEVIEGFVSKPRPEPKAYGRDFPAEQAPKPVKRERKRTPTQVRNSSSGRTQVNDALVQRILEMARRPARARLKRKRGSAATCSIDWQMSMGFSSSSTTHARTCGPRRSTQLRTLCTSSESKSCETKAWLESRPLPRWASAIRW*
>122_fragment_2_112 (111714..112385) -1 K10857 exoX; exodeoxyribonuclease X [EC:3.1.11.-]
MTAYIFDSETTGFKEPQLVEAAWLKLGATVGLPVTDEYLARFKPSKAIELGALSTSHILDEDLVDCPCHTSFQMPPDTEYLIGHNVDYDWGVIGQPDLKRICTAALSRRLWPEADSHSQSAMIYLHYREQATGLLRNAHAALDDVKNCRLLLSKILDALAVKLGRPVEGWEELWSISEEARVPTVISFGKHKGSLIANLPSDYKRWLLNQADLDPFVRKALSK*
>122_fragment_2_113 (112382..113248) -1 VOG01757 REFSEQ AAA ATPase
MFKKAERKQAKLRLALAGPSGSGKTYSALLMAKGLGGRIAVIDTEQGSASLYSDIADFDVLELQAPFSPERYVDAIAAAEAAGYNVLIIDSYSHEWTGPGGCLESNEALAHQKFRGNTWAAWNETTPRHRQLTNKILTSTLHVICTMRSKTETVQGEGKKIVKLGMKSEQRDGTDYEFTVVLDLTHDAHTALASKDRTKLFTQPELIDESTGRKLLDWLNSGVNPEERAKELLIDAIADIASAKDMVGLQAAFNAAKVIAIGYDDLVNRVVAAKDKRKTELTPLEQSA*
>122_fragment_2_114 (113259..113402) -1 None hypothetical protein
MSNPRMSAQLDWMTVGSFSPERFTGEERKEYEAEQARIEREWDQQPN*
>122_fragment_2_115 (113410..114534) -1 VOG11477 REFSEQ hypothetical protein
MSVEKELAVVPPKEKALQIFQTPKGLDPYLQIVRDKIDAFVPDVTTRKGRDAIASIAYTVARSKTALDNRGKELVAELKEIPKLIDAERKRMRDTLDTWQEEVRRPLNEWQAKEDARVEYHNSMIRHIEDCGIGLIGGQPQPFGLLFRELEEKIIVDEKYQEFEAEAHRVKAAALAKLRASFDEHQKREAEQAELARLRSEAEARAKAERDAEIARAAAEKARFEAEQKAQAEREAAAKREQELIEQAAQAKRDAEQKQRDADAAAANQALQLKLAAEREERQKLQAEQDRIAAEQRQAAAVERARLDEISRQEQEAAEASRIAEAREADKAHIKSVCLAAQQAMVNLGIDEACAKAVIILIHQKKIPAITIAY*
>122_fragment_2_116 (114704..114874) -1 None hypothetical protein
MTTPIVKSLIDEQLADIERSLSIVSAGLPREIPVSALPPKLVEAVKTGRLAVRPRQ*
>122_fragment_2_117 (114871..115122) -1 None hypothetical protein
MSDLERYQDSAQGRRSIRQATGLYDDLGNLKSALVDYFHDYADPVDYAAVRAAERDYRKKLARRISVAITKMEMVCPPKGASA*
>122_fragment_2_118 (115119..115337) -1 None hypothetical protein
MSRHDTAKRFIERALAEYATAPCPDMTAAAVQMAVELAYAQGDISCVEHTHYTERRNRMVARHRIEPVRACA*
>122_fragment_2_119 (115358..115579) -1 VOG00405 REFSEQ hypothetical protein
MSNDQSEPAFPVPGSEYGGTGTCFGMTLRDYFAAHAPNAPDDFGWNNGEATQCERLARWSYHYADAMLAARTA*
>122_fragment_2_120 (116347..116631) -1 None hypothetical protein
MQCECVTRVKERIDGKLREQMPEGANSLEWSFPQIRFGLTNDGVVHLPVFDIKGEFQAPKKAGGFKRVKVDTFLAATYCPFCGMKCKADEQKAA*
>122_fragment_2_121 (117059..117718) -1 K01356 lexA; repressor LexA [EC:3.4.21.88]
MEFKDRLRARMTDLKLSATDLSAMIRVSKATITFWRNGTNGATGSNLMELAKALRCSPEWLETGKGEPGGVSGGEASNFELVEAPDRLYRYPVVSWVAAGAWAEAVEPLPSGFSDRYEVSEYKAKGPAFWLEVRGDSMTALSGTSIPEGMMILVDTEADVRPGKFVVAKLPNSEEATFKKLVEDAGRRYLKPLNPAYAMIECSDDCRIIGVAVRMTGTL*
>122_fragment_2_122 (117808..118017) 1 None hypothetical protein
MTYEEALKHFGTQRAIGDALGVTTSRVSQCRTAGGFSYPMQCVLEKESSGTLIANRQDDPAQAPRMTAA*
>122_fragment_2_123 (118047..118235) 1 VOG24653 REFSEQ gp11
MHFDPSHMHDKPTKVRLDEVADDLLTAMARYQRTQKAVLAREILERGLNQMMEELNAKTDVA*
>122_fragment_2_124 (118254..118445) 1 None hypothetical protein
MPERKQLDVQLDGIGVSNLELLAKREGITPEELAAKIINKELDRMSRPPPSRGKVRSIGRRAD*
>122_fragment_2_125 (118570..119409) 1 VOG02593 REFSEQ hypothetical protein
MPISQHVVNSDSPRHEIAPSQNVAHSQLITLIGGEAFTTTLAISAGCELDHASVIKMVRTYQADLEEFGLLDFKSESTGGRPTELALLNEQQSTLLLTYMRNTPIVREFKKRLVKEFWRLAHSAPAFDIASLNDPKVLLALLTDNVRKVVHLEADNTELTQENHLLEQKVVADAPKVDFFNAVITSTSIHSVREVAQSIGTGQNRLFAFMRQQRWVDRHNTPYQGRVESGYLVAEPHSYICPETGERKTKFTCKVTGKGFTKLQALWAGRDTAILGGAA*
>122_fragment_2_126 (119406..120155) 1 None hypothetical protein
MNDVPRQFKGVWIPAEVWLDRSLSITEKVMIVEIGSLQDPVRGCYASNNHFGRFFGLSNSRVSEIISSLTSKGLLRVELIRDGRQVVERRVRLTDLFGKSNTYSENASTLFGKGGDPYSEKAQESNTKSNSTTEGEKRGSAKASPSASRKASKFDPLTARPSNVSESTWADWSQHRLEIKKPLTATTCAKQAKTLAGHHDADAVINQSISNGWTGLFPEKVLPGAKASGKAQGPDFYDKSWRTDTSDDL*
>122_fragment_2_127 (120152..120883) 1 VOG11468 sp|P03689|VRPP_LAMBD Replication protein P
MKNVTQMIPGAARALGTAAPYQAPAQTGTQLGVVDDATGEVVERLFRQLQAIFPAHKQAWPDDKAKAAAMRNWTMGFMAAGIRTLEQIRYGIEQCRKSGSPFAPSVGQFIGWCTPGPEAFGLPASADAWMEALMAVYSHEGVKIAAIATGLFDLRSAKQEDKGLRQRFDHNYTIVIRRAQSGQPLDGKILTGIGHDSQKTELELAEEQAEQAVQARIIQQGIPVDAASARALLLARIGRRAGQ*
>122_fragment_2_128 (120880..121305) 1 PF09397.10 Ftsk gamma domain
MSNDKKRGELLAAFEAHKSRIATAIIGGGEGARVRQVLERVSLSDFEAGWQASREALEQSAISPEVQAMLSQFEADEAEEIRMAEAYVRESERCSISALQRKFKIGYNRACRLMDRLVALKVVSPIDAEGRRTVLPEQVKP*
>122_fragment_2_129 (121302..121490) 1 VOG11464 REFSEQ hypothetical protein
MNRANPAQLRQALALANAYTKAGIRFVCMPVVDEADGMNLKDQAQQRLERMALIAESAERLA*
>122_fragment_2_130 (121487..122185) 1 None hypothetical protein
MNVDIEKIEALAKGCRDEVIRSHGWTGMIADAGLLRRDSEFLKECSPEAVLDLIASGKRMASRLMYCPACQGEGEVYSGRNSYEGYNQPPEPIMNKCGECDGDGALGDTAECISILDEVETLRAENAGLKTGYEAYERVNAELRAECEKLRAKILSGAARAKKLVWIASNHRRDALVLRKDAERREWEGFNNGLTLAGNLRPSITHGDDGGRVLNDIRCQIDEAMRKDQSHD*
>122_fragment_2_131 (122178..122648) 1 None hypothetical protein
MTDKISVNCQSKLTEAVTRMSAMFREKKFVVVSLRPGKDRTLDQNALWFAFYKRISEMTQIGDASEARKYCKLHHGVQILINEDEDYRAAWHRTTKHLTYEEKLGLMGDSKLLGPDGFPVTSMFNRAQGVAYTDRILTEFSALGVFFGDLIGEVAA*
>122_fragment_2_132 (122645..122959) 1 VOG01140 REFSEQ hypothetical protein
MTHQFKSGDLALIVGAHTTPENVGKVCELVELLAPEQISTWRDPADGQRIQNGDVGAAWVVIGDGLTSWCGSSGWVMADPIHLMPLRGDFAPEQQEVKEAEPCA*
>122_fragment_2_133 (122950..123540) 1 PF05766.12 Bacteriophage Lambda NinG protein
MRLSLQAKTPKTKKCRVPDCGASFVPQKLGQAVCSPACAIIDAPRNQAKARKALAQVERREIKIRKEALKSRSDHMKDTQQAFNEWVRNRDAALPCVSCGRHHEGKYDAGHYRTVGSNPALRFEPLNCHKQCVPCNQHKSGNVVEYRIELVRRIGMLNVEWLEGPHEPQKYTIEELKALTAKYRALTRELKKGEAA*
>122_fragment_2_134 (123540..124082) 1 VOG12651 REFSEQ hypothetical protein
MYRNVVAAVVRALAAETINSAGGCDFEPKVQCAKQKGEIVGKEAAFLTDCWVFGRLHKALSNEHWRALVAKYSTHTERKHAAITEITRQYRSPAPERFRHCAIVTWAMPKLPGVDGKRSTNVLPSAWYEMDNWSDEPHPIKTQERWRRDIRKGLESMVDVALTEAQHILEAEGILIADCA*
>122_fragment_2_135 (124425..124796) 1 None hypothetical protein
MDPTDLGPGTATWLGGTGTILLGGFLWLRKFLSKDAADRAMDNADIGTVRRLNELLDSERVARKEAEARADQFAKERNELAAAVGRMEGKIEALTSHIVQLTDKVTSQSAEIARLRAQLGGNN*
>122_fragment_2_136 (124796..125134) 1 None hypothetical protein
MDRCAINFVARHWWRRVEVWLIAILLLAGGAMLGFQVAQWSLASWYVAQVAEVRQAYDEATIQRDMRLNKLAKSATEAAVKVEGAAGKATEAAEVASKAADKVNEAVERQSP*
>122_fragment_2_137 (125180..125314) 1 None hypothetical protein
MTKKNWYVTTPGHKPFPMILLESALDHAGALAFARSIWPNCTVE*
>122_fragment_2_138 (125319..125942) 1 VOG12618 REFSEQ hypothetical protein
MIRPTPPAELLRESEDSDVFMRLVPAKDVWDWIQAEILADTGSIHNEDHAHLIDADICIMWASSAFTKQGRTVLGQAEQVAFRAGGWQKARMEQQMRDWFGYVPSYIITLAADYCSQCPDDDFCALVEHELYHIAQATDQYGAPKFTQDGLPKLEMRGHDVEEFVGVVRRYGASPDVQLLVDAANKPAEVGKLNISRACGTCLLKLA*
>122_fragment_2_139 (125974..126432) 1 VOG12619 REFSEQ hypothetical protein
MAALTPDVKAYIVQALACFDTPSQVVDAVQREYGITVSRQQVETHDPNKTSGKGLAKRWVALFEDTRKRFREDAAAIPIANRSYRLRVLDRMAVRAEGMKNIALAAQLIEQAAKETGGIYTNKQQVDHTSSDGSMSPKGKSLDDFYNGDVPA*
>122_fragment_2_140 (126419..127762) 1 K06909 xtmB; phage terminase large subunit
MYQLNPNLREFWRIRKPYKLLKGGRFSSKTQDAGGMAAFLARNYTVKFLCIRQFQNRIADSVYTVIKEKINQAGWADEFDIGVSSIKHRKTGSEFLFYGIARNLNDIKGTEGVDVCWIEEGEGLTEEQWSVIDPTIRKQGSEIWILWNPDLMTDFVQAKLPRLLGDDCVIKHINYADNPFLSDTARSKAERLKEADEESYNHIYLGQPRTNDDAAVIKFSWVEACVNAHLKLGMSLSGAKAVGYDVADSGEDSNACALFDGAICFDMDDWKAGEDELNESAMRAWSHVRGGRLIYDSIGNGAHVGSTLKAARIHGGYFKFNAAGAIVNPDKEYAPKIKNKDKFENLKAQAWQDVADRMRNTFNAVTKGHKFKASDLISISGDLQKIEQLKIELSTPRKRYSKRGLDMVETKDELARRSVASPNLADAFVMGACPHLVANSRPIRDLL*
>122_fragment_2_141 (127779..129155) 1 VOG02778 sp|P44183|Y1409_HAEIN Uncharacterized protein HI_1409
MSKKGLVPADKKLGKALVRAAHKYEAQIKSSSDGLVNVVSGLGTQKAKRSHNQFQYGFLNDFQQLDAAYQTSWLARAIVDYPAEDMTREWRTLKCDDADVIRAEEDRLNLPAMVSEATSWARLYGGAGILMLTNQDLTKPLKPEKIKKGDLYRLLVIDRFDMTAMNLNQTNILAANYLQPEFYTISAGAQQIHWTHFARFAGAKLPRRQRAQTQGWGDSELRKCLDDVMDIVASKDGIAELMQEANVDIIKRVGLSDELASDQDDAITARYALFSMMKSSINLALLDDQETYDRKTLDLSGVAPVLDLLMTWISGAAGVPVTRLFGESAKGLGNNGEGDNTNYHNQLSSKRLTQIDPGLRQLDEVMVRSATGRWIDDFNYTWNPFKQPDLVQIAQANKANAETDIAYKDAGVITTSQIQRKLQAQELYQFDDEKIEALEAEEDLTMFNDPVGDDDKVE*
>122_fragment_2_142 (129158..129973) 1 VOG01506 sp|P71385|Y1407_HAEIN Uncharacterized protein HI_1407
MDMIGIQYNARLQRLVKQVKADIAKEVMPLVRQLAPEYTQDAVVTTDAWSDLIIAAMRRLTSKWASFGVDAGADRIAGEFVQSALKKSERDLKKSMGIDVFSGSKTLQDYLKASAQQNAQLIKSIPAKYLDEVQTLVMANMRSGMRPGFIEKALQEQFGVSQRRAKVIARDQTGKINGELAEKQQIGAGFEYFQWIDSDDRRVRHRHSEIANKVTAYGKGIYRWDDLPLSDSGVPIKPGSDYQCRCIARPVSAREVKANQDAGRTAPGVLR*
>122_fragment_2_143 (129998..130225) 1 None hypothetical protein
MKIKIKHIYTGNESEFDTDNYHIAVQMTPTDLENIKSLPDTEDGRSIEGNEHRTYACIRPADDTESDALFAWAKQ*
>122_fragment_2_144 (130225..131385) 1 VOG00976 sp|P44180|Y1405_HAEIN Uncharacterized protein HI_1405
MSRQTVFDRVGYRITQREYTDEGFLKVPGRVARTGIQEYLARELGLDGDPMRVVKVYRPEEEVFKDESLSTYDASAVTNNHPHGLVTAANYKGLTVGVVRGSGRRDGDFVVCDLIVKDKATIDDITSGKCELSAGYTAVYDDTPGVTDDGEDYHYIQRDIRINHVAVVDRARAGANARFFDHNPGGNTMPVLITTDSGRSVDVADPANAQVVADSFDRLMKRATEAEAKADKAQASADSAAEKLGDALKASSDEAISTRVTAISSAHALARKVAGDSFTCDSMDVTEIKRAALAVALPKRDWAGKSAGYVEAAFDAESDKDEDEDDKDEDGKPKVKKPTGDAATLLAQLTQLALDGAKPAAVADGKPTPYQAHKQSLSGAHKSKGA*
>122_fragment_2_145 (131388..131882) 1 VOG01175 REFSEQ hypothetical protein
MPVQGGNAINHGVAYAGMVADGEVSNGVSKVNKGTVNIAYGLGVVTDGDDGAKLPVAASTAAQFIGVVKRELNRAYTQSEVFGAVAKRDMTVETMAPIWVTARVAVAKDDPVYLVVGDGTGAFQGQFSNVVGAAATLAVLIPNAKWVSTAAAGALAKISLKIGG*
>122_fragment_2_146 (131888..132913) 1 VOG00793 sp|A0A0U5AF03|CAPSD_BPK22 Major capsid protein
MKLKKIVVAIDAAIAYQIGRDAHEVTFNDGLPTIDDGLAFYISQLASLEARIYEAKYAAINYMELIPVDTSLPEWVDQWDYISYDGVTIGKFIGASADDLPDVAVNANKSVVPIGYAGNKYSYSLDELRKSQALRIPLDTTKAKLAFRGAQEHTQRVAYFGDAARNMTGLFNNPNLALSNSTLDWYNAATTGDQIVADLNKILVDVYINSATVHVPDTIILDATRFAFISNKRMGTITDKTILEYFRTNNQFTALTGRPINIFSRLQLSAAQLAAAGVSNANKDRIVAYELNDENLGMQVPIPWRSLAPQMWNLKVNVPCEYKISGVEFRYPFSGAYRDQF*
>122_fragment_2_147 (132982..133287) 1 VOG02622 REFSEQ hypothetical protein
MFLKNEAARLITINHLVGEKETSYPILPGENPAVEVPDAVVKIDFVKALLSNGDLRRVGADEIENDDGEEDLFAEAEALGLKPEKSWNEDRLRAAIAKAKK*
>122_fragment_2_148 (133330..133749) 1 VOG00306 REFSEQ hypothetical protein
MIITPEMIAAFRSNPVLKAFTDATKWPDEYIVEALCEAGTETGSSRWGALELTCDNFKWRGMQYFAAHWLATNFATLGANGTPNSEARLNVAQKSVGDESIAYRVPQMMDAGTDWLTYTNYGQQFYRLKKRAGMGAKAV*
>122_fragment_2_149 (133750..134349) 1 VOG01950 REFSEQ hypothetical protein
MINIDLTGFQDLQDELSRELAALRTNKIVTVGIHEEAGDVESGDLTMASLGAINEFGADIKHPGGTSYGYANQASAERGEVRFLKKGAGYMELGVTKPHDIKIPARPWLEPGVASATPEVLLTIQDGMEAGHSMDQILEMVGLVAAGAVKIYMTDLKTPPNAASTVRKKKSSNPLIDTGAMRASVTHKVSIGPSEEGLE*
>122_fragment_2_150 (134346..134717) 1 VOG00454 REFSEQ hypothetical protein
MSLNMEGQIDLVFVSVEASRTVDVGGQWVDGIWTPGTPDTKPYVVNIQPASDREVDFIRQGGERITDVRRIYINQGEMQLIDQTGTWAFLGQQWKTVKCDNRYWRNYCKVLVMRIDDQSGGPA*
>122_fragment_2_151 (134714..135280) 1 VOG01012 REFSEQ hypothetical protein
MTNEELFKKLRPIVMLATGVPECLLADQAGPGSMPAPQGAYATITPRQSISERGQANVVSRNVPGEQVEVQVRAQIMCSCSVNFYRGEAVMFAELLKQANKRPDISIMLFKSKIGWNSTDAVNNLTSLQSANFEQRAQITIRLMYETVSLPVINNILSASVAVENEESQVLQTFSVEIDPTQPMERSL*
>122_fragment_2_152 (135277..136419) 1 VOG01167 REFSEQ hypothetical protein
MSYPATNIIRVNARISPAGLGNANFASAMLFAPQTALPVGFAPDTYRTYSSLPELSEDFDDTTDVYKAAQRWLGGTPATRELKVWGAATADATRTASLNKARNTVWWYWTMWTAPVLAVIGDVLNIAQWCEDNGSMFIDNQTGAAVEDIRDPAVTDDIASQLTTFGFRHAFTAAHASDAYAGSALAKHFAAVNYSATRSTITGEFKKSPGVAAESLLTTEYSAMQSAGKKAVFYTAVDNQGSVDVGRWLNTFTHSSFGEYIDDVVNLDACINYLTTSLYNTIANQPTKLAQSPVGQAVLIGAAGAIMQLFIDNGYLGPRNYIDPDDGIEKYTKGFEILTKPEDILDLSDADRAARKSAPLRIRLFRAGAIHIVEADLDVY*
>122_fragment_2_153 (136502..136909) 1 VOG01803 sp|D6RRG7|ORF10_BPKPP Structural protein ORF10
MALSNFSTDLTVVTINGRQIQDWGDTATPYTDAPIDATSQLRRGQGGGGIRLDRINPGREVNIFLNPGSADAAYVQGLFNSRANVTLTYTQIGTLDGAIGTEGVIVNDGQRGRAGSTINDDQFTMQFNIWDGTRG*
>122_fragment_2_154 (136913..137314) 1 VOG00124 REFSEQ hypothetical protein
MSVRAFTVGGVQYNAAMASAVDQDRLMSLLSAAVIERFAVAAREDQVVSADVLCSMFMSMRQDAKAQVAQMLMGKVVVNGSDRAVTVADFGGKMVHYNQLLTELLLWNLTDFFEWLPSGASGDRQRETGSPAQ*
>122_fragment_2_155 (137471..139528) 1 VOG21899 REFSEQ putative tape measure protein
MASKVLKSFLIGIGYDTRSLEAGDKKINASLNGIRSGALGISGALIGAFGAAAGSIAGVANRVDKLAMSTQNLRTSQAAVYSFGNAVKLMGGDAVDALDAIKRFEEIQNNLRLKGDAGPISDLATAGIDVSSLYETKTGEEFMRALADMLPKLDEGQSNQVQSALGLSDGVFRTLKGGADQLDEAMKRASGLTGSVDQLTEDARKLAENASEFGLIIDGVTNEIAEKFLPSLVGAGTALNDFLKESRGKISNVIDYSADNPEATAVLGLSSVTAMAGAVMAKLGLSTIGGAVSKAGTAGLAVTGGAVGANVLNKTLDEKVPGYKGASEGFDEFLKSVTGLDRIKSPIEVLFGKPISKRSDEGEASPAAPAEVGKWPDIEHEAFGSGEAKEVKSQSVDDMVKAIQSAKRSVNGPPAEAAATASVAVPEIMPMARDDKPETSLEAPPAARIIPESPEVAASSEQKHTDKREPAPPIDAAPRISANEAYGDPPEPVVTILRENDETPPVVVAPEPTQAKVDAPIDDRRDRRIELIDVGARGLSRGEDSAKRAPAKAPEVSSPRQGVRIEDQSLGGPAQMPGHDKDDREAPPQSPWTDAPLFKQIFGERSSSDMMPPIHQVDTGQGSKETSAQGVDDIVRALQAAKLKVENNQSFTIQLDGQAIEAKITQVNERLNYETLNDLKTTTER*
>122_fragment_2_156 (139531..140187) 1 VOG00491 REFSEQ hypothetical protein
MSIINIFTRKAPTIAGYAFDAVLEDTFEATVTITSVPVESGVRISDHRILNPFKWTMSGAISNNPVKVQLTDFLGGALSNLTDNPIVSTVAGLSAGWLAGSDDTRASSTLDFLVWLMKSADPFDIDAGDILLKNMAITRLARTKEPRNEGGLEFVVELQEVIDLSRIQRSLQCTPDQLREGDPSKSALSRAINRGQAIAKEAADNVSDAVNGILDGVI*
>122_fragment_2_157 (140187..140495) 1 VOG00111 REFSEQ hypothetical protein
MLVIPLRAGSSNAHQRFGVQLGENLIDFEVDYVSYLDEPAWSMNLLRDGSRIVSGAMLEPGSDIIQIYRTGIGQMVFTGKNVTLDNLGVDNFLVWIAPVVDI*
>122_fragment_2_158 (140492..141907) 1 VOG00777 REFSEQ hypothetical protein
MRERVWSIDVDGQPYIEPQTGRRQFRIQFNIDISPGDAISFADIRLFNLQKGSSIPQKSGIVLRAGYDDNVDAIFTGYVTNTMRERPPGAPEVITRLICRSGQPIADRASAQLSFGVGTRVEEVLRALARAWPLPIDIDNAQFADDKPLSSGLIVDGDIPTAISDLAYAYKFEWVQDRGRIVITKTGMPRTVTPIRVDMFSGMIGIPEVSRGPDGLGVFVAVQLNPSLRINGKIDVESEFSTFNTGNLFVSELSGDATANGEYNIFAMKHSGDSHTDLWRTEIDGLRSGTTPKKDDVATPENGKLIWGARVDQAFRVKVREIGDRLSMDPNWLMAVMGFETGYTFSPAARNPGSTATGLIQFLEASARQVGTSTAQLARMTAVKQLDYVEEYYRPYSGRIRNLGDAYLAVLWPIAVGRPDSYVMWSRDSGPYQREYAANSGLDVSRDGVITRGEAVASVNTSYLRGQQFVR*
>122_fragment_2_159 (141975..142682) 1 VOG00751 sp|P31340|SPIKE_BPP2 Spike protein
MLETEGRAKQAKLIRDAFRELMKGVCTSIPGHILTFDPGTQLAQVQVGITRVDINDAEFTLKSIIEVPVYFPGGDYCIEYQIDPGCEGDILFSQRCIDGWIQSGGVAQNPIGRFHNMQDAMFLPGFRSRPGAISGFQNNGVRLRNRDGVQTVWLKNDNTISSSNGEVRFDLNPDGSTVMKNESGSFQLLADGSFLINGLKITTDGDVITAAGISLNQHRTSGVTGGNQISGVPVI*
>122_fragment_2_160 (142679..143026) 1 VOG00195 REFSEQ hypothetical protein
MTVRRLDENGDIVTQGQQFVNGREEVRQTVLTRLRLFLGEYFRDITDGTPWYEQILGKFSNLSAAEAALRARIANTPGVIRLTSFNADFDIETRRYSITAGILTEFGTDEVTLNG*
>122_fragment_2_161 (143019..144233) 1 VOG00243 sp|P51767|BPJ_BPP2 Baseplate protein J
MASLTSTGYVLLTQNEWFASERQFYLDIDPLWNLDPSTPDGLKMAHDAEIFYALDETLQRAYNSKDPNKAKGIDLDIICSITGSIRSKGSPSSVQLTLTATPGTQVLQGNRFESSTTGSRWSIDQTVTAPGTGIVSVNATCTVVGPTQADINTITKIVDVVAGLSGVTNAAPATPGADGQRDEQLRVTRATSVGRPGNNQIDSMIGELFSVFGVRRVKVYENDTGSSAVSTSNPYGLPKNSIAPIIDGGSDADIAMAIYVKKNPGAGLYQAGTPFEVLVTSPKYPANQKLVKASRPIYVDMILVINIKNDGTLPTNADQLIKEAVMEYAAGDLIPADVGFKISGFDIGESVPYSTMFTPVNKVIGEYGNSYVTLLQLNGAQANTAIAYNQMSRWTESNITVVIS*
>122_fragment_2_162 (144230..144874) 1 VOG01943 REFSEQ hypothetical protein
MMNIPNRVYAQYWDKPKAVDWYAIARKLGGSIEDAAEAVRKSYDIDTVVGEQLSVIGRIVVAPRSFVGAIPMTPGLFDLTDGDEFGNDDAMFSALTIDQDDQLSDELYRLVIKAKIIKNNGDATIENILDGMNYLLPTADVLRVTDGEDMTFSIEFYGQISNLERFALLNAGLVPKPQSVRFNGFLEGFEMFEFGDVDAEFGDEGAEFIGFIGA*
>122_fragment_2_163 (144876..145787) 1 None hypothetical protein
MALKLNERYPGRFDNPSAGYPQGSFKNRTSPTAKDGSYLERDWANDKEGFFQSLLSSSGFVANGTVDKVGASQYFDSLVSAIRSKATGRVISQQLFTSTATYTPTPGIAFAIVECVGGGGGSAHVLATGASQYATTAGGQAGHYTRSRFTAAQLAGGVLCTIGAAGVGANAGGITPSSNGGSTTFGALVTAGGGNRSSVGILSSGTYLSAPSGAPTTVFGSFQYSISGQPGSWGIYSSGSGQLGGNGGSSMFGGGGIGVGIGAAAQPGSGYGAGGGASSLGPNSSSIGGAAGSGGLIVITEYI*
>122_fragment_2_164 (145787..146161) 1 VOG11448 REFSEQ hypothetical protein
MRTYARVTGGEVAELLSTDQDIDKNFPEDFVATLVDVTDVKPSPAQGWIATKKKSKWSFQEPDHNSYAMTPDGVKDMRLAAYRQFSDHLKLEAEFDAISSGKEPDYSAWLAKVEEIKALYPMPE*
>122_fragment_2_165 (146167..147342) 1 VOG01432 REFSEQ hypothetical protein
MLPFIYDDNVSTMRDEGVDTSRWTLSTPTAGNIMVVGSSLKISTAGVGVNYSQPVNMPPENDDFIIYVKLKAEYAQGKASVVHFNGLDGKPRIGFALGYSYVSQTLSLGQLSVINKDGSAKSDYASINYSESWCDLAIHGCRSLGTYRVYLRDANSEWLSVFSGDISQIADISSVVVGSQFSLSQPSHLYLDHILICRPNIVSIGDSICAGYAVPADPYVGWQKYAKLYSWLRNDLIVNLGVPGNSSQQISDRIVSSSFAGARLVFLHASSNDFRLGVSASDRTQITQRSITAINAFGAKCVLINGIYPNSRYVNADYQAETAYQKQWWESSAITLTGLSGMIDIMLCLKGVMGAYIAEALARLEDGKHPNMPGTVLMGRLIKSLGTISTG*
>122_fragment_2_166 (147401..148063) 1 K10804 tesA; acyl-CoA thioesterase I [EC:3.1.2.- 3.1.2.2 3.1.1.2 3.1.1.5]
MLKRSLLKGILVSAAALIIGCSAGVHEKPKVLIVGDSISIGYTPYVKGSLEGRAVVTHNAGNAQDSNNGVSNIDAWIGGGRWDVISFNFGLWDLCYRLPGPITATNRDKIHGTISVPVEQYRANLRIIATKLKATGARIVYQTTTVVPAAEPGRYSSDVAIYNDAAKSVMRDLGIPVNDLQAVSAALPDSMRESNTDVHYTEAGYSEIAKSVTASINGLL*
>122_fragment_2_167 (148123..148668) 1 K03791 K03791; putative chitinase
MPITAQQLLQMLPNAGQRAGVFVPALNTAMGKYQIITRERIAAFIAQIGHESGQLRYVRELGGSEYLSKYDTGKLAERLGNTPEADGDGQFYRGRGLIQITGRANYAECGEALGLDLIHHPELLEQPEHAAMSAAWYWGSRGLNSLADKGDFLQITRRINGGTNGLADRQALYDRALKVLA*
>122_fragment_2_168 (148665..149183) 1 PF10721.9 Protein of unknown function (DUF2514)
MTGLYARIGGVLLILLAVAGALYGAYRHGVSVTDSKWQVKWAEQVSTQAQAVATTTTEYRTEEQRRQKAANQVANDARQEQAVAIADAAGADAAGDRLRSEAGKLAASVSCVPSDPGIADRGKNATRAAMVLSDLLGRADARAGELARYADRLTVSLQACEAFNVSISPSSH*
>56_fragment_1_18 (15411..15920) -1 VOG01059 "sp|P00726|SPAN1_LAMBD Spanin, inner membrane subunit"
MNGLDLRFVLLAVVVGSGLGGWLAWEWQATRYEQQLSEQAMACLQERELASRAVSDWQTAEQARRRALEVRLQNSDTTLHKELSDAQTSQVRLRDRLATADLRLSVLLATPSSGAGVSAATDSGRVVHGGPRAELDPTAAQRIVAITGDGDQGLIALKACQAYVREIAF*
>56_fragment_1_19 (15917..16462) -1 K03791 K03791; putative chitinase
MPVNQQQLLHILPNAGLKAGVFVPALNIAMTRYCIDTRLRVAAFIAQIGHESGQLRYVRELGNDSYLAKYDTGQLALRLGNTPDADGDGQLYRGRGLIQVTGRANYEACGEALGLDLLRQPELLERPDHAAMSAAWFWDRANLNALADKGDFLMITRRINGGINGLADRQALYQRALEVLP*
>56_fragment_1_20 (16486..17490) -1 None hypothetical protein
MKTVSSARNPGWADQAHTTLNLWVIFEENKDSGREEGISISANDQDPQVVALFNRAVAGEFGVISEPSEQMVRIAVMMQRGNYSADASRKIDALTNDLSVLQNAVASGTATQAQIESLPALQAELDAYMAYRADLAHLEDIPGFPMSFVWPVPPASPFVYVKPPEVPTPPTGVSDDELPWVMSSIRNPRWADQSHNAIVLLVVFEQTKDTRGEEAVTVSFNDPRPQARKLFDRAIFSEFGPVLEPLEPLEPVVTVDGRVQRDRYAAMATAKIEALNHTLSTLQSAIEAQLKSLPALQAERDAYWLYRVQLAQLDALPGFPVSFEWPVAPATTFV*
>56_fragment_1_21 (17626..18138) -1 VOG03347 sp|P03740|TFA_LAMBD Tail fiber assembly protein
MHTVLSARDPRWSDLAHTSIEMWVLFEEMKDIYGEVPFAASPKDSEPHGVDLFNRAVAGEFGEVLEPTEQTVLTLVTLQREAFSATATARINELVAELDTLQDATALKMETESQVNSLPAIQAELNAFRLYRVQLAQLETLEGYPAKVDWPVAPAKPFVYVQPVEEAVSD*
>56_fragment_1_22 (18149..19531) -1 None hypothetical protein
MDYPKSVPGVGLVSGKFVDENPATGTPGSLIPAQWGNSVTQEILNVILGAGLVPSEADVTQLHRAILGLAASDYKKSVRCATTMAIGLSGLQTIDDVTLVAGDRVLVKNQDNPAQNWIYLAAADAWTRAQDANESTECTPGHLVPVQAGTKNGGTVWQLTNTTAPVLGTTGLVFERALGRSGVAAGSYSRVKVNRYGQVEEGSSPTTLAGYGVTDAFTKAEVDLRDAARPLRDSITHVGMANNQPDAPYMRRESDNGVYYLQSRLGFTPVQQGGGAGQLANPIKIGWSGSSLKAAVDSTDLGNLWYSRNFNPDDKANRGTTLEAYGITNAYTKAEVDLRDMQRPLADSINILGFASNNPLYPYMRRSSDGQVYNLVSEQGLAARIAALGLSEVGSYAFARVINSIGPINQGGLVAGNNLIYSSTSGSDGGTNNSGTIGIGTWRAHGAFSSGERTLFQRAS*
>56_fragment_1_23 (19542..20141) -1 PF10076.9 Uncharacterised protein conserved in bacteria (DUF2313)
MVVIRTAEHYAEQLQALLPPGPAWDPERVPEVQQLIAGLSHEFARIDGRAFDLLNEMDPASVSELVPDWERVMNLPDPCLGLKPLFEDRRLSVRQRLVAVGGQNAAFYVGIAVSQGYPDASVTEFRTPRMGRSRFGQAHFGTWNAQFMWTLNTGGRQRLGRRFGASYWGERFGVNPGTAIECLIRRAAPAHGIEFVNFN*
>56_fragment_1_24 (20129..21169) -1 PF04865.14 Baseplate J-like protein
MPFETPTLPALVNRTQVDLAGDALRQSDARVLSRAHSGAAYGLYGYQDWIADQILPDTADEETLERQAILRLRQPRKPAQPATGTVRFVAAAGAVLDVDTILQFSDGRFYRVTQGVTTVAGNNTTTVEAVDAGALGNADAGQVMTVVQPVEGIDSSFTVIADGLTGGIARESTESLRARVVRSYRVIPHGGNQDDYVTWALDVPGVTRAWCVRRYMGPGTVAVFFMRDDDATPTPDAEQLAQVAAYIEPLRPVTAELYVLAPVQKPVTYTISLTPDTTAVRAAVQAQLADLHNREAGLGETLLLTHIAEAISRAAGETDHVLISPTANVTAAANQLLTFGGILWSS*
>56_fragment_1_25 (21159..21557) -1 PF07409.12 Phage protein GP46
MIIEGSLQASLLRSVIISLFTWRRAEADDPFDDAERFGWWGDTYPAVANDRIGSRLWLLRRVKLTAQTQRDAEFYAREALSWLIDDGHVQRINIFTEQVQSNRLNLGVELVVPDGQVVRFNPSEQWQVIYAV*
>56_fragment_1_26 (21557..22066) -1 PF06890.12 Bacteriophage Mu Gp45 protein
MSLFNRMLVRGTVVLVRASSKMQALQMRLTAGEVRGDMEHFEPYGFTSNPLAGAEGIAAFIGGDRSHGLLLVVADRRYRLQGLESGEVAIYTDEGDKVHFKRGKVIDIETNTLNINAATAVNFDTPQITQTGKIVSQGDQVAGGISQITHLHGSVRSGPDQSGPPVGGG*
>56_fragment_1_27 (22063..23169) -1 VOG00534 sp|P10312|BPD_BPP2 Probable baseplate hub protein
MIDPNVVTLTVDDKDYAGWKTVEISAGIERQARSFDISLTWQWPGTDMVRPVRAGARCAVSIGGELILTGRVFATPVSYDDKQITLKISGRSLTADLIDCSAINKPGEWNDVSALTIVRELAAPYNVKVLSEIPETSRKSKHTIEPGETVFKSIDRLLTVFRIFSTDDEYGNVVLARPGSMGNAVDAVELGRNVLSAVAPLDFSGLFSEYQVIGQQAGNDKTFGKAASEVSASVTDSSVTPARVLVIHEESPITPALALSRAKWERGHRQGKTRLTTYKVQGWRQANGALWRHNTLVRVVDSILDLDQEMLISAITYSLSDKGTTTTLVVGPIEGFEAEPGDPEKRSKVPVNKDAYSYAQPNNEGTLA*
>56_fragment_1_28 (23173..24666) -1 VOG01137 sp|P71389|VPN_HAEIN Mu-like prophage FluMu DNA circularization protein
MSTWRDSLLPASFRGVGFFISSAVVPIGRKGQLHEFPQRDEPYFESLGKQSQVHTVTAFIVGPECFEQRDKLLQALETSGAGELVHPWLGRMQVRVGDCDMTHSLAEGGIVRLNLKFYPDQPLKFPTSTLNTGRQLMQASDGLLGSALRRYRAVMATVDAVRINIQALRSTLSGVFATIQRQFSSFMTVYSDATALVHSLVNAPYTVSTMFSTFFASFQGDSRRSSRERGSSNVGAGGTGSGSGSAGGGSTGSGSGSSGSGNTGGSSGSGSGSSGSGGSSGSSGSNGTGSVGSVASGASVSRNASGVEAVPYRSIISDATQQAQAVSSINQVNQGGGLDTGVTAQATADLVQDALLVKVARVVASMPVAVSTTPILVVPSLDQQRVQPLQRADVPVADDVIELRDTLSSAIWDASLKADPEHYLALNTLRHALISHLNAVAASGVRLQDMKVSEPLPALVLAYRRFGDASRSREVVQRNRIPHPGFVPPGTLKIAQE*
>56_fragment_1_29 (24663..27191) -1 VOG03144 REFSEQ hypothetical protein
MADTIKTLITGVDQLSPTLATVSKNVKGFTEGLESSGLAEVPLKEMIGESTLAQPLIDAVKAAMGFETSMAGVKRSVTFETPQQFQAMSRDILDLSERLPESASGLAAIVTEGAKANVPRAELTGFATDAVKMGIAFDQTAAQSGEMMGKWRSSFEMTQPQVAALSEKINVLGGNNLEKQIATMVTAMGPLGPVAGRASGEIAAMGATLAGVDVPTDVAAKGIKSFMQSITEGGAAKAGAFEALQLDINQLTQGMQQDPSGTIEKVLKAISTVDPGTQSAVITQLFGEESLGAITPLLKNLDVLRSNLAKVGEGVQSAGTIEQEFQANSETTAVALKEMENRVDRLSINIGSMFLPAMNETMAVIGPMISQVAALAAEHPGVIKGVVGAAIAFGVLQVAVIAATNASRVLSSVLGMSPVGIVVRALALAAGLLIANWSTVAPYFQAVWAAIREPVMALWDVFKRVFGWSAIGLIISNWQPLSAFFVGLWDGIKVLAAPVFDAMKTLFSWKPIDTIVSSWQSLSTFFVGLWDDINVLAAPVFDAMKTLFGWTPIGMIISNWQPLSEFFSALWGVIQALASPVIGYFQSMFDWSPMDTISSAWQPVSSFFSGLWETIKAEAAPVTDALASLFDVSPMELISAAWQPVSGFFTGIWEGIKAETAPLMEALTGLFNWSPMDSINEKWAPIKTFFSGLFTDIKPFIDPILNWFGIGSDDKSLLQKATQKLNEFAEERRVDNAGPGGGKGAFLTADAVQVSQLKQQQINQAMGIPATSQLLSAPNLPAPGSLLLQQGTGVGSRLEGELNIRFENAPPGMRTEQMQTNQPGLTISPSVGYRTLGAGAAS*
>56_fragment_1_30 (27322..27618) -1 PF10109.9 "Phage tail assembly chaperone proteins, E, or 41 or 14"
MSEIIELARPIEAHGETVSQLTFRRPTAQEARAIKALPYRIDKNEDVSLDLDVAAKYIAVCAGIPPSSVNQMDLCDINTLSWKVASFFMAAASATLKA*
>56_fragment_1_31 (27615..27962) -1 PF10618.9 Phage tail tube protein
MAQKVAGTCYIKVDGTQLTISGGGEAPLMNVKRDTVVPGYFKEVDKAAWVKFKAVHTPDMPLKLLTTGVDMTITCEFNNGKTYVLSGAYLVEEPSSKADDGTIDLKFEGSQGSWQ*
>56_fragment_1_32 (28030..29526) -1 VOG02914 sp|P44233|VPL_HAEIN Mu-like prophage FluMu tail sheath protein
MAISFNNIPSDVRVPLFYAEMDNSAANSASAGMRRLIVAQVNDDVTGPEIGSLVLVPSVALAKNLGGQGSMLAAMYETWRKADPTGEVWCLPLLNTEGAKAGATVTVAGAATETGLLNLYVGGVRVQATVVNGATAAQAANALSVKINATPDLPVRSVVDAGVLTLSCKWSGVSGNDIRLEFNRLGKTNGEAIPAGLTAEVTAMTGGVGTPDQVQALAALGDEPFEFLCLPWTDTTTLDAWKAAMDDSTGRWSWARQLYGHVYSAKRGTVGTLVAAGQLRNDQHITIQAVEMAAPQPVWLQAAALAARTAVFISADASRPTQSGTMPGLDPAPASQRFTLTERESLLRYGIATAYYEGGYVRIQRSITTYQKNAYGQADNSYLDSETMHQSAFIIRRLQGVITSKYGRHKLASDGTRFGAGQPIITPSTIRGELIAQYARLEEEGHVENAEVFAQHLIVERDGNDPSRVNVMFPPDYINGLRVFALLNQFRLQYDEAA*
>56_fragment_1_33 (29545..29730) -1 PF10948.8 Protein of unknown function (DUF2635)
MTQRITVLPGEGRTVPDPEAGDLLPAEGRVVTFNAWWQRRHNDGDITLQTEQSPTQSAETA*
>56_fragment_1_34 (29727..30317) -1 VOG10943 REFSEQ hypothetical protein
MKITPIVAHLQATCPTFAGRISAGIDWAAVALGDQLAHPSAYVIATGDLALANDLQNVVRQIITDRLDVVVVLDGGDKRGQEASEQLHAIRAELWRALVGWSPEREYDPMQYQGGALVQISGDRVTYRFGFAAQFQLGRNLESQPAETWHEAYLDGLPGFTGATFEMDSIDPADPNLKYPGPDGRIEVKFSGDVKP*
>56_fragment_1_35 (30404..30742) -1 None hypothetical protein
MHRCVIDFIARRAWRQIEVWVIAALMIAGCLMLGFQAGQWSANAEHTQQLAEVRKAYDAALGRRDLRLDRLAESTTEAAGKVESAATVASEAAHTASRAADKANKVLDKATQ*
>56_fragment_1_36 (30723..31112) -1 None hypothetical protein
MDPTDLGPGTATWLGGTGTILLGGFLWLRKFLSRDAADRAMDNADIGTVRRLNELLDSERQIRKEAEARADQFAKERNELAAAVGRMEGKIEALTSHIVQLTDKVTTQSAEIARLRSQLGGANDAQMRN*
>56_fragment_1_37 (31417..31854) -1 None hypothetical protein
MIEEVEAVMVHWGEQRNRIGLSGGLSSPMAGIMEWGAYIPRSTPGSRSLIGNGSGMDYISSEVEAAVAEMARSPARSRGPELAQLAALRYVESLPVREQMRLLGINEGADRTYRNWVDKLHQSVLAALSARSASRSNRKEVARRA*
>56_fragment_1_38 (31985..32641) 1 K01356 lexA; repressor LexA [EC:3.4.21.88]
MIRRMNKWYEVARQVMDTQQISQEEMAERMGVTPGAVGHWLNGKREPKIEVINRLLGELGLPILTTSIPWNEPGQQNVAPTEQPSRFYRYPVISWVEAGGWNEAVEPYPVGYSDTFELSDYKAKGRAFWLVVRGDSMTAPAGQSIPEGMLILVDTGIEPTPGKLVIAKLPESNEATFKKLVEDAGRYFLKPLNPAYPTIAISEECKLIGVIRQMTMRL*
>56_fragment_1_39 (32789..33046) 1 VOG04172 REFSEQ hypothetical protein
MQNLRPEASQHDAYLALAQRIQDLITSPKAQIEHQVLLVREPGESPVHWEQIVEQISEAEGINVTRNFENGSVNVSWYVESADAY*
As you can see, the data are drawn from many locations in the genome. I need to create a separate file for each of these locations, based on the first number in the header (i.e. 56_fragment... versus 122_fragment...).
So, essentially, this one file needs to be split into two, depending on whether the first number after the > matches. I need a systematic way to do this, because each of my 1534 files, all formatted as shown above, has different contents.
Any help would be greatly appreciated.
Comments:
1. Are you looking for help with some code you have already written, or do you want to hire someone to write the code for you?
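For illustration, the split described in the question could be sketched as follows. This is a minimal sketch in Python rather than bash, and it assumes that every header starts with `>` followed by the grouping number and an underscore, as in the sample above. The function names `split_fasta_by_prefix` and `split_file`, and the output naming scheme (`plate11.A10.122.faa`, `plate11.A10.56.faa`), are hypothetical choices made for this example and are not part of VIBRANT.

```python
from collections import defaultdict
from pathlib import Path


def split_fasta_by_prefix(text):
    """Group FASTA lines by the number that starts each header.

    Returns a dict mapping the prefix (e.g. "122") to the list of
    header and sequence lines that belong to it, in original order.
    """
    groups = defaultdict(list)
    key = None
    for line in text.splitlines():
        if line.startswith(">"):
            # ">122_fragment_2_95 (...)" -> "122"
            key = line[1:].split("_", 1)[0]
        if key is not None:
            # Sequence lines are filed under the most recent header.
            groups[key].append(line)
    return dict(groups)


def split_file(path, out_dir):
    """Write one output file per prefix found in a single .faa file.

    Hypothetical naming: <original stem>.<prefix><original suffix>.
    """
    path = Path(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, lines in split_fasta_by_prefix(path.read_text()).items():
        out = out_dir / f"{path.stem}.{key}{path.suffix}"
        out.write_text("\n".join(lines) + "\n")
        written.append(out)
    return written
```

To process all 1534 files, `split_file` could be called in a loop, for example over `Path(".").glob("*.faa")`. Grouping in memory before writing keeps the records for each prefix together even if they are not contiguous in the input.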