Bacterial Amino Acid Auxotrophies Enable Energetically Costlier Proteomes. (2025)

Author(s): Niko Kasalo [1]; Tomislav Domazet-Lošo (corresponding author) [1,2,*]; Mirjana Domazet-Lošo (corresponding author) [3,*]

1. Introduction

The concept of functional outsourcing suggests that organisms streamline their genomes by losing costly functions and substituting them through biological interactions [1]. A particularly striking example is the outsourcing of amino acid (AA) production in animals, which almost universally lack the ability to synthesize about half of the proteinogenic AAs [1,2,3,4,5,6]. Our recent research demonstrates that this phenotype is driven by selective pressures related to the energy cost of AA production, allowing animals to evolve proteomes that incorporate expensive AAs more frequently [2]. These findings raise the possibility that similar correlations between proteome costs and AA biosynthetic capabilities exist in other major clades on the tree of life, including bacteria.

Compared to animals, the understanding of bacterial AA biosynthesis remains relatively incomplete. Earlier studies indicated that most bacteria possess at least a few auxotrophies [7,8]. However, recent research has revealed that the prevalence of AA auxotrophies may have been overestimated due to gaps in knowledge regarding bacterial biosynthetic pathways, though many bacteria still lack the ability to synthesize the full set of AAs [4,9]. Experimental evidence shows that AA auxotrophies can confer a fitness advantage in competition with prototrophic strains [7,10], and they may serve as an evolutionary strategy to reduce biosynthetic burdens via cooperative interactions within microbial communities [11].

However, the adaptive benefits of AA auxotrophies in bacteria remain debated. A previous study speculated that outsourcing AA biosynthesis might reduce cellular metabolic costs [10]. While it is well-established that AAs vary in biosynthetic cost [11,12,13] and that costlier AAs promote stronger microbial cross-feeding interactions [11], a recent study found no significant correlation between AA biosynthesis costs and the prevalence of AA auxotrophies among bacteria [4].

Nevertheless, alternative approaches could provide a more rigorous test of whether energy-related selection drives the evolution of AA auxotrophies. For instance, if such selection influences the loss of AA biosynthetic capabilities, it should also be reflected in the incorporation of more expensive AAs into bacterial proteomes [2]. To our knowledge, no study has yet examined bacterial proteome costs in this context.

Detecting auxotrophies remains an active area of research, employing various methodologies. Many studies rely on in silico approaches, such as genome-scale metabolic modeling [14,15] or homology-guided enzyme annotation [4,7,16]. To assess their accuracy, these computational methods are typically validated against a limited number of experimentally derived datasets [4,14]. However, no simple or standardized protocol currently exists for comparing AA auxotrophy estimates across studies.

To address these gaps, we assembled a taxonomically diverse dataset of bacterial proteomes predicted from genome sequences and examined whether AA biosynthesis costs explain trends in AA auxotrophy composition. Using a simple auxotrophy detection methodology based on the MMseqs2 clustering pipeline, we obtained results comparable in quality to previous approaches. Our findings reveal that costlier AAs are more frequently lost and that bacteria with more expensive AA auxotrophies encode costlier proteomes, mirroring patterns observed in animals. These results suggest that energy-driven selection plays a key role in shaping AA auxotrophies in bacteria, enabling them to explore protein sequence space more freely during evolution.

2. Results

2.1. Global Trends in Bacterial AA Auxotrophies

To explore global patterns in the evolution of bacterial auxotrophies, we assembled a database of 980 high-quality proteomes, computationally inferred from their corresponding genomes, capturing broad bacterial diversity (Table S1). We assessed the completeness of AA biosynthesis pathways in these species using an MMseqs2-based clustering approach [17]. Our method clusters, in a single step, a representative sample of bacterial enzymes known to catalyze reactions in AA biosynthesis pathways together with all bacterial proteomes in our database (see Section 4). Based on the composition of the recovered clusters, functional information on AA anabolism is then transferred between cluster members, allowing us to determine the completeness of 20 AA biosynthesis pathways for each species. Unlike previous studies that rely on strictly defined cutoff values to designate auxotrophies, and thereby lose part of the available information, we analyzed pathway completeness values directly. This approach provides a more accurate representation of the likelihood that a particular pathway is present.

For each of the twenty amino acids, completeness scores (CS) range from 0 to 1, where 0 indicates that all enzymes in the pathway are absent, and 1 indicates that all enzymes are present in a given species (Figure 1, File S1). When summing the completeness scores for all 20 AAs, we found that the average value across all bacterial species is 17.33, and that nearly 66% of species in our dataset exhibit total pathway completeness above 19. These high completeness values suggest that many bacteria are prototrophic for most amino acids, aligning with findings from recent studies [4,9]. However, certain taxonomic groups, including Lactobacillales, Mollicutes, and Borreliaceae, exhibit substantial incompleteness in multiple amino acid biosynthetic pathways (Figure 1, File S1), indicating that auxotrophies are common in these groups.

2.2. Expensive AAs Are More Commonly Lost

If energy-driven selection influences the evolution of bacterial AA auxotrophies, one would expect the ability to synthesize energy-expensive AAs to be lost more frequently. To test this hypothesis, we devised an AA auxotrophy index (AI, Equation (1)), defined as 1 minus the completeness score, and compared it against the opportunity cost [2], which estimates the energy expenditure associated with AA synthesis (see Section 4). We observed a significant but moderate correlation between opportunity cost and auxotrophy index when opportunity cost values for respiratory metabolism were applied (Figure 2a,b). However, under fermentative conditions, this correlation was much weaker and not statistically significant (Figure 2c). Together, these findings suggest that the evolution of AA auxotrophies in bacteria, similar to that in animals, is primarily driven by selection favoring energy savings during AA synthesis—a pattern most pronounced under respiratory conditions.

2.3. Energy Savings via AA Auxotrophies Enable Costlier Proteomes

However, the central question remains: how do reductions in AA biosynthetic pathways influence the overall cost of encoded proteomes? If energy-driven selection underpins the evolution of AA auxotrophies in bacteria, one would expect auxotrophic species to maintain more expensive proteomes than their prototrophic counterparts. This is because AA auxotrophs expend considerably less energy on AA biosynthesis and acquire the missing AAs at a relatively low cost from the environment. In essence, this pattern suggests that the energy saved on synthesizing costly AAs offsets part of the proteome’s energy expenditure, enabling auxotrophs to incorporate more expensive proteins, potentially contributing to novel functions [2].

To test this idea, we calculated the opportunity cost of each bacterial proteome (OCproteome, Equation (2)) as well as the overall biosynthetic savings achieved through AA auxotrophy (OCsavings, Equation (3)). These calculations were performed across three respiratory modes: fermentation, low respiration, and high respiration [2]. We then assessed the correlation between OCproteome and OCsavings (Figure 3). Regardless of the respiratory mode used to estimate energy expenditure, our results showed that species achieving the greatest energy savings in AA production also maintain the most expensive proteomes. Similar to animals, bacterial auxotrophies appear to influence not only the immediate energy budget related to AA production but also reduce selective pressure against the use of energetically costly AAs in proteomes. Notably, we found that Mollicutes and Borreliaceae are almost invariably highly auxotrophic and possess extremely expensive proteomes. On the other hand, Lactobacillales exhibit a wide range of auxotrophy levels and have proteomes that are slightly more expensive than average.

2.4. Expensive Proteins Have Ecologically Relevant Functions

To investigate the functional background of the most expensive proteins in Mollicutes, Borreliaceae, and Lactobacillales, we conducted an enrichment analysis of COG functions. Within each of these three clades, we separately calculated the opportunity cost of every protein (Equation (4)) and used MMseqs2 clustering [17] to group them into homologous clusters [1,2]. Next, we determined the opportunity cost of each cluster by averaging the opportunity costs of its members (Equation (5)). Finally, we performed an enrichment analysis of COG functions for the top 10%, 20%, 30%, 40%, and 50% most expensive clusters within each clade (Figure 4).

All three bacterial clades show enrichment in a similar set of COG functions; however, the statistical significance of these enrichments varies between the clades. For instance, proteins with unknown functions are among the top two functional categories with the most significant enrichments (i.e., the lowest p-values). However, the second most significantly enriched category differs across clades: defense mechanisms in Borreliaceae (Figure 4a), inorganic ion transport in Mollicutes (Figure 4b), and intracellular trafficking in Lactobacillales (Figure 4c). This suggests that many of the most expensive proteins in these bacterial clades remain severely understudied, while those that have been characterized are involved in different aspects of cellular processes.

3. Discussion

In our previous work, we applied the concept of functional outsourcing [1] to amino acid (AA) biosynthesis in animals and developed a model describing the conditions required for the evolution of AA auxotrophies [2]. This model predicts that the loss of AA production capabilities is not random because energy-optimizing selection favors the loss of pathways for energetically costly AAs. As a result, AA auxotrophs are able to more freely explore protein sequence space by reducing selective constraints on the use of expensive AAs in their proteomes [2]. To test the broader applicability of this model across major clades, we investigated the patterns of AA auxotrophies in bacteria.

While animals exhibit nearly identical sets of auxotrophies, bacterial metabolisms are far more diverse, making bacterial auxotrophies more challenging to detect and interpret [4,7,9]. Although it is known that bacterial AA auxotrophies can confer a fitness advantage in competition with prototrophic strains [7,10], it remains unclear which adaptive benefits are gained through AA auxotrophy and whether they are linked to energy management [2].

Our results reveal a significant positive correlation between the AA auxotrophy index averaged across tested bacteria and the AA opportunity cost, as predicted by the AA outsourcing model [2]. This finding aligns with earlier observations that costlier AAs promote stronger microbial cross-feeding interactions [11] and that AA auxotrophies confer a fitness advantage [7,10]. Beyond our study, only one prior investigation has explicitly examined bacterial AA auxotrophies from an energy-usage perspective, reporting no significant correlation between the frequency of an AA being auxotrophic and its biosynthetic cost [4]. The reasons for this discrepancy remain unclear and may stem from differences in bacterial genome datasets, auxotrophy detection pipelines, biosynthetic cost estimates, or uneven sampling of bacterial groups. Nonetheless, other trends observed in that study [4] closely resemble our findings—tryptophan, leucine, histidine, valine, and serine are commonly auxotrophic.

To further test the robustness of our results, we analyzed data from another comparable study that estimated AA auxotrophies in the gut microbiome using metabolic modeling, though it did not include energy calculations [14]. Similar to our study, this dataset also showed a significant correlation between the frequency of an AA being auxotrophic and its biosynthetic cost, with an even higher correlation coefficient than in our results (Figure S1). Consistent with our findings, they reported that tryptophan, the most expensive AA, is the most commonly auxotrophic. Moreover, their auxotrophy profiles closely resemble ours, despite being derived from a niche-derived community—gut microbiome bacteria [14]. Interestingly, they also found that host–microbiome and microbe–microbe interactions can play a crucial role in the maintenance and spread of AA auxotrophy [14], aligning with our concept of functional outsourcing [1]. Collectively, these findings suggest that, in at least some bacterial groups, AA auxotrophies are influenced by energy savings at the level of AA biosynthesis.

The second prediction of our model is that bacterial species with more auxotrophies should have more expensive proteomes. Our results confirm this prediction, showing that auxotrophic species indeed maintain genes that encode more expensive proteomes. This finding suggests that AA auxotrophy fundamentally influences protein evolution. The energy savings achieved through outsourcing AA biosynthesis relax the constraints on incorporating costly AAs into proteins, thereby enabling auxotrophic organisms to explore protein sequence space more freely [1,2,18,19,20].

This is best illustrated by Borreliaceae and Mollicutes, which exhibit extreme levels of both proteome expensiveness and auxotrophy. They have outsourced most of their metabolic processes to their hosts [21,22,23], making their AAs relatively inexpensive, which likely facilitated pathogen–host coevolution. In contrast, some Lactobacillales species exhibit only a few auxotrophies while others display many, yet this variation does not impact proteome costs, which remain relatively constant. This pattern most likely arises because Lactobacillales predominantly rely on fermentative metabolism [24]. As predicted by our model [2], fermentation lowers AA synthesis costs, thereby relaxing the selective pressures that drive the outsourcing of expensive AA production. This suggests that auxotrophies in Lactobacillales are more strongly determined by the availability of specific AAs in the environment rather than by the energy burden of AA biosynthesis [2].

Based on these findings, we hypothesize that the increase in the frequencies of costly AAs in auxotrophic species could result in the evolution of proteins with novel functions [1,2,18,19,20]. Interestingly, the most auxotrophic groups with the most expensive proteomes in our analysis are Borreliaceae and Mollicutes, the members of which are notorious for causing severe infections that are difficult to manage [25,26,27]. It is also known that Borreliella (syn. Borrelia) burgdorferi (Spirochaetales) harbors many Borreliaceae-specific genes with unknown functions, which may be implicated in the development of Lyme disease [28].

Our enrichment analysis further supports the idea that auxotrophies facilitate the evolution of proteins with novel functions. In all three tested groups, many of the most expensive protein clusters have unknown functions, suggesting that the energy saved through outsourcing is invested in lineage-specific proteins. In Borreliaceae, the enrichment of defense-related functions among expensive proteins may be associated with their complex immune evasion strategies [29,30]. Similarly, inorganic ion transport functions are the most expensive category in Mollicutes, whose pathogenicity is closely linked to alterations in the ion transport of host cells [31]. Finally, the enrichment of expensive intracellular trafficking functions in Lactobacillales likely reflects their ability to acquire nutrients through symbiotic interactions [24]. Taken together, our results suggest that auxotrophy-related energy shifts may drive the evolution and function of lineage-specific genes, a topic that warrants further in-depth investigation.

In conclusion, our results suggest a global macroevolutionary trend in bacteria where AA biosynthesis capability is shaped by energy-saving selection, ultimately leading to the evolution of more expensive proteomes. However, bacteria exhibit remarkable ecological and metabolic diversity, thriving in a variety of physically and chemically unique habitats [9,32,33]. This diversity raises the possibility that factors beyond energy costs influence reductions in AA biosynthetic pathways. Future studies focusing on specific bacterial lineages and ecologies will be crucial for uncovering the role of such factors in shaping the evolution of AA biosynthetic capabilities.

4. Materials and Methods

4.1. Databases, Completeness Score and Auxotrophy Index

To create the database of bacterial proteomes (Table S1), we combined datasets previously assembled for the phylostratigraphic analyses of Bacillus subtilis [34] and Borrelia burgdorferi [28], along with a resolved phylogeny of Escherichia coli. This resulted in a database of 980 bacterial proteomes, predicted from their corresponding genomes, representing most major bacterial lineages. The proteomes, primarily retrieved from the NCBI database and supplemented by the Ensembl database, were evaluated for contamination using BUSCO [35], and all were confirmed to be free of contamination [1].

In our previous study, we retrieved pathways and enzyme codes involved in AA biosynthesis from the KEGG and MetaCyc databases [2,36]. For AAs that can be synthesized via multiple alternative pathways, we treated each pathway separately, even when they shared some enzymes. Using this collection of enzyme codes associated with AA biosynthesis, we retrieved bacterial protein sequences from the KEGG database [37,38]. For each genus with representatives annotated in KEGG, we selected the species with the largest number of enzymes catalyzing reactions in AA biosynthesis pathways. This process resulted in a protein sequence reference database comprising 387,892 AA biosynthesis enzymes across 2095 species (Table S2).

In the next step, we combined the downloaded enzyme sequences known to be involved in AA biosynthesis pathways with all sequences from our bacterial proteomes (4,230,625 sequences across 980 species) into a single database. We then clustered this combined database using MMseqs2 [17] with the following parameters: --cluster-mode 0, --cov-mode 0, -c 0.8, and -e 0.001. Clustering with these parameters generated clusters whose members exhibit highly similar architectures, as the alignment between query and target sequences covered at least 80% of their length [1].

Using the presence of enzymes involved in AA synthesis, we functionally annotated the remaining members of each cluster by transferring functional information from known AA synthesis enzymes. For each AA biosynthesis pathway and species in the database, we calculated a pathway completeness score (CS) by dividing the number of detected enzymes by the total number of enzymes in that pathway, resulting in values ranging from 0 to 1. If a species contained alternative biosynthetic pathways for an AA, the pathway with the highest completeness score was selected. We then calculated the AA auxotrophy index ($AI_i$) using the following equation:

$$AI_i = 1 - CS_i \qquad (1)$$

In this equation, CS is the completeness score, and i denotes one of 20 AAs (i = 1, …, 20).
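The annotation transfer and scoring steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the dictionary-based cluster representation, and the placeholder enzyme labels (EC_A, EC_B, ...) are assumptions made for illustration, and all toy proteins are assumed to belong to a single species.

```python
from collections import defaultdict


def transfer_annotations(clusters, reference_enzymes):
    """Copy enzyme labels from reference enzymes to the other members of their cluster.

    clusters: dict, cluster id -> list of sequence ids (reference and proteome mixed)
    reference_enzymes: dict, reference sequence id -> enzyme code
    Returns dict, proteome sequence id -> set of transferred enzyme codes.
    """
    annotated = defaultdict(set)
    for members in clusters.values():
        codes = {reference_enzymes[m] for m in members if m in reference_enzymes}
        if not codes:
            continue  # cluster holds no known AA biosynthesis enzyme
        for member in members:
            if member not in reference_enzymes:
                annotated[member] |= codes
    return dict(annotated)


def completeness_scores(detected, pathways):
    """CS per amino acid: detected enzymes / total enzymes.

    pathways: dict, amino acid -> list of alternative pathways (sets of enzyme codes).
    The most complete alternative pathway is used, as described in the text.
    """
    return {
        aa: max(len(pathway & detected) / len(pathway) for pathway in alternatives)
        for aa, alternatives in pathways.items()
    }


def auxotrophy_index(cs):
    """Equation (1): AI_i = 1 - CS_i."""
    return {aa: 1.0 - score for aa, score in cs.items()}


# Toy data; every identifier below is a placeholder, not real study data.
clusters = {
    "c1": ["ref_1", "sp1_prot_1"],
    "c2": ["sp1_prot_2"],
    "c3": ["ref_2", "ref_3"],
}
reference_enzymes = {"ref_1": "EC_A", "ref_2": "EC_B", "ref_3": "EC_C"}
pathways = {
    "Trp": [{"EC_A", "EC_B"}],
    "Ser": [{"EC_C"}, {"EC_A", "EC_D"}],
}

annotated = transfer_annotations(clusters, reference_enzymes)
detected = set().union(*annotated.values())  # enzymes found in the toy species
cs = completeness_scores(detected, pathways)
ai = auxotrophy_index(cs)
```

In a multi-species setting, the detected enzyme set would be built per species before scoring; summing the values of `cs` gives the total pathway completeness reported in Section 2.1.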

To evaluate the performance of our method, we utilized a previously established testing set of experimentally identified prototrophies and auxotrophies [4,14] to estimate pathway completeness scores (CSs). The first tested dataset comprised 160 fully prototrophic species [4]. Our approach exhibited an error rate of 0.012, indicating that approximately 1.2% of amino acids were incorrectly identified as auxotrophic (Data S1). The second dataset included 15 species with at least one known auxotrophy [14]. In this case, we detected an error rate of 0.188 for false prototrophs, meaning that around 19% of amino acids were incorrectly classified as prototrophic (Data S2). Taken together, these error rates suggest that our approach to auxotrophy detection is conservative and aligns with error rates reported for similar methods [4,14].

4.2. Opportunity Cost Measures

In our earlier study, we calculated the opportunity cost of biosynthesis for each AA depending on the three respiration modes: high respiration, low respiration, and fermentation [2]. The opportunity cost is calculated as the sum of the energy lost in the synthesis of AAs and the energy that would have been produced if a cell catabolized precursors instead of making AAs [2,39,40]. This measure reflects the overall impact of AA synthesis on the cell’s energy budget and is rather consistent regardless of the carbon source used by the bacteria [41]. To approximate how much ATP is generated from the reducing equivalents in different respiratory conditions, we converted the reducing equivalents to ATP in the following way: (i) ‘high respiration’, representing fully functional oxidative phosphorylation: 1 NAD(P)H = 2 FADH2 = 2 ATP [12]; (ii) ‘low respiration’, representing oxidative phosphorylation without proton pumping at complex I: 1 NAD(P)H = 2 FADH2 = 1 ATP, which corresponds, for instance, to the metabolism of S. cerevisiae and some E. coli strains [42,43]; (iii) ‘fermentation’, representing anaerobic metabolism without the conversion of reducing equivalents to ATP [2]. All of these values for each AA are available in Table S3.
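The three conversion schemes can be written out as a short Python sketch. It reads each equality chain literally (for example, 1 NAD(P)H = 2 ATP and 2 FADH2 = 2 ATP under high respiration, so 1 FADH2 = 1 ATP); this reading, the function name, and the input quantities are assumptions for illustration, and the actual per-AA values are those in Table S3.

```python
# ATP yield per reducing equivalent under each mode, derived from the
# equalities stated in the text (an interpretation, not the authors' code).
ATP_PER_EQUIVALENT = {
    "high_respiration": {"NAD(P)H": 2.0, "FADH2": 1.0},
    "low_respiration": {"NAD(P)H": 1.0, "FADH2": 0.5},
    "fermentation": {"NAD(P)H": 0.0, "FADH2": 0.0},
}


def atp_equivalents(atp, nadph, fadh2, mode):
    """Express a mix of ATP and reducing equivalents as ATP units for one mode."""
    rates = ATP_PER_EQUIVALENT[mode]
    return atp + nadph * rates["NAD(P)H"] + fadh2 * rates["FADH2"]


# Hypothetical input: 3 ATP, 2 NAD(P)H and 1 FADH2 (not taken from Table S3).
example = {
    mode: atp_equivalents(3, 2, 1, mode) for mode in ATP_PER_EQUIVALENT
}
```

With these hypothetical inputs the same pathway is valued at 8 ATP under high respiration, 5.5 under low respiration, and 3 under fermentation, which shows why cost differences between AAs shrink as respiration becomes less efficient.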

Using the AA opportunity cost, we also calculated the opportunity cost of each proteome (OCproteome) using the following equation:

$$OC_{\mathrm{proteome}} = \frac{\sum_{i=1}^{20} OC_i \times N_i}{\sum_{i=1}^{20} N_i} = \sum_{i=1}^{20} OC_i \times F_i \qquad (2)$$

In this equation, $OC_{\mathrm{proteome}}$ is a weighted mean where $OC_i$ represents the opportunity cost of the i-th AA, $N_i$ denotes the total number of occurrences of this AA in the entire proteome, and $F_i$ represents the frequency of the AA in the proteome (calculated as the number of occurrences of the AA divided by the total number of AAs in the proteome).
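As an illustration of Equation (2), the following Python sketch computes the weighted mean from a list of protein sequences. The function name and the two-letter toy cost table are hypothetical, and skipping residues without a known cost (such as ambiguity codes) is an assumption rather than a documented step of the original analysis.

```python
from collections import Counter


def proteome_opportunity_cost(sequences, oc):
    """Equation (2): sum(OC_i * N_i) / sum(N_i) over all residues in a proteome.

    sequences: iterable of protein sequences (one-letter codes)
    oc: dict, amino acid -> opportunity cost for one respiration mode
    Residues absent from `oc` are ignored (an assumption of this sketch).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq if aa in oc)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues with a known opportunity cost")
    return sum(oc[aa] * n for aa, n in counts.items()) / total


# Toy cost table with made-up values; real values are in Table S3.
toy_oc = {"A": 1.0, "W": 5.0}
toy_cost = proteome_opportunity_cost(["AAW", "W"], toy_oc)
```

Here the toy proteome contains two A and two W residues, so the result is (2 × 1.0 + 2 × 5.0) / 4 = 3.0.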

We also introduced a new measure (OCsavings) to quantify the energy savings of a species by linking the AA auxotrophy index ($AI_i$) to the opportunity cost. This measure estimates the energy saved by outsourcing AA production to the environment. It is calculated as follows:

$$OC_{\mathrm{savings}} = \sum_{i=1}^{20} OC_i \times AI_i \qquad (3)$$

In this equation, $OC_i$ denotes the opportunity cost of a given AA, while $AI_i$ denotes the AA auxotrophy index.
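Equation (3) is a dot product between the per-AA opportunity costs and the auxotrophy indices of a species. A minimal Python sketch, with a hypothetical function name and made-up toy values, could look as follows.

```python
def oc_savings(oc, ai):
    """Equation (3): sum over amino acids of OC_i * AI_i.

    oc: dict, amino acid -> opportunity cost
    ai: dict, amino acid -> auxotrophy index (1 - completeness score)
    Both dictionaries are expected to cover the same amino acids.
    """
    missing = set(oc) - set(ai)
    if missing:
        raise KeyError(f"no auxotrophy index for: {sorted(missing)}")
    return sum(oc[aa] * ai[aa] for aa in oc)


# Toy values, not taken from the study.
toy_oc = {"A": 1.0, "W": 5.0}
prototroph = oc_savings(toy_oc, {"A": 0.0, "W": 0.0})
partial = oc_savings(toy_oc, {"A": 0.0, "W": 0.5})
```

A fully prototrophic species saves nothing (0.0), whereas losing half of the pathway for the expensive toy residue yields savings of 2.5 in the same units as the cost table.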

4.3. COG Functions Enrichment Analyses

For functional analyses, we analyzed the proteomes of Borreliaceae (50), Mollicutes (31), and Lactobacillales (74) separately, taken from our full database of 980 bacterial species. We clustered the three datasets separately using the MMseqs2 cluster algorithm (version 14-7e284) with the following parameters: -e 0.001 -c 0.8 --max-seqs 400 --cluster-mode 1 [1,17]. For each protein in the datasets, we obtained its COG annotations using eggNOG-mapper (version 2.1.12) [44] with the DIAMOND (version 2.1.8) search tool [45]. We also computed for each protein its opportunity cost (OCprotein) for high respiration mode [2] as:

$$OC_{\mathrm{protein}} = \frac{\sum_{i=1}^{20} OC_i \times n_i}{\sum_{i=1}^{20} n_i} = \sum_{i=1}^{20} OC_i \times f_i \qquad (4)$$

In this equation, $OC_{\mathrm{protein}}$ is a weighted mean where $OC_i$ denotes the opportunity cost of the i-th amino acid, $n_i$ is the number of occurrences of the i-th amino acid in a protein, and $f_i$ is the frequency of the i-th amino acid in a protein.

Finally, we performed the functional enrichment analysis of the three datasets independently. For each dataset, a cluster was assigned a COG function if at least one of its members was annotated with that function. Clusters whose proteins had no annotations were assigned NA. The enrichment analysis was performed for clusters with at least 10 members. For each cluster, we calculated the average opportunity cost of a cluster (OCcluster) as follows:

$$OC_{\mathrm{cluster}} = \frac{\sum_{i=1}^{N_{\mathrm{cluster}}} OC_{\mathrm{protein}(i)}}{N_{\mathrm{cluster}}} \qquad (5)$$

In this equation, $OC_{\mathrm{protein}(i)}$ is the opportunity cost of the i-th protein in a cluster (see Equation (4)) and $N_{\mathrm{cluster}}$ is the number of proteins in the cluster.
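Equations (4) and (5), together with the selection of the most expensive clusters, can be sketched in Python as follows. The function names and toy data are hypothetical, and rounding the number of selected clusters upward is an assumption, since the text does not state how fractional cluster counts were handled.

```python
from math import ceil
from statistics import mean


def protein_cost(sequence, oc):
    """Equation (4): mean opportunity cost over the residues of one protein."""
    costs = [oc[aa] for aa in sequence if aa in oc]
    if not costs:
        raise ValueError("sequence contains no residues with a known cost")
    return mean(costs)


def cluster_costs(clusters, sequences, oc, min_size=10):
    """Equation (5): average protein cost per cluster.

    Only clusters with at least `min_size` members are kept, matching the
    size filter described in the text.
    """
    return {
        cid: mean(protein_cost(sequences[m], oc) for m in members)
        for cid, members in clusters.items()
        if len(members) >= min_size
    }


def top_percent(costs, k):
    """Ids of the top k% most expensive clusters (rounding up is an assumption)."""
    ranked = sorted(costs, key=costs.get, reverse=True)
    return ranked[: ceil(len(ranked) * k / 100)]


# Toy data with made-up costs; min_size is lowered to 2 so the example stays small.
toy_oc = {"A": 1.0, "W": 5.0}
toy_sequences = {"p1": "AW", "p2": "WW", "p3": "AA", "p4": "AW", "p5": "W"}
toy_clusters = {"c1": ["p1", "p2"], "c2": ["p3", "p4"], "c3": ["p5"]}

costs = cluster_costs(toy_clusters, toy_sequences, toy_oc, min_size=2)
expensive = top_percent(costs, 50)
```

In this toy case cluster c3 is dropped by the size filter, c1 and c2 receive average costs of 4.0 and 2.0, and the top 50% selection returns only c1.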

We performed an overrepresentation analysis for the top k% of clusters ranked by OCcluster, with k = 10, 20, 30, 40, 50, using a one-tailed hypergeometric test as implemented in the Python (version 3.2) scipy.stats module. The obtained p-values were corrected for multiple testing and adjusted using the Benjamini–Hochberg method as implemented in the Python statsmodels library [46]. All results of enrichment analysis are shown in Table S4.
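The overrepresentation test can be illustrated with a dependency-free Python sketch. It reimplements the upper-tail hypergeometric probability and the Benjamini–Hochberg adjustment from their definitions rather than reproducing the calls used in the study; with SciPy and statsmodels installed, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` and `statsmodels.stats.multitest.multipletests(pvalues, method="fdr_bh")` should give the same quantities. The function names, the toy COG labels, and the choice to correct across functions within a single top-k set are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: all clusters, n: clusters carrying the function,
    N: clusters in the top-k% set, k: top clusters carrying the function.
    """
    upper = min(n, N)
    favorable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favorable / comb(M, N)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(cluster_functions, top_clusters):
    """One-tailed test per function; returns {function: (p, adjusted p)}."""
    M, N = len(cluster_functions), len(top_clusters)
    functions = sorted({f for fs in cluster_functions.values() for f in fs})
    pvalues = []
    for f in functions:
        n = sum(f in fs for fs in cluster_functions.values())
        k = sum(f in cluster_functions[c] for c in top_clusters)
        pvalues.append(hypergeom_upper_tail(k, M, n, N))
    return dict(zip(functions, zip(pvalues, benjamini_hochberg(pvalues))))


# Toy annotation: two clusters labelled "S", two labelled "P" (placeholders).
toy_functions = {"c1": {"S"}, "c2": {"S"}, "c3": {"P"}, "c4": {"P"}}
toy_result = enrichment(toy_functions, ["c1", "c2"])
```

In the toy example both top clusters carry function S, giving a raw p-value of 1/6 for S, while P is absent from the top set and receives a p-value of 1.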

To calculate correlations, we used the cor.test() function in the R stats (v. 3.6.2) package. The heatmap was visualized using the ggtree R package [47].

Author Contributions

Conceptualization, N.K., T.D.-L. and M.D.-L.; methodology, N.K., T.D.-L. and M.D.-L.; software, N.K. and M.D.-L.; validation, N.K., T.D.-L. and M.D.-L.; formal analysis, N.K., T.D.-L. and M.D.-L.; writing—original draft preparation, N.K., T.D.-L. and M.D.-L.; writing—review and editing, N.K., T.D.-L. and M.D.-L.; visualization, N.K. and M.D.-L.; supervision, T.D.-L. and M.D.-L. All authors have read and agreed to the published version of the manuscript.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data are available in the Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

We thank M. Futo, A. Tušar, S. Koska, D. Franjević and G. Klobučar for discussions. We used the computational resources of the University Computing Center (SRCE) (Padobran) and the Institute Ruđer Bošković.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26052285/s1.

References

1. M. Domazet-Lošo; T. Široki; K. Šimičević; T. Domazet-Lošo. Macroevolutionary Dynamics of Gene Family Gain and Loss along Multicellular Eukaryotic Lineages. 2024, 15, p. 2663. DOI: https://doi.org/10.1038/s41467-024-47017-w. PMID: https://www.ncbi.nlm.nih.gov/pubmed/38531970.

2. N. Kasalo; M. Domazet-Lošo; T. Domazet-Lošo. Massive Outsourcing of Energetically Costly Amino Acids at the Origin of Animals. 2024. DOI: https://doi.org/10.1101/2024.04.18.590100.

3. S.H. Payne; W.F. Loomis. Retention and Loss of Amino Acid Biosynthetic Pathways Based on Analysis of Whole-Genome Sequences. 2006, 5, pp. 272-276. DOI: https://doi.org/10.1128/EC.5.2.272-276.2006.

4. J. Ramoneda; T.B.N. Jensen; M.N. Price; E.O. Casamayor; N. Fierer. Taxonomic and Environmental Distribution of Bacterial Amino Acid Auxotrophies. 2023, 14, p. 7608. DOI: https://doi.org/10.1038/s41467-023-43435-4. PMID: https://www.ncbi.nlm.nih.gov/pubmed/37993466.

5. D.J. Richter; P. Fozouni; M.B. Eisen; N. King. Gene Family Innovation, Conservation and Loss on the Animal Stem Lineage. 2018, 7, p. e34226. DOI: https://doi.org/10.7554/eLife.34226.

6. R. Guedes; F. Prosdocimi; G. Fernandes; L. Moura; H. Ribeiro; J. Ortega. Amino Acids Biosynthesis and Nitrogen Assimilation Pathways: A Great Genomic Deletion during Eukaryotes Evolution. 2011, 12, S2. DOI: https://doi.org/10.1186/1471-2164-12-S4-S2. PMID: https://www.ncbi.nlm.nih.gov/pubmed/22369087.

7. G. D’Souza; S. Waschina; S. Pande; K. Bohl; C. Kaleta; C. Kost. Less Is More: Selective Advantages Can Explain the Prevalent Loss of Biosynthetic Genes in Bacteria. 2014, 68, pp. 2559-2570. DOI: https://doi.org/10.1111/evo.12468.

8. M.T. Mee; H.H. Wang. Engineering Ecosystems and Synthetic Ecologies. 2012, 8, 2470. DOI: https://doi.org/10.1039/c2mb25133g.

9. M.N. Price; G.M. Zane; J.V. Kuehl; R.A. Melnyk; J.D. Wall; A.M. Deutschbauer; A.P. Arkin. Filling Gaps in Bacterial Amino Acid Biosynthesis Pathways with High-Throughput Genetics. 2018, 14, e1007147. DOI: https://doi.org/10.1371/journal.pgen.1007147.

10. G. D’Souza; C. Kost. Experimental Evolution of Metabolic Dependency in Bacteria. 2016, 12, e1006364. DOI: https://doi.org/10.1371/journal.pgen.1006364.

11. M.T. Mee; J.J. Collins; G.M. Church; H.H. Wang. Syntrophic Exchange in Synthetic Microbial Communities. 2014, 111, pp. E2149-E2156. DOI: https://doi.org/10.1073/pnas.1405641111.

12. C. Kaleta; S. Schäuble; U. Rinas; S. Schuster. Metabolic Costs of Amino Acid and Protein Production in Escherichia coli. 2013, 8, pp. 1105-1114. DOI: https://doi.org/10.1002/biot.201200267. PMID: https://www.ncbi.nlm.nih.gov/pubmed/23744758.

13. K. Zengler; L.S. Zaramela. The Social Network of Microorganisms—How Auxotrophies Shape Complex Communities. 2018, 16, pp. 383-390. DOI: https://doi.org/10.1038/s41579-018-0004-5. PMID: https://www.ncbi.nlm.nih.gov/pubmed/29599459.

14. S. Starke; D.M.M. Harris; J. Zimmermann; S. Schuchardt; M. Oumari; D. Frank; C. Bang; P. Rosenstiel; S. Schreiber; N. Frey et al. Amino Acid Auxotrophies in Human Gut Bacteria Are Linked to Higher Microbiome Diversity and Long-Term Stability., 2023, 17,pp. 2370-2380. DOI: https://doi.org/10.1038/s41396-023-01537-3. PMID: https://www.ncbi.nlm.nih.gov/pubmed/37891427.

15. J. Zimmermann; C. Kaleta; S. Waschina Gapseq: Informed Prediction of Bacterial Metabolic Pathways and Reconstruction of Accurate Metabolic Models., 2021, 22, 81. DOI: https://doi.org/10.1186/s13059-021-02295-1.

16. M.N. Price; A.M. Deutschbauer; A.P. Arkin GapMind: Automated Annotation of Amino Acid Biosynthesis., 2020, 5,p. e00291-20. DOI: https://doi.org/10.1128/msystems.00291-20.

17. M. Steinegger; J. Söding MMseqs2 Enables Sensitive Protein Sequence Searching for the Analysis of Massive Data Sets., 2017, 35,pp. 1026-1028. DOI: https://doi.org/10.1038/nbt.3988.

18. T. Domazet-Lošo; J. Brajkovic; D. Tautz A Phylostratigraphy Approach to Uncover the Genomic History of Major Adaptations in Metazoan Lineages., 2007, 23,pp. 533-539. DOI: https://doi.org/10.1016/j.tig.2007.08.014.

19. T. Domazet-Loso; D. Tautz An Evolutionary Analysis of Orphan Genes in Drosophila., 2003, 13,pp. 2213-2219. DOI: https://doi.org/10.1101/gr.1311003.

20. D. Tautz; T. Domazet-Lošo The Evolutionary Origin of Orphan Genes., 2011, 12,pp. 692-702. DOI: https://doi.org/10.1038/nrg3053.

21. M. Kerstholt; M.G. Netea; L.A.B. Joosten Borrelia Burgdorferi Hijacks Cellular Metabolism of Immune Cells: Consequences for Host Defense., 2020, 11,p. 101386. DOI: https://doi.org/10.1016/j.ttbdis.2020.101386. PMID: https://www.ncbi.nlm.nih.gov/pubmed/32035898.

22. P. Sirand-Pugnet; C. Citti; A. Barré; A. Blanchard Evolution of Mollicutes: Down a Bumpy Road with Twists and Turns., 2007, 158,pp. 754-766. DOI: https://doi.org/10.1016/j.resmic.2007.09.007.

23. G.Y. Fisunov; D.G. Alexeev; N.A. Bazaleev; V.G. Ladygina; M.A. Galyamina; I.G. Kondratov; N.A. Zhukova; M.V. Serebryakova; I.A. Demina; V.M. Govorun Core Proteome of the Minimal Cell: Comparative Proteomics of Three Mollicute Species., 2011, 6, e21964. DOI: https://doi.org/10.1371/journal.pone.0021964. PMID: https://www.ncbi.nlm.nih.gov/pubmed/21818284.

24. K. Makarova; A. Slesarev; Y. Wolf; A. Sorokin; B. Mirkin; E. Koonin; A. Pavlov; N. Pavlova; V. Karamychev; N. Polouchine et al. Comparative Genomics of the Lactic Acid Bacteria., 2006, 103,pp. 15611-15616. DOI: https://doi.org/10.1073/pnas.0607117103. PMID: https://www.ncbi.nlm.nih.gov/pubmed/17030793.

25. P. Pilo; J. Frey; E.M. Vilei Molecular Mechanisms of Pathogenicity of Mycoplasma Mycoides Subsp. Mycoides SC., 2007, 174,pp. 513-521. DOI: https://doi.org/10.1016/j.tvjl.2006.10.016.

26. M. Strnad; N. Rudenko; R.O.M. Rego Pathogenicity and Virulence of Borrelia burgdorferi., 2023, 14,p. 2265015. DOI: https://doi.org/10.1080/21505594.2023.2265015.

27. K.B. Waites; D.F. Talkington Mycoplasma pneumoniae and Its Role as a Human Pathogen., 2004, 17,pp. 697-728. DOI: https://doi.org/10.1128/CMR.17.4.697-728.2004.

28. N. Corak; S. Anniko; C. Daschkin-Steinborn; V. Krey; S. Koska; M. Futo; T. Široki; I. Woichansky; L. Opašić; D. Kifer et al. Pleomorphic Variants of Borreliella (Syn. Borrelia) Burgdorferi Express Evolutionary Distinct Transcriptomes., 2023, 24, 5594. DOI: https://doi.org/10.3390/ijms24065594.

29. C. Anderson; C.A. Brissette The Brilliance of Borrelia: Mechanisms of Host Immune Evasion by Lyme Disease-Causing Spirochetes., 2021, 10, 281. DOI: https://doi.org/10.3390/pathogens10030281.

30. V. Dulipati; S. Meri; J. Panelius Complement Evasion Strategies of Borrelia burgdorferi Sensu Lato., 2020, 594,pp. 2645-2656. DOI: https://doi.org/10.1002/1873-3468.13894.

31. L.C. Lambert; H.Q. Trummell; A. Singh; G.H. Cassell; R.J. Bridges Mycoplasma Pulmonis Inhibits Electrogenic Ion Transport across Murine Tracheal Epithelial Cell Monolayers., 1998, 66,pp. 272-279. DOI: https://doi.org/10.1128/IAI.66.1.272-279.1998. PMID: https://www.ncbi.nlm.nih.gov/pubmed/9423868.

32. H.-C. Flemming; S. Wuertz Bacteria and Archaea on Earth and Their Abundance in Biofilms., 2019, 17,pp. 247-260. DOI: https://doi.org/10.1038/s41579-019-0158-9. PMID: https://www.ncbi.nlm.nih.gov/pubmed/30760902.

33. C.A. Lozupone; R. Knight Global Patterns in Bacterial Diversity., 2007, 104,pp. 11436-11440. DOI: https://doi.org/10.1073/pnas.0611525104. PMID: https://www.ncbi.nlm.nih.gov/pubmed/17592124.

34. M. Futo; L. Opašić; S. Koska; N. Corak; T. Široki; V. Ravikumar; A. Thorsell; M. Lenuzzi; D. Kifer; M. Domazet-Lošo et al. Embryo-Like Features in Developing Bacillus subtilis Biofilms., 2021, 38,pp. 31-47. DOI: https://doi.org/10.1093/molbev/msaa217.

35. F.A. Simão; R.M. Waterhouse; P. Ioannidis; E.V. Kriventseva; E.M. Zdobnov BUSCO: Assessing Genome Assembly and Annotation Completeness with Single-Copy Orthologs., 2015, 31,pp. 3210-3212. DOI: https://doi.org/10.1093/bioinformatics/btv351.

36. R. Caspi; R. Billington; I.M. Keseler; A. Kothari; M. Krummenacker; P.E. Midford; W.K. Ong; S. Paley; P. Subhraveti; P.D. Karp The MetaCyc Database of Metabolic Pathways and Enzymes—A 2019 Update., 2020, 48,pp. D445-D453. DOI: https://doi.org/10.1093/nar/gkz862.

37. M. Kanehisa; M. Furumichi; Y. Sato; M. Kawashima; M. Ishiguro-Watanabe KEGG for Taxonomy-Based Analysis of Pathways and Genomes., 2023, 51,pp. D587-D592. DOI: https://doi.org/10.1093/nar/gkac963.

38. M. Kanehisa The KEGG Database., Wiley: Hoboken, NJ, USA, 2002, Volume 247,pp. 91-103. ISBN: 978-0-470-84480-9.

39. C.L. Craig; R.S. Weber Selection Costs of Amino Acid Substitutions in ColE1 and ColIa Gene Clusters Harbored by Escherichia Coli., 1998, 15,pp. 774-776. DOI: https://doi.org/10.1093/oxfordjournals.molbev.a025981.

40. H. Zhang; Y. Wang; J. Li; H. Chen; X. He; H. Zhang; H. Liang; J. Lu Biosynthetic Energy Cost for Amino Acids Decreases in Cancer Evolution., 2018, 9,p. 4124. DOI: https://doi.org/10.1038/s41467-018-06461-1.

41. H. Akashi; T. Gojobori Metabolic Efficiency and Amino Acid Composition in the Proteomes of Escherichia coli and Bacillus subtilis., 2002, 99,pp. 3695-3700. DOI: https://doi.org/10.1073/pnas.062526999.

42. S. Kok; B.U. Kozak; J.T. Pronk; A.J.A. Maris Energy Coupling in Saccharomyces Cerevisiae: Selected Opportunities for Metabolic Engineering., 2012, 12,pp. 387-397. DOI: https://doi.org/10.1111/j.1567-1364.2012.00799.x. PMID: https://www.ncbi.nlm.nih.gov/pubmed/22404754.

43. R. Schuetz; L. Kuepfer; U. Sauer Systematic Evaluation of Objective Functions for Predicting Intracellular Fluxes in Escherichia coli., 2007, 3, 119. DOI: https://doi.org/10.1038/msb4100162. PMID: https://www.ncbi.nlm.nih.gov/pubmed/17625511.

44. J. Huerta-Cepas; D. Szklarczyk; D. Heller; A. Hernández-Plaza; S.K. Forslund; H. Cook; D.R. Mende; I. Letunic; T. Rattei; L.J. Jensen et al. eggNOG 5.0: A Hierarchical, Functionally and Phylogenetically Annotated Orthology Resource Based on 5090 Organisms and 2502 Viruses., 2019, 47,pp. D309-D314. DOI: https://doi.org/10.1093/nar/gky1085. PMID: https://www.ncbi.nlm.nih.gov/pubmed/30418610.

45. B. Buchfink; C. Xie; D.H. Huson Fast and Sensitive Protein Alignment Using DIAMOND., 2015, 12,pp. 59-60. DOI: https://doi.org/10.1038/nmeth.3176.

46. S. Seabold; J. Perktold Statsmodels: Econometric and Statistical Modeling with Python.,pp. 92-96.

47. G. Yu; D.K. Smith; H. Zhu; Y. Guan; T.T. Lam Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data., 2017, 8,pp. 28-36. DOI: https://doi.org/10.1111/2041-210X.12628.

Figures

Figure 1: Completeness of AA biosynthesis pathways in bacteria. We created a database of 980 bacterial species to obtain a comprehensive overview of AA dispensability in this group. The fully resolved tree is shown in File S1. We retrieved data on enzymes involved in AA biosynthesis pathways from the KEGG and MetaCyc databases and searched for their homologs within our reference database using MMseqs2 clustering (see Section 4). For each AA, we show a completeness score (CS), which represents the percentage of enzymes within a pathway that returned significant sequence similarity matches to our reference collection of AA biosynthesis enzymes. For AAs with multiple alternative pathways, we show the results only for the most complete one.

Figure 2: Correlation between AA biosynthesis cost and the average AA auxotrophy index. We estimated the AA auxotrophy index (AI), defined as 1 minus the completeness score (CS), for 980 bacterial species and calculated the average AI value for each AA. This value was then correlated with the opportunity cost of each AA, previously calculated for three different respiratory modes: (a) high respiration, (b) low respiration, and (c) fermentation [2] (see Section 4). The Pearson correlation coefficient and p-value are displayed on the graph. Amino acids are marked using three-letter codes: Alanine (Ala), Arginine (Arg), Asparagine (Asn), Aspartic acid (Asp), Cysteine (Cys), Glutamic acid (Glu), Glutamine (Gln), Glycine (Gly), Histidine (His), Isoleucine (Ile), Leucine (Leu), Lysine (Lys), Methionine (Met), Phenylalanine (Phe), Proline (Pro), Serine (Ser), Threonine (Thr), Tryptophan (Trp), Tyrosine (Tyr), and Valine (Val).
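
The quantities named in this caption can be illustrated with a minimal Python sketch. It assumes CS is expressed as a fraction between 0 and 1 (so that AI = 1 - CS is well defined); the function names, the boolean hit lists, and the two toy species are hypothetical and are not taken from the study's data or code. The p-value shown on the graph is not computed here.

```python
from math import sqrt


def completeness_score(pathways):
    """CS for one amino acid, as a fraction in [0, 1].

    `pathways` holds one list of booleans per alternative pathway;
    True means the enzyme returned a significant homolog match.
    Only the most complete alternative pathway is reported.
    """
    return max(sum(hits) / len(hits) for hits in pathways)


def auxotrophy_index(cs):
    """AI is defined in the caption as 1 minus CS."""
    return 1.0 - cs


def mean_ai_per_aa(ai_by_species):
    """Average the AI of each amino acid across all species."""
    species = list(ai_by_species.values())
    return {aa: sum(s[aa] for s in species) / len(species) for aa in species[0]}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy example: two alternative pathways for one amino acid.
cs = completeness_score([[True, False], [True, True, True, False]])
ai = auxotrophy_index(cs)

# Toy example: AI values for two hypothetical species.
mean_ai = mean_ai_per_aa({
    "species_A": {"Ala": 0.0, "Trp": 1.0},
    "species_B": {"Ala": 0.0, "Trp": 0.5},
})
```

In practice the mean AI values for all 20 AAs would be passed to `pearson_r` together with the corresponding opportunity costs for one respiratory mode.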

Figure 3: Correlation between the average AA cost per proteome and the amount of outsourced energy for AA biosynthesis. We estimated the completeness of AA biosynthesis pathways for 980 bacterial species. For each proteome, we calculated its opportunity cost (OCproteome) by multiplying the opportunity cost of each AA by its frequency in the proteome and summing the resulting values (see Section 4). For each species, we also estimated the energy saved by outsourcing AA biosynthesis (OCsavings). This was calculated by multiplying the auxotrophy index (AI) by the opportunity cost for each AA and summing the values across all 20 AAs (see Section 4). Opportunity costs for each AA were previously estimated for three different respiratory modes: (a) high respiration, (b) low respiration, and (c) fermentation [2]. The Pearson correlation coefficient (r) and p-value are displayed on the graph.
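
As a rough illustration of the two sums described in this caption, the sketch below computes OCproteome and OCsavings for a toy case with three amino acids. The function names are hypothetical, and the cost values are placeholders chosen for easy arithmetic; they are not the opportunity costs estimated in [2].

```python
def proteome_cost(aa_frequency, opportunity_cost):
    """OCproteome: sum over AAs of (opportunity cost x frequency in the proteome)."""
    return sum(opportunity_cost[aa] * freq for aa, freq in aa_frequency.items())


def outsourced_savings(auxotrophy_index, opportunity_cost):
    """OCsavings: sum over AAs of (auxotrophy index x opportunity cost)."""
    return sum(ai * opportunity_cost[aa] for aa, ai in auxotrophy_index.items())


# Placeholder costs (hypothetical values, arbitrary units).
cost = {"Ala": 10.0, "Gly": 12.0, "Trp": 70.0}

# Relative AA frequencies in a toy proteome (they sum to 1).
frequency = {"Ala": 0.5, "Gly": 0.4, "Trp": 0.1}

# AI per amino acid: 0 = pathway complete, 1 = pathway fully missing.
ai = {"Ala": 0.0, "Gly": 0.5, "Trp": 1.0}

oc_proteome = proteome_cost(frequency, cost)   # 5.0 + 4.8 + 7.0
oc_savings = outsourced_savings(ai, cost)      # 0.0 + 6.0 + 70.0
```

With real data, each of the 980 species would contribute one (OCsavings, OCproteome) pair, and the pairs would then be correlated separately for each respiratory mode.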

Figure 4: COG enrichment analysis of the most expensive protein clusters in highly auxotrophic bacterial groups. We analyzed the proteomes of (a) Borreliaceae (50 species), (b) Mollicutes (31 species), and (c) Lactobacillales (74 species) from our full database of 980 bacterial species. Each dataset was clustered separately using the MMseqs2 algorithm to identify clusters of homologous proteins. COG functions were assigned to each cluster using EggNOG-mapper (see Section 4). Clusters without functional annotations were labeled as NA (i.e., no annotation). We performed an overrepresentation analysis for the top k% of clusters ranked by OCcluster (Equation (5)), with k = 10, 20, 30, 40, and 50, using a one-tailed hypergeometric test. The resulting p-values were corrected for multiple testing using the Benjamini–Hochberg method. Only enrichment signals with p-values < 0.05 are shown (Table S4).
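
The statistical procedure named in this caption (a one-tailed hypergeometric overrepresentation test followed by Benjamini–Hochberg correction) can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' pipeline: the function names and the toy counts are hypothetical, and the software actually used for these steps is not assumed here.

```python
from math import comb


def hypergeom_upper_tail(k, total, annotated, drawn):
    """One-tailed P(X >= k) for overrepresentation.

    total     : all clusters in the dataset
    annotated : clusters carrying the COG category of interest
    drawn     : clusters in the top k% ranked by cost
    k         : clusters in the top set carrying the category
    """
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(total, drawn)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 10 clusters, 3 in the category, top set of 4, 2 of them in the category.
p = hypergeom_upper_tail(k=2, total=10, annotated=3, drawn=4)

# Toy example: correcting four raw p-values.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In an analysis like the one described, the test would be repeated for every COG category and for each threshold k, and only categories whose adjusted p-value falls below 0.05 would be reported.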

Author Affiliation(s):

[1] Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička Cesta 54, HR-10000 Zagreb, Croatia; niko.kasalo@irb.hr

[2] School of Medicine, Catholic University of Croatia, Ilica 242, HR-10000 Zagreb, Croatia

[3] Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia

Author Note(s):

[*] Correspondence: tdomazet@irb.hr (T.D.-L.); mirjana.domazet@fer.hr (M.D.-L.)

DOI: 10.3390/ijms26052285

COPYRIGHT 2025 MDPI AG
No portion of this article can be reproduced without the express written permission from the copyright holder.


