EcoCyc is a bioinformatics database that describes the genome and the biochemical machinery of the bacterium Escherichia coli K-12. The database is built through literature-based curation by the EcoCyc project.

New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc (Karp, "The EcoCyc database"). New SmartTable tools allow users to browse collections of related EcoCyc content, and SmartTables can also serve as repositories for user- or curator-generated lists.

EcoCyc now supports running and modifying E. coli metabolic models.

Over the course of many decades, thousands of researchers have contributed to the experimental study of Escherichia coli. Considering its moderate-size genome that contains roughly genes, it is easy to assume that we already know most of what there is to know about E. coli.

In fact, however, many facets of E. One of the most significant uses of a model organism database such as EcoCyc is to collect and disseminate both longstanding knowledge and recent research advances in an easily accessible format.

EcoCyc continues to serve in this role for the E. Since our last publication on EcoCyc in the NAR Database Issue (2), many improvements to EcoCyc, the Pathway Tools software, the BioCyc family of websites and the display and analysis tools available there have been described elsewhere (3–6). Here, we focus on recent updates to the EcoCyc web site and on additions to the database that reflect new knowledge about E. coli.

Manual curation within EcoCyc continues to adopt a two-pronged approach. Priority is given to adding new functions for gene products when they are reported in the literature.

Effort also goes into updating older entries in the database; typically this is done by reviewing all the proteins belonging to a specific family, metabolic pathway or regulon. Occasionally, large datasets containing, for example, gene essentiality or protein localization information are added computationally.

An overview of the current data content of EcoCyc is shown in Table 1. As of the May release, the GenBank sequence record had been updated from version U Importantly for next-generation sequencing studies, U Sequence differences between common E. Most significantly, the final sequenced strain differs from other MG strains by carrying mutations that inactivate the transcriptional regulators encoded by crl and glpR and the galactitol transporter component encoded by gatC.

Due to an IS1 element insertion and other indels, the nucleotide coordinates of genes and other features differ between the two versions of the GenBank record. To ease the transition for researchers with datasets that use the prior genome coordinates, the EcoCyc web site provides a coordinate mapping service that translates data files containing the old genome coordinates to the new coordinates.
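The internals of the EcoCyc mapping service are not described here. As a rough sketch of the underlying idea only, assume the differences between the two assemblies can be summarized as a list of indels given in old coordinates; each old coordinate then shifts by the net length change of all edits upstream of it, and bases inside a deletion have no new coordinate. The edit list, `make_mapper`, and the coordinate values below are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import Callable, Optional

# Hypothetical edit list in OLD 1-based coordinates: (position, length_change).
# length_change > 0: that many bases inserted immediately after `position`.
# length_change < 0: |length_change| bases deleted, starting at position + 1.
Edit = tuple[int, int]


def make_mapper(edits: list[Edit]) -> Callable[[int], Optional[int]]:
    """Build an old-to-new coordinate translator from a list of indels."""
    edits = sorted(edits)
    positions = [pos for pos, _ in edits]
    # cumulative[i] = net length change of edits[0..i]
    cumulative = list(accumulate(delta for _, delta in edits))
    # Deleted segments as half-open (start, end] ranges in old coordinates.
    deleted = [(pos, pos - delta) for pos, delta in edits if delta < 0]

    def map_coordinate(old: int) -> Optional[int]:
        if any(start < old <= end for start, end in deleted):
            return None  # the base no longer exists in the new assembly
        # Only edits located strictly upstream of `old` shift it.
        i = bisect_left(positions, old)
        return old + (cumulative[i - 1] if i else 0)

    return map_coordinate


# Usage with invented edits: an 800-bp insertion after base 100
# and a 10-bp deletion of bases 201-210.
to_new = make_mapper([(100, 800), (200, -10)])
print(to_new(50), to_new(101), to_new(205), to_new(211))  # 50 901 None 1001
```

Precomputing cumulative offsets and using binary search keeps each lookup logarithmic in the number of edits, which matters when a data file contains many coordinates.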

However, researchers should keep in mind that the genome of any given laboratory stock of MG will also differ from the published genome sequence, and that some differences will be physiologically significant.

Since our last publication in the NAR Database Issue (2), 14 new transport proteins or complexes have been characterized and curated accordingly (Table 2).

We have also reviewed and updated the curation of 48 proteins, both membrane and cytosolic, which belong to the functional superfamily of the phosphoenolpyruvate (PEP)-dependent, sugar-transporting phosphotransferase systems (PTS sugar).


This work extends our representation of the range of substrates, both physiological and non-physiological, that E. Our coverage of the literature was improved through the addition of a further citations.

We have reviewed and updated transporter classes within the Pathway Tools ontology. All transport proteins in EcoCyc are classified within this ontology, making it straightforward for a user to accurately determine the number and identity of proteins within a particular class. We have completed a long-term project to update the curation of electron transport (ET) pathways and respiratory enzymes in EcoCyc.
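Counting the proteins in an ontology class means collecting proteins assigned to that class or to any of its subclasses. The sketch below shows that traversal on a tiny, invented hierarchy; the class names and their nesting are simplified for illustration and are not the actual Pathway Tools transporter ontology, although each example protein does belong to the family named.

```python
from collections import deque

# Invented, simplified class hierarchy (parent -> child classes).
SUBCLASSES = {
    "Transporters": ["Primary active transporters", "Secondary transporters",
                     "Group translocators"],
    "Primary active transporters": ["ABC transporters"],
    "Secondary transporters": ["MFS transporters"],
    "Group translocators": ["PTS transporters"],
}

# Direct class assignments for a few E. coli transport proteins.
MEMBERS = {
    "ABC transporters": {"MalK"},
    "MFS transporters": {"LacY"},
    "PTS transporters": {"PtsG"},
}


def proteins_in_class(root: str) -> set[str]:
    """Collect proteins assigned to `root` or to any descendant class."""
    found: set[str] = set()
    queue = deque([root])
    seen = {root}
    while queue:
        cls = queue.popleft()
        found |= MEMBERS.get(cls, set())
        for child in SUBCLASSES.get(cls, []):
            # An ontology class may have several parents; visit it only once.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return found


print(len(proteins_in_class("Transporters")))  # 3
```

Tracking visited classes matters because an ontology is usually a directed acyclic graph rather than a strict tree.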

An initiative to represent ET pathways in EcoCyc was first described previously (10), along with the subsequent addition of 11 ET pathways. We have now added a further 15 pathways (Table 3). All pathways contain a fully referenced text summary, including, when known, information on energetics, isoenzyme involvement and the identity of the membrane quinone(s).


The curation of all 23 respiratory enzymes involved in ET pathways has also been updated. Particular attention has been given to ensuring that the correct cofactors of each respiratory enzyme are identified when known. In addition, a recently described cofactor, the [4Fe-3S] iron-sulfur cluster of hydrogenase I (11), was added to the database as part of this project. Supplementary Table S1 summarizes the respiratory enzymes and their associated cofactors as currently represented in EcoCyc.

Just over references were added to the database, the majority of these dating from the last quarter of the 20th century, a period of intense research activity in E. coli.

A total of 14 new transcription factors (TFs) regulating a variety of biological processes have been identified in the experimental literature and added to the database since fall. The functions of these TFs are summarized in Table 4.

In addition to the new TFs, the numbers of other database objects, such as transcription units, regulatory interactions, transcription factor binding sites (TFBSs), promoters and terminators, have increased (Table 5). Table 6 summarizes updates to existing TFs within EcoCyc. For several TFs, active or inactive protein conformations have been identified.

For example, it was shown that only the homotetrameric conformation of the quorum-sensing regulator LsrR is active, while the autoinducer-bound conformation LsrR-AI-2 is inactive. The newly discovered iron-sulfur cluster-bound conformation of IscR, IscR-[2Fe-2S], was shown to regulate the expression of genes involved in the iron-sulfur cluster assembly pathway through negative feedback that depends on the cellular Fe-S cluster demand. Coordinated regulation of these two pathways maintains differential control of Fe-S cluster biogenesis and ensures viability under a variety of growth conditions.

Genomic SELEX screening usually results in the discovery of many target sites for transcriptional regulators. Surprisingly, only one target was found for each of the two newly discovered regulators, CecR and DecR. These regulators were found to play novel roles in controlling sensitivity to cefoperazone and chloramphenicol (CecR) (15) and cysteine detoxification systems (DecR).

In EcoCyc, evidence codes are attached to many types of data and generally include a supporting literature citation.

Filling gaps in our representation of transcriptional regulation, we have added references to the published experimental evidence for a set of promoter objects that lacked them. To encourage further research on E. Static versions of these tables are available as Supplementary Tables S2–S4.

Although the basic metabolic capabilities of E. A recent example is E. Sulfoquinovose is a major component of organosulfur compounds in nature. It is synthesized by higher plants, mosses, ferns, algae and most photosynthetic bacteria, and serves as the polar headgroup of the sulfolipid in photosynthetic membranes (19). Sulfoquinovose is structurally similar to glucose, and its degradation follows a pathway that is highly similar to glycolysis (Figure 2).

The sulfur-containing three-carbon degradation product sulfopropanediol is excreted and can be utilized as both a carbon and a sulfur source by other organisms. However, open questions remain. Although proteins with suggestive predicted functions or mutant phenotypes are encoded in the genomic vicinity of the sulfoquinovose-degrading enzymes, neither the importer for sulfoquinovose nor the exporter for sulfopropanediol has been firmly established, and no regulatory mechanisms are yet known.

New discoveries in E. This system, composed of a periplasmic methionine sulfoxide reductase (MsrP) and an inner-membrane, heme-binding quinol dehydrogenase (MsrQ), functions to protect periplasmic proteins from oxidative damage and is conserved throughout Gram-negative bacteria. The use of membrane quinols as a source of reducing power in the cell envelope is a novel finding and represents a notable advance in our understanding of how bacteria repair damaged proteins.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12

We have added a new zoom level to the BioCyc genome browser. It is now possible to zoom to the sequence level, which shows details such as transcription start sites, transcription factor binding sites, other regulatory sites such as attenuators and interaction sites for small regulatory RNAs, as well as genes and proteins.

This zoom level thus enables inspection of the relative locations of sites within the sequence. Figure 3 shows the new zoom level alongside two previously available zoom levels.
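At its core, a sequence-level view answers one query: which annotated features overlap the displayed window, and in what order do they occur along the sequence. The sketch below shows that interval-overlap query; the `Feature` record, the 1-based inclusive coordinate convention, and all coordinates are hypothetical, and the browser's actual implementation is not described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    kind: str   # e.g. "transcription start site", "TF binding site", "attenuator"
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def features_in_window(features: list[Feature], start: int, end: int) -> list[Feature]:
    """Return features overlapping [start, end], ordered by position."""
    hits = [f for f in features if f.start <= end and f.end >= start]
    return sorted(hits, key=lambda f: (f.start, f.end))


# Invented features near a hypothetical promoter.
FEATURES = [
    Feature("transcription start site", "tss-1", 1000, 1000),
    Feature("TF binding site", "tfbs-1", 960, 975),
    Feature("attenuator", "att-1", 1040, 1070),
    Feature("transcription start site", "tss-2", 5000, 5000),
]

for f in features_in_window(FEATURES, 950, 1050):
    print(f.start, f.kind, f.name)
```

A feature only partly inside the window (here the attenuator) is still returned, so it can be drawn clipped at the window edge.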


The SmartTables tools for manipulating sets of genes, chemical compounds and other objects within EcoCyc and other BioCyc databases have been expanded in several respects.

Because the metabolic model associated with EcoCyc is derived directly from EcoCyc using the MetaFlux module of Pathway Tools, its data content has continued to evolve as we update the set of metabolic reactions and transporters within EcoCyc.

Previously, it was only possible to execute the EcoCyc metabolic model using the downloadable Pathway Tools software. To make it more accessible to users, the model can now be executed directly on the EcoCyc web site. Web-based metabolic models are also available for two other gut microbiome organisms in the BioCyc database collection, Bacteroides thetaiotaomicron and Eubacterium rectale.

In basic terms, a metabolic model consists of a set of active reactions plus the conditions of growth of the organism; the models stored within the EcoCyc website contain both. The active reactions correspond to those reactions that are active at a given time based on cellular regulation, and can be either the full set or a subset of the reactions stored within EcoCyc.

For each modeled growth situation, the conditions of growth consist of the nutrients available to the growing E. The process of running a metabolic model through the EcoCyc website consists of the following steps. First, choose EcoCyc as your current organism by clicking Select Organism Database in the upper-right corner of the screen.

At this point you can either select an existing model to run, or create a new model if none of the existing models covers exactly the situation you wish to model. Usually, it is easiest to create a new model by copying and editing an existing model. You can select an existing model to run either from the list of models that other people have made public, or from a list of models that you have saved in the past.
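A model, as described above, bundles a set of active reactions with the growth conditions, and a new model is usually made by copying an existing one and editing it. The sketch below captures that shape with immutable records so that editing a copy can never alter the original public model. The class names, fields, and all reaction and compound identifiers are hypothetical placeholders, not the EcoCyc or MetaFlux data format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GrowthConditions:
    nutrients: frozenset[str]
    secretions: frozenset[str]
    biomass_metabolites: frozenset[str]


@dataclass(frozen=True)
class MetabolicModel:
    name: str
    active_reactions: frozenset[str]  # all EcoCyc reactions, or a regulated subset
    conditions: GrowthConditions
    public: bool = False


def copy_and_edit(model: MetabolicModel, new_name: str,
                  add_nutrients: tuple[str, ...] = (),
                  remove_nutrients: tuple[str, ...] = ()) -> MetabolicModel:
    """Return an edited private copy; the original model is left untouched."""
    nutrients = (model.conditions.nutrients - set(remove_nutrients)) | set(add_nutrients)
    conditions = replace(model.conditions, nutrients=frozenset(nutrients))
    return replace(model, name=new_name, conditions=conditions, public=False)


# Placeholder identifiers, not real EcoCyc reaction or compound IDs.
glucose_model = MetabolicModel(
    name="Glucose Fermentation",
    active_reactions=frozenset({"RXN-1", "RXN-2", "RXN-3"}),
    conditions=GrowthConditions(
        nutrients=frozenset({"glucose", "ammonium", "phosphate", "sulfate"}),
        secretions=frozenset({"acetate", "ethanol", "formate"}),
        biomass_metabolites=frozenset({"protein", "DNA", "RNA"}),
    ),
    public=True,
)
galactose_model = copy_and_edit(glucose_model, "Galactose Fermentation",
                                add_nutrients=("galactose",),
                                remove_nutrients=("glucose",))
print(sorted(galactose_model.conditions.nutrients))
```

Swapping glucose for galactose in this way mirrors the galactose-fermentation scenario discussed in the text, while leaving the reaction set of the copied model unchanged.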

For example, choose the public Glucose Fermentation model, which ferments glucose anaerobically. Once you select a model, you can inspect the nutrients, reactions, biomass metabolites and secretions that it contains by selecting the tab of the corresponding name toward the bottom of the model-summary page (see Figure 4).

To run the model, click Execute within the Results tab. The result of running a model is a table of steady-state flux values for those metabolic reactions that carry non-zero flux. For example, Figure 4 shows that the two highest fluxes in the entire metabolic network during glucose fermentation pass through two reactions of glycolysis. These fluxes can be painted onto the EcoCyc metabolic map diagram by clicking the Show Fluxes on Cellular Overview button.
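Turning a raw flux solution into a table like the one described above amounts to dropping reactions that carry no flux and ranking the rest by magnitude. The sketch below assumes the solution is available as a mapping from reaction name to flux; the reaction names and values are invented, and the tolerance is an assumption meant to absorb round-off values that numerical solvers may report in place of exact zeros.

```python
def nonzero_fluxes(fluxes: dict[str, float], tol: float = 1e-9) -> list[tuple[str, float]]:
    """Return (reaction, flux) pairs with |flux| > tol, largest magnitude first.

    In the usual flux-balance convention a negative value means the reaction
    runs in its reverse direction, so ranking uses the absolute value.
    """
    carried = [(rxn, v) for rxn, v in fluxes.items() if abs(v) > tol]
    return sorted(carried, key=lambda pair: abs(pair[1]), reverse=True)


# Invented solver output for illustration only.
solution = {"RXN-A": 20.0, "RXN-B": 0.0, "RXN-C": -20.5, "RXN-D": 1e-12, "RXN-E": 3.2}
for rxn, v in nonzero_fluxes(solution)[:2]:
    print(rxn, v)
```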

Additional information, such as the uptake flux of each nutrient, is available from the solution file and the log file, which can be accessed via buttons in the Reactions tab.

Imagine you want to run a model that ferments galactose instead of glucose. The Nutrients tab allows you to place upper and lower bounds on the uptake rates of different nutrients. Because models typically attempt to optimize the cellular growth rate, an upper bound must be provided for at least one nutrient; otherwise the model would attempt to produce infinite growth, which would stymie the mathematical solver software.
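Why an uptake bound is needed can be seen in a deliberately tiny case. Assume an unbranched pathway with 1:1 stoichiometry (uptake, two internal steps, biomass). At steady state every reaction in such a chain carries the same flux, so maximizing growth simply pushes that flux up to the tightest upper bound, and with no finite bound there is no maximum at all. This toy is not how MetaFlux solves EcoCyc models, which are linear programs over a large branched network; `max_flux_linear_chain`, the reaction names, and the units are hypothetical. Setting a reaction's bounds to zero, one way to mimic removing it, also shows the effect of a simulated knock-out.

```python
import math
from typing import Optional

Bounds = dict[str, tuple[float, float]]  # reaction -> (lower, upper)


def max_flux_linear_chain(bounds: Bounds) -> Optional[float]:
    """Maximize the common flux v through an unbranched chain of reactions.

    With 1:1 stoichiometry, steady state forces every reaction in the chain to
    carry the same flux v, so v must satisfy max(lower) <= v <= min(upper).
    Returns None if no v fits, and math.inf if nothing caps the flux.
    """
    lowest_allowed = max(lo for lo, _ in bounds.values())
    highest_allowed = min(hi for _, hi in bounds.values())
    if lowest_allowed > highest_allowed:
        return None  # the bounds contradict each other: infeasible
    return highest_allowed  # maximization pushes v to the tightest upper bound


# Toy pathway: nutrient uptake -> step 1 -> step 2 -> biomass (arbitrary units).
chain: Bounds = {
    "galactose uptake": (0.0, 10.0),
    "step 1": (0.0, math.inf),
    "step 2": (0.0, math.inf),
    "biomass": (0.0, math.inf),
}
print(max_flux_linear_chain(chain))                                          # 10.0
print(max_flux_linear_chain({**chain, "galactose uptake": (0.0, math.inf)}))  # inf
print(max_flux_linear_chain({**chain, "step 1": (0.0, 0.0)}))                # 0.0
```

In a real solver the unbounded case is reported as an unbounded problem rather than returned as infinity, which is the failure the text warns about.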

Gene knock-outs can be simulated by specifying, within the Reactions tab, the reactions to remove from the model.

EcoCyc is freely and openly available to all. New versions of the downloadable EcoCyc data files and of the EcoCyc website are released three times per year.

Access to the website is free; users are required to register for a free account after viewing more than 30 pages in a given month. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funding for open access charge: Oxford University Press is a department of the University of Oxford.