Introduction to Retrosynthesis
This protocol addresses the case where a desirable product is known but no natural pathway leading to it is known. In such a case, retrosynthesis can be used to discover possible pathways. A retrosynthesis algorithm generates networks linking the target compounds that one desires to bioproduce (the source) to the metabolites of the chassis strain (the sink) by applying chemical rules.
These networks are then processed to extract biologically relevant information.
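As an illustration of the network generation described above, the following minimal sketch expands a target backward, breadth first, until sink compounds are reached. It is not the RetroPath2.0 implementation: the function `expand_network`, the `max_steps` parameter, and the toy rule table `TOY_RULES` with its compound labels are all hypothetical, and real reaction rules operate on chemical structures rather than on names.

```python
from collections import deque


def expand_network(sources, sink, rules, max_steps=3):
    """Expand source (target) compounds backward until sink compounds are reached.

    rules: callable mapping a compound to a list of precursor lists,
           one list per reverse reaction (hypothetical interface).
    Returns the network as a list of (product, precursors) edges.
    """
    edges = []
    seen = set(sources)
    queue = deque((compound, 0) for compound in sources)
    while queue:
        compound, depth = queue.popleft()
        # Stop at sink compounds and at the maximum number of steps.
        if compound in sink or depth >= max_steps:
            continue
        for precursors in rules(compound):
            edges.append((compound, tuple(precursors)))
            for precursor in precursors:
                if precursor not in seen:
                    seen.add(precursor)
                    queue.append((precursor, depth + 1))
    return edges


# Toy rule table with made-up compound labels (assumption, for illustration only).
TOY_RULES = {
    "target": [["int1"], ["int2", "cofactor"]],
    "int1": [["sink_a"]],
    "int2": [["sink_b"]],
}
SINK = {"sink_a", "sink_b", "cofactor"}

edges = expand_network(["target"], SINK, lambda c: TOY_RULES.get(c, []))
for product, precursors in edges:
    print(product, "<-", " + ".join(precursors))
```

Pathways can then be enumerated from such a list of edges by following precursors until every branch ends in the sink.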
For example, pathways can be enumerated and ranked based on several criteria, including enzyme availability and performance, product and intermediate compound toxicities, or the theoretical yield of the desired compound [187, 188, 212, 190, 213]. Users of retrosynthesis-based solutions often face a challenge common to most of these tools: the algorithms and the underlying data are often not fully documented or released. In most cases, authors provide fine-tuned web servers [187, 113, 188, 189, 214] filled with pregenerated data that focus on a few exemplar cases. In contrast, RetroPath2.0 is an open-source workflow based on the KNIME analytics platform; it uses community nodes and is therefore fully modular and can easily be tuned to a user’s needs. We first describe a protocol based on the RetroPath2.0 workflow and the steps necessary to generate and use retrosynthesis for bioproduction. The tool is available at myExperiment.org along with a set of reaction rules and some classic metabolic engineering examples to test RetroPath2.0 features.
Protocol Description (RetroPath2.0)
Sink and Source Definition
The first step in generating a retrosynthesis map is to encode all compounds of interest in a format that the retrosynthesis algorithm can process. Source compounds are the compounds from which the workflow starts iterating (the compounds one desires to produce), and sink compounds are the compounds at which the algorithm stops (either the metabolome of the chassis organism or compounds easily supplemented in the media).
1. Gather compounds from a whole-cell metabolic model of the chassis organism of interest. The model must include structural data for its compounds.
2. Filter out compounds with incomplete structures. Such entries often stand for a class of compounds and cannot be processed further.
3. Select the sink compounds: either the whole set of compounds from the chassis organism or a subset selected based on expert knowledge.
One can, for example, remove compounds belonging to blocked pathways by performing a flux-balance analysis. Sink compounds can also include molecules easily supplemented in the media.
4. Choose source compounds that one desires to bioproduce and collect associated structural information in order for the algorithm to process them.
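The four steps above can be sketched as follows. This is only an illustrative sketch, not part of the RetroPath2.0 workflow: the helper names, the compound identifiers, and the heuristic of treating a missing SMILES or a wildcard atom (`*`) as an incomplete structure are assumptions, and the set of blocked compounds is supplied by hand here rather than computed by a flux-balance analysis.

```python
def has_complete_structure(smiles):
    """Heuristic (assumption): reject missing SMILES and wildcard atoms ('*')."""
    return bool(smiles) and "*" not in smiles


def build_sink(model_compounds, blocked=frozenset(), supplements=None):
    """Build the sink from a dict mapping compound identifier to SMILES.

    blocked: identifiers to exclude, e.g. compounds of blocked pathways.
    supplements: extra compounds easily added to the media (id -> SMILES).
    """
    sink = {
        cid: smiles
        for cid, smiles in model_compounds.items()
        if has_complete_structure(smiles) and cid not in blocked
    }
    sink.update(supplements or {})
    return sink


# Hypothetical compounds; identifiers are made up for this example.
model_compounds = {
    "pyruvate": "CC(=O)C(=O)O",
    "generic_acyl": "*C(=O)O",   # class of compounds: incomplete structure
    "no_structure": None,        # structural data missing
    "ethanol": "CCO",
}

sink = build_sink(
    model_compounds,
    blocked={"ethanol"},                  # assumed to lie on a blocked pathway
    supplements={"glycerol": "OCC(O)CO"},
)
print(sorted(sink))
```

In practice a cheminformatics toolkit would be used to validate structures; the string test above only keeps the example free of dependencies.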
Reaction Rules
The second step is to encode the (bio)chemical reactions that will be used to perform the retrosynthesis. RetroPath2.0 encodes reactions as reaction SMARTS, a SMIRKS-like reaction rule format defined by RDKit.
Use case: 1,4-Butanediol Pathways Prediction Using RetroPath2.0
1,4-Butanediol is an important commodity chemical used as a starting point for the synthesis of other chemicals and of polymers such as polybutylene terephthalate, a unique engineering plastic. While most 1,4-butanediol is still produced by chemical synthesis from petroleum-based feedstock, a bioprocess alternative has been reported.
Here we showcase the use of RetroPath2.0 to predict pathways enabling the bioproduction of 1,4-butanediol in E. coli.
Isomer enumeration is a long-standing problem that is still under scrutiny [227, 228]. Our intent here is not to provide the fastest enumeration algorithm but to demonstrate how RetroPath2.0 can perform that job once appropriate reaction rules are provided. Nevertheless, Figure 2.1 compares RetroPath2.0’s execution time with that of the OMG and PMG software tools [228, 229], which are specifically dedicated to isomer enumeration. RetroPath2.0 is faster than OMG but slower than PMG. We then outline two approaches that make use of RetroPath2.0. The first is based on the classical canonical augmentation algorithm, and the second consists of iteratively transforming a given molecule until all its isomers are produced. We name this latter approach isomer transformation. In both cases we limit ourselves to structural (constitutional) isomers, as workflows to enumerate stereoisomers already exist.
Virtual screening in the chemical space
In this section we used RetroPath2.0 to search for all molecules within a predefined distance of a given set of molecules. Such queries are routinely carried out in large chemical databases for drug discovery purposes, but in the present case we search for similar structures in the entire chemical space. To perform the search, we used a source set composed of 158 well-known monomers with a molecular weight of up to 200 Da. Our rule set included the transformations colored green in Figure 2.4 (i.e. transformation rules where double bonds are not transformed into cycles and conversely). For each monomer, RetroPath2.0 was iterated until no new isomers were generated. Each generated structure with a Tanimoto similarity greater than 0.5 to its corresponding monomer was retained (Tanimoto similarity was computed using MACCS keys fingerprints). Next, we wanted to probe whether the generated structures exhibited interesting properties as far as polymers are concerned. To that end, we first developed a Quantitative Structure Property Relationship (QSPR) model trained on previously reported polymer properties. We focused on polymer glass transition temperature (Tg) data. The QSPR model was based on a random forest trained on RDKit fingerprint descriptors. The resulting model had a leave-one-out cross-validation performance of Q² = 0.75. The model was then applied to predict the Tg of the enumerated isomers.
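The similarity filter and the cross-validated Q² mentioned above can be sketched in a few lines. This is a minimal sketch under stated assumptions: fingerprints are represented as plain sets of on-bit indices (the toy sets below are not real MACCS keys), the function names are hypothetical, and Q² is taken in its common form 1 - PRESS / SS_tot, where the predictions are those obtained for each sample when it is left out of training.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def filter_by_similarity(reference, candidates, threshold=0.5):
    """Keep candidates whose similarity to the reference exceeds the threshold."""
    return {
        name: fp
        for name, fp in candidates.items()
        if tanimoto(reference, fp) > threshold
    }


def q2(observed, held_out_predictions):
    """Cross-validated Q2 = 1 - PRESS / SS_tot (assumed definition)."""
    mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, held_out_predictions))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - press / ss_tot


# Toy on-bit sets standing in for fingerprints (assumption, not real data).
monomer = {1, 2, 3, 4}
isomers = {"close": {1, 2, 3, 5}, "far": {1, 7, 8, 9}}

kept = filter_by_similarity(monomer, isomers, threshold=0.5)
print(sorted(kept))
print(q2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

With a cheminformatics toolkit, the on-bit sets would be replaced by actual fingerprints; the threshold logic stays the same.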
Tg values for the enumerated isomers appeared evenly distributed around 301.86 ± 25.69 K, compared with 331.66 ± 46.19 K for the isomers available in PubChem. This shift in Tg values could be explained by the difference in distribution that necessarily exists between the isomers present in PubChem and the full set of enumerated isomers.
As we lower the Tanimoto threshold, some monomers might become underrepresented in terms of isomer availability in PubChem. Figure 2.6 shows the distributions of both sets of isomers as a function of the threshold. The increased ability to select polymers with Tg above or below room temperature in the enumerated set, compared with the PubChem isomers, is a desirable feature, as this parameter determines the mechanical properties of the polymer. In that way, a virtual screening of the chemical space of isomers of the reference monomers opens the way to engineering applications with improved polymer design.
Table of contents:
Detailed summary in French
0.1 Computer-aided design tools
0.2 Analysis and modeling of metabolic circuits: from data to knowledge
Synthetic biology’s aims and advances
Design tools for metabolic circuits
Analysis and modeling tools for metabolic circuits
Thesis structure and contributions
I Computational design tools
1 Enzyme Selection and Pathway Design
1.3 Enzyme selection
1.4 Pathway Design
1.5 Summary and conclusion
2 Molecular structure enumeration
2.2 Results and Discussion
2.5 Supplementary Data
3 RetroPath3.0: Similarity-guided Monte Carlo Tree Search for metabolic engineering
3.3 Theoretical background
3.4 Results and Discussion
3.6 Materials and methods
3.7 Supplementary Tables
3.8 Supplementary Note 1: Parameters’ Role and Effects
4 Detectable Compounds Dataset
4.3 Experimental design, materials and methods
5 Active learning for cell-free optimization
5.2 Results and Discussion
5.4 Supplementary Data
II Analyzing and modeling metabolic circuits
6 Transcriptional Biosensors for Metabolic Engineering
6.3 Designing a transcriptional biosensor to detect a compound of interest
6.4 Computer-assisted fine-tuning of biosensor properties
6.5 Custom-made biosensors’ new application domain: cell-free metabolic engineering
7 Building a minimal and generalisable model of transcription factor-based biosensors: Showcasing flavonoids
7.3 Materials and methods
7.6 Supplementary Data
8 Models for Cell-free Synthetic Biology
8.3 Translation and transcription processes in cell-free
8.4 Resource competition in cell-free
8.5 Metabolism in cell-free
9 Plug-and-Play Metabolic Transducers
9.6 Mathematical Modeling of Cell-Free Biosensors
9.7 Mathematical model derivation
10 Metabolic Perceptrons for Neural Computing in Biological Systems
10.6 Supplementary data
Conclusion & perspectives
Design tools for metabolic circuits
Analysis and modeling tools for metabolic circuits
List of Symbols
List of Tables
List of Figures