Modeling the architecture of the regulatory system controlling methylenomycin production in Streptomyces coelicolor

Background The antibiotic methylenomycin A is produced naturally by Streptomyces coelicolor A3(2), a model organism for streptomycetes. This compound is of particular interest to synthetic biologists because all of the associated biosynthetic, regulatory and resistance genes are located on a single cluster on the SCP1 plasmid, making the entire module easily transferable between different bacterial strains. Understanding further the regulation and biosynthesis of the methylenomycin producing gene cluster could assist in the identification of motifs that can be exploited in synthetic regulatory systems for the rational engineering of novel natural products and antibiotics. Results We identify and validate a plausible architecture for the regulatory system controlling methylenomycin production in S. coelicolor using mathematical modeling approaches. Model selection via an approximate Bayesian computation (ABC) approach identifies three candidate model architectures that are most likely to produce the available experimental data, from a set of 48 possible candidates. Subsequent global optimization of the parameters of these model architectures identifies a single model that most accurately reproduces the dynamical response of the system, as captured by time series data on methylenomycin production. Further analyses of variants of this model architecture that capture the effects of gene knockouts also reproduce qualitative experimental results observed in mutant S. coelicolor strains. Conclusions The mechanistic mathematical model developed in this study recapitulates current biological knowledge of the regulation and biosynthesis of the methylenomycin producing gene cluster, and can be used in future studies to make testable predictions and formulate experiments to further improve our understanding of this complex regulatory system.


Background
There is currently an increasing demand for research and development of new antibiotics as their overuse, along with many other factors, has led to increased resistance. Streptomycetes produce approximately 70% of all commercial antibiotics currently available [1]. The bacterium Streptomyces coelicolor A3 (2) has emerged as the model organism for studying streptomycetes, initially thanks to the production of colored metabolites that *Correspondence: J.Bowyer@warwick.ac.uk 1 Warwick Integrative Synthetic Biology Centre, School of Engineering, University of Warwick, Coventry CV4 7AL, UK Full list of author information is available at the end of the article facilitated genetic studies, and more recently thanks to the sequencing of its entire genome [2]. These bacteria have a 8,667,507 base pair single linear chromosome containing protein coding genes of which over 12% are thought to be regulatory [2]. These predicted transcriptional regulators are thought to mediate antibiotic synthesis through the production of microbial hormones, as well as influence structural and metabolic cellular responses [3]. The linear SCP1 plasmid (∼ 356 kb) and the circular SCP2 plasmid (∼ 31 kb) are both present within the S. coelicolor genome and have also both been sequenced [4]. This genome sequencing has revealed many cryptic and 'silent' gene clusters: sets of genes predicted to produce a natural product, but whose product has not been observed. Silent gene clusters have been awakened through genetic manipulation of regulatory elements [5,6]. Thus, characterization of the regulatory system that mediates the production of specialized metabolites is key to discovering new natural products. Developing improved understanding of the regulatory architectures that underlie natural product biosynthesis can also accelerate the design of novel regulatory systems in synthetic biology.
The antibiotic methylenomycin A is a natural product of S. coelicolor A3(2) and is of particular interest since all of the 21 biosynthetic, regulatory and resistance genes, located in a cluster on the SCP1 plasmid [4,7], have been studied in detail [8], and a series of knockout mutant strains has been generated [9]. The regulation of methylenomycin biosynthesis is mediated by the transcriptional repressor MmfR, a TetR-family homodimeric protein consisting of an N-terminal DNA-binding domain and a C-terminal ligand-binding domain (Fig. 1a) [10,11]. In the initial growth phase of S. coelicolor, the MmfR N-terminal domain is thought to be bound to the DNA at the methylenomycin auto-regulatory response element (MARE) causing the transcriptional repression of downstream genes. MmfR holds the system in this repressed state until the advent of the small signaling molecules, methylenomycin furans (MMFs) [12]. MMFs bind specifically to the C-terminal domain of the MmfR, forming an MmfR:MMF complex that results in a conformational change in the MmfR. Consequently, MmfR is released from the MARE, negating the repression and triggering gene transcription. The biosynthesis of MMFs is controlled by the MmfLHP enzymes which are, themselves, repressed by MmfR, thus forming a feedback control loop that governs the dynamical properties of the system. A second repressor, MmyR, is homologous to MmfR yet its role in methylenomycin regulation is currently less understood. There is, however, clear evidence that the impact of MmyR is particularly significant, since S. coelicolor strains with the mmyR gene knocked out have been found to overproduce methylenomycin [9].
Homologous architectures to that of the methylenomycin regulatory system have been identified across a plethora of microorganisms [13], regulating different classes of natural products and thus indicating the utility of this specific type of regulatory architecture [12]. Responding to environmental changes is of paramount importance to these bacteria. The soil they live in presents a harsh environment with considerable competition for resources and it is therefore vital that they possess sophisticated, tightly regulated mechanisms to turn on the expression of specific genes when required. Hence, obtaining a detailed mechanistic understanding of the regulatory system controlling the biosynthetic pathway to this antibiotic has the potential to elucidate a host of other, less tractable, biosynthetic gene clusters and help standardize one of the most important regulatory networks for the development of new antibiotics.
Recent mathematical modeling investigations have generated new insights into the operation of numerous systems of interest to synthetic biologists [14][15][16][17]. Such models not only provide the capability to accurately simulate synthetic systems during the design and development phase, but are also effective tools for the prediction of system responses to variations in environmental conditions [18]. Here, we develop the first detailed a b Fig. 1

Formulation of candidate model architectures
The various binding interactions and protein expression summarized in Fig. 1 inform the formulation of our candidate model architectures. MmfR is thought to bind to three distinct intergenic regions on the gene cluster [9]. However, we combine the region associated with MmyR biosynthesis together with the region associated with both MmfR and MMF biosynthesis to form a single DNA module responsible for the biosynthesis of all three molecules (the furan producing module, fpm). That is, we use the term fpm to refer to five distinct genes that provide control over three distinct molecular products: MmyR, MmfR and MMF. The genes mmfL, mmfH and mmfP are coregulated in an operon and are directly responsible for the production (assembly) of MMF molecules; the mmfR and mmyR genes control MmfR and MmyR production respectively [9,12] (Fig. 1b). The third distinct intergenic region is represented by our second DNA module which we consider responsible for methylenomycin (MMY) biosynthesis only (the antibiotic producing module, apm). Therefore, our model architectures all consist of two fundamental DNA modules that can both be bound by MmfR, and that have production of their respective proteins repressed as a consequence. Due to its effect on the gene cluster and its homology to MmfR, in this study we hypothesize that MmyR also binds both modules in a similar manner. Our base architecture accounts for reversible MmfR and MmyR binding to both the fpm and apm to form four complexes: fpm:MmfR, fpm:MmyR, apm:MmfR and apm:MmyR. MMF binds MmfR reversibly at these complexes in order to trigger gene expression; MMF binding MmfR in solution is also accounted for since we have been able to co-crystallize MmfR:MMF complexes and solve the 3D-structure through experimentation void of target DNA modules (data not shown). MmfR:MMF complexes that dissociate from the MAREs return free MmfR and MMF back into the system irreversibly. MmyR, MmfR and MMF production is controlled by the fpm. We account for an initial repressed system state by imposing non-zero initial concentrations upon the fpm:MmfR and apm:MmfR complexes; all remaining model variables have initial concentrations equal to zero. MmfR, MmyR, MMF and MMY all undergo degradation at constant rates (Fig. 2).
This model architecture represents the extent of our current mechanistic understanding, however there are certain details that require further investigation. For example, although we believe that the MMF releases MmfR from the fpm:MmfR and apm:MmfR complexes and also binds free MmfR in solution, it would be insightful to examine the dynamical influence of each binding mechanism in isolation. Similarly, although we believe there is no interaction between the MMF and MmyR within the system (data not shown), the binding interactions of MMY are not as well documented. It may therefore be possible that MMY is able to inhibit the action of both MmfR and MmyR either through dissociation from their respective fpm and apm complexes or binding in solution. Consequently, the aim of our modeling investigation is to examine the effect of three key mechanistic properties on model performance: This set of possible molecular interactions results in 48 distinct candidate model architectures for the methylenomycin regulatory system. Each candidate architecture is given a three letter name corresponding to the interactions accounted for with respect to each of the three properties listed above. The order of the letters in each name corresponds strictly to the numerical order of these properties. For example, our base architecture (described above) is given the name BNN since it accounts for both mechanisms (B) regarding property 1, for no interactions at all (N) regarding property 2 and for no interactions at all (N) regarding property 3. Each of the 48 candidate architectures presents a distinct reaction network and set of biochemical equations that can be used to derive a dynamical mathematical model. We apply mass action kinetics to the biochemical equations comprising each reaction network to derive a corresponding system of ordinary differential equations (ODEs). Each ODE describes the rate of change in concentration corresponding to each model variable (cellular entity). The solution to each system of ODEs is determined numerically due to the non-linearity of the equations and provides a deterministic output that can be used to simulate and predict in vivo system dynamics apm:MmfR + MMF MmfR:MMF MmyR from which we derive the following system of model ODEs: where square brackets denote concentration and the reaction rate constants translate to model parameters. Reactions associated with reversible DNA:protein binding (k 1 , k −1 , k 2 , k −2 , k 4 , k −4 , k 5 and k −5 ), the production of MmyR, MmfR, MMF and MMY (k 7 , k 8 , k 9 and k 10 ) and each individual protein degradation reaction (γ 1,2,3,4 ) are common to all of our candidate model architectures. Other reactions that are associated with the release of MmfR from existing DNA:MmfR complexes or the sequestration of MmfR and MmyR via binding in solution are not common to all models and are thus subject to investigation through our computational simulations. Model simulations are provided by the numerical solutions to the relevant model ODEs, which are calculated using the ODE solver ode45 in MATLAB. We are interested in examining the dynamics of methylenomycin production in each of the 48 candidate models and therefore analyze the simulations of MMY provided by numerical solutions to the corresponding ODE (30).

Available experimental data
Methylenomycin production by S. coelicolor has been shown to adopt a typical dynamical profile [19,20]. Once expression is initiated, usually by environmental conditions that are thought to establish MMF production, it increases relatively quickly towards a global maximum level. Expression then decreases from this maximum, reaching a relatively low level at steady-state. This profile aligns with the premise that the system is initially held in a repressed state until MmfR is released by MMF to trigger methylenomycin expression, which then increases quickly until free MmfR and MmyR cause secondary repression and eventual equilibrium of the feedback loop.
We consider the binding affinity of MmfR to the fpm and apm to be strong, based on experimental data regarding binding interactions between a similar protein, SAV2270, and its associated DNA motifs (our unpublished data). We characterized the binding of this protein to Streptavidin Immobilized oligonucleotides using a Biocore T200 SPR instrument. Our data reveal that the association and dissociation rates of this protein:DNA binding are on the order of 10 5 M −1 s −1 and 10 −2 s −1 respectively. As a result, we fix the model parameters relating to MmfR association and dissociation from both the fpm and apm at 10 5 and 10 −2 respectively (k 2 = k 5 = 10 5 ; k −2 = k −5 = 10 −2 ). The dimensionality of our experimental measurements agree with the corresponding parameters in our dimensional model and we are therefore able to apply these values directly. We assume that MmyR binding interactions are identical to that of MmfR and hence the same values are fixed for the parameters describing MmyR association and dissociation from the fpm and apm (k 1 = k 4 = 10 5 ; k −1 = k −4 = 10 −2 ).
Mutant strains of S. coelicolor that account for specific gene knockouts reveal qualitatively different methylenomycin production dynamics ( Table 1). The mutant strain accounting for mmyR deletion, mmyR, has been shown to exhibit increased methylenomycin expression compared to the wildtype; in the absence of MmyR, the overall capacity of the system to repress methylenomycin production is reduced and therefore the production of the antibiotic is increased. The mmfLHP strain exhibits a complete cessation of methylenomycin expression; in the absence of the mmfLHP genes, the system Table 1 The effects of knocking out certain genes and combinations of genes observed experimentally, adapted from [9] S. coelicolor strain Methylenomycin production The wildtype strain is allocated a single '+' to denote typical methylenomycin expression. Over-expression and the cessation of expression are denoted by '+++' and '-' respectively is locked in the apm:MmfR complex since the expression of MmfR, MmyR and particularly MMF is prevented and thus the bound MmfR cannot be released. The mmfLHP+ mmyR+ mmfR strain exhibits increased methylenomycin production compared to the wildtype; in the absence of MmfR and MmyR, both initially and as a result of any subsequent production by the fpm, the apm is able to produce methylenomycin in an unrestricted manner. The mmfLHP strain with exogenous MMF exhibits relatively similar methylenomycin expression to that of the wildtype; in the absence of endogenous MMFs, exogenous MMF permits the release of MmfR and, in turn, methylenomycin expression. Experimentation with the mmfR strain has thus far yielded inconclusive results and, as such, presents the opportunity for mathematical modeling simulations to inform future experimental studies.

Model selection via approximate Bayesian computation
In order to assess the potential of the 48 candidate architectures to reproduce the known characteristics of the system, we perform model selection based on approximate Bayesian computation (ABC) using the ABC-SysBio software package. ABC-SysBio combines Bayes' rule with sequential Monte Carlo (SMC) approaches to solve parameter inference and model selection problems in systems biology [21][22][23]. The procedure determines the model, from a set of candidate models, that is most likely to have produced the associated experimental data. Extensive quantitative data regarding methylenomycin expression is lacking in the literature, however a time series expression profile is reported in [20]. We therefore provide ABC-SysBio with a dataset designed to replicate this profile (Fig. 3), with two important exceptions. Firstly, we specifically account for the dynamical series of data points in the 40 hour interval between hours 54 and 94 of the time series. This is because the 54 hour experimental time point is when methylenomycin expression commences and translates to the 0 hour time point in our simulations. The time points that precede 54 hours record the repression of methylenomycin production prior to the environmental trigger and are hence excluded when fitting a model that accounts purely for the dynamical response of the system. Secondly, we incorporate additional uniformly distributed 'synthetic' data points, increasing the size of the dataset from 5 points to 41, in order to provide a more rigorous data fitting task to the ABC-SysBio algorithm.
ABC-SysBio also requires a prior probability distribution on each model parameter subject to inference in order to establish the parameter space within which to locate acceptable parameter sets. The prior distributions chosen for all parameters associated with each of the 48 candidate models are uniform distributions on  [20]. Synthetic data points are added uniformly between real data points to increase the rigor of model selection and parameter inference the interval [ 10 −4 , 10 4 ], that is, all candidate models are given an equal parameter space in attempting to identify parameter values capable of replicating our experimental dataset. We consider uniform priors to be the most suitable for model selection given the complete uncertainty surrounding the parameters subject to inference, and hence all potential parameter values require an equal probability of selection. We also impose prior distributions on the initial conditions of the necessary state variables due to the lack of experimental data regarding the physical quantity of DNA in the system: the prior distributions are uniform distributions on the interval [ 0, 1] and are assigned only to the MmfR:fpm and MmfR:apm complexes, all other initial conditions are set equal to zero. ABC-SysBio convergence is dependent on the sequential satisfaction of a predefined series of decreasing error thresholds by a predefined number of solutions (see Methods). Here, the number of solutions required to satisfy each error threshold is 500 [24] in order to reduce the time frame required for convergence; the number of models subject to selection coupled with the inability to parallelize the process presents a particularly time consuming computational workload. The user-defined error function designed to measure the accuracy of simulations takes the mean absolute value of the difference between model outputs and data values: where E is the error and x i , d i are the model outputs and data values at each of the 41 corresponding time points, t i , respectively.
The results of our model selection are shown in Fig. 4. The final probability distributions reveal that the model most likely to have produced the experimental data is BNN, the model formulated based on our current knowledge (Fig. 4a). The BNN model achieved a 0.916 probability of producing our data which is vastly superior to the remaining models, 36 of which were statistically eliminated through the selection process. This suggests that the most plausible network of molecular interactions underlying this system should account for MMF-MmfR interactions both at existing DNA:MmfR complexes and in solution, no MMY-MmfR interactions at all and no MMY-MmyR interactions at all, as depicted in Fig. 2.
In order to verify that the addition of synthetic data points does not restrict the emergence of other viable candidate models, we repeated the model selection procedure using only the 5 real experimental data points taken from [20]. Mean absolute error generally increases with decreasing numbers of data points which subsequently increases the difficulty for each population of solutions to meet the same error thresholds. Hence, the acceptance rate decreases and the process becomes more time consuming; this run took longer than the original run and met 7 thresholds compared to the previous 11 (Fig. 4b). The probability distribution across all models clearly identified the most likely models as early as P2, which converged further at P4 and P6 to suggest that BNN was a likely model architecture, in agreement with our initial result. However, P3, P5 and P7 identified a different distribution which suggested that models SCS and SNS were also likely candidates. Given that ABC-SysBio appeared to present two alternating probability distributions, it is probable that additional local minima were identified in this case. To further investigate the set of plausible models identified using this Bayesian inference framework, we next employed global optimization methods, as well as analysis of mutant versions of the candidate models, as described in the following sections.

Parameter inference via global optimization
ABC-SysBio performs parameter inference by producing probability distributions on the numerical values that comprise accepted parameter sets during the model Histograms showing the probabilities of producing the real experimental dataset for the 48 candidate models. The numbers above each histogram denote the population number, the error threshold (square brackets) and the acceptance rate (parentheses) respectively. The number of accepted solutions required to satisfy each error threshold is 500 selection process. For example, the distributions on the initial conditions imposed on the fpm:MmfR and apm:MmfR complexes reveal that statistically these values can be approximated to be 0.6 and 0.5 respectively (Fig. 5). These distributions are insightful, but cannot provide complete clarity over the numerical values inferred in all cases. Other parameter inference methods, such as global optimization, place greater focus on the identification of specific numerical parameter sets capable of minimizing the user-defined error function. The genetic algorithm (GA), is a particularly powerful global optimization tool and is exploited regularly in biological model parameter inference [25,26]. The GA converges to the solution providing the global minimum error within the allocated parameter space by evolving an initial population of randomly generated solutions over a large number of generations. This process is based on natural selection, giving the best solutions in the population the best chance of creating the next generation of solutions (see Methods).
In our case, the error function minimized by the GA is the same absolute mean error function used for ABC-SysBio model selection (32). We also allocate the same parameter space to the GA by imposing lower and upper bounds on the inferred parameters of 10 −4 and 10 4 respectively. Again, the initial conditions imposed on the model variables are zero with the exception of those regarding the fpm:MmfR and apm:MmfR complexes which we approximate to be 0.55 in light of our ABC-SysBio probability distributions and given that we require both initial concentrations to be equal. The results of our global optimization are shown in Fig. 6. The BNN model is capable of accurately matching the experimental timecourse data when optimized within the same parameter space used for model selection. The optimal parameter set identified by the GA is listed in Table 2 and provides an absolute mean error of 6.12 × 10 −6 . The four parameter values describing protein degradation (γ 1,2,3,4 ) vary by one order of magnitude at most; the remaining parameter values all describe protein:protein association and dissociation and vary by three orders of magnitude at most. Hence, we conclude that the numerical ranges of  Genetic algorithm global optimization results. The BNN model is able to fit the experimental data using the optimal parameter set identified by the GA. The absolute mean error provided by this optimal solution is 6.12×10 −6 . The optimal fits provided by the SCS and SNS models are similar and are not as accurate as the BNN model. The absolute mean error provided by both of these optimal solutions is 2.39×10 −5 these optimal parameter values are reasonable within this biological context.
To investigate further, we also optimized the parameters of the SCS and SNS models against the experimental data using the GA, in an identical manner to that previously This optimal parameter set is dimensional, with parameters in the first and second reaction columns taking the units M −1 s −1 and s −1 respectively based on standard mass action kinetics done for the BNN model. This revealed that neither model was able to achieve the same quality of fit to the data as the BNN (minimum error of 2.39 × 10 −5 for both SCS and SNS compared to 6.12 × 10 −6 for BNN). In addition, neither the SCS or SNS models were able to even qualitatively replicate the non-monotonicity in the response that is clearly exhibited in the experimental data.

Monte Carlo simulations of methylenomycin production in mutant strains
We performed additional model validation by testing the BNN model against our qualitative data regarding methylenomycin production in mutant S. coelicolor strains. We employ Monte Carlo simulations to examine methylenomycin production under four distinct conditions corresponding to the mutant strains described in Table 1. By examining the dynamical response to specific gene knockouts against the wildtype strain, represented by the optimal BNN model output in Fig. 6, we are able to investigate the qualitative effect of adapting our BNN model to emulate these mutant strains. When simulating MMY production in the different mutant strains, we account for mmyR by simply setting the parameter describing MmyR production from the fpm, k 7 , to zero. However, mmfR strains are incapable of producing MmfR and therefore cannot be simulated in the initial repressed state comprised of the fpm:MmfR and apm:MmfR complexes. Hence, the parameter describing MmyR production from the fpm, k 8 , is set to zero and the allocation of initial concentrations is adapted to exclude the fpm:MmfR and apm:MmfR complexes. The mmfLHP strain is simulated by setting the initial concentration of the fpm and its associated complexes to zero, since the entire DNA module has been knocked out. The addition of exogenous MMF involves allocating this variable an initial concentration of 0.55 to align with the initial concentrations allocated to the relevant variables, that is, no new model parameters are introduced to describe production of exogenous MMF. Mutant strains comprising combinations of gene knockouts are simulated by combining the appropriate adaptations.
Specifically, in order to simulate the mmyR strain we set k 7 = 0. To simulate the mmfLHP strain the initial concentration of 0.55 is imposed on the apm:MmfR complex only, all other initial concentrations are set equal to zero. To simulate the mmfLHP+ mmyR+ mmfR strain we set k 7 = k 8 = 0 and all initial concentration are set equal to zero with the exception of the apm which is set equal to 0.55. To simulate the mmfLHP+exogenous MMF strain initial concentrations of the apm and MMF are set equal to 0.55, and all other initial concentrations are set equal to zero.
Monte Carlo simulations assign random values in the interval [ 10 −4 , 10 4 ] to all model parameters, excluding those that retain their fixed values assigned for previous model selection and parameter inference purposes, as we continue to examine dimensional dynamic responses. We run a total of 10 4 Monte Carlo simulations to allow for substantial sampling of the parameter space within a feasible time frame. Each simulation outputs MMY production for each of the four mutant strains and calculates the ratio of the mean value to that of the optimal wildtype simulation. We utilize these ratios to investigate the ability of our model to satisfy the following four criteria, which capture the experimentally observed responses of the mutant strains: where overproduction translates to an increase in mean MMY production of < 10%, cessation translates to a decrease in MMY production of < 10% and comparable production translates to a maximum increase or decrease in MMY production of less than 10%. The results of our Monte Carlo simulations are shown in Fig. 7. Parameter sets were identified for BNN that are capable of satisfying each of the four criteria, within the same dimensional solution space as the optimized wildtype model (Fig. 7a). Given the uncertainty regarding the effect of gene knockouts on the reaction kinetics and MMY production of the system, this qualitative agreement offers further validation of the replication and prediction capabilities of the BNN model. The SCS and SNS models are also able to simulate the responses observed experimentally for the mmyR and the mmfLHP+ mmyR+ mmfR knockout strains, but not for the mmfLHP and mmfLHP+exogenous MMF knockouts ( Fig. 7b and 7c). This is likely due to the most significant mechanistic property separating them from BNN, i.e. the interaction of MMY with one or both of MmyR and MmfR. This interaction results in decreased repression of the apm since the MMY is negating the action of either or both regulators and hence the apm is less restricted in producing MMY, which causes an overproduction of the antibiotic for the mmfLHP and mmfLHP+exogenous MMF knockouts that has not been observed experimentally (Fig. 7b and 7c). We therefore conclude that the BNN model remains the most likely candidate model to explain all the available experimental data for this system.

Experimental design for future studies
We are able to inform the design of future experimental studies in light of our results. For example, we are a b c Fig. 7 Monte Carlo simulation results. a The BNN model is able to simulate the qualitative data regarding four mutant S. coelicolor strains when adapted to replicate the corresponding gene knockouts. b The SCS model is unable to simulate qualitative data regarding the mmfLHP and mmfLHP+MMF knockout strains. c The SNS model is unable to simulate qualitative data regarding the mmfLHP and mmfLHP+MMF knockout strains interested in quantifying the response of the mmyR and mmfLHP + mmyR + mmfR strains in order to verify our model prediction that the five gene mutant elicits a more rapid and significantly greater overproduction of MMY. This has implications both in terms of improving product yields for industrially relevant natural products, and also regarding the potential adverse effects this might cause in the cells, such as toxicity. The result of these experiments would subsequently reveal whether the mmfLHP + mmyR + mmfR is the most effective knockout for improving antibiotic production in novel synthetic regulatory systems.
In the event that directly quantifying MMY production is inconclusive, we would be interested in replacing the gene controlled by the apm with a reporter gene coding for fluorescence or luminescence such as green fluorescent protein (GFP) or lux genes respectively. This output may enable us to measure the response of the different mutant strains with greater clarity, since experiments of this nature are already well characterized, particularly in the related bacterium S. venezuelae.
Finally, we are also interested in examining the mmfLHP+MMF mutant in order to establish the quantity of exogenous MMF and the specific time point of induction that provides optimal MMY production. Our model predicts a narrow production window for this strain which may suggest that direct MMY quantification is not straightforward and that, again, experimental designs incorporating reporter genes would provide improved results.

Conclusions
We have developed a plausible model architecture for the regulatory system controlling methylenomycin production in Streptomyces coelicolor. This architecture was found to most closely reproduce the various dynamical responses described by experimental time series data for this system, when tested against 47 other candidate architectures. Global optimization of the model parameters produced close agreement with the experimental data. Appropriate adjustments to the proposed model architecture allow it to replicate observed changes in the dynamics of methylenomycin production in a number of mutant S. coelicolor strains.
The mechanistic details captured in the proposed regulatory architecture provide useful insights for the design of future experiments to further investigate the operation of this system, and demonstrate the potential of mathematical models to elucidate the design principles of complex biological control systems. We expect that the emergence of further quantitative experimental data for this system will inform further model development and validation, and allow for the generation of optimized models that are capable of accurately predicting the dynamical responses of one of the most prevalent and important gene regulatory networks in nature.

ABC-SysBio model selection
ABC-SysBio is a Python software package that is designed specifically for parameter inference and model selection in biological systems research using the approach of approximate Bayesian computation (ABC) [21]. The program enables ABC inference of mathematical models via sequential Monte Carlo (SMC) approaches [21][22][23]. Monte Carlo approaches to computational simulations involve generating random candidate solutions, testing their fitness against a desired output and repeating until a viable solution can be identified. In this way, vast numbers of randomly selected parameter sets can be examined in building an accurate approximation to the posterior distribution defined by conditional probabilities known as Bayes' theorem: where P(B) > 0. The ABC-SMC approach proceeds in the following manner: the first 'population' of accepted solutions or 'particles' is generated randomly based on the prior distributions imposed on the model parameters. Each particle gives rise to a simulated dataset, D , which is compared to the fixed experimental dataset, D, by an appropriate distance function and its fitness is scored accordingly. Acceptance of a particle is dependent on a decreasing sequence of error thresholds, , set to correspond with each population. That is, where 1 > 2 >. . . > n and d is the distance function. Each subsequent population is obtained by perturbing particles from the previous population in accordance with a predetermined perturbation kernel, proceeding until the model is unable to produce particles of sufficient fitness to satisfy the immediate threshold. An array of model-specific criteria are required to allow the ABC-SysBio package to run efficiently: the sequence of decreasing error thresholds, , must be provided whereby only the particles capable of providing error less than that of the threshold will be accepted by the algorithm. Each must be satisfied in succession until the particles are unable to satisfy the next threshold. Satisfaction of an individual threshold is dependent on the number of particles accepted; the number of acceptable particles required before passage to the next threshold must also be predetermined. The larger the number of particles, the higher probability of significant inference and the longer the duration of algorithm to reach convergence. Each individual parameter subject to inference requires a prior probability distribution in order to establish the parameter space within which to locate acceptable particles. Sequences of numerical values representing the relevant experimental data and the corresponding time points must also be provided; the number of data points and time points must be equal. Time series data is currently the only supported data format. One or more distinct datasets can be supplied and can be fitted to any individual model variable or combination of variables. Convergence of the algorithm is dependent on all of the aforementioned factors and hence it may require several trials to establish the appropriate performance criteria. To achieve credible results, it is advised that identical parameter inference and model selection tasks are repeated multiple times due to the random nature of the Monte Carlo simulations that drive the algorithm. Note that all models submitted to the ABC-SysBio package must be written in Systems Biology Markup Language (SBML), a systems biology programming language based on Extensible Markup Language (XML).

Global optimization
We employ the genetic algorithm function in MATLAB in order to optimize our model against our experimental data. The reaction rate constants k i are chosen as optimization variables with the exception of those fixed in light of our kinetic data. The GA mimics natural selection; converging to the global minimum within the allocated parameter space by evolving an initial population of randomly generated solutions over a large number of generations. The probability of obtaining the global optimum solution is maximized by selecting the largest population size and number of generations possible. However, increasing the computational workload in this manner also greatly increases the time frame required for the algorithm to converge. Establishing an effective compromise is key for successful deployment. We ran the GA under the following conditions: The default GA population size in MATLAB is 200 for inference of over 5 parameters however, we selected a population size of 1000 for inference of 17 parameters since a single time series dataset presents a relatively low computational workload and thus allows optimal solutions for large populations to be obtained in feasible time frames. We selected a large parameter space due to our focus on establishing optimal model performance in light of the lack of documentation regarding reaction rates.