We share the vision of synthetic biology as a field dedicated to solving the technical and conceptual bottlenecks of genetic engineering. Although we currently face several challenges, it is clear that many of these obstacles can be attributed either to an insufficient survey of the design space or to a lack of knowledge about the molecular and cellular functions of added components. Both of these problems can be addressed by robust, cost-effective, rapid, and outsourced DNA fabrication platforms that lower barriers to fabrication and streamline assembly. Basic standard biological parts, such as BioBricks™, as well as standard assembly strategies to join them head-to-tail, have emerged as a way to deal with the potential cloning problems associated with the uniqueness of each DNA sequence. The rationale behind these approaches is that by standardizing parts and part junctions to conform to a particular set of rules, considerations of design and function can be separated from those pertaining to assembly. Furthermore, because the products generated via these methods are themselves standardized parts, a single, iterative, standard assembly reaction can be used to concatenate parts into progressively more complex genetic devices. Thus, the problem of composition can be pursued without consideration for how the DNAs are assembled. Our results indicate that it is possible to robustly and effectively automate the fabrication of hundreds of genes using BioBricks™-based approaches. Several alternate chemistries that can also serve as standard fabrication methods have recently been developed, including SOEing, SLIC, CPEC, Golden Gate Shuffling, and the isothermal method reported by Gibson [16–21], to name a few. We anticipate that many of these could be similarly automated, and that specific types of experiments will be better served by different assembly methodologies.
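The closure property described above (the product of a standard assembly reaction is itself a standard part) can be illustrated with a minimal sketch. This is not the software used in this work: the `Part` class, the `join` and `assemble` functions, the toy sequences, and the placeholder junction `SCAR` are all hypothetical names and values chosen for illustration, and a real assembly standard defines its own junction sequence and flanking sites.

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical placeholder for the junction left between two joined parts.
# A real assembly standard specifies its own scar sequence.
SCAR = "NNNNNN"


@dataclass(frozen=True)
class Part:
    """A standardized part: a name plus its sequence (toy data only)."""
    name: str
    sequence: str


def join(left: Part, right: Part) -> Part:
    """Join two parts head-to-tail.

    The result is again a Part, so the same operation can be applied
    to it in the next assembly round.
    """
    return Part(
        name=f"{left.name}.{right.name}",
        sequence=left.sequence + SCAR + right.sequence,
    )


def assemble(parts):
    """Build a composite device by repeating the single join operation."""
    return reduce(join, parts)


# Toy parts; the sequences are illustrative strings, not characterized elements.
promoter = Part("P", "TTGACA")
rbs = Part("R", "AGGAGG")
orf = Part("O", "ATGAAATAA")

device = assemble([promoter, rbs, orf])
```

Because `join` returns the same type it consumes, the order in which intermediate composites are built does not change the final product in this model, which is why design decisions can be made independently of the assembly route.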
Although we currently lack specific benchmarks to compare our results to those obtained using the alternate chemistries mentioned above, a success rate above 96% for BioBricks™-based approaches is very encouraging, particularly since this is a conservative estimate of efficiency that could be improved through minor modifications of the existing setup. The majority of failures observed were related to colonies containing unrecognizable plasmids, which were always traced back to poor-quality input mini-prep DNA. Thus, these errors could likely be mitigated using DNA extraction protocols that perform more consistently. A second source of failure was traced back to incomplete methylation of input plasmid DNA, which gave rise to junctions lacking either lefty or righty parts. These errors can be reduced with better methylation strains, several of which are currently under development in the lab. In a few instances we also observed that particular combinations of antibiotics performed better than others, a discrepancy that could likely be resolved using DIAL strains or through modifications to regulatory elements, including promoter and/or RBS mutations.
Automated assembly enables cumulative information gain in every design cycle by lowering existing barriers to construction and screening. Large numbers of data points reveal patterns that reflect specific underlying properties of the elements contained in a system. Patterns can reflect several distinct variables. In the examples presented here, two of the variables observed to have a significant effect on relative protein expression levels were the location of an epitope tag within an ORF, and the location of that ORF relative to its driving promoter (first or second position in a bi-cistronic operon). As demonstrated previously, other variables can also be used to manipulate expression systems, including RBS and/or promoter mutations. Further, these variables can be used to test whether an engineered device is behaving in a manner that is quantitatively consistent with a particular model. Although observed patterns can be useful tools for the interpretation of results, particularly because they enable informed decisions about how the system should be manipulated in subsequent cycles, these trends can rarely be reliably mapped onto specific biophysical parameters. Thus, one of the main challenges we still face is how to encapsulate the root of observed patterns. Ideally, combinatorial sets could be used to quantify parameters that reflect inherent properties of a particular part. As shown with the test sets described here, the patterns observed had, at the very least, qualitative utility for deciding whether expression levels of independent ORFs should be increased or decreased, and by how much, in order to obtain stoichiometric equivalences.
Although it is tempting to try to directly encapsulate measurements like ELISA or fluorescence reads as relative measures of particular biophysical values, like transcription, translation, or folding rates, we want to root our theory of encapsulation in real numbers that can be directly mapped to actual biophysical parameters reflecting molecular function, such as a molecule’s Km, kcat, folding rate, etc. Ironically perhaps, the most central parameter that we currently manipulate in synthetic biology, expression level, cannot be easily characterized or encapsulated in terms of such parameters. To get there, one would be fully justified in expressing a sigma70 binding constant for a promoter, for example. Nevertheless, that alone would not encapsulate everything the promoter could do. In practice, transcription rates would depend on many factors simultaneously exerting an effect within the context of the cell, including load and stress, to name just a few. Although reading trends and patterns within a well-defined context is clearly useful, we still have a long way to go before we can encapsulate information that seeks to define narrow parameters, like whether a protein prefers to be tagged at the N- or C-terminus, or whether a particular tag will work for a particular percentage of proteins or attempts. Given that these are undoubtedly statements about molecular function that are related to folding kinetics, it will be necessary to develop theory that will eventually enable us to properly measure these types of parameters, such that they can be reported in biophysically justifiable terms.
Here we have successfully automated fabrication and characterization of a large set of DNAs based on standard biological parts. Although the process was neither easy nor straightforward, it is clear that it was doable. In fact, the theory of how to do this is simple, and implementation difficulties stem from the fact that it is challenging to automate this type of fabrication and characterization in a single, streamlined liquid handling platform, not from any intrinsic complications associated with the building of large sets of DNAs on automated platforms. To do this well in practice, it would need to be implemented in the context of a centralized DNA fabrication facility that has an assembly line of robots dedicated to specific tasks along the assembly process. Given advances in acoustic liquid handling and microfluidics, more streamlined hardware solutions may soon be available. In the meantime, we should state that the most striking observation coming out of these studies was the lack of appropriate software solutions available to integrate automation tools in a traditional academic laboratory. Electronic information held in our lab, as in many labs, is stored within an aggregation of tools that provide reasonable organization for small teams of researchers. These include sequence-management programs like ApE, Google Docs for tables of sequences, parts, and samples, and wikis for experimental data. Though tools such as the Registry of Standard Biological Parts and ICE platforms provide excellent solutions for the dissemination of sequence and qualitative usage information, they do not support the full range of data types that must be considered, tracked, and persisted for managing a synthetic biology experiment.
For large sets like ours, the data entry process alone becomes unrealistically burdensome when using only these tools, and the lack of cross-tool communication leads to an enormous amount of custom script writing simply to port information from existing tools into the software needed to run the automation hardware. Further, it was clear that user errors, custom notes, and non-standardized formats in freeform documents make conversion into a standardized form difficult and cumbersome. Despite rapid progress in the area of BioCAD tools [6, 23–27], basic day-to-day usability and integration into wet-lab workflows remain among the outstanding challenges to fully realizing the benefits of these tools.
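To make the porting problem concrete, the following is a minimal sketch of the kind of custom script described above: it maps inconsistently labeled columns from a freeform sample table onto one standardized record and sets aside rows it cannot interpret. It is not the code used in this work. The column aliases, the required fields (`part_name`, `sequence`, `well`), and the example table are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping from column labels people actually type
# to the field names a downstream automation tool might expect.
ALIASES = {
    "part": "part_name",
    "part name": "part_name",
    "name": "part_name",
    "seq": "sequence",
    "sequence": "sequence",
    "well": "well",
    "location": "well",
}
REQUIRED = ("part_name", "sequence", "well")
ALLOWED_BASES = set("ACGTN")


def normalize(rows):
    """Split rows into standardized records and rows needing manual review."""
    clean, rejected = [], []
    for row in rows:
        record = {}
        for key, value in row.items():
            # Unknown or missing column labels are ignored.
            field = ALIASES.get((key or "").strip().lower())
            if field and value:
                record[field] = value.strip()
        if "sequence" in record:
            # Tolerate lowercase and stray spaces inside the sequence.
            record["sequence"] = record["sequence"].upper().replace(" ", "")
        complete = all(record.get(field) for field in REQUIRED)
        if complete and set(record["sequence"]) <= ALLOWED_BASES:
            clean.append(record)
        else:
            rejected.append(row)
    return clean, rejected


# Illustrative freeform table: one usable row, one with a typo in the
# sequence, and one with a missing part name.
raw = "Part Name,Seq,Location\npTet, ttgaca ,A1\nrbs1,agg?gg,A2\n,ATG,A3\n"
clean, rejected = normalize(csv.DictReader(io.StringIO(raw)))
```

Keeping the rejected rows rather than discarding them reflects the point made above: user errors and custom notes cannot always be resolved automatically, so a conversion step needs a path back to the researcher.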
BioCAD-driven wet-lab automation has been identified in both academia and industry as the primary driver of new advances in genetic engineering. Considering the biosafety and biosecurity risks associated with DNA fabrication at this level of sophistication, it is important that we begin to draft road maps of the major events in the development of fabrication standards so that proper practices can be anticipated and interventions implemented a priori. This is particularly important given the high level of interest that such technologies generate in the DIY community. The question of whether individuals, either acting alone or in small groups, could make use of these advances to set up clandestine operations is legitimate. In the absence of guarantees, the answer seems to be that, although not easy at all, it would be possible for teams of 2–3 people to generate a few hundred DNA constructs a month using a similar, minimal automation setup. Clearly, DNA fabrication is just the first hurdle to overcome in a series of required steps before a functional microbial chemical factory is possible. Thus, although initial barriers to entry may not be as high as desirable, increasing the levels of throughput required for success brings about a series of complications that are not easily overcome unless one gets to a much larger operational scale, such as those found in the context of successful and well-capitalized biotech companies. When acting alone, members of the DIY community could potentially work on applications using BioCAD-driven design processes. However, efforts to optimize processes using combinatorial approaches will likely require external support from an angel investor or well-capitalized fund in order to have a meaningful effect.
As we demonstrate with our particular data set, full integration of automated tools into a wet-lab environment will require not only hardware, bioware, and software, but also a complete set of tools that are responsive to the practical needs and practices of wet-lab researchers and can manage the information already held in synthetic biology labs. Fortunately, there is great interest within the synthetic biology community in developing community-shared data standards and interoperating procedures that should significantly facilitate progress in an ethical and safe manner.