Synthesizing non-natural parts from natural genomic template

Table 1 Description of eka sequences

ID	Length a	Start – End	Overlap	Vector prefix b	Vector suffix c	Total a+b+c	% vector contribution	Protein Aa	Protein M.W. (kDa)	e-value	Bit score	GC ratio (i)	GC ratio (ii)
eka1	104	70,283 – 70,386	No	381	157	642	83.8	214	23.5	>10	*	39.4	50.0
eka2	138	3,651,282 – 3,651,704	Yes, 32%	381	90	609	77.3	203	22.1	>10	*	42.0	48.3
eka3	432	348,779 – 349,210	No	381	90	903	52.1	301	33.7	6e-04	46.2	47.0	48.6
eka4	105	49,681 – 49,785	No	381	90	576	81.7	192	20.9	>10	*	49.5	50.0
eka5	141	57,173 – 57,313	No	381	90	612	76.9	204	22.2	>10	*	43.2	50.8
eka6	96	70,285 – 70,380	No	381	90	567	83.1	189	20.5	>10	*	39.6	48.3

Start – End gives the genomic location of each selected sequence. 'a' is the length of the original genomic insert; 'b' and 'c' are the lengths of the vector-contributed prefix and suffix DNA sequences, respectively. Total (a+b+c) is the length of the entire DNA sequence expressed as protein. The pBAD vector contribution to the final protein sequence is given as a percentage. Aa is the number of amino acid residues in the synthesized protein. M.W. is the isotopically averaged molecular weight in kilodaltons (kDa). (i) is the GC ratio of the genomic insert, and (ii) is the GC ratio of the complete DNA sequence (vector + genomic DNA) expressed as protein. A large e-value together with an extremely small bit score approaching zero (*) indicates very low sequence similarity between the eka proteins and known protein sequences.
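
The derived columns of Table 1 can be recomputed from a, b and c. The sketch below is illustrative only and is not the authors' code. It assumes that the vector contribution is (b+c) divided by the total length, and that the residue count is the total length divided by three (one codon per residue). The names `EkaConstruct` and `gc_percent` are hypothetical, and the sequence passed to `gc_percent` is a toy string, not an eka sequence.

```python
from dataclasses import dataclass


@dataclass
class EkaConstruct:
    """Lengths (in nucleotides) that make up one expressed construct."""
    name: str
    insert_len: int  # 'a': genomic insert
    prefix_len: int  # 'b': vector-contributed prefix
    suffix_len: int  # 'c': vector-contributed suffix

    @property
    def total_len(self) -> int:
        # Total a+b+c
        return self.insert_len + self.prefix_len + self.suffix_len

    @property
    def vector_pct(self) -> float:
        # Assumption: share of the total length contributed by the vector
        return 100.0 * (self.prefix_len + self.suffix_len) / self.total_len

    @property
    def residues(self) -> int:
        # Assumption: one residue per codon across the whole construct
        return self.total_len // 3


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA string."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


eka1 = EkaConstruct("eka1", insert_len=104, prefix_len=381, suffix_len=157)
print(eka1.total_len, round(eka1.vector_pct, 1), eka1.residues)  # 642 83.8 214
```

For eka1 this reproduces the tabulated total, vector contribution and residue count. For other rows, recomputed percentages may differ from the table in the last decimal place, depending on how the published values were rounded.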

ISSN: 1754-1611