Fig. 2 | Journal of Biological Engineering

Fig. 2

From: High capacity DNA data storage with variable-length Oligonucleotides using repeat accumulate code and hybrid mapping

Illustration of the repeat accumulate (RA) coding strategy and the hybrid mapping. (A) An example of a rate \(\frac {1}{2}\) packet-level RA code with 3 source packets. The ith parity packet is generated as the bit-wise modulo-2 sum of the (i−1)th parity packet and the source packets connected to the ith XOR node. (B) Flow chart of the hybrid mapping. Each binary sequence is first mapped via binary-to-quaternary mapping. If, under one of the interleaving patterns, the interleaved sequence with the flag nucleotide appended at the end passes the screening test, which checks GC content and homopolymer runs, it is output as a valid sequence. Otherwise, the original binary sequence is sent to the variable-length constrained (VLC) mapping. (C. i) The finite state transition diagram (FSTD) of a (4, 0, 2) constrained DNA storage system, where 0, 1, 2, and 3 are the four transition symbols that indicate transitions among the four nucleotides, and s0, s1, and s2 are three states that record the number of consecutive 0’s (no transition) in the output (4, 0, 2) constrained sequence. (C. ii) Generation of the Huffman coding tree. The Huffman coding tree optimizes the code rate by assigning source words with a high probability of occurrence to short codewords, and vice versa. (C. iii) The VLC mapping rule. The Huffman coding tree yields a look-up table between variable-length source words and variable-length transition codewords. (C. iv) The strategy that lets the decoder distinguish the two mappings by the length of the received DNA sequence. (D) Flow chart of the decoder. The decoder first identifies which mapping method the received sequence used and applies the corresponding inverse mapping. The CRC check then decides whether the recovered binary sequence is in error. Afterwards, the RA decoder recovers all sequences found to be in error. (E) Distribution of the lengths of the mapped DNA sequences. The length of the resulting DNA sequences ranges from 150 nt to 159 nt; the interleaved mapping generates only sequences of length 151 nt, while sequences of all other lengths are generated by the VLC mapping.
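
The parity rule described in panel (A) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `ra_parity_packets`, the one-byte packets, and the `connections` lists (which source packets feed each XOR node) are all hypothetical, since the caption does not give the actual connection graph. It also assumes that the parity preceding the first node is the all-zero packet.

```python
from typing import List, Sequence


def ra_parity_packets(
    source: Sequence[bytes], connections: Sequence[Sequence[int]]
) -> List[bytes]:
    """Accumulate parity packets: parity[i] is the bit-wise XOR of
    parity[i-1] and the source packets connected to XOR node i."""
    length = len(source[0])
    # Assumption: an all-zero packet stands in for the parity before node 0.
    prev = bytes(length)
    parities = []
    for node in connections:
        acc = bytearray(prev)
        for j in node:
            for k, b in enumerate(source[j]):
                acc[k] ^= b  # bit-wise modulo-2 sum
        prev = bytes(acc)
        parities.append(prev)
    return parities


# Three source packets and three parity packets give rate 1/2.
source = [b"\x0f", b"\x33", b"\x55"]
connections = [[0, 1], [1, 2], [0, 2]]  # hypothetical connection graph
parities = ra_parity_packets(source, connections)
print([p.hex() for p in parities])
```

Because each parity packet depends on the previous one, a packet lost in transit can in principle be recovered from its neighbouring parity packets and the remaining connected source packets, which is the property the RA decoder in panel (D) relies on.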
