# DATA MATRIX BARCODE

## Format of the DATA MATRIX

Length | Type | Checksum |
---|---|---|

from 1 to 3116 digits (variable) | numeric, alphanumeric or byte |
yes (Reed-Solomon) |

The encoded data size depends on the type and the length od the data to data to code.

An intermediate structure of 8 bits is used to save each data: the codeword.

A symbol is composed of 3 groups of codewords:

- Data codewords: encoded from the data
- Stuffing codewords: not mandatory, fill the symbol when data codeword does not complete it
- Correction codewords: Based on data and stuffing codewords, it's a Reed-Solomon checksum which allows to retrieve each part of the data in case of reading error.

The encoding requires 2 steps:

- High level encoding: consists in the conversion in codewords of the data, adding some stuffing codewords and calculating the checksum one
- Low level encoding: consists to position these codewords in the data matrix.

## Graphic representation of the DATA MATRIX

A symbol is composed of black and white rectangular patterns called "cells" or "modules".

A module represent a bit: usually black for the "1" and white for the "0".

The DATA MATRIX can be a square (mostly used) or rectangular.

The symbole can be decomposed into 3 parts:

- The "finder pattern", which forms the shape of the data. There are two adjacent segments representing L, consisting of black modules only to determine the direction and orientation data. The two opposite sides alternate black and white modules to determine the size of a module, and therefore the symbol, but also to delimit the symbol
- The "quiet zone", which is a white area of at least one module located around the locating units, so that the decoder is not impaired when reading
- The data matrix which is composed of modules in the form of a matrix and which is in the center of the tracking units.

When the symbol exceeds a certain size limit, the symbol is divided into data areas, regions, which are defined by internal locating designs. Four types of division into regions exist: 3 for square and 1 rectangular shape.

## High level encoding

Six encoding modes coexist: ASCII, Text, C40, X12, EDIFACT and Base 256. The ASCII encoding is the default encoding and the most used. It will be presented here only.

The set of values of the ASCII mode codewords are visible in the following table:

Codeword value | Meaning |
---|---|

1 - 128 | ASCII characters (ASCII value + 1) |

129 | first stuffing character |

130 - 229 | Number between 00 and 99 (numeric value + 130) |

230 | Begin C40 encoding |

231 | Begin Base 256 encoding |

232 | FNC1 |

233 | Structured append. Allows a message to be split across multiple symbols |

234 | Reader programming |

235 | Upper Shift - Set high bit of the following character |

236 - 237 | 05 Macro - 06 Macro |

238 | Begin ANSI X12 encoding |

239 | Begin Text encoding |

240 | Begin EDIFACT encoding |

241 | ECI - Extended Channel Interpretation code |

242 - 255 | Not used |

Some of the values will not be explained because not used in the ASCII explanation.

In the ASCII mode, some data are encoded according to three possibilities:

- The value of a non-extended ASCII codeword (from 0 to 127) equals "ASCII value + 1"
- The value of a non-extended ASCII codeword (from 128 to 255) is encoded with 2 codewords: the first, called Upper Shift, equals "235" and the second equals "ASCII value - 127"
- Numbers are encoding in a single codeword by group of 2 figures which is equals to "Numeric value + 130".

When the number of data codewords can completely fill the "data" part of the symbol smaller possible codewords filler are added. The first of these codewords, which indicates the end of data in this case is always "129". Following is calculated using the following algorithm 253-state:

- with "%" the modulo and v
_{i}the stuffing codeword at position i

### Calculation of correction codewords

The Reed-Solomon codes are used as error correction system. They have a capacity of detection and correction of errors defined in advance, depending on the size of the data matrix.

A body C is a set of elements provided with two closed and called binary operations addition and multiplication. Being closed means that the result of each operation on two elements of the assembly will always be an element of the assembly.

There are many bodies, finite and infinite. It turns out that for any integer prime p and any integer n (>=1), there exists a unique finite n^{p} elements, noted GF(n_{p}). These bodies are called Galois fields.

We can represent the elements of GF(n_{p}) using the basic elements of a : 0, 1, a, a^{2}, a^{3}, ..., a^{(n-1)}. Let's take an example:

- a
^{3}+ a + 1

Elements can also be represented in different forms: binary, decimal, or even in the form of a polynomial. For the same example as:

- binary: 1011 (0 indicates no a
^{2}) - decimal: 11 (8+2+1)
- polynomial: a
^{3}x^{3}+ a x + 1

For each Galois field, there is at least one primitive polynomial, denoted by p (x), which allows to build the body. Data Matrix storing data of 8 bits (in codewords), and therefore work in the Galois field GF(2^{8}). They use the primitive polynomial x^{8}+x^{5}+x^{3}+x^{2}+1, which corresponds to the value 301.

The addition and subtraction of two elements is the XOR (exclusive OR) between these two elements. Multiplication and division are calculated by multiplying the corresponding polynomials. This operation is relatively expensive in computation time, an algorithm is used. Since all components are a power of primitive polynomial p, must be sought for each element c, i power as:

- i = log
_{p}(c) - c = antilog
_{p}(i)

From there, we can calculate for two elements c and d:

- c * d = antilog
_{p}(log_{p}(c) + log_{p}(d)) - c / d = antilog
_{p}(log_{p}(c) - log_{p}(d))

We can get the e antilogs table thanks to this algorithm:

A Reed-Solomon code of length n with k data is noted RS(n,k). So there is n-k = 2t check symbols and this code can correct up to t erroneous symbols. The number of control symbols is defined by the size of the Data Matrix.

The equation for encoding a RS(n,k) code is:

- c(x) = i(x) x
^{n-k}+ [i(x) x^{n-k}] modulo g(x)

with

- c(x) : the final polynomial
- i(x) : the data polynomial
- [i(x) x
^{n-k}] modulo g(x) : the control polynomial - g(x) : the generator polynomial

The 2t check symbols are derived from a polynomial power generator 2t, 2t + 1 compound words, which is of the following form:

There are 16 block lengths of Reed-Solomon possible (see summary table). The coefficients of the generator polynomials can be calculated using the following algorithm:

Once the generator polynomial calculated, we simply follow the equation to obtain the control polynomial: it is the rest of the division between the data polynomial and the generator polynomial. Coefficients correspond to codewords control.

For Data Matrix symbols containing a total of less than 255 codewords (all symbols rectangular and square sizes less than 48x48), the codewords correction are calculated directly from the data codewords. For other symbols, the codewords correction are calculated in accordance with a procedure of interleaving: dividing the codewords into several data blocks of same size and correction codewords are computed block by block. Each codeword c is part of the group:

- c modulo n

with n the number of blocks for the symbol

For simplicity, for n blocks, the first group consists of data codewords 1, n +1, 2n +1, 3n +1, ... the second group of data codewords 2, n +2, 2n +2, 3n +2,...

An example of interleaving can be viewed as follows:

## Low level encoding

Each codeword is represented in the data matrix by a square part of 8 modules, corresponding to 8 bits. Module 1 has the value of the most significant bit (value 128), while the module 8 has the value of the most insignificant bit (value 1).

It is however possible that a codeword can not be contained entirely within the matrix at a certain location. These can then be separated into parts (2 or 3) to be continued on the other side of the symbol.

In addition, in four cases, the codewords placed in the corners can also be placed differently.

Placement in the matrix follows a certain order. The initial codewords are always placed as follows:

The codewords are then nested in the other, following the oblique lines (45 °) in parallel. The meaning is identical slashes half the time.

It may, depending on the size of the symbol, the four modules of the lower right corner are empty. Symbols 12x12 example, the data matrix is 10x10, which give 100 modules in 12 codewords can be inserted. Knowing that a codeword full 8 modules, a total of 96 modules will be occupied and the last 4 remain empty. To fill, use of alternating black and white modules as shown below:

It will then add the finder pattern and possibly the main internal finder pattern if the number of regions is greater than 1.

## Encoding example

Suppose we want to encode "ENC01." ASCII values of the characters are:

E | N | C | 0 | 1 |

69 | 78 | 67 | 48 | 49 |

Depending on how ASCII, the values of codewords will be:

Codeword | Method | Evaluation | ASCII value | Binary value |
---|---|---|---|---|

C1 | ASCII value + 1 | 69 + 1 | 70 | 0100 0110 |

C2 | ASCII value + 1 | 78 + 1 | 79 | 0100 1111 |

C3 | ASCII value + 1 | 67 + 1 | 68 | 0100 0100 |

C4 | Numeric value + 130 | 1 + 130 | 131 | 1000 0011 |

The value of codewords 1, 2 and 3 were calculated simply by using the "ASCII value + 1", while the codeword 4 encodes two digits using "130 + numeric value."

It has four codewords to encode, therefore we choose the minimum size of the symbol to encode: it is the symbol of size 12x12, offering a data matrix of size 10x10 and 12 codewords to encode compounds 5 codewords and data correction codewords 7. Knowing that 5 codewords data are needed and that we do not have that 4, a codeword filling will be added: it is the codeword value "129", which is always the first to indicate the end of data. It is not necessary to calculate other using the 253-state algorithm.

Codeword | ASCII value | Binary value |
---|---|---|

C5 | 129 | 1000 0001 |

Compared to what we saw earlier, we know that:

- n = 12, the total number of codewords
- k = 5, the number of data codewords
- 2t = 7, the number of correction codewords
- the polynomiali(x) = 70 x
^{4}+ 79 x^{3}+ 68 x^{2}+ 131 x + 129 - the polynomial i(x) x
^{n-k}= 70 x^{11}+ 79 x^{10}+ 68 x^{9}+ 131 x^{8}+ 129 x^{7}.

As a reminder, the operations occur in GF(2^{8}). With the algorithms presented above, the generator polynomial was calculated:

- g(x) = x
^{7}+ 254 x^{6}+ 92 x^{5}+ 240 x^{4}+ 134 x^{3}+ 144 x^{2}+ 68 x + 23

The calculation of the codewords representing polynomial correction is performed like this:

- (i(x) x
^{n-k}) = 4 x^{6}+ 133 x^{5}+ 98 x^{4}+ 49 x^{3}+ 253 x^{2}+ 53 x + 182

The correction codewords, corresponding to the coefficients of the polynomial are:

Codeword | ASCII value | Binary value |
---|---|---|

C6 | 4 | 0000 0100 |

C7 | 133 | 1000 0101 |

C8 | 98 | 0110 0010 |

C9 | 49 | 0011 0001 |

C10 | 253 | 1111 1101 |

C11 | 53 | 0011 0101 |

C12 | 182 | 1011 0110 |

Must then place these codewords in the data matrix according to the appropriate algorithm. The order of placement is as follows:

## Summary table of data Data Matrix

## Barcode Generator Data Matrix Online

Please enter the code: