Suvarna Garge (Editor)

Binary Reed Solomon

Binary Reed-Solomon (BRS) coding, a member of the RS code family, is an encoding method that can recover lost node data in a distributed storage environment. It has the MDS (Maximum Distance Separable) encoding property, and its encoding and decoding rates outperform those of conventional RS coding and optimized CRS (Cauchy Reed-Solomon) coding.

Background

RS coding is a fault-tolerant encoding method for distributed storage. It splits the data into k blocks, each of size l, and uses a coding matrix to generate n coded blocks from the k data blocks, where n = k + m. Each coded block is stored on a storage node. As long as no more than m coded blocks are lost, the system can recover all of the data from any k of the remaining coded blocks.

The traditional RS encoding method uses a Vandermonde matrix as the generator matrix for encoding and uses the inverse of (the surviving rows of) this matrix as the decoding matrix. Traditional RS encoding and decoding operations are all carried out over a large finite field.
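For comparison with the BRS approach, the following is a minimal sketch of traditional Vandermonde-based RS erasure coding. It is an illustration under stated assumptions, not a production codec: arithmetic is done modulo the prime 257 instead of the GF(2^w) fields real implementations use, the evaluation points are 1, ..., n, the code is non-systematic, and the function names are hypothetical. Decoding inverts the k rows of the generator matrix that belong to the surviving blocks, which is the kind of finite-field arithmetic BRS is designed to avoid.

```python
# Sketch of traditional RS erasure coding with a Vandermonde generator matrix.
# Assumption: arithmetic is modulo the prime P = 257 (a stand-in for GF(2^w)),
# so the field inverse of a is pow(a, P - 2, P).
P = 257

def vandermonde(n, k):
    """n x k matrix whose row i is (1, x_i, x_i^2, ..., x_i^(k-1)) with x_i = i + 1."""
    return [[pow(i + 1, j, P) for j in range(k)] for i in range(n)]

def encode(data, n):
    """Multiply the k data symbols by the n x k generator matrix."""
    g = vandermonde(n, len(data))
    return [sum(g_ij * d for g_ij, d in zip(row, data)) % P for row in g]

def invert(a):
    """Gauss-Jordan inversion of a square matrix modulo P."""
    size = len(a)
    m = [row[:] + [int(r == c) for c in range(size)] for r, row in enumerate(a)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], P - 2, P)
        m[col] = [v * inv % P for v in m[col]]
        for r in range(size):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [(v - f * w) % P for v, w in zip(m[r], m[col])]
    return [row[size:] for row in m]

def decode(survivors, k, n):
    """Recover the k data symbols from any k surviving (index, symbol) pairs."""
    g = vandermonde(n, k)
    chosen = survivors[:k]
    inv = invert([g[i] for i, _ in chosen])  # decoding matrix
    vals = [v for _, v in chosen]
    return [sum(a * v for a, v in zip(row, vals)) % P for row in inv]

if __name__ == "__main__":
    data = [10, 20, 30]        # k = 3 data symbols
    code = encode(data, 6)     # n = 6 coded symbols, so up to m = 3 may be lost
    print(decode([(1, code[1]), (4, code[4]), (5, code[5])], 3, 6))  # [10, 20, 30]
```

Any three of the six coded symbols suffice here because every 3 x 3 submatrix of a Vandermonde matrix with distinct evaluation points is invertible.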

The encoding matrix used by the BRS encoding algorithm involves only data block shifts and XOR operations, and decoding uses ZigZag decoding. BRS improves on the performance of traditional RS encoding because RS performs its encoding and decoding operations in a large finite field, whereas BRS performs them in GF(2). As a result, BRS encoding and decoding consist only of shift and XOR operations, which increases coding and decoding speed.

The BRS coding algorithm was proposed by the Advanced Network Technology Laboratory of Peking University, which also released an open-source implementation of BRS coding. In tests in a real environment, BRS encoded and decoded faster than CRS. In the design and implementation of a distributed storage system, using BRS coding gives the system fault tolerance and the ability to regenerate lost data.

BRS encoding principle

The structure of traditional Reed-Solomon codes is based on finite fields, whereas the BRS code is based on shift and XOR operations. BRS encoding is based on the Vandermonde matrix, and its encoding steps are as follows:

1、Divide the original data equally into k blocks, each containing L bits, recorded as

$$S = (s_0, s_1, \ldots, s_{k-1})$$

where $s_i = s_{i,0} s_{i,1} \ldots s_{i,L-1}$, $i = 0, 1, 2, \ldots, k-1$.

2、Build the calibration (parity) data blocks M, which consist of $n-k$ blocks:

$$M = (m_0, m_1, \ldots, m_{n-k-1})$$

where $m_i = \sum_{j=0}^{k-1} s_j(r_j^i)$, $i = 0, 1, \ldots, n-k-1$.

Every addition here is an XOR, and $s_j(r_j^i)$ denotes the original data block $s_j$ with $r_j^i$ "0" bits added to its front. XORing these shifted blocks (shorter ones padded with zeros at the end) forms the parity data block $m_i$. The values $r_j^i$ are given by

$$(r_0^a, r_1^a, \ldots, r_{k-1}^a) = (0, a, 2a, \ldots, (k-1)a)$$

where $a = 0, 1, \ldots, n-k-1$.

3、Store one block per node: nodes $N_i$ ($i = 0, 1, \ldots, n-1$) store $s_0, s_1, \ldots, s_{k-1}, m_0, m_1, \ldots, m_{n-k-1}$, respectively.
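The three steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the Peking University implementation: each block is a list of 0/1 integers, the input length must be a multiple of k, and the function names (`split_blocks`, `shift`, `xor_blocks`, `brs_encode`) are hypothetical.

```python
# Minimal BRS encoding sketch following steps 1-3.
# Assumptions: blocks are lists of 0/1 ints, len(bits) is a multiple of k,
# and all function names are hypothetical.

def split_blocks(bits, k):
    """Step 1: divide the data equally into k blocks of L bits each."""
    if len(bits) % k != 0:
        raise ValueError("data length must be a multiple of k in this sketch")
    L = len(bits) // k
    return [bits[i * L:(i + 1) * L] for i in range(k)]

def shift(block, r):
    """s_j(r): add r zero bits to the front of the block."""
    return [0] * r + block

def xor_blocks(blocks):
    """Bitwise XOR of blocks; shorter blocks act as if padded with zeros at the end."""
    out = [0] * max(len(b) for b in blocks)
    for b in blocks:
        for i, bit in enumerate(b):
            out[i] ^= bit
    return out

def brs_encode(bits, n, k):
    """Steps 2-3: return the n node contents s_0..s_{k-1}, m_0..m_{n-k-1}."""
    s = split_blocks(bits, k)
    parities = []
    for a in range(n - k):
        r = [j * a for j in range(k)]  # (r_0^a, ..., r_{k-1}^a) = (0, a, ..., (k-1)a)
        parities.append(xor_blocks([shift(s[j], r[j]) for j in range(k)]))
    return s + parities  # node N_i stores element i

if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0]  # 18 bits, L = 6
    nodes = brs_encode(data, n=6, k=3)
    print([len(b) for b in nodes])  # [6, 6, 6, 6, 8, 10]
```

Only list slicing, zero-prepending and XOR appear in the encoder; no finite-field multiplication is needed. A real implementation would pack bits into machine words rather than Python lists.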

BRS encoding example

If n = 6 and k = 3, then the shift vectors $ID_a = (r_0^a, r_1^a, r_2^a)$ are $ID_0 = (0, 0, 0)$, $ID_1 = (0, 1, 2)$ and $ID_2 = (0, 2, 4)$. The original data blocks are $s_i = s_{i,0} s_{i,1} \ldots s_{i,L-1}$, where $i = 0, 1, \ldots, k-1$, and the calibration data blocks are $m_i = m_{i,0} m_{i,1} \ldots m_{i,L+i(k-1)-1}$, where $i = 0, 1, \ldots, n-k-1$.

The calibration data blocks are computed as follows, where $\oplus$ denotes bitwise XOR and $s_j(r)$ is $s_j$ with r zero bits added to its front (in this example L = 6):

$$m_0 = s_0(0) \oplus s_1(0) \oplus s_2(0), \text{ so } m_0 = m_{0,0} m_{0,1} \ldots m_{0,5}$$

$$m_1 = s_0(0) \oplus s_1(1) \oplus s_2(2), \text{ so } m_1 = m_{1,0} m_{1,1} \ldots m_{1,7}$$

$$m_2 = s_0(0) \oplus s_1(2) \oplus s_2(4), \text{ so } m_2 = m_{2,0} m_{2,1} \ldots m_{2,9}$$
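Expanding the shifts bit by bit makes the example explicit. Treating any $s_{j,t}$ with $t < 0$ or $t \ge 6$ as 0, each calibration bit is the XOR of at most three data bits:

$$m_{0,i} = s_{0,i} \oplus s_{1,i} \oplus s_{2,i}, \qquad i = 0, 1, \ldots, 5$$

$$m_{1,i} = s_{0,i} \oplus s_{1,i-1} \oplus s_{2,i-2}, \qquad i = 0, 1, \ldots, 7$$

$$m_{2,i} = s_{0,i} \oplus s_{1,i-2} \oplus s_{2,i-4}, \qquad i = 0, 1, \ldots, 9$$

For instance, $m_{1,0} = s_{0,0}$ and $m_{2,9} = s_{2,5}$. Because the largest shift in $m_a$ is $a(k-1)$, the block $m_a$ has $L + a(k-1)$ bits, matching the lengths 6, 8 and 10.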

BRS decoding principle

In the BRS code, the original data is divided into k blocks, $S = (s_0, s_1, \ldots, s_{k-1})$, and encoding produces $n-k$ calibration data blocks, $M = (m_0, m_1, \ldots, m_{n-k-1})$.

Decoding has a necessary condition: the number of undamaged calibration data blocks must be greater than or equal to the number of missing original data blocks; otherwise the data cannot be repaired.

The decoding process is analyzed below.

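Without loss of generality, let n = 6 and k = 3. Writing $x^r s_j$ for $s_j$ with r zero bits added to its front, and + for XOR, the calibration blocks are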

$$m_0 = s_0 + s_1 + s_2$$

$$m_1 = s_0 + x s_1 + x^2 s_2$$

$$m_2 = s_0 + x^2 s_1 + x^4 s_2$$

Suppose $s_0$ is intact while $s_1$ and $s_2$ are missing. Choose $m_1$ and $m_2$ for the repair, and let

$$m_1' = m_1 + s_0$$

$$m_2' = m_2 + s_0$$

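Because $m_1$, $m_2$ and $s_0$ are known, $m_1' = x s_1 + x^2 s_2$ and $m_2' = x^2 s_1 + x^4 s_2$ are known. Comparing bit i on both sides (any bit whose index falls outside $0, \ldots, L-1$ is 0) gives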

$$s_{1,i-2} = m'_{2,i} + s_{2,i-4}$$

$$s_{2,i-2} = m'_{1,i} + s_{1,i-1}$$
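Unrolling the two formulas for the first few bits shows the zigzag order in which they are applied (bits with a negative index are 0):

$$s_{1,0} = m'_{2,2}, \qquad s_{1,1} = m'_{2,3}$$

$$s_{2,0} = m'_{1,2} + s_{1,1}$$

$$s_{1,2} = m'_{2,4} + s_{2,0}$$

$$s_{2,1} = m'_{1,3} + s_{1,2}$$

$$s_{1,3} = m'_{2,5} + s_{2,1}$$

Each newly recovered bit of $s_1$ unlocks the next bit of $s_2$, and vice versa.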

Each cycle of these iterative formulas yields two bit values (one bit of $s_1$ and one bit of $s_2$). Since each original data block is L bits long, repeating the cycle L times recovers every unknown bit of the missing blocks. Other erasure patterns are decoded by analogous reasoning.
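The decoding analysis above can be turned into a short Python sketch for exactly this case (n = 6, k = 3, $s_0$ intact, $s_1$ and $s_2$ lost, repaired from $m_1$ and $m_2$). It is an illustration under stated assumptions, not the reference decoder: blocks are lists of 0/1 integers, and `zigzag_decode` and the test-data helper `parity` are hypothetical names. Two surviving calibration blocks are used for two missing data blocks, so the necessary condition holds.

```python
# Zigzag decoding sketch for the n = 6, k = 3 example:
# s_0 survives, s_1 and s_2 are lost, m_1 and m_2 are used for the repair.
# Assumptions: blocks are lists of 0/1 ints; function names are hypothetical.

def parity(blocks, shifts):
    """XOR of blocks, each with shifts[j] zero bits added to its front (test data)."""
    out = [0] * max(len(b) + r for b, r in zip(blocks, shifts))
    for b, r in zip(blocks, shifts):
        for i, bit in enumerate(b):
            out[i + r] ^= bit
    return out

def zigzag_decode(s0, m1, m2, L):
    """Recover s_1 and s_2 (L bits each) from s_0, m_1 = s0+x*s1+x^2*s2, m_2 = s0+x^2*s1+x^4*s2."""
    # m_a' = m_a + s_0 removes the known, unshifted block s_0.
    m1p = [b ^ (s0[i] if i < L else 0) for i, b in enumerate(m1)]
    m2p = [b ^ (s0[i] if i < L else 0) for i, b in enumerate(m2)]
    s1, s2 = [0] * L, [0] * L

    def bit(block, t):
        # Indices outside 0..L-1 are the zero padding introduced by the shifts.
        return block[t] if 0 <= t < L else 0

    for t in range(L):
        # s_{1,i-2} = m'_{2,i} + s_{2,i-4} with i = t + 2
        s1[t] = m2p[t + 2] ^ bit(s2, t - 2)
        # s_{2,i-2} = m'_{1,i} + s_{1,i-1} with i = t + 1 gives bit t - 1 of s_2
        if t >= 1:
            s2[t - 1] = m1p[t + 1] ^ s1[t]
    # The last bit of s_2 comes from i = L + 1, where s_{1,L} is padding.
    s2[L - 1] = m1p[L + 1] ^ bit(s1, L)
    return s1, s2

if __name__ == "__main__":
    s0, s1, s2 = [1, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 0], [1, 1, 0, 0, 1, 1]
    m1 = parity([s0, s1, s2], [0, 1, 2])
    m2 = parity([s0, s1, s2], [0, 2, 4])
    print(zigzag_decode(s0, m1, m2, 6) == (s1, s2))  # True
```

The loop visits each bit position once, so decoding costs O(L) XORs for this erasure pattern; handling arbitrary patterns would need a more general zigzag schedule than this hard-coded one.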

Performance

Experiments show that, on a single-core processor, the BRS encoding rate is about 6 times that of RS encoding and about 1.5 times that of CRS encoding, which meets the target of improving encoding speed by no less than 200% over RS encoding.

Under the same conditions and for different numbers of erased blocks, the BRS decoding rate is about 4 times that of RS decoding and about 1.3 times that of CRS decoding, which meets the target of a 100% decoding speed improvement over RS decoding.

Applications

Distributed systems are now in widespread use. Using erasure codes to store data at the bottom layer of a distributed storage system increases the fault tolerance of the system, and compared with the traditional replication strategy, erasure coding can greatly improve the reliability of the system for the same redundancy.

BRS encoding can be applied to distributed storage systems; for example, BRS can serve as the underlying data encoding when using HDFS. Because of its performance advantages and its similar encoding approach, BRS encoding can replace CRS encoding in distributed systems.

Usage

An open-source implementation of BRS encoding, written in C, is available on GitHub. When designing and implementing a distributed storage system, BRS encoding can be used to store data and give the system its own fault tolerance.
