
Technical principles of deduplication and compression

Created: Jan 21, 2020 03:43:58  Latest reply: Jan 21, 2020 03:48:20
  Rewarded HiCoins: 0 (problem resolved)

Hi, Community!

What are the technical principles of deduplication and compression, especially in terms of how I/Os are processed? I appreciate your help!

Featured Answers
little_fish
Admin Created Jan 21, 2020 03:48:20

Hi Axe,

Here is an overview of the technical principles of deduplication and compression.

Deduplication

  •    Divides I/Os into 4 KB or 8 KB blocks and calculates a fingerprint for each data block using a weak hash algorithm. Looks up each fingerprint in the global fingerprint table to preliminarily identify blocks that may be duplicates.

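  •    Verifies the candidate duplicate blocks. Because a weak hash can collide, each block whose fingerprint matches an existing entry is compared byte by byte with the stored block that has the same fingerprint. The block is a duplicate only if its data is identical to the stored data.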

  •    Adds a mapping entry to the fingerprint index for the duplicate data and increments the reference count of that index entry.

  •    Forwards the fingerprint index to the owning controller and inserts it into the mapping table.

  •    Returns a write completion message.
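The deduplication steps above can be sketched roughly as follows. This is only an illustration, not the array's actual implementation: the `DedupStore` class, the `write_block` and `write_io` names, and the use of CRC32 as the weak hash are all assumptions made for the example, and forwarding to the owning controller is reduced to returning the mapping.

```python
import zlib

BLOCK_SIZE = 8 * 1024  # 8 KB blocks (4 KB is the other size mentioned above)


class DedupStore:
    """Hypothetical in-memory model of a fingerprint table with reference counts."""

    def __init__(self):
        self.fingerprints = {}  # weak fingerprint -> list of block ids
        self.blocks = {}        # block id -> stored data
        self.refcount = {}      # block id -> number of references
        self.next_id = 0

    def write_block(self, data: bytes) -> int:
        # CRC32 stands in for the weak hash; the real algorithm is not named.
        fp = zlib.crc32(data)
        # A fingerprint match is only a hint, so confirm byte by byte.
        for block_id in self.fingerprints.get(fp, []):
            if self.blocks[block_id] == data:
                self.refcount[block_id] += 1  # duplicate: add a reference
                return block_id
        # No identical block found: store the data as a new unique block.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        self.fingerprints.setdefault(fp, []).append(block_id)
        return block_id


def write_io(store: DedupStore, payload: bytes, block_size: int = BLOCK_SIZE):
    """Split one I/O into blocks and return the block ids it maps to."""
    return [
        store.write_block(payload[off:off + block_size])
        for off in range(0, len(payload), block_size)
    ]


store = DedupStore()
mapping = write_io(store, b"A" * 8192 + b"B" * 8192 + b"A" * 8192)
print(mapping, len(store.blocks))
```

In this run the first and third blocks are identical, so only two blocks are stored and the first one ends up with two references.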

Compression

  •    Divides I/Os into blocks of 4 KB, 8 KB, 16 KB, or 32 KB after deduplication. An 8 KB block can be compressed to as little as 512 bytes.

  •    Uses the LZ4 algorithm for compression. Dorado V3 aligns compressed data at a granularity of 1 KB, whereas peer vendors' storage uses a compression alignment granularity of 2 KB.
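To see why the alignment granularity matters, here is a small arithmetic sketch. The `aligned_size` helper and the 2,100-byte compressed size are hypothetical values chosen for illustration; only the 1 KB and 2 KB granularities come from the answer.

```python
def aligned_size(compressed_len: int, granularity: int) -> int:
    """Round a compressed block length up to the next multiple of granularity."""
    if compressed_len <= 0 or granularity <= 0:
        raise ValueError("lengths must be positive")
    # Ceiling division, then scale back to bytes.
    return -(-compressed_len // granularity) * granularity


compressed_len = 2100  # hypothetical: an 8 KB block compressed to 2,100 bytes

size_1k = aligned_size(compressed_len, 1024)  # 1 KB alignment
size_2k = aligned_size(compressed_len, 2048)  # 2 KB alignment

print(size_1k, size_1k - compressed_len)  # space used, padding wasted
print(size_2k, size_2k - compressed_len)
```

With these numbers the block occupies 3,072 bytes under 1 KB alignment and 4,096 bytes under 2 KB alignment, so the finer granularity wastes less padding per block.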

After compression, aggregates the data into full stripes and writes them to disks.
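The full-stripe aggregation can be pictured with a minimal sketch like the one below. The `StripeWriter` class and the 4 KB stripe size are hypothetical, the `written` list merely stands in for the disks, and the answer does not say whether a compressed block may span two stripes; this sketch allows it for simplicity.

```python
class StripeWriter:
    """Hypothetical buffer that only writes out complete stripes."""

    def __init__(self, stripe_size: int):
        self.stripe_size = stripe_size
        self.buffer = bytearray()  # compressed data waiting for a full stripe
        self.written = []          # stands in for stripes written to disks

    def append(self, block: bytes) -> None:
        self.buffer += block
        # Flush every time enough data has accumulated for a full stripe.
        while len(self.buffer) >= self.stripe_size:
            self.written.append(bytes(self.buffer[:self.stripe_size]))
            del self.buffer[:self.stripe_size]


writer = StripeWriter(stripe_size=4096)  # assumed stripe size for the example
writer.append(b"\x00" * 3072)            # aligned compressed block, 3 KB
writer.append(b"\x01" * 2048)            # aligned compressed block, 2 KB
print(len(writer.written), len(writer.buffer))
```

After the two appends, one full 4,096-byte stripe has been written and 1,024 bytes remain buffered until more data arrives.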

I hope this helps! Thanks.


