Hi Axe,
Here are some notes on the technical principles behind deduplication and compression.
Deduplication:
Divides I/Os into 4 KB or 8 KB blocks and calculates a fingerprint for each block using a weak hash algorithm. Looks up each fingerprint in the global fingerprint table to identify candidate duplicate blocks.
Verifies the candidates. Because a weak hash can produce collisions, each candidate block is compared byte by byte with the stored block that has the same fingerprint. The block is treated as a duplicate only if the two are identical.
For each duplicate block, adds a mapping entry that points to the existing fingerprint index entry and increments that entry's reference count.
Forwards the fingerprint index entry to the owning controller, where it is inserted into the mapping table.
Returns a write completion message.
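The deduplication steps above can be sketched in Python roughly as follows. This is a minimal single-process model under stated assumptions, not Dorado's implementation: zlib.crc32 stands in for the unnamed weak hash, 8 KB is used as the block size, and both the controller count and the rule that a controller owns a logical block by "block number modulo controller count" are hypothetical. Overwrites, which would decrement reference counts, are not modeled.

```python
import zlib
from dataclasses import dataclass, field

BLOCK_SIZE = 8 * 1024   # 8 KB blocks (4 KB is the other size mentioned)
NUM_CONTROLLERS = 2     # hypothetical controller count


@dataclass
class IndexEntry:
    """Fingerprint index entry for one stored unique block."""
    block_id: int
    ref_count: int = 1


@dataclass
class DedupStore:
    blocks: list = field(default_factory=list)             # block_id -> stored bytes
    fingerprint_table: dict = field(default_factory=dict)  # fingerprint -> [IndexEntry]
    # One mapping table (logical block number -> block_id) per controller.
    mapping_tables: list = field(
        default_factory=lambda: [{} for _ in range(NUM_CONTROLLERS)])

    def write(self, start_block: int, data: bytes) -> str:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            # Step 1: weak fingerprint and global table lookup (candidates only).
            fp = zlib.crc32(block)
            entry = None
            for candidate in self.fingerprint_table.get(fp, []):
                # Step 2: byte-by-byte comparison rules out weak-hash collisions.
                if self.blocks[candidate.block_id] == block:
                    entry = candidate
                    break
            if entry is not None:
                # Step 3: duplicate, so reference the stored block and bump its count.
                entry.ref_count += 1
            else:
                # Unique data: store it and register its fingerprint.
                entry = IndexEntry(block_id=len(self.blocks))
                self.blocks.append(block)
                self.fingerprint_table.setdefault(fp, []).append(entry)
            # Step 4: record the entry in the owning controller's mapping table
            # (hypothetical ownership rule: block number modulo controller count).
            lbn = start_block + offset // BLOCK_SIZE
            self.mapping_tables[lbn % NUM_CONTROLLERS][lbn] = entry.block_id
        # Step 5: acknowledge the write.
        return "write complete"
```

For example, writing three blocks A, A, B stores only two unique blocks, and the index entry for A ends up with a reference count of 2.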
Compression:
After deduplication, divides I/Os into 4 KB, 8 KB, 16 KB, or 32 KB blocks. The maximum compression ratio is 16:1, meaning an 8 KB block can be compressed to as little as 512 bytes.
Dorado V3 compresses data with the LZ4 algorithm and stores the compressed data at a 1 KB alignment granularity. Peer vendors' storage uses a 2 KB granularity.
After compression, aggregates the compressed data into full stripes and writes them to disks.
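The compression steps above can be sketched the same way. This is again a hypothetical model: Python's standard library has no LZ4, so zlib stands in for it, and the 32 KB full-stripe size is an assumption. For simplicity, a padded block may also straddle two stripes here. The aligned_size helper shows why granularity matters: a block that compresses to 900 bytes occupies 1 KB with 1 KB alignment but 2 KB with 2 KB alignment.

```python
import zlib

CHUNK_SIZE = 8 * 1024     # one of the 4/8/16/32 KB block sizes
STRIPE_SIZE = 32 * 1024   # hypothetical full-stripe payload size


def aligned_size(length: int, granularity: int) -> int:
    """Round a compressed length up to the next multiple of the alignment granularity."""
    return (length + granularity - 1) // granularity * granularity


def compress_and_stripe(data: bytes, granularity: int = 1024):
    """Compress fixed-size blocks, pad each to the alignment granularity,
    and pack the padded blocks into full stripes.

    Returns (stripes, leftover): the complete stripes ready to be written,
    and the partially filled stripe that would wait for more data.
    """
    stripes, current = [], bytearray()
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        compressed = zlib.compress(chunk)  # zlib stands in for LZ4
        # Pad with zero bytes up to the alignment boundary.
        padded = compressed.ljust(aligned_size(len(compressed), granularity), b"\0")
        current += padded
        # Emit a stripe whenever enough data has accumulated to fill one.
        while len(current) >= STRIPE_SIZE:
            stripes.append(bytes(current[:STRIPE_SIZE]))
            del current[:STRIPE_SIZE]
    return stripes, bytes(current)
```

For example, 64 KB of zeros splits into eight blocks that each compress to well under 1 KB and pad to exactly 1 KB. The resulting 8 KB is less than one full stripe, so it stays in the leftover buffer.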
I hope this helps! Thanks.