Data Compression

Client-side compression

See Java Client - Data Serialization and Java Client - Data Compression.

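Server-side compression

Recommendations:

  • On servers with spare CPU capacity, use zstd, which has the highest compression ratio.
  • On servers whose CPU is busy, use lz4, which offers a good balance of compression ratio and speed.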

Compression algorithms supported by the Pegasus server:

  • snappy
  • lz4 (supported since v1.11.2)
  • zstd (supported since v1.11.2)
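
The recommendations and version requirements above can be expressed as a small selection helper. This is only an illustrative sketch: the function names are hypothetical and not part of Pegasus, and it assumes that snappy is available on every server version and that "busy" versus "idle" CPU is decided by the operator.

```python
# Hypothetical helper, not part of Pegasus: encodes the guidance above.
MIN_VERSION = {
    "snappy": (0, 0, 0),   # assumption: treated as available on every version
    "lz4": (1, 11, 2),     # supported since v1.11.2
    "zstd": (1, 11, 2),    # supported since v1.11.2
}


def parse_version(text):
    """Turn 'v1.11.2' or '1.11.2' into a comparable tuple such as (1, 11, 2)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def supported_algorithms(server_version):
    """List the algorithms whose minimum version is met by the server."""
    version = parse_version(server_version)
    return [name for name, minimum in MIN_VERSION.items() if version >= minimum]


def recommend_algorithm(server_version, cpu_is_busy):
    """Prefer lz4 on busy servers and zstd on idle ones, if available."""
    available = supported_algorithms(server_version)
    preferred = "lz4" if cpu_is_busy else "zstd"
    # Assumption: fall back to snappy on servers older than v1.11.2.
    return preferred if preferred in available else "snappy"


if __name__ == "__main__":
    print(recommend_algorithm("v1.11.2", cpu_is_busy=True))
    print(recommend_algorithm("v1.11.2", cpu_is_busy=False))
```

The returned name is the value one would place in rocksdb_compression_type.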

Set the compression algorithm in the configuration file, for example:

[pegasus.server]
    rocksdb_compression_type = lz4

Comparison of compression algorithms (data from the official zstd benchmark):

Compressor name    Ratio    Compression (MB/s)    Decompression (MB/s)
zstd 1.3.4 -1      2.877    470                   1380
zlib 1.2.11 -1     2.743    110                   400
brotli 1.0.2 -0    2.701    410                   430
quicklz 1.5.0 -1   2.238    550                   710
lzo1x 2.09 -1      2.108    650                   830
lz4 1.8.1          2.101    750                   3700
snappy 1.1.4       2.091    530                   1800
lzf 3.6 -1         2.077    400                   860

[Figure: compression-comparation.png]

These results are consistent with the official lz4 benchmark.
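
To make the trade-off concrete, the benchmark figures can be turned into rough size and time estimates for a given data volume. The sketch below copies three rows from the zstd benchmark table and assumes that both throughput columns are measured against the uncompressed data size. Real results depend on the data and hardware, so treat the output as an order-of-magnitude illustration only.

```python
# (ratio, compression MB/s, decompression MB/s), taken from the zstd benchmark.
BENCHMARK = {
    "zstd": (2.877, 470, 1380),
    "lz4": (2.101, 750, 3700),
    "snappy": (2.091, 530, 1800),
}


def estimate(name, input_mb):
    """Estimate compressed size and single-thread time for input_mb of data."""
    ratio, compress_speed, decompress_speed = BENCHMARK[name]
    return {
        "compressed_mb": input_mb / ratio,
        # Assumption: speeds are relative to the uncompressed volume.
        "compress_seconds": input_mb / compress_speed,
        "decompress_seconds": input_mb / decompress_speed,
    }


if __name__ == "__main__":
    for name in BENCHMARK:
        result = estimate(name, 1000)
        print(
            f"{name:<6} size={result['compressed_mb']:.1f} MB "
            f"compress={result['compress_seconds']:.2f} s "
            f"decompress={result['decompress_seconds']:.2f} s"
        )
```

Under these assumptions zstd produces the smallest output, while lz4 spends the least time on both compression and decompression, which matches the server-side recommendations.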

For reference, RocksDB's compression recommendations:

Use options.compression to specify the compression to use. By default it is Snappy. We believe LZ4 is almost always better than Snappy. We leave Snappy as default to avoid unexpected compatibility problems to previous users. LZ4/Snappy is lightweight compression so it usually strikes a good balance between space and CPU usage.

If you want to further reduce the in-memory and have some free CPU to use, you can try to set a heavy-weight compression in the latter by setting options.bottommost_compression. The bottommost level will be compressed using this compression style. Usually the bottommost level contains the majority of the data, so users get an almost optimal space setting, without paying CPU for compressing all the data ever flowing to any level. We recommend ZSTD. If it is not available, Zlib is the second choice.

If you have a lot of free CPU and want to reduce not just space but write amplification too, try to set options.compression to a heavy-weight compression type. We recommend ZSTD. Use Zlib if it is not available.
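
The three cases in the RocksDB advice can be summarized as a small decision function. This is a hedged sketch rather than RocksDB code: the dictionary keys mirror the option names options.compression and options.bottommost_compression from the quoted text, while the function name, the free_cpu labels, and the string values are hypothetical.

```python
def rocksdb_compression_plan(free_cpu, zstd_available=True):
    """Map spare CPU ('little', 'some', 'plenty') to the suggested settings.

    The labels are illustrative; the quoted advice does not define thresholds.
    """
    if free_cpu not in ("little", "some", "plenty"):
        raise ValueError(f"unknown free_cpu label: {free_cpu!r}")

    # ZSTD is the recommended heavy-weight choice, Zlib the fallback.
    heavy = "ZSTD" if zstd_available else "Zlib"

    # Lightweight default: the advice prefers LZ4 over Snappy.
    plan = {"compression": "LZ4"}
    if free_cpu == "some":
        # Compress only the bottommost level with the heavy algorithm.
        plan["bottommost_compression"] = heavy
    elif free_cpu == "plenty":
        # Compress every level with the heavy algorithm.
        plan["compression"] = heavy
    return plan


if __name__ == "__main__":
    for label in ("little", "some", "plenty"):
        print(label, rocksdb_compression_plan(label))
```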
