From: Igor Fedotov <ifedotov@mirantis.com>
To: ceph-devel@vger.kernel.org
Subject: Adding Data-At-Rest compression support to Ceph
Date: Tue, 22 Sep 2015 20:04:05 +0300
Message-ID: <56018A05.6090100@mirantis.com>

Hi guys,

I have seen some discussion about adding compression support to Ceph.
Let me share some thoughts and proposals on that too.

First of all, I’d like to consider the major implementation options
separately. IMHO this makes sense since they differ in applicability,
value and implementation specifics. Besides, smaller parts are easier
to both understand and implement.

   * Data-At-Rest Compression. This is about compressing the bulk of
the data kept by the Ceph backing tier. The main goal is to reduce
data storage costs. Erasure Coded pools take a similar approach:
cluster capacity increases (i.e. storage cost drops) at the expense of
additional computation. This is especially effective when combined
with a high-performance cache tier.
   * Intermediate Data Compression. This is about compressing
intermediate data such as system journals, caches, etc. The intention
is to improve utilization of expensive storage resources (e.g. solid
state drives or RAM). At the same time, applying compression (a
feature that undoubtedly introduces additional overhead) to crucial
heavy-duty components looks somewhat contradictory.
   * Exchange Data Compression. This one applies to messages
transported between clients and storage cluster components, as well as
to internal cluster traffic. The rationale might be the desire to
improve cluster run-time characteristics, e.g. when data bandwidth is
limited by network or storage device throughput. The potential
drawback is client overburdening: client computation resources might
become a bottleneck since clients take on most of the
compression/decompression work.

Obviously it would be great to support all of the above cases together,
e.g. object compression takes place at the client and cluster components
handle that naturally during the object life-cycle. Unfortunately,
significant complexities arise along the way. Most of them are related
to partial object access, both reading and writing. It looks like huge
development (redesign, refactoring and new code) and testing efforts
would be required. It’s also hard to estimate the value of such
aggregated support at the moment.
Thus the approach I’m suggesting is to make progress incrementally and
consider the cases separately. At the moment my proposal is to add
Data-At-Rest compression to Erasure Coded pools, as the most clear-cut
case from both the implementation and value points of view.

How can we do that?

The Ceph cluster architecture suggests a two-tier storage model for
production use. The cache tier, built on expensive high-performance
storage devices, provides performance. The storage tier, built on
low-cost, lower-performance devices, provides cost-effectiveness and
capacity. The cache tier is supposed to use ordinary data replication,
while the storage tier can use erasure coding (EC) for efficient and
reliable data storage. EC provides lower storage costs than data
replication at the same reliability, at the expense of additional
computation. Thus Ceph already trades computation for capacity.
Data-At-Rest compression is exactly the same kind of trade-off.
Moreover, one can tie EC and Data-At-Rest compression together to
achieve even better storage efficiency.
There are two possible ways to add Data-At-Rest compression:
   * Option 1: Use data compression built into the file system
underneath Ceph.
   * Option 2: Add compression to the Ceph OSD.

At first glance Option 1 looks pretty attractive, but this approach has
some drawbacks:
   * File system lock-in. BTRFS is the only file system supporting
transparent compression among those recommended for Ceph. Moreover,
AFAIK it’s still not recommended for production use, see:
http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/
   * Limited flexibility: one can use only the compression methods and
policies supported by the FS.
   * Data compression depends on volume or mount point properties (and
is bound to the OSD). Without additional support Ceph lacks the ability
to have different compression policies for different pools residing on
the same OSD.
   * File compression control isn’t standardized among file systems.
If (or when) a new compression-capable file system appears, Ceph might
require corresponding changes to handle it properly.

Having compression in the OSD eliminates these drawbacks.
As mentioned above, the purpose of Data-At-Rest compression is pretty
much the same as that of Erasure Coding. It looks quite easy to add
compression support to EC pools. This way one can get even more storage
space in exchange for higher CPU load.
Additional pros for combining compression and erasure coding are:
   * Both EC and compression have complexities with partial writes. EC
pools don’t support partial writes (data append only) and the solution
for that is inserting a cache tier. Thus we can transparently reuse the
same approach for compression.
   * Compression becomes a pool property, so Ceph users have direct
control over which pools compression is applied to.
   * In the two-tier model, original write performance isn’t impacted by
compression: written data goes to the cache uncompressed and there is
no compression latency on that path. Actual compression happens in the
background when data is flushed to the backing storage.
   * There is an additional network bandwidth saving when the primary
OSD performs the compression, as the resulting object shards sent to
the other OSDs are smaller.
   * Data-at-rest compression can also bring an additional performance
improvement for HDD-based storage. Reducing the amount of data written
to slow media can provide a net performance improvement even taking
the compression overhead into account.
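
To put the capacity trade-off above into numbers, here is a rough
back-of-the-envelope sketch in Python. The 3x replication factor, the
EC profile (k=4, m=2) and the 2:1 compression ratio are illustrative
assumptions, not measurements, and the helper name is made up for this
example.

```python
def raw_bytes_per_user_byte(replicas=None, k=None, m=None,
                            compression_ratio=1.0):
    """Raw storage consumed per byte of user data.

    compression_ratio is original_size / compressed_size, so 2.0 means
    2:1. Pass either replicas (replicated pool) or k and m (EC pool).
    """
    if replicas is not None:
        overhead = float(replicas)   # every byte is stored `replicas` times
    else:
        overhead = (k + m) / k       # k data chunks plus m coding chunks
    return overhead / compression_ratio


# Assumed example configurations.
print(raw_bytes_per_user_byte(replicas=3))                       # 3.0
print(raw_bytes_per_user_byte(k=4, m=2))                         # 1.5
print(raw_bytes_per_user_byte(k=4, m=2, compression_ratio=2.0))  # 0.75
```

Under these assumptions an EC 4+2 pool with 2:1 compressible data needs
a quarter of the raw space of a 3x replicated pool. Real ratios depend
entirely on the data being stored.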

Some implementation notes:

The suggested approach is to compress data prior to Erasure Coding.
This reduces the amount of data passed to the coding step and avoids
the need for additional means to disable compression of EC-generated
chunks.
Data-At-Rest compression should support a plugin architecture to enable
multiple compression backends.
The compression engine should mark stored objects with tags that
indicate whether compression took place and which algorithm was used.
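
The three notes above (compress before coding, pluggable backends,
per-object tags) could be sketched roughly as follows. This is Python
for brevity and only an illustration: the registry, the tag names and
the chunk splitter are hypothetical, the splitter is merely a stand-in
for real EC striping (it produces no coding chunks), and falling back
to "none" for incompressible data is my assumption rather than part of
the proposal.

```python
import bz2
import zlib

# Hypothetical plugin registry: algorithm tag -> (compress, decompress).
COMPRESSORS = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def compress_object(data, algorithm):
    """Compress data and return (tags, payload) to be stored."""
    compress, _ = COMPRESSORS[algorithm]
    payload = compress(data)
    if len(payload) >= len(data):
        # Assumption: keep incompressible data as-is and tag it so.
        algorithm, payload = "none", data
    tags = {"compression": algorithm, "original_size": len(data)}
    return tags, payload


def decompress_object(tags, payload):
    """Restore the original data using the algorithm named in the tags."""
    _, decompress = COMPRESSORS[tags["compression"]]
    data = decompress(payload)
    if len(data) != tags["original_size"]:
        raise ValueError("size mismatch after decompression")
    return data


def split_into_data_chunks(payload, k):
    """Stand-in for EC striping: k equal-sized, zero-padded data chunks."""
    chunk_len = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(chunk_len * k, b"\0")
    return [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]


# Compression runs first, so the coding step only sees the smaller payload.
tags, payload = compress_object(b"abc" * 1000, "zlib")
chunks = split_into_data_chunks(payload, 4)
```

Because the tags travel with the object, the read path needs no pool
level knowledge of which backend was used at write time.
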
To avoid (or reduce) backing storage CPU overload caused by
compression/decompression (e.g. this can happen during massive reads),
we can introduce additional means to detect such situations and
temporarily disable compression for current write requests. Since there
is a way to mark objects as compressed/uncompressed, this causes almost
no issues for future handling. Using hardware compression support, e.g.
Intel QuickAssist, can additionally help with this issue.
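
As a minimal sketch of the load-based bypass (again Python, again only
an assumption about how it might look): the threshold value, the use of
the one-minute load average as the overload signal and all function
names are hypothetical. os.getloadavg() is available on Unix only.

```python
import os
import zlib

CPU_LOAD_THRESHOLD = 0.8  # assumed cut-off: load average per core


def load_per_core():
    """One-minute load average divided by the number of cores."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def write_object(data, load=None):
    """Return (tags, payload), skipping compression when the CPU is busy."""
    if load is None:
        load = load_per_core()
    if load >= CPU_LOAD_THRESHOLD:
        return {"compression": "none"}, data
    return {"compression": "zlib"}, zlib.compress(data)


def read_object(tags, payload):
    """The tag alone decides how to read, whatever the load was at write."""
    if tags["compression"] == "zlib":
        return zlib.decompress(payload)
    return payload
```

Objects written during an overload stay uncompressed until something
rewrites them. Whether a background pass should compress them later is
left open here.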

Any thoughts?

Thanks,
Igor.