From mboxrd@z Thu Jan 1 00:00:00 1970
From: Igor Fedotov
Subject: Re: Adding Data-At-Rest compression support to Ceph
Date: Thu, 24 Sep 2015 18:56:06 +0300
Message-ID: <56041D16.6060805@mirantis.com>
References: <56018A05.6090100@mirantis.com> <56029F66.3070503@mirantis.com> <5602C48C.4010009@mirantis.com> <5604131E.2030408@mirantis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-la0-f43.google.com ([209.85.215.43]:36026 "EHLO mail-la0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756784AbbIXP4L (ORCPT ); Thu, 24 Sep 2015 11:56:11 -0400
Received: by lacao8 with SMTP id ao8so68726782lac.3 for ; Thu, 24 Sep 2015 08:56:09 -0700 (PDT)
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Sage Weil
Cc: Gregory Farnum, ceph-devel

On 24.09.2015 18:34, Sage Weil wrote:
> I was also assuming each stripe unit would be independently
> compressed, but I didn't think about the efficiency. This approach
> implies that you'd want a relatively large stripe size (100s of KB or
> more). Hmm, a quick google search suggests the zlib compression window
> is only 32KB anyway, which isn't so big. The more aggressive
> algorithms probably aren't what people would reach for anyway for CPU
> utilization reasons... I guess?
>
> sage

There is probably no need for strict alignment with the stripe size. We can dynamically use the block sizes that the client provides on write. If a client writes in stripes, then we compress that block. If others use larger blocks (e.g. the caching agent on flush), we can use that size or split the provided block into several smaller chunks (e.g. up to a maximum of N*stripe_size) to reduce overhead on random reads. Even if a client uses dynamic block sizes (low-level RADOS use?), we can rely on them in some way without a static binding to the stripe size. This is surely much easier when only appends are permitted.
The general "random writes" case will be more complex.

Thanks,
Igor
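
P.S. To make the chunking idea above a bit more concrete, here is a rough sketch for the append-only case. It is written in Python rather than Ceph's C++, and it is only an illustration under stated assumptions, not a proposed implementation: the names `Extent`, `compress_write` and `read_range` are hypothetical, zlib stands in for whatever compressor is chosen, and extents are assumed never to overlap (appends only). Each client write is split into chunks of at most N*stripe_size, each chunk is compressed independently, and a read decompresses only the chunks that overlap the requested range.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    """Hypothetical record for one independently compressed chunk."""
    logical_off: int  # offset of the chunk in the uncompressed object
    logical_len: int  # uncompressed length of the chunk
    data: bytes       # compressed payload


def compress_write(buf: bytes, base_off: int,
                   stripe_size: int, n: int) -> List[Extent]:
    """Split one client write into chunks of at most n * stripe_size
    bytes and compress each chunk on its own."""
    max_chunk = n * stripe_size
    extents = []
    for pos in range(0, len(buf), max_chunk):
        chunk = buf[pos:pos + max_chunk]
        extents.append(Extent(base_off + pos, len(chunk),
                              zlib.compress(chunk)))
    return extents


def read_range(extents: List[Extent], off: int, length: int) -> bytes:
    """Serve a random read by decompressing only overlapping chunks.

    Assumes extents are sorted by logical_off and do not overlap,
    which holds when only appends are permitted.
    """
    end = off + length
    out = bytearray()
    for e in extents:
        e_end = e.logical_off + e.logical_len
        if e_end <= off or e.logical_off >= end:
            continue  # chunk lies entirely outside the requested range
        plain = zlib.decompress(e.data)
        lo = max(off, e.logical_off) - e.logical_off
        hi = min(end, e_end) - e.logical_off
        out += plain[lo:hi]
    return bytes(out)
```

With stripe_size = 4096 and N = 2, a 16 KB write becomes two 8 KB extents, so a small read touches at most two of them. A smaller N lowers the cost of a random read; a larger N gives the compressor more data per chunk. Overwrites would additionally need extent invalidation or merging, which this sketch does not attempt.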