From mboxrd@z Thu Jan 1 00:00:00 1970
From: Igor Fedotov
Subject: Re: Adding compression/checksum support for bluestore.
Date: Thu, 31 Mar 2016 20:18:24 +0300
Message-ID: <56FD5BE0.4040801@mirantis.com>
References: <56FD4FEC.4060000@mirantis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-lf0-f43.google.com ([209.85.215.43]:35566 "EHLO
    mail-lf0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1751813AbcCaRS0 (ORCPT );
    Thu, 31 Mar 2016 13:18:26 -0400
Received: by mail-lf0-f43.google.com with SMTP id k79so64254921lfb.2
    for ; Thu, 31 Mar 2016 10:18:26 -0700 (PDT)
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Allen Samuels, Sage Weil
Cc: ceph-devel

On 31.03.2016 19:32, Allen Samuels wrote:
>> But do we really need to store checksums as metadata? What's about
>> pre(post)fixing 4K-4(?) blob with the checksum and store this pair to
>> the disk. IMO we always need checksum values along with blob data
>> thus let's store and read them together. This immediately eliminates
>> the question about the granularity and corresponding overhead... Have
>> I missed something?
> If you store them inline with the data then nothing lines up on boundaries that the HW designers expect and you end up doing things like extra-copying of every data buffer. This will kill performance.
Perhaps you are right. But I'm not sure I fully understand which HW designers you mean here. Are you considering the case where Ceph is embedded into some hardware, and incoming RW requests always operate on aligned data and are expected to keep the same alignment for the data saved to disk?
IMHO proper data alignment in incoming requests is a special case. In general we have no such guarantee. Moreover, compression completely destroys any alignment there is. Thus in many cases we can easily append an additional data portion containing a checksum.
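To make the inline layout concrete, here is a minimal sketch of the "4K-4 payload plus checksum" idea. It is only an illustration under stated assumptions: the 4096-byte block size, the function names pack_inline/unpack_inline, and zlib's CRC32 (used as a stand-in for whatever checksum bluestore would actually use) are all hypothetical and not taken from any Ceph code.

```python
import struct
import zlib

BLOCK = 4096             # assumed on-disk block size
CSUM = 4                 # a CRC32 value takes 4 bytes
PAYLOAD = BLOCK - CSUM   # the "4K-4" data portion of each block


def pack_inline(data: bytes) -> bytes:
    """Split data into 4092-byte pieces and append a CRC32 to each piece."""
    out = bytearray()
    for off in range(0, len(data), PAYLOAD):
        # Zero-pad the last piece so every block is exactly BLOCK bytes.
        piece = data[off:off + PAYLOAD].ljust(PAYLOAD, b"\0")
        out += piece + struct.pack("<I", zlib.crc32(piece))
    return bytes(out)


def unpack_inline(raw: bytes, length: int) -> bytes:
    """Verify every block's trailing CRC32 and return the first `length` bytes."""
    out = bytearray()
    for off in range(0, len(raw), BLOCK):
        block = raw[off:off + BLOCK]
        payload = block[:PAYLOAD]
        (stored,) = struct.unpack("<I", block[PAYLOAD:])
        if zlib.crc32(payload) != stored:
            raise ValueError(f"checksum mismatch in block {off // BLOCK}")
        out += payload
    return bytes(out[:length])
```

With this layout a 16 KiB aligned write no longer fits into four 4K blocks: it spills into a fifth one and every payload has to be copied to a shifted offset, which is the alignment cost raised in the quoted reply.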
>
> If you store them in a separate place (not in metadata, not contiguous to data) then you'll have a full extra I/O that might even move the head (yikes!). Plus you'll have to deal with the RMW of these tiny things.
Agreed - that's not an option.
> Putting them in the metadata is really the only viable option.