linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: linux-ext4@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [dm-devel] Some thoughts about providing data block checksumming for ext4
Date: Tue, 4 Nov 2014 21:33:37 -0500
Message-ID: <20141105023337.GA324@thunk.org>
In-Reply-To: <alpine.LRH.2.02.1411041622490.30941@file01.intranet.prod.int.rdu2.redhat.com>

On Tue, Nov 04, 2014 at 04:39:55PM -0500, Mikulas Patocka wrote:
> 
> 
> On Mon, 3 Nov 2014, Theodore Ts'o wrote:
> 
> > But there is a way we can do even better!  If we can manage to
> > compress the block even by a tiny amount, so that a 4k block can be
> > stored in 4092 bytes (which means we need to be able to compress the
> > block by about 0.1%), we can store the checksum inline with the data, which
> > can then be atomically updated assuming a modern drive with a 4k
> > sector size (even a 512e disk will work fine, assuming the partition
> > is properly 4k aligned).  If the block is not sufficiently
> 
> There is still a large number of drives with 512-byte sectors in use. So
> should we use 512-byte blocks instead?
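
A minimal sketch of the inline-checksum layout quoted above, for illustration only. Assumptions that do not come from the thread: zlib stands in for whatever compressor a real implementation would pick, `zlib.crc32` stands in for crc32c, the checksum sits in the last 4 bytes of the block, and the compressed stream is zero-padded to 4092 bytes. `pack_block` and `unpack_block` are hypothetical names; the "not sufficiently compressible" case is simply reported to the caller as None.

```python
import struct
import zlib
from typing import Optional

BLOCK_SIZE = 4096                     # assumed: fs block size == 4k physical sector
CSUM_SIZE = 4                         # 32-bit checksum kept in the same sector
PAYLOAD_MAX = BLOCK_SIZE - CSUM_SIZE  # 4092 bytes left for compressed data


def pack_block(data: bytes) -> Optional[bytes]:
    """Compress one 4k block and append its checksum; None if it does not fit."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("expected exactly one 4k block")
    comp = zlib.compress(data, 9)
    if len(comp) > PAYLOAD_MAX:
        return None  # not compressible enough; the fallback path is out of scope here
    payload = comp.ljust(PAYLOAD_MAX, b"\0")  # zero-pad to exactly 4092 bytes
    csum = zlib.crc32(payload)                # stand-in for crc32c
    return payload + struct.pack("<I", csum)  # data + checksum land in one sector


def unpack_block(block: bytes) -> bytes:
    """Verify the inline checksum, then decompress the payload."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected exactly one 4k block")
    payload, stored = block[:PAYLOAD_MAX], block[PAYLOAD_MAX:]
    if struct.unpack("<I", stored)[0] != zlib.crc32(payload):
        raise OSError("data block checksum mismatch")
    d = zlib.decompressobj()
    data = d.decompress(payload)  # zero padding after the stream goes to d.unused_data
    if not d.eof or len(data) != BLOCK_SIZE:
        raise OSError("corrupt compressed payload")
    return data
```

Note that the zlib container adds a 2-byte header and a 4-byte Adler-32 trailer, so with this particular compressor the data has to shrink by somewhat more than the bare 4 bytes (0.1%) before it fits.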

There are a lot of systems (including Oracle, IIRC) that use 4k blocks
and checksums, and that accept the fact that even though writes are
sent in 4k chunks, it is possible (although fairly rare) to get "torn
writes" after a power failure.
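To illustrate why a per-block checksum turns a torn write into a detectable error rather than silent corruption, here is a toy simulation under stated assumptions: a 4k block is written as eight 512-byte sectors, each sector is individually atomic, sectors reach the media in order, and power fails after k of them. `with_csum`, `csum_ok` and `torn_write` are hypothetical helpers, and the layout (crc32 in the last 4 bytes) is an assumption. A mismatch only detects the tear; it does not recover the lost data.

```python
import struct
import zlib

SECTOR = 512
BLOCK = 4096
SECTORS_PER_BLOCK = BLOCK // SECTOR  # 8 sectors per block on a 512-byte-sector disk


def with_csum(data: bytes) -> bytes:
    """Keep the first 4092 bytes and store their crc32 in the last 4 (assumed layout)."""
    body = data[: BLOCK - 4]
    return body + struct.pack("<I", zlib.crc32(body))


def csum_ok(block: bytes) -> bool:
    """True if the stored checksum matches the block contents."""
    body, stored = block[: BLOCK - 4], block[BLOCK - 4 :]
    return struct.unpack("<I", stored)[0] == zlib.crc32(body)


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate power failure after only `sectors_written` sectors hit the media."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


if __name__ == "__main__":
    old = with_csum(b"A" * BLOCK)
    new = with_csum(b"B" * BLOCK)
    for k in range(SECTORS_PER_BLOCK + 1):
        print(k, csum_ok(torn_write(old, new, k)))
```

Only k = 0 (entirely old) and k = 8 (entirely new) should pass the check; every partial write leaves the old checksum in the last sector paired with a mixed body.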

I'd much rather design for the future and not try to tie ourselves in
knots about the possibility of some torn writes on 512 byte sector
disks.  Many other file systems and databases have made similar
assumptions (and in fact have for years; I remember stories about
Oracle and another enterprise database having to deal with torn
writes eight years ago).

							- Ted

Thread overview: 13+ messages
2014-11-03 23:33 Some thoughts about providing data block checksumming for ext4 Theodore Ts'o
2014-11-04 21:20 ` Andreas Dilger
2014-11-04 23:58   ` Theodore Ts'o
2014-11-04 21:39 ` [dm-devel] " Mikulas Patocka
2014-11-04 22:06   ` Mikulas Patocka
2014-11-05  0:27     ` Mikulas Patocka
2014-11-05 21:37       ` Milan Broz
2014-11-06 12:55         ` Theodore Ts'o
2014-11-05  2:33   ` Theodore Ts'o [this message]
2014-11-26 23:47 ` Darrick J. Wong
2014-11-27  0:07   ` Mike Snitzer
2014-11-27  0:39     ` Darrick J. Wong
2015-01-23 16:46       ` [dm-devel] " Vasily Tarasov
