From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Greg Freemyer <greg.freemyer@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-ext4@vger.kernel.org
Subject: Re: Data integrity built into the storage stack
Date: Tue, 01 Sep 2009 01:19:49 -0400
Message-ID: <yq1hbvntf62.fsf@sermon.lab.mkp.net>
In-Reply-To: <87f94c370908291423ub92922ft2cceab9e34ac6207@mail.gmail.com> (Greg Freemyer's message of "Sat, 29 Aug 2009 17:23:50 -0400")
>>>>> "Greg" == Greg Freemyer <greg.freemyer@gmail.com> writes:
Greg> We already have the scsi data integrity patches that went in last
Greg> winter and I believe fit into the storage stack below the
Greg> filesystem layer.
The filesystems can actually use it. It's exposed at the bio level.
Greg> I do believe there is a patch floating around for device mapper to
Greg> add some integrity capability.
The patch is in mainline. It allows passthrough so the filesystems can
access the integrity features. But DM itself doesn't use any of them,
it merely acts as a conduit.
DIF is inherently tied to the storage device's logical blocks. These are
likely to be smaller than the blocks we're interested in protecting.
However, you could conceivably use the application tag space to add a
checksum with filesystem or MD/DM blocking size granularity. All the
hooks are there.
The application tag space is pretty much only available on disk
drives--array vendors use it for internal purposes. But in the MD/DM
case we're likely to run on raw disk so that's probably ok.
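
As a rough illustration of the application tag idea above, the sketch below
spreads one checksum for a whole filesystem block across the application tags
of its sectors. It is not kernel code. It assumes 512-byte logical blocks, a
4KB filesystem block, and the 8-byte DIF tuple layout (16-bit guard tag,
16-bit application tag, 32-bit reference tag). The helper names, and the
choice of a BLAKE2b digest truncated to 16 bytes, are hypothetical and chosen
only so that the digest fills the eight 16-bit tags exactly.

```python
import hashlib
import struct

SECTOR_SIZE = 512                                   # assumed logical block size
FS_BLOCK_SIZE = 4096                                # assumed filesystem block size
SECTORS_PER_BLOCK = FS_BLOCK_SIZE // SECTOR_SIZE    # 8 sectors, so 8 app tags


def crc16_t10dif(data):
    """Bitwise CRC-16 using the T10 DIF polynomial 0x8BB7 (init 0, MSB first)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def block_checksum_tags(block):
    """Hypothetical scheme: split a 16-byte block digest into eight 16-bit tags."""
    digest = hashlib.blake2b(block, digest_size=16).digest()
    return list(struct.unpack(">8H", digest))


def build_tuples(block, first_lba):
    """Build one 8-byte tuple per sector: guard tag, app tag, reference tag."""
    if len(block) != FS_BLOCK_SIZE:
        raise ValueError("expected exactly one filesystem block")
    app_tags = block_checksum_tags(block)
    tuples = []
    for i in range(SECTORS_PER_BLOCK):
        sector = block[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
        guard = crc16_t10dif(sector)              # per-sector protection
        ref = (first_lba + i) & 0xFFFFFFFF        # low 32 bits of the sector's LBA
        tuples.append(struct.pack(">HHI", guard, app_tags[i], ref))
    return tuples


def verify_block(block, tuples):
    """Reassemble the stored app tags and compare with a fresh block digest."""
    stored = [struct.unpack(">HHI", t)[1] for t in tuples]
    return stored == block_checksum_tags(block)
```

The guard tag still protects each sector individually, while the reassembled
application tags cover the larger block. A real design would also need to
steer clear of the application tag value 0xFFFF, which, as far as I recall,
DIF treats as an escape that disables checking for that sector.
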
That said, I really think btrfs is the right answer to many of the
concerns raised in this thread. Everything is properly checksummed and
can be verified at read time.
The strength of DIX/DIF is that we can detect corruption at write time
while the buffer we care about is still sitting in memory.
So btrfs and DIX/DIF go hand in hand as far as I'm concerned. They
solve different problems but both are squarely aimed at preventing
silent data corruption.
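
The difference between the two checks can be shown with a toy model. This is
only a sketch: `ToyDisk`, `submit_write`, and the use of CRC32 as a stand-in
for the real guard tag are all assumptions made for illustration, not how the
block layer or btrfs is implemented.

```python
import zlib


class IntegrityError(Exception):
    """Raised when data does not match its integrity metadata."""


class ToyDisk:
    """Hypothetical device that checks a per-write checksum before storing."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data, guard):
        # Write-time check (the DIX/DIF idea): reject bad data up front.
        if zlib.crc32(data) != guard:
            raise IntegrityError("guard mismatch on write to LBA %d" % lba)
        self.blocks[lba] = (data, guard)

    def read(self, lba):
        data, guard = self.blocks[lba]
        # Read-time check (the btrfs idea): catches damage that happened later.
        if zlib.crc32(data) != guard:
            raise IntegrityError("checksum mismatch on read of LBA %d" % lba)
        return data


def submit_write(disk, lba, buffer, corrupt_in_flight=False):
    """Send buffer to the disk, reissuing from memory if the write is rejected."""
    guard = zlib.crc32(buffer)        # computed while the buffer is known good
    payload = bytearray(buffer)
    if corrupt_in_flight:
        payload[0] ^= 0xFF            # simulate a bit flip between host and disk
    try:
        disk.write(lba, bytes(payload), guard)
        return "ok"
    except IntegrityError:
        # The original buffer is still in memory, so the write can be reissued.
        disk.write(lba, bytes(buffer), guard)
        return "retried"
```

With the write-time check, the corrupted transfer is rejected while a good
copy is still at hand. With only the read-time check, the same corruption
would be noticed later, when the original buffer may be long gone.
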
I do agree that we have to be more prepared for collateral damage
scenarios. As we discussed at LS we have 4KB drives coming out that can
invalidate previously acknowledged I/Os if they get a subsequent write
failure on a sector. And there's also the issue of fractured writes
when talking to disk arrays. That's really what my I/O topology changes
were all about: Correctness. The fact that they may increase
performance is nice but that was not the main motivator.
--
Martin K. Petersen Oracle Linux Engineering