linux-fsdevel.vger.kernel.org archive mirror
From: Bernd Schubert <bernd.schubert@fastmail.fm>
To: Chris Mason <chris.mason@oracle.com>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Gregory Farnum <gregory.farnum@dreamhost.com>,
	Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-scsi@vger.kernel.org,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Sven Breuner <sven.breuner@itwm.fraunhofer.de>,
	Chuck Lever <chuck.lever@oracle.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] end-to-end data and metadata corruption detection
Date: Wed, 01 Feb 2012 18:59:44 +0100
Message-ID: <4F297D90.1010509@fastmail.fm>
In-Reply-To: <20120201174131.GD16796@shiny>

On 02/01/2012 06:41 PM, Chris Mason wrote:
> On Wed, Feb 01, 2012 at 10:52:55AM -0600, James Bottomley wrote:
>> On Wed, 2012-02-01 at 11:45 -0500, Chris Mason wrote:
>>> On Tue, Jan 31, 2012 at 11:28:26AM -0800, Gregory Farnum wrote:
>>>> On Tue, Jan 31, 2012 at 11:22 AM, Bernd Schubert
>>>> <bernd.schubert@itwm.fraunhofer.de>  wrote:
>>>>> I guess we should talk to developers of other parallel file systems and see
>>>>> what they think about it. I think cephfs already uses data integrity
>>>>> provided by btrfs, although I'm not entirely sure and need to check the
>>>>> code. As I said before, Lustre does network checksums already and *might* be
>>>>> interested.
>>>>
>>>> Actually, right now Ceph doesn't check btrfs' data integrity
>>>> information, but since Ceph doesn't have any data-at-rest integrity
>>>> verification it relies on btrfs if you want that. Integrating
>>>> integrity verification throughout the system is on our long-term to-do
>>>> list.
>>>> We too will be sad if using a kernel-level integrity system requires
>>>> using DIO, although we could probably work out a way to do
>>>> "translation" between our own integrity checksums and the
>>>> btrfs-generated ones if we have to (thanks to replication).
>>>
>>> DIO isn't really required, but doing this without synchronous writes
>>> will get painful in a hurry.  There's nothing wrong with letting the
>>> data sit in the page cache after the IO is done though.
>>
>> I broadly agree with this, but even if you do sync writes and cache read
>> only copies, we still have the problem of how we do the read side
>> verification of DIX.  In theory, when you read, you could either get the
>> cached copy or an actual read (which will supply protection
>> information), so for the cached copy we need to return cached protection
>> information implying that we need some way of actually caching it.
>
> Good point, reading from the cached copy is a lower level of protection
> because in theory bugs in your scsi drivers could corrupt the pages
> later on.
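To make the point about caching protection information concrete, here is a minimal userspace sketch in C (not kernel code) of a cache entry that keeps T10 DIF protection information next to the cached page, so that a read served from the cache can still be checked. It assumes Type 1 protection (an 8-byte tuple per 512-byte sector: a 16-bit CRC guard tag, a 16-bit application tag and a 32-bit reference tag holding the low 32 bits of the LBA) and a 4096-byte page. `struct cached_page`, `cache_fill_pi()` and `cache_verify_pi()` are hypothetical names; `cache_fill_pi()` generates the PI from the data only to keep the example self-contained, whereas a real read path would keep the PI that the device returned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512
#define PAGE_BYTES       4096
#define SECTORS_PER_PAGE (PAGE_BYTES / SECTOR_SIZE)

/* T10 DIF guard tag: CRC-16, polynomial 0x8BB7, init 0, MSB first. */
static uint16_t crc_t10dif(const uint8_t *p, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(p[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* One 8-byte PI tuple per sector, kept in host byte order here
 * (on the wire the fields are big-endian). */
struct pi_tuple {
    uint16_t guard;   /* CRC-16 of the sector data */
    uint16_t app_tag; /* owned by the application; unused here */
    uint32_t ref_tag; /* Type 1: low 32 bits of the sector's LBA */
};

/* Hypothetical cache entry: the page plus the PI that belongs to it. */
struct cached_page {
    uint64_t start_lba;
    uint8_t data[PAGE_BYTES];
    struct pi_tuple pi[SECTORS_PER_PAGE];
};

/* Stand-in for "keep the PI the device returned": generate it from the data. */
void cache_fill_pi(struct cached_page *pg)
{
    for (int i = 0; i < SECTORS_PER_PAGE; i++) {
        pg->pi[i].guard = crc_t10dif(pg->data + i * SECTOR_SIZE, SECTOR_SIZE);
        pg->pi[i].app_tag = 0;
        pg->pi[i].ref_tag = (uint32_t)(pg->start_lba + (uint64_t)i);
    }
}

/* Check a cache hit against the cached PI before handing the page out. */
bool cache_verify_pi(const struct cached_page *pg)
{
    for (int i = 0; i < SECTORS_PER_PAGE; i++) {
        const uint8_t *sec = pg->data + i * SECTOR_SIZE;
        if (pg->pi[i].guard != crc_t10dif(sec, SECTOR_SIZE))
            return false;
        if (pg->pi[i].ref_tag != (uint32_t)(pg->start_lba + (uint64_t)i))
            return false;
    }
    return true;
}
```

Verifying against cached PI only shows that the page has not changed since it entered the cache, which is the weaker level of protection Chris describes.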

But that only matters if the application wants to verify that the data
really are on disk. For example (client-server scenario):

1) client-A writes a page
2) client-B reads this page

Client-B does not care where the page comes from, as long as it gets
correct data. The network file system in between will also be happy to
use the existing in-cache CRCs for network verification.
The on-disk CRCs matter only if the page is later dropped from the cache
and read again. If they are bad, one of the layers is going to complain
or correct the data.
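The two-client scenario above can be modelled with a small simulation, sketched here under several assumptions: a single server with a write-through page cache, a CRC-32 (IEEE) checksum chosen arbitrarily for both the in-cache and the on-disk checks, and hypothetical names (`struct server`, `server_write()`, `server_read()`, `client_verify()`, `server_drop_cache()`). It shows that client-B's read is verified end to end with the in-cache checksum, and that the on-disk checksum is only consulted once the page has been dropped from the cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PG     4096
#define NPAGES 4

/* CRC-32 (IEEE, reflected, poly 0xEDB88320); an arbitrary choice here. */
static uint32_t crc32_ieee(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Simulated disk block: data plus the checksum stored with it. */
struct disk_block { uint8_t data[PG]; uint32_t crc; };
/* Simulated server page cache entry: data plus its in-cache checksum. */
struct cache_entry { bool present; uint8_t data[PG]; uint32_t crc; };

struct server {
    struct disk_block disk[NPAGES];
    struct cache_entry cache[NPAGES];
    unsigned disk_reads; /* cache misses that had to go to "disk" */
};

/* Step 1: client-A writes a page; the server caches it and writes it through. */
void server_write(struct server *s, unsigned idx, const uint8_t *buf)
{
    uint32_t crc = crc32_ieee(buf, PG);
    memcpy(s->cache[idx].data, buf, PG);
    s->cache[idx].crc = crc;
    s->cache[idx].present = true;
    memcpy(s->disk[idx].data, buf, PG);
    s->disk[idx].crc = crc;
}

/* Step 2: client-B reads. The on-disk checksum is checked only on a miss. */
bool server_read(struct server *s, unsigned idx, uint8_t *out, uint32_t *wire_crc)
{
    struct cache_entry *ce = &s->cache[idx];
    if (!ce->present) {
        struct disk_block *db = &s->disk[idx];
        s->disk_reads++;
        if (crc32_ieee(db->data, PG) != db->crc)
            return false; /* on-disk corruption detected */
        memcpy(ce->data, db->data, PG);
        ce->crc = db->crc;
        ce->present = true;
    }
    memcpy(out, ce->data, PG);
    *wire_crc = ce->crc; /* in-cache checksum sent along with the data */
    return true;
}

/* Client side: verify what arrived over the network. */
bool client_verify(const uint8_t *buf, uint32_t wire_crc)
{
    return crc32_ieee(buf, PG) == wire_crc;
}

/* Evict a page so that the next read has to go to disk. */
void server_drop_cache(struct server *s, unsigned idx)
{
    s->cache[idx].present = false;
}
```

Corrupting the simulated disk block while the page is still cached goes unnoticed until the page is dropped and re-read, which mirrors the point that on-disk CRCs only matter at that moment.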

If the application wants to check the data on disk, it can either use
DIO or something like fadvise(DONTNEED_LOCAL_AND_REMOTE). I have wanted
to propose something like that for some time already; at the very least,
I am not happy that posix_fadvise(POSIX_FADV_DONTNEED) is not passed to
the file system at all.
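For a local file system, the DONTNEED route can be sketched with calls that exist today: flush with fdatasync(), drop the clean cached pages with posix_fadvise(POSIX_FADV_DONTNEED), and read the range again. `reread_from_storage()` is a hypothetical helper. POSIX_FADV_DONTNEED is only advice, and since it is not passed to the file system, on a network file system it cannot reach the server's cache; that is the gap the proposed DONTNEED_LOCAL_AND_REMOTE flag (which does not exist) is meant to close.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: re-read [off, off + len) after asking the kernel to
 * drop its cached copy, and compare it with the expected contents.
 * Returns 0 if the data match, 1 on mismatch, -1 on error (errno set). */
int reread_from_storage(int fd, const void *expect, size_t len, off_t off)
{
    /* Dirty pages cannot be dropped, so flush them to stable storage first. */
    if (fdatasync(fd) != 0)
        return -1;

    /* Advisory only: evict the clean pages of this range from the local cache. */
    int rc = posix_fadvise(fd, off, (off_t)len, POSIX_FADV_DONTNEED);
    if (rc != 0) {
        errno = rc; /* posix_fadvise returns the error instead of setting errno */
        return -1;
    }

    char *buf = malloc(len ? len : 1);
    if (!buf)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done, off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(buf);
            return -1;
        }
        if (n == 0)
            break; /* file is shorter than expected */
        done += (size_t)n;
    }

    int result = (done == len && memcmp(buf, expect, len) == 0) ? 0 : 1;
    free(buf);
    return result;
}
```

A caller that has just written `buf` at offset 0 would check `reread_from_storage(fd, buf, len, 0) == 0`. Whether the pages are really dropped is up to the kernel, so this is weaker than reading with O_DIRECT.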


Cheers,
Bernd


Thread overview: 25+ messages
2012-01-17 20:15 [LSF/MM TOPIC] end-to-end data and metadata corruption detection Chuck Lever
2012-01-26 12:31   ` Bernd Schubert
2012-01-26 14:53       ` Martin K. Petersen
2012-01-26 16:27           ` Bernd Schubert
2012-01-26 23:21             ` James Bottomley
2012-01-31 19:16                 ` Bernd Schubert
2012-01-31 19:21                   ` Chuck Lever
2012-01-31 20:04                     ` Martin K. Petersen
2012-01-31  2:10             ` Martin K. Petersen
2012-01-31 19:22               ` Bernd Schubert
2012-01-31 19:28                 ` Gregory Farnum
2012-02-01 16:45                   ` [Lsf-pc] " Chris Mason
2012-02-01 16:52                     ` James Bottomley
2012-02-01 17:41                         ` Chris Mason
2012-02-01 17:59                           ` Bernd Schubert [this message]
2012-02-01 18:16                             ` James Bottomley
2012-02-01 18:30                               ` Andrea Arcangeli
2012-02-02  9:04                                 ` Bernd Schubert
2012-02-02 19:26                                   ` Andrea Arcangeli
2012-02-02 19:46                                       ` Andreas Dilger
2012-02-02 22:52                                       ` Bernd Schubert
2012-02-01 18:15                       ` Martin K. Petersen
2012-02-01 23:03                           ` Boaz Harrosh
2012-01-31  2:22           ` Martin K. Petersen
2012-01-26 15:36   ` Martin K. Petersen
