From: Chris Mason <chris.mason@oracle.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Gregory Farnum <gregory.farnum@dreamhost.com>,
	Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-scsi@vger.kernel.org,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Sven Breuner <sven.breuner@itwm.fraunhofer.de>,
	Chuck Lever <chuck.lever@oracle.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] end-to-end data and metadata corruption detection
Date: Wed, 1 Feb 2012 12:41:31 -0500
Message-ID: <20120201174131.GD16796@shiny>
In-Reply-To: <1328115175.2768.11.camel@dabdike.int.hansenpartnership.com>

On Wed, Feb 01, 2012 at 10:52:55AM -0600, James Bottomley wrote:
> On Wed, 2012-02-01 at 11:45 -0500, Chris Mason wrote:
> > On Tue, Jan 31, 2012 at 11:28:26AM -0800, Gregory Farnum wrote:
> > > On Tue, Jan 31, 2012 at 11:22 AM, Bernd Schubert
> > > <bernd.schubert@itwm.fraunhofer.de> wrote:
> > > > I guess we should talk to developers of other parallel file systems and see
> > > > what they think about it. I think cephfs already uses data integrity
> > > > provided by btrfs, although I'm not entirely sure and need to check the
> > > > code. As I said before, Lustre does network checksums already and *might* be
> > > > interested.
> > > 
> > > Actually, right now Ceph doesn't check btrfs' data integrity
> > > information, but since Ceph doesn't have any data-at-rest integrity
> > > verification it relies on btrfs if you want that. Integrating
> > > integrity verification throughout the system is on our long-term to-do
> > > list.
> > > We too will be sad if using a kernel-level integrity system requires
> > > using DIO, although we could probably work out a way to do
> > > "translation" between our own integrity checksums and the
> > > btrfs-generated ones if we have to (thanks to replication).
> > 
> > DIO isn't really required, but doing this without synchronous writes
> > will get painful in a hurry.  There's nothing wrong with letting the
> > data sit in the page cache after the IO is done though.
> 
> I broadly agree with this, but even if you do sync writes and cache read
> only copies, we still have the problem of how we do the read side
> verification of DIX.  In theory, when you read, you could either get the
> cached copy or an actual read (which will supply protection
> information), so for the cached copy we need to return cached protection
> information implying that we need some way of actually caching it.

Good point, reading from the cached copy is a lower level of protection
because in theory bugs in your SCSI drivers could corrupt the pages
later on.

But I think even without keeping the CRCs attached to the page, there is
value in keeping the cached copy in lots of workloads.  The database is
going to O_DIRECT read (with CRCs checked) and then stuff it into a
database buffer cache for long-term use.  Stuffing it into a page cache
on the kernel side is about the same.

-chris
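
Editor's sketch (not part of the original message): the trade-off discussed
above can be modelled in a few lines. Everything here is an assumption made
for illustration: the class and method names are hypothetical, plain
dictionaries stand in for the device and its protection information,
zlib.crc32 stands in for whatever guard tag the hardware would supply, and
no real O_DIRECT or DIX interface is used. With keep_crcs=False the block is
verified once on the way in and then trusted, as a database buffer cache
would do; with keep_crcs=True the checksum is cached next to the data, so a
later stray write to the cached copy is caught on the next read.

```python
import zlib


class VerifiedBlockCache:
    """Toy model: verify a block when it is first read, then serve it from memory."""

    def __init__(self, device, checksums, keep_crcs=False):
        self.device = device        # stand-in for storage: block number -> bytes
        self.checksums = checksums  # stand-in for protection info: block number -> CRC
        self.keep_crcs = keep_crcs  # whether the CRC is cached alongside the data
        self.cache = {}             # block number -> (bytearray, CRC or None)

    def read(self, block_no):
        if block_no in self.cache:
            data, crc = self.cache[block_no]
            # A cache hit can only be re-verified if the CRC was kept.
            if crc is not None and zlib.crc32(data) != crc:
                raise OSError(f"cached block {block_no} failed its checksum")
            return bytes(data)

        # Cache miss: the "real" read, checked against the protection info.
        data = self.device[block_no]
        crc = self.checksums[block_no]
        if zlib.crc32(data) != crc:
            raise OSError(f"block {block_no} failed its checksum on read")
        self.cache[block_no] = (bytearray(data), crc if self.keep_crcs else None)
        return bytes(data)

    def scribble(self, block_no):
        """Simulate a buggy driver corrupting the cached copy after the I/O."""
        data, _ = self.cache[block_no]
        data[0] ^= 0xFF
```

In this model, a cache built with keep_crcs=False returns the scribbled data
without complaint on the next read, while one built with keep_crcs=True
raises OSError. That is the "lower level of protection" for cached copies
described above, at the cost of storing one checksum per cached block.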
