Re: [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster

From: Rik van Riel <riel@redhat.com>
To: Ric Wheeler <rwheeler@redhat.com>, Kevin Wolf <kwolf@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	pkarampu@redhat.com, rgowdapp@redhat.com, ndevos@redhat.com
Subject: Re: [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster
Date: Wed, 20 Apr 2016 14:37:28 -0400
Message-ID: <1461177448.3200.23.camel@redhat.com>
In-Reply-To: <57175C83.1000402@redhat.com>

On Wed, 2016-04-20 at 06:40 -0400, Ric Wheeler wrote:
> On 04/20/2016 05:24 AM, Kevin Wolf wrote:
> > 
> > Am 20.04.2016 um 03:56 hat Ric Wheeler geschrieben:
> > > 
> > > On 04/19/2016 10:09 AM, Jeff Cody wrote:
> > > > 
> > > > On Tue, Apr 19, 2016 at 08:18:39AM -0400, Ric Wheeler wrote:
> > > > > 
> > > > > On 04/19/2016 08:07 AM, Jeff Cody wrote:
> > > > > > 
> > > > > > Bug fixes for gluster; third patch is to prevent
> > > > > > a potential data loss when trying to recover from
> > > > > > a recoverable error (such as ENOSPC).
> > > > > Hi Jeff,
> > > > > 
> > > > > > Just a note, I have been talking to some of the disk drive people
> > > > > > here at LSF (the kernel summit for file and storage people) and got
> > > > > > a non-public confirmation that individual storage devices (s-ata
> > > > > > drives or scsi) can also dump cache state when a synchronize cache
> > > > > > command fails.  Also followed up with Rik van Riel - in the page
> > > > > > cache in general, when we fail to write back dirty pages, they are
> > > > > > simply marked "clean" (which means effectively that they get
> > > > > > dropped).
> > > > > 
> > > > > > Long winded way of saying that I think that this scenario is not
> > > > > > unique to gluster - any failed fsync() to a file (or block device)
> > > > > > might be an indication of permanent data loss.
> > > > > 
> > > > Ric,
> > > > 
> > > > Thanks.
> > > > 
> > > > > I think you are right, we likely do need to address how QEMU handles fsync
> > > > > failures across the board in QEMU at some point (2.7?).  Another point to
> > > > > consider is that QEMU is cross-platform - so not only do we have different
> > > > > protocols, and filesystems, but also different underlying host OSes as well.
> > > > > It is likely, like you said, that there are other non-gluster scenarios where
> > > > > we have non-recoverable data loss on fsync failure.
> > > > 
> > > > > With Gluster specifically, if we look at just ENOSPC, does this mean that
> > > > > even if Gluster retains its cache after fsync failure, we still won't know
> > > > > that there was no permanent data loss?  If we hit ENOSPC during an fsync, I
> > > > > presume that means Gluster itself may have encountered ENOSPC from a fsync to
> > > > > the underlying storage.  In that case, does Gluster just pass the error up
> > > > > the stack?
> > > > 
> > > > Jeff
> > > I still worry that in many non-gluster situations we will have
> > > permanent data loss here. Specifically, the way the page cache
> > > > works, if we fail to write back cached data *at any time*, a future
> > > > fsync() will get a failure.
> > > And this is actually what saves the semantic correctness. If you threw
> > > away data, any following fsync() must fail. This is of course
> > > inconvenient because you won't be able to resume a VM that is configured
> > > to stop on errors, and it means some data loss, but it's safe because we
> > > never tell the guest that the data is on disk when it really isn't.
> > 
> > > gluster's behaviour (without resync-failed-syncs-after-fsync set) is
> > > different, if I understand correctly. It will throw away the data and
> > > then happily report success on the next fsync() call. And this is what
> > > causes not only data loss, but corruption.
> Yes, that makes sense to me - the kernel will remember that it could not write
> data back from the page cache and the future fsync() will see an error.
> 
> > 
> > 
> > [ Hm, or having read what's below... Did I misunderstand and Linux
> >    returns failure only for a single fsync() and on the next one it
> >    returns success again? That would be bad. ]
> I would need to think through that scenario with the memory management people to
> see if that could happen.

It could definitely happen.

1) block on disk contains contents A
2) page cache gets contents B written to it
3) fsync fails
4) page with contents B gets evicted from memory
5) block with contents A gets read from disk
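
The five steps above can be walked through with a toy model. This is
only a sketch, not kernel code: ToyPageCache, its disk_full switch and
run_scenario() are made-up names for illustration, and the model
assumes (as described earlier in the thread) that a page whose
writeback fails is simply marked clean.

```python
class ToyPageCache:
    """Toy model of one disk block plus its page-cache copy (illustrative only)."""

    def __init__(self, on_disk):
        self.disk = on_disk      # contents of the block on stable storage
        self.page = None         # cached copy of the block, if any
        self.dirty = False       # True while the cached copy is newer than disk
        self.disk_full = False   # hypothetical switch that makes writeback fail

    def write(self, data):
        # A write only touches the page cache and marks the page dirty.
        self.page = data
        self.dirty = True

    def fsync(self):
        """Return True on success, False if writeback failed."""
        if not self.dirty:
            return True          # nothing to write back
        if self.disk_full:
            # Assumption taken from the thread: the page is marked clean
            # even though its contents never reached the disk.
            self.dirty = False
            return False
        self.disk = self.page
        self.dirty = False
        return True

    def evict(self):
        # Only clean pages may be dropped under memory pressure.
        if not self.dirty:
            self.page = None

    def read(self):
        # A cache miss re-reads the block from disk.
        if self.page is None:
            self.page = self.disk
        return self.page


def run_scenario():
    cache = ToyPageCache(on_disk="A")  # 1) block on disk contains A
    cache.write("B")                   # 2) page cache gets B
    cache.disk_full = True
    first = cache.fsync()              # 3) fsync fails
    cache.evict()                      # 4) page with B is evicted
    data = cache.read()                # 5) A is read back from disk
    second = cache.fsync()             # nothing is dirty any more
    return first, data, second
```

In this model the second fsync() succeeds only because nothing is dirty
any more, while the reader sees A again; what a real kernel reports on
a later fsync() is exactly the open question in this thread and may
differ by kernel version and filesystem.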
