qemu-devel.nongnu.org archive mirror
From: Raghavendra Gowdappa <rgowdapp@redhat.com>
To: Pranith Kumar Karampuri <pkarampu@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Jeff Cody <jcody@redhat.com>,
	Ric Wheeler <rwheeler@redhat.com>, Jeff Darcy <jdarcy@redhat.com>,
	qemu-block@nongnu.org, Vijay Bellur <vbellur@redhat.com>,
	qemu-devel@nongnu.org, kdhananj@redhat.com,
	raghavendra talur <rtalur@redhat.com>,
	Poornima Gurusiddaiah <pgurusid@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error
Date: Mon, 11 Apr 2016 00:26:50 -0400 (EDT)	[thread overview]
Message-ID: <977766574.2310591.1460348810883.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <570610D7.8000104@redhat.com>


> +Raghavendra G who implemented this option in write-behind, to this
> upstream patch review discussion

Thanks Pranith. Kritika did inform us about the discussion. We have been working on solutions to the problems raised (it was also a long festive weekend in Bangalore).

Sorry for top-posting. I am trying to address two issues raised in this mail thread:
1. No way to identify whether setting an option succeeded in gfapi.
2. Why is retaining the cache after a failed fsync not the default behavior?

1. No way to identify whether setting an option succeeded in gfapi:

I've added Poornima and Raghavendra Talur, who work on gfapi, to assist with this.
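For reference, below is a minimal gfapi sketch of how a client such as QEMU might try to set the write-behind option before initialisation. The volume name and server are hypothetical, and the option name is the one discussed in this thread; the only feedback the caller gets is the return value of glfs_set_xlator_option(), which (point 1 above) may not tell you whether the translator actually recognises the option on older releases.

    #include <stdio.h>
    #include <glusterfs/api/glfs.h>

    int main(void)
    {
        glfs_t *fs = glfs_new("testvol");   /* hypothetical volume name */
        if (!fs)
            return 1;

        /* hypothetical server/port */
        glfs_set_volfile_server(fs, "tcp", "server.example.com", 24007);

        /* Ask write-behind to keep (and retry) cached writes after a
         * failed fsync; must be done before glfs_init(). */
        int ret = glfs_set_xlator_option(fs, "*-write-behind",
                                         "resync-failed-syncs-after-fsync",
                                         "on");
        if (ret < 0) {
            /* This return value is the only indication the caller gets,
             * and an unknown option is not necessarily reported. */
            fprintf(stderr, "could not set write-behind option\n");
        }

        if (glfs_init(fs) < 0) {
            glfs_fini(fs);
            return 1;
        }

        /* ... open files, do I/O ... */

        glfs_fini(fs);
        return 0;
    }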

2. Why is retaining the cache after a failed fsync not the default behavior?

It was mostly to avoid breaking synchronized applications that rely on fsync failures to retry; details of that discussion can be found below. The other reason was that we could not work out POSIX's position on the state of earlier writes after an fsync failure (other filesystems such as xfs and ext4 behave non-uniformly here). More specifically, the question was: is it correct for a write issued before a failed fsync to succeed on the backend storage, i.e. to be persisted after the fsync failure? I've also added Vijay Bellur, who was involved in the earlier discussion, to the cc list.

Following is the earlier discussion we had with Kevin, Jeff Cody and others on the same topic, quoted verbatim below.

<quote>

> > > > > I would actually argue that this setting would be right for everyone,
> > > > > not just qemu. Can you think of a case where keeping the data cached
> > > > > after a failed fsync would actively hurt any application? I can only
> > > > > think of cases where data is unnecessarily lost if data is dropped.
> > > > > 
> > > > 
> > > > I worry about use cases with concurrent writers to the same file and
> > > > how different applications would handle fsync() failures with our new
> > > > behavior.
> > > 
> > > Any specific scenario you're worried about?
> > > 
> > > > Keeping the known old behavior as the default will ensure that we do
> > > > not break anything once this is out. qemu/virt users with gluster are
> > > > accustomed to setting the virt group and hence no additional knobs
> > > > would need to be altered by them.
> > > 
> > > Not changing anything is a safe way to avoid regressions. But it's also
> > > a safe way to leave bugs unfixed. I would say that the current behaviour
> > > is at least borderline buggy and very hard for applications to handle
> > > correctly.
> > 
> > I tend to agree with Kevin on this. If we have an error handling strategy
> > that is posix-compliant, I don't think there is a need to add one more
> > option. Most of the time options tend to be used with default values, which
> > is equivalent to not providing an option at all. However, before doing that
> > it's better we think through whether it can affect any existing deployments
> > adversely (even when they are not posix-compliant).
> > 
> 
> One pattern that I can think of -
> 
> Applications that operate on the same file from different clients through
> some locking or other co-ordination would have this pattern:
> 
> lock (file), write (file), fsync (file), unlock (file);
>  
> Now if the first fsync() fails, based on application logic the offset used
> for the failed write + fsync could be re-utilized by a co-ordinating
> application on another node to write out legitimate data. When control
> returns back to the application that received a failure, the subsequent
> write + fsync can cause data to be overwritten at the old offset along with
> new data being written at the new offset.

Yeah, I agree. Co-ordinated applications on different mounts will have issues if they work on the assumption that after fsync no older writes will hit the backend. Given that there seems to be a fair bit of confusion about the status of retrying older writes after an fsync failure, we can expect some regressions. So, to summarize:

1. Keep the behaviour in patchset 11 of [1] as the default. Both fsync and flush act as barriers and, after a failed fsync/flush, will make sure that either
   a. older writes are synced to the backend, or
   b. older writes are failed and never retried.

2. Modify the behaviour of patchset 11 of [1] to keep failed syncs after a failed fsync and retry them until a flush; after a flush, no retries of failed syncs will be attempted. This behaviour will be introduced as an option.

3. Transient and non-transient errors will be treated similarly and failed syncs will be retried alike.

Does everyone agree on the above points? If yes, I'll modify [1] accordingly.

[1] http://review.gluster.org/#/c/12594/

</quote>
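To make the scenario in the quote above concrete, here is a minimal POSIX sketch of the lock/write/fsync/unlock pattern; the helper name, file descriptor and offsets are hypothetical. When fsync() fails, the caller assumes its data never reached the backend, so a later silent retry of the cached write can overwrite data a coordinating client has since written at the same offset.

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* One round of the pattern: take an exclusive lock on the range, write
     * at an agreed offset, fsync, then release the lock.  Returns 0 on
     * success, -1 if the data cannot be assumed durable. */
    static int locked_write(int fd, off_t off, const char *buf, size_t len)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = off,
            .l_len    = (off_t)len,
        };

        if (fcntl(fd, F_SETLKW, &fl) < 0)
            return -1;

        int ret = 0;
        if (pwrite(fd, buf, len, off) != (ssize_t)len || fsync(fd) < 0) {
            /* The application now treats the offset as unused and may hand
             * it to a peer on another client.  If the write-behind cache
             * silently retries the old write later, it can clobber whatever
             * the peer wrote there. */
            ret = -1;
        }

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        return ret;
    }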


Thread overview: 18+ messages
2016-04-06  3:29 [Qemu-devel] [PATCH for-2.6 0/2] Bug fixes for gluster Jeff Cody
2016-04-06  3:29 ` [Qemu-devel] [PATCH for-2.6 1/2] block/gluster: return correct error value Jeff Cody
2016-04-06 15:00   ` Niels de Vos
2016-04-06  3:29 ` [Qemu-devel] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error Jeff Cody
2016-04-06 11:02   ` Kevin Wolf
2016-04-06 11:19     ` Ric Wheeler
2016-04-06 11:41       ` Kevin Wolf
2016-04-06 11:51         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2016-04-06 13:10           ` Jeff Cody
2016-04-06 13:20             ` Kevin Wolf
2016-04-07  7:48               ` Pranith Kumar Karampuri
2016-04-11  4:26                 ` Raghavendra Gowdappa [this message]
2016-05-09  6:38                   ` Raghavendra Talur
2016-05-09  8:02                     ` Kevin Wolf
2016-04-06 12:44     ` [Qemu-devel] " Jeff Cody
2016-04-06 14:47       ` Niels de Vos
2016-04-06 16:05         ` Jeff Cody
2016-04-07 16:17     ` Jeff Cody
