qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Nicholas Thomas <nick@bytemark.co.uk>
To: Jamie Lokier <jamie@shareable.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	pbonzini@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/3] nbd: Only try to send flush/discard commands if connected to the NBD server
Date: Wed, 24 Oct 2012 13:16:34 +0100	[thread overview]
Message-ID: <1351080994.16343.293.camel@eboracum.office.bytemark.co.uk> (raw)
In-Reply-To: <20121023150225.GB17080@jl-vm1.vm.bytemark.co.uk>

On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
> Nicholas Thomas wrote:
> > On Tue, 2012-10-23 at 12:33 +0200, Kevin Wolf wrote:
> > > Am 22.10.2012 13:09, schrieb nick@bytemark.co.uk:
> > > > 
> > > > This is unlikely to come up now, but is a necessary prerequisite for reconnection
> > > > behaviour.
> > > > 
> > > > Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
> > > > ---
> > > >  block/nbd.c |   13 +++++++++++--
> > > >  1 files changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > What's the real requirement here? Silently ignoring a flush and
> > > returning success for it feels wrong. Why is it correct?
> > > 
> > > Kevin
> > 
> > I just needed to avoid socket operations while s->sock == -1, and
> > extending the existing case of "can't do the command, so pretend I did
> > it" to "can't do the command right now, so pretend..." seemed like an
> > easy way out. 
> 
> Hi Nicholas,
> 
> Ignoring a flush is another way of saying "corrupt my data" in some
> circumstances.  We have options in QEMU already to say whether flushes
> are ignored on normal discs, but if someone's chosen the "I really
> care about my database/filesystem" option, and verified that their NBD
> setup really performs them (in normal circumstances), silently
> dropping flushes from time to time isn't nice.
> 
> I would much rather the guest is forced to wait until reconnection and
> then get a successful flush, if the problem is just that the server
> was done briefly.  Or, if that is too hard, that the flush is 

OK, that's certainly doable. Revised patches hopefully showing up this
week or (at a push) next. 

> Since the I/O _order_ before, and sometimes after, flush, is important
> for data integrity, this needs to be maintained when I/Os are queued in
> the disconnected state -- including those which were inflight at the
> time disconnect was detected and then retried on reconnect.

Hmm, discussing this on IRC I was told that it wasn't necessary to
preserve order - although I forget the fine detail. Depending on the
implementation of qemu's coroutine mutexes, operations may not actually
be performed in order right now - it's not too easy to work out what's
happening. 

I've also just noticed that flush & discard don't take the send_mutex
before writing to the socket. That can't be intentional, surely? Paolo?

I've got a proof-of-concept that makes flush, discard, readv_1 and
writev_1 call a common nbd_co_handle_request function. this retries I/Os
when a connection goes away and comes back, and lets me make this patch
unnecessary. It doesn't preserve I/O ordering, but it won't be too hard
to add that if it is actually necessary.

> Ignoring a discard is not too bad.  However, if discard is retried,
> then I/O order is important in relation to those as well.
> 
> > In the Bytemark case, the NBD server always opens the file O_SYNC, so
> > nbd_co_flush could check in_flight == 0 and return 0/1 based on that;
> > but I'd be surprised if that's true for all NBD servers. Should we be
> > returning 1 here for both "not supported" and "can't do it right now",
> > instead?
> 
> When the server is opening the file O_SYNC, wouldn't it make sense to
> tell QEMU -- and the guest -- that there's no need to send flushes at
> all, as it's equivalent to a disk with no write-cache (or disabled)?

Probably; our NBD server still implements the old-style handshakes, as
far as I recall, and doesn't actually implement the flush or discard
commands as a result. 

  reply	other threads:[~2012-10-24 12:16 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-22 11:09 [Qemu-devel] [PATCH 0/3] NBD reconnection behaviour nick
2012-10-22 11:09 ` [Qemu-devel] [PATCH 1/3] nbd: Only try to send flush/discard commands if connected to the NBD server nick
2012-10-23 10:33   ` Kevin Wolf
2012-10-23 11:08     ` Nicholas Thomas
2012-10-23 11:26       ` Kevin Wolf
2012-10-23 15:02       ` Jamie Lokier
2012-10-24 12:16         ` Nicholas Thomas [this message]
2012-10-24 12:57           ` Kevin Wolf
2012-10-24 14:32             ` Jamie Lokier
2012-10-24 15:16               ` Paolo Bonzini
2012-10-25  6:36               ` Kevin Wolf
2012-10-25 17:09                 ` Jamie Lokier
2012-10-26  7:59                   ` Kevin Wolf
2012-10-24 14:03           ` Paolo Bonzini
2012-10-24 14:10             ` Paolo Bonzini
2012-10-24 14:12               ` Nicholas Thomas
2012-10-22 11:09 ` [Qemu-devel] [PATCH 2/3] nbd: Explicitly disconnect and fail inflight I/O requests on error, then reconnect next I/O request nick
2012-10-23 10:40   ` Kevin Wolf
2012-10-24 14:31   ` Paolo Bonzini
2012-10-22 11:09 ` [Qemu-devel] [PATCH 3/3] nbd: Move reconnection attempts from each new I/O request to a 5-second timer nick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1351080994.16343.293.camel@eboracum.office.bytemark.co.uk \
    --to=nick@bytemark.co.uk \
    --cc=jamie@shareable.org \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).