From: Kevin Wolf <kwolf@redhat.com>
To: Fam Zheng <famz@redhat.com>
Cc: pbonzini@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org,
qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane
Date: Mon, 14 Sep 2015 10:40:35 +0200 [thread overview]
Message-ID: <20150914084035.GA3550@noname.str.redhat.com> (raw)
In-Reply-To: <20150914072730.GB31803@ad.nay.redhat.com>
Am 14.09.2015 um 09:27 hat Fam Zheng geschrieben:
> On Fri, 09/11 14:22, Kevin Wolf wrote:
> > Am 11.09.2015 um 13:46 hat Fam Zheng geschrieben:
> > > On Fri, 09/11 12:39, Kevin Wolf wrote:
> > > > Am 29.07.2015 um 06:42 hat Fam Zheng geschrieben:
> > > > > v2: Switch to disable/enable model. [Paolo]
> > > > >
> > > > > Most existing nested aio_poll()'s in block layer are inconsiderate of
> > > > > dispatching potential new r/w requests from ioeventfds and nbd exports, which
> > > > > might result in responsiveness issues (e.g. bdrv_drain_all will not return when
> > > > > new requests keep coming), or even wrong semantics (e.g. qmp_transaction cannot
> > > > > enforce atomicity due to aio_poll in bdrv_drain_all).
> > > > >
> > > > > Previous attampts to address this issue include new op blocker[1], bdrv_lock[2]
> > > > > and nested AioContext (patches not posted to qemu-devel).
> > > > >
> > > > > This approach is based on the idea proposed by Paolo Bonzini. The original idea
> > > > > is introducing "aio_context_disable_client / aio_context_enable_client to
> > > > > filter AioContext handlers according to the "client", e.g.
> > > > > AIO_CLIENT_DATAPLANE (ioeventfd), AIO_CLIENT_PROTOCOL, AIO_CLIENT_NBD_SERVER,
> > > > > AIO_CLIENT_CONTEXT, ... Extend aio_set_{event_notifier,fd}_handler to pass a
> > > > > client (type)."
> > > > >
> > > > > After this series, block layer aio_poll() will only process those "protocol"
> > > > > fds that are used in block I/O, plus the ctx->notifier for aio_notify(); other
> > > > > aio_poll()'s keep unchanged.
> > > > >
> > > > > The biggest advantage over approaches [1] and [2] is, no change is needed in
> > > > > virtio-{blk,scsi}-dataplane code, also this doesn't depend on converting QMP to
> > > > > coroutines.
> > > >
> > > > It seems that I haven't replied on the mailing list yet, even though I
> > > > think I already said this in person at KVM Forum: This series fixes only
> > > > a special case of the real problem, which is that bdrv_drain/all at a
> > > > single point doesn't make a lot of sense, but needs to be replaced by a
> > > > whole section with exclusive access, like a bdrv_drained_begin/end pair.
> > > >
> > > > To be clear: Anything that works with types of users instead of
> > > > individual users is bound to fall short of being a complete solution. I
> > > > don't prefer partial solutions when we know there is a bigger problem.
> > > >
> > > > This series addresses your immediate need of protecting against new data
> > > > plane requests, which it arguably achieves. The second case I always
> > > > have in mind is Berto's case where he has multiple streaming block jobs
> > > > in the same backing file chain [1].
> > > >
> > > > This involves a bdrv_reopen() of the target BDS to make it writable, and
> > > > bdrv_reopen() uses bdrv_drain_all() so drivers don't have to cope with
> > > > running requests while reopening themselves. It can however involve
> > > > nested event loops for synchronous operations like bdrv_flush(), and if
> > > > those can process completions of block jobs, which can respond by doing
> > > > anything to the respective node, things can go wrong.
> > >
> > > Just to get a better idea of bdrv_drained_begin/end, could you explain how to
> > > use the pair to fix the above problem?
> >
> > How to use it is easy part: In bdrv_reopen_multiple(), you would replace
> > the existing bdrv_drain_all() with begin and you would add the
> > corresponding end right before the return statement.
>
> OK, so that the completion of other jobs won't happen because we only complete
> the relevant requests?
>
> Does block_job_pause() work here?
If you find all block jobs that do something with the BDS,
block_job_pause() would work, yes. But if you go further down that road,
you end up not with a design that considers the problem, but with
special casing every single user. That doesn't feel very sustainable.
> > > > You don't solve this by adding client types (then problematic request
> > > > would be PROTOCOL in your proposal and you can never exclude that), but
> > > > you really need to have bdrv_drained_being/end pairs, where only
> > > > requests issued in between are processed and everything else waits.
> > >
> > > What do you mean by "only requests issued in between are processed"? Where are
> > > the requests from?
> >
> > Generally speaking, you would have code that looks like this:
> >
> > bdrv_drain_begin()
> > ...
> > bdrv_something_synchronous()
> > ...
> > bdrv_drain_end()
> >
> > You want to process everything that is necessary for completing
> > bdrv_something_synchronous(), but nothing else.
> >
> > The trickier question is how to implement this. I know that it's much
> > easier to say that your series doesn't work than actually proposing
> > something else that works...
>
> I see the basic idea but that is still way too obscure. How would
> bdrv_draind_begin/end know what is necessary in completing
> bdrv_something_synchronous()?
That's not about the interface, but the implementation part, which I
tried to answer below.
> > One relatively obvious answer we found when we discussed this a while
> > back was some kind of a recursive CoRwLock (reader = in-flight request;
> > writer = drained section), but that requires obviously that you're
> > running in a coroutine if you want to do something with a drained
> > request queue.
> >
> > I'm also not totally happy with the requirement of taking a reader lock
> > more or less everywhere. But I'm not sure yet if there is a good
> > alternative that can achieve the same.
>
> We're basically trying to prevent new requests from being submitted, but
> couldn't this blocking be a complication itself? A CoRwLock, if any, would be
> implemented with something like a CoQueue, but considering the existing CoQueue
> in BDS - throttled_reqs, aren't those what we want to keep *empty* between the
> bdrv_drain_begin/end pair? Now we're talking about queueing requests to
> another CoQueue, which feels contradictory to me.
You always queue requests _somewhere_. If they aren't queued in the
block layer, they are in the virtio ring or elsewhere. I think the point
of bdrv_drain() is that the block drivers don't have any pending
requests; they can be queued elsewhere without hurting anyone.
Kevin
next prev parent reply other threads:[~2015-09-14 8:40 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-29 4:42 [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane Fam Zheng
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 01/11] aio: Introduce "type" in aio_set_fd_handler and aio_set_event_notifier Fam Zheng
2015-08-27 13:50 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 02/11] aio: Save type to AioHandler Fam Zheng
2015-08-27 13:50 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 03/11] block: Mark fd handlers as "protocol" Fam Zheng
2015-08-27 13:53 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-09-07 4:43 ` Fam Zheng
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 04/11] nbd: Mark fd handlers client type as "nbd server" Fam Zheng
2015-08-27 14:02 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 05/11] aio: Mark ctx->notifier's client type as "context" Fam Zheng
2015-08-27 17:12 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 06/11] dataplane: Mark host notifiers' client type as "dataplane" Fam Zheng
2015-08-27 17:14 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 07/11] aio-posix: introduce aio_{disable, enable}_clients Fam Zheng
2015-08-27 17:23 ` Stefan Hajnoczi
2015-09-07 5:26 ` Fam Zheng
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 08/11] aio-win32: Implement " Fam Zheng
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 09/11] block: Introduce bdrv_aio_poll Fam Zheng
2015-08-27 17:25 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-08-28 11:50 ` Stefan Hajnoczi
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 10/11] block: Replace nested aio_poll with bdrv_aio_poll Fam Zheng
2015-08-28 11:50 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-07-29 4:42 ` [Qemu-devel] [PATCH v2 11/11] block: Only poll block layer fds in bdrv_aio_poll Fam Zheng
2015-07-29 7:36 ` Paolo Bonzini
2015-07-29 7:37 ` Paolo Bonzini
2015-07-29 10:57 ` Fam Zheng
2015-07-29 11:02 ` Paolo Bonzini
2015-07-29 11:53 ` Fam Zheng
2015-07-29 12:03 ` Paolo Bonzini
2015-07-30 1:35 ` Fam Zheng
2015-07-30 13:22 ` Paolo Bonzini
2015-09-09 3:22 ` Fam Zheng
2015-09-11 8:15 ` Paolo Bonzini
2015-09-11 9:14 ` Fam Zheng
2015-09-11 9:36 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2015-09-11 9:43 ` Daniel P. Berrange
2015-09-11 9:44 ` Fam Zheng
2015-09-11 9:54 ` Paolo Bonzini
2015-09-11 10:40 ` Fam Zheng
2015-09-11 10:46 ` Paolo Bonzini
2015-09-11 11:01 ` Fam Zheng
2015-09-11 11:02 ` Paolo Bonzini
2015-09-11 11:12 ` Fam Zheng
2015-09-11 9:45 ` Paolo Bonzini
2015-07-29 7:38 ` [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane Paolo Bonzini
2015-08-28 11:53 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-09-07 6:28 ` Fam Zheng
2015-09-11 10:39 ` [Qemu-devel] " Kevin Wolf
2015-09-11 11:46 ` Fam Zheng
2015-09-11 12:22 ` Kevin Wolf
2015-09-14 7:27 ` Fam Zheng
2015-09-14 8:40 ` Kevin Wolf [this message]
2015-09-28 9:31 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150914084035.GA3550@noname.str.redhat.com \
--to=kwolf@redhat.com \
--cc=famz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.