qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Yongji Xie <xieyongji@bytedance.com>
Cc: "qemu devel list" <qemu-devel@nongnu.org>,
	"Peter Lieven" <pl@kamp.de>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	qemu-block@nongnu.org, "Eduardo Habkost" <eduardo@habkost.net>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"David Woodhouse" <dwmw2@infradead.org>,
	"Stefan Weil" <sw@weilnetz.de>, "Fam Zheng" <fam@euphon.net>,
	"Julia Suvorova" <jusual@redhat.com>,
	"Ronnie Sahlberg" <ronniesahlberg@gmail.com>,
	xen-devel@lists.xenproject.org, "Hanna Reitz" <hreitz@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	eesposit@redhat.com, "Kevin Wolf" <kwolf@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Paul Durrant" <paul@xen.org>,
	"Aarushi Mehta" <mehta.aaru20@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Anthony Perard" <anthony.perard@citrix.com>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	"Coiby Xu" <Coiby.Xu@gmail.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>
Subject: Re: [PATCH v3 13/20] block/export: rewrite vduse-blk drain code
Date: Tue, 25 Apr 2023 12:42:41 -0400
Message-ID: <20230425164241.GC725672@fedora>
In-Reply-To: <CACycT3suSR+nYhe4z2zuocYsBBVSDBCE+614zT0jfDZCBRveaA@mail.gmail.com>

On Fri, Apr 21, 2023 at 11:36:02AM +0800, Yongji Xie wrote:
> Hi Stefan,
> 
> On Thu, Apr 20, 2023 at 7:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > vduse_blk_detach_ctx() waits for in-flight requests using
> > AIO_WAIT_WHILE(). This is not allowed according to a comment in
> > bdrv_set_aio_context_commit():
> >
> >   /*
> >    * Take the old AioContex when detaching it from bs.
> >    * At this point, new_context lock is already acquired, and we are now
> >    * also taking old_context. This is safe as long as bdrv_detach_aio_context
> >    * does not call AIO_POLL_WHILE().
> >    */
> >
> > Use this opportunity to rewrite the drain code in vduse-blk:
> >
> > - Use the BlockExport refcount so that vduse_blk_exp_delete() is only
> >   called when there are no more requests in flight.
> >
> > - Implement .drained_poll() so in-flight request coroutines are stopped
> >   by the time .bdrv_detach_aio_context() is called.
> >
> > - Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the
> >   .bdrv_detach_aio_context() constraint violation. It's no longer
> >   needed due to the previous changes.
> >
> > - Always handle the VDUSE file descriptor, even in drained sections. The
> >   VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in
> >   drained sections. This ensures that the VDUSE kernel code gets a fast
> >   response.
> >
> > - Suspend virtqueue fd handlers in .drained_begin() and resume them in
> >   .drained_end(). This eliminates the need for the
> >   aio_set_fd_handler(is_external=true) flag, which is being removed from
> >   QEMU.
> >
> > This is a long list but splitting it into individual commits would
> > probably lead to git bisect failures - the changes are all related.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/export/vduse-blk.c | 132 +++++++++++++++++++++++++++------------
> >  1 file changed, 93 insertions(+), 39 deletions(-)
> >
> > diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
> > index f7ae44e3ce..35dc8fcf45 100644
> > --- a/block/export/vduse-blk.c
> > +++ b/block/export/vduse-blk.c
> > @@ -31,7 +31,8 @@ typedef struct VduseBlkExport {
> >      VduseDev *dev;
> >      uint16_t num_queues;
> >      char *recon_file;
> > -    unsigned int inflight;
> > +    unsigned int inflight; /* atomic */
> > +    bool vqs_started;
> >  } VduseBlkExport;
> >
> >  typedef struct VduseBlkReq {
> > @@ -41,13 +42,24 @@ typedef struct VduseBlkReq {
> >
> >  static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
> >  {
> > -    vblk_exp->inflight++;
> > +    if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
> 
> I wonder why we need to use atomic operations here.

The inflight counter is only modified by the vhost-user export thread,
but it may be read by another thread here:

  static bool vduse_blk_drained_poll(void *opaque)
  {
      BlockExport *exp = opaque;
      VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);

      return qatomic_read(&vblk_exp->inflight) > 0;
  }

BlockDevOps->drained_poll() calls are invoked when BlockDriverStates are
drained (e.g. blk_drain_all() and related APIs).
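
As a minimal sketch of that single-writer, multi-reader pattern, here is
a standalone version using C11 <stdatomic.h> in place of QEMU's
qatomic_*() helpers. The Export struct and the function names are
hypothetical stand-ins for illustration, not the code in the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for VduseBlkExport; only the counter matters. */
typedef struct {
    atomic_uint inflight; /* written by one thread, read by others */
} Export;

void export_init(Export *e)
{
    atomic_init(&e->inflight, 0);
}

/* Called only from the thread that processes requests. */
void inflight_inc(Export *e)
{
    atomic_fetch_add(&e->inflight, 1);
}

/* Called only from the thread that processes requests. */
void inflight_dec(Export *e)
{
    unsigned int old = atomic_fetch_sub(&e->inflight, 1);

    assert(old > 0); /* an unbalanced decrement would wrap around */
}

/* May be called from another thread, e.g. the one that is draining. */
bool drained_poll(Export *e)
{
    return atomic_load(&e->inflight) > 0;
}
```

The atomic accesses make the cross-thread read well-defined. They do not
by themselves make the reader wait; the caller of drained_poll() is
expected to keep polling until it returns false.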

> > @@ -355,13 +410,12 @@ static void vduse_blk_exp_delete(BlockExport *exp)
> >      g_free(vblk_exp->handler.serial);
> >  }
> >
> > +/* Called with exp->ctx acquired */
> >  static void vduse_blk_exp_request_shutdown(BlockExport *exp)
> >  {
> >      VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
> >
> > -    aio_context_acquire(vblk_exp->export.ctx);
> > -    vduse_blk_detach_ctx(vblk_exp);
> > -    aio_context_acquire(vblk_exp->export.ctx);
> > +    vduse_blk_stop_virtqueues(vblk_exp);
> 
> Can we add an AIO_WAIT_WHILE() here? Then we don't need to
> increase/decrease the BlockExport refcount during I/O processing.

I don't think so because vduse_blk_exp_request_shutdown() is not the
only place where we wait for requests to complete. There would still
need to be a way to wait for requests to finish (without calling
AIO_WAIT_WHILE()) in vduse_blk_drained_poll().
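
To illustrate the refcount idea from the commit message, here is a
rough, single-threaded sketch. It assumes that the first in-flight
request takes a reference on the export and the last one drops it, which
is my reading of the "== 0" check in vduse_blk_inflight_inc(). The
SketchExport type, the function names, and the "deleted" flag (standing
in for vduse_blk_exp_delete()) are made up for the example, and atomics
are left out to keep it short:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an export with a reference count. */
typedef struct {
    unsigned int refcount; /* one ref for the owner, one while busy */
    unsigned int inflight; /* number of requests being processed */
    bool deleted;          /* stand-in for vduse_blk_exp_delete() */
} SketchExport;

void sketch_init(SketchExport *e)
{
    e->refcount = 1; /* the owner's reference */
    e->inflight = 0;
    e->deleted = false;
}

void sketch_ref(SketchExport *e)
{
    e->refcount++;
}

void sketch_unref(SketchExport *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0) {
        e->deleted = true; /* safe: nothing is in flight anymore */
    }
}

/* The first request pins the export so it cannot be deleted. */
void request_begin(SketchExport *e)
{
    if (e->inflight++ == 0) {
        sketch_ref(e);
    }
}

/* The last request to finish releases the pin. */
void request_end(SketchExport *e)
{
    assert(e->inflight > 0);
    if (--e->inflight == 0) {
        sketch_unref(e);
    }
}
```

With this shape, the shutdown path can drop the owner's reference
without blocking, and deletion happens whenever the last request
completes.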

Stefan
