All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Fam Zheng" <fam@euphon.net>
Subject: Re: [PATCH v2 2/3] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race
Date: Thu, 16 Feb 2023 16:27:54 -0500	[thread overview]
Message-ID: <Y+6f2rc6I4z7HVAy@fedora> (raw)
In-Reply-To: <Y+5LblriH/1jmHcB@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4811 bytes --]

On Thu, Feb 16, 2023 at 04:27:42PM +0100, Kevin Wolf wrote:
> Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> > dma_blk_cb() only takes the AioContext lock around ->io_func(). That
> > means the rest of dma_blk_cb() is not protected. In particular, the
> > DMAAIOCB field accesses happen outside the lock.
> > 
> > There is a race when the main loop thread holds the AioContext lock and
> > invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
> > dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
> > field determines how cancellation proceeds. If dma_aio_cancel() see
> > dbs->acb == NULL while dma_blk_cb() is still running, the request can be
> > completed twice (-ECANCELED and the actual return value).
> > 
> > The following assertion can occur with virtio-scsi when an IOThread is
> > used:
> > 
> >   ../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != NULL' failed.
> > 
> > Fix the race by holding the AioContext across dma_blk_cb(). Now
> > dma_aio_cancel() under the AioContext lock will not see
> > inconsistent/intermediate states.
> > 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Two things that seem to need attention in the review:
> 
> > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> > index 7820fec54c..2463964805 100644
> > --- a/softmmu/dma-helpers.c
> > +++ b/softmmu/dma-helpers.c
> > @@ -113,17 +113,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
> >  static void dma_blk_cb(void *opaque, int ret)
> >  {
> >      DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> > +    AioContext *ctx = dbs->ctx;
> >      dma_addr_t cur_addr, cur_len;
> >      void *mem;
> >  
> >      trace_dma_blk_cb(dbs, ret);
> >  
> > +    aio_context_acquire(ctx);
> 
> During the first iteration, the caller may already hold the AioContext
> lock. In the case of scsi-disk, it does. Locking a second time is fine
> in principle because it's a recursive lock, but we have to be careful
> not to call AIO_WAIT_WHILE() indirectly then because it could deadlock.
> 
> Except for the dbs->common.cb (see below) I don't see any functions that
> would be problematic in this respect. In fact, the one I would be most
> worried about is dbs->io_func, but it was already locked before.
> 
> >      dbs->acb = NULL;
> >      dbs->offset += dbs->iov.size;
> >  
> >      if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
> >          dma_complete(dbs, ret);
> 
> All request callbacks hold the AioContext lock now when they didn't
> before. I wonder if it's worth documenting the locking rules for
> dma_blk_io() in a comment. Could be a separate patch, though.
> 
> You remove the locking in scsi_dma_complete() to compensate. All the
> other callers come from IDE and nvme, which don't take the lock
> internally. Taking the main AioContext lock once is fine, so this looks
> good.
> 
> If it is possible that we already complete on the first iteration, this
> could however also be affected by the case above so that the AioContext
> is locked twice. In this case, the invoked callback must not call
> AIO_WAIT_WHILE() and we would need to audit IDE and nvme.
> 
> Is it possible? In other words, can dbs->sg->nsg be 0? If not, can we
> assert and document it?

In the nsg == 0 case there's another problem: the completion callback
function is invoked and the AIOCB is unref'd before dma_blk_io() returns
the stale AIOCB pointer.

That would lead to problems later because the pattern is typically:

  r->aiocb = dma_blk_io(...);
  ...
  use r and r->aiocb later

So I don't think nsg = 0 makes sense.

Functions I looked at invoke dma_blk_io() only when there are still
bytes to transfer. I think we're safe but I'll admit I'm not 100% sure.

> > -        return;
> > +        goto out;
> >      }
> >      dma_blk_unmap(dbs);
> >  
> > @@ -164,9 +166,9 @@ static void dma_blk_cb(void *opaque, int ret)
> >  
> >      if (dbs->iov.size == 0) {
> >          trace_dma_map_wait(dbs);
> > -        dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
> > +        dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
> 
> Does copying dbs->ctx to a local variable imply that it may change
> during the function? I didn't think so, but if it may, then why is
> calling aio_bh_new() with the old value right?

I changed this line for consistency, not to change behavior or fix a bug.

Regarding AioContext changes, they can't happen because no function that
changes the AioContext is called between this line and the earlier
aio_context_acquire().

(Having to worry about AioContext changes is a pain though and I look
forward to when we can get rid of this lock.)

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2023-02-16 21:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-10 14:32 [PATCH v2 0/3] virtio-scsi: fix SCSIDevice hot unplug with IOThread Stefan Hajnoczi
2023-02-10 14:32 ` [PATCH v2 1/3] scsi: protect req->aiocb with AioContext lock Stefan Hajnoczi
2023-02-15 18:17   ` Eric Blake
2023-02-16 12:34   ` Kevin Wolf
2023-02-10 14:32 ` [PATCH v2 2/3] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race Stefan Hajnoczi
2023-02-15 18:19   ` Eric Blake
2023-02-16 15:27   ` Kevin Wolf
2023-02-16 21:27     ` Stefan Hajnoczi [this message]
2023-02-10 14:32 ` [PATCH v2 3/3] virtio-scsi: reset SCSI devices from main loop thread Stefan Hajnoczi
2023-02-15 18:27   ` Eric Blake
2023-02-17 10:22   ` Kevin Wolf
     [not found]     ` <Y/Tz+qw7thcwO+G3@fedora>
2023-02-23 15:03       ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+6f2rc6I4z7HVAy@fedora \
    --to=stefanha@redhat.com \
    --cc=david@redhat.com \
    --cc=fam@euphon.net \
    --cc=kwolf@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.