qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, stefanha@redhat.com, qemu-devel@nongnu.org
Subject: Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
Date: Fri, 26 Jan 2024 14:18:27 +0100	[thread overview]
Message-ID: <ZbOxI9Ar-YDn51Z0@redhat.com> (raw)
In-Reply-To: <08a66849-f190-4756-9b01-666f0d66afb6@redhat.com>

Am 25.01.2024 um 18:32 hat Hanna Czenczek geschrieben:
> On 23.01.24 18:10, Kevin Wolf wrote:
> > Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > From: Stefan Hajnoczi<stefanha@redhat.com>
> > > > 
> > > > Stop depending on the AioContext lock and instead access
> > > > SCSIDevice->requests from only one thread at a time:
> > > > - When the VM is running only the BlockBackend's AioContext may access
> > > >     the requests list.
> > > > - When the VM is stopped only the main loop may access the requests
> > > >     list.
> > > > 
> > > > These constraints protect the requests list without the need for locking
> > > > in the I/O code path.
> > > > 
> > > > Note that multiple IOThreads are not supported yet because the code
> > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > that as future work.
> > > > 
> > > > Signed-off-by: Stefan Hajnoczi<stefanha@redhat.com>
> > > > Reviewed-by: Eric Blake<eblake@redhat.com>
> > > > Message-ID:<20231204164259.1515217-2-stefanha@redhat.com>
> > > > Signed-off-by: Kevin Wolf<kwolf@redhat.com>
> > > > ---
> > > >    include/hw/scsi/scsi.h |   7 +-
> > > >    hw/scsi/scsi-bus.c     | 181 ++++++++++++++++++++++++++++-------------
> > > >    2 files changed, 131 insertions(+), 57 deletions(-)
> > > My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> > > often because of this commit than because of the original bug, i.e. when
> > > repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> > > this tends to happen when unplugging the scsi-hd:
> > > 
> > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > {"return": {}}
> > > qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> > > Assertion `ctx == blk->ctx' failed.
> 
> [...]
> 
> > > I don’t know anything about the problem yet, but as usual, I like
> > > speculation and discovering how wrong I was later on, so one thing I came
> > > across that’s funny about virtio-scsi is that requests can happen even while
> > > a disk is being attached or detached.  That is, Linux seems to probe all
> > > LUNs when a new virtio-scsi device is being attached, and it won’t stop just
> > > because a disk is being attached or removed.  So maybe that’s part of the
> > > problem, that we get a request while the BB is being detached, and
> > > temporarily in an inconsistent state (BDS context differs from BB context).
> > I don't know anything about the problem either, but since you already
> > speculated about the cause, let me speculate about the solution:
> > Can we hold the graph writer lock for the tran_commit() call in
> > bdrv_try_change_aio_context()? And of course take the reader lock for
> > blk_get_aio_context(), but that should be completely unproblematic.
> 
> Actually, now that completely unproblematic part is giving me trouble.  I
> wanted to just put a graph lock into blk_get_aio_context() (making it a
> coroutine with a wrapper)

Which is the first thing I neglected and already not great. We have
calls of blk_get_aio_context() in the SCSI I/O path, and creating a
coroutine and doing at least two context switches simply for this call
is a lot of overhead...

> but callers of blk_get_aio_context() generally assume the context is
> going to stay the BB’s context for as long as their AioContext *
> variable is in scope.

I'm not so sure about that. And taking another step back, I'm actually
also not sure how much it still matters now that they can submit I/O
from any thread.

Maybe the correct solution is to remove the assertion from
blk_get_aio_context() and just always return blk->ctx. If it's in the
middle of a change, you'll either get the old one or the new one. Either
one is fine to submit I/O from, and if you care about changes for other
reasons (like SCSI does), then you need explicit code to protect it
anyway (which SCSI apparently has, but it doesn't work).

> I was tempted to think callers know what happens to the BB they pass
> to blk_get_aio_context(), and it won’t change contexts so easily, but
> then I remembered this is exactly what happens in this case; we run
> scsi_device_for_each_req_async_bh() in one thread (which calls
> blk_get_aio_context()), and in the other, we change the BB’s context.

Let's think a bit more about scsi_device_for_each_req_async()
specifically. This is a function that runs in the main thread. Nothing
will change any AioContext assignment if it doesn't call it. It wants to
make sure that scsi_device_for_each_req_async_bh() is called in the
same AioContext where the virtqueue is processed, so it schedules a BH
and waits for it.

Waiting for it means running a nested event loop that could do anything,
including changing AioContexts. So this is what needs the locking, not
the blk_get_aio_context() call in scsi_device_for_each_req_async_bh().
If we lock before the nested event loop and unlock in the BH, the check
in the BH can become an assertion. (It is important that we unlock in
the BH rather than after waiting because if something takes the writer
lock, we need to unlock during the nested event loop of bdrv_wrlock() to
avoid a deadlock.)

And spawning a coroutine for this looks a lot more acceptable because
it's on a slow path anyway.

In fact, we probably don't technically need a coroutine to take the
reader lock here. We can have a new graph lock function that asserts
that there is no writer (we know because we're running in the main loop)
and then atomically increments the reader count. But maybe that already
complicates things again...

> It seems like there are very few blk_* functions right now that
> require taking a graph lock around it, so I’m hesitant to go that
> route.  But if we’re protecting a BB’s context via the graph write
> lock, I can’t think of a way around having to take a read lock
> whenever reading a BB’s context, and holding it for as long as we
> assume that context to remain the BB’s context.  It’s also hard to
> figure out how long that is, case by case; for example, dma_blk_read()
> schedules an AIO function in the BB context; but we probably don’t
> care that this context remains the BB’s context until the request is
> done.  In the case of scsi_device_for_each_req_async_bh(), we already
> take care to re-schedule it when it turns out the context is outdated,
> so it does seem quite important here, and we probably want to keep the
> lock until after the QTAILQ_FOREACH_SAFE() loop.

Maybe we need to audit all callers. Fortunately, there don't seem to be
too many. At least not direct ones...

> On a tangent, this TOCTTOU problem makes me wary of other blk_*
> functions that query information.  For example, fuse_read() (in
> block/export/fuse.c) truncates requests to the BB length.  But what if
> the BB length changes concurrently between blk_getlength() and
> blk_pread()?  While we can justify using the graph lock for a BB’s
> AioContext, we can’t use it for other metadata like its length.

Hm... Is "tough luck" an acceptable answer? ;-)

Kevin



  reply	other threads:[~2024-01-26 13:19 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
2023-12-21 21:23 ` [PULL 01/33] nbd/server: avoid per-NBDRequest nbd_client_get/put() Kevin Wolf
2023-12-21 21:23 ` [PULL 02/33] nbd/server: only traverse NBDExport->clients from main loop thread Kevin Wolf
2023-12-21 21:23 ` [PULL 03/33] nbd/server: introduce NBDClient->lock to protect fields Kevin Wolf
2023-12-21 21:23 ` [PULL 04/33] block/file-posix: set up Linux AIO and io_uring in the current thread Kevin Wolf
2023-12-21 21:23 ` [PULL 05/33] virtio-blk: add lock to protect s->rq Kevin Wolf
2023-12-21 21:23 ` [PULL 06/33] virtio-blk: don't lock AioContext in the completion code path Kevin Wolf
2023-12-21 21:23 ` [PULL 07/33] virtio-blk: don't lock AioContext in the submission " Kevin Wolf
2023-12-21 21:23 ` [PULL 08/33] block: Fix crash when loading snapshot on inactive node Kevin Wolf
2023-12-21 21:23 ` [PULL 09/33] vl: Improve error message for conflicting -incoming and -loadvm Kevin Wolf
2023-12-21 21:23 ` [PULL 10/33] iotests: Basic tests for internal snapshots Kevin Wolf
2023-12-21 21:23 ` [PULL 11/33] scsi: only access SCSIDevice->requests from one thread Kevin Wolf
2024-01-23 16:40   ` Hanna Czenczek
2024-01-23 17:10     ` Kevin Wolf
2024-01-23 17:23       ` Hanna Czenczek
2024-01-24 12:12       ` Hanna Czenczek
2024-01-24 21:53         ` Stefan Hajnoczi
2024-01-25  9:06           ` Hanna Czenczek
2024-01-25 13:25             ` Stefan Hajnoczi
2024-01-25 17:32       ` Hanna Czenczek
2024-01-26 13:18         ` Kevin Wolf [this message]
2024-01-26 15:24           ` Hanna Czenczek
2024-01-31 20:35             ` Stefan Hajnoczi
2024-02-01 14:10               ` Hanna Czenczek
2024-02-01 14:28                 ` Stefan Hajnoczi
2024-02-01 15:25                   ` Hanna Czenczek
2024-02-01 15:49                     ` Hanna Czenczek
2024-02-02 12:32                     ` Hanna Czenczek
2024-02-06 19:32                       ` Stefan Hajnoczi
2024-01-29 16:30       ` Hanna Czenczek
2024-01-31 10:17         ` Kevin Wolf
2024-02-01  9:43           ` Hanna Czenczek
2024-02-01 10:21             ` Kevin Wolf
2024-02-01 10:35               ` Hanna Czenczek
2024-01-23 17:21     ` Hanna Czenczek
2023-12-21 21:23 ` [PULL 12/33] virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier() Kevin Wolf
2023-12-21 21:23 ` [PULL 13/33] scsi: don't lock AioContext in I/O code path Kevin Wolf
2023-12-21 21:23 ` [PULL 14/33] dma-helpers: don't lock AioContext in dma_blk_cb() Kevin Wolf
2023-12-21 21:23 ` [PULL 15/33] virtio-scsi: replace AioContext lock with tmf_bh_lock Kevin Wolf
2023-12-21 21:23 ` [PULL 16/33] scsi: assert that callbacks run in the correct AioContext Kevin Wolf
2023-12-21 21:23 ` [PULL 17/33] tests: remove aio_context_acquire() tests Kevin Wolf
2023-12-21 21:23 ` [PULL 18/33] aio: make aio_context_acquire()/aio_context_release() a no-op Kevin Wolf
2023-12-21 21:23 ` [PULL 19/33] graph-lock: remove AioContext locking Kevin Wolf
2023-12-21 21:23 ` [PULL 20/33] block: " Kevin Wolf
2023-12-21 21:23 ` [PULL 21/33] block: remove bdrv_co_lock() Kevin Wolf
2023-12-21 21:23 ` [PULL 22/33] scsi: remove AioContext locking Kevin Wolf
2023-12-21 21:23 ` [PULL 23/33] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED() Kevin Wolf
2023-12-21 21:23 ` [PULL 24/33] aio: remove aio_context_acquire()/aio_context_release() API Kevin Wolf
2023-12-21 21:23 ` [PULL 25/33] docs: remove AioContext lock from IOThread docs Kevin Wolf
2023-12-21 21:23 ` [PULL 26/33] scsi: remove outdated AioContext lock comment Kevin Wolf
2023-12-21 21:23 ` [PULL 27/33] job: remove outdated AioContext locking comments Kevin Wolf
2023-12-21 21:23 ` [PULL 28/33] block: " Kevin Wolf
2023-12-21 21:23 ` [PULL 29/33] block-coroutine-wrapper: use qemu_get_current_aio_context() Kevin Wolf
2023-12-21 21:23 ` [PULL 30/33] string-output-visitor: show structs as "<omitted>" Kevin Wolf
2023-12-21 21:23 ` [PULL 31/33] qdev-properties: alias all object class properties Kevin Wolf
2023-12-21 21:23 ` [PULL 32/33] qdev: add IOThreadVirtQueueMappingList property type Kevin Wolf
2023-12-21 21:23 ` [PULL 33/33] virtio-blk: add iothread-vq-mapping parameter Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZbOxI9Ar-YDn51Z0@redhat.com \
    --to=kwolf@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).