* [PATCH 0/2] block: Allow concurrent BB context changes
From: Hanna Czenczek @ 2024-02-02 14:47 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, Hanna Czenczek, qemu-stable, Stefan Hajnoczi,
Kevin Wolf, Paolo Bonzini, Fam Zheng
Hi,
Without the AioContext lock, a BB's context may kind of change at any
time (unless it has a root node, and I/O requests are pending). That
also means that its own context (BlockBackend.ctx) and that of its root
node can differ sometimes (while the context is being changed).
blk_get_aio_context() doesn't know this yet and asserts that both are
always equal (if there is a root node). Because it's no longer true,
and because callers don't seem to really care about the root node's
context, we can and should remove the assertion and just return the BB's
context.
Beyond that, the question is whether the callers of
blk_get_aio_context() are OK with the context potentially changing
concurrently. Honestly, it isn't entirely clear to me; most look OK,
except for the virtio-scsi code, which operates under the general
assumption that the BB's context is always equal to that of the
virtio-scsi device. I doubt that this assumption always holds (it is
definitely not obvious to me that it would), but then again, this series
will not make matters worse in that regard, and that is what counts for
me now.
One clear point of contention is scsi_device_for_each_req_async(), which
is addressed by patch 2. Right now, it schedules a BH in the BB
context, then the BH double-checks whether the context still fits, and
if not, re-schedules itself. Because virtio-scsi's context is fixed,
this seems to indicate to me that it wants to be able to deal with a
case where BB and virtio-scsi context differ, which seems to break that
aforementioned general virtio-scsi assumption.
Unfortunately, I definitely have to touch that code, because accepting
concurrent changes of AioContexts breaks the double-check (just because
the BB has the right context in that place does not mean it stays in
that context); instead, we must prevent any concurrent change until the
BH is done. Because changing contexts generally involves a drained
section, we can prevent it by keeping the BB in-flight counter elevated.
Question is, how to reason for that. I’d really rather not include the
need to follow the BB context in my argument, because I find that part a
bit fishy.
Luckily, there’s a second, completely different reason for having
scsi_device_for_each_req_async() increment the in-flight counter:
Specifically, scsi_device_purge_requests() probably wants to await full
completion of scsi_device_for_each_req_async(), and we can do that most
easily in the very same way by incrementing the in-flight counter. This
way, the blk_drain() in scsi_device_purge_requests() will not only await
all (cancelled) I/O requests, but also the non-I/O requests.
The fact that this prevents the BB AioContext from changing while the BH
is scheduled/running then is just a nice side effect.
Hanna Czenczek (2):
block-backend: Allow concurrent context changes
scsi: Await request purging
block/block-backend.c | 22 +++++++++++-----------
hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++---------
2 files changed, 32 insertions(+), 20 deletions(-)
--
2.43.0
From: Hanna Czenczek @ 2024-02-02 14:47 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, Hanna Czenczek, qemu-stable, Stefan Hajnoczi,
Kevin Wolf, Paolo Bonzini, Fam Zheng
Since AioContext locks have been removed, a BlockBackend's AioContext
may really change at any time (only exception is that it is often
confined to a drained section, as noted in this patch). Therefore,
blk_get_aio_context() cannot rely on its root node's context always
matching that of the BlockBackend.
In practice, whether they match does not matter anymore anyway: Requests
can be sent to BDSs from any context, so anyone who requests the BB's
context should have no reason to require the root node to have the same
context. Therefore, we can and should remove the assertion to that
effect.
In addition, because the context can be set and queried from different
threads concurrently, it has to be accessed with atomic operations.
Buglink: https://issues.redhat.com/browse/RHEL-19381
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
block/block-backend.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 209eb07528..9c4de79e6b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -44,7 +44,7 @@ struct BlockBackend {
char *name;
int refcnt;
BdrvChild *root;
- AioContext *ctx;
+ AioContext *ctx; /* access with atomic operations only */
DriveInfo *legacy_dinfo; /* null unless created by drive_new() */
QTAILQ_ENTRY(BlockBackend) link; /* for block_backends */
QTAILQ_ENTRY(BlockBackend) monitor_link; /* for monitor_block_backends */
@@ -2414,22 +2414,22 @@ void blk_op_unblock_all(BlockBackend *blk, Error *reason)
}
}
+/**
+ * Return BB's current AioContext. Note that this context may change
+ * concurrently at any time, with one exception: If the BB has a root node
+ * attached, its context will only change through bdrv_try_change_aio_context(),
+ * which creates a drained section. Therefore, incrementing such a BB's
+ * in-flight counter will prevent its context from changing.
+ */
AioContext *blk_get_aio_context(BlockBackend *blk)
{
- BlockDriverState *bs;
IO_CODE();
if (!blk) {
return qemu_get_aio_context();
}
- bs = blk_bs(blk);
- if (bs) {
- AioContext *ctx = bdrv_get_aio_context(blk_bs(blk));
- assert(ctx == blk->ctx);
- }
-
- return blk->ctx;
+ return qatomic_read(&blk->ctx);
}
int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
@@ -2442,7 +2442,7 @@ int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
GLOBAL_STATE_CODE();
if (!bs) {
- blk->ctx = new_context;
+ qatomic_set(&blk->ctx, new_context);
return 0;
}
@@ -2471,7 +2471,7 @@ static void blk_root_set_aio_ctx_commit(void *opaque)
AioContext *new_context = s->new_ctx;
ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
- blk->ctx = new_context;
+ qatomic_set(&blk->ctx, new_context);
if (tgm->throttle_state) {
throttle_group_detach_aio_context(tgm);
throttle_group_attach_aio_context(tgm, new_context);
--
2.43.0
* [PATCH 2/2] scsi: Await request purging
From: Hanna Czenczek @ 2024-02-02 14:47 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, Hanna Czenczek, qemu-stable, Stefan Hajnoczi,
Kevin Wolf, Paolo Bonzini, Fam Zheng
scsi_device_for_each_req_async() currently does not provide any way to
be awaited. One of its callers is scsi_device_purge_requests(), which
therefore currently does not guarantee that all requests are fully
settled when it returns.
We want all requests to be settled, because scsi_device_purge_requests()
is called through the unrealize path, including the one invoked by
virtio_scsi_hotunplug() through qdev_simple_device_unplug_cb(), which
most likely assumes that all SCSI requests are done then.
In fact, scsi_device_purge_requests() already contains a blk_drain(),
but this will not fully await scsi_device_for_each_req_async(), only the
I/O requests it potentially cancels (not the non-I/O requests).
However, we can have scsi_device_for_each_req_async() increment the BB
in-flight counter, and have scsi_device_for_each_req_async_bh()
decrement it when it is done. This way, the blk_drain() will fully
await all SCSI requests to be purged.
This also removes the need for scsi_device_for_each_req_async_bh() to
double-check the current context and potentially re-schedule itself,
should it now differ from the BB's context: Changing a BB's AioContext
with a root node is done through bdrv_try_change_aio_context(), which
creates a drained section. With this patch, we keep the BB in-flight
counter elevated throughout, so we know the BB's context cannot change.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 0a2eb11c56..230313022c 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -120,17 +120,13 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
SCSIRequest *next;
/*
- * If the AioContext changed before this BH was called then reschedule into
- * the new AioContext before accessing ->requests. This can happen when
- * scsi_device_for_each_req_async() is called and then the AioContext is
- * changed before BHs are run.
+ * The BB cannot have changed contexts between this BH being scheduled and
+ * now: BBs' AioContexts, when they have a node attached, can only be
+ * changed via bdrv_try_change_aio_context(), in a drained section. While
+ * we have the in-flight counter incremented, that drain must block.
*/
ctx = blk_get_aio_context(s->conf.blk);
- if (ctx != qemu_get_current_aio_context()) {
- aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh,
- g_steal_pointer(&data));
- return;
- }
+ assert(ctx == qemu_get_current_aio_context());
QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
data->fn(req, data->fn_opaque);
@@ -138,11 +134,16 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
/* Drop the reference taken by scsi_device_for_each_req_async() */
object_unref(OBJECT(s));
+
+ /* Paired with blk_inc_in_flight() in scsi_device_for_each_req_async() */
+ blk_dec_in_flight(s->conf.blk);
}
/*
* Schedule @fn() to be invoked for each enqueued request in device @s. @fn()
* runs in the AioContext that is executing the request.
+ * Keeps the BlockBackend's in-flight counter incremented until everything is
+ * done, so draining it will settle all scheduled @fn() calls.
*/
static void scsi_device_for_each_req_async(SCSIDevice *s,
void (*fn)(SCSIRequest *, void *),
@@ -163,6 +164,8 @@ static void scsi_device_for_each_req_async(SCSIDevice *s,
*/
object_ref(OBJECT(s));
+ /* Paired with blk_dec_in_flight() in scsi_device_for_each_req_async_bh() */
+ blk_inc_in_flight(s->conf.blk);
aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
scsi_device_for_each_req_async_bh,
data);
@@ -1728,11 +1731,20 @@ static void scsi_device_purge_one_req(SCSIRequest *req, void *opaque)
scsi_req_cancel_async(req, NULL);
}
+/**
+ * Cancel all requests, and block until they are deleted.
+ */
void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
{
scsi_device_for_each_req_async(sdev, scsi_device_purge_one_req, NULL);
+ /*
+ * Await all the scsi_device_purge_one_req() calls scheduled by
+ * scsi_device_for_each_req_async(), and all I/O requests that were
+ * cancelled this way, but may still take a bit of time to settle.
+ */
blk_drain(sdev->conf.blk);
+
scsi_device_set_ua(sdev, sense);
}
--
2.43.0
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
From: Stefan Hajnoczi @ 2024-02-06 16:53 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-block, qemu-devel, qemu-stable, Kevin Wolf, Paolo Bonzini,
Fam Zheng
On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote:
> Hi,
>
> Without the AioContext lock, a BB's context may kind of change at any
> time (unless it has a root node, and I/O requests are pending). That
> also means that its own context (BlockBackend.ctx) and that of its root
> node can differ sometimes (while the context is being changed).
>
> blk_get_aio_context() doesn't know this yet and asserts that both are
> always equal (if there is a root node). Because it's no longer true,
> and because callers don't seem to really care about the root node's
> context, we can and should remove the assertion and just return the BB's
> context.
>
> Beyond that, the question is whether the callers of
> blk_get_aio_context() are OK with the context potentially changing
> concurrently. Honestly, it isn't entirely clear to me; most look OK,
> except for the virtio-scsi code, which operates under the general
> assumption that the BB's context is always equal to that of the
> virtio-scsi device. I doubt that this assumption always holds (it is
> definitely not obvious to me that it would), but then again, this series
> will not make matters worse in that regard, and that is what counts for
> me now.
>
> One clear point of contention is scsi_device_for_each_req_async(), which
> is addressed by patch 2. Right now, it schedules a BH in the BB
> context, then the BH double-checks whether the context still fits, and
> if not, re-schedules itself. Because virtio-scsi's context is fixed,
> this seems to indicate to me that it wants to be able to deal with a
> case where BB and virtio-scsi context differ, which seems to break that
> aforementioned general virtio-scsi assumption.
I don't agree with the last sentence: virtio-scsi's context isn't fixed.
The AioContext changes when dataplane is started/stopped. virtio-scsi
switches AioContext between the IOThread's AioContext and the main
loop's qemu_aio_context.
However, virtio-scsi virtqueue processing only happens in the IOThread's
AioContext. Maybe this is what you meant when you said the AioContext is
fixed?
The BH function is aware that the current AioContext might not be the
same as the AioContext at the time the BH was scheduled. That doesn't
break assumptions in the code.
(It may be possible to rewrite virtio-blk, virtio-scsi, and core
VirtIODevice ioeventfd code to use the simpler model where the
AioContext really is fixed because things have changed significantly
over the years, but I looked a few weeks ago and it's difficult work.)
I'm just pointing out that I think this description is incomplete. I
*do* agree with what this patch series is doing :).
* Re: [PATCH 1/2] block-backend: Allow concurrent context changes
From: Stefan Hajnoczi @ 2024-02-06 16:55 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-block, qemu-devel, qemu-stable, Kevin Wolf, Paolo Bonzini,
Fam Zheng
On Fri, Feb 02, 2024 at 03:47:54PM +0100, Hanna Czenczek wrote:
> Since AioContext locks have been removed, a BlockBackend's AioContext
> may really change at any time (only exception is that it is often
> confined to a drained section, as noted in this patch). Therefore,
> blk_get_aio_context() cannot rely on its root node's context always
> matching that of the BlockBackend.
>
> In practice, whether they match does not matter anymore anyway: Requests
> can be sent to BDSs from any context, so anyone who requests the BB's
> context should have no reason to require the root node to have the same
> context. Therefore, we can and should remove the assertion to that
> effect.
>
> In addition, because the context can be set and queried from different
> threads concurrently, it has to be accessed with atomic operations.
>
> Buglink: https://issues.redhat.com/browse/RHEL-19381
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> block/block-backend.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH 2/2] scsi: Await request purging
From: Stefan Hajnoczi @ 2024-02-06 16:56 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-block, qemu-devel, qemu-stable, Kevin Wolf, Paolo Bonzini,
Fam Zheng
On Fri, Feb 02, 2024 at 03:47:55PM +0100, Hanna Czenczek wrote:
> scsi_device_for_each_req_async() currently does not provide any way to
> be awaited. One of its callers is scsi_device_purge_requests(), which
> therefore currently does not guarantee that all requests are fully
> settled when it returns.
>
> We want all requests to be settled, because scsi_device_purge_requests()
> is called through the unrealize path, including the one invoked by
> virtio_scsi_hotunplug() through qdev_simple_device_unplug_cb(), which
> most likely assumes that all SCSI requests are done then.
>
> In fact, scsi_device_purge_requests() already contains a blk_drain(),
> but this will not fully await scsi_device_for_each_req_async(), only the
> I/O requests it potentially cancels (not the non-I/O requests).
> However, we can have scsi_device_for_each_req_async() increment the BB
> in-flight counter, and have scsi_device_for_each_req_async_bh()
> decrement it when it is done. This way, the blk_drain() will fully
> await all SCSI requests to be purged.
>
> This also removes the need for scsi_device_for_each_req_async_bh() to
> double-check the current context and potentially re-schedule itself,
> should it now differ from the BB's context: Changing a BB's AioContext
> with a root node is done through bdrv_try_change_aio_context(), which
> creates a drained section. With this patch, we keep the BB in-flight
> counter elevated throughout, so we know the BB's context cannot change.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++---------
> 1 file changed, 21 insertions(+), 9 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
From: Hanna Czenczek @ 2024-02-07 9:35 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-block, qemu-devel, qemu-stable, Kevin Wolf, Paolo Bonzini,
Fam Zheng
On 06.02.24 17:53, Stefan Hajnoczi wrote:
> On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote:
>> Hi,
>>
>> Without the AioContext lock, a BB's context may kind of change at any
>> time (unless it has a root node, and I/O requests are pending). That
>> also means that its own context (BlockBackend.ctx) and that of its root
>> node can differ sometimes (while the context is being changed).
>>
>> blk_get_aio_context() doesn't know this yet and asserts that both are
>> always equal (if there is a root node). Because it's no longer true,
>> and because callers don't seem to really care about the root node's
>> context, we can and should remove the assertion and just return the BB's
>> context.
>>
>> Beyond that, the question is whether the callers of
>> blk_get_aio_context() are OK with the context potentially changing
>> concurrently. Honestly, it isn't entirely clear to me; most look OK,
>> except for the virtio-scsi code, which operates under the general
>> assumption that the BB's context is always equal to that of the
>> virtio-scsi device. I doubt that this assumption always holds (it is
>> definitely not obvious to me that it would), but then again, this series
>> will not make matters worse in that regard, and that is what counts for
>> me now.
>>
>> One clear point of contention is scsi_device_for_each_req_async(), which
>> is addressed by patch 2. Right now, it schedules a BH in the BB
>> context, then the BH double-checks whether the context still fits, and
>> if not, re-schedules itself. Because virtio-scsi's context is fixed,
>> this seems to indicate to me that it wants to be able to deal with a
>> case where BB and virtio-scsi context differ, which seems to break that
>> aforementioned general virtio-scsi assumption.
> I don't agree with the last sentence: virtio-scsi's context isn't fixed.
>
> The AioContext changes when dataplane is started/stopped. virtio-scsi
> switches AioContext between the IOThread's AioContext and the main
> loop's qemu_aio_context.
>
> However, virtio-scsi virtqueue processing only happens in the IOThread's
> AioContext. Maybe this is what you meant when you said the AioContext is
> fixed?
Specifically, I meant VirtIOSCSI.ctx, which is set only once in
virtio_scsi_dataplane_setup(). That’s at least where the virtqueue
notifiers are registered, so yes, virtqueue processing should at least
be fixed to that context. It seems like it’s always possible some
things are processed in the main thread (not just setup/teardown, but
also e.g. TMF_LOGICAL_UNIT_RESET), so to me it seems like virtio-scsi
kind of runs in two contexts simultaneously. Yes, when virtqueue
processing is paused, all processing in VirtIOSCSI.ctx is stopped, but I
wouldn’t say it switches contexts there. It just stops processing some
requests.
Either way, virtio-scsi request processing doesn’t stop just because a
scsi-hd device is hot-plugged or -unplugged. If the BB changes contexts
in the hot-unplug path (while vq request processing is continuing in the
I/O thread), its context will differ from that of virtio-scsi.
So should I just replace “the context is fixed” and say that in this
specific instance, virtio-scsi vq processing continues in the I/O thread?
> The BH function is aware that the current AioContext might not be the
> same as the AioContext at the time the BH was scheduled. That doesn't
> break assumptions in the code.
>
> (It may be possible to rewrite virtio-blk, virtio-scsi, and core
> VirtIODevice ioeventfd code to use the simpler model where the
> AioContext really is fixed because things have changed significantly
> over the years, but I looked a few weeks ago and it's difficult work.)
>
> I'm just pointing out that I think this description is incomplete. I
> *do* agree with what this patch series is doing :).
Well, this description won’t land in any commit log, so from my side,
I’m not too worried about its correctness. O:)
Hanna
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
From: Kevin Wolf @ 2024-02-07 13:55 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-block, qemu-devel, qemu-stable, Stefan Hajnoczi,
Paolo Bonzini, Fam Zheng
On 02.02.2024 at 15:47, Hanna Czenczek wrote:
> Hi,
>
> Without the AioContext lock, a BB's context may kind of change at any
> time (unless it has a root node, and I/O requests are pending). That
> also means that its own context (BlockBackend.ctx) and that of its root
> node can differ sometimes (while the context is being changed).
>
> blk_get_aio_context() doesn't know this yet and asserts that both are
> always equal (if there is a root node). Because it's no longer true,
> and because callers don't seem to really care about the root node's
> context, we can and should remove the assertion and just return the BB's
> context.
>
> Beyond that, the question is whether the callers of
> blk_get_aio_context() are OK with the context potentially changing
> concurrently. Honestly, it isn't entirely clear to me; most look OK,
> except for the virtio-scsi code, which operates under the general
> assumption that the BB's context is always equal to that of the
> virtio-scsi device. I doubt that this assumption always holds (it is
> definitely not obvious to me that it would), but then again, this series
> will not make matters worse in that regard, and that is what counts for
> me now.
>
> One clear point of contention is scsi_device_for_each_req_async(), which
> is addressed by patch 2. Right now, it schedules a BH in the BB
> context, then the BH double-checks whether the context still fits, and
> if not, re-schedules itself. Because virtio-scsi's context is fixed,
> this seems to indicate to me that it wants to be able to deal with a
> case where BB and virtio-scsi context differ, which seems to break that
> aforementioned general virtio-scsi assumption.
>
> Unfortunately, I definitely have to touch that code, because accepting
> concurrent changes of AioContexts breaks the double-check (just because
> the BB has the right context in that place does not mean it stays in
> that context); instead, we must prevent any concurrent change until the
> BH is done. Because changing contexts generally involves a drained
> section, we can prevent it by keeping the BB in-flight counter elevated.
>
> Question is, how to reason for that. I’d really rather not include the
> need to follow the BB context in my argument, because I find that part a
> bit fishy.
>
> Luckily, there’s a second, completely different reason for having
> scsi_device_for_each_req_async() increment the in-flight counter:
> Specifically, scsi_device_purge_requests() probably wants to await full
> completion of scsi_device_for_each_req_async(), and we can do that most
> easily in the very same way by incrementing the in-flight counter. This
> way, the blk_drain() in scsi_device_purge_requests() will not only await
> all (cancelled) I/O requests, but also the non-I/O requests.
>
> The fact that this prevents the BB AioContext from changing while the BH
> is scheduled/running then is just a nice side effect.
>
>
> Hanna Czenczek (2):
> block-backend: Allow concurrent context changes
> scsi: Await request purging
>
> block/block-backend.c | 22 +++++++++++-----------
> hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++---------
> 2 files changed, 32 insertions(+), 20 deletions(-)
Thanks, applied to the block branch.
Kevin
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
From: Stefan Hajnoczi @ 2024-02-08 21:15 UTC (permalink / raw)
To: Hanna Czenczek
Cc: Stefan Hajnoczi, qemu-block, qemu-devel, qemu-stable, Kevin Wolf,
Paolo Bonzini, Fam Zheng
On Wed, 7 Feb 2024 at 04:36, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 06.02.24 17:53, Stefan Hajnoczi wrote:
>
> On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote:
>
> Hi,
>
> Without the AioContext lock, a BB's context may kind of change at any
> time (unless it has a root node, and I/O requests are pending). That
> also means that its own context (BlockBackend.ctx) and that of its root
> node can differ sometimes (while the context is being changed).
>
> blk_get_aio_context() doesn't know this yet and asserts that both are
> always equal (if there is a root node). Because it's no longer true,
> and because callers don't seem to really care about the root node's
> context, we can and should remove the assertion and just return the BB's
> context.
>
> Beyond that, the question is whether the callers of
> blk_get_aio_context() are OK with the context potentially changing
> concurrently. Honestly, it isn't entirely clear to me; most look OK,
> except for the virtio-scsi code, which operates under the general
> assumption that the BB's context is always equal to that of the
> virtio-scsi device. I doubt that this assumption always holds (it is
> definitely not obvious to me that it would), but then again, this series
> will not make matters worse in that regard, and that is what counts for
> me now.
>
> One clear point of contention is scsi_device_for_each_req_async(), which
> is addressed by patch 2. Right now, it schedules a BH in the BB
> context, then the BH double-checks whether the context still fits, and
> if not, re-schedules itself. Because virtio-scsi's context is fixed,
> this seems to indicate to me that it wants to be able to deal with a
> case where BB and virtio-scsi context differ, which seems to break that
> aforementioned general virtio-scsi assumption.
>
> I don't agree with the last sentence: virtio-scsi's context isn't fixed.
>
> The AioContext changes when dataplane is started/stopped. virtio-scsi
> switches AioContext between the IOThread's AioContext and the main
> loop's qemu_aio_context.
>
> However, virtio-scsi virtqueue processing only happens in the IOThread's
> AioContext. Maybe this is what you meant when you said the AioContext is
> fixed?
>
>
> Specifically, I meant VirtIOSCSI.ctx, which is set only once in
> virtio_scsi_dataplane_setup(). That’s at least where the virtqueue
> notifiers are registered, so yes, virtqueue processing should at least
> be fixed to that context. It seems like it’s always possible some
> things are processed in the main thread (not just setup/teardown, but
> also e.g. TMF_LOGICAL_UNIT_RESET), so to me it seems like virtio-scsi
> kind of runs in two contexts simultaneously. Yes, when virtqueue
> processing is paused, all processing in VirtIOSCSI.ctx is stopped, but
> I wouldn’t say it switches contexts there. It just stops processing
> some requests.
>
> Either way, virtio-scsi request processing doesn’t stop just because a
> scsi-hd device is hot-plugged or -unplugged. If the BB changes
> contexts in the hot-unplug path (while vq request processing is
> continuing in the I/O thread), its context will differ from that of
> virtio-scsi.
>
> So should I just replace “the context is fixed” and say that in this
> specific instance, virtio-scsi vq processing continues in the I/O
> thread?
>
> The BH function is aware that the current AioContext might not be the
> same as the AioContext at the time the BH was scheduled. That doesn't
> break assumptions in the code.
>
> (It may be possible to rewrite virtio-blk, virtio-scsi, and core
> VirtIODevice ioeventfd code to use the simpler model where the
> AioContext really is fixed because things have changed significantly
> over the years, but I looked a few weeks ago and it's difficult work.)
>
> I'm just pointing out that I think this description is incomplete. I
> *do* agree with what this patch series is doing :).
>
>
> Well, this description won’t land in any commit log, so from my side,
> I’m not too worried about its correctness. O:)
Okay, I think we're in agreement. What you described in your reply
matches how I understand the code. No need to resend anything.
Stefan
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
2024-02-02 14:47 [PATCH 0/2] block: Allow concurrent BB context changes Hanna Czenczek
` (3 preceding siblings ...)
2024-02-07 13:55 ` Kevin Wolf
@ 2024-02-09 14:08 ` Michael Tokarev
2024-02-09 16:51 ` Hanna Czenczek
4 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2024-02-09 14:08 UTC (permalink / raw)
To: Hanna Czenczek, qemu-block
Cc: qemu-devel, qemu-stable, Stefan Hajnoczi, Kevin Wolf,
Paolo Bonzini, Fam Zheng
02.02.2024 17:47, Hanna Czenczek wrote:
> Hi,
>
> Without the AioContext lock, a BB's context may kind of change at any
> time (unless it has a root node, and I/O requests are pending). That
> also means that its own context (BlockBackend.ctx) and that of its root
> node can differ sometimes (while the context is being changed).
How relevant is this for -stable (8.2 at least), which does not have
the "scsi: eliminate AioContext lock" patchset, and in particular
v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from
one thread"?
The issue the first patch, "block-backend: Allow concurrent context
changes", fixes (RHEL-19381) seems to be reported against 8.1.something,
so it exists in 8.2 too, and this particular fix applies to 8.2.
But with all the other changes around this, I'm a bit lost as to what
should be done on stable. Not even thinking about 7.2 here :)
Thanks,
/mjt
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
2024-02-09 14:08 ` Michael Tokarev
@ 2024-02-09 16:51 ` Hanna Czenczek
2024-02-10 8:46 ` Michael Tokarev
0 siblings, 1 reply; 13+ messages in thread
From: Hanna Czenczek @ 2024-02-09 16:51 UTC (permalink / raw)
To: Michael Tokarev, qemu-block
Cc: qemu-devel, qemu-stable, Stefan Hajnoczi, Kevin Wolf,
Paolo Bonzini, Fam Zheng
On 09.02.24 15:08, Michael Tokarev wrote:
> 02.02.2024 17:47, Hanna Czenczek wrote:
>> Hi,
>>
>> Without the AioContext lock, a BB's context may kind of change at any
>> time (unless it has a root node, and I/O requests are pending). That
>> also means that its own context (BlockBackend.ctx) and that of its root
>> node can differ sometimes (while the context is being changed).
>
> How relevant is this for -stable (8.2 at least), which does not have
> the "scsi: eliminate AioContext lock" patchset, and in particular
> v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from
> one thread"?
>
> The issue the first patch, "block-backend: Allow concurrent context
> changes", fixes (RHEL-19381) seems to be reported against 8.1.something,
> so it exists in 8.2 too, and this particular fix applies to 8.2.
>
> But with all the other changes around this, I'm a bit lost as to what
> should be done on stable. Not even thinking about 7.2 here :)
Ah, sorry, yes. Since we do still have the AioContext lock, this series
won’t be necessary in -stable. Sorry for the noise!
Hanna
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
2024-02-09 16:51 ` Hanna Czenczek
@ 2024-02-10 8:46 ` Michael Tokarev
2024-02-12 8:52 ` Hanna Czenczek
0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2024-02-10 8:46 UTC (permalink / raw)
To: Hanna Czenczek, qemu-block
Cc: qemu-devel, qemu-stable, Stefan Hajnoczi, Kevin Wolf,
Paolo Bonzini, Fam Zheng
09.02.2024 19:51, Hanna Czenczek wrote:
> On 09.02.24 15:08, Michael Tokarev wrote:
>> 02.02.2024 17:47, Hanna Czenczek wrote:
>>> Hi,
>>>
>>> Without the AioContext lock, a BB's context may kind of change at any
>>> time (unless it has a root node, and I/O requests are pending). That
>>> also means that its own context (BlockBackend.ctx) and that of its root
>>> node can differ sometimes (while the context is being changed).
>>
>> How relevant this is for -stable (8.2 at least) which does not have
>> "scsi: eliminate AioContext lock" patchset, and in particular,:
>> v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from
>> one thread"?
>>
>> The issue first patch "block-backend: Allow concurrent context changes"
>> fixes (RHEL-19381) seems to be for 8.1.something, so it exists in 8.2
>> too, and this particular fix applies to 8.2.
>>
>> But with other changes around all this, I'm a bit lost as of what should
>> be done on stable. Not even thinking about 7.2 here :)
>
> Ah, sorry, yes. Since we do still have the AioContext lock, this series won’t be necessary in -stable. Sorry for the noise!
Hm. Now I'm confused even more.. :)
ad89367202 "block-backend: Allow concurrent context changes" - the first
one in this series - apparently is needed, as it fixes an issue reported
for qemu 8.1 (https://issues.redhat.com/browse/RHEL-19381). Or is it not
the case?
FWIW, truth is born in the noise, not in silence ;)
Thanks,
/mjt
* Re: [PATCH 0/2] block: Allow concurrent BB context changes
2024-02-10 8:46 ` Michael Tokarev
@ 2024-02-12 8:52 ` Hanna Czenczek
0 siblings, 0 replies; 13+ messages in thread
From: Hanna Czenczek @ 2024-02-12 8:52 UTC (permalink / raw)
To: Michael Tokarev, qemu-block
Cc: qemu-devel, qemu-stable, Stefan Hajnoczi, Kevin Wolf,
Paolo Bonzini, Fam Zheng
On 10.02.24 09:46, Michael Tokarev wrote:
> 09.02.2024 19:51, Hanna Czenczek wrote:
>> On 09.02.24 15:08, Michael Tokarev wrote:
>>> 02.02.2024 17:47, Hanna Czenczek wrote:
>>>> Hi,
>>>>
>>>> Without the AioContext lock, a BB's context may kind of change at any
>>>> time (unless it has a root node, and I/O requests are pending). That
>>>> also means that its own context (BlockBackend.ctx) and that of its
>>>> root
>>>> node can differ sometimes (while the context is being changed).
>>>
>>> How relevant is this for -stable (8.2 at least), which does not have
>>> the "scsi: eliminate AioContext lock" patchset, and in particular
>>> v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from
>>> one thread"?
>>>
>>> The issue the first patch, "block-backend: Allow concurrent context
>>> changes", fixes (RHEL-19381) seems to be reported against
>>> 8.1.something, so it exists in 8.2 too, and this particular fix
>>> applies to 8.2.
>>>
>>> But with all the other changes around this, I'm a bit lost as to what
>>> should be done on stable. Not even thinking about 7.2 here :)
>>
>> Ah, sorry, yes. Since we do still have the AioContext lock, this
>> series won’t be necessary in -stable. Sorry for the noise!
>
> Hm. Now I'm confused even more.. :)
>
> ad89367202 "block-backend: Allow concurrent context changes" - the first
> one in this series - apparently is needed, as it fixes an issue reported
> for qemu 8.1 (https://issues.redhat.com/browse/RHEL-19381). Or is it not
> the case?
Ah, yes, I got confused there. There are two (unfortunately?
fortunately? Red-Hat-internal) comments, one of which describes the
crash that’s fixed here, so I thought that bug described this crash.
But the actual description in the report describes something different
(more like what’s fixed by
https://lists.nongnu.org/archive/html/qemu-devel/2024-01/msg03649.html,
but I’m not entirely sure yet).
So basically I got the bug link wrong. We now have
https://issues.redhat.com/browse/RHEL-24593, which has been reported
only against 8.2.
Hanna
> FWIW, truth is born in the noise, not in silence ;)
>
> Thanks,
>
> /mjt
>
end of thread, other threads:[~2024-02-12 8:53 UTC | newest]
Thread overview: 13+ messages
2024-02-02 14:47 [PATCH 0/2] block: Allow concurrent BB context changes Hanna Czenczek
2024-02-02 14:47 ` [PATCH 1/2] block-backend: Allow concurrent " Hanna Czenczek
2024-02-06 16:55 ` Stefan Hajnoczi
2024-02-02 14:47 ` [PATCH 2/2] scsi: Await request purging Hanna Czenczek
2024-02-06 16:56 ` Stefan Hajnoczi
2024-02-06 16:53 ` [PATCH 0/2] block: Allow concurrent BB context changes Stefan Hajnoczi
2024-02-07 9:35 ` Hanna Czenczek
2024-02-08 21:15 ` Stefan Hajnoczi
2024-02-07 13:55 ` Kevin Wolf
2024-02-09 14:08 ` Michael Tokarev
2024-02-09 16:51 ` Hanna Czenczek
2024-02-10 8:46 ` Michael Tokarev
2024-02-12 8:52 ` Hanna Czenczek