* [PATCH v4 00/20] block: remove aio_disable_external() API
@ 2023-04-25 17:26 Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 01/20] block-backend: split blk_do_set_aio_context() Stefan Hajnoczi
` (19 more replies)
0 siblings, 20 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:26 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg
v4:
- Remove external_disable_cnt variable [Philippe]
- Add Patch 1 to fix assertion failure in .drained_end() -> blk_get_aio_context()
v3:
- Resend full patch series. v2 was sent in the middle of a git rebase and was
missing patches. [Eric]
- Apply Reviewed-by tags.
v2:
- Do not rely on BlockBackend request queuing, implement .drained_begin/end()
instead in xen-block, virtio-blk, and virtio-scsi [Paolo]
- Add qdev_is_realized() API [Philippe]
- Add patch to avoid AioContext lock around blk_exp_ref/unref() [Paolo]
- Add patch to call .drained_begin/end() from main loop thread to simplify
callback implementations
The aio_disable_external() API temporarily suspends file descriptor monitoring
in the event loop. The block layer uses this to prevent new I/O requests from being
submitted from the guest and elsewhere between bdrv_drained_begin() and
bdrv_drained_end().
While the block layer still needs to prevent new I/O requests in drained
sections, the aio_disable_external() API can be replaced with
.drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
BlockDevOps.
This newer .drained_begin/end/poll() approach is attractive because it works
without specifying a specific AioContext. The block layer is moving towards
multi-queue and that means multiple AioContexts may be processing I/O
simultaneously.
The aio_disable_external() API was always somewhat hacky. It suspends all file
descriptors that were registered with is_external=true, even if they have
nothing to do with the BlockDriverState graph nodes that are being drained.
It's better to solve a block layer problem in the block layer than to have an
odd event loop API solution.
The approach in this patch series is to implement BlockDevOps
.drained_begin/end() callbacks that temporarily stop file descriptor handlers.
This ensures that new I/O requests are not submitted in drained sections.
The first two virtio-scsi patches were already sent as a separate series. I
included them because they are necessary in order to fully remove
aio_disable_external().
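To make the shape of the new approach concrete, here is a minimal, self-contained model (not QEMU code; the names DevOps, Dev, dev_submit, and the callback signatures are illustrative assumptions) of how .drained_begin/end() callbacks gate new requests without referencing any AioContext:

```c
/* Toy model of the BlockDevOps-style drained callbacks: the device
 * rejects new requests while it is inside a drained section. */
#include <assert.h>
#include <stdbool.h>

typedef struct {
    void (*drained_begin)(void *opaque);
    void (*drained_end)(void *opaque);
} DevOps;

typedef struct {
    bool in_drain;       /* true between drained_begin and drained_end */
    int requests_seen;   /* requests actually accepted */
} Dev;

static void dev_drained_begin(void *opaque)
{
    ((Dev *)opaque)->in_drain = true;
}

static void dev_drained_end(void *opaque)
{
    ((Dev *)opaque)->in_drain = false;
}

/* Returns true if the request was accepted. */
static bool dev_submit(Dev *d)
{
    if (d->in_drain) {
        return false;    /* new I/O is not submitted in drained sections */
    }
    d->requests_seen++;
    return true;
}

static const DevOps dev_ops = {
    .drained_begin = dev_drained_begin,
    .drained_end = dev_drained_end,
};
```

Unlike is_external fd suspension, nothing here identifies an event loop, so the same pattern works when multiple AioContexts process I/O.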
Stefan Hajnoczi (20):
block-backend: split blk_do_set_aio_context()
hw/qdev: introduce qdev_is_realized() helper
virtio-scsi: avoid race between unplug and transport event
virtio-scsi: stop using aio_disable_external() during unplug
util/vhost-user-server: rename refcount to in_flight counter
block/export: wait for vhost-user-blk requests when draining
block/export: stop using is_external in vhost-user-blk server
hw/xen: do not use aio_set_fd_handler(is_external=true) in
xen_xenstore
block: add blk_in_drain() API
block: drain from main loop thread in bdrv_co_yield_to_drain()
xen-block: implement BlockDevOps->drained_begin()
hw/xen: do not set is_external=true on evtchn fds
block/export: rewrite vduse-blk drain code
block/export: don't require AioContext lock around blk_exp_ref/unref()
block/fuse: do not set is_external=true on FUSE fd
virtio: make it possible to detach host notifier from any thread
virtio-blk: implement BlockDevOps->drained_begin()
virtio-scsi: implement BlockDevOps->drained_begin()
virtio: do not set is_external=true on host notifiers
aio: remove aio_disable_external() API
hw/block/dataplane/xen-block.h | 2 +
include/block/aio.h | 57 ---------
include/block/export.h | 2 +
include/hw/qdev-core.h | 17 ++-
include/hw/scsi/scsi.h | 14 +++
include/qemu/vhost-user-server.h | 8 +-
include/sysemu/block-backend-common.h | 25 ++--
include/sysemu/block-backend-global-state.h | 1 +
util/aio-posix.h | 1 -
block.c | 7 --
block/blkio.c | 15 +--
block/block-backend.c | 78 ++++++------
block/curl.c | 10 +-
block/export/export.c | 13 +-
block/export/fuse.c | 58 ++++++++-
block/export/vduse-blk.c | 128 ++++++++++++++------
block/export/vhost-user-blk-server.c | 70 +++++++----
block/io.c | 5 +-
block/io_uring.c | 4 +-
block/iscsi.c | 3 +-
block/linux-aio.c | 4 +-
block/nfs.c | 5 +-
block/nvme.c | 8 +-
block/ssh.c | 4 +-
block/win32-aio.c | 6 +-
hw/block/dataplane/virtio-blk.c | 19 ++-
hw/block/dataplane/xen-block.c | 42 +++++--
hw/block/virtio-blk.c | 38 +++++-
hw/block/xen-block.c | 24 +++-
hw/i386/kvm/xen_xenstore.c | 2 +-
hw/scsi/scsi-bus.c | 46 ++++++-
hw/scsi/scsi-disk.c | 28 ++++-
hw/scsi/virtio-scsi-dataplane.c | 31 +++--
hw/scsi/virtio-scsi.c | 59 +++++++--
hw/virtio/virtio.c | 6 +-
hw/xen/xen-bus.c | 11 +-
io/channel-command.c | 6 +-
io/channel-file.c | 3 +-
io/channel-socket.c | 3 +-
migration/rdma.c | 16 +--
tests/unit/test-aio.c | 27 +----
tests/unit/test-bdrv-drain.c | 1 -
tests/unit/test-fdmon-epoll.c | 73 -----------
util/aio-posix.c | 20 +--
util/aio-win32.c | 8 +-
util/async.c | 3 +-
util/fdmon-epoll.c | 18 +--
util/fdmon-io_uring.c | 8 +-
util/fdmon-poll.c | 3 +-
util/main-loop.c | 7 +-
util/qemu-coroutine-io.c | 7 +-
util/vhost-user-server.c | 33 ++---
hw/scsi/trace-events | 2 +
tests/unit/meson.build | 3 -
54 files changed, 612 insertions(+), 480 deletions(-)
delete mode 100644 tests/unit/test-fdmon-epoll.c
--
2.39.2
* [PATCH v4 01/20] block-backend: split blk_do_set_aio_context()
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
@ 2023-04-25 17:26 ` Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 02/20] hw/qdev: introduce qdev_is_realized() helper Stefan Hajnoczi
` (18 subsequent siblings)
19 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:26 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg
blk_set_aio_context() is not fully transactional because
blk_do_set_aio_context() updates blk->ctx outside the transaction. Most
of the time this goes unnoticed but a BlockDevOps.drained_end() callback
that invokes blk_get_aio_context() fails assert(ctx == blk->ctx). This
happens because blk->ctx is only assigned after
BlockDevOps.drained_end() is called and we're in an intermediate state
where BlockDriverState nodes already have the new context and the
BlockBackend still has the old context.
Making blk_set_aio_context() fully transactional solves this assertion
failure because the BlockBackend's context is updated as part of the
transaction (before BlockDevOps.drained_end() is called).
Split blk_do_set_aio_context() in order to solve this assertion failure.
This helper function actually serves two different purposes:
1. It drives blk_set_aio_context().
2. It responds to BdrvChildClass->change_aio_ctx().
Get rid of the helper function. Do #1 inside blk_set_aio_context() and
do #2 inside blk_root_set_aio_ctx_commit(). This simplifies the code.
The only drawback of the fully transactional approach is that
blk_set_aio_context() must contend with blk_root_set_aio_ctx_commit()
being invoked as part of the AioContext change propagation. This can be
solved by temporarily setting blk->allow_aio_context_change to true.
Future patches will call blk_get_aio_context() from
BlockDevOps->drained_end(), making this patch necessary.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/block-backend.c | 71 +++++++++++++++++--------------------------
1 file changed, 28 insertions(+), 43 deletions(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 5566ea059d..ffd1d66f7d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2199,52 +2199,31 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
return blk_get_aio_context(blk_acb->blk);
}
-static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
- bool update_root_node, Error **errp)
-{
- BlockDriverState *bs = blk_bs(blk);
- ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
- int ret;
-
- if (bs) {
- bdrv_ref(bs);
-
- if (update_root_node) {
- /*
- * update_root_node MUST be false for blk_root_set_aio_ctx_commit(),
- * as we are already in the commit function of a transaction.
- */
- ret = bdrv_try_change_aio_context(bs, new_context, blk->root, errp);
- if (ret < 0) {
- bdrv_unref(bs);
- return ret;
- }
- }
- /*
- * Make blk->ctx consistent with the root node before we invoke any
- * other operations like drain that might inquire blk->ctx
- */
- blk->ctx = new_context;
- if (tgm->throttle_state) {
- bdrv_drained_begin(bs);
- throttle_group_detach_aio_context(tgm);
- throttle_group_attach_aio_context(tgm, new_context);
- bdrv_drained_end(bs);
- }
-
- bdrv_unref(bs);
- } else {
- blk->ctx = new_context;
- }
-
- return 0;
-}
-
int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
Error **errp)
{
+ bool old_allow_change;
+ BlockDriverState *bs = blk_bs(blk);
+ int ret;
+
GLOBAL_STATE_CODE();
- return blk_do_set_aio_context(blk, new_context, true, errp);
+
+ if (!bs) {
+ blk->ctx = new_context;
+ return 0;
+ }
+
+ bdrv_ref(bs);
+
+ old_allow_change = blk->allow_aio_context_change;
+ blk->allow_aio_context_change = true;
+
+ ret = bdrv_try_change_aio_context(bs, new_context, NULL, errp);
+
+ blk->allow_aio_context_change = old_allow_change;
+
+ bdrv_unref(bs);
+ return ret;
}
typedef struct BdrvStateBlkRootContext {
@@ -2256,8 +2235,14 @@ static void blk_root_set_aio_ctx_commit(void *opaque)
{
BdrvStateBlkRootContext *s = opaque;
BlockBackend *blk = s->blk;
+ AioContext *new_context = s->new_ctx;
+ ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
- blk_do_set_aio_context(blk, s->new_ctx, false, &error_abort);
+ blk->ctx = new_context;
+ if (tgm->throttle_state) {
+ throttle_group_detach_aio_context(tgm);
+ throttle_group_attach_aio_context(tgm, new_context);
+ }
}
static TransactionActionDrv set_blk_root_context = {
--
2.39.2
* [PATCH v4 02/20] hw/qdev: introduce qdev_is_realized() helper
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 01/20] block-backend: split blk_do_set_aio_context() Stefan Hajnoczi
@ 2023-04-25 17:26 ` Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
` (17 subsequent siblings)
19 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:26 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg
Add a helper function to check whether the device is realized without
requiring the Big QEMU Lock. The next patch adds a second caller. The
goal is to avoid spreading DeviceState field accesses throughout the
code.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/hw/qdev-core.h | 17 ++++++++++++++---
hw/scsi/scsi-bus.c | 3 +--
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bd50ad5ee1..4d734cf35e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -1,6 +1,7 @@
#ifndef QDEV_CORE_H
#define QDEV_CORE_H
+#include "qemu/atomic.h"
#include "qemu/queue.h"
#include "qemu/bitmap.h"
#include "qemu/rcu.h"
@@ -164,9 +165,6 @@ struct NamedClockList {
/**
* DeviceState:
- * @realized: Indicates whether the device has been fully constructed.
- * When accessed outside big qemu lock, must be accessed with
- * qatomic_load_acquire()
* @reset: ResettableState for the device; handled by Resettable interface.
*
* This structure should not be accessed directly. We declare it here
@@ -332,6 +330,19 @@ DeviceState *qdev_new(const char *name);
*/
DeviceState *qdev_try_new(const char *name);
+/**
+ * qdev_is_realized:
+ * @dev: The device to check.
+ *
+ * May be called outside big qemu lock.
+ *
+ * Returns: %true% if the device has been fully constructed, %false% otherwise.
+ */
+static inline bool qdev_is_realized(DeviceState *dev)
+{
+ return qatomic_load_acquire(&dev->realized);
+}
+
/**
* qdev_realize: Realize @dev.
* @dev: device to realize
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index c97176110c..07275fb631 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -60,8 +60,7 @@ static SCSIDevice *do_scsi_device_find(SCSIBus *bus,
* the user access the device.
*/
- if (retval && !include_unrealized &&
- !qatomic_load_acquire(&retval->qdev.realized)) {
+ if (retval && !include_unrealized && !qdev_is_realized(&retval->qdev)) {
retval = NULL;
}
--
2.39.2
* [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 01/20] block-backend: split blk_do_set_aio_context() Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 02/20] hw/qdev: introduce qdev_is_realized() helper Stefan Hajnoczi
@ 2023-04-25 17:26 ` Stefan Hajnoczi
2023-05-02 15:19 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
` (16 subsequent siblings)
19 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:26 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg, Daniil Tatianin
Only report a transport reset event to the guest after the SCSIDevice
has been unrealized by qdev_simple_device_unplug_cb().
qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
to false so that scsi_device_find/get() no longer see it.
scsi_target_emulate_report_luns() also needs to be updated to filter out
SCSIDevices that are unrealized.
These changes ensure that the guest driver does not see the SCSIDevice
that's being unplugged if it responds very quickly to the transport
reset event.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/scsi/scsi-bus.c | 3 ++-
hw/scsi/virtio-scsi.c | 18 +++++++++---------
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 07275fb631..64d7311757 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -486,7 +486,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
DeviceState *qdev = kid->child;
SCSIDevice *dev = SCSI_DEVICE(qdev);
- if (dev->channel == channel && dev->id == id && dev->lun != 0) {
+ if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
+ qdev_is_realized(&dev->qdev)) {
store_lun(tmp, dev->lun);
g_byte_array_append(buf, tmp, 8);
len += 8;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 612c525d9d..000961446c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1063,15 +1063,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
SCSIDevice *sd = SCSI_DEVICE(dev);
AioContext *ctx = s->ctx ?: qemu_get_aio_context();
- if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
- virtio_scsi_acquire(s);
- virtio_scsi_push_event(s, sd,
- VIRTIO_SCSI_T_TRANSPORT_RESET,
- VIRTIO_SCSI_EVT_RESET_REMOVED);
- scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
- virtio_scsi_release(s);
- }
-
aio_disable_external(ctx);
qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
aio_enable_external(ctx);
@@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
virtio_scsi_release(s);
}
+
+ if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+ virtio_scsi_acquire(s);
+ virtio_scsi_push_event(s, sd,
+ VIRTIO_SCSI_T_TRANSPORT_RESET,
+ VIRTIO_SCSI_EVT_RESET_REMOVED);
+ scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+ virtio_scsi_release(s);
+ }
}
static struct SCSIBusInfo virtio_scsi_scsi_info = {
--
2.39.2
* [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (2 preceding siblings ...)
2023-04-25 17:26 ` [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-04-28 14:22 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 05/20] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
` (15 subsequent siblings)
19 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg, Zhengui Li, Daniil Tatianin
This patch is part of an effort to remove the aio_disable_external()
API because it does not fit in a multi-queue block layer world where
many AioContexts may be submitting requests to the same disk.
The SCSI emulation code is already in good shape to stop using
aio_disable_external(). It was only used by commit 9c5aad84da1c
("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
disk") to ensure that virtio_scsi_hotunplug() works while the guest
driver is submitting I/O.
Ensure virtio_scsi_hotunplug() is safe as follows:
1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
device_set_realized() calls qatomic_set(&dev->realized, false) so
that future scsi_device_get() calls return NULL because they exclude
SCSIDevices with realized=false.
That means virtio-scsi will reject new I/O requests to this
SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
virtio_scsi_hotunplug() is still executing. We are protected against
new requests!
2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
that in-flight requests are cancelled synchronously. This ensures
that no in-flight requests remain once qdev_simple_device_unplug_cb()
returns.
Thanks to these two conditions we don't need aio_disable_external()
anymore.
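Condition 1 above can be sketched with a toy model (not QEMU code; Device, device_find, and submit_request are illustrative names) showing why clearing realized is enough to reject new requests:

```c
/* Once realized is cleared, lookups return NULL, so a request
 * arriving mid-unplug is rejected instead of touching the device. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool realized;
} Device;

static Device *device_find(Device *d)
{
    return d->realized ? d : NULL;   /* unrealized devices are invisible */
}

/* Returns 0 on success, -1 (think VIRTIO_SCSI_S_BAD_TARGET) if gone. */
static int submit_request(Device *d)
{
    if (!device_find(d)) {
        return -1;
    }
    return 0;
}
```

Combined with synchronously cancelling in-flight requests (condition 2), no request can straddle the unplug, which is what aio_disable_external() previously guaranteed.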
Cc: Zhengui Li <lizhengui@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/scsi/scsi-disk.c | 1 +
hw/scsi/virtio-scsi.c | 3 ---
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 97c9b1c8cd..e01bd84541 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2522,6 +2522,7 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
static void scsi_unrealize(SCSIDevice *dev)
{
+ scsi_device_purge_requests(dev, SENSE_CODE(RESET));
del_boot_device_lchs(&dev->qdev, NULL);
}
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 000961446c..a02f9233ec 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1061,11 +1061,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev);
VirtIOSCSI *s = VIRTIO_SCSI(vdev);
SCSIDevice *sd = SCSI_DEVICE(dev);
- AioContext *ctx = s->ctx ?: qemu_get_aio_context();
- aio_disable_external(ctx);
qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
- aio_enable_external(ctx);
if (s->ctx) {
virtio_scsi_acquire(s);
--
2.39.2
* [PATCH v4 05/20] util/vhost-user-server: rename refcount to in_flight counter
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (3 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
` (14 subsequent siblings)
19 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg
The VuServer object has a refcount field and ref/unref APIs. The name is
confusing because it's actually an in-flight request counter instead of
a refcount.
Normally a refcount destroys the object upon reaching zero. The VuServer
counter is used to wake up the vhost-user coroutine when there are no
more requests.
Avoid confusion by renaming refcount and ref/unref to in_flight and
inc/dec.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/qemu/vhost-user-server.h | 6 +++---
block/export/vhost-user-blk-server.c | 11 +++++++----
util/vhost-user-server.c | 14 +++++++-------
3 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index 25c72433ca..bc0ac9ddb6 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -41,7 +41,7 @@ typedef struct {
const VuDevIface *vu_iface;
/* Protected by ctx lock */
- unsigned int refcount;
+ unsigned int in_flight;
bool wait_idle;
VuDev vu_dev;
QIOChannel *ioc; /* The I/O channel with the client */
@@ -60,8 +60,8 @@ bool vhost_user_server_start(VuServer *server,
void vhost_user_server_stop(VuServer *server);
-void vhost_user_server_ref(VuServer *server);
-void vhost_user_server_unref(VuServer *server);
+void vhost_user_server_inc_in_flight(VuServer *server);
+void vhost_user_server_dec_in_flight(VuServer *server);
void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index e56b92f2e2..841acb36e3 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -50,7 +50,10 @@ static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
free(req);
}
-/* Called with server refcount increased, must decrease before returning */
+/*
+ * Called with server in_flight counter increased, must decrease before
+ * returning.
+ */
static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
{
VuBlkReq *req = opaque;
@@ -68,12 +71,12 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
in_num, out_num);
if (in_len < 0) {
free(req);
- vhost_user_server_unref(server);
+ vhost_user_server_dec_in_flight(server);
return;
}
vu_blk_req_complete(req, in_len);
- vhost_user_server_unref(server);
+ vhost_user_server_dec_in_flight(server);
}
static void vu_blk_process_vq(VuDev *vu_dev, int idx)
@@ -95,7 +98,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
Coroutine *co =
qemu_coroutine_create(vu_blk_virtio_process_req, req);
- vhost_user_server_ref(server);
+ vhost_user_server_inc_in_flight(server);
qemu_coroutine_enter(co);
}
}
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 5b6216069c..1622f8cfb3 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -75,16 +75,16 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
error_report("vu_panic: %s", buf);
}
-void vhost_user_server_ref(VuServer *server)
+void vhost_user_server_inc_in_flight(VuServer *server)
{
assert(!server->wait_idle);
- server->refcount++;
+ server->in_flight++;
}
-void vhost_user_server_unref(VuServer *server)
+void vhost_user_server_dec_in_flight(VuServer *server)
{
- server->refcount--;
- if (server->wait_idle && !server->refcount) {
+ server->in_flight--;
+ if (server->wait_idle && !server->in_flight) {
aio_co_wake(server->co_trip);
}
}
@@ -192,13 +192,13 @@ static coroutine_fn void vu_client_trip(void *opaque)
/* Keep running */
}
- if (server->refcount) {
+ if (server->in_flight) {
/* Wait for requests to complete before we can unmap the memory */
server->wait_idle = true;
qemu_coroutine_yield();
server->wait_idle = false;
}
- assert(server->refcount == 0);
+ assert(server->in_flight == 0);
vu_deinit(vu_dev);
--
2.39.2
* [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (4 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 05/20] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-05-02 15:42 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
` (13 subsequent siblings)
19 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg
Each vhost-user-blk request runs in a coroutine. When the BlockBackend
enters a drained section we need to enter a quiescent state. Currently
any in-flight requests race with bdrv_drained_begin() because it is
unaware of vhost-user-blk requests.
When blk_co_preadv/pwritev()/etc returns it wakes the
bdrv_drained_begin() thread but vhost-user-blk request processing has
not yet finished. The request coroutine continues executing while the
main loop thread thinks it is in a drained section.
One example where this is unsafe is for blk_set_aio_context() where
bdrv_drained_begin() is called before .aio_context_detached() and
.aio_context_attach(). If request coroutines are still running after
bdrv_drained_begin(), then the AioContext could change underneath them
and they race with new requests processed in the new AioContext. This
could lead to virtqueue corruption, for example.
(This example is theoretical; I came across it while reading the
code and have not tried to reproduce it.)
It's easy to make bdrv_drained_begin() wait for in-flight requests: add
a .drained_poll() callback that checks the VuServer's in-flight counter.
VuServer just needs an API that returns true when there are requests in
flight. The in-flight counter needs to be atomic.
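The counter-plus-poll mechanism can be modelled in a few lines (a sketch, not QEMU code; Server, inc/dec_in_flight, and drained_poll are illustrative names mirroring the patch):

```c
/* Atomic in-flight request counter: drained_poll() keeps the drain
 * waiting until every request has called dec_in_flight(). */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint in_flight;   /* incremented per request, any thread */
} Server;

static void inc_in_flight(Server *s)
{
    atomic_fetch_add(&s->in_flight, 1);
}

static void dec_in_flight(Server *s)
{
    atomic_fetch_sub(&s->in_flight, 1);
}

/* .drained_poll() contract: return true while drain must keep waiting. */
static bool drained_poll(Server *s)
{
    return atomic_load(&s->in_flight) > 0;
}
```

Because the counter is atomic, drained_poll() can safely be evaluated from the main loop thread while request coroutines complete in an IOThread.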
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/qemu/vhost-user-server.h | 4 +++-
block/export/vhost-user-blk-server.c | 16 ++++++++++++++++
util/vhost-user-server.c | 14 ++++++++++----
3 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index bc0ac9ddb6..b1c1cda886 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -40,8 +40,9 @@ typedef struct {
int max_queues;
const VuDevIface *vu_iface;
+ unsigned int in_flight; /* atomic */
+
/* Protected by ctx lock */
- unsigned int in_flight;
bool wait_idle;
VuDev vu_dev;
QIOChannel *ioc; /* The I/O channel with the client */
@@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
void vhost_user_server_inc_in_flight(VuServer *server);
void vhost_user_server_dec_in_flight(VuServer *server);
+bool vhost_user_server_has_in_flight(VuServer *server);
void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 841acb36e3..092b86aae4 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -272,7 +272,20 @@ static void vu_blk_exp_resize(void *opaque)
vu_config_change_msg(&vexp->vu_server.vu_dev);
}
+/*
+ * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
+ *
+ * Called with vexp->export.ctx acquired.
+ */
+static bool vu_blk_drained_poll(void *opaque)
+{
+ VuBlkExport *vexp = opaque;
+
+ return vhost_user_server_has_in_flight(&vexp->vu_server);
+}
+
static const BlockDevOps vu_blk_dev_ops = {
+ .drained_poll = vu_blk_drained_poll,
.resize_cb = vu_blk_exp_resize,
};
@@ -314,6 +327,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
logical_block_size, num_queues);
+ blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vexp);
@@ -323,6 +337,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
num_queues, &vu_blk_iface, errp)) {
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
blk_aio_detach, vexp);
+ blk_set_dev_ops(exp->blk, NULL, NULL);
g_free(vexp->handler.serial);
return -EADDRNOTAVAIL;
}
@@ -336,6 +351,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vexp);
+ blk_set_dev_ops(exp->blk, NULL, NULL);
g_free(vexp->handler.serial);
}
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 1622f8cfb3..2e6b640050 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -78,17 +78,23 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
void vhost_user_server_inc_in_flight(VuServer *server)
{
assert(!server->wait_idle);
- server->in_flight++;
+ qatomic_inc(&server->in_flight);
}
void vhost_user_server_dec_in_flight(VuServer *server)
{
- server->in_flight--;
- if (server->wait_idle && !server->in_flight) {
- aio_co_wake(server->co_trip);
+ if (qatomic_fetch_dec(&server->in_flight) == 1) {
+ if (server->wait_idle) {
+ aio_co_wake(server->co_trip);
+ }
}
}
+bool vhost_user_server_has_in_flight(VuServer *server)
+{
+ return qatomic_load_acquire(&server->in_flight) > 0;
+}
+
static bool coroutine_fn
vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
{
--
2.39.2
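The in-flight counter pattern in the hunk above can be sketched outside QEMU. This is a minimal stand-in using C11 atomics instead of QEMU's qatomic helpers; the `Server` type and function names are illustrative, not QEMU's API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for VuServer: an atomic request counter plus a flag that
 * says a waiter wants to be woken when the count drops to zero. */
typedef struct {
    atomic_uint in_flight;
    bool wait_idle;
} Server;

static void inc_in_flight(Server *s)
{
    atomic_fetch_add(&s->in_flight, 1);
}

/* Returns true when the caller completed the last in-flight request
 * and a waiter should be woken (the aio_co_wake() call in the patch). */
static bool dec_in_flight(Server *s)
{
    if (atomic_fetch_sub(&s->in_flight, 1) == 1) {
        return s->wait_idle;
    }
    return false;
}

/* Mirrors vhost_user_server_has_in_flight(): an acquire load so the
 * reader observes writes made before the counter was decremented. */
static bool has_in_flight(Server *s)
{
    return atomic_load_explicit(&s->in_flight, memory_order_acquire) > 0;
}
```

The fetch-and-sub return value is what makes the 1 -> 0 transition detectable without a separate lock.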
* [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg
vhost-user activity must be suspended during bdrv_drained_begin/end().
This prevents new requests from interfering with whatever is happening
in the drained section.
Previously this was done using aio_set_fd_handler()'s is_external
argument. In a multi-queue block layer world the aio_disable_external()
API cannot be used since multiple AioContexts may be processing I/O, not
just one.
Switch to BlockDevOps->drained_begin/end() callbacks.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/export/vhost-user-blk-server.c | 43 ++++++++++++++--------------
util/vhost-user-server.c | 10 +++----
2 files changed, 26 insertions(+), 27 deletions(-)
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 092b86aae4..d20f69cd74 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
.process_msg = vu_blk_process_msg,
};
-static void blk_aio_attached(AioContext *ctx, void *opaque)
-{
- VuBlkExport *vexp = opaque;
-
- vexp->export.ctx = ctx;
- vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
-}
-
-static void blk_aio_detach(void *opaque)
-{
- VuBlkExport *vexp = opaque;
-
- vhost_user_server_detach_aio_context(&vexp->vu_server);
- vexp->export.ctx = NULL;
-}
-
static void
vu_blk_initialize_config(BlockDriverState *bs,
struct virtio_blk_config *config,
@@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
vu_config_change_msg(&vexp->vu_server.vu_dev);
}
+/* Called with vexp->export.ctx acquired */
+static void vu_blk_drained_begin(void *opaque)
+{
+ VuBlkExport *vexp = opaque;
+
+ vhost_user_server_detach_aio_context(&vexp->vu_server);
+}
+
+/* Called with vexp->export.blk AioContext acquired */
+static void vu_blk_drained_end(void *opaque)
+{
+ VuBlkExport *vexp = opaque;
+
+ /* Refresh AioContext in case it changed */
+ vexp->export.ctx = blk_get_aio_context(vexp->export.blk);
+
+ vhost_user_server_attach_aio_context(&vexp->vu_server, vexp->export.ctx);
+}
+
/*
* Ensures that bdrv_drained_begin() waits until in-flight requests complete.
*
@@ -285,6 +288,8 @@ static bool vu_blk_drained_poll(void *opaque)
}
static const BlockDevOps vu_blk_dev_ops = {
+ .drained_begin = vu_blk_drained_begin,
+ .drained_end = vu_blk_drained_end,
.drained_poll = vu_blk_drained_poll,
.resize_cb = vu_blk_exp_resize,
};
@@ -328,15 +333,11 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
logical_block_size, num_queues);
blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
- blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
- vexp);
if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
num_queues, &vu_blk_iface, errp)) {
- blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
- blk_aio_detach, vexp);
blk_set_dev_ops(exp->blk, NULL, NULL);
g_free(vexp->handler.serial);
return -EADDRNOTAVAIL;
@@ -349,8 +350,6 @@ static void vu_blk_exp_delete(BlockExport *exp)
{
VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
- blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
- vexp);
blk_set_dev_ops(exp->blk, NULL, NULL);
g_free(vexp->handler.serial);
}
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 2e6b640050..332aea9306 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -278,7 +278,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt,
vu_fd_watch->fd = fd;
vu_fd_watch->cb = cb;
qemu_socket_set_nonblock(fd);
- aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
+ aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler,
NULL, NULL, NULL, vu_fd_watch);
vu_fd_watch->vu_dev = vu_dev;
vu_fd_watch->pvt = pvt;
@@ -299,7 +299,7 @@ static void remove_watch(VuDev *vu_dev, int fd)
if (!vu_fd_watch) {
return;
}
- aio_set_fd_handler(server->ioc->ctx, fd, true,
+ aio_set_fd_handler(server->ioc->ctx, fd, false,
NULL, NULL, NULL, NULL, NULL);
QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
@@ -362,7 +362,7 @@ void vhost_user_server_stop(VuServer *server)
VuFdWatch *vu_fd_watch;
QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
- aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
NULL, NULL, NULL, NULL, vu_fd_watch);
}
@@ -403,7 +403,7 @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
qio_channel_attach_aio_context(server->ioc, ctx);
QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
- aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL,
NULL, NULL, vu_fd_watch);
}
@@ -417,7 +417,7 @@ void vhost_user_server_detach_aio_context(VuServer *server)
VuFdWatch *vu_fd_watch;
QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
- aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
NULL, NULL, NULL, NULL, vu_fd_watch);
}
--
2.39.2
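The BlockDevOps callbacks this patch switches to can be modeled with a small stand-in. The struct below is a pared-down sketch of the drained hooks, not QEMU's full BlockDevOps; the `Export` type and field names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Pared-down stand-in for the drain-related part of BlockDevOps. */
typedef struct {
    void (*drained_begin)(void *opaque);
    void (*drained_end)(void *opaque);
    bool (*drained_poll)(void *opaque);
} BlockDevOps;

typedef struct {
    int in_flight;
    bool attached; /* are the fd handlers monitored by the event loop? */
} Export;

static void exp_drained_begin(void *opaque)
{
    Export *e = opaque;
    e->attached = false; /* stop accepting new requests */
}

static void exp_drained_end(void *opaque)
{
    Export *e = opaque;
    e->attached = true; /* resume request processing */
}

static bool exp_drained_poll(void *opaque)
{
    Export *e = opaque;
    return e->in_flight > 0; /* drain waits while this returns true */
}

static const BlockDevOps exp_ops = {
    .drained_begin = exp_drained_begin,
    .drained_end   = exp_drained_end,
    .drained_poll  = exp_drained_poll,
};
```

The drain loop calls .drained_begin(), then polls .drained_poll() until it returns false, and finally .drained_end() when the drained section ends.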
* [PATCH v4 08/20] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
There is no need to suspend activity between aio_disable_external() and
aio_enable_external(), a mechanism that is mainly used for the block
layer's drain operation.
This is part of ongoing work to remove the aio_disable_external() API.
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/i386/kvm/xen_xenstore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 900679af8a..6e81bc8791 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
error_setg(errp, "Xenstore evtchn port init failed");
return;
}
- aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
+ aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
xen_xenstore_event, NULL, NULL, NULL, s);
s->impl = xs_impl_create(xen_domid);
--
2.39.2
* [PATCH v4 09/20] block: add blk_in_drain() API
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
The BlockBackend quiesce_counter is greater than zero during drained
sections. Add an API to check whether the BlockBackend is in a drained
section.
The next patch will use this API.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/sysemu/block-backend-global-state.h | 1 +
block/block-backend.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index 2b6d27db7c..ac7cbd6b5e 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -78,6 +78,7 @@ void blk_activate(BlockBackend *blk, Error **errp);
int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags);
void blk_aio_cancel(BlockAIOCB *acb);
int blk_commit_all(void);
+bool blk_in_drain(BlockBackend *blk);
void blk_drain(BlockBackend *blk);
void blk_drain_all(void);
void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
diff --git a/block/block-backend.c b/block/block-backend.c
index ffd1d66f7d..42721a3592 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1266,6 +1266,13 @@ blk_check_byte_request(BlockBackend *blk, int64_t offset, int64_t bytes)
return 0;
}
+/* Are we currently in a drained section? */
+bool blk_in_drain(BlockBackend *blk)
+{
+ GLOBAL_STATE_CODE(); /* change to IO_OR_GS_CODE(), if necessary */
+ return qatomic_read(&blk->quiesce_counter);
+}
+
/* To be called between exactly one pair of blk_inc/dec_in_flight() */
static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
{
--
2.39.2
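The quiesce counter behind blk_in_drain() can be sketched as a nesting counter. This is an illustrative stand-in for the BlockBackend internals, not the actual QEMU code (which also increments the counter from bdrv child callbacks):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for BlockBackend: drained sections nest, so begin/end pairs
 * are counted rather than tracked as a single flag. */
typedef struct {
    unsigned quiesce_counter;
} BlockBackend;

static void blk_drain_begin(BlockBackend *blk)
{
    blk->quiesce_counter++;
}

static void blk_drain_end(BlockBackend *blk)
{
    assert(blk->quiesce_counter > 0); /* must pair with a begin */
    blk->quiesce_counter--;
}

/* Mirrors blk_in_drain(): true while at least one drained section is
 * still active. */
static bool blk_in_drain(BlockBackend *blk)
{
    return blk->quiesce_counter > 0;
}
```

A counter (rather than a bool) is what lets nested drained sections work: the backend only leaves the drained state when the outermost section ends.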
* [PATCH v4 10/20] block: drain from main loop thread in bdrv_co_yield_to_drain()
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
For simplicity, always run BlockDevOps .drained_begin/end/poll()
callbacks in the main loop thread. This makes it easier to implement the
callbacks and avoids extra locks.
Move the function pointer declarations from the I/O Code section to the
Global State section in block-backend-common.h.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/sysemu/block-backend-common.h | 25 +++++++++++++------------
block/io.c | 3 ++-
2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/include/sysemu/block-backend-common.h b/include/sysemu/block-backend-common.h
index 2391679c56..780cea7305 100644
--- a/include/sysemu/block-backend-common.h
+++ b/include/sysemu/block-backend-common.h
@@ -59,6 +59,19 @@ typedef struct BlockDevOps {
*/
bool (*is_medium_locked)(void *opaque);
+ /*
+ * Runs when the backend receives a drain request.
+ */
+ void (*drained_begin)(void *opaque);
+ /*
+ * Runs when the backend's last drain request ends.
+ */
+ void (*drained_end)(void *opaque);
+ /*
+ * Is the device still busy?
+ */
+ bool (*drained_poll)(void *opaque);
+
/*
* I/O API functions. These functions are thread-safe.
*
@@ -76,18 +89,6 @@ typedef struct BlockDevOps {
* Runs when the size changed (e.g. monitor command block_resize)
*/
void (*resize_cb)(void *opaque);
- /*
- * Runs when the backend receives a drain request.
- */
- void (*drained_begin)(void *opaque);
- /*
- * Runs when the backend's last drain request ends.
- */
- void (*drained_end)(void *opaque);
- /*
- * Is the device still busy?
- */
- bool (*drained_poll)(void *opaque);
} BlockDevOps;
/*
diff --git a/block/io.c b/block/io.c
index 2e267a85ab..4f9fe2f808 100644
--- a/block/io.c
+++ b/block/io.c
@@ -335,7 +335,8 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
if (ctx != co_ctx) {
aio_context_release(ctx);
}
- replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_bh_cb, &data);
+ replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
+ bdrv_co_drain_bh_cb, &data);
qemu_coroutine_yield();
/* If we are resumed from some other event (such as an aio completion or a
--
2.39.2
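The "always hop to the main loop" change above can be modeled with a toy one-shot scheduler. The queue below stands in for replay_bh_schedule_oneshot_event() on qemu_get_aio_context(); all names and the fixed-size queue are illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>

typedef void OneShotFunc(void *opaque);

typedef struct {
    OneShotFunc *cb;
    void *opaque;
} OneShot;

/* Toy stand-in for the main loop's pending bottom halves. */
static OneShot main_loop_pending[16];
static size_t main_loop_npending;

/* Queue a callback on the "main loop", regardless of which context the
 * caller is running in - the point of the patch above. */
static void main_loop_schedule_oneshot(OneShotFunc *cb, void *opaque)
{
    assert(main_loop_npending < 16);
    main_loop_pending[main_loop_npending++] = (OneShot){ cb, opaque };
}

/* Run and clear all pending one-shot callbacks, as one main loop
 * iteration would. */
static void main_loop_run_pending(void)
{
    for (size_t i = 0; i < main_loop_npending; i++) {
        main_loop_pending[i].cb(main_loop_pending[i].opaque);
    }
    main_loop_npending = 0;
}

static void bump(void *opaque)
{
    (*(int *)opaque)++;
}
```

Because the callback always runs in one well-known thread, the .drained_begin/end() implementations do not need their own locking against IOThreads.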
* [PATCH v4 11/20] xen-block: implement BlockDevOps->drained_begin()
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Detach event channels during drained sections to stop I/O submission
from the ring. xen-block is no longer reliant on aio_disable_external()
after this patch. This will allow us to remove the
aio_disable_external() API once all other code that relies on it is
converted.
Extend xen_device_set_event_channel_context() to allow ctx=NULL. The
event channel still exists but the event loop does not monitor the file
descriptor. Event channel processing can resume by calling
xen_device_set_event_channel_context() with a non-NULL ctx.
Factor out xen_device_set_event_channel_context() calls in
hw/block/dataplane/xen-block.c into attach/detach helper functions.
Incidentally, these don't require the AioContext lock because
aio_set_fd_handler() is thread-safe.
It's safer to register BlockDevOps after the dataplane instance has been
created. The BlockDevOps .drained_begin/end() callbacks depend on the
dataplane instance, so move the blk_set_dev_ops() call after
xen_block_dataplane_create().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/block/dataplane/xen-block.h | 2 ++
hw/block/dataplane/xen-block.c | 42 +++++++++++++++++++++++++---------
hw/block/xen-block.c | 24 ++++++++++++++++---
hw/xen/xen-bus.c | 7 ++++--
4 files changed, 59 insertions(+), 16 deletions(-)
diff --git a/hw/block/dataplane/xen-block.h b/hw/block/dataplane/xen-block.h
index 76dcd51c3d..7b8e9df09f 100644
--- a/hw/block/dataplane/xen-block.h
+++ b/hw/block/dataplane/xen-block.h
@@ -26,5 +26,7 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
unsigned int protocol,
Error **errp);
void xen_block_dataplane_stop(XenBlockDataPlane *dataplane);
+void xen_block_dataplane_attach(XenBlockDataPlane *dataplane);
+void xen_block_dataplane_detach(XenBlockDataPlane *dataplane);
#endif /* HW_BLOCK_DATAPLANE_XEN_BLOCK_H */
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 734da42ea7..02e0fd6115 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -663,6 +663,30 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane)
g_free(dataplane);
}
+void xen_block_dataplane_detach(XenBlockDataPlane *dataplane)
+{
+ if (!dataplane || !dataplane->event_channel) {
+ return;
+ }
+
+ /* Only reason for failure is a NULL channel */
+ xen_device_set_event_channel_context(dataplane->xendev,
+ dataplane->event_channel,
+ NULL, &error_abort);
+}
+
+void xen_block_dataplane_attach(XenBlockDataPlane *dataplane)
+{
+ if (!dataplane || !dataplane->event_channel) {
+ return;
+ }
+
+ /* Only reason for failure is a NULL channel */
+ xen_device_set_event_channel_context(dataplane->xendev,
+ dataplane->event_channel,
+ dataplane->ctx, &error_abort);
+}
+
void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
{
XenDevice *xendev;
@@ -673,13 +697,11 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
xendev = dataplane->xendev;
- aio_context_acquire(dataplane->ctx);
- if (dataplane->event_channel) {
- /* Only reason for failure is a NULL channel */
- xen_device_set_event_channel_context(xendev, dataplane->event_channel,
- qemu_get_aio_context(),
- &error_abort);
+ if (!blk_in_drain(dataplane->blk)) {
+ xen_block_dataplane_detach(dataplane);
}
+
+ aio_context_acquire(dataplane->ctx);
/* Xen doesn't have multiple users for nodes, so this can't fail */
blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), &error_abort);
aio_context_release(dataplane->ctx);
@@ -818,11 +840,9 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL);
aio_context_release(old_context);
- /* Only reason for failure is a NULL channel */
- aio_context_acquire(dataplane->ctx);
- xen_device_set_event_channel_context(xendev, dataplane->event_channel,
- dataplane->ctx, &error_abort);
- aio_context_release(dataplane->ctx);
+ if (!blk_in_drain(dataplane->blk)) {
+ xen_block_dataplane_attach(dataplane);
+ }
return;
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index f5a744589d..f099914831 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -189,8 +189,26 @@ static void xen_block_resize_cb(void *opaque)
xen_device_backend_printf(xendev, "state", "%u", state);
}
+/* Suspend request handling */
+static void xen_block_drained_begin(void *opaque)
+{
+ XenBlockDevice *blockdev = opaque;
+
+ xen_block_dataplane_detach(blockdev->dataplane);
+}
+
+/* Resume request handling */
+static void xen_block_drained_end(void *opaque)
+{
+ XenBlockDevice *blockdev = opaque;
+
+ xen_block_dataplane_attach(blockdev->dataplane);
+}
+
static const BlockDevOps xen_block_dev_ops = {
- .resize_cb = xen_block_resize_cb,
+ .resize_cb = xen_block_resize_cb,
+ .drained_begin = xen_block_drained_begin,
+ .drained_end = xen_block_drained_end,
};
static void xen_block_realize(XenDevice *xendev, Error **errp)
@@ -242,8 +260,6 @@ static void xen_block_realize(XenDevice *xendev, Error **errp)
return;
}
- blk_set_dev_ops(blk, &xen_block_dev_ops, blockdev);
-
if (conf->discard_granularity == -1) {
conf->discard_granularity = conf->physical_block_size;
}
@@ -277,6 +293,8 @@ static void xen_block_realize(XenDevice *xendev, Error **errp)
blockdev->dataplane =
xen_block_dataplane_create(xendev, blk, conf->logical_block_size,
blockdev->props.iothread);
+
+ blk_set_dev_ops(blk, &xen_block_dev_ops, blockdev);
}
static void xen_block_frontend_changed(XenDevice *xendev,
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index c59850b1de..b8f408c9ed 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -846,8 +846,11 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
NULL, NULL, NULL, NULL, NULL);
channel->ctx = ctx;
- aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
- xen_device_event, NULL, xen_device_poll, NULL, channel);
+ if (ctx) {
+ aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
+ true, xen_device_event, NULL, xen_device_poll, NULL,
+ channel);
+ }
}
XenEventChannel *xen_device_bind_event_channel(XenDevice *xendev,
--
2.39.2
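The attach/detach helpers factored out above share one design point: both tolerate a missing dataplane or event channel, so callers need no NULL checks of their own. A minimal sketch with illustrative stand-in types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for XenBlockDataPlane. */
typedef struct {
    void *event_channel; /* NULL until the frontend connects */
    bool monitored;      /* is the event channel fd in the event loop? */
} DataPlane;

static void dataplane_detach(DataPlane *dp)
{
    if (!dp || !dp->event_channel) {
        return; /* nothing to detach; safe to call unconditionally */
    }
    /* Like passing ctx=NULL to xen_device_set_event_channel_context():
     * the channel stays alive but its fd is no longer monitored. */
    dp->monitored = false;
}

static void dataplane_attach(DataPlane *dp)
{
    if (!dp || !dp->event_channel) {
        return;
    }
    dp->monitored = true; /* resume fd monitoring in the dataplane ctx */
}
```

This shape lets the BlockDevOps .drained_begin/end() callbacks call the helpers even before realize has finished creating the event channel.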
* [PATCH v4 12/20] hw/xen: do not set is_external=true on evtchn fds
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
is_external=true suspends fd handlers between aio_disable_external() and
aio_enable_external(). The block layer's drain operation uses this
mechanism to prevent new I/O from sneaking in between
bdrv_drained_begin() and bdrv_drained_end().
The previous commit converted the xen-block device to use BlockDevOps
.drained_begin/end() callbacks. It no longer relies on is_external=true
so it is safe to pass is_external=false.
This is part of ongoing work to remove the aio_disable_external() API.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/xen/xen-bus.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index b8f408c9ed..bf256d4da2 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -842,14 +842,14 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
}
if (channel->ctx)
- aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+ aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
NULL, NULL, NULL, NULL, NULL);
channel->ctx = ctx;
if (ctx) {
aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
- true, xen_device_event, NULL, xen_device_poll, NULL,
- channel);
+ false, xen_device_event, NULL, xen_device_poll,
+ NULL, channel);
}
}
@@ -923,7 +923,7 @@ void xen_device_unbind_event_channel(XenDevice *xendev,
QLIST_REMOVE(channel, list);
- aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+ aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
NULL, NULL, NULL, NULL, NULL);
if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) {
--
2.39.2
* [PATCH v4 13/20] block/export: rewrite vduse-blk drain code
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
vduse_blk_detach_ctx() waits for in-flight requests using
AIO_WAIT_WHILE(). This is not allowed according to a comment in
bdrv_set_aio_context_commit():
/*
* Take the old AioContex when detaching it from bs.
* At this point, new_context lock is already acquired, and we are now
* also taking old_context. This is safe as long as bdrv_detach_aio_context
* does not call AIO_POLL_WHILE().
*/
Use this opportunity to rewrite the drain code in vduse-blk:
- Use the BlockExport refcount so that vduse_blk_exp_delete() is only
called when there are no more requests in flight.
- Implement .drained_poll() so in-flight request coroutines are stopped
by the time .bdrv_detach_aio_context() is called.
- Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the
.bdrv_detach_aio_context() constraint violation. It's no longer
needed due to the previous changes.
- Always handle the VDUSE file descriptor, even in drained sections. The
VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in
drained sections. This ensures that the VDUSE kernel code gets a fast
response.
- Suspend virtqueue fd handlers in .drained_begin() and resume them in
.drained_end(). This eliminates the need for the
aio_set_fd_handler(is_external=true) flag, which is being removed from
QEMU.
This is a long list but splitting it into individual commits would
probably lead to git bisect failures - the changes are all related.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/export/vduse-blk.c | 132 +++++++++++++++++++++++++++------------
1 file changed, 93 insertions(+), 39 deletions(-)
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index f7ae44e3ce..35dc8fcf45 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -31,7 +31,8 @@ typedef struct VduseBlkExport {
VduseDev *dev;
uint16_t num_queues;
char *recon_file;
- unsigned int inflight;
+ unsigned int inflight; /* atomic */
+ bool vqs_started;
} VduseBlkExport;
typedef struct VduseBlkReq {
@@ -41,13 +42,24 @@ typedef struct VduseBlkReq {
static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
{
- vblk_exp->inflight++;
+ if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
+ /* Prevent export from being deleted */
+ aio_context_acquire(vblk_exp->export.ctx);
+ blk_exp_ref(&vblk_exp->export);
+ aio_context_release(vblk_exp->export.ctx);
+ }
}
static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
{
- if (--vblk_exp->inflight == 0) {
+ if (qatomic_fetch_dec(&vblk_exp->inflight) == 1) {
+ /* Wake AIO_WAIT_WHILE() */
aio_wait_kick();
+
+ /* Now the export can be deleted */
+ aio_context_acquire(vblk_exp->export.ctx);
+ blk_exp_unref(&vblk_exp->export);
+ aio_context_release(vblk_exp->export.ctx);
}
}
@@ -124,8 +136,12 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
{
VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+ if (!vblk_exp->vqs_started) {
+ return; /* vduse_blk_drained_end() will start vqs later */
+ }
+
aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
- true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+ false, on_vduse_vq_kick, NULL, NULL, NULL, vq);
/* Make sure we don't miss any kick afer reconnecting */
eventfd_write(vduse_queue_get_fd(vq), 1);
}
@@ -133,9 +149,14 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
{
VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+ int fd = vduse_queue_get_fd(vq);
- aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
- true, NULL, NULL, NULL, NULL, NULL);
+ if (fd < 0) {
+ return;
+ }
+
+ aio_set_fd_handler(vblk_exp->export.ctx, fd, false,
+ NULL, NULL, NULL, NULL, NULL);
}
static const VduseOps vduse_blk_ops = {
@@ -152,42 +173,19 @@ static void on_vduse_dev_kick(void *opaque)
static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
{
- int i;
-
aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
- true, on_vduse_dev_kick, NULL, NULL, NULL,
+ false, on_vduse_dev_kick, NULL, NULL, NULL,
vblk_exp->dev);
- for (i = 0; i < vblk_exp->num_queues; i++) {
- VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
- int fd = vduse_queue_get_fd(vq);
-
- if (fd < 0) {
- continue;
- }
- aio_set_fd_handler(vblk_exp->export.ctx, fd, true,
- on_vduse_vq_kick, NULL, NULL, NULL, vq);
- }
+ /* Virtqueues are handled by vduse_blk_drained_end() */
}
static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
{
- int i;
-
- for (i = 0; i < vblk_exp->num_queues; i++) {
- VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
- int fd = vduse_queue_get_fd(vq);
-
- if (fd < 0) {
- continue;
- }
- aio_set_fd_handler(vblk_exp->export.ctx, fd,
- true, NULL, NULL, NULL, NULL, NULL);
- }
aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
- true, NULL, NULL, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL, NULL);
- AIO_WAIT_WHILE(vblk_exp->export.ctx, vblk_exp->inflight > 0);
+ /* Virtqueues are handled by vduse_blk_drained_begin() */
}
@@ -220,8 +218,55 @@ static void vduse_blk_resize(void *opaque)
(char *)&config.capacity);
}
+static void vduse_blk_stop_virtqueues(VduseBlkExport *vblk_exp)
+{
+ for (uint16_t i = 0; i < vblk_exp->num_queues; i++) {
+ VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+ vduse_blk_disable_queue(vblk_exp->dev, vq);
+ }
+
+ vblk_exp->vqs_started = false;
+}
+
+static void vduse_blk_start_virtqueues(VduseBlkExport *vblk_exp)
+{
+ vblk_exp->vqs_started = true;
+
+ for (uint16_t i = 0; i < vblk_exp->num_queues; i++) {
+ VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+ vduse_blk_enable_queue(vblk_exp->dev, vq);
+ }
+}
+
+static void vduse_blk_drained_begin(void *opaque)
+{
+ BlockExport *exp = opaque;
+ VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+ vduse_blk_stop_virtqueues(vblk_exp);
+}
+
+static void vduse_blk_drained_end(void *opaque)
+{
+ BlockExport *exp = opaque;
+ VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+ vduse_blk_start_virtqueues(vblk_exp);
+}
+
+static bool vduse_blk_drained_poll(void *opaque)
+{
+ BlockExport *exp = opaque;
+ VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+ return qatomic_read(&vblk_exp->inflight) > 0;
+}
+
static const BlockDevOps vduse_block_ops = {
- .resize_cb = vduse_blk_resize,
+ .resize_cb = vduse_blk_resize,
+ .drained_begin = vduse_blk_drained_begin,
+ .drained_end = vduse_blk_drained_end,
+ .drained_poll = vduse_blk_drained_poll,
};
static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
@@ -268,6 +313,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
vblk_exp->handler.serial = g_strdup(vblk_opts->serial ?: "");
vblk_exp->handler.logical_block_size = logical_block_size;
vblk_exp->handler.writable = opts->writable;
+ vblk_exp->vqs_started = true;
config.capacity =
cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
@@ -322,14 +368,20 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
}
- aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true,
+ aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false,
on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vblk_exp);
-
blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+ /*
+ * We handle draining ourselves using an in-flight counter and by disabling
+ * virtqueue fd handlers. Do not queue BlockBackend requests, they need to
+ * complete so the in-flight counter reaches zero.
+ */
+ blk_set_disable_request_queuing(exp->blk, true);
+
return 0;
err:
vduse_dev_destroy(vblk_exp->dev);
@@ -344,6 +396,9 @@ static void vduse_blk_exp_delete(BlockExport *exp)
VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
int ret;
+ assert(qatomic_read(&vblk_exp->inflight) == 0);
+
+ vduse_blk_detach_ctx(vblk_exp);
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vblk_exp);
blk_set_dev_ops(exp->blk, NULL, NULL);
@@ -355,13 +410,12 @@ static void vduse_blk_exp_delete(BlockExport *exp)
g_free(vblk_exp->handler.serial);
}
+/* Called with exp->ctx acquired */
static void vduse_blk_exp_request_shutdown(BlockExport *exp)
{
VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
- aio_context_acquire(vblk_exp->export.ctx);
- vduse_blk_detach_ctx(vblk_exp);
- aio_context_release(vblk_exp->export.ctx);
+ vduse_blk_stop_virtqueues(vblk_exp);
}
const BlockExportDriver blk_exp_vduse_blk = {
--
2.39.2
* [PATCH v4 14/20] block/export: don't require AioContext lock around blk_exp_ref/unref()
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (12 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 13/20] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 15/20] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
` (5 subsequent siblings)
19 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
The FUSE export calls blk_exp_ref/unref() without the AioContext lock.
Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they
work without the AioContext lock. This way it's less error-prone.
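The lock-free refcounting described above can be sketched with C11 atomics standing in for QEMU's qatomic_* helpers. This is a minimal illustration, not the QEMU API; the `deleted` flag stands in for scheduling blk_exp_delete_bh() in the main loop on the last unref:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int refcount;
    bool deleted;   /* set when the last reference is dropped */
} Export;

static void export_ref(Export *exp)
{
    /* Callers must already hold a reference, so the count is never 0 here */
    assert(atomic_load(&exp->refcount) > 0);
    atomic_fetch_add(&exp->refcount, 1);
}

static void export_unref(Export *exp)
{
    assert(atomic_load(&exp->refcount) > 0);
    if (atomic_fetch_sub(&exp->refcount, 1) == 1) {
        /* Last reference: in QEMU this schedules a BH in the main thread */
        exp->deleted = true;
    }
}
```

Because ref and unref only read-modify-write the counter atomically, no AioContext lock is needed; only the deletion step must be funneled to the main thread.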
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/export.h | 2 ++
block/export/export.c | 13 ++++++-------
block/export/vduse-blk.c | 4 ----
3 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/include/block/export.h b/include/block/export.h
index 7feb02e10d..f2fe0f8078 100644
--- a/include/block/export.h
+++ b/include/block/export.h
@@ -57,6 +57,8 @@ struct BlockExport {
* Reference count for this block export. This includes strong references
* both from the owner (qemu-nbd or the monitor) and clients connected to
* the export.
+ *
+ * Use atomics to access this field.
*/
int refcount;
diff --git a/block/export/export.c b/block/export/export.c
index 28a91c9c42..ddaf8036e5 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -201,11 +201,10 @@ fail:
return NULL;
}
-/* Callers must hold exp->ctx lock */
void blk_exp_ref(BlockExport *exp)
{
- assert(exp->refcount > 0);
- exp->refcount++;
+ assert(qatomic_read(&exp->refcount) > 0);
+ qatomic_inc(&exp->refcount);
}
/* Runs in the main thread */
@@ -227,11 +226,10 @@ static void blk_exp_delete_bh(void *opaque)
aio_context_release(aio_context);
}
-/* Callers must hold exp->ctx lock */
void blk_exp_unref(BlockExport *exp)
{
- assert(exp->refcount > 0);
- if (--exp->refcount == 0) {
+ assert(qatomic_read(&exp->refcount) > 0);
+ if (qatomic_fetch_dec(&exp->refcount) == 1) {
/* Touch the block_exports list only in the main thread */
aio_bh_schedule_oneshot(qemu_get_aio_context(), blk_exp_delete_bh,
exp);
@@ -339,7 +337,8 @@ void qmp_block_export_del(const char *id,
if (!has_mode) {
mode = BLOCK_EXPORT_REMOVE_MODE_SAFE;
}
- if (mode == BLOCK_EXPORT_REMOVE_MODE_SAFE && exp->refcount > 1) {
+ if (mode == BLOCK_EXPORT_REMOVE_MODE_SAFE &&
+ qatomic_read(&exp->refcount) > 1) {
error_setg(errp, "export '%s' still in use", exp->id);
error_append_hint(errp, "Use mode='hard' to force client "
"disconnect\n");
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 35dc8fcf45..611430afda 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -44,9 +44,7 @@ static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
{
if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
/* Prevent export from being deleted */
- aio_context_acquire(vblk_exp->export.ctx);
blk_exp_ref(&vblk_exp->export);
- aio_context_release(vblk_exp->export.ctx);
}
}
@@ -57,9 +55,7 @@ static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
aio_wait_kick();
/* Now the export can be deleted */
- aio_context_acquire(vblk_exp->export.ctx);
blk_exp_unref(&vblk_exp->export);
- aio_context_release(vblk_exp->export.ctx);
}
}
--
2.39.2
* [PATCH v4 15/20] block/fuse: do not set is_external=true on FUSE fd
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (13 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 14/20] block/export: don't require AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread Stefan Hajnoczi
` (4 subsequent siblings)
19 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
This is part of ongoing work to remove the aio_disable_external() API.
Use BlockDevOps .drained_begin/end/poll() instead of
aio_set_fd_handler(is_external=true).
As a side-effect the FUSE export now follows AioContext changes like the
other export types.
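The in-flight counter that replaces is_external=true can be sketched as follows; names are illustrative, with C11 atomics standing in for qatomic_*. Requests bump the counter, and .drained_poll() keeps the drain loop waiting until it reaches zero:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint in_flight;

static void request_begin(void)
{
    atomic_fetch_add(&in_flight, 1);
}

static void request_end(void)
{
    if (atomic_fetch_sub(&in_flight, 1) == 1) {
        /* Last request completed: in QEMU, aio_wait_kick() wakes
         * AIO_WAIT_WHILE() so it re-evaluates drained_poll */
    }
}

/* .drained_poll contract: true means "still busy, keep waiting" */
static bool drained_poll(void)
{
    return atomic_load(&in_flight) > 0;
}
```

This is why request queuing must be disabled on the BlockBackend: queued requests would never complete, so the counter could never reach zero and the drain would hang.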
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/export/fuse.c | 58 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 56 insertions(+), 2 deletions(-)
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 06fa41079e..65a7f4d723 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -50,6 +50,7 @@ typedef struct FuseExport {
struct fuse_session *fuse_session;
struct fuse_buf fuse_buf;
+ unsigned int in_flight; /* atomic */
bool mounted, fd_handler_set_up;
char *mountpoint;
@@ -78,6 +79,42 @@ static void read_from_fuse_export(void *opaque);
static bool is_regular_file(const char *path, Error **errp);
+static void fuse_export_drained_begin(void *opaque)
+{
+ FuseExport *exp = opaque;
+
+ aio_set_fd_handler(exp->common.ctx,
+ fuse_session_fd(exp->fuse_session), false,
+ NULL, NULL, NULL, NULL, NULL);
+ exp->fd_handler_set_up = false;
+}
+
+static void fuse_export_drained_end(void *opaque)
+{
+ FuseExport *exp = opaque;
+
+ /* Refresh AioContext in case it changed */
+ exp->common.ctx = blk_get_aio_context(exp->common.blk);
+
+ aio_set_fd_handler(exp->common.ctx,
+ fuse_session_fd(exp->fuse_session), false,
+ read_from_fuse_export, NULL, NULL, NULL, exp);
+ exp->fd_handler_set_up = true;
+}
+
+static bool fuse_export_drained_poll(void *opaque)
+{
+ FuseExport *exp = opaque;
+
+ return qatomic_read(&exp->in_flight) > 0;
+}
+
+static const BlockDevOps fuse_export_blk_dev_ops = {
+ .drained_begin = fuse_export_drained_begin,
+ .drained_end = fuse_export_drained_end,
+ .drained_poll = fuse_export_drained_poll,
+};
+
static int fuse_export_create(BlockExport *blk_exp,
BlockExportOptions *blk_exp_args,
Error **errp)
@@ -101,6 +138,15 @@ static int fuse_export_create(BlockExport *blk_exp,
}
}
+ blk_set_dev_ops(exp->common.blk, &fuse_export_blk_dev_ops, exp);
+
+ /*
+ * We handle draining ourselves using an in-flight counter and by disabling
+ * the FUSE fd handler. Do not queue BlockBackend requests, they need to
+ * complete so the in-flight counter reaches zero.
+ */
+ blk_set_disable_request_queuing(exp->common.blk, true);
+
init_exports_table();
/*
@@ -224,7 +270,7 @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint,
g_hash_table_insert(exports, g_strdup(mountpoint), NULL);
aio_set_fd_handler(exp->common.ctx,
- fuse_session_fd(exp->fuse_session), true,
+ fuse_session_fd(exp->fuse_session), false,
read_from_fuse_export, NULL, NULL, NULL, exp);
exp->fd_handler_set_up = true;
@@ -246,6 +292,8 @@ static void read_from_fuse_export(void *opaque)
blk_exp_ref(&exp->common);
+ qatomic_inc(&exp->in_flight);
+
do {
ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
} while (ret == -EINTR);
@@ -256,6 +304,10 @@ static void read_from_fuse_export(void *opaque)
fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
out:
+ if (qatomic_fetch_dec(&exp->in_flight) == 1) {
+ aio_wait_kick(); /* wake AIO_WAIT_WHILE() */
+ }
+
blk_exp_unref(&exp->common);
}
@@ -268,7 +320,7 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
if (exp->fd_handler_set_up) {
aio_set_fd_handler(exp->common.ctx,
- fuse_session_fd(exp->fuse_session), true,
+ fuse_session_fd(exp->fuse_session), false,
NULL, NULL, NULL, NULL, NULL);
exp->fd_handler_set_up = false;
}
@@ -287,6 +339,8 @@ static void fuse_export_delete(BlockExport *blk_exp)
{
FuseExport *exp = container_of(blk_exp, FuseExport, common);
+ blk_set_dev_ops(exp->common.blk, NULL, NULL);
+
if (exp->fuse_session) {
if (exp->mounted) {
fuse_session_unmount(exp->fuse_session);
--
2.39.2
* [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (14 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 15/20] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-05-04 21:00 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin() Stefan Hajnoczi
` (3 subsequent siblings)
19 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
virtio_queue_aio_detach_host_notifier() does two things:
1. It removes the fd handler from the event loop.
2. It processes the virtqueue one last time.
The first step can be performed by any thread and without taking the
AioContext lock.
The second step may need the AioContext lock (depending on the device
implementation) and runs in the thread where request processing takes
place. virtio-blk and virtio-scsi therefore call
virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in
the AioContext.
Scheduling a BH is undesirable for .drained_begin() functions. The next
patch will introduce a .drained_begin() function that needs to call
virtio_queue_aio_detach_host_notifier().
Move the virtqueue processing out to the callers of
virtio_queue_aio_detach_host_notifier() so that the function can be
called from any thread. This is in preparation for the next patch.
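The split can be sketched with a toy model; the types and names below are illustrative, not the QEMU API. Before the patch, detaching both unregistered the fd handler and processed the virtqueue once; after the patch, detach only unregisters, and the caller runs the final processing pass in the appropriate thread:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void (*handler)(void *opaque);   /* registered fd handler, or NULL */
    int processed;                   /* number of processing passes */
} VirtQueueModel;

static void process_virtqueue(void *opaque)
{
    VirtQueueModel *vq = opaque;
    vq->processed++;
}

/* After the patch: only removes the handler, so it is safe from any
 * thread and needs no AioContext lock */
static void detach_host_notifier(VirtQueueModel *vq)
{
    vq->handler = NULL;
}

/* Caller-side replacement for the processing that detach used to do;
 * mirrors virtio_queue_host_notifier_read() in the real code */
static void final_notifier_read(VirtQueueModel *vq)
{
    process_virtqueue(vq);
}
```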
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/block/dataplane/virtio-blk.c | 2 ++
hw/scsi/virtio-scsi-dataplane.c | 9 +++++++++
2 files changed, 11 insertions(+)
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index b28d81737e..bd7cc6e76b 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -286,8 +286,10 @@ static void virtio_blk_data_plane_stop_bh(void *opaque)
for (i = 0; i < s->conf->num_queues; i++) {
VirtQueue *vq = virtio_get_queue(s->vdev, i);
+ EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq);
virtio_queue_aio_detach_host_notifier(vq, s->ctx);
+ virtio_queue_host_notifier_read(host_notifier);
}
}
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 20bb91766e..81643445ed 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -71,12 +71,21 @@ static void virtio_scsi_dataplane_stop_bh(void *opaque)
{
VirtIOSCSI *s = opaque;
VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+ EventNotifier *host_notifier;
int i;
virtio_queue_aio_detach_host_notifier(vs->ctrl_vq, s->ctx);
+ host_notifier = virtio_queue_get_host_notifier(vs->ctrl_vq);
+ virtio_queue_host_notifier_read(host_notifier);
+
virtio_queue_aio_detach_host_notifier(vs->event_vq, s->ctx);
+ host_notifier = virtio_queue_get_host_notifier(vs->event_vq);
+ virtio_queue_host_notifier_read(host_notifier);
+
for (i = 0; i < vs->conf.num_queues; i++) {
virtio_queue_aio_detach_host_notifier(vs->cmd_vqs[i], s->ctx);
+ host_notifier = virtio_queue_get_host_notifier(vs->cmd_vqs[i]);
+ virtio_queue_host_notifier_read(host_notifier);
}
}
--
2.39.2
* [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin()
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (15 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-05-04 21:13 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 18/20] virtio-scsi: " Stefan Hajnoczi
` (2 subsequent siblings)
19 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Detach ioeventfds during drained sections to stop I/O submission from
the guest. virtio-blk is no longer reliant on aio_disable_external()
after this patch. This will allow us to remove the
aio_disable_external() API once all other code that relies on it is
converted.
Take extra care to avoid attaching/detaching ioeventfds if the data
plane is started/stopped during a drained section. This should be rare,
but maybe the mirror block job can trigger it.
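The guard logic can be sketched as follows, with plain booleans standing in for blk_in_drain() and the ioeventfd attach/detach calls; all names are illustrative. The key invariant is that starting the dataplane inside a drained section must not attach notifiers, because drained_end() will attach them when the drain finishes:

```c
#include <assert.h>
#include <stdbool.h>

static bool in_drain;             /* stands in for blk_in_drain() */
static bool notifiers_attached;   /* stands in for ioeventfd handlers */

static void dataplane_start(void)
{
    if (!in_drain) {
        notifiers_attached = true;   /* virtio_queue_aio_attach_... */
    }
    /* otherwise defer to drained_end() */
}

static void drained_begin(void)
{
    in_drain = true;
    notifiers_attached = false;      /* virtio_queue_aio_detach_... */
}

static void drained_end(void)
{
    in_drain = false;
    notifiers_attached = true;
}
```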
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/block/dataplane/virtio-blk.c | 17 +++++++++------
hw/block/virtio-blk.c | 38 ++++++++++++++++++++++++++++++++-
2 files changed, 48 insertions(+), 7 deletions(-)
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index bd7cc6e76b..d77fc6028c 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -245,13 +245,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
}
/* Get this show started by hooking up our callbacks */
- aio_context_acquire(s->ctx);
- for (i = 0; i < nvqs; i++) {
- VirtQueue *vq = virtio_get_queue(s->vdev, i);
+ if (!blk_in_drain(s->conf->conf.blk)) {
+ aio_context_acquire(s->ctx);
+ for (i = 0; i < nvqs; i++) {
+ VirtQueue *vq = virtio_get_queue(s->vdev, i);
- virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+ virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+ }
+ aio_context_release(s->ctx);
}
- aio_context_release(s->ctx);
return 0;
fail_aio_context:
@@ -317,7 +319,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
trace_virtio_blk_data_plane_stop(s);
aio_context_acquire(s->ctx);
- aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
+
+ if (!blk_in_drain(s->conf->conf.blk)) {
+ aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
+ }
/* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
blk_drain(s->conf->conf.blk);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cefca93b31..d8dedc575c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1109,8 +1109,44 @@ static void virtio_blk_resize(void *opaque)
aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, vdev);
}
+/* Suspend virtqueue ioeventfd processing during drain */
+static void virtio_blk_drained_begin(void *opaque)
+{
+ VirtIOBlock *s = opaque;
+ VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+ AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);
+
+ if (!s->dataplane || !s->dataplane_started) {
+ return;
+ }
+
+ for (uint16_t i = 0; i < s->conf.num_queues; i++) {
+ VirtQueue *vq = virtio_get_queue(vdev, i);
+ virtio_queue_aio_detach_host_notifier(vq, ctx);
+ }
+}
+
+/* Resume virtqueue ioeventfd processing after drain */
+static void virtio_blk_drained_end(void *opaque)
+{
+ VirtIOBlock *s = opaque;
+ VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+ AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);
+
+ if (!s->dataplane || !s->dataplane_started) {
+ return;
+ }
+
+ for (uint16_t i = 0; i < s->conf.num_queues; i++) {
+ VirtQueue *vq = virtio_get_queue(vdev, i);
+ virtio_queue_aio_attach_host_notifier(vq, ctx);
+ }
+}
+
static const BlockDevOps virtio_block_ops = {
- .resize_cb = virtio_blk_resize,
+ .resize_cb = virtio_blk_resize,
+ .drained_begin = virtio_blk_drained_begin,
+ .drained_end = virtio_blk_drained_end,
};
static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
--
2.39.2
* [PATCH v4 18/20] virtio-scsi: implement BlockDevOps->drained_begin()
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (16 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin() Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-05-04 21:25 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 19/20] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 20/20] aio: remove aio_disable_external() API Stefan Hajnoczi
19 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
The virtio-scsi Host Bus Adapter provides access to devices on a SCSI
bus. Those SCSI devices typically have a BlockBackend. When the
BlockBackend enters a drained section, the SCSI device must temporarily
stop submitting new I/O requests.
Implement this behavior by temporarily stopping virtio-scsi virtqueue
processing when one of the SCSI devices enters a drained section. The
new scsi_device_drained_begin() API allows scsi-disk to message the
virtio-scsi HBA.
scsi_device_drained_begin() uses a drain counter so that multiple SCSI
devices can have overlapping drained sections. The HBA only sees one
pair of .drained_begin/end() calls.
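The counter's edge-triggered behavior can be sketched like this; the names are illustrative, with plain counters standing in for the bus->info->drained_begin/end() callbacks. The HBA hooks fire only on the 0->1 and 1->0 transitions, so overlapping per-device drains collapse into one begin/end pair:

```c
#include <assert.h>
#include <limits.h>

static int drain_count;        /* protected by the BQL in QEMU */
static int hba_begin_calls;
static int hba_end_calls;

static void device_drained_begin(void)
{
    assert(drain_count < INT_MAX);
    if (drain_count++ == 0) {
        hba_begin_calls++;     /* bus->info->drained_begin(bus) */
    }
}

static void device_drained_end(void)
{
    assert(drain_count > 0);
    if (drain_count-- == 1) {
        hba_end_calls++;       /* bus->info->drained_end(bus) */
    }
}
```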
After this commit, virtio-scsi no longer depends on hw/virtio's
ioeventfd aio_set_event_notifier(is_external=true). This commit is a
step towards removing the aio_disable_external() API.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/hw/scsi/scsi.h | 14 ++++++++++++
hw/scsi/scsi-bus.c | 40 +++++++++++++++++++++++++++++++++
hw/scsi/scsi-disk.c | 27 +++++++++++++++++-----
hw/scsi/virtio-scsi-dataplane.c | 22 ++++++++++--------
hw/scsi/virtio-scsi.c | 38 +++++++++++++++++++++++++++++++
hw/scsi/trace-events | 2 ++
6 files changed, 129 insertions(+), 14 deletions(-)
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 6f23a7a73e..e2bb1a2fbf 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -133,6 +133,16 @@ struct SCSIBusInfo {
void (*save_request)(QEMUFile *f, SCSIRequest *req);
void *(*load_request)(QEMUFile *f, SCSIRequest *req);
void (*free_request)(SCSIBus *bus, void *priv);
+
+ /*
+ * Temporarily stop submitting new requests between drained_begin() and
+ * drained_end(). Called from the main loop thread with the BQL held.
+ *
+ * Implement these callbacks if request processing is triggered by a file
+ * descriptor like an EventNotifier. Otherwise set them to NULL.
+ */
+ void (*drained_begin)(SCSIBus *bus);
+ void (*drained_end)(SCSIBus *bus);
};
#define TYPE_SCSI_BUS "SCSI"
@@ -144,6 +154,8 @@ struct SCSIBus {
SCSISense unit_attention;
const SCSIBusInfo *info;
+
+ int drain_count; /* protected by BQL */
};
/**
@@ -213,6 +225,8 @@ void scsi_req_cancel_complete(SCSIRequest *req);
void scsi_req_cancel(SCSIRequest *req);
void scsi_req_cancel_async(SCSIRequest *req, Notifier *notifier);
void scsi_req_retry(SCSIRequest *req);
+void scsi_device_drained_begin(SCSIDevice *sdev);
+void scsi_device_drained_end(SCSIDevice *sdev);
void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense);
void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense);
void scsi_device_report_change(SCSIDevice *dev, SCSISense sense);
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 64d7311757..b571fdf895 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1668,6 +1668,46 @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
scsi_device_set_ua(sdev, sense);
}
+void scsi_device_drained_begin(SCSIDevice *sdev)
+{
+ SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+ if (!bus) {
+ return;
+ }
+
+ assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+ assert(bus->drain_count < INT_MAX);
+
+ /*
+ * Multiple BlockBackends can be on a SCSIBus and each may begin/end
+ * draining at any time. Keep a counter so HBAs only see begin/end once.
+ */
+ if (bus->drain_count++ == 0) {
+ trace_scsi_bus_drained_begin(bus, sdev);
+ if (bus->info->drained_begin) {
+ bus->info->drained_begin(bus);
+ }
+ }
+}
+
+void scsi_device_drained_end(SCSIDevice *sdev)
+{
+ SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+ if (!bus) {
+ return;
+ }
+
+ assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+ assert(bus->drain_count > 0);
+
+ if (bus->drain_count-- == 1) {
+ trace_scsi_bus_drained_end(bus, sdev);
+ if (bus->info->drained_end) {
+ bus->info->drained_end(bus);
+ }
+ }
+}
+
static char *scsibus_get_dev_path(DeviceState *dev)
{
SCSIDevice *d = SCSI_DEVICE(dev);
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index e01bd84541..2249087d6a 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2360,6 +2360,20 @@ static void scsi_disk_reset(DeviceState *dev)
s->qdev.scsi_version = s->qdev.default_scsi_version;
}
+static void scsi_disk_drained_begin(void *opaque)
+{
+ SCSIDiskState *s = opaque;
+
+ scsi_device_drained_begin(&s->qdev);
+}
+
+static void scsi_disk_drained_end(void *opaque)
+{
+ SCSIDiskState *s = opaque;
+
+ scsi_device_drained_end(&s->qdev);
+}
+
static void scsi_disk_resize_cb(void *opaque)
{
SCSIDiskState *s = opaque;
@@ -2414,16 +2428,19 @@ static bool scsi_cd_is_medium_locked(void *opaque)
}
static const BlockDevOps scsi_disk_removable_block_ops = {
- .change_media_cb = scsi_cd_change_media_cb,
+ .change_media_cb = scsi_cd_change_media_cb,
+ .drained_begin = scsi_disk_drained_begin,
+ .drained_end = scsi_disk_drained_end,
.eject_request_cb = scsi_cd_eject_request_cb,
- .is_tray_open = scsi_cd_is_tray_open,
.is_medium_locked = scsi_cd_is_medium_locked,
-
- .resize_cb = scsi_disk_resize_cb,
+ .is_tray_open = scsi_cd_is_tray_open,
+ .resize_cb = scsi_disk_resize_cb,
};
static const BlockDevOps scsi_disk_block_ops = {
- .resize_cb = scsi_disk_resize_cb,
+ .drained_begin = scsi_disk_drained_begin,
+ .drained_end = scsi_disk_drained_end,
+ .resize_cb = scsi_disk_resize_cb,
};
static void scsi_disk_unit_attention_reported(SCSIDevice *dev)
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 81643445ed..1060038e13 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -153,14 +153,16 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
s->dataplane_starting = false;
s->dataplane_started = true;
- aio_context_acquire(s->ctx);
- virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
- virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
+ if (s->bus.drain_count == 0) {
+ aio_context_acquire(s->ctx);
+ virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
+ virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
- for (i = 0; i < vs->conf.num_queues; i++) {
- virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
+ for (i = 0; i < vs->conf.num_queues; i++) {
+ virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
+ }
+ aio_context_release(s->ctx);
}
- aio_context_release(s->ctx);
return 0;
fail_host_notifiers:
@@ -206,9 +208,11 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
}
s->dataplane_stopping = true;
- aio_context_acquire(s->ctx);
- aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
- aio_context_release(s->ctx);
+ if (s->bus.drain_count == 0) {
+ aio_context_acquire(s->ctx);
+ aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
+ aio_context_release(s->ctx);
+ }
blk_drain_all(); /* ensure there are no in-flight requests */
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index a02f9233ec..eba1e84dac 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1081,6 +1081,42 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
}
}
+/* Suspend virtqueue ioeventfd processing during drain */
+static void virtio_scsi_drained_begin(SCSIBus *bus)
+{
+ VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
+ uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED +
+ s->parent_obj.conf.num_queues;
+
+ if (!s->dataplane_started) {
+ return;
+ }
+
+ for (uint32_t i = 0; i < total_queues; i++) {
+ VirtQueue *vq = virtio_get_queue(vdev, i);
+ virtio_queue_aio_detach_host_notifier(vq, s->ctx);
+ }
+}
+
+/* Resume virtqueue ioeventfd processing after drain */
+static void virtio_scsi_drained_end(SCSIBus *bus)
+{
+ VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
+ uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED +
+ s->parent_obj.conf.num_queues;
+
+ if (!s->dataplane_started) {
+ return;
+ }
+
+ for (uint32_t i = 0; i < total_queues; i++) {
+ VirtQueue *vq = virtio_get_queue(vdev, i);
+ virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+ }
+}
+
static struct SCSIBusInfo virtio_scsi_scsi_info = {
.tcq = true,
.max_channel = VIRTIO_SCSI_MAX_CHANNEL,
@@ -1095,6 +1131,8 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
.get_sg_list = virtio_scsi_get_sg_list,
.save_request = virtio_scsi_save_request,
.load_request = virtio_scsi_load_request,
+ .drained_begin = virtio_scsi_drained_begin,
+ .drained_end = virtio_scsi_drained_end,
};
void virtio_scsi_common_realize(DeviceState *dev,
diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
index ab238293f0..bdd4e2c7c7 100644
--- a/hw/scsi/trace-events
+++ b/hw/scsi/trace-events
@@ -6,6 +6,8 @@ scsi_req_cancel(int target, int lun, int tag) "target %d lun %d tag %d"
scsi_req_data(int target, int lun, int tag, int len) "target %d lun %d tag %d len %d"
scsi_req_data_canceled(int target, int lun, int tag, int len) "target %d lun %d tag %d len %d"
scsi_req_dequeue(int target, int lun, int tag) "target %d lun %d tag %d"
+scsi_bus_drained_begin(void *bus, void *sdev) "bus %p sdev %p"
+scsi_bus_drained_end(void *bus, void *sdev) "bus %p sdev %p"
scsi_req_continue(int target, int lun, int tag) "target %d lun %d tag %d"
scsi_req_continue_canceled(int target, int lun, int tag) "target %d lun %d tag %d"
scsi_req_parsed(int target, int lun, int tag, int cmd, int mode, int xfer) "target %d lun %d tag %d command %d dir %d length %d"
--
2.39.2
* [PATCH v4 19/20] virtio: do not set is_external=true on host notifiers
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (17 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 18/20] virtio-scsi: " Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 20/20] aio: remove aio_disable_external() API Stefan Hajnoczi
19 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Host notifiers can now use is_external=false since virtio-blk and
virtio-scsi no longer rely on is_external=true for drained sections.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/virtio/virtio.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 272d930721..9cdad7e550 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3491,7 +3491,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
- aio_set_event_notifier(ctx, &vq->host_notifier, true,
+ aio_set_event_notifier(ctx, &vq->host_notifier, false,
virtio_queue_host_notifier_read,
virtio_queue_host_notifier_aio_poll,
virtio_queue_host_notifier_aio_poll_ready);
@@ -3508,14 +3508,14 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
*/
void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
{
- aio_set_event_notifier(ctx, &vq->host_notifier, true,
+ aio_set_event_notifier(ctx, &vq->host_notifier, false,
virtio_queue_host_notifier_read,
NULL, NULL);
}
void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
- aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL);
+ aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL);
/* Test and clear notifier before after disabling event,
* in case poll callback didn't have time to run. */
virtio_queue_host_notifier_read(&vq->host_notifier);
--
2.39.2
* [PATCH v4 20/20] aio: remove aio_disable_external() API
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
` (18 preceding siblings ...)
2023-04-25 17:27 ` [PATCH v4 19/20] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
@ 2023-04-25 17:27 ` Stefan Hajnoczi
2023-05-04 21:34 ` Kevin Wolf
19 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 17:27 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P. Berrangé, Juan Quintela, Julia Suvorova,
Kevin Wolf, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Stefan Hajnoczi, Richard W.M. Jones,
qemu-block, Stefano Garzarella, Anthony Perard, Stefan Weil,
Xie Yongji, Paolo Bonzini, Aarushi Mehta,
Philippe Mathieu-Daudé, Eduardo Habkost, Stefano Stabellini,
Hanna Reitz, Ronnie Sahlberg
All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API, which
temporarily disables fd handlers registered with is_external=true,
is therefore dead code.
Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().
The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().
Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:
@@
expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
@@
- aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
+ aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
@@
expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
@@
- aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
+ aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
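For anyone reproducing the conversion, the semantic patch above can be
applied tree-wide with coccinelle's spatch tool. The file name and exact
invocation below are illustrative (not taken from this patch); only the
first rule is shown in the sketch:

```shell
# Save the semantic patch shown above to a file (name is arbitrary).
cat > remove-is_external.cocci <<'EOF'
@@
expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
@@
- aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
+ aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
EOF

# Apply it across the source tree, rewriting files in place.
# --sp-file, --in-place, and --dir are standard spatch options.
if command -v spatch >/dev/null 2>&1; then
    spatch --sp-file remove-is_external.cocci --in-place --dir .
else
    echo "spatch (coccinelle) not installed; skipping"
fi
```

The same approach works for the aio_set_event_notifier() rule; remaining
call sites (e.g. ones split across lines in unusual ways) would still
need manual fixup, as the commit message notes only parts were generated.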
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/aio.h | 57 ---------------------------
util/aio-posix.h | 1 -
block.c | 7 ----
block/blkio.c | 15 +++----
block/curl.c | 10 ++---
block/export/fuse.c | 8 ++--
block/export/vduse-blk.c | 10 ++---
block/io.c | 2 -
block/io_uring.c | 4 +-
block/iscsi.c | 3 +-
block/linux-aio.c | 4 +-
block/nfs.c | 5 +--
block/nvme.c | 8 ++--
block/ssh.c | 4 +-
block/win32-aio.c | 6 +--
hw/i386/kvm/xen_xenstore.c | 2 +-
hw/virtio/virtio.c | 6 +--
hw/xen/xen-bus.c | 8 ++--
io/channel-command.c | 6 +--
io/channel-file.c | 3 +-
io/channel-socket.c | 3 +-
migration/rdma.c | 16 ++++----
tests/unit/test-aio.c | 27 +------------
tests/unit/test-bdrv-drain.c | 1 -
tests/unit/test-fdmon-epoll.c | 73 -----------------------------------
util/aio-posix.c | 20 +++-------
util/aio-win32.c | 8 +---
util/async.c | 3 +-
util/fdmon-epoll.c | 18 +++------
util/fdmon-io_uring.c | 8 +---
util/fdmon-poll.c | 3 +-
util/main-loop.c | 7 ++--
util/qemu-coroutine-io.c | 7 ++--
util/vhost-user-server.c | 11 +++---
tests/unit/meson.build | 3 --
35 files changed, 82 insertions(+), 295 deletions(-)
delete mode 100644 tests/unit/test-fdmon-epoll.c
diff --git a/include/block/aio.h b/include/block/aio.h
index 543717f294..bb38f0753f 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -231,8 +231,6 @@ struct AioContext {
*/
QEMUTimerListGroup tlg;
- int external_disable_cnt;
-
/* Number of AioHandlers without .io_poll() */
int poll_disable_cnt;
@@ -475,7 +473,6 @@ bool aio_poll(AioContext *ctx, bool blocking);
*/
void aio_set_fd_handler(AioContext *ctx,
int fd,
- bool is_external,
IOHandler *io_read,
IOHandler *io_write,
AioPollFn *io_poll,
@@ -491,7 +488,6 @@ void aio_set_fd_handler(AioContext *ctx,
*/
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *notifier,
- bool is_external,
EventNotifierHandler *io_read,
AioPollFn *io_poll,
EventNotifierHandler *io_poll_ready);
@@ -620,59 +616,6 @@ static inline void aio_timer_init(AioContext *ctx,
*/
int64_t aio_compute_timeout(AioContext *ctx);
-/**
- * aio_disable_external:
- * @ctx: the aio context
- *
- * Disable the further processing of external clients.
- */
-static inline void aio_disable_external(AioContext *ctx)
-{
- qatomic_inc(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_enable_external:
- * @ctx: the aio context
- *
- * Enable the processing of external clients.
- */
-static inline void aio_enable_external(AioContext *ctx)
-{
- int old;
-
- old = qatomic_fetch_dec(&ctx->external_disable_cnt);
- assert(old > 0);
- if (old == 1) {
- /* Kick event loop so it re-arms file descriptors */
- aio_notify(ctx);
- }
-}
-
-/**
- * aio_external_disabled:
- * @ctx: the aio context
- *
- * Return true if the external clients are disabled.
- */
-static inline bool aio_external_disabled(AioContext *ctx)
-{
- return qatomic_read(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_node_check:
- * @ctx: the aio context
- * @is_external: Whether or not the checked node is an external event source.
- *
- * Check if the node's is_external flag is okay to be polled by the ctx at this
- * moment. True means green light.
- */
-static inline bool aio_node_check(AioContext *ctx, bool is_external)
-{
- return !is_external || !qatomic_read(&ctx->external_disable_cnt);
-}
-
/**
* aio_co_schedule:
* @ctx: the aio context
diff --git a/util/aio-posix.h b/util/aio-posix.h
index 80b927c7f4..4264c518be 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -38,7 +38,6 @@ struct AioHandler {
#endif
int64_t poll_idle_timeout; /* when to stop userspace polling */
bool poll_ready; /* has polling detected an event? */
- bool is_external;
};
/* Add a handler to a ready list */
diff --git a/block.c b/block.c
index d79a52ca74..608c99a219 100644
--- a/block.c
+++ b/block.c
@@ -7268,9 +7268,6 @@ static void bdrv_detach_aio_context(BlockDriverState *bs)
bs->drv->bdrv_detach_aio_context(bs);
}
- if (bs->quiesce_counter) {
- aio_enable_external(bs->aio_context);
- }
bs->aio_context = NULL;
}
@@ -7280,10 +7277,6 @@ static void bdrv_attach_aio_context(BlockDriverState *bs,
BdrvAioNotifier *ban, *ban_tmp;
GLOBAL_STATE_CODE();
- if (bs->quiesce_counter) {
- aio_disable_external(new_context);
- }
-
bs->aio_context = new_context;
if (bs->drv && bs->drv->bdrv_attach_aio_context) {
diff --git a/block/blkio.c b/block/blkio.c
index 0cdc99a729..72117fa005 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -306,23 +306,18 @@ static void blkio_attach_aio_context(BlockDriverState *bs,
{
BDRVBlkioState *s = bs->opaque;
- aio_set_fd_handler(new_context,
- s->completion_fd,
- false,
- blkio_completion_fd_read,
- NULL,
+ aio_set_fd_handler(new_context, s->completion_fd,
+ blkio_completion_fd_read, NULL,
blkio_completion_fd_poll,
- blkio_completion_fd_poll_ready,
- bs);
+ blkio_completion_fd_poll_ready, bs);
}
static void blkio_detach_aio_context(BlockDriverState *bs)
{
BDRVBlkioState *s = bs->opaque;
- aio_set_fd_handler(bdrv_get_aio_context(bs),
- s->completion_fd,
- false, NULL, NULL, NULL, NULL, NULL);
+ aio_set_fd_handler(bdrv_get_aio_context(bs), s->completion_fd, NULL, NULL,
+ NULL, NULL, NULL);
}
/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
diff --git a/block/curl.c b/block/curl.c
index 8bb39a134e..0fc42d03d7 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -132,7 +132,7 @@ static gboolean curl_drop_socket(void *key, void *value, void *opaque)
CURLSocket *socket = value;
BDRVCURLState *s = socket->s;
- aio_set_fd_handler(s->aio_context, socket->fd, false,
+ aio_set_fd_handler(s->aio_context, socket->fd,
NULL, NULL, NULL, NULL, NULL);
return true;
}
@@ -180,20 +180,20 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
trace_curl_sock_cb(action, (int)fd);
switch (action) {
case CURL_POLL_IN:
- aio_set_fd_handler(s->aio_context, fd, false,
+ aio_set_fd_handler(s->aio_context, fd,
curl_multi_do, NULL, NULL, NULL, socket);
break;
case CURL_POLL_OUT:
- aio_set_fd_handler(s->aio_context, fd, false,
+ aio_set_fd_handler(s->aio_context, fd,
NULL, curl_multi_do, NULL, NULL, socket);
break;
case CURL_POLL_INOUT:
- aio_set_fd_handler(s->aio_context, fd, false,
+ aio_set_fd_handler(s->aio_context, fd,
curl_multi_do, curl_multi_do,
NULL, NULL, socket);
break;
case CURL_POLL_REMOVE:
- aio_set_fd_handler(s->aio_context, fd, false,
+ aio_set_fd_handler(s->aio_context, fd,
NULL, NULL, NULL, NULL, NULL);
break;
}
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 65a7f4d723..5c75c9407e 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -84,7 +84,7 @@ static void fuse_export_drained_begin(void *opaque)
FuseExport *exp = opaque;
aio_set_fd_handler(exp->common.ctx,
- fuse_session_fd(exp->fuse_session), false,
+ fuse_session_fd(exp->fuse_session),
NULL, NULL, NULL, NULL, NULL);
exp->fd_handler_set_up = false;
}
@@ -97,7 +97,7 @@ static void fuse_export_drained_end(void *opaque)
exp->common.ctx = blk_get_aio_context(exp->common.blk);
aio_set_fd_handler(exp->common.ctx,
- fuse_session_fd(exp->fuse_session), false,
+ fuse_session_fd(exp->fuse_session),
read_from_fuse_export, NULL, NULL, NULL, exp);
exp->fd_handler_set_up = true;
}
@@ -270,7 +270,7 @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint,
g_hash_table_insert(exports, g_strdup(mountpoint), NULL);
aio_set_fd_handler(exp->common.ctx,
- fuse_session_fd(exp->fuse_session), false,
+ fuse_session_fd(exp->fuse_session),
read_from_fuse_export, NULL, NULL, NULL, exp);
exp->fd_handler_set_up = true;
@@ -320,7 +320,7 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
if (exp->fd_handler_set_up) {
aio_set_fd_handler(exp->common.ctx,
- fuse_session_fd(exp->fuse_session), false,
+ fuse_session_fd(exp->fuse_session),
NULL, NULL, NULL, NULL, NULL);
exp->fd_handler_set_up = false;
}
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 611430afda..048bdcbfb6 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -137,7 +137,7 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
}
aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
- false, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+ on_vduse_vq_kick, NULL, NULL, NULL, vq);
/* Make sure we don't miss any kick afer reconnecting */
eventfd_write(vduse_queue_get_fd(vq), 1);
}
@@ -151,7 +151,7 @@ static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
return;
}
- aio_set_fd_handler(vblk_exp->export.ctx, fd, false,
+ aio_set_fd_handler(vblk_exp->export.ctx, fd,
NULL, NULL, NULL, NULL, NULL);
}
@@ -170,7 +170,7 @@ static void on_vduse_dev_kick(void *opaque)
static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
{
aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
- false, on_vduse_dev_kick, NULL, NULL, NULL,
+ on_vduse_dev_kick, NULL, NULL, NULL,
vblk_exp->dev);
/* Virtqueues are handled by vduse_blk_drained_end() */
@@ -179,7 +179,7 @@ static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
{
aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
- false, NULL, NULL, NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL, NULL);
/* Virtqueues are handled by vduse_blk_drained_begin() */
}
@@ -364,7 +364,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
}
- aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false,
+ aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev),
on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
diff --git a/block/io.c b/block/io.c
index 4f9fe2f808..9affddb3a0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -361,7 +361,6 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
/* Stop things in parent-to-child order */
if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
- aio_disable_external(bdrv_get_aio_context(bs));
bdrv_parent_drained_begin(bs, parent);
if (bs->drv && bs->drv->bdrv_drain_begin) {
bs->drv->bdrv_drain_begin(bs);
@@ -414,7 +413,6 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
bs->drv->bdrv_drain_end(bs);
}
bdrv_parent_drained_end(bs, parent);
- aio_enable_external(bdrv_get_aio_context(bs));
}
}
diff --git a/block/io_uring.c b/block/io_uring.c
index 973e15d876..3a07215ded 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -399,7 +399,7 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
void luring_detach_aio_context(LuringState *s, AioContext *old_context)
{
- aio_set_fd_handler(old_context, s->ring.ring_fd, false,
+ aio_set_fd_handler(old_context, s->ring.ring_fd,
NULL, NULL, NULL, NULL, s);
qemu_bh_delete(s->completion_bh);
s->aio_context = NULL;
@@ -409,7 +409,7 @@ void luring_attach_aio_context(LuringState *s, AioContext *new_context)
{
s->aio_context = new_context;
s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
- aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
+ aio_set_fd_handler(s->aio_context, s->ring.ring_fd,
qemu_luring_completion_cb, NULL,
qemu_luring_poll_cb, qemu_luring_poll_ready, s);
}
diff --git a/block/iscsi.c b/block/iscsi.c
index 9fc0bed90b..34f97ab646 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -363,7 +363,6 @@ iscsi_set_events(IscsiLun *iscsilun)
if (ev != iscsilun->events) {
aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsi),
- false,
(ev & POLLIN) ? iscsi_process_read : NULL,
(ev & POLLOUT) ? iscsi_process_write : NULL,
NULL, NULL,
@@ -1540,7 +1539,7 @@ static void iscsi_detach_aio_context(BlockDriverState *bs)
IscsiLun *iscsilun = bs->opaque;
aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi),
- false, NULL, NULL, NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL, NULL);
iscsilun->events = 0;
if (iscsilun->nop_timer) {
diff --git a/block/linux-aio.c b/block/linux-aio.c
index d2cfb7f523..d4a9e21a11 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -438,7 +438,7 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
{
- aio_set_event_notifier(old_context, &s->e, false, NULL, NULL, NULL);
+ aio_set_event_notifier(old_context, &s->e, NULL, NULL, NULL);
qemu_bh_delete(s->completion_bh);
s->aio_context = NULL;
}
@@ -447,7 +447,7 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
{
s->aio_context = new_context;
s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
- aio_set_event_notifier(new_context, &s->e, false,
+ aio_set_event_notifier(new_context, &s->e,
qemu_laio_completion_cb,
qemu_laio_poll_cb,
qemu_laio_poll_ready);
diff --git a/block/nfs.c b/block/nfs.c
index 006045d71a..8f89ece69f 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -195,7 +195,6 @@ static void nfs_set_events(NFSClient *client)
int ev = nfs_which_events(client->context);
if (ev != client->events) {
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
- false,
(ev & POLLIN) ? nfs_process_read : NULL,
(ev & POLLOUT) ? nfs_process_write : NULL,
NULL, NULL, client);
@@ -373,7 +372,7 @@ static void nfs_detach_aio_context(BlockDriverState *bs)
NFSClient *client = bs->opaque;
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
- false, NULL, NULL, NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL, NULL);
client->events = 0;
}
@@ -391,7 +390,7 @@ static void nfs_client_close(NFSClient *client)
if (client->context) {
qemu_mutex_lock(&client->mutex);
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
- false, NULL, NULL, NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL, NULL);
qemu_mutex_unlock(&client->mutex);
if (client->fh) {
nfs_close(client->context, client->fh);
diff --git a/block/nvme.c b/block/nvme.c
index 5b744c2bda..17937d398d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -862,7 +862,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
}
aio_set_event_notifier(bdrv_get_aio_context(bs),
&s->irq_notifier[MSIX_SHARED_IRQ_IDX],
- false, nvme_handle_event, nvme_poll_cb,
+ nvme_handle_event, nvme_poll_cb,
nvme_poll_ready);
if (!nvme_identify(bs, namespace, errp)) {
@@ -948,7 +948,7 @@ static void nvme_close(BlockDriverState *bs)
g_free(s->queues);
aio_set_event_notifier(bdrv_get_aio_context(bs),
&s->irq_notifier[MSIX_SHARED_IRQ_IDX],
- false, NULL, NULL, NULL);
+ NULL, NULL, NULL);
event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
qemu_vfio_pci_unmap_bar(s->vfio, 0, s->bar0_wo_map,
0, sizeof(NvmeBar) + NVME_DOORBELL_SIZE);
@@ -1546,7 +1546,7 @@ static void nvme_detach_aio_context(BlockDriverState *bs)
aio_set_event_notifier(bdrv_get_aio_context(bs),
&s->irq_notifier[MSIX_SHARED_IRQ_IDX],
- false, NULL, NULL, NULL);
+ NULL, NULL, NULL);
}
static void nvme_attach_aio_context(BlockDriverState *bs,
@@ -1556,7 +1556,7 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
s->aio_context = new_context;
aio_set_event_notifier(new_context, &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
- false, nvme_handle_event, nvme_poll_cb,
+ nvme_handle_event, nvme_poll_cb,
nvme_poll_ready);
for (unsigned i = 0; i < s->queue_count; i++) {
diff --git a/block/ssh.c b/block/ssh.c
index b3b3352075..2748253d4a 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -1019,7 +1019,7 @@ static void restart_coroutine(void *opaque)
AioContext *ctx = bdrv_get_aio_context(bs);
trace_ssh_restart_coroutine(restart->co);
- aio_set_fd_handler(ctx, s->sock, false, NULL, NULL, NULL, NULL, NULL);
+ aio_set_fd_handler(ctx, s->sock, NULL, NULL, NULL, NULL, NULL);
aio_co_wake(restart->co);
}
@@ -1049,7 +1049,7 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
trace_ssh_co_yield(s->sock, rd_handler, wr_handler);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
- false, rd_handler, wr_handler, NULL, NULL, &restart);
+ rd_handler, wr_handler, NULL, NULL, &restart);
qemu_coroutine_yield();
trace_ssh_co_yield_back(s->sock);
}
diff --git a/block/win32-aio.c b/block/win32-aio.c
index ee87d6048f..6327861e1d 100644
--- a/block/win32-aio.c
+++ b/block/win32-aio.c
@@ -174,7 +174,7 @@ int win32_aio_attach(QEMUWin32AIOState *aio, HANDLE hfile)
void win32_aio_detach_aio_context(QEMUWin32AIOState *aio,
AioContext *old_context)
{
- aio_set_event_notifier(old_context, &aio->e, false, NULL, NULL, NULL);
+ aio_set_event_notifier(old_context, &aio->e, NULL, NULL, NULL);
aio->aio_ctx = NULL;
}
@@ -182,8 +182,8 @@ void win32_aio_attach_aio_context(QEMUWin32AIOState *aio,
AioContext *new_context)
{
aio->aio_ctx = new_context;
- aio_set_event_notifier(new_context, &aio->e, false,
- win32_aio_completion_cb, NULL, NULL);
+ aio_set_event_notifier(new_context, &aio->e, win32_aio_completion_cb,
+ NULL, NULL);
}
QEMUWin32AIOState *win32_aio_init(void)
diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 6e81bc8791..0b189c6ab8 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
error_setg(errp, "Xenstore evtchn port init failed");
return;
}
- aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
+ aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh),
xen_xenstore_event, NULL, NULL, NULL, s);
s->impl = xs_impl_create(xen_domid);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9cdad7e550..d48e240c37 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3491,7 +3491,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
- aio_set_event_notifier(ctx, &vq->host_notifier, false,
+ aio_set_event_notifier(ctx, &vq->host_notifier,
virtio_queue_host_notifier_read,
virtio_queue_host_notifier_aio_poll,
virtio_queue_host_notifier_aio_poll_ready);
@@ -3508,14 +3508,14 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
*/
void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
{
- aio_set_event_notifier(ctx, &vq->host_notifier, false,
+ aio_set_event_notifier(ctx, &vq->host_notifier,
virtio_queue_host_notifier_read,
NULL, NULL);
}
void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
- aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL);
+ aio_set_event_notifier(ctx, &vq->host_notifier, NULL, NULL, NULL);
/* Test and clear notifier before after disabling event,
* in case poll callback didn't have time to run. */
virtio_queue_host_notifier_read(&vq->host_notifier);
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index bf256d4da2..1e08cf027a 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -842,14 +842,14 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
}
if (channel->ctx)
- aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
+ aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
NULL, NULL, NULL, NULL, NULL);
channel->ctx = ctx;
if (ctx) {
aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
- false, xen_device_event, NULL, xen_device_poll,
- NULL, channel);
+ xen_device_event, NULL, xen_device_poll, NULL,
+ channel);
}
}
@@ -923,7 +923,7 @@ void xen_device_unbind_event_channel(XenDevice *xendev,
QLIST_REMOVE(channel, list);
- aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
+ aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
NULL, NULL, NULL, NULL, NULL);
if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) {
diff --git a/io/channel-command.c b/io/channel-command.c
index e7edd091af..7ed726c802 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -337,10 +337,8 @@ static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc,
void *opaque)
{
QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
- aio_set_fd_handler(ctx, cioc->readfd, false,
- io_read, NULL, NULL, NULL, opaque);
- aio_set_fd_handler(ctx, cioc->writefd, false,
- NULL, io_write, NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, cioc->readfd, io_read, NULL, NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, cioc->writefd, NULL, io_write, NULL, NULL, opaque);
}
diff --git a/io/channel-file.c b/io/channel-file.c
index d76663e6ae..8b5821f452 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -198,8 +198,7 @@ static void qio_channel_file_set_aio_fd_handler(QIOChannel *ioc,
void *opaque)
{
QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
- aio_set_fd_handler(ctx, fioc->fd, false, io_read, io_write,
- NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, fioc->fd, io_read, io_write, NULL, NULL, opaque);
}
static GSource *qio_channel_file_create_watch(QIOChannel *ioc,
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b0ea7d48b3..d99945ebec 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -899,8 +899,7 @@ static void qio_channel_socket_set_aio_fd_handler(QIOChannel *ioc,
void *opaque)
{
QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
- aio_set_fd_handler(ctx, sioc->fd, false,
- io_read, io_write, NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, sioc->fd, io_read, io_write, NULL, NULL, opaque);
}
static GSource *qio_channel_socket_create_watch(QIOChannel *ioc,
diff --git a/migration/rdma.c b/migration/rdma.c
index 0af5e944f0..4149662fc6 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3105,15 +3105,15 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
{
QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
if (io_read) {
- aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
- false, io_read, io_write, NULL, NULL, opaque);
- aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
- false, io_read, io_write, NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read,
+ io_write, NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read,
+ io_write, NULL, NULL, opaque);
} else {
- aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
- false, io_read, io_write, NULL, NULL, opaque);
- aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
- false, io_read, io_write, NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, io_read,
+ io_write, NULL, NULL, opaque);
+ aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, io_read,
+ io_write, NULL, NULL, opaque);
}
}
diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c
index 321d7ab01a..519440eed3 100644
--- a/tests/unit/test-aio.c
+++ b/tests/unit/test-aio.c
@@ -130,7 +130,7 @@ static void *test_acquire_thread(void *opaque)
static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
EventNotifierHandler *handler)
{
- aio_set_event_notifier(ctx, notifier, false, handler, NULL, NULL);
+ aio_set_event_notifier(ctx, notifier, handler, NULL, NULL);
}
static void dummy_notifier_read(EventNotifier *n)
@@ -383,30 +383,6 @@ static void test_flush_event_notifier(void)
event_notifier_cleanup(&data.e);
}
-static void test_aio_external_client(void)
-{
- int i, j;
-
- for (i = 1; i < 3; i++) {
- EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
- event_notifier_init(&data.e, false);
- aio_set_event_notifier(ctx, &data.e, true, event_ready_cb, NULL, NULL);
- event_notifier_set(&data.e);
- for (j = 0; j < i; j++) {
- aio_disable_external(ctx);
- }
- for (j = 0; j < i; j++) {
- assert(!aio_poll(ctx, false));
- assert(event_notifier_test_and_clear(&data.e));
- event_notifier_set(&data.e);
- aio_enable_external(ctx);
- }
- assert(aio_poll(ctx, false));
- set_event_notifier(ctx, &data.e, NULL);
- event_notifier_cleanup(&data.e);
- }
-}
-
static void test_wait_event_notifier_noflush(void)
{
EventNotifierTestData data = { .n = 0 };
@@ -935,7 +911,6 @@ int main(int argc, char **argv)
g_test_add_func("/aio/event/wait", test_wait_event_notifier);
g_test_add_func("/aio/event/wait/no-flush-cb", test_wait_event_notifier_noflush);
g_test_add_func("/aio/event/flush", test_flush_event_notifier);
- g_test_add_func("/aio/external-client", test_aio_external_client);
g_test_add_func("/aio/timer/schedule", test_timer_schedule);
g_test_add_func("/aio/coroutine/queue-chaining", test_queue_chaining);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index d9d3807062..5c89169e46 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -435,7 +435,6 @@ static void test_graph_change_drain_all(void)
g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
g_assert_cmpint(b_s->drain_count, ==, 0);
- g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);
bdrv_unref(bs_b);
blk_unref(blk_b);
diff --git a/tests/unit/test-fdmon-epoll.c b/tests/unit/test-fdmon-epoll.c
deleted file mode 100644
index ef5a856d09..0000000000
--- a/tests/unit/test-fdmon-epoll.c
+++ /dev/null
@@ -1,73 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * fdmon-epoll tests
- *
- * Copyright (c) 2020 Red Hat, Inc.
- */
-
-#include "qemu/osdep.h"
-#include "block/aio.h"
-#include "qapi/error.h"
-#include "qemu/main-loop.h"
-
-static AioContext *ctx;
-
-static void dummy_fd_handler(EventNotifier *notifier)
-{
- event_notifier_test_and_clear(notifier);
-}
-
-static void add_event_notifiers(EventNotifier *notifiers, size_t n)
-{
- for (size_t i = 0; i < n; i++) {
- event_notifier_init(&notifiers[i], false);
- aio_set_event_notifier(ctx, &notifiers[i], false,
- dummy_fd_handler, NULL, NULL);
- }
-}
-
-static void remove_event_notifiers(EventNotifier *notifiers, size_t n)
-{
- for (size_t i = 0; i < n; i++) {
- aio_set_event_notifier(ctx, &notifiers[i], false, NULL, NULL, NULL);
- event_notifier_cleanup(&notifiers[i]);
- }
-}
-
-/* Check that fd handlers work when external clients are disabled */
-static void test_external_disabled(void)
-{
- EventNotifier notifiers[100];
-
- /* fdmon-epoll is only enabled when many fd handlers are registered */
- add_event_notifiers(notifiers, G_N_ELEMENTS(notifiers));
-
- event_notifier_set(&notifiers[0]);
- assert(aio_poll(ctx, true));
-
- aio_disable_external(ctx);
- event_notifier_set(&notifiers[0]);
- assert(aio_poll(ctx, true));
- aio_enable_external(ctx);
-
- remove_event_notifiers(notifiers, G_N_ELEMENTS(notifiers));
-}
-
-int main(int argc, char **argv)
-{
- /*
- * This code relies on the fact that fdmon-io_uring disables itself when
- * the glib main loop is in use. The main loop uses fdmon-poll and upgrades
- * to fdmon-epoll when the number of fds exceeds a threshold.
- */
- qemu_init_main_loop(&error_fatal);
- ctx = qemu_get_aio_context();
-
- while (g_main_context_iteration(NULL, false)) {
- /* Do nothing */
- }
-
- g_test_init(&argc, &argv, NULL);
- g_test_add_func("/fdmon-epoll/external-disabled", test_external_disabled);
- return g_test_run();
-}
diff --git a/util/aio-posix.c b/util/aio-posix.c
index a8be940f76..934b1bbb85 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -99,7 +99,6 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
void aio_set_fd_handler(AioContext *ctx,
int fd,
- bool is_external,
IOHandler *io_read,
IOHandler *io_write,
AioPollFn *io_poll,
@@ -144,7 +143,6 @@ void aio_set_fd_handler(AioContext *ctx,
new_node->io_poll = io_poll;
new_node->io_poll_ready = io_poll_ready;
new_node->opaque = opaque;
- new_node->is_external = is_external;
if (is_new) {
new_node->pfd.fd = fd;
@@ -196,12 +194,11 @@ static void aio_set_fd_poll(AioContext *ctx, int fd,
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *notifier,
- bool is_external,
EventNotifierHandler *io_read,
AioPollFn *io_poll,
EventNotifierHandler *io_poll_ready)
{
- aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external,
+ aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
(IOHandler *)io_read, NULL, io_poll,
(IOHandler *)io_poll_ready, notifier);
}
@@ -285,13 +282,11 @@ bool aio_pending(AioContext *ctx)
/* TODO should this check poll ready? */
revents = node->pfd.revents & node->pfd.events;
- if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
- aio_node_check(ctx, node->is_external)) {
+ if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
result = true;
break;
}
- if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write &&
- aio_node_check(ctx, node->is_external)) {
+ if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
result = true;
break;
}
@@ -350,9 +345,7 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
QLIST_INSERT_HEAD(&ctx->poll_aio_handlers, node, node_poll);
}
if (!QLIST_IS_INSERTED(node, node_deleted) &&
- poll_ready && revents == 0 &&
- aio_node_check(ctx, node->is_external) &&
- node->io_poll_ready) {
+ poll_ready && revents == 0 && node->io_poll_ready) {
node->io_poll_ready(node->opaque);
/*
@@ -364,7 +357,6 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
if (!QLIST_IS_INSERTED(node, node_deleted) &&
(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
- aio_node_check(ctx, node->is_external) &&
node->io_read) {
node->io_read(node->opaque);
@@ -375,7 +367,6 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
}
if (!QLIST_IS_INSERTED(node, node_deleted) &&
(revents & (G_IO_OUT | G_IO_ERR)) &&
- aio_node_check(ctx, node->is_external) &&
node->io_write) {
node->io_write(node->opaque);
progress = true;
@@ -436,8 +427,7 @@ static bool run_poll_handlers_once(AioContext *ctx,
AioHandler *tmp;
QLIST_FOREACH_SAFE(node, &ctx->poll_aio_handlers, node_poll, tmp) {
- if (aio_node_check(ctx, node->is_external) &&
- node->io_poll(node->opaque)) {
+ if (node->io_poll(node->opaque)) {
aio_add_poll_ready_handler(ready_list, node);
node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS;
diff --git a/util/aio-win32.c b/util/aio-win32.c
index 6bded009a4..948ef47a4d 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -32,7 +32,6 @@ struct AioHandler {
GPollFD pfd;
int deleted;
void *opaque;
- bool is_external;
QLIST_ENTRY(AioHandler) node;
};
@@ -64,7 +63,6 @@ static void aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
void aio_set_fd_handler(AioContext *ctx,
int fd,
- bool is_external,
IOHandler *io_read,
IOHandler *io_write,
AioPollFn *io_poll,
@@ -111,7 +109,6 @@ void aio_set_fd_handler(AioContext *ctx,
node->opaque = opaque;
node->io_read = io_read;
node->io_write = io_write;
- node->is_external = is_external;
if (io_read) {
bitmask |= FD_READ | FD_ACCEPT | FD_CLOSE;
@@ -135,7 +132,6 @@ void aio_set_fd_handler(AioContext *ctx,
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *e,
- bool is_external,
EventNotifierHandler *io_notify,
AioPollFn *io_poll,
EventNotifierHandler *io_poll_ready)
@@ -161,7 +157,6 @@ void aio_set_event_notifier(AioContext *ctx,
node->e = e;
node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
node->pfd.events = G_IO_IN;
- node->is_external = is_external;
QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
g_source_add_poll(&ctx->source, &node->pfd);
@@ -368,8 +363,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
/* fill fd sets */
count = 0;
QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
- if (!node->deleted && node->io_notify
- && aio_node_check(ctx, node->is_external)) {
+ if (!node->deleted && node->io_notify) {
assert(count < MAXIMUM_WAIT_OBJECTS);
events[count++] = event_notifier_get_handle(node->e);
}
diff --git a/util/async.c b/util/async.c
index 21016a1ac7..be0726038e 100644
--- a/util/async.c
+++ b/util/async.c
@@ -377,7 +377,7 @@ aio_ctx_finalize(GSource *source)
g_free(bh);
}
- aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL, NULL);
+ aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL, NULL);
event_notifier_cleanup(&ctx->notifier);
qemu_rec_mutex_destroy(&ctx->lock);
qemu_lockcnt_destroy(&ctx->list_lock);
@@ -561,7 +561,6 @@ AioContext *aio_context_new(Error **errp)
QSLIST_INIT(&ctx->scheduled_coroutines);
aio_set_event_notifier(ctx, &ctx->notifier,
- false,
aio_context_notifier_cb,
aio_context_notifier_poll,
aio_context_notifier_poll_ready);
diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
index 1683aa1105..6b6a1a91f8 100644
--- a/util/fdmon-epoll.c
+++ b/util/fdmon-epoll.c
@@ -64,11 +64,6 @@ static int fdmon_epoll_wait(AioContext *ctx, AioHandlerList *ready_list,
int i, ret = 0;
struct epoll_event events[128];
- /* Fall back while external clients are disabled */
- if (qatomic_read(&ctx->external_disable_cnt)) {
- return fdmon_poll_ops.wait(ctx, ready_list, timeout);
- }
-
if (timeout > 0) {
ret = qemu_poll_ns(&pfd, 1, timeout);
if (ret > 0) {
@@ -133,13 +128,12 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
return false;
}
- /* Do not upgrade while external clients are disabled */
- if (qatomic_read(&ctx->external_disable_cnt)) {
- return false;
- }
-
- if (npfd < EPOLL_ENABLE_THRESHOLD) {
- return false;
+ if (npfd >= EPOLL_ENABLE_THRESHOLD) {
+ if (fdmon_epoll_try_enable(ctx)) {
+ return true;
+ } else {
+ fdmon_epoll_disable(ctx);
+ }
}
/* The list must not change while we add fds to epoll */
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index ab43052dd7..17ec18b7bd 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -276,11 +276,6 @@ static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
unsigned wait_nr = 1; /* block until at least one cqe is ready */
int ret;
- /* Fall back while external clients are disabled */
- if (qatomic_read(&ctx->external_disable_cnt)) {
- return fdmon_poll_ops.wait(ctx, ready_list, timeout);
- }
-
if (timeout == 0) {
wait_nr = 0; /* non-blocking */
} else if (timeout > 0) {
@@ -315,8 +310,7 @@ static bool fdmon_io_uring_need_wait(AioContext *ctx)
return true;
}
- /* Are we falling back to fdmon-poll? */
- return qatomic_read(&ctx->external_disable_cnt);
+ return false;
}
static const FDMonOps fdmon_io_uring_ops = {
diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c
index 5fe3b47865..17df917cf9 100644
--- a/util/fdmon-poll.c
+++ b/util/fdmon-poll.c
@@ -65,8 +65,7 @@ static int fdmon_poll_wait(AioContext *ctx, AioHandlerList *ready_list,
assert(npfd == 0);
QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
- if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events
- && aio_node_check(ctx, node->is_external)) {
+ if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events) {
add_pollfd(node);
}
}
diff --git a/util/main-loop.c b/util/main-loop.c
index e180c85145..3e43a9cd38 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -642,14 +642,13 @@ void qemu_set_fd_handler(int fd,
void *opaque)
{
iohandler_init();
- aio_set_fd_handler(iohandler_ctx, fd, false,
- fd_read, fd_write, NULL, NULL, opaque);
+ aio_set_fd_handler(iohandler_ctx, fd, fd_read, fd_write, NULL, NULL,
+ opaque);
}
void event_notifier_set_handler(EventNotifier *e,
EventNotifierHandler *handler)
{
iohandler_init();
- aio_set_event_notifier(iohandler_ctx, e, false,
- handler, NULL, NULL);
+ aio_set_event_notifier(iohandler_ctx, e, handler, NULL, NULL);
}
diff --git a/util/qemu-coroutine-io.c b/util/qemu-coroutine-io.c
index d791932d63..364f4d5abf 100644
--- a/util/qemu-coroutine-io.c
+++ b/util/qemu-coroutine-io.c
@@ -74,8 +74,7 @@ typedef struct {
static void fd_coroutine_enter(void *opaque)
{
FDYieldUntilData *data = opaque;
- aio_set_fd_handler(data->ctx, data->fd, false,
- NULL, NULL, NULL, NULL, NULL);
+ aio_set_fd_handler(data->ctx, data->fd, NULL, NULL, NULL, NULL, NULL);
qemu_coroutine_enter(data->co);
}
@@ -87,7 +86,7 @@ void coroutine_fn yield_until_fd_readable(int fd)
data.ctx = qemu_get_current_aio_context();
data.co = qemu_coroutine_self();
data.fd = fd;
- aio_set_fd_handler(
- data.ctx, fd, false, fd_coroutine_enter, NULL, NULL, NULL, &data);
+ aio_set_fd_handler(data.ctx, fd, fd_coroutine_enter, NULL, NULL, NULL,
+ &data);
qemu_coroutine_yield();
}
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 332aea9306..9ba19121a2 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -278,7 +278,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt,
vu_fd_watch->fd = fd;
vu_fd_watch->cb = cb;
qemu_socket_set_nonblock(fd);
- aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler,
+ aio_set_fd_handler(server->ioc->ctx, fd, kick_handler,
NULL, NULL, NULL, vu_fd_watch);
vu_fd_watch->vu_dev = vu_dev;
vu_fd_watch->pvt = pvt;
@@ -299,8 +299,7 @@ static void remove_watch(VuDev *vu_dev, int fd)
if (!vu_fd_watch) {
return;
}
- aio_set_fd_handler(server->ioc->ctx, fd, false,
- NULL, NULL, NULL, NULL, NULL);
+ aio_set_fd_handler(server->ioc->ctx, fd, NULL, NULL, NULL, NULL, NULL);
QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
g_free(vu_fd_watch);
@@ -362,7 +361,7 @@ void vhost_user_server_stop(VuServer *server)
VuFdWatch *vu_fd_watch;
QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
- aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd,
NULL, NULL, NULL, NULL, vu_fd_watch);
}
@@ -403,7 +402,7 @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
qio_channel_attach_aio_context(server->ioc, ctx);
QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
- aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL,
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, kick_handler, NULL,
NULL, NULL, vu_fd_watch);
}
@@ -417,7 +416,7 @@ void vhost_user_server_detach_aio_context(VuServer *server)
VuFdWatch *vu_fd_watch;
QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
- aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd,
NULL, NULL, NULL, NULL, vu_fd_watch);
}
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 3bc78d8660..b33298a444 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -122,9 +122,6 @@ if have_block
if nettle.found() or gcrypt.found()
tests += {'test-crypto-pbkdf': [io]}
endif
- if config_host_data.get('CONFIG_EPOLL_CREATE1')
- tests += {'test-fdmon-epoll': [testblock]}
- endif
endif
if have_system
--
2.39.2
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug
2023-04-25 17:27 ` [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
@ 2023-04-28 14:22 ` Kevin Wolf
2023-05-01 15:09 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-04-28 14:22 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Zhengui Li, Daniil Tatianin
On 25.04.2023 at 19:27, Stefan Hajnoczi wrote:
> This patch is part of an effort to remove the aio_disable_external()
> API because it does not fit in a multi-queue block layer world where
> many AioContexts may be submitting requests to the same disk.
>
> The SCSI emulation code is already in good shape to stop using
> aio_disable_external(). It was only used by commit 9c5aad84da1c
> ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> disk") to ensure that virtio_scsi_hotunplug() works while the guest
> driver is submitting I/O.
>
> Ensure virtio_scsi_hotunplug() is safe as follows:
>
> 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> device_set_realized() calls qatomic_set(&dev->realized, false) so
> that future scsi_device_get() calls return NULL because they exclude
> SCSIDevices with realized=false.
>
> That means virtio-scsi will reject new I/O requests to this
> SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> virtio_scsi_hotunplug() is still executing. We are protected against
> new requests!
>
> 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> that in-flight requests are cancelled synchronously. This ensures
> that no in-flight requests remain once qdev_simple_device_unplug_cb()
> returns.
>
> Thanks to these two conditions we don't need aio_disable_external()
> anymore.
>
> Cc: Zhengui Li <lizhengui@huawei.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu-iotests 040 starts failing for me after this patch, with what looks
like a use-after-free error of some kind.
(gdb) bt
#0 0x000055b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
#1 0x000055b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
#2 0x000055b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
#3 0x000055b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
#4 0x000055b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
#5 0x000055b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 <error_abort>) at ../hw/core/qdev.c:599
#6 0x000055b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 <error_abort>)
at ../qom/object.c:2285
#7 0x000055b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1420
#8 0x000055b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 <error_abort>)
at ../qom/qom-qobject.c:28
#9 0x000055b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1489
#10 0x000055b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
#11 0x000055b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
#12 0x000055b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
#13 0x000055b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
#14 0x000055b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
#15 0x000055b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
#16 0x000055b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
#17 0x000055b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
#18 0x000055b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
#19 0x000055b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
#20 0x000055b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
#21 0x000055b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
#22 0x00007f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
#23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4172
#24 0x000055b6e400a7e8 in glib_pollfds_poll () at ../util/main-loop.c:290
#25 0x000055b6e400a0c2 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
#26 0x000055b6e4009fa2 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
#27 0x000055b6e3a3047b in qemu_main_loop () at ../softmmu/runstate.c:731
#28 0x000055b6e3dab27d in qemu_default_main () at ../softmmu/main.c:37
#29 0x000055b6e3dab2b8 in main (argc=24, argv=0x7ffec55196a8) at ../softmmu/main.c:48
(gdb) p jobs
$4 = {lh_first = 0x0}
Kevin
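[Editorial note: the two-step unplug protection described in the quoted commit message above can be modeled with a minimal sketch. This is illustrative Python only, not QEMU's actual C API; all class, function, and constant names here are hypothetical stand-ins for scsi_device_get(), scsi_device_purge_requests(), and VIRTIO_SCSI_S_BAD_TARGET.]

```python
class SCSIDevice:
    def __init__(self, lun):
        self.lun = lun
        self.realized = True   # cleared first during unplug (step 1)
        self.in_flight = []    # requests still being processed

class SCSIBus:
    def __init__(self):
        self.devices = {}

    def device_get(self, lun):
        """Mimics the lookup: unrealized devices are invisible to new I/O."""
        dev = self.devices.get(lun)
        if dev is None or not dev.realized:
            return None
        return dev

BAD_TARGET = "VIRTIO_SCSI_S_BAD_TARGET"

def submit_request(bus, lun):
    dev = bus.device_get(lun)
    if dev is None:
        return BAD_TARGET      # guest I/O rejected once realized is False
    dev.in_flight.append("req")
    return "OK"

def unplug(bus, lun):
    dev = bus.devices[lun]
    dev.realized = False       # step 1: new requests can no longer find the device
    dev.in_flight.clear()      # step 2: purge in-flight requests synchronously
    del bus.devices[lun]

bus = SCSIBus()
bus.devices[0] = SCSIDevice(0)
assert submit_request(bus, 0) == "OK"
unplug(bus, 0)
assert submit_request(bus, 0) == BAD_TARGET
```

In this model, as in the commit message, no external "disable" step is needed: clearing the realized flag gates new requests, and purging makes the purge point a hard barrier for in-flight ones.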
* Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug
2023-04-28 14:22 ` Kevin Wolf
@ 2023-05-01 15:09 ` Stefan Hajnoczi
2023-05-02 13:19 ` Kevin Wolf
0 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-01 15:09 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Zhengui Li, Daniil Tatianin
On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> On 25.04.2023 at 19:27, Stefan Hajnoczi wrote:
> > This patch is part of an effort to remove the aio_disable_external()
> > API because it does not fit in a multi-queue block layer world where
> > many AioContexts may be submitting requests to the same disk.
> >
> > The SCSI emulation code is already in good shape to stop using
> > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > driver is submitting I/O.
> >
> > Ensure virtio_scsi_hotunplug() is safe as follows:
> >
> > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> > device_set_realized() calls qatomic_set(&dev->realized, false) so
> > that future scsi_device_get() calls return NULL because they exclude
> > SCSIDevices with realized=false.
> >
> > That means virtio-scsi will reject new I/O requests to this
> > SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> > virtio_scsi_hotunplug() is still executing. We are protected against
> > new requests!
> >
> > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> > that in-flight requests are cancelled synchronously. This ensures
> > that no in-flight requests remain once qdev_simple_device_unplug_cb()
> > returns.
> >
> > Thanks to these two conditions we don't need aio_disable_external()
> > anymore.
> >
> > Cc: Zhengui Li <lizhengui@huawei.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> qemu-iotests 040 starts failing for me after this patch, with what looks
> like a use-after-free error of some kind.
>
> (gdb) bt
> #0 0x000055b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> #1 0x000055b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> #2 0x000055b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> #3 0x000055b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> #4 0x000055b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> #5 0x000055b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 <error_abort>) at ../hw/core/qdev.c:599
> #6 0x000055b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 <error_abort>)
> at ../qom/object.c:2285
> #7 0x000055b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1420
> #8 0x000055b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 <error_abort>)
> at ../qom/qom-qobject.c:28
> #9 0x000055b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1489
> #10 0x000055b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> #11 0x000055b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> #12 0x000055b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> #13 0x000055b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> #14 0x000055b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> #15 0x000055b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> #16 0x000055b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> #17 0x000055b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> #18 0x000055b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> #19 0x000055b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> #20 0x000055b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> #21 0x000055b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
> #22 0x00007f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
> #23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4172
> #24 0x000055b6e400a7e8 in glib_pollfds_poll () at ../util/main-loop.c:290
> #25 0x000055b6e400a0c2 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> #26 0x000055b6e4009fa2 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> #27 0x000055b6e3a3047b in qemu_main_loop () at ../softmmu/runstate.c:731
> #28 0x000055b6e3dab27d in qemu_default_main () at ../softmmu/main.c:37
> #29 0x000055b6e3dab2b8 in main (argc=24, argv=0x7ffec55196a8) at ../softmmu/main.c:48
> (gdb) p jobs
> $4 = {lh_first = 0x0}
I wasn't able to reproduce this with gcc 13.1.1 or clang 16.0.1:
$ tests/qemu-iotests/check -qcow2 040
Any suggestions on how to reproduce the issue?
Thanks,
Stefan
* Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug
2023-05-01 15:09 ` Stefan Hajnoczi
@ 2023-05-02 13:19 ` Kevin Wolf
2023-05-02 20:02 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-02 13:19 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Zhengui Li, Daniil Tatianin
[-- Attachment #1: Type: text/plain, Size: 6132 bytes --]
On 01.05.2023 at 17:09, Stefan Hajnoczi wrote:
> On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> > On 25.04.2023 at 19:27, Stefan Hajnoczi wrote:
> > > This patch is part of an effort to remove the aio_disable_external()
> > > API because it does not fit in a multi-queue block layer world where
> > > many AioContexts may be submitting requests to the same disk.
> > >
> > > The SCSI emulation code is already in good shape to stop using
> > > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > > driver is submitting I/O.
> > >
> > > Ensure virtio_scsi_hotunplug() is safe as follows:
> > >
> > > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> > > device_set_realized() calls qatomic_set(&dev->realized, false) so
> > > that future scsi_device_get() calls return NULL because they exclude
> > > SCSIDevices with realized=false.
> > >
> > > That means virtio-scsi will reject new I/O requests to this
> > > SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> > > virtio_scsi_hotunplug() is still executing. We are protected against
> > > new requests!
> > >
> > > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> > > that in-flight requests are cancelled synchronously. This ensures
> > > that no in-flight requests remain once qdev_simple_device_unplug_cb()
> > > returns.
> > >
> > > Thanks to these two conditions we don't need aio_disable_external()
> > > anymore.
> > >
> > > Cc: Zhengui Li <lizhengui@huawei.com>
> > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >
> > qemu-iotests 040 starts failing for me after this patch, with what looks
> > like a use-after-free error of some kind.
> >
> > (gdb) bt
> > #0 0x000055b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> > #1 0x000055b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> > #2 0x000055b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> > #3 0x000055b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> > #4 0x000055b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> > #5 0x000055b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 <error_abort>) at ../hw/core/qdev.c:599
> > #6 0x000055b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 <error_abort>)
> > at ../qom/object.c:2285
> > #7 0x000055b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1420
> > #8 0x000055b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 <error_abort>)
> > at ../qom/qom-qobject.c:28
> > #9 0x000055b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1489
> > #10 0x000055b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> > #11 0x000055b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> > #12 0x000055b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> > #13 0x000055b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> > #14 0x000055b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> > #15 0x000055b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> > #16 0x000055b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> > #17 0x000055b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> > #18 0x000055b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> > #19 0x000055b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> > #20 0x000055b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> > #21 0x000055b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
> > #22 0x00007f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
> > #23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4172
> > #24 0x000055b6e400a7e8 in glib_pollfds_poll () at ../util/main-loop.c:290
> > #25 0x000055b6e400a0c2 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> > #26 0x000055b6e4009fa2 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> > #27 0x000055b6e3a3047b in qemu_main_loop () at ../softmmu/runstate.c:731
> > #28 0x000055b6e3dab27d in qemu_default_main () at ../softmmu/main.c:37
> > #29 0x000055b6e3dab2b8 in main (argc=24, argv=0x7ffec55196a8) at ../softmmu/main.c:48
> > (gdb) p jobs
> > $4 = {lh_first = 0x0}
>
> I wasn't able to reproduce this with gcc 13.1.1 or clang 16.0.1:
>
> $ tests/qemu-iotests/check -qcow2 040
>
> Any suggestions on how to reproduce the issue?
It happens consistently for me with the same command line, both with gcc
and clang.
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
clang version 15.0.7 (Fedora 15.0.7-2.fc37)
Maybe there is a semantic merge conflict? I have applied the series on
top of master (05d50ba2d4) and my block branch (88f81f7bc8).
Kevin
* Re: [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event
2023-04-25 17:26 ` [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
@ 2023-05-02 15:19 ` Kevin Wolf
2023-05-02 18:56 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-02 15:19 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Daniil Tatianin
On 25.04.2023 at 19:26, Stefan Hajnoczi wrote:
> Only report a transport reset event to the guest after the SCSIDevice
> has been unrealized by qdev_simple_device_unplug_cb().
>
> qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> to false so that scsi_device_find/get() no longer see it.
>
> scsi_target_emulate_report_luns() also needs to be updated to filter out
> SCSIDevices that are unrealized.
>
> These changes ensure that the guest driver does not see the SCSIDevice
> that's being unplugged if it responds very quickly to the transport
> reset event.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
> virtio_scsi_release(s);
> }
> +
> + if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> + virtio_scsi_acquire(s);
> + virtio_scsi_push_event(s, sd,
> + VIRTIO_SCSI_T_TRANSPORT_RESET,
> + VIRTIO_SCSI_EVT_RESET_REMOVED);
> + scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> + virtio_scsi_release(s);
> + }
> }
s, sd and s->bus are all unrealized at this point, whereas before this
patch they were still realized. I couldn't find any practical problem
with it, but it made me nervous enough that I thought I should comment
on it at least.
Should we maybe have documentation on these functions that says that
they accept unrealized objects as their parameters?
Kevin
* Re: [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining
2023-04-25 17:27 ` [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
@ 2023-05-02 15:42 ` Kevin Wolf
2023-05-02 19:40 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-02 15:42 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On 25.04.2023 at 19:27, Stefan Hajnoczi wrote:
> Each vhost-user-blk request runs in a coroutine. When the BlockBackend
> enters a drained section we need to enter a quiescent state. Currently
> any in-flight requests race with bdrv_drained_begin() because it is
> unaware of vhost-user-blk requests.
>
> When blk_co_preadv/pwritev()/etc returns it wakes the
> bdrv_drained_begin() thread but vhost-user-blk request processing has
> not yet finished. The request coroutine continues executing while the
> main loop thread thinks it is in a drained section.
>
> One example where this is unsafe is for blk_set_aio_context() where
> bdrv_drained_begin() is called before .aio_context_detached() and
> .aio_context_attach(). If request coroutines are still running after
> bdrv_drained_begin(), then the AioContext could change underneath them
> and they race with new requests processed in the new AioContext. This
> could lead to virtqueue corruption, for example.
>
> (This example is theoretical, I came across this while reading the
> code and have not tried to reproduce it.)
>
> It's easy to make bdrv_drained_begin() wait for in-flight requests: add
> a .drained_poll() callback that checks the VuServer's in-flight counter.
> VuServer just needs an API that returns true when there are requests in
> flight. The in-flight counter needs to be atomic.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> include/qemu/vhost-user-server.h | 4 +++-
> block/export/vhost-user-blk-server.c | 16 ++++++++++++++++
> util/vhost-user-server.c | 14 ++++++++++----
> 3 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
> index bc0ac9ddb6..b1c1cda886 100644
> --- a/include/qemu/vhost-user-server.h
> +++ b/include/qemu/vhost-user-server.h
> @@ -40,8 +40,9 @@ typedef struct {
> int max_queues;
> const VuDevIface *vu_iface;
>
> + unsigned int in_flight; /* atomic */
> +
> /* Protected by ctx lock */
> - unsigned int in_flight;
> bool wait_idle;
> VuDev vu_dev;
> QIOChannel *ioc; /* The I/O channel with the client */
> @@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
>
> void vhost_user_server_inc_in_flight(VuServer *server);
> void vhost_user_server_dec_in_flight(VuServer *server);
> +bool vhost_user_server_has_in_flight(VuServer *server);
>
> void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
> void vhost_user_server_detach_aio_context(VuServer *server);
> diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> index 841acb36e3..092b86aae4 100644
> --- a/block/export/vhost-user-blk-server.c
> +++ b/block/export/vhost-user-blk-server.c
> @@ -272,7 +272,20 @@ static void vu_blk_exp_resize(void *opaque)
> vu_config_change_msg(&vexp->vu_server.vu_dev);
> }
>
> +/*
> + * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
> + *
> + * Called with vexp->export.ctx acquired.
> + */
> +static bool vu_blk_drained_poll(void *opaque)
> +{
> + VuBlkExport *vexp = opaque;
> +
> + return vhost_user_server_has_in_flight(&vexp->vu_server);
> +}
> +
> static const BlockDevOps vu_blk_dev_ops = {
> + .drained_poll = vu_blk_drained_poll,
> .resize_cb = vu_blk_exp_resize,
> };
You're adding a new function pointer to an existing BlockDevOps...
> @@ -314,6 +327,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
> logical_block_size, num_queues);
>
> + blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
> blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
> vexp);
>
> blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
..but still add a second blk_set_dev_ops(). Maybe a bad merge conflict
resolution with commit ca858a5fe94?
> @@ -323,6 +337,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> num_queues, &vu_blk_iface, errp)) {
> blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
> blk_aio_detach, vexp);
> + blk_set_dev_ops(exp->blk, NULL, NULL);
> g_free(vexp->handler.serial);
> return -EADDRNOTAVAIL;
> }
> @@ -336,6 +351,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
>
> blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
> vexp);
> + blk_set_dev_ops(exp->blk, NULL, NULL);
> g_free(vexp->handler.serial);
> }
These two hunks are then probably already fixes for ca858a5fe94 and
should be a separate patch if so.
> diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
> index 1622f8cfb3..2e6b640050 100644
> --- a/util/vhost-user-server.c
> +++ b/util/vhost-user-server.c
> @@ -78,17 +78,23 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
> void vhost_user_server_inc_in_flight(VuServer *server)
> {
> assert(!server->wait_idle);
> - server->in_flight++;
> + qatomic_inc(&server->in_flight);
> }
>
> void vhost_user_server_dec_in_flight(VuServer *server)
> {
> - server->in_flight--;
> - if (server->wait_idle && !server->in_flight) {
> - aio_co_wake(server->co_trip);
> + if (qatomic_fetch_dec(&server->in_flight) == 1) {
> + if (server->wait_idle) {
> + aio_co_wake(server->co_trip);
> + }
> }
> }
>
> +bool vhost_user_server_has_in_flight(VuServer *server)
> +{
> + return qatomic_load_acquire(&server->in_flight) > 0;
> +}
> +
Any reason why you left the server->in_flight accesses in
vu_client_trip() non-atomic?
Kevin
* Re: [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server
2023-04-25 17:27 ` [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
@ 2023-05-02 16:04 ` Kevin Wolf
2023-05-02 20:06 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-02 16:04 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> vhost-user activity must be suspended during bdrv_drained_begin/end().
> This prevents new requests from interfering with whatever is happening
> in the drained section.
>
> Previously this was done using aio_set_fd_handler()'s is_external
> argument. In a multi-queue block layer world the aio_disable_external()
> API cannot be used since multiple AioContext may be processing I/O, not
> just one.
>
> Switch to BlockDevOps->drained_begin/end() callbacks.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> block/export/vhost-user-blk-server.c | 43 ++++++++++++++--------------
> util/vhost-user-server.c | 10 +++----
> 2 files changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> index 092b86aae4..d20f69cd74 100644
> --- a/block/export/vhost-user-blk-server.c
> +++ b/block/export/vhost-user-blk-server.c
> @@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
> .process_msg = vu_blk_process_msg,
> };
>
> -static void blk_aio_attached(AioContext *ctx, void *opaque)
> -{
> - VuBlkExport *vexp = opaque;
> -
> - vexp->export.ctx = ctx;
> - vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
> -}
> -
> -static void blk_aio_detach(void *opaque)
> -{
> - VuBlkExport *vexp = opaque;
> -
> - vhost_user_server_detach_aio_context(&vexp->vu_server);
> - vexp->export.ctx = NULL;
> -}
So for changing the AioContext, we now rely on the fact that the node to
be changed is always drained, so the drain callbacks implicitly cover
this case, too?
> static void
> vu_blk_initialize_config(BlockDriverState *bs,
> struct virtio_blk_config *config,
> @@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
> vu_config_change_msg(&vexp->vu_server.vu_dev);
> }
>
> +/* Called with vexp->export.ctx acquired */
> +static void vu_blk_drained_begin(void *opaque)
> +{
> + VuBlkExport *vexp = opaque;
> +
> + vhost_user_server_detach_aio_context(&vexp->vu_server);
> +}
Compared to the old code, we're losing the vexp->export.ctx = NULL. This
is correct at this point because after drained_begin we still keep
processing requests until we arrive at a quiescent state.
However, if we detach the AioContext because we're deleting the
iothread, won't we end up with a dangling pointer in vexp->export.ctx?
Or can we be certain that nothing interesting happens before drained_end
updates it with a new valid pointer again?
Kevin
* Re: [PATCH v4 10/20] block: drain from main loop thread in bdrv_co_yield_to_drain()
2023-04-25 17:27 ` [PATCH v4 10/20] block: drain from main loop thread in bdrv_co_yield_to_drain() Stefan Hajnoczi
@ 2023-05-02 16:21 ` Kevin Wolf
2023-05-02 20:10 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-02 16:21 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> For simplicity, always run BlockDevOps .drained_begin/end/poll()
> callbacks in the main loop thread. This makes it easier to implement the
> callbacks and avoids extra locks.
>
> Move the function pointer declarations from the I/O Code section to the
> Global State section in block-backend-common.h.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If we're updating function pointers, we should probably update them in
BdrvChildClass and BlockDriver, too.
This means that a non-coroutine caller can't run in an iothread, not
even the home iothread of the BlockDriverState. (I'm not sure if it was
allowed previously. I don't think we're actually doing this, but in
theory it could have worked.) Maybe put a GLOBAL_STATE_CODE() after
handling the bdrv_co_yield_to_drain() case? Or would that look too odd?
IO_OR_GS_CODE();
if (qemu_in_coroutine()) {
bdrv_co_yield_to_drain(bs, true, parent, poll);
return;
}
GLOBAL_STATE_CODE();
Kevin
* Re: [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event
2023-05-02 15:19 ` Kevin Wolf
@ 2023-05-02 18:56 ` Stefan Hajnoczi
2023-05-03 8:00 ` Kevin Wolf
0 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-02 18:56 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Daniil Tatianin
On Tue, May 02, 2023 at 05:19:46PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:26 hat Stefan Hajnoczi geschrieben:
> > Only report a transport reset event to the guest after the SCSIDevice
> > has been unrealized by qdev_simple_device_unplug_cb().
> >
> > qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> > to false so that scsi_device_find/get() no longer see it.
> >
> > scsi_target_emulate_report_luns() also needs to be updated to filter out
> > SCSIDevices that are unrealized.
> >
> > These changes ensure that the guest driver does not see the SCSIDevice
> > that's being unplugged if it responds very quickly to the transport
> > reset event.
> >
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> > @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
> > virtio_scsi_release(s);
> > }
> > +
> > + if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> > + virtio_scsi_acquire(s);
> > + virtio_scsi_push_event(s, sd,
> > + VIRTIO_SCSI_T_TRANSPORT_RESET,
> > + VIRTIO_SCSI_EVT_RESET_REMOVED);
> > + scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> > + virtio_scsi_release(s);
> > + }
> > }
>
> s, sd and s->bus are all unrealized at this point, whereas before this
> patch they were still realized. I couldn't find any practical problem
> with it, but it made me nervous enough that I thought I should comment
> on it at least.
>
> Should we maybe have documentation on these functions that says that
> they accept unrealized objects as their parameters?
s is the VirtIOSCSI controller, not the SCSIDevice that is being
unplugged. The VirtIOSCSI controller is still realized.
s->bus is the VirtIOSCSI controller's bus, it is still realized.
You are right that the SCSIDevice (sd) has been unrealized at this
point:
- sd->conf.blk is safe because qdev properties stay alive until the
Object is deleted, but I'm not sure we should rely on that.
- virtio_scsi_push_event(.., sd, ...) is questionable because the LUN
that's fetched from sd no longer belongs to the unplugged SCSIDevice.
How about I change the code to fetch sd->conf.blk and the LUN before
unplugging?
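Roughly what I mean, as a standalone sketch with made-up types (not the
real virtio-scsi structures): capture everything the event needs while
the device is still valid, then unrealize, then emit the event from the
snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SCSIDevice fields; not QEMU code */
typedef struct {
    bool realized;
    int lun;
    void *blk;      /* models sd->conf.blk */
} Device;

/* Everything the transport reset event needs, copied out up front */
typedef struct {
    int lun;
    void *blk;
} EventArgs;

/* Snapshot the fields while the device is still realized */
static EventArgs capture_event_args(const Device *d)
{
    EventArgs a = { .lun = d->lun, .blk = d->blk };
    return a;
}

/* Models unrealize invalidating the device's fields */
static void unrealize(Device *d)
{
    d->realized = false;
    d->lun = -1;
    d->blk = NULL;
}
```

The event code then consumes only the EventArgs snapshot, so it never
reads from an unrealized device.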
Stefan
* Re: [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining
2023-05-02 15:42 ` Kevin Wolf
@ 2023-05-02 19:40 ` Stefan Hajnoczi
0 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-02 19:40 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Tue, May 02, 2023 at 05:42:51PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > Each vhost-user-blk request runs in a coroutine. When the BlockBackend
> > enters a drained section we need to enter a quiescent state. Currently
> > any in-flight requests race with bdrv_drained_begin() because it is
> > unaware of vhost-user-blk requests.
> >
> > When blk_co_preadv/pwritev()/etc returns it wakes the
> > bdrv_drained_begin() thread but vhost-user-blk request processing has
> > not yet finished. The request coroutine continues executing while the
> > main loop thread thinks it is in a drained section.
> >
> > One example where this is unsafe is for blk_set_aio_context() where
> > bdrv_drained_begin() is called before .aio_context_detached() and
> > .aio_context_attach(). If request coroutines are still running after
> > bdrv_drained_begin(), then the AioContext could change underneath them
> > and they race with new requests processed in the new AioContext. This
> > could lead to virtqueue corruption, for example.
> >
> > (This example is theoretical, I came across this while reading the
> > code and have not tried to reproduce it.)
> >
> > It's easy to make bdrv_drained_begin() wait for in-flight requests: add
> > a .drained_poll() callback that checks the VuServer's in-flight counter.
> > VuServer just needs an API that returns true when there are requests in
> > flight. The in-flight counter needs to be atomic.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > include/qemu/vhost-user-server.h | 4 +++-
> > block/export/vhost-user-blk-server.c | 16 ++++++++++++++++
> > util/vhost-user-server.c | 14 ++++++++++----
> > 3 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
> > index bc0ac9ddb6..b1c1cda886 100644
> > --- a/include/qemu/vhost-user-server.h
> > +++ b/include/qemu/vhost-user-server.h
> > @@ -40,8 +40,9 @@ typedef struct {
> > int max_queues;
> > const VuDevIface *vu_iface;
> >
> > + unsigned int in_flight; /* atomic */
> > +
> > /* Protected by ctx lock */
> > - unsigned int in_flight;
> > bool wait_idle;
> > VuDev vu_dev;
> > QIOChannel *ioc; /* The I/O channel with the client */
> > @@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
> >
> > void vhost_user_server_inc_in_flight(VuServer *server);
> > void vhost_user_server_dec_in_flight(VuServer *server);
> > +bool vhost_user_server_has_in_flight(VuServer *server);
> >
> > void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
> > void vhost_user_server_detach_aio_context(VuServer *server);
> > diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> > index 841acb36e3..092b86aae4 100644
> > --- a/block/export/vhost-user-blk-server.c
> > +++ b/block/export/vhost-user-blk-server.c
> > @@ -272,7 +272,20 @@ static void vu_blk_exp_resize(void *opaque)
> > vu_config_change_msg(&vexp->vu_server.vu_dev);
> > }
> >
> > +/*
> > + * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
> > + *
> > + * Called with vexp->export.ctx acquired.
> > + */
> > +static bool vu_blk_drained_poll(void *opaque)
> > +{
> > + VuBlkExport *vexp = opaque;
> > +
> > + return vhost_user_server_has_in_flight(&vexp->vu_server);
> > +}
> > +
> > static const BlockDevOps vu_blk_dev_ops = {
> > + .drained_poll = vu_blk_drained_poll,
> > .resize_cb = vu_blk_exp_resize,
> > };
>
> You're adding a new function pointer to an existing BlockDevOps...
>
> > @@ -314,6 +327,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> > vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
> > logical_block_size, num_queues);
> >
> > + blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
> > blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
> > vexp);
> >
> > blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
>
> ..but still add a second blk_set_dev_ops(). Maybe a bad merge conflict
> resolution with commit ca858a5fe94?
Thanks, I probably didn't have ca858a5fe94 in my tree when writing this
code.
> > @@ -323,6 +337,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> > num_queues, &vu_blk_iface, errp)) {
> > blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
> > blk_aio_detach, vexp);
> > + blk_set_dev_ops(exp->blk, NULL, NULL);
> > g_free(vexp->handler.serial);
> > return -EADDRNOTAVAIL;
> > }
> > @@ -336,6 +351,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
> >
> > blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
> > vexp);
> > + blk_set_dev_ops(exp->blk, NULL, NULL);
> > g_free(vexp->handler.serial);
> > }
>
> These two hunks are then probably already fixes for ca858a5fe94 and
> should be a separate patch if so.
Sure, I can split them out.
hw/ doesn't need to call blk_set_dev_ops(blk, NULL, NULL) because
hw/core/qdev-properties-system.c:release_drive() -> blk_detach_dev()
does it automatically, but block/export does. It's easy to overlook and
that's probably why ca858a5fe94 didn't include it.
> > diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
> > index 1622f8cfb3..2e6b640050 100644
> > --- a/util/vhost-user-server.c
> > +++ b/util/vhost-user-server.c
> > @@ -78,17 +78,23 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
> > void vhost_user_server_inc_in_flight(VuServer *server)
> > {
> > assert(!server->wait_idle);
> > - server->in_flight++;
> > + qatomic_inc(&server->in_flight);
> > }
> >
> > void vhost_user_server_dec_in_flight(VuServer *server)
> > {
> > - server->in_flight--;
> > - if (server->wait_idle && !server->in_flight) {
> > - aio_co_wake(server->co_trip);
> > + if (qatomic_fetch_dec(&server->in_flight) == 1) {
> > + if (server->wait_idle) {
> > + aio_co_wake(server->co_trip);
> > + }
> > }
> > }
> >
> > +bool vhost_user_server_has_in_flight(VuServer *server)
> > +{
> > + return qatomic_load_acquire(&server->in_flight) > 0;
> > +}
> > +
>
> Any reason why you left the server->in_flight accesses in
> vu_client_trip() non-atomic?
I don't remember if it was a mistake or if there is a reason why it's
safe. I'll replace those accesses with calls to
vhost_user_server_has_in_flight().
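For reference, the counter pattern this patch aims for looks roughly
like this in isolation (a standalone model using C11 atomics; the names
echo the VuServer API, but none of this is the actual QEMU code, which
uses qatomic_inc/qatomic_fetch_dec/qatomic_load_acquire):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint in_flight;   /* one increment per request coroutine */
} Server;

static void server_inc_in_flight(Server *s)
{
    atomic_fetch_add(&s->in_flight, 1);
}

/* Returns true when this call retired the last in-flight request,
 * i.e. the point where the real code would wake the waiting coroutine */
static bool server_dec_in_flight(Server *s)
{
    /* fetch_sub returns the old value: 1 means we were the last */
    return atomic_fetch_sub(&s->in_flight, 1) == 1;
}

/* What .drained_poll() consults: true while requests remain */
static bool server_has_in_flight(Server *s)
{
    return atomic_load(&s->in_flight) > 0;
}
```

The key property is that every access goes through the atomic helpers,
which is exactly what the leftover non-atomic accesses in
vu_client_trip() violate.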
Stefan
* Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug
2023-05-02 13:19 ` Kevin Wolf
@ 2023-05-02 20:02 ` Stefan Hajnoczi
2023-05-03 11:40 ` Kevin Wolf
0 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-02 20:02 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Zhengui Li, Daniil Tatianin
On Tue, May 02, 2023 at 03:19:52PM +0200, Kevin Wolf wrote:
> Am 01.05.2023 um 17:09 hat Stefan Hajnoczi geschrieben:
> > On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> > > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > > This patch is part of an effort to remove the aio_disable_external()
> > > > API because it does not fit in a multi-queue block layer world where
> > > > many AioContexts may be submitting requests to the same disk.
> > > >
> > > > The SCSI emulation code is already in good shape to stop using
> > > > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > > > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > > > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > > > driver is submitting I/O.
> > > >
> > > > Ensure virtio_scsi_hotunplug() is safe as follows:
> > > >
> > > > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> > > > device_set_realized() calls qatomic_set(&dev->realized, false) so
> > > > that future scsi_device_get() calls return NULL because they exclude
> > > > SCSIDevices with realized=false.
> > > >
> > > > That means virtio-scsi will reject new I/O requests to this
> > > > SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> > > > virtio_scsi_hotunplug() is still executing. We are protected against
> > > > new requests!
> > > >
> > > > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> > > > that in-flight requests are cancelled synchronously. This ensures
> > > > that no in-flight requests remain once qdev_simple_device_unplug_cb()
> > > > returns.
> > > >
> > > > Thanks to these two conditions we don't need aio_disable_external()
> > > > anymore.
> > > >
> > > > Cc: Zhengui Li <lizhengui@huawei.com>
> > > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > >
> > > qemu-iotests 040 starts failing for me after this patch, with what looks
> > > like a use-after-free error of some kind.
> > >
> > > (gdb) bt
> > > #0 0x000055b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> > > #1 0x000055b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> > > #2 0x000055b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> > > #3 0x000055b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> > > #4 0x000055b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> > > #5 0x000055b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 <error_abort>) at ../hw/core/qdev.c:599
> > > #6 0x000055b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 <error_abort>)
> > > at ../qom/object.c:2285
> > > #7 0x000055b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1420
> > > #8 0x000055b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 <error_abort>)
> > > at ../qom/qom-qobject.c:28
> > > #9 0x000055b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1489
> > > #10 0x000055b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> > > #11 0x000055b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> > > #12 0x000055b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> > > #13 0x000055b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> > > #14 0x000055b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> > > #15 0x000055b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> > > #16 0x000055b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> > > #17 0x000055b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> > > #18 0x000055b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> > > #19 0x000055b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> > > #20 0x000055b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> > > #21 0x000055b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
> > > #22 0x00007f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
> > > #23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4172
> > > #24 0x000055b6e400a7e8 in glib_pollfds_poll () at ../util/main-loop.c:290
> > > #25 0x000055b6e400a0c2 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> > > #26 0x000055b6e4009fa2 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> > > #27 0x000055b6e3a3047b in qemu_main_loop () at ../softmmu/runstate.c:731
> > > #28 0x000055b6e3dab27d in qemu_default_main () at ../softmmu/main.c:37
> > > #29 0x000055b6e3dab2b8 in main (argc=24, argv=0x7ffec55196a8) at ../softmmu/main.c:48
> > > (gdb) p jobs
> > > $4 = {lh_first = 0x0}
> >
> > I wasn't able to reproduce this with gcc 13.1.1 or clang 16.0.1:
> >
> > $ tests/qemu-iotests/check -qcow2 040
> >
> > Any suggestions on how to reproduce the issue?
>
> It happens consistently for me with the same command line, both with gcc
> and clang.
>
> gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
> clang version 15.0.7 (Fedora 15.0.7-2.fc37)
>
> Maybe there is a semantic merge conflict? I have applied the series on
> top of master (05d50ba2d4) and my block branch (88f81f7bc8).
I can't find 88f81f7bc8 but rebased on repo.or.cz/qemu/kevin.git block
(4514dac7f2e9) and the test passes here.
I rebased on qemu.git/master (05d50ba2d4) and it also passes.
Please let me know if the following tree (a0ff680a72f6) works on your
machine:
https://gitlab.com/stefanha/qemu/-/tree/remove-aio_disable_external
Thanks,
Stefan
* Re: [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server
2023-05-02 16:04 ` Kevin Wolf
@ 2023-05-02 20:06 ` Stefan Hajnoczi
2023-05-03 8:08 ` Kevin Wolf
0 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-02 20:06 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Tue, May 02, 2023 at 06:04:24PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > vhost-user activity must be suspended during bdrv_drained_begin/end().
> > This prevents new requests from interfering with whatever is happening
> > in the drained section.
> >
> > Previously this was done using aio_set_fd_handler()'s is_external
> > argument. In a multi-queue block layer world the aio_disable_external()
> > API cannot be used since multiple AioContext may be processing I/O, not
> > just one.
> >
> > Switch to BlockDevOps->drained_begin/end() callbacks.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > block/export/vhost-user-blk-server.c | 43 ++++++++++++++--------------
> > util/vhost-user-server.c | 10 +++----
> > 2 files changed, 26 insertions(+), 27 deletions(-)
> >
> > diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> > index 092b86aae4..d20f69cd74 100644
> > --- a/block/export/vhost-user-blk-server.c
> > +++ b/block/export/vhost-user-blk-server.c
> > @@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
> > .process_msg = vu_blk_process_msg,
> > };
> >
> > -static void blk_aio_attached(AioContext *ctx, void *opaque)
> > -{
> > - VuBlkExport *vexp = opaque;
> > -
> > - vexp->export.ctx = ctx;
> > - vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
> > -}
> > -
> > -static void blk_aio_detach(void *opaque)
> > -{
> > - VuBlkExport *vexp = opaque;
> > -
> > - vhost_user_server_detach_aio_context(&vexp->vu_server);
> > - vexp->export.ctx = NULL;
> > -}
>
> So for changing the AioContext, we now rely on the fact that the node to
> be changed is always drained, so the drain callbacks implicitly cover
> this case, too?
Yes.
> > static void
> > vu_blk_initialize_config(BlockDriverState *bs,
> > struct virtio_blk_config *config,
> > @@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
> > vu_config_change_msg(&vexp->vu_server.vu_dev);
> > }
> >
> > +/* Called with vexp->export.ctx acquired */
> > +static void vu_blk_drained_begin(void *opaque)
> > +{
> > + VuBlkExport *vexp = opaque;
> > +
> > + vhost_user_server_detach_aio_context(&vexp->vu_server);
> > +}
>
> Compared to the old code, we're losing the vexp->export.ctx = NULL. This
> is correct at this point because after drained_begin we still keep
> processing requests until we arrive at a quiescent state.
>
> However, if we detach the AioContext because we're deleting the
> iothread, won't we end up with a dangling pointer in vexp->export.ctx?
> Or can we be certain that nothing interesting happens before drained_end
> updates it with a new valid pointer again?
If you want I can add the detach() callback back again and set ctx to
NULL there?
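Roughly, as a standalone model (made-up types, not the real export
code) of the invariant that callback would restore: detaching clears
the context pointer so it can never dangle, and attaching is the only
thing that sets it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-in for AioContext; only pointers are used */
typedef struct AioContext AioContext;

/* Models the export's view of its current context */
typedef struct {
    AioContext *ctx;       /* models vexp->export.ctx */
    bool server_attached;  /* models the vu_server attach state */
} Export;

/* Models blk_aio_attached() / the drained_end path */
static void export_attach(Export *e, AioContext *ctx)
{
    e->ctx = ctx;
    e->server_attached = true;
}

/* Models the proposed detach callback: clear ctx, never leave it stale */
static void export_detach(Export *e)
{
    e->server_attached = false;
    e->ctx = NULL;
}
```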
Stefan
* Re: [PATCH v4 10/20] block: drain from main loop thread in bdrv_co_yield_to_drain()
2023-05-02 16:21 ` Kevin Wolf
@ 2023-05-02 20:10 ` Stefan Hajnoczi
0 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-02 20:10 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Tue, May 02, 2023 at 06:21:20PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > For simplicity, always run BlockDevOps .drained_begin/end/poll()
> > callbacks in the main loop thread. This makes it easier to implement the
> > callbacks and avoids extra locks.
> >
> > Move the function pointer declarations from the I/O Code section to the
> > Global State section in block-backend-common.h.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> If we're updating function pointers, we should probably update them in
> BdrvChildClass and BlockDriver, too.
I'll do that in the next revision.
> This means that a non-coroutine caller can't run in an iothread, not
> even the home iothread of the BlockDriverState. (I'm not sure if it was
> allowed previously. I don't think we're actually doing this, but in
> theory it could have worked.) Maybe put a GLOBAL_STATE_CODE() after
> handling the bdrv_co_yield_to_drain() case? Or would that look too odd?
>
> IO_OR_GS_CODE();
>
> if (qemu_in_coroutine()) {
> bdrv_co_yield_to_drain(bs, true, parent, poll);
> return;
> }
>
> GLOBAL_STATE_CODE();
That looks good to me, it makes explicit that IO_OR_GS_CODE() only
applies until the end of the if statement.
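As a standalone toy of that layout (the TOY_* macros and flags below are
made-up stand-ins for QEMU's real IO_OR_GS_CODE()/GLOBAL_STATE_CODE()
checks, just to show where each assertion applies):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for QEMU's context checks: either context is fine until the
 * coroutine fast path returns; after that, main loop only. */
static bool in_coroutine;
static bool in_main_loop = true;

#define TOY_IO_OR_GS_CODE()     assert(in_coroutine || in_main_loop)
#define TOY_GLOBAL_STATE_CODE() assert(in_main_loop)

static int drain_calls;

/* Toy counterpart of bdrv_co_yield_to_drain(). */
static void toy_yield_to_drain(void)
{
    drain_calls++;
}

static void toy_drained_begin(void)
{
    TOY_IO_OR_GS_CODE();

    if (in_coroutine) {
        toy_yield_to_drain();
        return;
    }

    /* Everything below runs in the main loop thread only. */
    TOY_GLOBAL_STATE_CODE();
    drain_calls++;
}
```

The point of the layout is that TOY_IO_OR_GS_CODE() documents the weaker
requirement that holds only up to the coroutine branch, while
TOY_GLOBAL_STATE_CODE() marks where the stronger requirement starts.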
Stefan
* Re: [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event
2023-05-02 18:56 ` Stefan Hajnoczi
@ 2023-05-03 8:00 ` Kevin Wolf
2023-05-03 13:12 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-03 8:00 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Daniil Tatianin
Am 02.05.2023 um 20:56 hat Stefan Hajnoczi geschrieben:
> On Tue, May 02, 2023 at 05:19:46PM +0200, Kevin Wolf wrote:
> > Am 25.04.2023 um 19:26 hat Stefan Hajnoczi geschrieben:
> > > Only report a transport reset event to the guest after the SCSIDevice
> > > has been unrealized by qdev_simple_device_unplug_cb().
> > >
> > > qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> > > to false so that scsi_device_find/get() no longer see it.
> > >
> > > scsi_target_emulate_report_luns() also needs to be updated to filter out
> > > SCSIDevices that are unrealized.
> > >
> > > These changes ensure that the guest driver does not see the SCSIDevice
> > > that's being unplugged if it responds very quickly to the transport
> > > reset event.
> > >
> > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >
> > > @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
> > > virtio_scsi_release(s);
> > > }
> > > +
> > > + if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> > > + virtio_scsi_acquire(s);
> > > + virtio_scsi_push_event(s, sd,
> > > + VIRTIO_SCSI_T_TRANSPORT_RESET,
> > > + VIRTIO_SCSI_EVT_RESET_REMOVED);
> > > + scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> > > + virtio_scsi_release(s);
> > > + }
> > > }
> >
> > s, sd and s->bus are all unrealized at this point, whereas before this
> > patch they were still realized. I couldn't find any practical problem
> > with it, but it made me nervous enough that I thought I should comment
> > on it at least.
> >
> > Should we maybe have documentation on these functions that says that
> > they accept unrealized objects as their parameters?
>
> s is the VirtIOSCSI controller, not the SCSIDevice that is being
> unplugged. The VirtIOSCSI controller is still realized.
>
> s->bus is the VirtIOSCSI controller's bus, it is still realized.
You're right, I misread this part.
> You are right that the SCSIDevice (sd) has been unrealized at this
> point:
> - sd->conf.blk is safe because qdev properties stay alive until the
> Object is deleted, but I'm not sure we should rely on that.
This feels relatively safe (and it's preexisting anyway), reading a
property doesn't do anything unpredictable and we know the pointer is
still valid.
> - virtio_scsi_push_event(.., sd, ...) is questionable because the LUN
> that's fetched from sd no longer belongs to the unplugged SCSIDevice.
This call is what made me nervous.
> How about I change the code to fetch sd->conf.blk and the LUN before
> unplugging?
You mean passing sd->id and sd->lun to virtio_scsi_push_event() instead
of sd itself? That would certainly look cleaner and make sure that we
don't later add code to it that does something with sd that would
require it to be realized.
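As a standalone toy of that idea (toy names, not the virtio-scsi API):
copy the fields out while the device object is still valid, then report
the event without ever touching the freed object.

```c
#include <assert.h>
#include <stdlib.h>

/* Made-up stand-ins for SCSIDevice and the transport event. */
typedef struct ToyScsiDev {
    int id;
    int lun;
} ToyScsiDev;

typedef struct ToyEvent {
    int id;
    int lun;
} ToyEvent;

static ToyEvent last_event;

/* Takes plain values, so it cannot depend on the device being realized. */
static void toy_push_event(int id, int lun)
{
    last_event.id = id;
    last_event.lun = lun;
}

static void toy_unplug(ToyScsiDev *dev)
{
    /* Fetch everything the event needs while dev is still valid... */
    int id = dev->id;
    int lun = dev->lun;

    /* ...then unrealize/free the device... */
    free(dev);

    /* ...and report the event without dereferencing dev again. */
    toy_push_event(id, lun);
}
```

Passing values instead of the object also prevents future code in the
event path from accidentally growing a dependency on a realized device.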
Kevin
* Re: [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server
2023-05-02 20:06 ` Stefan Hajnoczi
@ 2023-05-03 8:08 ` Kevin Wolf
2023-05-03 13:11 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-03 8:08 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 02.05.2023 um 22:06 hat Stefan Hajnoczi geschrieben:
> On Tue, May 02, 2023 at 06:04:24PM +0200, Kevin Wolf wrote:
> > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > vhost-user activity must be suspended during bdrv_drained_begin/end().
> > > This prevents new requests from interfering with whatever is happening
> > > in the drained section.
> > >
> > > Previously this was done using aio_set_fd_handler()'s is_external
> > > argument. In a multi-queue block layer world the aio_disable_external()
> > > API cannot be used since multiple AioContext may be processing I/O, not
> > > just one.
> > >
> > > Switch to BlockDevOps->drained_begin/end() callbacks.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > > block/export/vhost-user-blk-server.c | 43 ++++++++++++++--------------
> > > util/vhost-user-server.c | 10 +++----
> > > 2 files changed, 26 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> > > index 092b86aae4..d20f69cd74 100644
> > > --- a/block/export/vhost-user-blk-server.c
> > > +++ b/block/export/vhost-user-blk-server.c
> > > @@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
> > > .process_msg = vu_blk_process_msg,
> > > };
> > >
> > > -static void blk_aio_attached(AioContext *ctx, void *opaque)
> > > -{
> > > - VuBlkExport *vexp = opaque;
> > > -
> > > - vexp->export.ctx = ctx;
> > > - vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
> > > -}
> > > -
> > > -static void blk_aio_detach(void *opaque)
> > > -{
> > > - VuBlkExport *vexp = opaque;
> > > -
> > > - vhost_user_server_detach_aio_context(&vexp->vu_server);
> > > - vexp->export.ctx = NULL;
> > > -}
> >
> > So for changing the AioContext, we now rely on the fact that the node to
> > be changed is always drained, so the drain callbacks implicitly cover
> > this case, too?
>
> Yes.
Ok. This surprised me a bit at first, but I think it's fine.
We just need to remember it if we ever decide that once we have
multiqueue, we can actually change the default AioContext without
draining the node. But maybe at that point, we have to do more
fundamental changes anyway.
> > > static void
> > > vu_blk_initialize_config(BlockDriverState *bs,
> > > struct virtio_blk_config *config,
> > > @@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
> > > vu_config_change_msg(&vexp->vu_server.vu_dev);
> > > }
> > >
> > > +/* Called with vexp->export.ctx acquired */
> > > +static void vu_blk_drained_begin(void *opaque)
> > > +{
> > > + VuBlkExport *vexp = opaque;
> > > +
> > > + vhost_user_server_detach_aio_context(&vexp->vu_server);
> > > +}
> >
> > Compared to the old code, we're losing the vexp->export.ctx = NULL. This
> > is correct at this point because after drained_begin we still keep
> > processing requests until we arrive at a quiescent state.
> >
> > However, if we detach the AioContext because we're deleting the
> > iothread, won't we end up with a dangling pointer in vexp->export.ctx?
> > Or can we be certain that nothing interesting happens before drained_end
> > updates it with a new valid pointer again?
>
> If you want I can add the detach() callback back again and set ctx to
> NULL there?
I haven't thought enough about it to say if it's a problem. If you have
and are confident that it's correct the way it is, I'm happy with it.
But bringing the callback back is the minimal change compared to the old
state. It's just unnecessary code if we don't actually need it.
Kevin
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug
2023-05-02 20:02 ` Stefan Hajnoczi
@ 2023-05-03 11:40 ` Kevin Wolf
2023-05-03 13:05 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-03 11:40 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Zhengui Li, Daniil Tatianin
Am 02.05.2023 um 22:02 hat Stefan Hajnoczi geschrieben:
> On Tue, May 02, 2023 at 03:19:52PM +0200, Kevin Wolf wrote:
> > Am 01.05.2023 um 17:09 hat Stefan Hajnoczi geschrieben:
> > > On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> > > > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > > > This patch is part of an effort to remove the aio_disable_external()
> > > > > API because it does not fit in a multi-queue block layer world where
> > > > > many AioContexts may be submitting requests to the same disk.
> > > > >
> > > > > The SCSI emulation code is already in good shape to stop using
> > > > > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > > > > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > > > > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > > > > driver is submitting I/O.
> > > > >
> > > > > Ensure virtio_scsi_hotunplug() is safe as follows:
> > > > >
> > > > > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> > > > > device_set_realized() calls qatomic_set(&dev->realized, false) so
> > > > > that future scsi_device_get() calls return NULL because they exclude
> > > > > SCSIDevices with realized=false.
> > > > >
> > > > > That means virtio-scsi will reject new I/O requests to this
> > > > > SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> > > > > virtio_scsi_hotunplug() is still executing. We are protected against
> > > > > new requests!
> > > > >
> > > > > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> > > > > that in-flight requests are cancelled synchronously. This ensures
> > > > > that no in-flight requests remain once qdev_simple_device_unplug_cb()
> > > > > returns.
> > > > >
> > > > > Thanks to these two conditions we don't need aio_disable_external()
> > > > > anymore.
> > > > >
> > > > > Cc: Zhengui Li <lizhengui@huawei.com>
> > > > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > >
> > > > qemu-iotests 040 starts failing for me after this patch, with what looks
> > > > like a use-after-free error of some kind.
> > > >
> > > > (gdb) bt
> > > > #0 0x000055b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> > > > #1 0x000055b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> > > > #2 0x000055b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> > > > #3 0x000055b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> > > > #4 0x000055b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> > > > #5 0x000055b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 <error_abort>) at ../hw/core/qdev.c:599
> > > > #6 0x000055b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 <error_abort>)
> > > > at ../qom/object.c:2285
> > > > #7 0x000055b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1420
> > > > #8 0x000055b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 <error_abort>)
> > > > at ../qom/qom-qobject.c:28
> > > > #9 0x000055b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1489
> > > > #10 0x000055b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> > > > #11 0x000055b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> > > > #12 0x000055b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> > > > #13 0x000055b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> > > > #14 0x000055b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> > > > #15 0x000055b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> > > > #16 0x000055b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> > > > #17 0x000055b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> > > > #18 0x000055b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> > > > #19 0x000055b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> > > > #20 0x000055b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> > > > #21 0x000055b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
> > > > #22 0x00007f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
> > > > #23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4172
> > > > #24 0x000055b6e400a7e8 in glib_pollfds_poll () at ../util/main-loop.c:290
> > > > #25 0x000055b6e400a0c2 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> > > > #26 0x000055b6e4009fa2 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> > > > #27 0x000055b6e3a3047b in qemu_main_loop () at ../softmmu/runstate.c:731
> > > > #28 0x000055b6e3dab27d in qemu_default_main () at ../softmmu/main.c:37
> > > > #29 0x000055b6e3dab2b8 in main (argc=24, argv=0x7ffec55196a8) at ../softmmu/main.c:48
> > > > (gdb) p jobs
> > > > $4 = {lh_first = 0x0}
> > >
> > > I wasn't able to reproduce this with gcc 13.1.1 or clang 16.0.1:
> > >
> > > $ tests/qemu-iotests/check -qcow2 040
> > >
> > > Any suggestions on how to reproduce the issue?
> >
> > It happens consistently for me with the same command line, both with gcc
> > and clang.
> >
> > gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
> > clang version 15.0.7 (Fedora 15.0.7-2.fc37)
> >
> > Maybe there is a semantic merge conflict? I have applied the series on
> > top of master (05d50ba2d4) and my block branch (88f81f7bc8).
>
> I can't find 88f81f7bc8 but rebased on repo.or.cz/qemu/kevin.git block
> (4514dac7f2e9) and the test passes here.
>
> I rebased on qemu.git/master (05d50ba2d4) and it also passes.
>
> Please let me know if the following tree (a0ff680a72f6) works on your
> machine:
> https://gitlab.com/stefanha/qemu/-/tree/remove-aio_disable_external
Fails in the same way.
So I tried to debug this myself now. The problem is that iterating the
jobs in blockdev_mark_auto_del() is incorrect: job_cancel_locked()
frees the job and then block_job_next_locked() is a use after free.
It also drops job_mutex temporarily and polls, so even switching to a
*_FOREACH_SAFE style loop won't fix this. I guess we have to restart
the whole search from the start after a job_cancel_locked() because the
list might look very different after the call.
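For illustration, the restart pattern looks roughly like this as a
standalone toy (ToyJob and friends are made-up stand-ins, not QEMU's
job/blockjob API; the real fix also has to deal with job_mutex being
dropped):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy singly linked job list; cancelling frees the node, so an iterator
 * must not keep a saved next pointer across a cancellation. */
typedef struct ToyJob {
    struct ToyJob *next;
    int auto_del;               /* marked for automatic deletion */
} ToyJob;

static ToyJob *toy_jobs;

static ToyJob *toy_job_new(int auto_del)
{
    ToyJob *job = calloc(1, sizeof(*job));
    job->auto_del = auto_del;
    job->next = toy_jobs;
    toy_jobs = job;
    return job;
}

/* Unlinks and frees the job; any pointer into the list saved before this
 * call (e.g. a FOREACH_SAFE next pointer) may now be dangling. */
static void toy_job_cancel(ToyJob *job)
{
    ToyJob **p;

    for (p = &toy_jobs; *p; p = &(*p)->next) {
        if (*p == job) {
            *p = job->next;
            free(job);
            return;
        }
    }
}

/* Restart the scan from the head after every cancellation instead of
 * continuing from a saved pointer, which would be a use-after-free. */
static int toy_cancel_auto_del_jobs(void)
{
    int cancelled = 0;
    ToyJob *job;

retry:
    for (job = toy_jobs; job; job = job->next) {
        if (job->auto_del) {
            toy_job_cancel(job);
            cancelled++;
            goto retry;
        }
    }
    return cancelled;
}
```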
Now, of course, how this is related to your patch and why it doesn't
trigger before it, is still less than clear. What I found out is that
adding the scsi_device_purge_requests() call is enough to crash it. Maybe
it's related to the blk_drain() inside of it, so that the job finishes
earlier during the unplug now, or something like that.
Anyway, changing blockdev_mark_auto_del() fixes it. I'll send a patch.
Kevin
* Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug
2023-05-03 11:40 ` Kevin Wolf
@ 2023-05-03 13:05 ` Stefan Hajnoczi
0 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-03 13:05 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Zhengui Li, Daniil Tatianin
On Wed, May 03, 2023 at 01:40:49PM +0200, Kevin Wolf wrote:
> Am 02.05.2023 um 22:02 hat Stefan Hajnoczi geschrieben:
> > On Tue, May 02, 2023 at 03:19:52PM +0200, Kevin Wolf wrote:
> > > Am 01.05.2023 um 17:09 hat Stefan Hajnoczi geschrieben:
> > > > On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> > > > > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > > > > This patch is part of an effort to remove the aio_disable_external()
> > > > > > API because it does not fit in a multi-queue block layer world where
> > > > > > many AioContexts may be submitting requests to the same disk.
> > > > > >
> > > > > > The SCSI emulation code is already in good shape to stop using
> > > > > > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > > > > > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > > > > > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > > > > > driver is submitting I/O.
> > > > > >
> > > > > > Ensure virtio_scsi_hotunplug() is safe as follows:
> > > > > >
> > > > > > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> > > > > > device_set_realized() calls qatomic_set(&dev->realized, false) so
> > > > > > that future scsi_device_get() calls return NULL because they exclude
> > > > > > SCSIDevices with realized=false.
> > > > > >
> > > > > > That means virtio-scsi will reject new I/O requests to this
> > > > > > SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> > > > > > virtio_scsi_hotunplug() is still executing. We are protected against
> > > > > > new requests!
> > > > > >
> > > > > > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> > > > > > that in-flight requests are cancelled synchronously. This ensures
> > > > > > that no in-flight requests remain once qdev_simple_device_unplug_cb()
> > > > > > returns.
> > > > > >
> > > > > > Thanks to these two conditions we don't need aio_disable_external()
> > > > > > anymore.
> > > > > >
> > > > > > Cc: Zhengui Li <lizhengui@huawei.com>
> > > > > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > > > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > >
> > > > > qemu-iotests 040 starts failing for me after this patch, with what looks
> > > > > like a use-after-free error of some kind.
> > > > >
> > > > > (gdb) bt
> > > > > #0 0x000055b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> > > > > #1 0x000055b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> > > > > #2 0x000055b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> > > > > #3 0x000055b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> > > > > #4 0x000055b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> > > > > #5 0x000055b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 <error_abort>) at ../hw/core/qdev.c:599
> > > > > #6 0x000055b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 <error_abort>)
> > > > > at ../qom/object.c:2285
> > > > > #7 0x000055b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1420
> > > > > #8 0x000055b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 <error_abort>)
> > > > > at ../qom/qom-qobject.c:28
> > > > > #9 0x000055b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 <error_abort>) at ../qom/object.c:1489
> > > > > #10 0x000055b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> > > > > #11 0x000055b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> > > > > #12 0x000055b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> > > > > #13 0x000055b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> > > > > #14 0x000055b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> > > > > #15 0x000055b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> > > > > #16 0x000055b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> > > > > #17 0x000055b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> > > > > #18 0x000055b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> > > > > #19 0x000055b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> > > > > #20 0x000055b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> > > > > #21 0x000055b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
> > > > > #22 0x00007f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
> > > > > #23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4172
> > > > > #24 0x000055b6e400a7e8 in glib_pollfds_poll () at ../util/main-loop.c:290
> > > > > #25 0x000055b6e400a0c2 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> > > > > #26 0x000055b6e4009fa2 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> > > > > #27 0x000055b6e3a3047b in qemu_main_loop () at ../softmmu/runstate.c:731
> > > > > #28 0x000055b6e3dab27d in qemu_default_main () at ../softmmu/main.c:37
> > > > > #29 0x000055b6e3dab2b8 in main (argc=24, argv=0x7ffec55196a8) at ../softmmu/main.c:48
> > > > > (gdb) p jobs
> > > > > $4 = {lh_first = 0x0}
> > > >
> > > > I wasn't able to reproduce this with gcc 13.1.1 or clang 16.0.1:
> > > >
> > > > $ tests/qemu-iotests/check -qcow2 040
> > > >
> > > > Any suggestions on how to reproduce the issue?
> > >
> > > It happens consistently for me with the same command line, both with gcc
> > > and clang.
> > >
> > > gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
> > > clang version 15.0.7 (Fedora 15.0.7-2.fc37)
> > >
> > > Maybe there is a semantic merge conflict? I have applied the series on
> > > top of master (05d50ba2d4) and my block branch (88f81f7bc8).
> >
> > I can't find 88f81f7bc8 but rebased on repo.or.cz/qemu/kevin.git block
> > (4514dac7f2e9) and the test passes here.
> >
> > I rebased on qemu.git/master (05d50ba2d4) and it also passes.
> >
> > Please let me know if the following tree (a0ff680a72f6) works on your
> > machine:
> > https://gitlab.com/stefanha/qemu/-/tree/remove-aio_disable_external
>
> Fails in the same way.
>
> So I tried to debug this myself now. The problem is that iterating the
> jobs in blockdev_mark_auto_del() is incorrect: job_cancel_locked()
> frees the job and then block_job_next_locked() is a use after free.
>
> It also drops job_mutex temporarily and polls, so even switching to a
> *_FOREACH_SAFE style loop won't fix this. I guess we have to restart
> the whole search from the start after a job_cancel_locked() because the
> list might look very different after the call.
>
> Now, of course, how this is related to your patch and why it doesn't
> trigger before it, is still less than clear. What I found out is that
> adding the scsi_device_purge_requests() call is enough to crash it. Maybe
> it's related to the blk_drain() inside of it, so that the job finishes
> earlier during the unplug now, or something like that.
>
> Anyway, changing blockdev_mark_auto_del() fixes it. I'll send a patch.
Thanks, and sorry for taking your time with debugging it!
Stefan
* Re: [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server
2023-05-03 8:08 ` Kevin Wolf
@ 2023-05-03 13:11 ` Stefan Hajnoczi
2023-05-03 13:43 ` Kevin Wolf
0 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-03 13:11 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Wed, May 03, 2023 at 10:08:46AM +0200, Kevin Wolf wrote:
> Am 02.05.2023 um 22:06 hat Stefan Hajnoczi geschrieben:
> > On Tue, May 02, 2023 at 06:04:24PM +0200, Kevin Wolf wrote:
> > > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > > vhost-user activity must be suspended during bdrv_drained_begin/end().
> > > > This prevents new requests from interfering with whatever is happening
> > > > in the drained section.
> > > >
> > > > Previously this was done using aio_set_fd_handler()'s is_external
> > > > argument. In a multi-queue block layer world the aio_disable_external()
> > > > API cannot be used since multiple AioContext may be processing I/O, not
> > > > just one.
> > > >
> > > > Switch to BlockDevOps->drained_begin/end() callbacks.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > > block/export/vhost-user-blk-server.c | 43 ++++++++++++++--------------
> > > > util/vhost-user-server.c | 10 +++----
> > > > 2 files changed, 26 insertions(+), 27 deletions(-)
> > > >
> > > > diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> > > > index 092b86aae4..d20f69cd74 100644
> > > > --- a/block/export/vhost-user-blk-server.c
> > > > +++ b/block/export/vhost-user-blk-server.c
> > > > @@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
> > > > .process_msg = vu_blk_process_msg,
> > > > };
> > > >
> > > > -static void blk_aio_attached(AioContext *ctx, void *opaque)
> > > > -{
> > > > - VuBlkExport *vexp = opaque;
> > > > -
> > > > - vexp->export.ctx = ctx;
> > > > - vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
> > > > -}
> > > > -
> > > > -static void blk_aio_detach(void *opaque)
> > > > -{
> > > > - VuBlkExport *vexp = opaque;
> > > > -
> > > > - vhost_user_server_detach_aio_context(&vexp->vu_server);
> > > > - vexp->export.ctx = NULL;
> > > > -}
> > >
> > > So for changing the AioContext, we now rely on the fact that the node to
> > > be changed is always drained, so the drain callbacks implicitly cover
> > > this case, too?
> >
> > Yes.
>
> Ok. This surprised me a bit at first, but I think it's fine.
>
> We just need to remember it if we ever decide that once we have
> multiqueue, we can actually change the default AioContext without
> draining the node. But maybe at that point, we have to do more
> fundamental changes anyway.
>
> > > > static void
> > > > vu_blk_initialize_config(BlockDriverState *bs,
> > > > struct virtio_blk_config *config,
> > > > @@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
> > > > vu_config_change_msg(&vexp->vu_server.vu_dev);
> > > > }
> > > >
> > > > +/* Called with vexp->export.ctx acquired */
> > > > +static void vu_blk_drained_begin(void *opaque)
> > > > +{
> > > > + VuBlkExport *vexp = opaque;
> > > > +
> > > > + vhost_user_server_detach_aio_context(&vexp->vu_server);
> > > > +}
> > >
> > > Compared to the old code, we're losing the vexp->export.ctx = NULL. This
> > > is correct at this point because after drained_begin we still keep
> > > processing requests until we arrive at a quiescent state.
> > >
> > > However, if we detach the AioContext because we're deleting the
> > > iothread, won't we end up with a dangling pointer in vexp->export.ctx?
> > > Or can we be certain that nothing interesting happens before drained_end
> > > updates it with a new valid pointer again?
> >
> > If you want I can add the detach() callback back again and set ctx to
> > NULL there?
>
> I haven't thought enough about it to say if it's a problem. If you have
> and are confident that it's correct the way it is, I'm happy with it.
>
> But bringing the callback back is the minimal change compared to the old
> state. It's just unnecessary code if we don't actually need it.
The reasoning behind my patch is that detach() sets NULL today and we
would see crashes if ctx was accessed between detach() -> attach().
Therefore, I'm assuming there are no ctx accesses in the code today and
removing the ctx = NULL assignment doesn't break anything.
However, my approach is not very defensive. If the code is changed in a
way that accesses ctx when it's not supposed to, then a dangling pointer
will be accessed.
I think leaving the detach() callback there can be justified because it
will make it easier to detect bugs in the future. I'll add it back in
the next revision.
Stefan

^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event
2023-05-03 8:00 ` Kevin Wolf
@ 2023-05-03 13:12 ` Stefan Hajnoczi
0 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-03 13:12 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg,
Daniil Tatianin
On Wed, May 03, 2023 at 10:00:57AM +0200, Kevin Wolf wrote:
> Am 02.05.2023 um 20:56 hat Stefan Hajnoczi geschrieben:
> > On Tue, May 02, 2023 at 05:19:46PM +0200, Kevin Wolf wrote:
> > > Am 25.04.2023 um 19:26 hat Stefan Hajnoczi geschrieben:
> > > > Only report a transport reset event to the guest after the SCSIDevice
> > > > has been unrealized by qdev_simple_device_unplug_cb().
> > > >
> > > > qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> > > > to false so that scsi_device_find/get() no longer see it.
> > > >
> > > > scsi_target_emulate_report_luns() also needs to be updated to filter out
> > > > SCSIDevices that are unrealized.
> > > >
> > > > These changes ensure that the guest driver does not see the SCSIDevice
> > > > that's being unplugged if it responds very quickly to the transport
> > > > reset event.
> > > >
> > > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > >
> > > > @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > > blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
> > > > virtio_scsi_release(s);
> > > > }
> > > > +
> > > > + if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> > > > + virtio_scsi_acquire(s);
> > > > + virtio_scsi_push_event(s, sd,
> > > > + VIRTIO_SCSI_T_TRANSPORT_RESET,
> > > > + VIRTIO_SCSI_EVT_RESET_REMOVED);
> > > > + scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> > > > + virtio_scsi_release(s);
> > > > + }
> > > > }
> > >
> > > s, sd and s->bus are all unrealized at this point, whereas before this
> > > patch they were still realized. I couldn't find any practical problem
> > > with it, but it made me nervous enough that I thought I should comment
> > > on it at least.
> > >
> > > Should we maybe have documentation on these functions that says that
> > > they accept unrealized objects as their parameters?
> >
> > s is the VirtIOSCSI controller, not the SCSIDevice that is being
> > unplugged. The VirtIOSCSI controller is still realized.
> >
> > s->bus is the VirtIOSCSI controller's bus, it is still realized.
>
> You're right, I misread this part.
>
> > You are right that the SCSIDevice (sd) has been unrealized at this
> > point:
> > - sd->conf.blk is safe because qdev properties stay alive until the
> > Object is deleted, but I'm not sure we should rely on that.
>
> This feels relatively safe (and it's preexisting anyway), reading a
> property doesn't do anything unpredictable and we know the pointer is
> still valid.
>
> > - virtio_scsi_push_event(.., sd, ...) is questionable because the LUN
> > that's fetched from sd no longer belongs to the unplugged SCSIDevice.
>
> This call is what made me nervous.
>
> > How about I change the code to fetch sd->conf.blk and the LUN before
> > unplugging?
>
> You mean passing sd->id and sd->lun to virtio_scsi_push_event() instead
> of sd itself? That would certainly look cleaner and make sure that we
> don't later add code to it that does something with sd that would
> require it to be realized.
Yes, I'll do that in the next revision.
Stefan
* Re: [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server
2023-05-03 13:11 ` Stefan Hajnoczi
@ 2023-05-03 13:43 ` Kevin Wolf
0 siblings, 0 replies; 48+ messages in thread
From: Kevin Wolf @ 2023-05-03 13:43 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 03.05.2023 um 15:11 hat Stefan Hajnoczi geschrieben:
> On Wed, May 03, 2023 at 10:08:46AM +0200, Kevin Wolf wrote:
> > Am 02.05.2023 um 22:06 hat Stefan Hajnoczi geschrieben:
> > > On Tue, May 02, 2023 at 06:04:24PM +0200, Kevin Wolf wrote:
> > > > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > > > vhost-user activity must be suspended during bdrv_drained_begin/end().
> > > > > This prevents new requests from interfering with whatever is happening
> > > > > in the drained section.
> > > > >
> > > > > Previously this was done using aio_set_fd_handler()'s is_external
> > > > > argument. In a multi-queue block layer world the aio_disable_external()
> > > > > API cannot be used since multiple AioContext may be processing I/O, not
> > > > > just one.
> > > > >
> > > > > Switch to BlockDevOps->drained_begin/end() callbacks.
> > > > >
> > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > ---
> > > > > block/export/vhost-user-blk-server.c | 43 ++++++++++++++--------------
> > > > > util/vhost-user-server.c | 10 +++----
> > > > > 2 files changed, 26 insertions(+), 27 deletions(-)
> > > > >
> > > > > diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> > > > > index 092b86aae4..d20f69cd74 100644
> > > > > --- a/block/export/vhost-user-blk-server.c
> > > > > +++ b/block/export/vhost-user-blk-server.c
> > > > > @@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
> > > > > .process_msg = vu_blk_process_msg,
> > > > > };
> > > > >
> > > > > -static void blk_aio_attached(AioContext *ctx, void *opaque)
> > > > > -{
> > > > > - VuBlkExport *vexp = opaque;
> > > > > -
> > > > > - vexp->export.ctx = ctx;
> > > > > - vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
> > > > > -}
> > > > > -
> > > > > -static void blk_aio_detach(void *opaque)
> > > > > -{
> > > > > - VuBlkExport *vexp = opaque;
> > > > > -
> > > > > - vhost_user_server_detach_aio_context(&vexp->vu_server);
> > > > > - vexp->export.ctx = NULL;
> > > > > -}
> > > >
> > > > So for changing the AioContext, we now rely on the fact that the node to
> > > > be changed is always drained, so the drain callbacks implicitly cover
> > > > this case, too?
> > >
> > > Yes.
> >
> > Ok. This surprised me a bit at first, but I think it's fine.
> >
> > We just need to remember it if we ever decide that once we have
> > multiqueue, we can actually change the default AioContext without
> > draining the node. But maybe at that point, we have to do more
> > fundamental changes anyway.
> >
> > > > > static void
> > > > > vu_blk_initialize_config(BlockDriverState *bs,
> > > > > struct virtio_blk_config *config,
> > > > > @@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
> > > > > vu_config_change_msg(&vexp->vu_server.vu_dev);
> > > > > }
> > > > >
> > > > > +/* Called with vexp->export.ctx acquired */
> > > > > +static void vu_blk_drained_begin(void *opaque)
> > > > > +{
> > > > > + VuBlkExport *vexp = opaque;
> > > > > +
> > > > > + vhost_user_server_detach_aio_context(&vexp->vu_server);
> > > > > +}
> > > >
> > > > Compared to the old code, we're losing the vexp->export.ctx = NULL. This
> > > > is correct at this point because after drained_begin we still keep
> > > > processing requests until we arrive at a quiescent state.
> > > >
> > > > However, if we detach the AioContext because we're deleting the
> > > > iothread, won't we end up with a dangling pointer in vexp->export.ctx?
> > > > Or can we be certain that nothing interesting happens before drained_end
> > > > updates it with a new valid pointer again?
> > >
> > > If you want I can add the detach() callback back again and set ctx to
> > > NULL there?
> >
> > I haven't thought enough about it to say if it's a problem. If you have
> > and are confident that it's correct the way it is, I'm happy with it.
> >
> > But bringing the callback back is the minimal change compared to the old
> > state. It's just unnecessary code if we don't actually need it.
>
> The reasoning behind my patch is that detach() sets NULL today and we
> would see crashes if ctx was accessed between detach() -> attach().
> Therefore, I'm assuming there are no ctx accesses in the code today and
> removing the ctx = NULL assignment doesn't break anything.
Sometimes ctx = NULL defaults to qemu_get_aio_context(), so in theory
there could be cases where NULL works, but a dangling pointer wouldn't.
> However, my approach is not very defensive. If the code is changed in
> a way that accesses ctx when it's not supposed to, then a dangling
> pointer will be accessed.
>
> I think leaving the detach() callback there can be justified because it
> will make it easier to detect bugs in the future. I'll add it back in
> the next revision.
Ok, sounds good to me.
Kevin
* Re: [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread
2023-04-25 17:27 ` [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread Stefan Hajnoczi
@ 2023-05-04 21:00 ` Kevin Wolf
2023-05-10 21:40 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-04 21:00 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> virtio_queue_aio_detach_host_notifier() does two things:
> 1. It removes the fd handler from the event loop.
> 2. It processes the virtqueue one last time.
>
> The first step can be performed by any thread and without taking the
> AioContext lock.
>
> The second step may need the AioContext lock (depending on the device
> implementation) and runs in the thread where request processing takes
> place. virtio-blk and virtio-scsi therefore call
> virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in
> the AioContext.
>
> Scheduling a BH is undesirable for .drained_begin() functions. The next
> patch will introduce a .drained_begin() function that needs to call
> virtio_queue_aio_detach_host_notifier().
Why is it undesirable? In my mental model, .drained_begin() is still
free to start as many asynchronous things as it likes. The only
important thing to take care of is that .drained_poll() returns true as
long as the BH (or other asynchronous operation) is still pending.
Of course, your way of doing things still seems to result in simpler
code because you don't have to deal with a BH at all if you only really
want the first part and not the second.
> Move the virtqueue processing out to the callers of
> virtio_queue_aio_detach_host_notifier() so that the function can be
> called from any thread. This is in preparation for the next patch.
Did you forget to remove it in virtio_queue_aio_detach_host_notifier()?
If it's unchanged, I don't think the AioContext requirement is lifted.
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> hw/block/dataplane/virtio-blk.c | 2 ++
> hw/scsi/virtio-scsi-dataplane.c | 9 +++++++++
> 2 files changed, 11 insertions(+)
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index b28d81737e..bd7cc6e76b 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -286,8 +286,10 @@ static void virtio_blk_data_plane_stop_bh(void *opaque)
>
> for (i = 0; i < s->conf->num_queues; i++) {
> VirtQueue *vq = virtio_get_queue(s->vdev, i);
> + EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq);
>
> virtio_queue_aio_detach_host_notifier(vq, s->ctx);
> + virtio_queue_host_notifier_read(host_notifier);
> }
> }
The existing code in virtio_queue_aio_detach_host_notifier() has a
comment before the read:
/* Test and clear notifier before after disabling event,
* in case poll callback didn't have time to run. */
Do we want to keep it around in the new places? (And also fix the
"before after", I suppose, or replace it with a similar, but better
comment that explains why we're reading here.)
Kevin
* Re: [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin()
2023-04-25 17:27 ` [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin() Stefan Hajnoczi
@ 2023-05-04 21:13 ` Kevin Wolf
2023-05-11 20:54 ` Stefan Hajnoczi
2023-05-11 21:22 ` Stefan Hajnoczi
0 siblings, 2 replies; 48+ messages in thread
From: Kevin Wolf @ 2023-05-04 21:13 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> Detach ioeventfds during drained sections to stop I/O submission from
> the guest. virtio-blk is no longer reliant on aio_disable_external()
> after this patch. This will allow us to remove the
> aio_disable_external() API once all other code that relies on it is
> converted.
>
> Take extra care to avoid attaching/detaching ioeventfds if the data
> plane is started/stopped during a drained section. This should be rare,
> but maybe the mirror block job can trigger it.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> hw/block/dataplane/virtio-blk.c | 17 +++++++++------
> hw/block/virtio-blk.c | 38 ++++++++++++++++++++++++++++++++-
> 2 files changed, 48 insertions(+), 7 deletions(-)
>
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index bd7cc6e76b..d77fc6028c 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -245,13 +245,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> }
>
> /* Get this show started by hooking up our callbacks */
> - aio_context_acquire(s->ctx);
> - for (i = 0; i < nvqs; i++) {
> - VirtQueue *vq = virtio_get_queue(s->vdev, i);
> + if (!blk_in_drain(s->conf->conf.blk)) {
> + aio_context_acquire(s->ctx);
> + for (i = 0; i < nvqs; i++) {
> + VirtQueue *vq = virtio_get_queue(s->vdev, i);
>
> - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> + }
> + aio_context_release(s->ctx);
> }
> - aio_context_release(s->ctx);
> return 0;
>
> fail_aio_context:
> @@ -317,7 +319,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
> trace_virtio_blk_data_plane_stop(s);
>
> aio_context_acquire(s->ctx);
> - aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> +
> + if (!blk_in_drain(s->conf->conf.blk)) {
> + aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> + }
So here we actually get a semantic change: What you described as the
second part in the previous patch, processing the virtqueue one last
time, isn't done any more if the device is drained.
If it's okay to just skip this during drain, why do we need to do it
outside of drain?
Kevin
* Re: [PATCH v4 18/20] virtio-scsi: implement BlockDevOps->drained_begin()
2023-04-25 17:27 ` [PATCH v4 18/20] virtio-scsi: " Stefan Hajnoczi
@ 2023-05-04 21:25 ` Kevin Wolf
0 siblings, 0 replies; 48+ messages in thread
From: Kevin Wolf @ 2023-05-04 21:25 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> The virtio-scsi Host Bus Adapter provides access to devices on a SCSI
> bus. Those SCSI devices typically have a BlockBackend. When the
> BlockBackend enters a drained section, the SCSI device must temporarily
> stop submitting new I/O requests.
>
> Implement this behavior by temporarily stopping virtio-scsi virtqueue
> processing when one of the SCSI devices enters a drained section. The
> new scsi_device_drained_begin() API allows scsi-disk to message the
> virtio-scsi HBA.
>
> scsi_device_drained_begin() uses a drain counter so that multiple SCSI
> devices can have overlapping drained sections. The HBA only sees one
> pair of .drained_begin/end() calls.
>
> After this commit, virtio-scsi no longer depends on hw/virtio's
> ioeventfd aio_set_event_notifier(is_external=true). This commit is a
> step towards removing the aio_disable_external() API.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> @@ -206,9 +208,11 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
> }
> s->dataplane_stopping = true;
>
> - aio_context_acquire(s->ctx);
> - aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
> - aio_context_release(s->ctx);
> + if (s->bus.drain_count == 0) {
> + aio_context_acquire(s->ctx);
> + aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
> + aio_context_release(s->ctx);
> + }
Same question as for virtio-blk: We lose processing the virtqueue one
last time during drain. Is it okay, and if so, why do we need it outside
of drain?
Kevin
* Re: [PATCH v4 20/20] aio: remove aio_disable_external() API
2023-04-25 17:27 ` [PATCH v4 20/20] aio: remove aio_disable_external() API Stefan Hajnoczi
@ 2023-05-04 21:34 ` Kevin Wolf
2023-05-11 21:24 ` Stefan Hajnoczi
0 siblings, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2023-05-04 21:34 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> All callers now pass is_external=false to aio_set_fd_handler() and
> aio_set_event_notifier(). The aio_disable_external() API that
> temporarily disables fd handlers that were registered is_external=true
> is therefore dead code.
>
> Remove aio_disable_external(), aio_enable_external(), and the
> is_external arguments to aio_set_fd_handler() and
> aio_set_event_notifier().
>
> The entire test-fdmon-epoll test is removed because its sole purpose was
> testing aio_disable_external().
>
> Parts of this patch were generated using the following coccinelle
> (https://coccinelle.lip6.fr/) semantic patch:
>
> @@
> expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
> @@
> - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
> + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
>
> @@
> expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
> @@
> - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
> + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
> index 1683aa1105..6b6a1a91f8 100644
> --- a/util/fdmon-epoll.c
> +++ b/util/fdmon-epoll.c
> @@ -133,13 +128,12 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
> return false;
> }
>
> - /* Do not upgrade while external clients are disabled */
> - if (qatomic_read(&ctx->external_disable_cnt)) {
> - return false;
> - }
> -
> - if (npfd < EPOLL_ENABLE_THRESHOLD) {
> - return false;
> + if (npfd >= EPOLL_ENABLE_THRESHOLD) {
> + if (fdmon_epoll_try_enable(ctx)) {
> + return true;
> + } else {
> + fdmon_epoll_disable(ctx);
> + }
> }
>
> /* The list must not change while we add fds to epoll */
I don't understand this hunk. Why are you changing more than just
deleting the external_disable_cnt check?
Is this a mismerge with your own commit e62da985?
Kevin
* Re: [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread
2023-05-04 21:00 ` Kevin Wolf
@ 2023-05-10 21:40 ` Stefan Hajnoczi
0 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-10 21:40 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Thu, May 04, 2023 at 11:00:35PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > virtio_queue_aio_detach_host_notifier() does two things:
> > 1. It removes the fd handler from the event loop.
> > 2. It processes the virtqueue one last time.
> >
> > The first step can be performed by any thread and without taking the
> > AioContext lock.
> >
> > The second step may need the AioContext lock (depending on the device
> > implementation) and runs in the thread where request processing takes
> > place. virtio-blk and virtio-scsi therefore call
> > virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in
> > the AioContext.
> >
> > Scheduling a BH is undesirable for .drained_begin() functions. The next
> > patch will introduce a .drained_begin() function that needs to call
> > virtio_queue_aio_detach_host_notifier().
>
> Why is it undesirable? In my mental model, .drained_begin() is still
> free to start as many asynchronous things as it likes. The only
> important thing to take care of is that .drained_poll() returns true as
> long as the BH (or other asynchronous operation) is still pending.
>
> Of course, your way of doing things still seems to result in simpler
> code because you don't have to deal with a BH at all if you only really
> want the first part and not the second.
I have clarified this in the commit description. We can't wait
synchronously, but we could wait asynchronously as you described. It's
simpler to split the function instead of implementing async wait using
.drained_poll().
>
> > Move the virtqueue processing out to the callers of
> > virtio_queue_aio_detach_host_notifier() so that the function can be
> > called from any thread. This is in preparation for the next patch.
>
> Did you forget to remove it in virtio_queue_aio_detach_host_notifier()?
> If it's unchanged, I don't think the AioContext requirement is lifted.
Yes! Thank you :)
>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > hw/block/dataplane/virtio-blk.c | 2 ++
> > hw/scsi/virtio-scsi-dataplane.c | 9 +++++++++
> > 2 files changed, 11 insertions(+)
> > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > index b28d81737e..bd7cc6e76b 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -286,8 +286,10 @@ static void virtio_blk_data_plane_stop_bh(void *opaque)
> >
> > for (i = 0; i < s->conf->num_queues; i++) {
> > VirtQueue *vq = virtio_get_queue(s->vdev, i);
> > + EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq);
> >
> > virtio_queue_aio_detach_host_notifier(vq, s->ctx);
> > + virtio_queue_host_notifier_read(host_notifier);
> > }
> > }
>
> The existing code in virtio_queue_aio_detach_host_notifier() has a
> comment before the read:
>
> /* Test and clear notifier before after disabling event,
> * in case poll callback didn't have time to run. */
>
> Do we want to keep it around in the new places? (And also fix the
> "before after", I suppose, or replace it with a similar, but better
> comment that explains why we're reading here.)
I will add the comment.
Stefan
* Re: [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin()
2023-05-04 21:13 ` Kevin Wolf
@ 2023-05-11 20:54 ` Stefan Hajnoczi
2023-05-11 21:22 ` Stefan Hajnoczi
1 sibling, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-11 20:54 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Thu, May 04, 2023 at 11:13:42PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > Detach ioeventfds during drained sections to stop I/O submission from
> > the guest. virtio-blk is no longer reliant on aio_disable_external()
> > after this patch. This will allow us to remove the
> > aio_disable_external() API once all other code that relies on it is
> > converted.
> >
> > Take extra care to avoid attaching/detaching ioeventfds if the data
> > plane is started/stopped during a drained section. This should be rare,
> > but maybe the mirror block job can trigger it.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > hw/block/dataplane/virtio-blk.c | 17 +++++++++------
> > hw/block/virtio-blk.c | 38 ++++++++++++++++++++++++++++++++-
> > 2 files changed, 48 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > index bd7cc6e76b..d77fc6028c 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -245,13 +245,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> > }
> >
> > /* Get this show started by hooking up our callbacks */
> > - aio_context_acquire(s->ctx);
> > - for (i = 0; i < nvqs; i++) {
> > - VirtQueue *vq = virtio_get_queue(s->vdev, i);
> > + if (!blk_in_drain(s->conf->conf.blk)) {
> > + aio_context_acquire(s->ctx);
> > + for (i = 0; i < nvqs; i++) {
> > + VirtQueue *vq = virtio_get_queue(s->vdev, i);
> >
> > - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + }
> > + aio_context_release(s->ctx);
> > }
> > - aio_context_release(s->ctx);
> > return 0;
> >
> > fail_aio_context:
> > @@ -317,7 +319,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
> > trace_virtio_blk_data_plane_stop(s);
> >
> > aio_context_acquire(s->ctx);
> > - aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > +
> > + if (!blk_in_drain(s->conf->conf.blk)) {
> > + aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > + }
>
> So here we actually get a semantic change: What you described as the
> second part in the previous patch, processing the virtqueue one last
> time, isn't done any more if the device is drained.
>
> If it's okay to just skip this during drain, why do we need to do it
> outside of drain?
Yes, it's safe because virtio_blk_data_plane_stop() has two cases:
1. The device is being reset. It is not necessary to process new
requests.
2. 'stop'/'cont'. 'cont' will call virtio_blk_data_plane_start() ->
event_notifier_set() so new requests will be processed when the guest
resumes execution.
That's why I think this is safe and the right thing to do.
However, your question led me to a pre-existing drain bug when a vCPU
resets the device during a drained section (e.g. when a mirror block job
has started a drained section and the main loop runs until the block job
exits). New requests must not be processed by
virtio_blk_data_plane_stop() because that would violate drain semantics.
It turns out requests are still processed because of
virtio_blk_data_plane_stop() -> virtio_bus_cleanup_host_notifier() ->
virtio_queue_host_notifier_read().
I think that should be handled in a separate patch series. It's not
related to aio_disable_external().
Stefan
* Re: [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin()
2023-05-04 21:13 ` Kevin Wolf
2023-05-11 20:54 ` Stefan Hajnoczi
@ 2023-05-11 21:22 ` Stefan Hajnoczi
1 sibling, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-11 21:22 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Thu, May 04, 2023 at 11:13:42PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > Detach ioeventfds during drained sections to stop I/O submission from
> > the guest. virtio-blk is no longer reliant on aio_disable_external()
> > after this patch. This will allow us to remove the
> > aio_disable_external() API once all other code that relies on it is
> > converted.
> >
> > Take extra care to avoid attaching/detaching ioeventfds if the data
> > plane is started/stopped during a drained section. This should be rare,
> > but maybe the mirror block job can trigger it.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > hw/block/dataplane/virtio-blk.c | 17 +++++++++------
> > hw/block/virtio-blk.c | 38 ++++++++++++++++++++++++++++++++-
> > 2 files changed, 48 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > index bd7cc6e76b..d77fc6028c 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -245,13 +245,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> > }
> >
> > /* Get this show started by hooking up our callbacks */
> > - aio_context_acquire(s->ctx);
> > - for (i = 0; i < nvqs; i++) {
> > - VirtQueue *vq = virtio_get_queue(s->vdev, i);
> > + if (!blk_in_drain(s->conf->conf.blk)) {
> > + aio_context_acquire(s->ctx);
> > + for (i = 0; i < nvqs; i++) {
> > + VirtQueue *vq = virtio_get_queue(s->vdev, i);
> >
> > - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + }
> > + aio_context_release(s->ctx);
> > }
> > - aio_context_release(s->ctx);
> > return 0;
> >
> > fail_aio_context:
> > @@ -317,7 +319,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
> > trace_virtio_blk_data_plane_stop(s);
> >
> > aio_context_acquire(s->ctx);
> > - aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > +
> > + if (!blk_in_drain(s->conf->conf.blk)) {
> > + aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > + }
>
> So here we actually get a semantic change: What you described as the
> second part in the previous patch, processing the virtqueue one last
> time, isn't done any more if the device is drained.
>
> If it's okay to just skip this during drain, why do we need to do it
> outside of drain?
I forgot to answer why we need to process requests one last time outside
drain.
This approach comes from how vhost uses ioeventfd. When switching from
vhost back to QEMU emulation, there's a chance that a final virtqueue
kick snuck in while ioeventfd was being disabled.
This is not the case with dataplane nowadays (though it may have been in
the past). The only dataplane state transitions are device reset and
'stop'/'cont'. Neither of these requires QEMU to process new requests
while stopping dataplane.
My confidence is not 100%, but still pretty high, that the
virtio_queue_host_notifier_read() call could be dropped from the
dataplane code. Since I'm not 100% sure, I didn't attempt that.
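The race described above can be reduced to a few lines: a kick written to an eventfd before the handler is detached remains readable afterwards, so whoever takes over must poll the queue one last time. A minimal sketch using a plain Linux eventfd in place of QEMU's ioeventfd (not QEMU code):

```c
/* A kick written before the fd handler is detached stays pending in the
 * eventfd counter, so the code taking over must process the virtqueue
 * one last time or the request is lost. */
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Returns the pending kick count observed *after* monitoring stopped,
 * or 0 on error. */
uint64_t pending_kicks_after_detach(void)
{
    int kick_fd = eventfd(0, EFD_NONBLOCK);
    if (kick_fd < 0) {
        return 0;
    }

    /* Guest kicks the queue just before the handler is detached */
    uint64_t one = 1;
    if (write(kick_fd, &one, sizeof(one)) != sizeof(one)) {
        close(kick_fd);
        return 0;
    }

    /* ...fd handler detached here; nobody consumed the kick... */

    /* The hand-off code reads once more and still sees the kick */
    uint64_t val = 0;
    if (read(kick_fd, &val, sizeof(val)) != sizeof(val)) {
        val = 0;
    }

    close(kick_fd);
    return val;
}
```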
Stefan
* Re: [PATCH v4 20/20] aio: remove aio_disable_external() API
2023-05-04 21:34 ` Kevin Wolf
@ 2023-05-11 21:24 ` Stefan Hajnoczi
0 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2023-05-11 21:24 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, Daniel P. Berrangé, Juan Quintela,
Julia Suvorova, xen-devel, eesposit, Richard Henderson, Fam Zheng,
Michael S. Tsirkin, Coiby Xu, David Woodhouse, Marcel Apfelbaum,
Peter Lieven, Paul Durrant, Richard W.M. Jones, qemu-block,
Stefano Garzarella, Anthony Perard, Stefan Weil, Xie Yongji,
Paolo Bonzini, Aarushi Mehta, Philippe Mathieu-Daudé,
Eduardo Habkost, Stefano Stabellini, Hanna Reitz, Ronnie Sahlberg
On Thu, May 04, 2023 at 11:34:17PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > All callers now pass is_external=false to aio_set_fd_handler() and
> > aio_set_event_notifier(). The aio_disable_external() API that
> > temporarily disables fd handlers that were registered is_external=true
> > is therefore dead code.
> >
> > Remove aio_disable_external(), aio_enable_external(), and the
> > is_external arguments to aio_set_fd_handler() and
> > aio_set_event_notifier().
> >
> > The entire test-fdmon-epoll test is removed because its sole purpose was
> > testing aio_disable_external().
> >
> > Parts of this patch were generated using the following coccinelle
> > (https://coccinelle.lip6.fr/) semantic patch:
> >
> > @@
> > expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
> > @@
> > - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
> > + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
> >
> > @@
> > expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
> > @@
> > - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
> > + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
> >
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> > diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
> > index 1683aa1105..6b6a1a91f8 100644
> > --- a/util/fdmon-epoll.c
> > +++ b/util/fdmon-epoll.c
> > @@ -133,13 +128,12 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
> > return false;
> > }
> >
> > - /* Do not upgrade while external clients are disabled */
> > - if (qatomic_read(&ctx->external_disable_cnt)) {
> > - return false;
> > - }
> > -
> > - if (npfd < EPOLL_ENABLE_THRESHOLD) {
> > - return false;
> > + if (npfd >= EPOLL_ENABLE_THRESHOLD) {
> > + if (fdmon_epoll_try_enable(ctx)) {
> > + return true;
> > + } else {
> > + fdmon_epoll_disable(ctx);
> > + }
> > }
> >
> > /* The list must not change while we add fds to epoll */
>
> I don't understand this hunk. Why are you changing more than just
> deleting the external_disable_cnt check?
>
> Is this a mismerge with your own commit e62da985?
Yes, it's a mismerge. Thanks for catching that!
Stefan
Thread overview: 48+ messages
2023-04-25 17:26 [PATCH v4 00/20] block: remove aio_disable_external() API Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 01/20] block-backend: split blk_do_set_aio_context() Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 02/20] hw/qdev: introduce qdev_is_realized() helper Stefan Hajnoczi
2023-04-25 17:26 ` [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
2023-05-02 15:19 ` Kevin Wolf
2023-05-02 18:56 ` Stefan Hajnoczi
2023-05-03 8:00 ` Kevin Wolf
2023-05-03 13:12 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
2023-04-28 14:22 ` Kevin Wolf
2023-05-01 15:09 ` Stefan Hajnoczi
2023-05-02 13:19 ` Kevin Wolf
2023-05-02 20:02 ` Stefan Hajnoczi
2023-05-03 11:40 ` Kevin Wolf
2023-05-03 13:05 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 05/20] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
2023-05-02 15:42 ` Kevin Wolf
2023-05-02 19:40 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
2023-05-02 16:04 ` Kevin Wolf
2023-05-02 20:06 ` Stefan Hajnoczi
2023-05-03 8:08 ` Kevin Wolf
2023-05-03 13:11 ` Stefan Hajnoczi
2023-05-03 13:43 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 08/20] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 09/20] block: add blk_in_drain() API Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 10/20] block: drain from main loop thread in bdrv_co_yield_to_drain() Stefan Hajnoczi
2023-05-02 16:21 ` Kevin Wolf
2023-05-02 20:10 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 11/20] xen-block: implement BlockDevOps->drained_begin() Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 12/20] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 13/20] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 14/20] block/export: don't require AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 15/20] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread Stefan Hajnoczi
2023-05-04 21:00 ` Kevin Wolf
2023-05-10 21:40 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin() Stefan Hajnoczi
2023-05-04 21:13 ` Kevin Wolf
2023-05-11 20:54 ` Stefan Hajnoczi
2023-05-11 21:22 ` Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 18/20] virtio-scsi: " Stefan Hajnoczi
2023-05-04 21:25 ` Kevin Wolf
2023-04-25 17:27 ` [PATCH v4 19/20] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
2023-04-25 17:27 ` [PATCH v4 20/20] aio: remove aio_disable_external() API Stefan Hajnoczi
2023-05-04 21:34 ` Kevin Wolf
2023-05-11 21:24 ` Stefan Hajnoczi