[PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency

* [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
@ 2026-01-13 17:48 Jaehoon Kim
  2026-01-13 17:48 ` [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation Jaehoon Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Jaehoon Kim @ 2026-01-13 17:48 UTC
  To: qemu-devel, qemu-block
  Cc: pbonzini, stefanha, fam, armbru, eblake, berrange, eduardo, dave,
	sw, Jaehoon Kim

Dear all,

I'm submitting this patch series for review under the RFC tag.

This patch series refines the aio_poll adaptive polling logic to reduce
unnecessary busy-waiting and improve CPU efficiency.

The first patch prevents redundant polling time calculation when polling
is disabled. The second patch enhances the adaptive polling mechanism by
dynamically adjusting the iothread's polling duration based on event
intervals measured by individual AioHandlers. The third patch introduces
a new 'poll-weight' parameter to adjust how much the current interval
influences the next polling duration.

We evaluated the patches on an s390x host with a single guest using 16
virtio block devices backed by FCP multipath devices in a separate-disk
setup, with the I/O scheduler set to 'none' in both host and guest.

The fio workload included sequential and random reads and writes with
varying numbers of jobs (1, 4, 8, 16) and an iodepth of 8. The tests
were conducted with one and two iothreads, using the newly introduced
poll-weight parameter, to measure the impact on CPU cost and
throughput.

Compared to the baseline, across four fio workload patterns (sequential
read, sequential write, random read, random write) and averaged over fio
job counts of 1, 4, 8, and 16, throughput decreased slightly (-3% to -8%
with one iothread, -2% to -5% with two iothreads), while CPU usage on the
s390x host dropped significantly (-10% to -25% and -7% to -12%,
respectively).

Best regards,
Jaehoon Kim

Jaehoon Kim (3):
  aio-poll: avoid unnecessary polling time computation
  aio-poll: refine iothread polling using weighted handler intervals
  qapi/iothread: introduce poll-weight parameter for aio-poll

 include/qemu/aio.h                |   8 +-
 include/system/iothread.h         |   1 +
 iothread.c                        |  10 ++
 monitor/hmp-cmds.c                |   1 +
 qapi/misc.json                    |   6 ++
 qapi/qom.json                     |   8 +-
 qemu-options.hx                   |   7 +-
 tests/unit/test-nested-aio-poll.c |   2 +-
 util/aio-posix.c                  | 151 +++++++++++++++++++++---------
 util/aio-win32.c                  |   3 +-
 util/async.c                      |   2 +
 11 files changed, 147 insertions(+), 52 deletions(-)

-- 
2.50.1


* [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation
  2026-01-13 17:48 [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Jaehoon Kim
@ 2026-01-13 17:48 ` Jaehoon Kim
  2026-02-16 14:58   ` Stefan Hajnoczi
  2026-02-16 15:21   ` Stefan Hajnoczi
  2026-01-13 17:48 ` [PATCH RFC v1 2/3] aio-poll: refine iothread polling using weighted handler intervals Jaehoon Kim
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Jaehoon Kim @ 2026-01-13 17:48 UTC
  To: qemu-devel, qemu-block
  Cc: pbonzini, stefanha, fam, armbru, eblake, berrange, eduardo, dave,
	sw, Jaehoon Kim

Nodes are no longer added to poll_aio_handlers when adaptive polling is
disabled (poll_max_ns is 0), preventing unnecessary try_poll_mode()
work. Additionally, aio_poll() skips try_poll_mode() when polling is
disabled or when the timeout is 0.

This avoids iterating over all nodes to compute max_ns when polling is
disabled or the timeout is 0, where the result would not be used.
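
For illustration, the two checks can be modeled in isolation. This is a
minimal sketch, not QEMU code: SketchCtx, SketchHandler and both helper
names are hypothetical stand-ins for AioContext, AioHandler and the
inline conditions in aio_dispatch_handler() and aio_poll(). It assumes
the usual QEMU timer convention that a timeout of -1 means "block
indefinitely".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for AioContext and AioHandler. */
typedef struct {
    int64_t poll_max_ns;   /* 0 means adaptive polling is disabled */
} SketchCtx;

typedef struct {
    bool has_io_poll;      /* handler provides an io_poll callback */
    bool deleted;          /* handler is on the deleted list */
    bool in_poll_list;     /* handler is already on poll_aio_handlers */
} SketchHandler;

/* Models the new condition in aio_dispatch_handler(). */
static bool should_add_to_poll_list(const SketchCtx *ctx,
                                    const SketchHandler *node)
{
    return ctx->poll_max_ns != 0 && !node->deleted &&
           !node->in_poll_list && node->has_io_poll;
}

/*
 * Models the new guard around try_poll_mode() in aio_poll().
 * timeout: -1 = block indefinitely, 0 = do not block, >0 = nanoseconds.
 */
static bool should_try_poll_mode(const SketchCtx *ctx, int64_t timeout)
{
    assert(timeout >= -1);
    return ctx->poll_max_ns != 0 && timeout != 0;
}
```

With polling disabled, neither check passes, so no handler is ever
placed on poll_aio_handlers and the max_ns scan is never reached.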

Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
---
 util/aio-posix.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index e24b955fd9..7ddf92a25f 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -306,9 +306,8 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
      * fdmon_supports_polling(), but only until the fd fires for the first
      * time.
      */
-    if (!QLIST_IS_INSERTED(node, node_deleted) &&
-        !QLIST_IS_INSERTED(node, node_poll) &&
-        node->io_poll) {
+    if (ctx->poll_max_ns && !QLIST_IS_INSERTED(node, node_deleted) &&
+        !QLIST_IS_INSERTED(node, node_poll) && node->io_poll) {
         trace_poll_add(ctx, node, node->pfd.fd, revents);
         if (ctx->poll_started && node->io_poll_begin) {
             node->io_poll_begin(node->opaque);
@@ -630,7 +629,7 @@ static void adjust_polling_time(AioContext *ctx, AioPolledEvent *poll,
 bool aio_poll(AioContext *ctx, bool blocking)
 {
     AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
-    bool progress;
+    bool progress = false;
     bool use_notify_me;
     int64_t timeout;
     int64_t start = 0;
@@ -655,7 +654,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
     }
 
     timeout = blocking ? aio_compute_timeout(ctx) : 0;
-    progress = try_poll_mode(ctx, &ready_list, &timeout);
+    if ((ctx->poll_max_ns != 0) && (timeout != 0)) {
+        progress = try_poll_mode(ctx, &ready_list, &timeout);
+    }
     assert(!(timeout && progress));
 
     /*
-- 
2.50.1




* [PATCH RFC v1 2/3] aio-poll: refine iothread polling using weighted handler intervals
  2026-01-13 17:48 [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Jaehoon Kim
  2026-01-13 17:48 ` [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation Jaehoon Kim
@ 2026-01-13 17:48 ` Jaehoon Kim
  2026-01-13 17:48 ` [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll Jaehoon Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Jaehoon Kim @ 2026-01-13 17:48 UTC
  To: qemu-devel, qemu-block
  Cc: pbonzini, stefanha, fam, armbru, eblake, berrange, eduardo, dave,
	sw, Jaehoon Kim

Refine adaptive polling in aio_poll by updating iothread polling
duration based on weighted AioHandler event intervals.

When an event occurs, each AioHandler's poll.ns is updated as a
weighted average of its previous value and the current block_ns. Idle
handlers accumulate block_ns until the total reaches poll_max_ns and
are then reset to 0, which prevents sporadically active handlers from
unnecessarily prolonging iothread polling.
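
The per-handler update can be sketched as follows. This is a simplified
model, assuming a PolledEvent struct that mirrors only the has_event and
ns fields of AioPolledEvent; update_handler() is a hypothetical helper
that folds the loop body of adjust_block_ns() into one call and returns
-1 for handlers that had no event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for AioPolledEvent (field names follow the patch). */
typedef struct {
    bool has_event;   /* set by the dispatcher when the handler fired */
    int64_t ns;       /* estimated block time in nanoseconds, 0 = idle */
} PolledEvent;

/*
 * Update one handler after an aio_poll() iteration that blocked for
 * block_ns.  Returns the new estimate if the handler had an event, or
 * -1 otherwise, so the caller can take the maximum over all handlers.
 */
static int64_t update_handler(PolledEvent *poll, int64_t block_ns,
                              int64_t poll_max_ns, int shift)
{
    assert(shift >= 0 && shift < 63);

    if (poll->has_event) {
        /* Weighted average: keep (1 - 1/2^shift) of the old estimate. */
        poll->ns = poll->ns
            ? poll->ns - (poll->ns >> shift) + (block_ns >> shift)
            : block_ns;
        if (poll->ns >= poll_max_ns) {
            poll->ns = 0;       /* too slow to be worth polling for */
        }
        poll->has_event = false;
        return poll->ns;
    }

    /* No event: an active handler's interval grows until it hits the cap. */
    if (poll->ns != 0) {
        poll->ns += block_ns;
        if (poll->ns >= poll_max_ns) {
            poll->ns = 0;       /* stays 0 until the next event */
        }
    }
    return -1;
}
```

For example, with poll_max_ns = 32768 and shift = 2, an estimate of 8000
followed by an event after 16000 ns becomes 8000 - 2000 + 4000 = 10000.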

The iothread polling duration (ctx->poll_ns) is set from the largest
poll.ns among handlers that had an event. A poll-shrink of 0 now selects
a divisor of 2, matching the default grow factor, instead of resetting
the polling time to 0; this reduces frequent poll_ns resets for slow
devices.
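
A sketch of how the iothread-wide poll_ns follows from the handler
estimates, assuming adj_block_ns is the largest poll.ns among handlers
that had an event (-1 if none did). adjust_poll_ns() is a hypothetical
pure-function restatement of grow_polling_time() and
shrink_polling_time(), not the patch code itself; tracing and the
AioContext are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* grow/shrink of 0 fall back to 2, as in the patch. */
static int64_t adjust_poll_ns(int64_t poll_ns, int64_t adj_block_ns,
                              int64_t poll_max_ns, int64_t grow,
                              int64_t shrink)
{
    assert(poll_ns >= 0 && poll_ns <= poll_max_ns);

    if (adj_block_ns < 0) {
        return poll_ns;                 /* no handler fired: unchanged */
    }
    if (grow == 0) {
        grow = 2;
    }
    if (shrink == 0) {
        shrink = 2;
    }

    if (adj_block_ns > poll_ns) {
        /* Grow: jump to adj_block_ns if it exceeds poll_ns * grow. */
        poll_ns = adj_block_ns > poll_ns * grow ? adj_block_ns
                                                : poll_ns * grow;
        if (poll_ns > poll_max_ns) {
            poll_ns = poll_max_ns;
        }
    } else if (adj_block_ns < poll_ns / shrink) {
        /* Shrink only when the estimate fell well below poll_ns. */
        poll_ns /= shrink;
    }
    return poll_ns;
}
```

Because handler estimates at or above poll_max_ns are reset to 0 before
this step, adj_block_ns stays below poll_max_ns, so the clamp only
matters when poll_ns * grow overshoots.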

The default weight factor (POLL_WEIGHT_SHIFT=2) was selected based on
various fio tests to balance mean poll_ns, reset frequency, and how
often poll_ns reaches high values. The correlation between the current
block_ns and the weighted value (adj_block_ns) decreases slightly as the
weight increases. Lower weights cause more fluctuation; higher weights
keep poll_ns high once it rises.
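
To make the trade-off between weights concrete, the weighted average can
be exercised on its own. weighted_update() restates the expression in
adjust_block_ns() with the shift as a parameter; steps_to_decay() is a
hypothetical helper that counts how many consecutive short intervals it
takes for an estimate to fall to a given level. The inputs are
illustrative values, not the measurements in the table below.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STEPS 1000  /* hypothetical safety cap for the demo loop */

/* One weighted update: new = old - old/2^shift + cur/2^shift. */
static int64_t weighted_update(int64_t old_ns, int64_t block_ns, int shift)
{
    assert(shift >= 0 && shift < 63);
    if (old_ns == 0) {
        return block_ns;        /* first sample seeds the estimate */
    }
    return old_ns - (old_ns >> shift) + (block_ns >> shift);
}

/*
 * Number of updates with a constant block_ns needed for an estimate
 * starting at start_ns to fall to limit_ns or below; -1 if the integer
 * arithmetic stalls before reaching it within MAX_STEPS.
 */
static int steps_to_decay(int64_t start_ns, int64_t block_ns,
                          int64_t limit_ns, int shift)
{
    int64_t est = start_ns;

    for (int steps = 0; steps < MAX_STEPS; steps++) {
        if (est <= limit_ns) {
            return steps;
        }
        est = weighted_update(est, block_ns, shift);
    }
    return -1;
}
```

A single update from 8000 ns with a 24000 ns interval yields 16000,
12000, 10000 and 9000 for shifts 1 to 4. Decaying from 16000 to 8000 with
zero-length intervals takes 1, 3, 6 and 11 updates respectively, which
is the "higher weights hold poll_ns" behavior described above.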

The table below shows results for a representative randread case
(bs=4k, iodepth=8, 2 IOThreads): the mean poll_ns, the rate at which
poll_ns is reset to 0, and the percentage of time spent near the maximum
poll_ns, for different weight values.

Weight | Mean poll_ns (ns) | poll_ns reset rate | Time near max
-------|-------------------|--------------------|--------------
  1    |       4523        |       89.9%        |     7.41%
  2    |       8442        |       78.6%        |    15.84%
  3    |      11147        |       70.4%        |    21.38%
  4    |      11624        |       70.1%        |    23.35%

Weight=1 reacts quickly, Weight=3-4 holds poll_ns higher once it rises,
and Weight=2 provides a good balance between responsiveness and CPU
usage.

Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
---
 include/qemu/aio.h |   4 +-
 util/aio-posix.c   | 135 +++++++++++++++++++++++++++++++--------------
 util/async.c       |   1 +
 3 files changed, 99 insertions(+), 41 deletions(-)

diff --git a/include/qemu/aio.h b/include/qemu/aio.h
index 8cca2360d1..6c77a190e9 100644
--- a/include/qemu/aio.h
+++ b/include/qemu/aio.h
@@ -195,7 +195,8 @@ struct BHListSlice {
 typedef QSLIST_HEAD(, AioHandler) AioHandlerSList;
 
 typedef struct AioPolledEvent {
-    int64_t ns;        /* current polling time in nanoseconds */
+    bool has_event; /* Flag to indicate if an event has occurred */
+    int64_t ns;     /* estimated block time in nanoseconds */
 } AioPolledEvent;
 
 struct AioContext {
@@ -306,6 +307,7 @@ struct AioContext {
     int poll_disable_cnt;
 
     /* Polling mode parameters */
+    int64_t poll_ns;        /* current polling time in nanoseconds */
     int64_t poll_max_ns;    /* maximum polling time in nanoseconds */
     int64_t poll_grow;      /* polling time growth factor */
     int64_t poll_shrink;    /* polling time shrink factor */
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 7ddf92a25f..dd6008898b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -28,9 +28,11 @@
 
 /* Stop userspace polling on a handler if it isn't active for some time */
 #define POLL_IDLE_INTERVAL_NS (7 * NANOSECONDS_PER_SECOND)
+#define POLL_WEIGHT_SHIFT   (2)
 
-static void adjust_polling_time(AioContext *ctx, AioPolledEvent *poll,
-                                int64_t block_ns);
+static void adjust_block_ns(AioContext *ctx, int64_t block_ns);
+static void grow_polling_time(AioContext *ctx, int64_t block_ns);
+static void shrink_polling_time(AioContext *ctx, int64_t block_ns);
 
 bool aio_poll_disabled(AioContext *ctx)
 {
@@ -372,7 +374,7 @@ static bool aio_dispatch_ready_handlers(AioContext *ctx,
          * add the handler to ctx->poll_aio_handlers.
          */
         if (ctx->poll_max_ns && QLIST_IS_INSERTED(node, node_poll)) {
-            adjust_polling_time(ctx, &node->poll, block_ns);
+            node->poll.has_event = true;
         }
     }
 
@@ -559,18 +561,13 @@ static bool run_poll_handlers(AioContext *ctx, AioHandlerList *ready_list,
 static bool try_poll_mode(AioContext *ctx, AioHandlerList *ready_list,
                           int64_t *timeout)
 {
-    AioHandler *node;
     int64_t max_ns;
 
     if (QLIST_EMPTY_RCU(&ctx->poll_aio_handlers)) {
         return false;
     }
 
-    max_ns = 0;
-    QLIST_FOREACH(node, &ctx->poll_aio_handlers, node_poll) {
-        max_ns = MAX(max_ns, node->poll.ns);
-    }
-    max_ns = qemu_soonest_timeout(*timeout, max_ns);
+    max_ns = qemu_soonest_timeout(*timeout, ctx->poll_ns);
 
     if (max_ns && !ctx->fdmon_ops->need_wait(ctx)) {
         /*
@@ -586,46 +583,98 @@ static bool try_poll_mode(AioContext *ctx, AioHandlerList *ready_list,
     return false;
 }
 
-static void adjust_polling_time(AioContext *ctx, AioPolledEvent *poll,
-                                int64_t block_ns)
+static void shrink_polling_time(AioContext *ctx, int64_t block_ns)
 {
-    if (block_ns <= poll->ns) {
-        /* This is the sweet spot, no adjustment needed */
-    } else if (block_ns > ctx->poll_max_ns) {
-        /* We'd have to poll for too long, poll less */
-        int64_t old = poll->ns;
-
-        if (ctx->poll_shrink) {
-            poll->ns /= ctx->poll_shrink;
-        } else {
-            poll->ns = 0;
-        }
+    /*
+     * Reduce polling time if the block_ns is zero or
+     * less than the current poll_ns.
+     */
+    int64_t old = ctx->poll_ns;
+    int64_t shrink = ctx->poll_shrink;
 
-        trace_poll_shrink(ctx, old, poll->ns);
-    } else if (poll->ns < ctx->poll_max_ns &&
-               block_ns < ctx->poll_max_ns) {
-        /* There is room to grow, poll longer */
-        int64_t old = poll->ns;
-        int64_t grow = ctx->poll_grow;
+    if (shrink == 0) {
+        shrink = 2;
+    }
 
-        if (grow == 0) {
-            grow = 2;
-        }
+    if (block_ns < (ctx->poll_ns / shrink)) {
+        ctx->poll_ns /= shrink;
+    }
 
-        if (poll->ns) {
-            poll->ns *= grow;
-        } else {
-            poll->ns = 4000; /* start polling at 4 microseconds */
-        }
+    trace_poll_shrink(ctx, old, ctx->poll_ns);
+}
 
-        if (poll->ns > ctx->poll_max_ns) {
-            poll->ns = ctx->poll_max_ns;
-        }
+static void grow_polling_time(AioContext *ctx, int64_t block_ns)
+{
+    /* There is room to grow, poll longer */
+    int64_t old = ctx->poll_ns;
+    int64_t grow = ctx->poll_grow;
 
-        trace_poll_grow(ctx, old, poll->ns);
+    if (grow == 0) {
+        grow = 2;
     }
+
+    if (block_ns > ctx->poll_ns * grow) {
+        ctx->poll_ns = block_ns;
+    } else {
+        ctx->poll_ns *= grow;
+    }
+
+    if (ctx->poll_ns > ctx->poll_max_ns) {
+        ctx->poll_ns = ctx->poll_max_ns;
+    }
+
+    trace_poll_grow(ctx, old, ctx->poll_ns);
 }
 
+static void adjust_block_ns(AioContext *ctx, int64_t block_ns)
+{
+    AioHandler *node;
+    int64_t adj_block_ns = -1;
+
+    QLIST_FOREACH(node, &ctx->poll_aio_handlers, node_poll) {
+        if (node->poll.has_event) {
+            /*
+             * Update poll.ns for the node with an event.
+             * Uses a weighted average of the current block_ns and the previous
+             * poll.ns to smooth out polling time adjustments.
+             */
+            node->poll.ns = node->poll.ns
+                ? (node->poll.ns - (node->poll.ns >> POLL_WEIGHT_SHIFT))
+                + (block_ns >> POLL_WEIGHT_SHIFT) : block_ns;
+
+            if (node->poll.ns >= ctx->poll_max_ns) {
+                node->poll.ns = 0;
+            }
+            /*
+             * To avoid excessive polling time increase, update adj_block_ns
+             * for nodes with the event flag set to true
+             */
+            adj_block_ns = MAX(adj_block_ns, node->poll.ns);
+            node->poll.has_event = false;
+        } else {
+            /*
+             * No event now, but was active before.
+             * If it waits longer than poll_max_ns, poll.ns will stay 0
+             * until the next event arrives.
+             */
+            if (node->poll.ns != 0) {
+                node->poll.ns += block_ns;
+                if (node->poll.ns >= ctx->poll_max_ns) {
+                    node->poll.ns = 0;
+                }
+            }
+        }
+    }
+
+    if (adj_block_ns >= 0) {
+        if (adj_block_ns > ctx->poll_ns) {
+            grow_polling_time(ctx, adj_block_ns);
+        } else {
+            shrink_polling_time(ctx, adj_block_ns);
+        }
+    }
+}
+
 bool aio_poll(AioContext *ctx, bool blocking)
 {
     AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
@@ -722,6 +771,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     aio_free_deleted_handlers(ctx);
 
+    if (ctx->poll_max_ns) {
+        adjust_block_ns(ctx, block_ns);
+    }
+
     qemu_lockcnt_dec(&ctx->list_lock);
 
     progress |= timerlistgroup_run_timers(&ctx->tlg);
@@ -783,6 +836,7 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
 
     qemu_lockcnt_inc(&ctx->list_lock);
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+        node->poll.has_event = false;
         node->poll.ns = 0;
     }
     qemu_lockcnt_dec(&ctx->list_lock);
@@ -793,6 +847,7 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
     ctx->poll_max_ns = max_ns;
     ctx->poll_grow = grow;
     ctx->poll_shrink = shrink;
+    ctx->poll_ns = 0;
 
     aio_notify(ctx);
 }
diff --git a/util/async.c b/util/async.c
index 80d6b01a8a..9d3627566f 100644
--- a/util/async.c
+++ b/util/async.c
@@ -606,6 +606,7 @@ AioContext *aio_context_new(Error **errp)
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
     ctx->poll_max_ns = 0;
+    ctx->poll_ns = 0;
     ctx->poll_grow = 0;
     ctx->poll_shrink = 0;
 
-- 
2.50.1




* [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll
  2026-01-13 17:48 [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Jaehoon Kim
  2026-01-13 17:48 ` [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation Jaehoon Kim
  2026-01-13 17:48 ` [PATCH RFC v1 2/3] aio-poll: refine iothread polling using weighted handler intervals Jaehoon Kim
@ 2026-01-13 17:48 ` Jaehoon Kim
  2026-01-14  7:48   ` Markus Armbruster
  2026-01-19 18:16 ` [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Stefan Hajnoczi
  2026-02-19 22:27 ` Stefan Hajnoczi
  4 siblings, 1 reply; 30+ messages in thread
From: Jaehoon Kim @ 2026-01-13 17:48 UTC
  To: qemu-devel, qemu-block
  Cc: pbonzini, stefanha, fam, armbru, eblake, berrange, eduardo, dave,
	sw, Jaehoon Kim

Introduce a new poll-weight parameter for aio-poll. It controls how
much the most recent event interval affects the next polling duration.
The value is used as a right-shift count, so the current interval
contributes 1/2^poll-weight of the new estimate. When set to 0, a
default of 2 is used, meaning the current interval contributes roughly
25% to the calculation. Larger values decrease the weight of the current
interval, enabling more gradual adjustments to the polling duration.
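
As a worked example of the arithmetic, assuming the shift-based
expression from patch 2: the effective weight w is poll-weight, or 2
when poll-weight is 0, and the current interval contributes 1/2^w of the
new estimate. effective_poll_weight() and current_interval_percent() are
hypothetical helpers, and DEFAULT_POLL_WEIGHT stands in for
POLL_WEIGHT_SHIFT. The sketch rejects shift counts of 63 or more because
it builds 1 << w in a signed 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_POLL_WEIGHT 2   /* stands in for POLL_WEIGHT_SHIFT */

/* 0 means "use the default", like `ctx->poll_weight ? : POLL_WEIGHT_SHIFT`. */
static int64_t effective_poll_weight(int64_t poll_weight)
{
    return poll_weight ? poll_weight : DEFAULT_POLL_WEIGHT;
}

/* Percentage of the new estimate taken from the current interval. */
static double current_interval_percent(int64_t poll_weight)
{
    int64_t w = effective_poll_weight(poll_weight);

    assert(w > 0 && w < 63);
    return 100.0 / (double)((int64_t)1 << w);
}
```

So poll-weight=0 (the default) and poll-weight=2 both give 25%,
poll-weight=1 gives 50%, and poll-weight=3 gives 12.5%.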

Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
---
 include/qemu/aio.h                |  4 +++-
 include/system/iothread.h         |  1 +
 iothread.c                        | 10 ++++++++++
 monitor/hmp-cmds.c                |  1 +
 qapi/misc.json                    |  6 ++++++
 qapi/qom.json                     |  8 +++++++-
 qemu-options.hx                   |  7 ++++++-
 tests/unit/test-nested-aio-poll.c |  2 +-
 util/aio-posix.c                  |  9 ++++++---
 util/aio-win32.c                  |  3 ++-
 util/async.c                      |  1 +
 11 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/include/qemu/aio.h b/include/qemu/aio.h
index 6c77a190e9..50b8db2712 100644
--- a/include/qemu/aio.h
+++ b/include/qemu/aio.h
@@ -311,6 +311,7 @@ struct AioContext {
     int64_t poll_max_ns;    /* maximum polling time in nanoseconds */
     int64_t poll_grow;      /* polling time growth factor */
     int64_t poll_shrink;    /* polling time shrink factor */
+    int64_t poll_weight;    /* weight of current interval in calculation */
 
     /* AIO engine parameters */
     int64_t aio_max_batch;  /* maximum number of requests in a batch */
@@ -792,12 +793,13 @@ void aio_context_destroy(AioContext *ctx);
  * @max_ns: how long to busy poll for, in nanoseconds
  * @grow: polling time growth factor
  * @shrink: polling time shrink factor
+ * @weight: weight factor applied to the current polling interval
  *
  * Poll mode can be disabled by setting poll_max_ns to 0.
  */
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
                                  int64_t grow, int64_t shrink,
-                                 Error **errp);
+                                 int64_t weight, Error **errp);
 
 /**
  * aio_context_set_aio_params:
diff --git a/include/system/iothread.h b/include/system/iothread.h
index e26d13c6c7..6ea57ed126 100644
--- a/include/system/iothread.h
+++ b/include/system/iothread.h
@@ -38,6 +38,7 @@ struct IOThread {
     int64_t poll_max_ns;
     int64_t poll_grow;
     int64_t poll_shrink;
+    int64_t poll_weight;
 };
 typedef struct IOThread IOThread;
 
diff --git a/iothread.c b/iothread.c
index caf68e0764..68a944e57c 100644
--- a/iothread.c
+++ b/iothread.c
@@ -164,6 +164,7 @@ static void iothread_set_aio_context_params(EventLoopBase *base, Error **errp)
                                 iothread->poll_max_ns,
                                 iothread->poll_grow,
                                 iothread->poll_shrink,
+                                iothread->poll_weight,
                                 errp);
     if (*errp) {
         return;
@@ -233,6 +234,9 @@ static IOThreadParamInfo poll_grow_info = {
 static IOThreadParamInfo poll_shrink_info = {
     "poll-shrink", offsetof(IOThread, poll_shrink),
 };
+static IOThreadParamInfo poll_weight_info = {
+    "poll-weight", offsetof(IOThread, poll_weight),
+};
 
 static void iothread_get_param(Object *obj, Visitor *v,
         const char *name, IOThreadParamInfo *info, Error **errp)
@@ -288,6 +292,7 @@ static void iothread_set_poll_param(Object *obj, Visitor *v,
                                     iothread->poll_max_ns,
                                     iothread->poll_grow,
                                     iothread->poll_shrink,
+                                    iothread->poll_weight,
                                     errp);
     }
 }
@@ -311,6 +316,10 @@ static void iothread_class_init(ObjectClass *klass, const void *class_data)
                               iothread_get_poll_param,
                               iothread_set_poll_param,
                               NULL, &poll_shrink_info);
+    object_class_property_add(klass, "poll-weight", "int",
+                              iothread_get_poll_param,
+                              iothread_set_poll_param,
+                              NULL, &poll_weight_info);
 }
 
 static const TypeInfo iothread_info = {
@@ -356,6 +365,7 @@ static int query_one_iothread(Object *object, void *opaque)
     info->poll_max_ns = iothread->poll_max_ns;
     info->poll_grow = iothread->poll_grow;
     info->poll_shrink = iothread->poll_shrink;
+    info->poll_weight = iothread->poll_weight;
     info->aio_max_batch = iothread->parent_obj.aio_max_batch;
 
     QAPI_LIST_APPEND(*tail, info);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 5a673cddb2..40e3b1da50 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -205,6 +205,7 @@ void hmp_info_iothreads(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "  poll-max-ns=%" PRId64 "\n", value->poll_max_ns);
         monitor_printf(mon, "  poll-grow=%" PRId64 "\n", value->poll_grow);
         monitor_printf(mon, "  poll-shrink=%" PRId64 "\n", value->poll_shrink);
+        monitor_printf(mon, "  poll-weight=%" PRId64 "\n", value->poll_weight);
         monitor_printf(mon, "  aio-max-batch=%" PRId64 "\n",
                        value->aio_max_batch);
     }
diff --git a/qapi/misc.json b/qapi/misc.json
index 28c641fe2f..b21cc48a03 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -85,6 +85,11 @@
 # @poll-shrink: how many ns will be removed from polling time, 0 means
 #     that it's not configured (since 2.9)
 #
+# @poll-weight: the weight factor for adaptive polling.
+#     Determines how much the current event interval contributes to
+#     the next polling time calculation.  0 means that the default
+#     value is used.  (since 10.1)
+#
 # @aio-max-batch: maximum number of requests in a batch for the AIO
 #     engine, 0 means that the engine will use its default (since 6.1)
 #
@@ -96,6 +101,7 @@
            'poll-max-ns': 'int',
            'poll-grow': 'int',
            'poll-shrink': 'int',
+           'poll-weight': 'int',
            'aio-max-batch': 'int' } }
 
 ##
diff --git a/qapi/qom.json b/qapi/qom.json
index 6f5c9de0f0..d90823478d 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -606,6 +606,11 @@
 #     algorithm detects it is spending too long polling without
 #     encountering events.  0 selects a default behaviour (default: 0)
 #
+# @poll-weight: the weight factor for adaptive polling.
+#     Determines how much the current event interval contributes to
+#     the next polling time calculation.  0 selects a default
+#     behaviour (default: 0) since 10.1.
+#
 # The @aio-max-batch option is available since 6.1.
 #
 # Since: 2.0
@@ -614,7 +619,8 @@
   'base': 'EventLoopBaseProperties',
   'data': { '*poll-max-ns': 'int',
             '*poll-grow': 'int',
-            '*poll-shrink': 'int' } }
+            '*poll-shrink': 'int',
+            '*poll-weight': 'int' } }
 
 ##
 # @MainLoopProperties:
diff --git a/qemu-options.hx b/qemu-options.hx
index ec92723f10..74adaf55fc 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -6352,7 +6352,7 @@ SRST
 
             CN=laptop.example.com,O=Example Home,L=London,ST=London,C=GB
 
-    ``-object iothread,id=id,poll-max-ns=poll-max-ns,poll-grow=poll-grow,poll-shrink=poll-shrink,aio-max-batch=aio-max-batch``
+    ``-object iothread,id=id,poll-max-ns=poll-max-ns,poll-grow=poll-grow,poll-shrink=poll-shrink,poll-weight=poll-weight,aio-max-batch=aio-max-batch``
         Creates a dedicated event loop thread that devices can be
         assigned to. This is known as an IOThread. By default device
         emulation happens in vCPU threads or the main event loop thread.
@@ -6388,6 +6388,11 @@ SRST
         the polling time when the algorithm detects it is spending too
         long polling without encountering events.
 
+        The ``poll-weight`` parameter is the weight factor used in the
+        adaptive polling algorithm. It determines how much the most
+        recent event interval affects the calculation of the next
+        polling duration.
+
         The ``aio-max-batch`` parameter is the maximum number of requests
         in a batch for the AIO engine, 0 means that the engine will use
         its default.
diff --git a/tests/unit/test-nested-aio-poll.c b/tests/unit/test-nested-aio-poll.c
index 9ab1ad08a7..ed791fa23b 100644
--- a/tests/unit/test-nested-aio-poll.c
+++ b/tests/unit/test-nested-aio-poll.c
@@ -81,7 +81,7 @@ static void test(void)
     qemu_set_current_aio_context(td.ctx);
 
     /* Enable polling */
-    aio_context_set_poll_params(td.ctx, 1000000, 2, 2, &error_abort);
+    aio_context_set_poll_params(td.ctx, 1000000, 2, 2, 0, &error_abort);
 
     /* Make the event notifier active (set) right away */
     event_notifier_init(&td.poll_notifier, 1);
diff --git a/util/aio-posix.c b/util/aio-posix.c
index dd6008898b..d4f3d8ca8f 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -630,6 +630,7 @@ static void adjust_block_ns(AioContext *ctx, int64_t block_ns)
 {
     AioHandler *node;
     int64_t adj_block_ns = -1;
+    int64_t poll_weight = ctx->poll_weight ? : POLL_WEIGHT_SHIFT;
 
     QLIST_FOREACH(node, &ctx->poll_aio_handlers, node_poll) {
         if (node->poll.has_event) {
@@ -639,8 +640,8 @@ static void adjust_block_ns(AioContext *ctx, int64_t block_ns)
              * poll.ns to smooth out polling time adjustments.
              */
             node->poll.ns = node->poll.ns
-                ? (node->poll.ns - (node->poll.ns >> POLL_WEIGHT_SHIFT))
-                + (block_ns >> POLL_WEIGHT_SHIFT) : block_ns;
+                ? (node->poll.ns - (node->poll.ns >> poll_weight))
+                + (block_ns >> poll_weight) : block_ns;
 
             if (node->poll.ns >= ctx->poll_max_ns) {
                 node->poll.ns = 0;
@@ -830,7 +831,8 @@ void aio_context_destroy(AioContext *ctx)
 }
 
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
-                                 int64_t grow, int64_t shrink, Error **errp)
+                                 int64_t grow, int64_t shrink,
+                                 int64_t weight, Error **errp)
 {
     AioHandler *node;
 
@@ -847,6 +849,7 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
     ctx->poll_max_ns = max_ns;
     ctx->poll_grow = grow;
     ctx->poll_shrink = shrink;
+    ctx->poll_weight = weight;
     ctx->poll_ns = 0;
 
     aio_notify(ctx);
diff --git a/util/aio-win32.c b/util/aio-win32.c
index 6e6f699e4b..1985843233 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -429,7 +429,8 @@ void aio_context_destroy(AioContext *ctx)
 }
 
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
-                                 int64_t grow, int64_t shrink, Error **errp)
+                                 int64_t grow, int64_t shrink,
+                                 int64_t weight, Error **errp)
 {
     if (max_ns) {
         error_setg(errp, "AioContext polling is not implemented on Windows");
diff --git a/util/async.c b/util/async.c
index 9d3627566f..741fcfd6a7 100644
--- a/util/async.c
+++ b/util/async.c
@@ -609,6 +609,7 @@ AioContext *aio_context_new(Error **errp)
     ctx->poll_ns = 0;
     ctx->poll_grow = 0;
     ctx->poll_shrink = 0;
+    ctx->poll_weight = 0;
 
     ctx->aio_max_batch = 0;
 
-- 
2.50.1




* Re: [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll
  2026-01-13 17:48 ` [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll Jaehoon Kim
@ 2026-01-14  7:48   ` Markus Armbruster
  2026-01-15  5:14     ` JAEHOON KIM
  0 siblings, 1 reply; 30+ messages in thread
From: Markus Armbruster @ 2026-01-14  7:48 UTC
  To: Jaehoon Kim
  Cc: qemu-devel, qemu-block, pbonzini, stefanha, fam, eblake, berrange,
	eduardo, dave, sw

Jaehoon Kim <jhkim@linux.ibm.com> writes:

> Introduce a new poll-weight parameter for aio-poll. This parameter
> controls how much the most recent event interval affects the next
> polling duration. When set to 0, a default value of 2 is used, meaning
> the current interval contributes roughly 25% to the calculation. Larger
> values decrease the weight of the current interval, enabling more
> gradual adjustments to polling duration.
>
> Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>

[...]

> diff --git a/qapi/misc.json b/qapi/misc.json
> index 28c641fe2f..b21cc48a03 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -85,6 +85,11 @@
>  # @poll-shrink: how many ns will be removed from polling time, 0 means
>  #     that it's not configured (since 2.9)
>  #
> +# @poll-weight: the weight factor for adaptive polling.
> +#     Determines how much the current event interval contributes to
> +#     the next polling time calculation.  0 means that the default
> +#     value is used.  (since 10.1)

When the default value is used, the actual value being used remains
hidden.  Why?

> +#
>  # @aio-max-batch: maximum number of requests in a batch for the AIO
>  #     engine, 0 means that the engine will use its default (since 6.1)
>  #
> @@ -96,6 +101,7 @@
>             'poll-max-ns': 'int',
>             'poll-grow': 'int',
>             'poll-shrink': 'int',
> +           'poll-weight': 'int',
>             'aio-max-batch': 'int' } }
>  
>  ##
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 6f5c9de0f0..d90823478d 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -606,6 +606,11 @@
>  #     algorithm detects it is spending too long polling without
>  #     encountering events.  0 selects a default behaviour (default: 0)
>  #
> +# @poll-weight: the weight factor for adaptive polling.
> +#     Determines how much the current event interval contributes to
> +#     the next polling time calculation.  0 selects a default
> +#     behaviour (default: 0) since 10.1.

This leaves the actual default behavior unspecified.  Is this a good
idea?

> +#
>  # The @aio-max-batch option is available since 6.1.
>  #
>  # Since: 2.0
> @@ -614,7 +619,8 @@
>    'base': 'EventLoopBaseProperties',
>    'data': { '*poll-max-ns': 'int',
>              '*poll-grow': 'int',
> -            '*poll-shrink': 'int' } }
> +            '*poll-shrink': 'int',
> +            '*poll-weight': 'int' } }
>  
>  ##
>  # @MainLoopProperties:

[...]




* Re: [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll
  2026-01-14  7:48   ` Markus Armbruster
@ 2026-01-15  5:14     ` JAEHOON KIM
  2026-01-15  7:28       ` Markus Armbruster
  0 siblings, 1 reply; 30+ messages in thread
From: JAEHOON KIM @ 2026-01-15  5:14 UTC
  To: Markus Armbruster
  Cc: qemu-devel, qemu-block, pbonzini, stefanha, fam, eblake, berrange,
	eduardo, dave, sw

On 1/14/2026 1:48 AM, Markus Armbruster wrote:
> Jaehoon Kim <jhkim@linux.ibm.com> writes:
>
>> Introduce a new poll-weight parameter for aio-poll. This parameter
>> controls how much the most recent event interval affects the next
>> polling duration. When set to 0, a default value of 2 is used, meaning
>> the current interval contributes roughly 25% to the calculation. Larger
>> values decrease the weight of the current interval, enabling more
>> gradual adjustments to polling duration.
>>
>> Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
> [...]
>
>> diff --git a/qapi/misc.json b/qapi/misc.json
>> index 28c641fe2f..b21cc48a03 100644
>> --- a/qapi/misc.json
>> +++ b/qapi/misc.json
>> @@ -85,6 +85,11 @@
>>   # @poll-shrink: how many ns will be removed from polling time, 0 means
>>   #     that it's not configured (since 2.9)
>>   #
>> +# @poll-weight: the weight factor for adaptive polling.
>> +#     Determines how much the current event interval contributes to
>> +#     the next polling time calculation.  0 means that the default
>> +#     value is used.  (since 10.1)
> When the default value is used, the actual value being used remains
> hidden.  Why?
Actually, I just followed the existing pattern of poll-grow, which also 
defaults to a factor of 2 when set to 0.
It wasn't my intention to hide the value; I kept this because the 
previous API has been working fine without issues.
If you think the actual value should be visible, I'll consider ways to 
make it explicit in the next version.
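The reply above notes that @poll-grow already treats 0 as "use a factor of 2". Here is a minimal sketch of that convention, for illustration only. `grow_poll_ns()` is a hypothetical helper. Clamping the result to a maximum (in the spirit of @poll-max-ns) is an assumption, not a copy of QEMU's polling code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper illustrating the "0 means default" convention:
 * a grow factor of 0 is silently replaced by 2, and the result is
 * clamped to an assumed upper bound (max_ns). */
static int64_t grow_poll_ns(int64_t cur_ns, int64_t grow, int64_t max_ns)
{
    assert(cur_ns > 0 && max_ns > 0 && grow >= 0);

    if (grow == 0) {
        grow = 2;               /* system-picked default factor */
    }
    int64_t next = cur_ns * grow;   /* sketch: no overflow handling */
    return next > max_ns ? max_ns : next;
}
```

For example, `grow_poll_ns(4000, 0, 32000)` returns 8000, exactly as an explicit factor of 2 would. Nothing in the stored factor records that 2 was substituted. That is the visibility question raised in this sub-thread.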
>> +#
>>   # @aio-max-batch: maximum number of requests in a batch for the AIO
>>   #     engine, 0 means that the engine will use its default (since 6.1)
>>   #
>> @@ -96,6 +101,7 @@
>>              'poll-max-ns': 'int',
>>              'poll-grow': 'int',
>>              'poll-shrink': 'int',
>> +           'poll-weight': 'int',
>>              'aio-max-batch': 'int' } }
>>   
>>   ##
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 6f5c9de0f0..d90823478d 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -606,6 +606,11 @@
>>   #     algorithm detects it is spending too long polling without
>>   #     encountering events.  0 selects a default behaviour (default: 0)
>>   #
>> +# @poll-weight: the weight factor for adaptive polling.
>> +#     Determines how much the current event interval contributes to
>> +#     the next polling time calculation.  0 selects a default
>> +#     behaviour (default: 0) since 10.1.
> This leaves the actual default behavior unspecified.  Is this a good
> idea?
I agree that the documentation should be more explicit.
I'll update it to clarify that the default factor is 2 and explain its 
meaning.
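The commit message makes two claims about the weight: a value of 2 lets the current interval contribute roughly 25%, and larger values reduce that share. One formula consistent with both is an exponentially weighted moving average in which the newest sample carries 1/2^weight of the result. The sketch below assumes that shift-based form. `poll_weighted_avg()` is a hypothetical name, and the patch's actual arithmetic may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical EWMA update, assuming the weight acts as a power-of-two
 * divisor: the newest interval contributes 1/2^weight and the previous
 * average contributes the remaining (1 - 1/2^weight). */
static int64_t poll_weighted_avg(int64_t avg_ns, int64_t interval_ns,
                                 unsigned weight)
{
    assert(weight > 0 && weight < 32);
    assert(avg_ns >= 0 && interval_ns >= 0);   /* keep shifts well-defined */

    return avg_ns - (avg_ns >> weight) + (interval_ns >> weight);
}
```

With weight 2, an average of 1000 ns and a new interval of 2000 ns yield 1250 ns (0.75 * 1000 + 0.25 * 2000). With weight 3, the same inputs give 1125 ns, a more gradual adjustment.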
>> +#
>>   # The @aio-max-batch option is available since 6.1.
>>   #
>>   # Since: 2.0
>> @@ -614,7 +619,8 @@
>>     'base': 'EventLoopBaseProperties',
>>     'data': { '*poll-max-ns': 'int',
>>               '*poll-grow': 'int',
>> -            '*poll-shrink': 'int' } }
>> +            '*poll-shrink': 'int',
>> +            '*poll-weight': 'int' } }
>>   
>>   ##
>>   # @MainLoopProperties:
> [...]
>
>




* Re: [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll
  2026-01-15  5:14     ` JAEHOON KIM
@ 2026-01-15  7:28       ` Markus Armbruster
  2026-01-15 10:05         ` Halil Pasic
  0 siblings, 1 reply; 30+ messages in thread
From: Markus Armbruster @ 2026-01-15  7:28 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, stefanha, fam, eblake, berrange,
	eduardo, dave, sw, devel

Cc: devel@lists.libvirt.org for a possible query-iothreads change
discussed below.

JAEHOON KIM <jhkim@linux.ibm.com> writes:

> On 1/14/2026 1:48 AM, Markus Armbruster wrote:
>> Jaehoon Kim <jhkim@linux.ibm.com> writes:
>>
>>> Introduce a new poll-weight parameter for aio-poll. This parameter
>>> controls how much the most recent event interval affects the next
>>> polling duration. When set to 0, a default value of 2 is used, meaning
>>> the current interval contributes roughly 25% to the calculation. Larger
>>> values decrease the weight of the current interval, enabling more
>>> gradual adjustments to polling duration.
>>>
>>> Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
>> [...]
>>
>>> diff --git a/qapi/misc.json b/qapi/misc.json
>>> index 28c641fe2f..b21cc48a03 100644
>>> --- a/qapi/misc.json
>>> +++ b/qapi/misc.json
>>> @@ -85,6 +85,11 @@
>>>  # @poll-shrink: how many ns will be removed from polling time, 0 means
>>>  #     that it's not configured (since 2.9)
>>>  #
>>> +# @poll-weight: the weight factor for adaptive polling.
>>> +#     Determines how much the current event interval contributes to
>>> +#     the next polling time calculation.  0 means that the default
>>> +#     value is used.  (since 10.1)
>>
>> When the default value is used, the actual value being used remains
>> hidden.  Why?
>
> Actually, I just followed the existing pattern of poll-grow, which also 
> defaults to a factor of 2 when set to 0.

Yes, and consistency is always desirable.  But let's have a look at the
new interface in isolation, to see whether it's actually good.

> It wasn't my intention to hide the value; I kept this because the 
> previous API has been working fine without issues.
> If you think the actual value should be visible, I'll consider ways to 
> make it explicit in the next version.

As is, query-iothreads tells us "the weight factor for adaptive polling
is X, and it was set by the user", or "the weight factor for adaptive
polling was not set by the user, but picked by the system."

If we returned the actual value, it would tell us "the weight factor for
adaptive polling is X".  

Only the former interface tells us whether the user or the system
picked.

Only the latter interface tells us what the system picked.

Which one is useful in practice?

I'd argue the latter.  A management application knows whether it set a
value without query-iothreads' help, but it doesn't know what the system
picked.  The people coding it may know if a contract specifies what the
system picks (see below).

If we conclude that returning the actual value is better for new
@poll-weight, then it would surely be better for @poll-grow and
@poll-shrink, too.  Could we still improve them?

Libvirt developers, any advice?
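The two possible behaviors of query-iothreads can be made concrete with a small model of an iothread that remembers both the requested weight and the effective weight. `PollWeightState` and the query functions are hypothetical and do not correspond to QEMU's IOThread or IOThreadInfo types. The default of 2 is taken from the patch description.

```c
#include <assert.h>
#include <stdint.h>

#define POLL_WEIGHT_SYSTEM_DEFAULT 2   /* what the system currently picks */

/* Hypothetical per-iothread state: what the user asked for (0 = not set)
 * and what the polling code actually uses. */
typedef struct {
    int64_t requested;
    int64_t effective;
} PollWeightState;

static PollWeightState poll_weight_configure(int64_t requested)
{
    assert(requested >= 0);
    PollWeightState s = {
        .requested = requested,
        .effective = requested ? requested : POLL_WEIGHT_SYSTEM_DEFAULT,
    };
    return s;
}

/* Current style: report the user's value; 0 means "system picked". */
static int64_t query_reports_requested(const PollWeightState *s)
{
    return s->requested;
}

/* Alternative: report the value actually in effect. */
static int64_t query_reports_effective(const PollWeightState *s)
{
    return s->effective;
}
```

Only the second query tells a management application which value is in use when it left the property unset. Only the first tells it whether the value came from the user.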

>>> +#
>>>  # @aio-max-batch: maximum number of requests in a batch for the AIO
>>>  #     engine, 0 means that the engine will use its default (since 6.1)
>>>  #
>>> @@ -96,6 +101,7 @@
>>>              'poll-max-ns': 'int',
>>>              'poll-grow': 'int',
>>>              'poll-shrink': 'int',
>>> +           'poll-weight': 'int',
>>>              'aio-max-batch': 'int' } }
>>>   
>>>  ##
>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>> index 6f5c9de0f0..d90823478d 100644
>>> --- a/qapi/qom.json
>>> +++ b/qapi/qom.json
>>> @@ -606,6 +606,11 @@
>>>  #     algorithm detects it is spending too long polling without
>>>  #     encountering events.  0 selects a default behaviour (default: 0)
>>>  #
>>> +# @poll-weight: the weight factor for adaptive polling.
>>> +#     Determines how much the current event interval contributes to
>>> +#     the next polling time calculation.  0 selects a default
>>> +#     behaviour (default: 0) since 10.1.
>>
>> This leaves the actual default behavior unspecified.  Is this a good
>> idea?
>
> I agree that the documentation should be more explicit.
> I'll update it to clarify that the default factor is 2 and explain its 
> meaning.

I understand that you're mirroring how @poll-grow and @poll-shrink work,
but let's ignore that for a minute.

Compare four possible interfaces:

1. Optional @poll-weight defaults to 2.  Values <= 0 are rejected.

2. Optional @poll-weight defaults to 2.  Value 0 is replaced by the
   default value 2.  Values < 0 are rejected.

3. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
   0 makes the system pick a value, namely 2.

4. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
   0 makes the system pick a value.  It currently picks 2.

The difference between 3. and 4. is that 3. makes "system picks 2" part
of the contract, while 4. doesn't.

1. is the simplest.  Is 2.'s additional complexity worthwhile?  3.'s?
4.'s?
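The four contracts can be compared by how each one treats an absent value, an explicit 0 and a negative value, and by what the property ends up storing. The model below restates the list above in code. The enum, `resolve_poll_weight()` and `SYSTEM_PICK` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    IFACE_OPT1,   /* default 2; <= 0 rejected */
    IFACE_OPT2,   /* default 2; 0 replaced by 2; < 0 rejected */
    IFACE_OPT3,   /* default 0; < 0 rejected; 0 => system picks 2 (contract) */
    IFACE_OPT4,   /* default 0; < 0 rejected; 0 => system picks (currently 2) */
} PollWeightIface;

#define SYSTEM_PICK 2   /* what the system currently picks for 0 */

/* Resolve an optional user-supplied @poll-weight under a given contract.
 * Returns false if the value is rejected.  On success, *stored is what the
 * property holds (and would be reported back); *used is the value the
 * polling code actually applies. */
static bool resolve_poll_weight(PollWeightIface iface, bool present,
                                int64_t value, int64_t *stored,
                                int64_t *used)
{
    switch (iface) {
    case IFACE_OPT1:
        if (present && value <= 0) {
            return false;
        }
        *stored = present ? value : 2;          /* documented default */
        break;
    case IFACE_OPT2:
        if (present && value < 0) {
            return false;
        }
        *stored = (!present || value == 0) ? 2 : value;
        break;
    case IFACE_OPT3:
    case IFACE_OPT4:
        if (present && value < 0) {
            return false;
        }
        *stored = present ? value : 0;
        break;
    default:
        return false;
    }
    /* Options 3 and 4 behave identically here; they differ only in
     * whether the documentation promises that 0 means 2. */
    *used = *stored ? *stored : SYSTEM_PICK;
    return true;
}
```

In this model, options 1 and 2 always store the value actually in use. Options 3 and 4 store 0 and leave the effective value implicit.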

>>> +#
>>>  # The @aio-max-batch option is available since 6.1.
>>>  #
>>>  # Since: 2.0
>>> @@ -614,7 +619,8 @@
>>>     'base': 'EventLoopBaseProperties',
>>>     'data': { '*poll-max-ns': 'int',
>>>               '*poll-grow': 'int',
>>> -            '*poll-shrink': 'int' } }
>>> +            '*poll-shrink': 'int',
>>> +            '*poll-weight': 'int' } }
>>>   
>>>  ##
>>>  # @MainLoopProperties:
>>
>> [...]




* Re: [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll
  2026-01-15  7:28       ` Markus Armbruster
@ 2026-01-15 10:05         ` Halil Pasic
  2026-01-15 16:00           ` JAEHOON KIM
  2026-01-16  8:19           ` Markus Armbruster
  0 siblings, 2 replies; 30+ messages in thread
From: Halil Pasic @ 2026-01-15 10:05 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: JAEHOON KIM, qemu-devel, qemu-block, pbonzini, stefanha, fam,
	eblake, berrange, eduardo, dave, sw, devel, Halil Pasic

On Thu, 15 Jan 2026 08:28:51 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> I understand that you're mirroring how @poll-grow and @poll-shrink work,
> but let's ignore that for a minute.
> 
> Compare four possible interfaces:
> 
> 1. Optional @poll-weight defaults to 2.  Values <= 0 are rejected.
> 
> 2. Optional @poll-weight defaults to 2.  Value 0 is replaced by the
>    default value 2.  Values < 0 are rejected.
> 
> 3. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
>    0 makes the system pick a value, namely 2.
> 
> 4. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
>    0 makes the system pick a value.  It currently picks 2.
> 
> The difference between 3. and 4. is that 3. makes "system picks 2" part
> of the contract, while 4. doesn't.
> 
> 1. is the simplest.  Is 2.'s additional complexity worthwhile?  3.'s?
> 4.'s?

Aren't there more options? Like

5. Optional @poll-weight defaults to the system default.  Value 0 is
replaced by the system default value.  Currently the system default
value is 2.  Values < 0 are rejected.

That would mean:
* current value inspectable
* system default not part of the interface contract
* interface offers a "please go back to a value not specified by the user"
  operation
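Option 5 can be read as a setter that normalizes 0 to whatever the current system default is and always stores the effective value. Below is a minimal sketch under that reading. `IOThreadModel` and its functions are hypothetical, and the constant stands in for "currently 2".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Current system default; under option 5 this is not part of the
 * contract and could change between releases. */
static const int64_t poll_weight_system_default = 2;

typedef struct {
    int64_t poll_weight;   /* always the value in effect */
} IOThreadModel;           /* hypothetical, not QEMU's IOThread */

/* An absent property means the system default. */
static void iothread_model_init(IOThreadModel *t)
{
    t->poll_weight = poll_weight_system_default;
}

/* Negative values are rejected, 0 means "go back to the system default",
 * and any other value is taken as given. */
static bool set_poll_weight(IOThreadModel *t, int64_t value)
{
    if (value < 0) {
        return false;
    }
    t->poll_weight = value ? value : poll_weight_system_default;
    return true;
}

/* The query returns the effective value, so it is always inspectable. */
static int64_t query_poll_weight(const IOThreadModel *t)
{
    return t->poll_weight;
}
```

Because the stored value is always the effective one, a query reports 2 after a reset to 0. A later release could change the default without changing the interface.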

BTW I like your approach with explicitly listing and evaluating the
options a lot!

Regards,
Halil



* Re: [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll
  2026-01-15 10:05         ` Halil Pasic
@ 2026-01-15 16:00           ` JAEHOON KIM
  2026-01-16  8:19           ` Markus Armbruster
  1 sibling, 0 replies; 30+ messages in thread
From: JAEHOON KIM @ 2026-01-15 16:00 UTC (permalink / raw)
  To: Halil Pasic, Markus Armbruster
  Cc: qemu-devel, qemu-block, pbonzini, stefanha, fam, eblake, berrange,
	eduardo, dave, sw, devel

On 1/15/2026 4:05 AM, Halil Pasic wrote:
> On Thu, 15 Jan 2026 08:28:51 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> I understand that you're mirroring how @poll-grow and @poll-shrink work,
>> but let's ignore that for a minute.
>>
>> Compare four possible interfaces:
>>
>> 1. Optional @poll-weight defaults to 2.  Values <= 0 are rejected.
>>
>> 2. Optional @poll-weight defaults to 2.  Value 0 is replaced by the
>>     default value 2.  Values < 0 are rejected.
>>
>> 3. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
>>     0 makes the system pick a value, namely 2.
>>
>> 4. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
>>     0 makes the system pick a value.  It currently picks 2.
>>
>> The difference between 3. and 4. is that 3. makes "system picks 2" part
>> of the contract, while 4. doesn't.
>>
>> 1. is the simplest.  Is 2.'s additional complexity worthwhile?  3.'s?
>> 4.'s?
> Aren't there more options? Like
>
> 5. Optional @poll-weight defaults to the system default.  Value 0 is
> replaced by the system default value.  Currently the system default
> value is 2.  Values < 0 are rejected.
>
> That would mean:
> * current value inspectable
> * system default not part of the interface contract
> * interface offers a "please go back to a value not specified by the user"
>   operation
>
> BTW I like your approach with explicitly listing and evaluating the
> options a lot!
>
> Regards,
> Halil
>
Thank you both for laying out options 1-5 so clearly; the detailed 
breakdown was very helpful.
After considering the trade-offs, I agree that "Option 1" is the 
simplest and most robust interface. It ensures the value exposed to 
users always reflects the actual effect.
Option 5 is a clever way to reset values, but I'm leaning toward Option 
1 to keep the interface as predictable as possible.
Avoiding special meanings for '0' makes the logic easier for users to 
reason about.

I will update the next revision to follow "Option 1".
Thanks again for the feedback!





* Re: [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll
  2026-01-15 10:05         ` Halil Pasic
  2026-01-15 16:00           ` JAEHOON KIM
@ 2026-01-16  8:19           ` Markus Armbruster
  1 sibling, 0 replies; 30+ messages in thread
From: Markus Armbruster @ 2026-01-16  8:19 UTC (permalink / raw)
  To: Halil Pasic
  Cc: JAEHOON KIM, qemu-devel, qemu-block, pbonzini, stefanha, fam,
	eblake, berrange, eduardo, dave, sw, devel

Halil Pasic <pasic@linux.ibm.com> writes:

> On Thu, 15 Jan 2026 08:28:51 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> I understand that you're mirroring how @poll-grow and @poll-shrink work,
>> but let's ignore that for a minute.
>> 
>> Compare four possible interfaces:
>> 
>> 1. Optional @poll-weight defaults to 2.  Values <= 0 are rejected.
>> 
>> 2. Optional @poll-weight defaults to 2.  Value 0 is replaced by the
>>    default value 2.  Values < 0 are rejected.
>> 
>> 3. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
>>    0 makes the system pick a value, namely 2.
>> 
>> 4. Optional @poll-weight defaults to 0.  Values < 0 are rejected.  Value
>>    0 makes the system pick a value.  It currently picks 2.
>> 
>> The difference between 3. and 4. is that 3. makes "system picks 2" part
>> of the contract, while 4. doesn't.
>> 
>> 1. is the simplest.  Is 2.'s additional complexity worthwhile?  3.'s?
>> 4.'s?
>
> Aren't there more options? Like

Yes :)

> 5. Optional @poll-weight defaults to the system default.  Value 0 is
> replaced by the system default value.  Currently the system default
> value is 2.  Values < 0 are rejected.
>
> That would mean:
> * current value inspectable
> * system default not part of the interface contract
> * interface offers a "please go back to a value not specified by the user"
>   operation
>
> BTW I like your approach with explicitly listing and evaluating the
> options a lot!

Thanks!




* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-01-13 17:48 [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Jaehoon Kim
                   ` (2 preceding siblings ...)
  2026-01-13 17:48 ` [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll Jaehoon Kim
@ 2026-01-19 18:16 ` Stefan Hajnoczi
  2026-01-23 19:15   ` JAEHOON KIM
  2026-02-19 22:27 ` Stefan Hajnoczi
  4 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-01-19 18:16 UTC (permalink / raw)
  To: Jaehoon Kim
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw


On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
> We evaluated the patches on an s390x host with a single guest using 16
> virtio block devices backed by FCP multipath devices in a separate-disk
> setup, with the I/O scheduler set to 'none' in both host and guest.
> 
> The fio workload included sequential and random read/write with varying
> numbers of jobs (1,4,8,16) and io_depth of 8. The tests were conducted
> with single and dual iothreads, using the newly introduced poll-weight
> parameter to measure their impact on CPU cost and throughput.
> 
> Compared to the baseline, across four FIO workload patterns (sequential
> R/W, random R/W), and averaged over FIO job counts of 1, 4, 8, and 16,
> throughput decreased slightly (-3% to -8% for one iothread, -2% to -5%
> for two iothreads), while CPU usage on the s390x host dropped
> significantly (-10% to -25% and -7% to -12%, respectively).

Hi Jaehoon,
I would like to run the same fio benchmarks on a local NVMe drive (<10us
request latency) to see how that type of hardware configuration is
affected. Are the scripts and fio job files available somewhere?

Thanks,
Stefan



* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-01-19 18:16 ` [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Stefan Hajnoczi
@ 2026-01-23 19:15   ` JAEHOON KIM
  2026-01-27 21:11     ` Stefan Hajnoczi
  2026-02-03 21:12     ` Stefan Hajnoczi
  0 siblings, 2 replies; 30+ messages in thread
From: JAEHOON KIM @ 2026-01-23 19:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 1/19/2026 12:16 PM, Stefan Hajnoczi wrote:
> On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
>> We evaluated the patches on an s390x host with a single guest using 16
>> virtio block devices backed by FCP multipath devices in a separate-disk
>> setup, with the I/O scheduler set to 'none' in both host and guest.
>>
>> The fio workload included sequential and random read/write with varying
>> numbers of jobs (1,4,8,16) and io_depth of 8. The tests were conducted
>> with single and dual iothreads, using the newly introduced poll-weight
>> parameter to measure their impact on CPU cost and throughput.
>>
>> Compared to the baseline, across four FIO workload patterns (sequential
>> R/W, random R/W), and averaged over FIO job counts of 1, 4, 8, and 16,
>> throughput decreased slightly (-3% to -8% for one iothread, -2% to -5%
>> for two iothreads), while CPU usage on the s390x host dropped
>> significantly (-10% to -25% and -7% to -12%, respectively).
> Hi Jaehoon,
> I would like to run the same fio benchmarks on a local NVMe drive (<10us
> request latency) to see how that type of hardware configuration is
> affected. Are the scripts and fio job files available somewhere?
>
> Thanks,
> Stefan

Thank you for your reply.
The fio scripts are not available in a location you can access, but there is nothing particularly special in the settings.
I’m sharing below the methodology and test setup used by our performance team.

Guest setup
-----------
- 12 vCPUs, 4 GiB memory
- 16 virtio disks based on the FCP multipath devices in the host

FIO test parameters
-------------------
- FIO Version: fio-3.33
- Filesize: 2G
- Blocksize: 8K / 128K
- Direct I/O: 1
- FIO I/O Engine: libaio
- NUMJOB List: 1, 4, 8, 16
- IODEPTH: 8
- Runtime (s): 150

Two FIO samples for random read
-------------------------------
fio --direct=1 --name=test --numjobs=16 \
    --filename=base.0.0:base.1.0:base.2.0:base.3.0:base.4.0:base.5.0:base.6.0:base.7.0:base.8.0:base.9.0:base.10.0:base.11.0:base.12.0:base.13.0:base.14.0:base.15.0 \
    --size=32G --time_based --runtime=4m --readwrite=randread \
    --ioengine=libaio --iodepth=8 --bs=8k

fio --direct=1 --name=test --numjobs=4 \
    --filename=subw1/base.0.0:subw4/base.3.0:subw8/base.7.0:subw12/base.11.0:subw16/base.15.0 \
    --size=8G --time_based --runtime=4m --readwrite=randread \
    --ioengine=libaio --iodepth=8 --bs=8k


Additional notes
----------------
- Each file is placed on a separate disk device mounted under subw<n>, as
  specified in --filename.
- We execute one warmup run, then two measurement runs, and calculate the
  average.
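The averaging procedure in the last note (discard the warmup run, then average the measurement runs) amounts to the following. The function name and the sample numbers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Average benchmark results, ignoring the first (warmup) run.
 * runs[0] is the warmup; runs[1..n-1] are the measurement runs. */
static double mean_excluding_warmup(const double *runs, size_t n)
{
    assert(n >= 2);
    double sum = 0.0;
    for (size_t i = 1; i < n; i++) {
        sum += runs[i];
    }
    return sum / (double)(n - 1);
}
```

With one warmup and two measurement runs, `n` is 3 and the result is the mean of `runs[1]` and `runs[2]`.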

Please let me know if you need any additional information.

Regards,
Jaehoon Kim




* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-01-23 19:15   ` JAEHOON KIM
@ 2026-01-27 21:11     ` Stefan Hajnoczi
  2026-02-03 21:12     ` Stefan Hajnoczi
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-01-27 21:11 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw


On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> [...]

Thanks, I will share x86_64 results with a fast local NVMe drive when I have collected them.

Stefan



* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-01-23 19:15   ` JAEHOON KIM
  2026-01-27 21:11     ` Stefan Hajnoczi
@ 2026-02-03 21:12     ` Stefan Hajnoczi
  2026-02-06  6:50       ` JAEHOON KIM
  1 sibling, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-03 21:12 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw


On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> [...]

Hi Jaehoon,
I ran fio benchmarks on an Intel Optane SSD DC P4800X Series drive (<10
microsecond latency). This is with just 1 drive.

The 8 KiB block size results show something similar to what you
reported: there are IOPS (or throughput) regressions and CPU utilization
improvements.

Although the CPU improvements are welcome, I think the default behavior
should only be changed if the IOPS regressions can be brought below 5%.

The regressions seem to happen regardless of whether 1 or 2 IOThreads
are configured. CPU utilization is different (98% vs 78%) depending on
the number of IOThreads, so the regressions happen across a range of CPU
utilizations.

The 128 KiB block size results are not interesting because the drive
already saturates at numjobs=1. This is expected since the drive cannot
go much above ~2 GiB/s throughput.
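The ~2 GiB/s ceiling can be checked against the 128 KiB rows of the IOPS table in this message, since throughput is IOPS multiplied by block size. `iops_to_gib_per_s()` is an illustrative helper, not part of fio.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an IOPS figure and a block size in KiB to GiB/s. */
static double iops_to_gib_per_s(double iops, uint32_t bs_kib)
{
    assert(iops >= 0.0 && bs_kib > 0);
    return iops * bs_kib * 1024.0 / (1024.0 * 1024.0 * 1024.0);
}
```

At 128 KiB, 17616 IOPS (randread, numjobs=1, 1 IOThread) works out to about 2.15 GiB/s, and 16009 IOPS (randwrite, same configuration) to about 1.95 GiB/s.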

You can find the Ansible playbook, libvirt domain XML, fio
command-lines, and the fio/sar data here:

https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency

Please let me know if you'd like me to rerun the benchmark with new
patches or a configuration change.

Do you want to have a video call to discuss your work and how to get the
patches merged?

Host
----
CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
RAM: 32 GiB

Guest
-----
vCPUs: 8
RAM: 4 GiB
Disk: 1 virtio-blk aio=native cache=none

IOPS
----
rw        bs   numjobs iothreads iops   diff
randread  8k   1       1         163417 -7.8%
randread  8k   1       2         165041 -2.4%
randread  8k   4       1         221508 -0.64%
randread  8k   4       2         251298 0.008%
randread  8k   8       1         222128 -0.51%
randread  8k   8       2         249489 -2.6%
randread  8k   16      1         230535 -0.18%
randread  8k   16      2         246732 -0.22%
randread  128k 1       1          17616 -0.11%
randread  128k 1       2          17678 0.027%
randread  128k 4       1          17536 -0.27%
randread  128k 4       2          17610 -0.031%
randread  128k 8       1          17369 -0.42%
randread  128k 8       2          17433 -0.071%
randread  128k 16      1          17215 -0.61%
randread  128k 16      2          17269 -0.22%
randwrite 8k   1       1         156597 -3.1%
randwrite 8k   1       2         157720 -3.8%
randwrite 8k   4       1         218448 -0.5%
randwrite 8k   4       2         247075 -5.1%
randwrite 8k   8       1         220866 -0.75%
randwrite 8k   8       2         260935 -0.011%
randwrite 8k   16      1         230913 0.23%
randwrite 8k   16      2         261125 -0.01%
randwrite 128k 1       1          16009 0.094%
randwrite 128k 1       2          16070 0.035%
randwrite 128k 4       1          16073 -0.62%
randwrite 128k 4       2          16131 0.059%
randwrite 128k 8       1          16106 0.092%
randwrite 128k 8       2          16153 0.048%
randwrite 128k 16      1          16102 -0.0091%
randwrite 128k 16      2          16160 0.048%
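The 5% criterion stated earlier in this message can be applied mechanically to the 8 KiB rows. The sketch copies the diff column from the table above and counts the rows whose IOPS regression is at least a given threshold. The type and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* One 8 KiB row of the IOPS table: percentage change versus baseline. */
typedef struct {
    const char *rw;
    int numjobs;
    int iothreads;
    double diff_pct;
} IopsRow;

/* diff column copied from the table */
static const IopsRow rows_8k[] = {
    { "randread",   1, 1, -7.8  }, { "randread",   1, 2, -2.4   },
    { "randread",   4, 1, -0.64 }, { "randread",   4, 2,  0.008 },
    { "randread",   8, 1, -0.51 }, { "randread",   8, 2, -2.6   },
    { "randread",  16, 1, -0.18 }, { "randread",  16, 2, -0.22  },
    { "randwrite",  1, 1, -3.1  }, { "randwrite",  1, 2, -3.8   },
    { "randwrite",  4, 1, -0.5  }, { "randwrite",  4, 2, -5.1   },
    { "randwrite",  8, 1, -0.75 }, { "randwrite",  8, 2, -0.011 },
    { "randwrite", 16, 1,  0.23 }, { "randwrite", 16, 2, -0.01  },
};

/* Count rows whose IOPS regression is at least threshold_pct percent. */
static size_t count_regressions(const IopsRow *rows, size_t n,
                                double threshold_pct)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (-rows[i].diff_pct >= threshold_pct) {
            count++;
        }
    }
    return count;
}
```

With a 5% threshold, the count is 2: randread numjobs=1 with 1 IOThread (-7.8%) and randwrite numjobs=4 with 2 IOThreads (-5.1%).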

IOThread CPU usage
------------------
iothreads before (%) after (%)
1         98.7       95.81
2         78.43      66.13
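The CPU figures are absolute utilization percentages. Expressing the change relative to the baseline is a one-line computation. `pct_change()` is an illustrative helper.

```c
#include <assert.h>

/* Relative change in percent from before to after. */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

With one IOThread, this gives about -2.9% (2.89 percentage points). With two IOThreads, it gives about -15.7% (12.30 percentage points).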

Stefan



* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-03 21:12     ` Stefan Hajnoczi
@ 2026-02-06  6:50       ` JAEHOON KIM
  2026-02-12 18:53         ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: JAEHOON KIM @ 2026-02-06  6:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 2/3/2026 3:12 PM, Stefan Hajnoczi wrote:
> [...]
> Hi Jaehoon,
> I ran fio benchmarks on an Intel Optane SSD DC P4800X Series drive (<10
> microsecond latency). This is with just 1 drive.
>
> The 8 KiB block size results show something similar to what you
> reported: there are IOPS (or throughput) regressions and CPU utilization
> improvements.
>
> Although the CPU improvements are welcome, I think the default behavior
> should only be changed if the IOPS regressions can be brought below 5%.
>
> The regressions seem to happen regardless of whether 1 or 2 IOThreads
> are configured. CPU utilization is different (98% vs 78%) depending on
> the number of IOThreads, so the regressions happen across a range of CPU
> utilizations.
>
> The 128 KiB block size results are not interesting because the drive
> already saturates at numjobs=1. This is expected since the drive cannot
> go much above ~2 GiB/s throughput.
>
> You can find the Ansible playbook, libvirt domain XML, fio
> command-lines, and the fio/sar data here:
>
> https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency
>
> Please let me know if you'd like me to rerun the benchmark with new
> patches or a configuration change.
>
> Do you want to have a video call to discuss your work and how to get the
> patches merged?
>
> Host
> ----
> CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
> RAM: 32 GiB
>
> Guest
> -----
> vCPUs: 8
> RAM: 4 GiB
> Disk: 1 virtio-blk aio=native cache=none
>
> IOPS
> ----
> rw        bs   numjobs iothreads iops   diff
> randread  8k   1       1         163417 -7.8%
> randread  8k   1       2         165041 -2.4%
> randread  8k   4       1         221508 -0.64%
> randread  8k   4       2         251298 0.008%
> randread  8k   8       1         222128 -0.51%
> randread  8k   8       2         249489 -2.6%
> randread  8k   16      1         230535 -0.18%
> randread  8k   16      2         246732 -0.22%
> randread  128k 1       1          17616 -0.11%
> randread  128k 1       2          17678 0.027%
> randread  128k 4       1          17536 -0.27%
> randread  128k 4       2          17610 -0.031%
> randread  128k 8       1          17369 -0.42%
> randread  128k 8       2          17433 -0.071%
> randread  128k 16      1          17215 -0.61%
> randread  128k 16      2          17269 -0.22%
> randwrite 8k   1       1         156597 -3.1%
> randwrite 8k   1       2         157720 -3.8%
> randwrite 8k   4       1         218448 -0.5%
> randwrite 8k   4       2         247075 -5.1%
> randwrite 8k   8       1         220866 -0.75%
> randwrite 8k   8       2         260935 -0.011%
> randwrite 8k   16      1         230913 0.23%
> randwrite 8k   16      2         261125 -0.01%
> randwrite 128k 1       1          16009 0.094%
> randwrite 128k 1       2          16070 0.035%
> randwrite 128k 4       1          16073 -0.62%
> randwrite 128k 4       2          16131 0.059%
> randwrite 128k 8       1          16106 0.092%
> randwrite 128k 8       2          16153 0.048%
> randwrite 128k 16      1          16102 -0.0091%
> randwrite 128k 16      2          16160 0.048%
>
> IOThread CPU usage
> ------------------
> iothreads before  after
> 1         98.7    95.81
> 2         78.43   66.13
>
> Stefan

Hello Stefan,

Thank you very much for your effort in running these benchmarks.
The results show a pattern very similar to what our performance team
observed.

I fully agree with the 5% threshold for the default behavior.
However, we need an approach that balances the current
performance-oriented polling scheme with CPU efficiency.

I found that relying on the existing grow/shrink parameters alone was too
limited to achieve this balance. This is why I've adjusted the process using a
weight-based grow/shrink approach to ensure the polling window remains
robust against jitter. Specifically, it avoids abrupt resets to zero
by implementing a gradual shrink rather than an immediate reset, even
when device latency exceeds the threshold.
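A minimal C sketch of the weight-based idea described above follows. This is not the code from this patch series. The struct and field names (PollState, PollParams, avg_block_ns, weight_shift) are hypothetical. The poll_max_ns/grow/shrink fields only loosely mirror the existing poll-max-ns, poll-grow and poll-shrink parameters. The sketch assumes the weight is a power-of-two fraction (1/2^weight_shift) of each new interval sample and that growth starts from 4 us. It smooths observed event intervals with an exponentially weighted moving average. When the smoothed interval exceeds the maximum, it divides the polling window by the shrink factor instead of resetting the window to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-handler polling state (illustrative names only). */
typedef struct {
    int64_t poll_ns;      /* current polling window in ns */
    int64_t avg_block_ns; /* smoothed event interval in ns */
} PollState;

/* Hypothetical tunables, loosely modeled on poll-max-ns, poll-grow,
 * poll-shrink and the proposed poll-weight. */
typedef struct {
    int64_t poll_max_ns;
    int64_t grow;          /* multiplier when growing, >= 2 */
    int64_t shrink;        /* divisor when shrinking, >= 2 */
    unsigned weight_shift; /* new sample contributes 1/2^weight_shift */
} PollParams;

#define POLL_START_NS 4000 /* assumed initial window when growing from 0 */

static void adjust_poll_weighted(PollState *s, const PollParams *p,
                                 int64_t block_ns)
{
    assert(p->grow >= 2 && p->shrink >= 2 && p->weight_shift < 16);

    /* EWMA: move the estimate 1/2^weight_shift of the way toward the
     * new sample, so a single outlier has limited influence. */
    s->avg_block_ns += (block_ns - s->avg_block_ns) /
                       ((int64_t)1 << p->weight_shift);

    if (s->avg_block_ns <= s->poll_ns) {
        /* Window already covers the typical interval: keep it. */
    } else if (s->avg_block_ns > p->poll_max_ns) {
        /* Typical interval too long to poll for: shrink gradually
         * instead of dropping straight to zero. */
        s->poll_ns /= p->shrink;
    } else {
        /* Room to grow, capped at poll_max_ns. */
        s->poll_ns = s->poll_ns ? s->poll_ns * p->grow : POLL_START_NS;
        if (s->poll_ns > p->poll_max_ns) {
            s->poll_ns = p->poll_max_ns;
        }
    }
}
```

With weight_shift = 3, a single 40 us interval against a 32 us maximum only moves the average from 8 us to 12 us, so a 16 us window is kept. A sustained rise in latency still shrinks the window, but one step at a time. The cost of this smoothing is a slower reaction to genuine latency changes.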

As seen in both your results and our team's measurements, this may lead
to a small performance trade-off, but it provides a reasonable
balance for CPU-sensitive environments.

Thank you for suggesting the video call and I am also looking forward to
hearing your thoughts. I'm on US Central Time. Except for Tuesday, I can
adjust my schedule to a time that works for you.

Please let me know your preferred time.

Regards,
Jaehoon Kim



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-06  6:50       ` JAEHOON KIM
@ 2026-02-12 18:53         ` Stefan Hajnoczi
  2026-02-13 15:13           ` JAEHOON KIM
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-12 18:53 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On Fri, Feb 06, 2026 at 12:50:38AM -0600, JAEHOON KIM wrote:
> Thank you for suggesting the video call and I am also looking forward to
> hearing your thoughts. I'm on US Central Time. Except for Tuesday, I can
> adjust my schedule to a time that works for you.
> 
> Please let me know your preferred time.

Is Monday, February 16th at 10:00am CST good for you? If not, please
feel free to pick any time on Monday.

Meeting link: https://meet.jit.si/AioPollingOptimization

Anyone else interested in this topic is welcome to join.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-12 18:53         ` Stefan Hajnoczi
@ 2026-02-13 15:13           ` JAEHOON KIM
  2026-02-16 12:42             ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: JAEHOON KIM @ 2026-02-13 15:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 2/12/2026 12:53 PM, Stefan Hajnoczi wrote:
> Is Monday, February 16th at 10:00am CST good for you? If not, please
> feel free to pick any time on Monday.
>
> Meeting link: https://meet.jit.si/AioPollingOptimization
>
> Anyone else interested in this topic is welcome to join.
>
> Thanks,
> Stefan

Thank you for the invite, Stefan.
Monday at 10:00 AM CST works well for me.
I'll make sure to be there and look forward to the discussion. See you then!

Thanks,
Jaehoon



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-13 15:13           ` JAEHOON KIM
@ 2026-02-16 12:42             ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-16 12:42 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On Fri, Feb 13, 2026 at 09:13:29AM -0600, JAEHOON KIM wrote:
> Thank you for the invite, Stefan.
> Monday at 10:00 AM CST works well for me.
> I'll make sure to be there and look forward to the discussion. See you then!

Great, talk to you soon!

Stefan
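The weight-based grow/shrink Jaehoon describes above, where the window shrinks gradually instead of resetting to zero, can be illustrated with an exponentially weighted update. This is a hypothetical sketch, not the formula in the RFC patches. The function name `toy_next_poll_ns()`, the power-of-two weight, and the rule that an interval above the polling limit counts as a target of 0 are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch (not the RFC's actual formula): move the polling
 * window 1/2^weight of the way toward a target derived from the latest
 * measured event interval.
 *
 * - If the interval exceeds max_ns, polling could not have caught the
 *   event, so the target is 0. The window then shrinks gradually
 *   instead of being reset to 0 at once.
 * - A larger weight gives more influence to history, so the window
 *   changes more slowly.
 */
static int64_t toy_next_poll_ns(int64_t cur_ns, int64_t interval_ns,
                                int64_t max_ns, unsigned weight)
{
    assert(cur_ns >= 0 && interval_ns >= 0 && max_ns >= 0);
    assert(weight < 32);

    int64_t target = interval_ns > max_ns ? 0 : interval_ns;

    /* Round the removed share up so that repeated shrinking reaches 0. */
    int64_t removed = (cur_ns + ((INT64_C(1) << weight) - 1)) >> weight;
    int64_t added = target >> weight;

    return cur_ns - removed + added;
}
```

With weight=2, a single slow event shrinks a 16000 ns window to 12000 ns rather than to 0; with weight=3 it only drops to 14000 ns, trading some extra polling for less sensitivity to jitter.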

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation
  2026-01-13 17:48 ` [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation Jaehoon Kim
@ 2026-02-16 14:58   ` Stefan Hajnoczi
  2026-02-16 15:21   ` Stefan Hajnoczi
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-16 14:58 UTC (permalink / raw)
  To: Jaehoon Kim
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

On Tue, Jan 13, 2026 at 11:48:22AM -0600, Jaehoon Kim wrote:
> Nodes are no longer added to poll_aio_handlers when adaptive polling is
> disabled, preventing unnecessary try_poll_mode() calls. Additionally,
> aio_poll() skips try_poll_mode() when timeout is 0.

Do these two changes need to be made together? If not, please split them
into two commits. This will make the commit descriptions easier to
understand.

> 
> This avoids iterating over all nodes to compute max_ns unnecessarily
> when polling is disabled or timeout is 0.
> 
> Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
> ---
>  util/aio-posix.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index e24b955fd9..7ddf92a25f 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -306,9 +306,8 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
>       * fdmon_supports_polling(), but only until the fd fires for the first
>       * time.
>       */
> -    if (!QLIST_IS_INSERTED(node, node_deleted) &&
> -        !QLIST_IS_INSERTED(node, node_poll) &&
> -        node->io_poll) {
> +    if (ctx->poll_max_ns && !QLIST_IS_INSERTED(node, node_deleted) &&
> +        !QLIST_IS_INSERTED(node, node_poll) && node->io_poll) {
>          trace_poll_add(ctx, node, node->pfd.fd, revents);
>          if (ctx->poll_started && node->io_poll_begin) {
>              node->io_poll_begin(node->opaque);
> @@ -630,7 +629,7 @@ static void adjust_polling_time(AioContext *ctx, AioPolledEvent *poll,
>  bool aio_poll(AioContext *ctx, bool blocking)
>  {
>      AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
> -    bool progress;
> +    bool progress = false;
>      bool use_notify_me;
>      int64_t timeout;
>      int64_t start = 0;
> @@ -655,7 +654,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      }
>  
>      timeout = blocking ? aio_compute_timeout(ctx) : 0;
> -    progress = try_poll_mode(ctx, &ready_list, &timeout);
> +    if ((ctx->poll_max_ns != 0) && (timeout != 0)) {
> +        progress = try_poll_mode(ctx, &ready_list, &timeout);
> +    }
>      assert(!(timeout && progress));

Can you walk me through the timeout == 0 case when polling is active?

Further down in aio_poll():

  /* If polling is allowed, non-blocking aio_poll does not need the
   * system call---a single round of run_poll_handlers_once suffices.
   */
  if (timeout || ctx->fdmon_ops->need_wait(ctx)) {

My concern is that aio_poll(timeout=0) could return without polling or
waiting for fds when polling is active. Maybe I've missed something that
prevents this?

Stefan
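The concern above can be made concrete with a toy model of the two decisions involved: whether try_poll_mode() is entered, and whether the fdmon wait (the system call) is entered. This is a hypothetical model, not QEMU code. `ToyCtx`, `ToyPaths`, `toy_aio_poll()` and the `need_wait` flag (standing in for `ctx->fdmon_ops->need_wait(ctx)`) are assumptions. The model only records which paths are taken, not what they do, and it ignores that try_poll_mode() can reset the timeout to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the AioContext state that matters here. */
typedef struct {
    int64_t poll_max_ns; /* 0 means adaptive polling is disabled */
    bool need_wait;      /* models ctx->fdmon_ops->need_wait(ctx) */
} ToyCtx;

typedef struct {
    bool entered_try_poll; /* try_poll_mode() was called */
    bool entered_wait;     /* fdmon wait (system call) was called */
} ToyPaths;

/* Record which of the two paths aio_poll() would take for a timeout. */
static ToyPaths toy_aio_poll(const ToyCtx *ctx, int64_t timeout, bool patched)
{
    ToyPaths p = { false, false };

    /* Unpatched code always calls try_poll_mode(); the RFC adds a guard. */
    if (!patched || (ctx->poll_max_ns != 0 && timeout != 0)) {
        p.entered_try_poll = true;
    }
    /* Mirrors: if (timeout || ctx->fdmon_ops->need_wait(ctx)) { ... } */
    if (timeout != 0 || ctx->need_wait) {
        p.entered_wait = true;
    }
    return p;
}
```

With polling active (`poll_max_ns != 0`, `need_wait == false`) and `timeout == 0`, the patched model enters neither path, which is the case being asked about; the unpatched model still enters try_poll_mode().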

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation
  2026-01-13 17:48 ` [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation Jaehoon Kim
  2026-02-16 14:58   ` Stefan Hajnoczi
@ 2026-02-16 15:21   ` Stefan Hajnoczi
  2026-02-16 20:47     ` JAEHOON KIM
  1 sibling, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-16 15:21 UTC (permalink / raw)
  To: Jaehoon Kim
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

[-- Attachment #1: Type: text/plain, Size: 594 bytes --]

On Tue, Jan 13, 2026 at 11:48:22AM -0600, Jaehoon Kim wrote:
> Nodes are no longer added to poll_aio_handlers when adaptive polling is
> disabled, preventing unnecessary try_poll_mode() calls. Additionally,
> aio_poll() skips try_poll_mode() when timeout is 0.
> 
> This avoids iterating over all nodes to compute max_ns unnecessarily
> when polling is disabled or timeout is 0.

Did you consider optimizing the case when polling is active and there is
no way around calculating the polling time? Glib has ordered data
structures that have lower operation costs than O(n).

Stefan
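One way to follow up on the suggestion above is to keep each handler's polling time in a structure that answers "largest value" without a linear scan. The sketch below swaps in a plain array-backed max segment tree rather than one of GLib's structures, purely to stay self-contained. Updates cost O(log n) and the maximum is read from the root in O(1). The fixed capacity `TOY_MAX_HANDLERS`, the slot-per-handler mapping and all names are hypothetical; they are not part of aio-posix.c.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_HANDLERS 16 /* hypothetical fixed capacity, power of two */

/*
 * Array-backed max segment tree: tree[1] is the root and the leaf for
 * handler slot s lives at tree[TOY_MAX_HANDLERS + s]. Every inner node
 * holds the maximum of its two children; tree[0] is unused.
 */
typedef struct {
    int64_t tree[2 * TOY_MAX_HANDLERS];
} ToyPollNsTree;

static void toy_poll_ns_init(ToyPollNsTree *t)
{
    for (int i = 0; i < 2 * TOY_MAX_HANDLERS; i++) {
        t->tree[i] = 0;
    }
}

/* Set one handler's polling time and repair its ancestors: O(log n). */
static void toy_poll_ns_update(ToyPollNsTree *t, int slot, int64_t ns)
{
    assert(slot >= 0 && slot < TOY_MAX_HANDLERS);
    assert(ns >= 0);

    int i = TOY_MAX_HANDLERS + slot;
    t->tree[i] = ns;
    for (i /= 2; i >= 1; i /= 2) {
        int64_t l = t->tree[2 * i];
        int64_t r = t->tree[2 * i + 1];
        t->tree[i] = l > r ? l : r;
    }
}

/* Largest polling time over all handler slots: O(1). */
static int64_t toy_poll_ns_max(const ToyPollNsTree *t)
{
    return t->tree[1];
}
```

Removing a handler amounts to setting its slot back to 0. A dynamic handler count would need a resizable tree or a balanced ordered structure such as GLib's GTree.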

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation
  2026-02-16 15:21   ` Stefan Hajnoczi
@ 2026-02-16 20:47     ` JAEHOON KIM
  2026-02-17 13:16       ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: JAEHOON KIM @ 2026-02-16 20:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 2/16/2026 9:21 AM, Stefan Hajnoczi wrote:
> On Tue, Jan 13, 2026 at 11:48:22AM -0600, Jaehoon Kim wrote:
>> Nodes are no longer added to poll_aio_handlers when adaptive polling is
>> disabled, preventing unnecessary try_poll_mode() calls. Additionally,
>> aio_poll() skips try_poll_mode() when timeout is 0.
>>
>> This avoids iterating over all nodes to compute max_ns unnecessarily
>> when polling is disabled or timeout is 0.
> Did you consider optimizing the case when polling is active and there is
> no way around calculating the polling time? Glib has ordered data
> structures that have lower operation costs than O(n).
>
> Stefan

If I understand correctly, optimizing the data structure to be cheaper
than O(n) is one approach, but I believe bypassing the function entirely
is more efficient in cases where timeout=0 is triggered frequently.

Since the result of the calculation is not used at all in these cases,
it seems more efficient to skip the function than to optimize the
calculation itself.

Please let me know if I have misunderstood your point.

Thanks,
Jaehoon Kim.




* Re: [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation
  2026-02-16 20:47     ` JAEHOON KIM
@ 2026-02-17 13:16       ` Stefan Hajnoczi
  2026-02-18 13:43         ` JAEHOON KIM
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-17 13:16 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

[-- Attachment #1: Type: text/plain, Size: 1385 bytes --]

On Mon, Feb 16, 2026 at 02:47:27PM -0600, JAEHOON KIM wrote:
> On 2/16/2026 9:21 AM, Stefan Hajnoczi wrote:
> > On Tue, Jan 13, 2026 at 11:48:22AM -0600, Jaehoon Kim wrote:
> > > Nodes are no longer added to poll_aio_handlers when adaptive polling is
> > > disabled, preventing unnecessary try_poll_mode() calls. Additionally,
> > > aio_poll() skips try_poll_mode() when timeout is 0.
> > > 
> > > This avoids iterating over all nodes to compute max_ns unnecessarily
> > > when polling is disabled or timeout is 0.
> > Did you consider optimizing the case when polling is active and there is
> > no way around calculating the polling time? Glib has ordered data
> > structures that have lower operation costs than O(n).
> > 
> > Stefan
> 
> If I understand correctly, optimizing the data structure to be cheaper
> than O(n) is one approach, but I believe bypassing the function entirely
> is more efficient in cases where timeout=0 is triggered frequently.
>
> Since the result of the calculation is not used at all in these cases,
> it seems more efficient to skip the function than to optimize the
> calculation itself.

Yes, skipping it when not needed makes sense and I'm in favor of it.

I was thinking that, separately from this patch, the data structure
might be worth optimizing since the value needs to be computed when
polling is enabled.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation
  2026-02-17 13:16       ` Stefan Hajnoczi
@ 2026-02-18 13:43         ` JAEHOON KIM
  0 siblings, 0 replies; 30+ messages in thread
From: JAEHOON KIM @ 2026-02-18 13:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 2/17/2026 7:16 AM, Stefan Hajnoczi wrote:
> On Mon, Feb 16, 2026 at 02:47:27PM -0600, JAEHOON KIM wrote:
>> On 2/16/2026 9:21 AM, Stefan Hajnoczi wrote:
>>> On Tue, Jan 13, 2026 at 11:48:22AM -0600, Jaehoon Kim wrote:
>>>> Nodes are no longer added to poll_aio_handlers when adaptive polling is
>>>> disabled, preventing unnecessary try_poll_mode() calls. Additionally,
>>>> aio_poll() skips try_poll_mode() when timeout is 0.
>>>>
>>>> This avoids iterating over all nodes to compute max_ns unnecessarily
>>>> when polling is disabled or timeout is 0.
>>> Did you consider optimizing the case when polling is active and there is
>>> no way around calculating the polling time? Glib has ordered data
>>> structures that have lower operation costs than O(n).
>>>
>>> Stefan
>> If I understand correctly, optimizing the data structure to be cheaper
>> than O(n) is one approach, but I believe bypassing the function entirely
>> is more efficient in cases where timeout=0 is triggered frequently.
>>
>> Since the result of the calculation is not used at all in these cases,
>> it seems more efficient to skip the function than to optimize the
>> calculation itself.
> Yes, skipping it when not needed makes sense and I'm in favor of it.
>
> I was thinking that, separately from this patch, the data structure
> might be worth optimizing since the value needs to be computed when
> polling is enabled.
>
> Stefan

That makes sense. I agree that when polling is enabled, optimizing the
data structure could help reduce the overhead of computing the value.
I'll take a closer look at potential improvements there.

Thanks,
Jaehoon




* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-01-13 17:48 [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Jaehoon Kim
                   ` (3 preceding siblings ...)
  2026-01-19 18:16 ` [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Stefan Hajnoczi
@ 2026-02-19 22:27 ` Stefan Hajnoczi
  2026-02-20 19:00   ` JAEHOON KIM
  4 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-19 22:27 UTC (permalink / raw)
  To: Jaehoon Kim
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

Hi Jaehoon,
Following the call earlier this week I ran a single fio job to get a
clearer picture of:
1. The QEMU 10.0.0 regression that prompted you to optimize AioContext
   polling.
2. How the poll-weight parameter affects IOPS.

run      rw        bs   numjobs iothreads iops   diff
v9.2.0   randread  8k   1       1         174944 3.6%
v10.0.0  randread  8k   1       1         174285 3.2%
baseline randread  8k   1       1         168908 0.0%
w2       randread  8k   1       1         163718 -3.1%
w3       randread  8k   1       1         165805 -1.8%
w4       randread  8k   1       1         167388 -0.9%

This time I only ran randread bs=8k iodepth=8 numjobs=1 with a single
IOThread.
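The diff column is each run's IOPS relative to the baseline (9ad7f544c696). A minimal sketch of that arithmetic, assuming the usual (run - baseline) / baseline definition; the helper name is hypothetical:

```c
#include <assert.h>

/* Percentage change of one run's IOPS relative to the baseline run. */
static double iops_diff_pct(double run_iops, double baseline_iops)
{
    assert(baseline_iops > 0.0);
    return (run_iops - baseline_iops) / baseline_iops * 100.0;
}
```

For example, w2 gives (163718 - 168908) / 168908 * 100, about -3.07%, which the table shows as -3.1%.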

Observations:

- There might be an IOPS regression between v10.0.0 and the baseline
  (9ad7f544c696) that your patches apply on top of. This is different
  from the CPU utilization regression that you found in v9.2.0 ->
  v10.0.0. I will bisect it.

- poll-weight=3 and 4 improve IOPS to a level that is acceptable. CPU
  utilization looks like this:

run         %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
baseline   49.37      0.00     31.10      0.00      0.00     11.61      0.04      0.00      0.00      7.89
w2         46.24      0.00     32.61      0.00      0.00     11.84      0.10      0.00      0.00      9.21
w3         48.04      0.00     32.17      0.00      0.00     11.98      0.08      0.00      0.00      7.73
w4         48.56      0.00     31.23      0.00      0.00     11.48      0.03      0.00      0.00      8.69

poll-weight=2 is the winner at CPU utilization. I'm not sure if
poll-weight=3 will produce an acceptable CPU utilization improvement for
you. Do you have data or want to re-run to measure poll-weight=3?
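In the table above, total busy CPU time is 100 - %idle (here roughly the sum of %usr, %sys, %irq and %soft, since the other columns are 0). A small sketch that ranks the runs by busy time using the %idle values from the table; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *run;
    double idle_pct; /* %idle column from the table */
} CpuSample;

/* Busy CPU time in percent: everything that is not idle. */
static double busy_pct(const CpuSample *s)
{
    return 100.0 - s->idle_pct;
}

/* Index of the run with the lowest busy time. */
static size_t least_busy(const CpuSample *samples, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (busy_pct(&samples[i]) < busy_pct(&samples[best])) {
            best = i;
        }
    }
    return best;
}
```

With %idle values of 7.89, 9.21, 7.73 and 8.69, busy time is about 92.1%, 90.8%, 92.3% and 91.3% for baseline, w2, w3 and w4, so the sketch picks w2, matching the observation that poll-weight=2 is the winner at CPU utilization.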

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-19 22:27 ` Stefan Hajnoczi
@ 2026-02-20 19:00   ` JAEHOON KIM
  2026-02-24  4:24     ` Stefan Hajnoczi
  2026-02-26  6:03     ` JAEHOON KIM
  0 siblings, 2 replies; 30+ messages in thread
From: JAEHOON KIM @ 2026-02-20 19:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 2/19/2026 4:27 PM, Stefan Hajnoczi wrote:
> Hi Jaehoon,
> Following the call earlier this week I ran a single fio job to get a
> clearer picture of:
> 1. The QEMU 10.0.0 regression that prompted you to optimize AioContext
>     polling.
> 2. How the poll-weight parameter affects IOPS.
>
> run      rw        bs   numjobs iothreads iops   diff
> v9.2.0   randread  8k   1       1         174944 3.6%
> v10.0.0  randread  8k   1       1         174285 3.2%
> baseline randread  8k   1       1         168908 0.0%
> w2       randread  8k   1       1         163718 -3.1%
> w3       randread  8k   1       1         165805 -1.8%
> w4       randread  8k   1       1         167388 -0.9%
>
> This time I only ran randread bs=8k iodepth=8 numjobs=1 with a single
> IOThread.
>
> Observations:
>
> - There might be an IOPS regression between v10.0.0 and the baseline
>    (9ad7f544c696) that your patches apply on top of. This is different
>    from the CPU utilization regression that you found in v9.2.0 ->
>    v10.0.0. I will bisect it.
>
> - poll-weight=3 and 4 improve IOPS to a level that is acceptable. CPU
>    utilization looks like this:
>
> run         %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
> baseline   49.37      0.00     31.10      0.00      0.00     11.61      0.04      0.00      0.00      7.89
> w2         46.24      0.00     32.61      0.00      0.00     11.84      0.10      0.00      0.00      9.21
> w3         48.04      0.00     32.17      0.00      0.00     11.98      0.08      0.00      0.00      7.73
> w4         48.56      0.00     31.23      0.00      0.00     11.48      0.03      0.00      0.00      8.69
>
> poll-weight=2 is the winner at CPU utilization. I'm not sure if
> poll-weight=3 will produce an acceptable CPU utilization improvement for
> you. Do you have data or want to re-run to measure poll-weight=3?
>
> Stefan

Thank you very much for sharing the detailed measurement results.
I truly appreciate the effort.

Regarding w=3, I will discuss with our performance team to see
if the CPU consumption levels are acceptable within our internal
test environment. I will get back to you with more definitive data
as soon as possible.

Thanks again for your thorough analysis.

Regards,
Jaehoon.



* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-20 19:00   ` JAEHOON KIM
@ 2026-02-24  4:24     ` Stefan Hajnoczi
  2026-02-26  6:03     ` JAEHOON KIM
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-02-24  4:24 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

[-- Attachment #1: Type: text/plain, Size: 1442 bytes --]

On Fri, Feb 20, 2026 at 01:00:16PM -0600, JAEHOON KIM wrote:
> On 2/19/2026 4:27 PM, Stefan Hajnoczi wrote:
> > Hi Jaehoon,
> > Following the call earlier this week I ran a single fio job to get a
> > clearer picture of:
> > 1. The QEMU 10.0.0 regression that prompted you to optimize AioContext
> >     polling.
> > 2. How the poll-weight parameter affects IOPS.
> > 
> > run      rw        bs   numjobs iothreads iops   diff
> > v9.2.0   randread  8k   1       1         174944 3.6%
> > v10.0.0  randread  8k   1       1         174285 3.2%
> > baseline randread  8k   1       1         168908 0.0%
> > w2       randread  8k   1       1         163718 -3.1%
> > w3       randread  8k   1       1         165805 -1.8%
> > w4       randread  8k   1       1         167388 -0.9%
> > 
> > This time I only ran randread bs=8k iodepth=8 numjobs=1 with a single
> > IOThread.
> > 
> > Observations:
> > 
> > - There might be an IOPS regression between v10.0.0 and the baseline
> >    (9ad7f544c696) that your patches apply on top of. This is different
> >    from the CPU utilization regression that you found in v9.2.0 ->
> >    v10.0.0. I will bisect it.

I reran v10.0.0, v10.1.0, and the baseline to check for a regression.
The fio results did not show one. It's odd that the results I posted
earlier showed a difference, but there does not seem to be a
reproducible issue after all.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-20 19:00   ` JAEHOON KIM
  2026-02-24  4:24     ` Stefan Hajnoczi
@ 2026-02-26  6:03     ` JAEHOON KIM
  2026-03-09 20:46       ` JAEHOON KIM
  1 sibling, 1 reply; 30+ messages in thread
From: JAEHOON KIM @ 2026-02-26  6:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 2/20/2026 1:00 PM, JAEHOON KIM wrote:
> On 2/19/2026 4:27 PM, Stefan Hajnoczi wrote:
>> Hi Jaehoon,
>> Following the call earlier this week I ran a single fio job to get a
>> clearer picture of:
>> 1. The QEMU 10.0.0 regression that prompted you to optimize AioContext
>>     polling.
>> 2. How the poll-weight parameter affects IOPS.
>>
>> run      rw        bs   numjobs iothreads iops   diff
>> v9.2.0   randread  8k   1       1         174944 3.6%
>> v10.0.0  randread  8k   1       1         174285 3.2%
>> baseline randread  8k   1       1         168908 0.0%
>> w2       randread  8k   1       1         163718 -3.1%
>> w3       randread  8k   1       1         165805 -1.8%
>> w4       randread  8k   1       1         167388 -0.9%
>>
>> This time I only ran randread bs=8k iodepth=8 numjobs=1 with a single
>> IOThread.
>>
>> Observations:
>>
>> - There might be an IOPS regression between v10.0.0 and the baseline
>>    (9ad7f544c696) that your patches apply on top of. This is different
>>    from the CPU utilization regression that you found in v9.2.0 ->
>>    v10.0.0. I will bisect it.
>>
>> - poll-weight=3 and 4 improve IOPS to a level that is acceptable. CPU
>>    utilization looks like this:
>>
>> run         %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
>> baseline   49.37      0.00     31.10      0.00      0.00     11.61      0.04      0.00      0.00      7.89
>> w2         46.24      0.00     32.61      0.00      0.00     11.84      0.10      0.00      0.00      9.21
>> w3         48.04      0.00     32.17      0.00      0.00     11.98      0.08      0.00      0.00      7.73
>> w4         48.56      0.00     31.23      0.00      0.00     11.48      0.03      0.00      0.00      8.69
>>
>> poll-weight=2 is the winner at CPU utilization. I'm not sure if
>> poll-weight=3 will produce an acceptable CPU utilization improvement for
>> you. Do you have data or want to re-run to measure poll-weight=3?
>>
>> Stefan
>
> Thank you very much for sharing the detailed measurement results.
> I truly appreciate the effort.
>
> Regarding w=3, I will discuss with our performance team to see
> if the CPU consumption levels are acceptable within our internal
> test environment. I will get back to you with more definitive data
> as soon as possible.
>
> Thanks again for your thorough analysis.
>
> Regards,
> Jaehoon.
>
Hello Stefan,

Thank you for your patience. I would like to share the changes in
throughput and CPU consumption that we observed in our performance test
environment.

We observed that with w=3, performance returns to a level comparable to
QEMU v9.1, while w=2 results in slightly lower CPU consumption.

Our initial preference is to use w=2 by default. However, we fully
understand your concerns about the potential performance drop. Should
the patch be accepted, we would provide guidance on using the weight
value as a configurable option.

For reference, we consider values between -2% and 2% as noise in our
analysis. I look forward to your feedback.

The tables below show a comparison between:
  - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1, Guest: RHEL 9.6 GA
vs.
  - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1 (w=2, w=3), Guest: RHEL 9.6 GA
for FIO on FCP and FICON with 1 iothread and 8 iothreads.
The values shown are the averages for numjobs 1, 4, and 8.

FIO FCP - 1 iothread
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -3.00    |   -2.33    |    0.00    |   -0.33    |   -3.00    |   -3.33    |    1.33    |   -0.33    |
| CPU consumption avg |   -5.67    |   -4.33    |   -6.33    |   -5.33    |   -7.33    |   -5.33    |  -10.33    |   -8.67    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+

FIO FCP - 8 iothreads
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -5.00    |   -4.00    |   -3.67    |   -3.33    |   -4.67    |   -4.00    |   -0.33    |   -0.67    |
| CPU consumption avg |  -13.00    |  -10.67    |  -16.00    |  -14.33    |  -14.00    |   -9.33    |  -13.67    |  -11.00    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+


FIO FICON - 1 iothread
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -0.67    |   -0.67    |   -6.33    |   -6.67    |   -0.67    |    0.00    |    1.33    |    1.33    |
| CPU consumption avg |   -7.67    |   -7.33    |  -13.67    |  -13.00    |   -9.00    |   -8.33    |   -5.00    |   -4.33    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+

FIO FICON - 8 iothreads
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -3.00    |   -2.67    |   -7.33    |   -7.00    |   -0.67    |   -1.00    |    0.67    |    0.67    |
| CPU consumption avg |  -16.33    |  -14.33    |  -25.33    |  -27.00    |   -8.67    |   -7.00    |   -6.67    |   -5.00    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+



The table below shows a comparison between:
  - Host: RHEL 10.0 GA + qemu-9.1.0-15.el10, Guest: RHEL 9.6 GA
vs.
  - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1 (w=2, w=3), Guest: RHEL 9.6 GA

FIO FCP - 1 iothread
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -0.67    |    0.00    |   -0.33    |   -1.00    |   -1.00    |   -0.67    |    3.33    |    2.00    |
| CPU consumption avg |    0.67    |    2.00    |    1.67    |    3.00    |   -3.33    |   -0.33    |   -2.33    |    0.00    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+

FIO FCP - 8 iothreads
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -2.00    |   -1.33    |   -1.00    |    0.00    |   -0.33    |    1.00    |    1.00    |    0.67    |
| CPU consumption avg |   -3.00    |   -1.00    |   -2.00    |    0.00    |  -10.33    |   -5.67    |   -6.67    |   -3.33    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+


FIO FICON - 1 iothread
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -1.67    |   -1.67    |   -0.33    |    0.00    |   -1.33    |   -1.00    |   -1.67    |   -2.33    |
| CPU consumption avg |   -0.33    |    1.00    |    1.00    |    2.00    |   -2.33    |   -1.33    |   -1.33    |   -0.33    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+

FIO FICON - 8 iothreads
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
|                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
|                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Throughput avg      |   -1.00    |   -0.67    |   -0.33    |    0.33    |    0.67    |    0.33    |    0.33    |    0.33    |
| CPU consumption avg |   -1.33    |    1.00    |    4.33    |    2.33    |   -2.00    |    0.00    |   -3.00    |   -1.33    |
+---------------------+------------+------------+------------+------------+------------+------------+------------+------------+



Regards,
Jaehoon.
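The tables above are read against a noise band: deltas between -2% and 2% are treated as noise. A minimal sketch of that classification, assuming the band is inclusive at both ends (the mail does not say); the enum and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { DELTA_NOISE, DELTA_BETTER, DELTA_WORSE } DeltaClass;

/*
 * Classify a percentage delta against the +/-2% noise band.
 * higher_is_better is true for throughput, false for CPU consumption.
 */
static DeltaClass classify_delta(double delta_pct, bool higher_is_better)
{
    const double noise = 2.0; /* assumed inclusive band [-2, 2] */

    if (delta_pct >= -noise && delta_pct <= noise) {
        return DELTA_NOISE;
    }
    bool increased = delta_pct > 0.0;
    return increased == higher_is_better ? DELTA_BETTER : DELTA_WORSE;
}
```

For example, the FCP 8-iothread, w=2 sequential-read cells versus qemu-10.0.0 (throughput -5.00, CPU -13.00) classify as worse throughput and better CPU, while the same cells versus qemu-9.1.0 (-2.00, -3.00) classify as noise for throughput and a CPU improvement.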




* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-02-26  6:03     ` JAEHOON KIM
@ 2026-03-09 20:46       ` JAEHOON KIM
  2026-03-23 14:08         ` JAEHOON KIM
  0 siblings, 1 reply; 30+ messages in thread
From: JAEHOON KIM @ 2026-03-09 20:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 2/26/2026 12:03 AM, JAEHOON KIM wrote:
> On 2/20/2026 1:00 PM, JAEHOON KIM wrote:
>> On 2/19/2026 4:27 PM, Stefan Hajnoczi wrote:
>>> Hi Jaehoon,
>>> Following the call earlier this week I ran a single fio job to get a
>>> clearer picture of:
>>> 1. The QEMU 10.0.0 regression that prompted you to optimize AioContext
>>>     polling.
>>> 2. How the poll-weight parameter affects IOPS.
>>>
>>> run      rw        bs   numjobs iothreads iops   diff
>>> v9.2.0   randread  8k   1       1         174944 3.6%
>>> v10.0.0  randread  8k   1       1         174285 3.2%
>>> baseline randread  8k   1       1         168908 0.0%
>>> w2       randread  8k   1       1         163718 -3.1%
>>> w3       randread  8k   1       1         165805 -1.8%
>>> w4       randread  8k   1       1         167388 -0.9%
>>>
>>> This time I only ran randread bs=8k iodepth=8 numjobs=1 with a single
>>> IOThread.
>>>
>>> Observations:
>>>
>>> - There might be an IOPS regression between v10.0.0 and the baseline
>>>    (9ad7f544c696) that your patches apply on top of. This is different
>>>    from the CPU utilization regression that you found in v9.2.0 ->
>>>    v10.0.0. I will bisect it.
>>>
>>> - poll-weight=3 and 4 improve IOPS to a level that is acceptable. CPU
>>>    utilization looks like this:
>>>
>>> run         %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
>>> baseline   49.37      0.00     31.10      0.00      0.00     11.61      0.04      0.00      0.00      7.89
>>> w2         46.24      0.00     32.61      0.00      0.00     11.84      0.10      0.00      0.00      9.21
>>> w3         48.04      0.00     32.17      0.00      0.00     11.98      0.08      0.00      0.00      7.73
>>> w4         48.56      0.00     31.23      0.00      0.00     11.48      0.03      0.00      0.00      8.69
>>>
>>> poll-weight=2 is the winner at CPU utilization. I'm not sure if
>>> poll-weight=3 will produce an acceptable CPU utilization improvement for
>>> you. Do you have data or want to re-run to measure poll-weight=3?
>>>
>>> Stefan
>>
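For reference, the diff column above is the relative IOPS change against the baseline tree (9ad7f544c696). A minimal Python sketch that reproduces it from the quoted IOPS figures and also derives a rough "busy" figure as 100 - %idle from the CPU table. Using %idle as a CPU-cost proxy is an assumption for illustration only; it is not how either side measured CPU consumption in this thread.

```python
"""Reproduce the diff column and a rough CPU-busy figure from the run above."""

# IOPS from the randread bs=8k iodepth=8 numjobs=1, single-IOThread run.
IOPS = {
    "v9.2.0": 174944,
    "v10.0.0": 174285,
    "baseline": 168908,  # 9ad7f544c696, the tree the series applies to
    "w2": 163718,
    "w3": 165805,
    "w4": 167388,
}

# %idle column of the CPU utilization table.
IDLE = {"baseline": 7.89, "w2": 9.21, "w3": 7.73, "w4": 8.69}


def pct_diff(value: float, reference: float) -> float:
    """Relative change of value versus reference, in percent."""
    return (value - reference) / reference * 100.0


def busy(idle_pct: float) -> float:
    """Assumed CPU-cost proxy: share of time the CPUs were not idle."""
    return 100.0 - idle_pct


if __name__ == "__main__":
    base = IOPS["baseline"]
    for run, iops in IOPS.items():
        print(f"{run:9s} {iops:7d} iops  {pct_diff(iops, base):+5.1f}%")
    for run, idle in IDLE.items():
        print(f"{run:9s} busy {busy(idle):6.2f}%")
```

By this coarse proxy w2 shows the most idle time (9.21%), in line with the observation that poll-weight=2 wins on CPU utilization.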
>> Thank you very much for sharing the detailed measurement results.
>> I truly appreciate the effort.
>>
>> Regarding w=3, I will discuss with our performance team to see
>> if the CPU consumption levels are acceptable within our internal
>> test environment. I will get back to you with more definitive data
>> as soon as possible.
>>
>> Thanks again for your thorough analysis.
>>
>> Regards,
>> Jaehoon.
>>
> Hello Stefan,
>
> Thank you for your patience. I would like to share the changes in
> throughput and CPU consumption observed in our performance test
> environment.
>
> We observed that with w=3, performance returns to a level comparable to
> QEMU v9.1, while w=2 results in slightly lower CPU consumption.
>
> Our initial preference is to use w=2 by default. However, we fully
> understand your concerns about the potential performance drop. Should
> the patch be accepted, we would provide guidance on using the weight
> value as a configurable option.
>
> For reference, we treat differences between -2% and +2% as noise in our
> analysis. I look forward to your feedback.
>
> The tables below compare
>   - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1, Guest: RHEL 9.6 GA, vs.
>   - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1 (w=2, w=3), Guest: RHEL 9.6 GA
> for FIO on FCP and FICON with 1 iothread and 8 iothreads. The values
> (in %) are averages over numjobs 1, 4, and 8.
>
> FIO FCP, 1 iothread
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -3.00    |   -2.33    |    0.00    |   -0.33    |   -3.00    |   -3.33    |    1.33    |   -0.33    |
> | CPU consumption avg |   -5.67    |   -4.33    |   -6.33    |   -5.33    |   -7.33    |   -5.33    |  -10.33    |   -8.67    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
> FIO FCP, 8 iothreads
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -5.00    |   -4.00    |   -3.67    |   -3.33    |   -4.67    |   -4.00    |   -0.33    |   -0.67    |
> | CPU consumption avg |  -13.00    |  -10.67    |  -16.00    |  -14.33    |  -14.00    |   -9.33    |  -13.67    |  -11.00    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
>
> FIO FICON, 1 iothread
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -0.67    |   -0.67    |   -6.33    |   -6.67    |   -0.67    |    0.00    |    1.33    |    1.33    |
> | CPU consumption avg |   -7.67    |   -7.33    |  -13.67    |  -13.00    |   -9.00    |   -8.33    |   -5.00    |   -4.33    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
> FIO FICON, 8 iothreads
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -3.00    |   -2.67    |   -7.33    |   -7.00    |   -0.67    |   -1.00    |    0.67    |    0.67    |
> | CPU consumption avg |  -16.33    |  -14.33    |  -25.33    |  -27.00    |   -8.67    |   -7.00    |   -6.67    |   -5.00    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
>
> The tables below compare
>   - Host: RHEL 10.0 GA + qemu-9.1.0-15.el10, Guest: RHEL 9.6 GA, vs.
>   - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1 (w=2, w=3), Guest: RHEL 9.6 GA.
>
> FIO FCP, 1 iothread
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -0.67    |    0.00    |   -0.33    |   -1.00    |   -1.00    |   -0.67    |    3.33    |    2.00    |
> | CPU consumption avg |    0.67    |    2.00    |    1.67    |    3.00    |   -3.33    |   -0.33    |   -2.33    |    0.00    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
> FIO FCP, 8 iothreads
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -2.00    |   -1.33    |   -1.00    |    0.00    |   -0.33    |    1.00    |    1.00    |    0.67    |
> | CPU consumption avg |   -3.00    |   -1.00    |   -2.00    |    0.00    |  -10.33    |   -5.67    |   -6.67    |   -3.33    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
>
> FIO FICON, 1 iothread
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -1.67    |   -1.67    |   -0.33    |    0.00    |   -1.33    |   -1.00    |   -1.67    |   -2.33    |
> | CPU consumption avg |   -0.33    |    1.00    |    1.00    |    2.00    |   -2.33    |   -1.33    |   -1.33    |   -0.33    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
> FIO FICON, 8 iothreads
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> | Throughput avg      |   -1.00    |   -0.67    |   -0.33    |    0.33    |    0.67    |    0.33    |    0.33    |    0.33    |
> | CPU consumption avg |   -1.33    |    1.00    |    4.33    |    2.33    |   -2.00    |    0.00    |   -3.00    |   -1.33    |
> +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
>
>
> Regards,
> Jaehoon.
>
>
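As a quick check of the claim that w=3 brings throughput back to v9.1 levels, the -2% to +2% noise band stated above can be applied mechanically to the throughput rows of the comparison against qemu-9.1.0-15.el10. A small Python sketch with the values transcribed from those tables; treating the band as inclusive (|delta| <= 2 counts as noise) is an assumption.

```python
"""Flag throughput deltas versus qemu-9.1.0 that fall outside the noise band."""

WORKLOADS = ("Seq. Read", "Seq. Write", "Rand. Read", "Rand. Write")

# Throughput avg (%) versus qemu-9.1.0-15.el10, per (backend, iothreads),
# one value per workload in WORKLOADS order.
THROUGHPUT_VS_91 = {
    ("FCP", 1): {"w2": (-0.67, -0.33, -1.00, 3.33), "w3": (0.00, -1.00, -0.67, 2.00)},
    ("FCP", 8): {"w2": (-2.00, -1.00, -0.33, 1.00), "w3": (-1.33, 0.00, 1.00, 0.67)},
    ("FICON", 1): {"w2": (-1.67, -0.33, -1.33, -1.67), "w3": (-1.67, 0.00, -1.00, -2.33)},
    ("FICON", 8): {"w2": (-1.00, -0.33, 0.67, 0.33), "w3": (-0.67, 0.33, 0.33, 0.33)},
}

NOISE = 2.0  # assumption: |delta| <= 2.0 is treated as noise (inclusive band)


def outside_noise(weight: str) -> list[tuple[str, int, str, float]]:
    """Return (backend, iothreads, workload, delta) entries beyond the band."""
    hits = []
    for (backend, iothreads), runs in THROUGHPUT_VS_91.items():
        for workload, delta in zip(WORKLOADS, runs[weight]):
            if abs(delta) > NOISE:
                hits.append((backend, iothreads, workload, delta))
    return hits


if __name__ == "__main__":
    for w in ("w2", "w3"):
        print(w, outside_noise(w))
```

With these numbers each weight has exactly one throughput entry outside the band: w2 on FCP with 1 iothread, Rand. Write (+3.33, an improvement), and w3 on FICON with 1 iothread, Rand. Write (-2.33).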
Hello Stefan,

I’m following up to see whether you’ve had a chance to review the
performance results I shared. As I mentioned, we are completely fine with
using w=3 as the default value. It effectively restores performance to
QEMU v9.1 levels, and we will advise users who need further CPU savings
to use w=2.

I’ve updated the patch to set the default value to 3 and to incorporate
the feedback so far. Please let me know if you have any comments or if I
should post the new patch.

Regards,
Jaehoon



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-03-09 20:46       ` JAEHOON KIM
@ 2026-03-23 14:08         ` JAEHOON KIM
  2026-03-23 18:51           ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: JAEHOON KIM @ 2026-03-23 14:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

On 3/9/2026 3:46 PM, JAEHOON KIM wrote:
> Hello Stefan,
>
> I’m following up to see whether you’ve had a chance to review the
> performance results I shared. As I mentioned, we are completely fine
> with using w=3 as the default value. It effectively restores
> performance to QEMU v9.1 levels, and we will advise users who need
> further CPU savings to use w=2.
>
> I’ve updated the patch to set the default value to 3 and to incorporate
> the feedback so far. Please let me know if you have any comments or if
> I should post the new patch.
>
> Regards,
> Jaehoon
>
>
Hello,

I'd like to follow up on this thread. I have posted a v2 that
incorporates the feedback so far, and I'd really appreciate your
thoughts when you have a chance.

Thanks again for your time.

Regards,
Jaehoon.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
  2026-03-23 14:08         ` JAEHOON KIM
@ 2026-03-23 18:51           ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2026-03-23 18:51 UTC (permalink / raw)
  To: JAEHOON KIM
  Cc: qemu-devel, qemu-block, pbonzini, fam, armbru, eblake, berrange,
	eduardo, dave, sw

[-- Attachment #1: Type: text/plain, Size: 14999 bytes --]

On Mon, Mar 23, 2026 at 09:08:13AM -0500, JAEHOON KIM wrote:
> On 3/9/2026 3:46 PM, JAEHOON KIM wrote:
> > On 2/26/2026 12:03 AM, JAEHOON KIM wrote:
> > > Hello Stefan,
> > > 
> > > Thank you for your patience. I would like to share the changes in
> > > throughput and CPU consumption observed in our performance test
> > > environment.
> > > 
> > > We observed that with w=3, performance returns to a level comparable
> > > to QEMU v9.1, while w=2 results in slightly lower CPU consumption.
> > > 
> > > Our initial preference is to use w=2 by default. However, we fully
> > > understand your concerns about the potential performance drop.
> > > Should the patch be accepted, we would provide guidance on using the
> > > weight value as a configurable option.
> > > 
> > > For reference, we treat differences between -2% and +2% as noise in
> > > our analysis. I look forward to your feedback.
> > > 
> > > The tables below compare
> > >   - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1, Guest: RHEL 9.6 GA, vs.
> > >   - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1 (w=2, w=3), Guest: RHEL 9.6 GA
> > > for FIO on FCP and FICON with 1 iothread and 8 iothreads. The values
> > > (in %) are averages over numjobs 1, 4, and 8.
> > > 
> > > FIO FCP, 1 iothread
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -3.00    |   -2.33    |    0.00    |   -0.33    |   -3.00    |   -3.33    |    1.33    |   -0.33    |
> > > | CPU consumption avg |   -5.67    |   -4.33    |   -6.33    |   -5.33    |   -7.33    |   -5.33    |  -10.33    |   -8.67    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > FIO FCP, 8 iothreads
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -5.00    |   -4.00    |   -3.67    |   -3.33    |   -4.67    |   -4.00    |   -0.33    |   -0.67    |
> > > | CPU consumption avg |  -13.00    |  -10.67    |  -16.00    |  -14.33    |  -14.00    |   -9.33    |  -13.67    |  -11.00    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > 
> > > FIO FICON, 1 iothread
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -0.67    |   -0.67    |   -6.33    |   -6.67    |   -0.67    |    0.00    |    1.33    |    1.33    |
> > > | CPU consumption avg |   -7.67    |   -7.33    |  -13.67    |  -13.00    |   -9.00    |   -8.33    |   -5.00    |   -4.33    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > FIO FICON, 8 iothreads
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -3.00    |   -2.67    |   -7.33    |   -7.00    |   -0.67    |   -1.00    |    0.67    |    0.67    |
> > > | CPU consumption avg |  -16.33    |  -14.33    |  -25.33    |  -27.00    |   -8.67    |   -7.00    |   -6.67    |   -5.00    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > 
> > > The tables below compare
> > >   - Host: RHEL 10.0 GA + qemu-9.1.0-15.el10, Guest: RHEL 9.6 GA, vs.
> > >   - Host: RHEL 10.1 GA + qemu-10.0.0-14.el10_1 (w=2, w=3), Guest: RHEL 9.6 GA.
> > > 
> > > FIO FCP -1 iothread
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -0.67    |    0.00    |   -0.33    |   -1.00    |   -1.00    |   -0.67    |    3.33    |    2.00    |
> > > | CPU consumption avg |    0.67    |    2.00    |    1.67    |    3.00    |   -3.33    |   -0.33    |   -2.33    |    0.00    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > 
> > > FIO FCP -8 iothread
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -2.00    |   -1.33    |   -1.00    |    0.00    |   -0.33    |    1.00    |    1.00    |    0.67    |
> > > | CPU consumption avg |   -3.00    |   -1.00    |   -2.00    |    0.00    |  -10.33    |   -5.67    |   -6.67    |   -3.33    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > 
> > > 
> > > FIO FICON -1 iothread
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -1.67    |   -1.67    |   -0.33    |    0.00    |   -1.33    |   -1.00    |   -1.67    |   -2.33    |
> > > | CPU consumption avg |   -0.33    |    1.00    |    1.00    |    2.00    |   -2.33    |   -1.33    |   -1.33    |   -0.33    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > 
> > > FIO FICON -8 iothread
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > |                     | Seq. Read  | Seq. Read  | Seq. Write | Seq. Write | Rand. Read | Rand. Read | Rand. Write| Rand. Write|
> > > |                     |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |   w2 (%)   |   w3 (%)   |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > | Throughput avg      |   -1.00    |   -0.67    |   -0.33    |    0.33    |    0.67    |    0.33    |    0.33    |    0.33    |
> > > | CPU consumption avg |   -1.33    |    1.00    |    4.33    |    2.33    |   -2.00    |    0.00    |   -3.00    |   -1.33    |
> > > +---------------------+------------+------------+------------+------------+------------+------------+------------+------------+
> > > 
> > > 
> > > 
> > > Regards,
> > > Jaehoon.
> > > 
> > > 
> > Hello Stefan,
> > 
> > I’m following up to see whether you’ve had a chance to review the
> > performance results I shared.
> > As I mentioned, we are completely fine with using w=3 as the default
> > value. It effectively restores performance to QEMU v9.1 levels, and we
> > will advise users who need further CPU savings to use w=2.
> >
> > I’ve updated the patch to set the default value to 3 and to incorporate
> > the feedback so far.
> > Please let me know if you have any comments, or whether I should post the
> > new patch.
> > 
> > Regards,
> > Jaehoon
> > 
> > 
> Hello,
> 
> I'd like to follow up on this thread. I have posted a v2 incorporating the
> feedback so far. I'd really appreciate your thoughts when you have a chance.
> 
> Thanks again for your time.

Hi Jaehoon,
Sorry that I haven't replied yet. I will review your emails and the v2
patches tomorrow.

Stefan

end of thread, other threads:[~2026-03-23 18:52 UTC | newest]

Thread overview: 30+ messages
2026-01-13 17:48 [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Jaehoon Kim
2026-01-13 17:48 ` [PATCH RFC v1 1/3] aio-poll: avoid unnecessary polling time computation Jaehoon Kim
2026-02-16 14:58   ` Stefan Hajnoczi
2026-02-16 15:21   ` Stefan Hajnoczi
2026-02-16 20:47     ` JAEHOON KIM
2026-02-17 13:16       ` Stefan Hajnoczi
2026-02-18 13:43         ` JAEHOON KIM
2026-01-13 17:48 ` [PATCH RFC v1 2/3] aio-poll: refine iothread polling using weighted handler intervals Jaehoon Kim
2026-01-13 17:48 ` [PATCH RFC v1 3/3] qapi/iothread: introduce poll-weight parameter for aio-poll Jaehoon Kim
2026-01-14  7:48   ` Markus Armbruster
2026-01-15  5:14     ` JAEHOON KIM
2026-01-15  7:28       ` Markus Armbruster
2026-01-15 10:05         ` Halil Pasic
2026-01-15 16:00           ` JAEHOON KIM
2026-01-16  8:19           ` Markus Armbruster
2026-01-19 18:16 ` [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency Stefan Hajnoczi
2026-01-23 19:15   ` JAEHOON KIM
2026-01-27 21:11     ` Stefan Hajnoczi
2026-02-03 21:12     ` Stefan Hajnoczi
2026-02-06  6:50       ` JAEHOON KIM
2026-02-12 18:53         ` Stefan Hajnoczi
2026-02-13 15:13           ` JAEHOON KIM
2026-02-16 12:42             ` Stefan Hajnoczi
2026-02-19 22:27 ` Stefan Hajnoczi
2026-02-20 19:00   ` JAEHOON KIM
2026-02-24  4:24     ` Stefan Hajnoczi
2026-02-26  6:03     ` JAEHOON KIM
2026-03-09 20:46       ` JAEHOON KIM
2026-03-23 14:08         ` JAEHOON KIM
2026-03-23 18:51           ` Stefan Hajnoczi
