Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency


From: Stefan Hajnoczi <stefanha@redhat.com>
To: JAEHOON KIM <jhkim@linux.ibm.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	pbonzini@redhat.com, fam@euphon.net, armbru@redhat.com,
	eblake@redhat.com, berrange@redhat.com, eduardo@habkost.net,
	dave@treblig.org, sw@weilnetz.de
Subject: Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
Date: Thu, 12 Feb 2026 13:53:21 -0500
Message-ID: <20260212185321.GA257116@fedora>
In-Reply-To: <b507e8c4-47c8-44fb-a7fc-6239e029cd56@linux.ibm.com>


On Fri, Feb 06, 2026 at 12:50:38AM -0600, JAEHOON KIM wrote:
> On 2/3/2026 3:12 PM, Stefan Hajnoczi wrote:
> > On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> > > On 1/19/2026 12:16 PM, Stefan Hajnoczi wrote:
> > > > On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
> > > > > We evaluated the patches on an s390x host with a single guest using 16
> > > > > virtio block devices backed by FCP multipath devices in a separate-disk
> > > > > setup, with the I/O scheduler set to 'none' in both host and guest.
> > > > > 
> > > > > The fio workload included sequential and random read/write with varying
> > > > > numbers of jobs (1,4,8,16) and io_depth of 8. The tests were conducted
> > > > > with single and dual iothreads, using the newly introduced poll-weight
> > > > > parameter to measure their impact on CPU cost and throughput.
> > > > > 
> > > > > Compared to the baseline, across four FIO workload patterns (sequential
> > > > > R/W, random R/W), and averaged over FIO job counts of 1, 4, 8, and 16,
> > > > > throughput decreased slightly (-3% to -8% for one iothread, -2% to -5%
> > > > > for two iothreads), while CPU usage on the s390x host dropped
> > > > > significantly (-10% to -25% and -7% to -12%, respectively).
> > > > Hi Jaehoon,
> > > > I would like to run the same fio benchmarks on a local NVMe drive (<10us
> > > > request latency) to see how that type of hardware configuration is
> > > > affected. Are the scripts and fio job files available somewhere?
> > > > 
> > > > Thanks,
> > > > Stefan
> > > Thank you for your reply.
> > > The fio scripts are not available in a location you can access, but
> > > there is nothing particularly special in the settings.
> > > I’m sharing below the methodology and test setup used by our
> > > performance team.
> > > 
> > > Guest Setup
> > > ----------------------
> > > - 12 vCPUs, 4 GiB memory
> > > - 16 virtio disks based on the FCP multipath devices in the host
> > > 
> > > FIO test parameters
> > > -----------------------
> > > - FIO Version: fio-3.33
> > > - Filesize: 2G
> > > - Blocksize: 8K / 128K
> > > - Direct I/O: 1
> > > - FIO I/O Engine: libaio
> > > - NUMJOB List: 1, 4, 8, 16
> > > - IODEPTH: 8
> > > - Runtime (s): 150
> > > 
> > > Two FIO samples for random read
> > > --------------------------------
> > > fio --direct=1 --name=test --numjobs=16 --filename=base.0.0:base.1.0:base.2.0:base.3.0:base.4.0:base.5.0:base.6.0:base.7.0:base.8.0:base.9.0:base.10.0:base.11.0:base.12.0:base.13.0:base.14.0:base.15.0 --size=32G  --time_based --runtime=4m --readwrite=randread --ioengine=libaio --iodepth=8 --bs=8k
> > > fio --direct=1 --name=test --numjobs=4  --filename=subw1/base.0.0:subw4/base.3.0:subw8/base.7.0:subw12/base.11.0:subw16/base.15.0                                                                        --size=8G   --time_based --runtime=4m --readwrite=randread --ioengine=libaio --iodepth=8 --bs=8k
> > > 
> > > 
> > > Additional notes
> > > ----------------
> > > - Each file is placed on a separate disk device mounted under subw<n>, as specified in --filename=....
> > > - We execute one warmup run, then two measurement runs, and calculate the average.
> > Hi Jaehoon,
> > I ran fio benchmarks on an Intel Optane SSD DC P4800X Series drive (<10
> > microsecond latency). This is with just 1 drive.
> > 
> > The 8 KiB block size results show something similar to what you
> > reported: there are IOPS (or throughput) regressions and CPU utilization
> > improvements.
> > 
> > Although the CPU improvements are welcome, I think the default behavior
> > should only be changed if the IOPS regressions can be brought below 5%.
> > 
> > The regressions seem to happen regardless of whether 1 or 2 IOThreads
> > are configured. CPU utilization is different (98% vs 78%) depending on
> > the number of IOThreads, so the regressions happen across a range of CPU
> > utilizations.
> > 
> > The 128 KiB block size results are not interesting because the drive
> > already saturates at numjobs=1. This is expected since the drive cannot
> > go much above ~2 GiB/s throughput.
> > 
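As a quick sanity check on the saturation remark above, the 128 KiB IOPS figures from the table can be converted to throughput. This is only a sketch: it assumes "128k" means 128 KiB and that the iops column is a per-second average, and the helper name is made up for illustration.

```python
KIB = 1024
GIB = 1024 ** 3


def throughput_gib_per_s(iops, block_size_kib):
    """Convert an IOPS figure at a fixed block size to GiB/s."""
    return iops * block_size_kib * KIB / GIB


# randread 128k, numjobs=1, iothreads=1 (value taken from the IOPS table)
randread_128k = throughput_gib_per_s(17616, 128)
# randwrite 128k, numjobs=1, iothreads=1
randwrite_128k = throughput_gib_per_s(16009, 128)

print(f"randread  128k: {randread_128k:.2f} GiB/s")
print(f"randwrite 128k: {randwrite_128k:.2f} GiB/s")
```

Both values land close to 2 GiB/s at numjobs=1, which is consistent with the drive already being saturated at that block size.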
> > You can find the Ansible playbook, libvirt domain XML, fio
> > command-lines, and the fio/sar data here:
> > 
> > https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency
> > 
> > Please let me know if you'd like me to rerun the benchmark with new
> > patches or a configuration change.
> > 
> > Do you want to have a video call to discuss your work and how to get the
> > patches merged?
> > 
> > Host
> > ----
> > CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
> > RAM: 32 GiB
> > 
> > Guest
> > -----
> > vCPUs: 8
> > RAM: 4 GiB
> > Disk: 1 virtio-blk aio=native cache=none
> > 
> > IOPS
> > ----
> > rw        bs   numjobs iothreads iops   diff
> > randread  8k   1       1         163417 -7.8%
> > randread  8k   1       2         165041 -2.4%
> > randread  8k   4       1         221508 -0.64%
> > randread  8k   4       2         251298 0.008%
> > randread  8k   8       1         222128 -0.51%
> > randread  8k   8       2         249489 -2.6%
> > randread  8k   16      1         230535 -0.18%
> > randread  8k   16      2         246732 -0.22%
> > randread  128k 1       1          17616 -0.11%
> > randread  128k 1       2          17678 0.027%
> > randread  128k 4       1          17536 -0.27%
> > randread  128k 4       2          17610 -0.031%
> > randread  128k 8       1          17369 -0.42%
> > randread  128k 8       2          17433 -0.071%
> > randread  128k 16      1          17215 -0.61%
> > randread  128k 16      2          17269 -0.22%
> > randwrite 8k   1       1         156597 -3.1%
> > randwrite 8k   1       2         157720 -3.8%
> > randwrite 8k   4       1         218448 -0.5%
> > randwrite 8k   4       2         247075 -5.1%
> > randwrite 8k   8       1         220866 -0.75%
> > randwrite 8k   8       2         260935 -0.011%
> > randwrite 8k   16      1         230913 0.23%
> > randwrite 8k   16      2         261125 -0.01%
> > randwrite 128k 1       1          16009 0.094%
> > randwrite 128k 1       2          16070 0.035%
> > randwrite 128k 4       1          16073 -0.62%
> > randwrite 128k 4       2          16131 0.059%
> > randwrite 128k 8       1          16106 0.092%
> > randwrite 128k 8       2          16153 0.048%
> > randwrite 128k 16      1          16102 -0.0091%
> > randwrite 128k 16      2          16160 0.048%
> > 
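To see which 8 KiB configurations fall outside the 5% limit mentioned earlier, the diff column can be filtered as below. This is a minimal sketch: the rows are copied from the table above, and the function name and tuple layout are hypothetical, not part of any benchmark tooling.

```python
# (rw, numjobs, iothreads, diff in percent) for the bs=8k rows of the table
ROWS_8K = [
    ("randread", 1, 1, -7.8), ("randread", 1, 2, -2.4),
    ("randread", 4, 1, -0.64), ("randread", 4, 2, 0.008),
    ("randread", 8, 1, -0.51), ("randread", 8, 2, -2.6),
    ("randread", 16, 1, -0.18), ("randread", 16, 2, -0.22),
    ("randwrite", 1, 1, -3.1), ("randwrite", 1, 2, -3.8),
    ("randwrite", 4, 1, -0.5), ("randwrite", 4, 2, -5.1),
    ("randwrite", 8, 1, -0.75), ("randwrite", 8, 2, -0.011),
    ("randwrite", 16, 1, 0.23), ("randwrite", 16, 2, -0.01),
]


def regressions_beyond(rows, threshold_pct=5.0):
    """Return the rows whose IOPS dropped by more than threshold_pct."""
    return [row for row in rows if row[3] < -threshold_pct]


over_limit = regressions_beyond(ROWS_8K)
for rw, numjobs, iothreads, diff in over_limit:
    print(f"{rw} numjobs={numjobs} iothreads={iothreads}: {diff}%")
```

With these numbers, two configurations exceed the limit: randread with numjobs=1 and 1 IOThread, and randwrite with numjobs=4 and 2 IOThreads.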
> > IOThread CPU usage
> > ------------------
> > iothreads before  after
> > 1         98.7    95.81
> > 2         78.43   66.13
> > 
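The CPU table gives absolute values, so for comparison with the relative IOPS diffs it may help to express the change relative to the "before" column. A small sketch, assuming both columns are IOThread CPU utilization in percent (the helper name is made up):

```python
def relative_change_pct(before, after):
    """Change of `after` relative to `before`, in percent."""
    return (after - before) / before * 100.0


# Values from the IOThread CPU usage table: iothreads -> (before, after)
CPU_USAGE = {1: (98.7, 95.81), 2: (78.43, 66.13)}

cpu_change = {n: relative_change_pct(b, a) for n, (b, a) in CPU_USAGE.items()}
for n, change in cpu_change.items():
    print(f"iothreads={n}: {change:.1f}%")
```

Under that assumption the relative reduction is roughly 3% with one IOThread and roughly 16% with two.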
> > Stefan
> 
> Hello Stefan,
> 
> Thank you very much for your effort in running these benchmarks.
> The results show a pattern very similar to what our performance team
> observed.
> 
> I fully agree with the 5% threshold for the default behavior.
> However, we need an approach that balances the current performance
> oriented polling scheme with CPU efficiency.
> 
> I found that relying on grow/shrink parameters was too limited to
> achieve these results. This is why I've adjusted the process using a
> weight-based grow/shrink approach to ensure the polling window remains
> robust against jitter. Specifically, it avoids abrupt resets to zero
> by implementing a gradual shrink rather than an immediate reset, even
> when device latency exceeds the threshold.
> 
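The difference between an immediate reset and a gradual shrink can be illustrated with a toy model. This is not the algorithm from the patches: the function names, the `weight` argument, and the shift-based decay are assumptions chosen only to show the idea, and `weight` here does not necessarily match the semantics of the poll-weight parameter.

```python
def reset_on_overrun(poll_ns, block_ns, max_ns):
    """Toy model of an abrupt reset: drop the window to zero on any overrun."""
    return 0 if block_ns > max_ns else poll_ns


def weighted_shrink(poll_ns, block_ns, max_ns, weight=3):
    """Hypothetical gradual shrink: give up 1/2**weight of the window per overrun."""
    if block_ns > max_ns:
        return poll_ns - (poll_ns >> weight)
    return poll_ns


def run(update, samples, poll_ns=32000, max_ns=32768):
    """Feed a sequence of observed blocking times (ns) through an update rule."""
    for block_ns in samples:
        poll_ns = update(poll_ns, block_ns, max_ns)
    return poll_ns


# One latency spike (50 us) between two fast completions (10 us).
jitter = [10000, 50000, 10000]
after_reset = run(reset_on_overrun, jitter)
after_weighted = run(weighted_shrink, jitter)
print(after_reset, after_weighted)
```

In this toy model a single spike discards the whole window under the reset rule, while the weighted rule keeps most of it, so the next fast completions can still be caught by polling.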
> As seen in both your results and our team's measurements, this may lead
> to a bit of a performance trade-off, but it provides a reasonable
> balance for CPU-sensitive environments.
> 
> Thank you for suggesting the video call and I am also looking forward to
> hearing your thoughts. I'm on US Central Time. Except for Tuesday, I can
> adjust my schedule to a time that works for you.
> 
> Please let me know your preferred time.

Is Monday, February 16th at 10:00am CST good for you? If not, please
feel free to pick any time on Monday.

Meeting link: https://meet.jit.si/AioPollingOptimization

Anyone else interested in this topic is welcome to join.

Thanks,
Stefan
