Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency

From: Stefan Hajnoczi <stefanha@redhat.com>
To: JAEHOON KIM <jhkim@linux.ibm.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	pbonzini@redhat.com, fam@euphon.net, armbru@redhat.com,
	eblake@redhat.com, berrange@redhat.com, eduardo@habkost.net,
	dave@treblig.org, sw@weilnetz.de
Subject: Re: [PATCH RFC v1 0/3] aio-poll: improve aio-polling efficiency
Date: Tue, 3 Feb 2026 16:12:22 -0500
Message-ID: <20260203211222.GB449076@fedora>
In-Reply-To: <eb9693ae-6aae-4599-8c8c-22268357d2c0@linux.ibm.com>

On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> On 1/19/2026 12:16 PM, Stefan Hajnoczi wrote:
> > On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
> > > We evaluated the patches on an s390x host with a single guest using 16
> > > virtio block devices backed by FCP multipath devices in a separate-disk
> > > setup, with the I/O scheduler set to 'none' in both host and guest.
> > > 
> > > The fio workload included sequential and random read/write with varying
> > > numbers of jobs (1,4,8,16) and io_depth of 8. The tests were conducted
> > > with single and dual iothreads, using the newly introduced poll-weight
> > > parameter to measure their impact on CPU cost and throughput.
> > > 
> > > Compared to the baseline, across four FIO workload patterns (sequential
> > > R/W, random R/W), and averaged over FIO job counts of 1, 4, 8, and 16,
> > > throughput decreased slightly (-3% to -8% for one iothread, -2% to -5%
> > > for two iothreads), while CPU usage on the s390x host dropped
> > > significantly (-10% to -25% and -7% to -12%, respectively).
> > Hi Jaehoon,
> > I would like to run the same fio benchmarks on a local NVMe drive (<10us
> > request latency) to see how that type of hardware configuration is
> > affected. Are the scripts and fio job files available somewhere?
> > 
> > Thanks,
> > Stefan
> 
> Thank you for your reply.
> The fio scripts are not available in a location you can access, but there is nothing particularly special in the settings.
> I’m sharing below the methodology and test setup used by our performance team.
> 
> Guest Setup
> ----------------------
> - 12 vCPUs, 4 GiB memory
> - 16 virtio disks based on the FCP multipath devices in the host
> 
> FIO test parameters
> -----------------------
> - FIO Version: fio-3.33
> - Filesize: 2G
> - Blocksize: 8K / 128K
> - Direct I/O: 1
> - FIO I/O Engine: libaio
> - NUMJOB List: 1, 4, 8, 16
> - IODEPTH: 8
> - Runtime (s): 150
> 
> Two FIO samples for random read
> --------------------------------
> fio --direct=1 --name=test --numjobs=16 --filename=base.0.0:base.1.0:base.2.0:base.3.0:base.4.0:base.5.0:base.6.0:base.7.0:base.8.0:base.9.0:base.10.0:base.11.0:base.12.0:base.13.0:base.14.0:base.15.0 --size=32G --time_based --runtime=4m --readwrite=randread --ioengine=libaio --iodepth=8 --bs=8k
> fio --direct=1 --name=test --numjobs=4 --filename=subw1/base.0.0:subw4/base.3.0:subw8/base.7.0:subw12/base.11.0:subw16/base.15.0 --size=8G --time_based --runtime=4m --readwrite=randread --ioengine=libaio --iodepth=8 --bs=8k
> 
> 
> Additional notes
> ----------------
> - Each file is placed on a separate disk device mounted under subw<n> as specified in --filename=....
> - We execute one warmup run, then two measurement runs and calculate the average

Hi Jaehoon,
I ran fio benchmarks on a single Intel Optane SSD DC P4800X Series drive
(<10 microsecond latency).

The 8 KiB block size results show something similar to what you
reported: there are IOPS (or throughput) regressions and CPU utilization
improvements.

Although the CPU improvements are welcome, I think the default behavior
should only be changed if the IOPS regressions can be brought below 5%.

The regressions seem to happen regardless of whether 1 or 2 IOThreads
are configured. IOThread CPU utilization differs with the number of
IOThreads (about 98% with 1 and 78% with 2), so the regressions happen
across a range of CPU utilizations.

The 128 KiB block size results are not interesting because the drive
already saturates at numjobs=1. This is expected since the drive cannot
go much above ~2 GiB/s throughput.
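
As a rough cross-check of the saturation point, the 128 KiB IOPS figures
from the table below can be converted to throughput. This is only a
sketch: the helper name is made up for illustration, and it assumes
bs=128k means 128 KiB per request.

```python
KIB = 1024
GIB = 1024 ** 3


def iops_to_gib_per_s(iops, block_size_kib):
    """Convert an IOPS figure to GiB/s for a fixed block size in KiB."""
    return iops * block_size_kib * KIB / GIB


# numjobs=1 and numjobs=16 rows (randread, 128k, 1 iothread) from the IOPS table.
samples = {1: 17616, 16: 17215}

for numjobs, iops in samples.items():
    print(f"numjobs={numjobs}: {iops_to_gib_per_s(iops, 128):.2f} GiB/s")
```

Both job counts land at roughly 2.1 GiB/s, which is why adding jobs does
not change the picture at this block size.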

You can find the Ansible playbook, libvirt domain XML, fio
command-lines, and the fio/sar data here:

https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency

Please let me know if you'd like me to rerun the benchmark with new
patches or a configuration change.

Do you want to have a video call to discuss your work and how to get the
patches merged?

Host
----
CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
RAM: 32 GiB

Guest
-----
vCPUs: 8
RAM: 4 GiB
Disk: 1 virtio-blk aio=native cache=none

IOPS
----
rw        bs   numjobs iothreads iops   diff
randread  8k   1       1         163417 -7.8%
randread  8k   1       2         165041 -2.4%
randread  8k   4       1         221508 -0.64%
randread  8k   4       2         251298 0.008%
randread  8k   8       1         222128 -0.51%
randread  8k   8       2         249489 -2.6%
randread  8k   16      1         230535 -0.18%
randread  8k   16      2         246732 -0.22%
randread  128k 1       1          17616 -0.11%
randread  128k 1       2          17678 0.027%
randread  128k 4       1          17536 -0.27%
randread  128k 4       2          17610 -0.031%
randread  128k 8       1          17369 -0.42%
randread  128k 8       2          17433 -0.071%
randread  128k 16      1          17215 -0.61%
randread  128k 16      2          17269 -0.22%
randwrite 8k   1       1         156597 -3.1%
randwrite 8k   1       2         157720 -3.8%
randwrite 8k   4       1         218448 -0.5%
randwrite 8k   4       2         247075 -5.1%
randwrite 8k   8       1         220866 -0.75%
randwrite 8k   8       2         260935 -0.011%
randwrite 8k   16      1         230913 0.23%
randwrite 8k   16      2         261125 -0.01%
randwrite 128k 1       1          16009 0.094%
randwrite 128k 1       2          16070 0.035%
randwrite 128k 4       1          16073 -0.62%
randwrite 128k 4       2          16131 0.059%
randwrite 128k 8       1          16106 0.092%
randwrite 128k 8       2          16153 0.048%
randwrite 128k 16      1          16102 -0.0091%
randwrite 128k 16      2          16160 0.048%
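
To make the 5% criterion mentioned above concrete, here is a small sketch
that flags the 8k rows whose regression is 5% or worse. The rows are
copied by hand from the table; the variable names and the choice to treat
"diff <= -5.0" as failing are my assumptions, not part of the benchmark
tooling.

```python
# (rw, numjobs, iothreads, diff in percent) for the bs=8k rows above.
ROWS_8K = [
    ("randread", 1, 1, -7.8), ("randread", 1, 2, -2.4),
    ("randread", 4, 1, -0.64), ("randread", 4, 2, 0.008),
    ("randread", 8, 1, -0.51), ("randread", 8, 2, -2.6),
    ("randread", 16, 1, -0.18), ("randread", 16, 2, -0.22),
    ("randwrite", 1, 1, -3.1), ("randwrite", 1, 2, -3.8),
    ("randwrite", 4, 1, -0.5), ("randwrite", 4, 2, -5.1),
    ("randwrite", 8, 1, -0.75), ("randwrite", 8, 2, -0.011),
    ("randwrite", 16, 1, 0.23), ("randwrite", 16, 2, -0.01),
]

THRESHOLD_PCT = 5.0  # regressions must stay below this magnitude

# Keep only rows where IOPS dropped by the threshold or more.
over_threshold = [row for row in ROWS_8K if row[3] <= -THRESHOLD_PCT]

for rw, numjobs, iothreads, diff in over_threshold:
    print(f"{rw} numjobs={numjobs} iothreads={iothreads}: {diff}%")
```

With this data only two configurations are flagged: randread with
numjobs=1 and 1 IOThread, and randwrite with numjobs=4 and 2 IOThreads.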

IOThread CPU usage
------------------
iothreads before (%) after (%)
1         98.7       95.81
2         78.43      66.13
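
For comparison with the relative CPU savings quoted earlier in the
thread, the before/after values can be turned into a relative change.
This is a sketch with a hypothetical helper name; it assumes the table
values are CPU utilization percentages.

```python
def relative_change_pct(before, after):
    """Relative change from before to after, in percent of before."""
    return (after - before) / before * 100.0


# iothreads -> (before, after), taken from the table above.
cpu_usage = {1: (98.7, 95.81), 2: (78.43, 66.13)}

for iothreads, (before, after) in cpu_usage.items():
    change = relative_change_pct(before, after)
    print(f"iothreads={iothreads}: {change:.1f}%")
```

That works out to a relative drop of roughly 3% with 1 IOThread and
roughly 16% with 2 IOThreads.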

Stefan
