* virtio-blk using a single iothread
@ 2023-06-08  7:40 Sagi Grimberg
  2023-06-08 16:08 ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2023-06-08  7:40 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi; +Cc: Qemu Developers

Hey Stefan, Paolo,

I just had a report from a user experiencing lower virtio-blk
performance than expected. The user is running virtio-blk on top of an
nvme-tcp device, and the guest has 12 CPU cores.

The guest read/write throughput is capped at around 30% of the
throughput available from the host (~800MB/s from the guest vs.
2800MB/s from the host over a 25Gb/s NIC). The workload running on the
guest is multi-threaded fio.

What we observe is that virtio-blk uses a single disk-wide iothread to
process all the vqs. nvme-tcp in particular (like other TCP-based
protocols) is negatively impacted by the lack of thread concurrency
that could distribute I/O requests across different TCP connections.

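(For context, the host-side nvme-tcp setup typically spreads I/O over
several queues, each backed by its own TCP connection; the commands
below are illustrative, not the user's exact configuration:)

  # Host: connect with 8 I/O queues, i.e. 8 TCP connections that could
  # be driven concurrently if the submitting side had the threads for it.
  nvme connect -t tcp -a 192.168.0.10 -s 4420 \
      -n nqn.2023-06.io.example:subsys1 --nr-io-queues=8

  # Each queue shows up as a separate TCP connection on the host:
  ss -tn 'dport = :4420'
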
We also attempted to move the iothread to a dedicated core, but that
did not yield any meaningful performance improvement. The reason
appears to be less about CPU utilization on the iothread core and more
about serialization over a single TCP connection.

Moving to io=threads does increase the throughput, but it sacrifices
latency significantly.

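(For reference, that is the disk AIO mode selection: io='threads' in
libvirt, aio=threads vs. aio=native on the QEMU drive options. An
illustrative drive line, with a made-up device path:)

  -drive if=none,id=drive0,format=raw,cache=none,aio=threads,file=/dev/nvme0n1
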
So the user finds themselves with available host CPUs and TCP
connections that could easily be used to reach maximum throughput, but
no way to leverage them. True, other guests will use different
threads/contexts, but the goal here is to get the full performance from
a single device.

I've seen several past discussions and attempts to let a virtio-blk
device leverage multiple iothreads, but those discussions stalled
around two years ago. So I wanted to ask: are there any plans, or
anything in the works, to address this limitation?

I've seen that the spdk folks are heading in this direction with their
vhost-blk implementation:
https://review.spdk.io/gerrit/c/spdk/spdk/+/16068

Thanks,


* Re: virtio-blk using a single iothread
  2023-06-08  7:40 virtio-blk using a single iothread Sagi Grimberg
@ 2023-06-08 16:08 ` Stefan Hajnoczi
  2023-06-11 12:27   ` Sagi Grimberg
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2023-06-08 16:08 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Paolo Bonzini, Qemu Developers

On Thu, Jun 08, 2023 at 10:40:57AM +0300, Sagi Grimberg wrote:
> Hey Stefan, Paolo,
> 
> I just had a report from a user experiencing lower virtio-blk
> performance than expected. The user is running virtio-blk on top of an
> nvme-tcp device, and the guest has 12 CPU cores.
>
> The guest read/write throughput is capped at around 30% of the
> throughput available from the host (~800MB/s from the guest vs.
> 2800MB/s from the host over a 25Gb/s NIC). The workload running on the
> guest is multi-threaded fio.
>
> What we observe is that virtio-blk uses a single disk-wide iothread to
> process all the vqs. nvme-tcp in particular (like other TCP-based
> protocols) is negatively impacted by the lack of thread concurrency
> that could distribute I/O requests across different TCP connections.
>
> We also attempted to move the iothread to a dedicated core, but that
> did not yield any meaningful performance improvement. The reason
> appears to be less about CPU utilization on the iothread core and more
> about serialization over a single TCP connection.
>
> Moving to io=threads does increase the throughput, but it sacrifices
> latency significantly.
>
> So the user finds themselves with available host CPUs and TCP
> connections that could easily be used to reach maximum throughput, but
> no way to leverage them. True, other guests will use different
> threads/contexts, but the goal here is to get the full performance from
> a single device.
>
> I've seen several past discussions and attempts to let a virtio-blk
> device leverage multiple iothreads, but those discussions stalled
> around two years ago. So I wanted to ask: are there any plans, or
> anything in the works, to address this limitation?
> 
> I've seen that the spdk folks are heading in this direction with their
> vhost-blk implementation:
> https://review.spdk.io/gerrit/c/spdk/spdk/+/16068

Hi Sagi,
Yes, there is an ongoing QEMU multi-queue block layer effort to make it
possible for multiple IOThreads to process disk I/O for the same
--blockdev in parallel.

Most of my recent QEMU patches have been part of this effort. There is a
work-in-progress branch that supports mapping virtio-blk virtqueues to
specific IOThreads:
https://gitlab.com/stefanha/qemu/-/commits/virtio-blk-iothread-vq-mapping

The syntax is:

  --device '{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"}],"drive":"drive0"}'

This says "assign virtqueues round-robin to iothread0 and iothread1".
Half the virtqueues will be processed by iothread0 and the other half by
iothread1. There is also syntax for assigning specific virtqueues to
each IOThread, but usually the automatic round-robin assignment is all
that's needed.

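(For completeness, a fuller sketch of an invocation: the -object lines
are standard QEMU syntax, while the exact name of the per-virtqueue key
("vqs" below) is illustrative and should be checked against the branch:)

  -object iothread,id=iothread0 -object iothread,id=iothread1 \
  --device '{"driver":"virtio-blk-pci","drive":"drive0","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"}]}'

  # explicit assignment of specific virtqueues to each IOThread
  --device '{"driver":"virtio-blk-pci","drive":"drive0","iothread-vq-mapping":[{"iothread":"iothread0","vqs":[0,1]},{"iothread":"iothread1","vqs":[2,3]}]}'
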
This work is not finished yet. Basic I/O (e.g. fio) works without
crashes, but expect to hit issues if you use blockjobs, hotplug, etc.

Performance optimization work has just begun, so it won't deliver all
the benefits yet. I ran a benchmark yesterday where going from 1 to 2
IOThreads increased performance by 25%. That's much less than we're
aiming for; attaching two independent virtio-blk devices improves the
performance by ~100%. I know we can get there eventually. Some of the
bottlenecks are known (e.g. block statistics collection causes lock
contention) and others are yet to be investigated.

The Ansible playbook, libvirt XML, fio jobs, etc for the benchmark are
available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/8379665537c47c0901f426f0b9333ade8236ac3b

You are welcome to give the QEMU patches a try. I will be away next week
to attend KVM Forum, so I may not respond to emails quickly but am
interested in what you find.

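(If you want to try it, the branch builds like any other QEMU tree:)

  git clone https://gitlab.com/stefanha/qemu.git
  cd qemu
  git checkout virtio-blk-iothread-vq-mapping
  ./configure --target-list=x86_64-softmmu
  make -j$(nproc)
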
Stefan


* Re: virtio-blk using a single iothread
  2023-06-08 16:08 ` Stefan Hajnoczi
@ 2023-06-11 12:27   ` Sagi Grimberg
  2023-06-21 12:23     ` Stefan Hajnoczi
                       ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Sagi Grimberg @ 2023-06-11 12:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Qemu Developers



On 6/8/23 19:08, Stefan Hajnoczi wrote:
> On Thu, Jun 08, 2023 at 10:40:57AM +0300, Sagi Grimberg wrote:
>> Hey Stefan, Paolo,
>>
>> I just had a report from a user experiencing lower virtio-blk
>> performance than expected. The user is running virtio-blk on top of an
>> nvme-tcp device, and the guest has 12 CPU cores.
>>
>> The guest read/write throughput is capped at around 30% of the
>> throughput available from the host (~800MB/s from the guest vs.
>> 2800MB/s from the host over a 25Gb/s NIC). The workload running on the
>> guest is multi-threaded fio.
>>
>> What we observe is that virtio-blk uses a single disk-wide iothread to
>> process all the vqs. nvme-tcp in particular (like other TCP-based
>> protocols) is negatively impacted by the lack of thread concurrency
>> that could distribute I/O requests across different TCP connections.
>>
>> We also attempted to move the iothread to a dedicated core, but that
>> did not yield any meaningful performance improvement. The reason
>> appears to be less about CPU utilization on the iothread core and more
>> about serialization over a single TCP connection.
>>
>> Moving to io=threads does increase the throughput, but it sacrifices
>> latency significantly.
>>
>> So the user finds themselves with available host CPUs and TCP
>> connections that could easily be used to reach maximum throughput, but
>> no way to leverage them. True, other guests will use different
>> threads/contexts, but the goal here is to get the full performance from
>> a single device.
>>
>> I've seen several past discussions and attempts to let a virtio-blk
>> device leverage multiple iothreads, but those discussions stalled
>> around two years ago. So I wanted to ask: are there any plans, or
>> anything in the works, to address this limitation?
>>
>> I've seen that the spdk folks are heading in this direction with their
>> vhost-blk implementation:
>> https://review.spdk.io/gerrit/c/spdk/spdk/+/16068
> 
> Hi Sagi,
> Yes, there is an ongoing QEMU multi-queue block layer effort to make it
> possible for multiple IOThreads to process disk I/O for the same
> --blockdev in parallel.

Great to know.

> Most of my recent QEMU patches have been part of this effort. There is a
> work-in-progress branch that supports mapping virtio-blk virtqueues to
> specific IOThreads:
> https://gitlab.com/stefanha/qemu/-/commits/virtio-blk-iothread-vq-mapping

Thanks for the pointer.

> The syntax is:
> 
>    --device '{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"}],"drive":"drive0"}'
> 
> This says "assign virtqueues round-robin to iothread0 and iothread1".
> Half the virtqueues will be processed by iothread0 and the other half by
> iothread1. There is also syntax for assigning specific virtqueues to
> each IOThread, but usually the automatic round-robin assignment is all
> that's needed.
> 
> This work is not finished yet. Basic I/O (e.g. fio) works without
> crashes, but expect to hit issues if you use blockjobs, hotplug, etc.
> 
> Performance optimization work has just begun, so it won't deliver all
> the benefits yet. I ran a benchmark yesterday where going from 1 to 2
> IOThreads increased performance by 25%. That's much less than we're
> aiming for; attaching two independent virtio-blk devices improves the
> performance by ~100%. I know we can get there eventually. Some of the
> bottlenecks are known (e.g. block statistics collection causes lock
> contention) and others are yet to be investigated.

Hmm, I rebased this branch on top of mainline master and ran a naive
test, and it seems that performance regressed quite a bit :(

I'm running this test on my laptop (Intel(R) Core(TM) i7-8650U CPU
@1.90GHz), so this is more of a qualitative test for BW only.
I use null_blk as the host device.

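(For reference, the null_blk instance can be created along these lines;
the exact module parameters used aren't shown here, so these values are
illustrative:)

  modprobe null_blk nr_devices=1 queue_mode=2 submit_queues=4 bs=4096
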
With mainline master I get ~9GB/s for 64k randread, while with your
branch I get ~5GB/s, regardless of whether I assign iothreads (one or
two) or not.

my qemu command:
taskset -c 0-3 build/qemu-system-x86_64 -cpu host -m 1G -enable-kvm -smp 4 \
  -drive file=/var/lib/libvirt/images/ubuntu-22/root-disk-clone.qcow2,format=qcow2 \
  -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=/dev/nullb0 \
  -device virtio-blk-pci,drive=drive0,scsi=off -nographic

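(For the iothread runs the device stanza is extended along these lines;
this is a sketch, with illustrative ids and queue count:)

  -object iothread,id=iothread0 -object iothread,id=iothread1 \
  -device '{"driver":"virtio-blk-pci","drive":"drive0","num-queues":4,"iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"}]}'
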
my guest fio jobfile:
--
[global]
group_reporting
runtime=3000
time_based
loops=1
direct=1
invalidate=1
randrepeat=0
norandommap
exitall
cpus_allowed=0-3
cpus_allowed_policy=split

[read]
filename=/dev/vda
numjobs=4
iodepth=32
bs=64k
rw=randread
ioengine=io_uring
--

Maybe I'm doing something wrong? Didn't expect to find a regression
against mainline on the default setup.


* Re: virtio-blk using a single iothread
  2023-06-11 12:27   ` Sagi Grimberg
@ 2023-06-21 12:23     ` Stefan Hajnoczi
  2023-07-27 15:11     ` Stefan Hajnoczi
  2023-07-31 15:51     ` Stefan Hajnoczi
  2 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2023-06-21 12:23 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Stefan Hajnoczi, Paolo Bonzini, Qemu Developers

Hi Sagi,
I just got back from a conference and am going to be offline for a
week starting tomorrow. I haven't had time to look through your email
but will reply when I'm back from vacation.

Stefan

On Sun, 11 Jun 2023 at 14:29, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>
> On 6/8/23 19:08, Stefan Hajnoczi wrote:
> > On Thu, Jun 08, 2023 at 10:40:57AM +0300, Sagi Grimberg wrote:
> >> Hey Stefan, Paolo,
> >>
> >> I just had a report from a user experiencing lower virtio-blk
> >> performance than expected. The user is running virtio-blk on top of an
> >> nvme-tcp device, and the guest has 12 CPU cores.
> >>
> >> The guest read/write throughput is capped at around 30% of the
> >> throughput available from the host (~800MB/s from the guest vs.
> >> 2800MB/s from the host over a 25Gb/s NIC). The workload running on the
> >> guest is multi-threaded fio.
> >>
> >> What we observe is that virtio-blk uses a single disk-wide iothread to
> >> process all the vqs. nvme-tcp in particular (like other TCP-based
> >> protocols) is negatively impacted by the lack of thread concurrency
> >> that could distribute I/O requests across different TCP connections.
> >>
> >> We also attempted to move the iothread to a dedicated core, but that
> >> did not yield any meaningful performance improvement. The reason
> >> appears to be less about CPU utilization on the iothread core and more
> >> about serialization over a single TCP connection.
> >>
> >> Moving to io=threads does increase the throughput, but it sacrifices
> >> latency significantly.
> >>
> >> So the user finds themselves with available host CPUs and TCP
> >> connections that could easily be used to reach maximum throughput, but
> >> no way to leverage them. True, other guests will use different
> >> threads/contexts, but the goal here is to get the full performance from
> >> a single device.
> >>
> >> I've seen several past discussions and attempts to let a virtio-blk
> >> device leverage multiple iothreads, but those discussions stalled
> >> around two years ago. So I wanted to ask: are there any plans, or
> >> anything in the works, to address this limitation?
> >>
> >> I've seen that the spdk folks are heading in this direction with their
> >> vhost-blk implementation:
> >> https://review.spdk.io/gerrit/c/spdk/spdk/+/16068
> >
> > Hi Sagi,
> > Yes, there is an ongoing QEMU multi-queue block layer effort to make it
> > possible for multiple IOThreads to process disk I/O for the same
> > --blockdev in parallel.
>
> Great to know.
>
> > Most of my recent QEMU patches have been part of this effort. There is a
> > work-in-progress branch that supports mapping virtio-blk virtqueues to
> > specific IOThreads:
> > https://gitlab.com/stefanha/qemu/-/commits/virtio-blk-iothread-vq-mapping
>
> Thanks for the pointer.
>
> > The syntax is:
> >
> >    --device '{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"}],"drive":"drive0"}'
> >
> > This says "assign virtqueues round-robin to iothread0 and iothread1".
> > Half the virtqueues will be processed by iothread0 and the other half by
> > iothread1. There is also syntax for assigning specific virtqueues to
> > each IOThread, but usually the automatic round-robin assignment is all
> > that's needed.
> >
> > This work is not finished yet. Basic I/O (e.g. fio) works without
> > crashes, but expect to hit issues if you use blockjobs, hotplug, etc.
> >
> > Performance optimization work has just begun, so it won't deliver all
> > the benefits yet. I ran a benchmark yesterday where going from 1 to 2
> > IOThreads increased performance by 25%. That's much less than we're
> > aiming for; attaching two independent virtio-blk devices improves the
> > performance by ~100%. I know we can get there eventually. Some of the
> > bottlenecks are known (e.g. block statistics collection causes lock
> > contention) and others are yet to be investigated.
>
> Hmm, I rebased this branch on top of mainline master and ran a naive
> test, and it seems that performance regressed quite a bit :(
>
> I'm running this test on my laptop (Intel(R) Core(TM) i7-8650U CPU
> @1.90GHz), so this is more of a qualitative test for BW only.
> I use null_blk as the host device.
>
> With mainline master I get ~9GB/s for 64k randread, while with your
> branch I get ~5GB/s, regardless of whether I assign iothreads (one or
> two) or not.
>
> my qemu command:
> taskset -c 0-3 build/qemu-system-x86_64 -cpu host -m 1G -enable-kvm -smp 4 \
>   -drive file=/var/lib/libvirt/images/ubuntu-22/root-disk-clone.qcow2,format=qcow2 \
>   -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=/dev/nullb0 \
>   -device virtio-blk-pci,drive=drive0,scsi=off -nographic
>
> my guest fio jobfile:
> --
> [global]
> group_reporting
> runtime=3000
> time_based
> loops=1
> direct=1
> invalidate=1
> randrepeat=0
> norandommap
> exitall
> cpus_allowed=0-3
> cpus_allowed_policy=split
>
> [read]
> filename=/dev/vda
> numjobs=4
> iodepth=32
> bs=64k
> rw=randread
> ioengine=io_uring
> --
>
> Maybe I'm doing something wrong? Didn't expect to find a regression
> against mainline on the default setup.
>


* Re: virtio-blk using a single iothread
  2023-06-11 12:27   ` Sagi Grimberg
  2023-06-21 12:23     ` Stefan Hajnoczi
@ 2023-07-27 15:11     ` Stefan Hajnoczi
  2023-07-31 15:51     ` Stefan Hajnoczi
  2 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2023-07-27 15:11 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Paolo Bonzini, Qemu Developers

On Sun, Jun 11, 2023 at 03:27:57PM +0300, Sagi Grimberg wrote:
> 
> 
> On 6/8/23 19:08, Stefan Hajnoczi wrote:
> > On Thu, Jun 08, 2023 at 10:40:57AM +0300, Sagi Grimberg wrote:
> > > Hey Stefan, Paolo,
> > > 
> > > I just had a report from a user experiencing lower virtio-blk
> > > performance than expected. The user is running virtio-blk on top of an
> > > nvme-tcp device, and the guest has 12 CPU cores.
> > >
> > > The guest read/write throughput is capped at around 30% of the
> > > throughput available from the host (~800MB/s from the guest vs.
> > > 2800MB/s from the host over a 25Gb/s NIC). The workload running on the
> > > guest is multi-threaded fio.
> > >
> > > What we observe is that virtio-blk uses a single disk-wide iothread to
> > > process all the vqs. nvme-tcp in particular (like other TCP-based
> > > protocols) is negatively impacted by the lack of thread concurrency
> > > that could distribute I/O requests across different TCP connections.
> > >
> > > We also attempted to move the iothread to a dedicated core, but that
> > > did not yield any meaningful performance improvement. The reason
> > > appears to be less about CPU utilization on the iothread core and more
> > > about serialization over a single TCP connection.
> > >
> > > Moving to io=threads does increase the throughput, but it sacrifices
> > > latency significantly.
> > >
> > > So the user finds themselves with available host CPUs and TCP
> > > connections that could easily be used to reach maximum throughput, but
> > > no way to leverage them. True, other guests will use different
> > > threads/contexts, but the goal here is to get the full performance from
> > > a single device.
> > >
> > > I've seen several past discussions and attempts to let a virtio-blk
> > > device leverage multiple iothreads, but those discussions stalled
> > > around two years ago. So I wanted to ask: are there any plans, or
> > > anything in the works, to address this limitation?
> > > 
> > > I've seen that the spdk folks are heading in this direction with their
> > > vhost-blk implementation:
> > > https://review.spdk.io/gerrit/c/spdk/spdk/+/16068
> > 
> > Hi Sagi,
> > Yes, there is an ongoing QEMU multi-queue block layer effort to make it
> > possible for multiple IOThreads to process disk I/O for the same
> > --blockdev in parallel.
> 
> Great to know.
> 
> > Most of my recent QEMU patches have been part of this effort. There is a
> > work-in-progress branch that supports mapping virtio-blk virtqueues to
> > specific IOThreads:
> > https://gitlab.com/stefanha/qemu/-/commits/virtio-blk-iothread-vq-mapping
> 
> Thanks for the pointer.
> 
> > The syntax is:
> > 
> >    --device '{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"}],"drive":"drive0"}'
> > 
> > This says "assign virtqueues round-robin to iothread0 and iothread1".
> > Half the virtqueues will be processed by iothread0 and the other half by
> > iothread1. There is also syntax for assigning specific virtqueues to
> > each IOThread, but usually the automatic round-robin assignment is all
> > that's needed.
> > 
> > This work is not finished yet. Basic I/O (e.g. fio) works without
> > crashes, but expect to hit issues if you use blockjobs, hotplug, etc.
> > 
> > Performance optimization work has just begun, so it won't deliver all
> > the benefits yet. I ran a benchmark yesterday where going from 1 to 2
> > IOThreads increased performance by 25%. That's much less than we're
> > aiming for; attaching two independent virtio-blk devices improves the
> > performance by ~100%. I know we can get there eventually. Some of the
> > bottlenecks are known (e.g. block statistics collection causes lock
> > contention) and others are yet to be investigated.
> 
> Hmm, I rebased this branch on top of mainline master and ran a naive
> test, and it seems that performance regressed quite a bit :(
> 
> I'm running this test on my laptop (Intel(R) Core(TM) i7-8650U CPU
> @1.90GHz), so this is more of a qualitative test for BW only.
> I use null_blk as the host device.
>
> With mainline master I get ~9GB/s for 64k randread, while with your
> branch I get ~5GB/s, regardless of whether I assign iothreads (one or
> two) or not.
> 
> my qemu command:
> taskset -c 0-3 build/qemu-system-x86_64 -cpu host -m 1G -enable-kvm -smp 4 \
>   -drive file=/var/lib/libvirt/images/ubuntu-22/root-disk-clone.qcow2,format=qcow2 \
>   -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=/dev/nullb0 \
>   -device virtio-blk-pci,drive=drive0,scsi=off -nographic
> 
> my guest fio jobfile:
> --
> [global]
> group_reporting
> runtime=3000
> time_based
> loops=1
> direct=1
> invalidate=1
> randrepeat=0
> norandommap
> exitall
> cpus_allowed=0-3
> cpus_allowed_policy=split
> 
> [read]
> filename=/dev/vda
> numjobs=4
> iodepth=32
> bs=64k
> rw=randread
> ioengine=io_uring

Hi Sagi,
I have some news and pushed new code to my repo:
https://gitlab.com/stefanha/qemu/-/commits/virtio-blk-iothread-vq-mapping

This branch changes virtio-blk emulation to process requests in
coroutines. The reason for this change was to reduce the number of
coroutines created per request and minimize nested event loops
(AIO_WAIT_WHILE() -> aio_poll()). However, I found a performance issue
with the implementation: request coroutines were yielding and thereby
deferring request processing until later in the event loop.

The new code I pushed yesterday works around this by skipping request
serialization/tracking (bs->tracked_requests) for read requests. I only
modified the code for read requests because that's what I benchmark.
bs->tracked_requests and its lock, bs->reqs_lock, were causing
contention and coroutine yields.

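(For reference, the serialization in question boils down to this
pattern; simplified from block/io.c's tracked_request_begin(), not the
exact code:)

  /* every request registers itself on a BlockDriverState-wide list */
  qemu_co_mutex_lock(&bs->reqs_lock);     /* CoMutex shared by all IOThreads */
  QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
  qemu_co_mutex_unlock(&bs->reqs_lock);   /* contention here forces coroutine yields */
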
A proper solution that keeps request tracking but makes it SMP-friendly
will need to be implemented, but for now this may solve the issues you
were seeing.

On my system 4 KB randread iodepth=64 numjobs=8 now achieves the same
IOPS on bare metal and in a VM. I'm not sure if this addresses the
performance issue you were seeing but there's a good chance it does.

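(That is roughly the following fio invocation; options other than the
ones named above are illustrative:)

  fio --name=randread --filename=/dev/vda --ioengine=io_uring --direct=1 \
      --rw=randread --bs=4k --iodepth=64 --numjobs=8 \
      --time_based --runtime=60 --group_reporting
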
I'll run your fio jobs and compare against qemu.git/master without my
patches.

(I also added the --device virtio-blk-pci,stats-enabled=off,... option
to skip block I/O statistics collection. The statistics data is
protected by a lock that can cause contention when multiple IOThreads
process requests for the same device. In my testing it doesn't have much
of an effect on IOPS but I can see the difference in traces of futex
syscalls.)

Thanks,
Stefan


* Re: virtio-blk using a single iothread
  2023-06-11 12:27   ` Sagi Grimberg
  2023-06-21 12:23     ` Stefan Hajnoczi
  2023-07-27 15:11     ` Stefan Hajnoczi
@ 2023-07-31 15:51     ` Stefan Hajnoczi
  2 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2023-07-31 15:51 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Paolo Bonzini, Qemu Developers

On Sun, Jun 11, 2023 at 03:27:57PM +0300, Sagi Grimberg wrote:
> Maybe I'm doing something wrong? Didn't expect to find a regression
> against mainline on the default setup.

Hi Sagi,
I benchmarked the latest branch against ed8ad9728a, the commit where it
forked off master. master achieves fewer IOPS.

It looks like the regression you saw was solved by the changes I made
last week.

Both "master" and "modified" are running with 1 IOThread:

                          IOPS
                          ------
randread 4k 64 master-1   213504
randread 4k 64 master-2   212650
randread 4k 64 master-3   211699
randread 4k 64 master-4   211940
randread 4k 64 master-5   214110
randread 4k 64 modified-1 234708
randread 4k 64 modified-2 236014
randread 4k 64 modified-3 235328
randread 4k 64 modified-4 235742

The improvement is around 10%.

You can find the benchmark configuration and raw data here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/1a464c0676fe9133fb244d8a2dd1439001c7bc42

The configuration is in go.yml, plays/benchmark.yml, files/test.xml.j2,
and files/fio.sh.

The raw data is in notebook/fio-output/ and you can explore the Jupyter
notebook by running notebook/go.sh.

Stefan
