From: Elena <elena.ufimtseva@oracle.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: eduardo@habkost.net, john.g.johnson@oracle.com,
cohuck@redhat.com, jag.raman@oracle.com, john.levon@nutanix.com,
eblake@redhat.com, david@redhat.com, qemu-devel@nongnu.org,
peterx@redhat.com, armbru@redhat.com, mst@redhat.com,
berrange@redhat.com, pbonzini@redhat.com, philmd@redhat.com
Subject: Re: [RFC 0/8] ioregionfd introduction
Date: Tue, 15 Feb 2022 10:16:04 -0800
Message-ID: <20220215181604.GA33858@nuker>
In-Reply-To: <YgpsrdhBKfhbXPnG@stefanha-x1.localdomain>
On Mon, Feb 14, 2022 at 02:52:29PM +0000, Stefan Hajnoczi wrote:
> On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> > This patchset is an RFC version for the ioregionfd implementation
> > in QEMU. The kernel patches are to be posted with some fixes as a v4.
> >
> > This implementation is based on version 3 of the posted kernel patches:
> > https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
> >
> > A future version will include support for vfio-user/libvfio-user.
> > Please refer to the design discussion here proposed by Stefan:
> > https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> >
> > The vfio-user version needed some bug-fixing, so it was decided to send
> > the multiprocess version first.
> >
> > The ioregionfd is currently configured through the command line, and
> > each ioregionfd is represented by an object. This allows for easy
> > parsing and does not require modifications to the device/remote object
> > command-line options.
> >
> > The following command line can be used to specify ioregionfd:
> > <snip>
> > '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> > '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> > '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
>
Hi Stefan,
Thank you for taking a look!
> Explicit configuration of ioregionfd-object is okay for early
> prototyping, but what is the plan for integrating this? I guess
> x-remote-object would query the remote device to find out which
> ioregionfds need to be registered and the user wouldn't need to specify
> ioregionfds on the command-line?
Yes, this can be done. For some reason I thought that the user would be
able to configure the number/size of the regions to be used as
ioregionfds.
>
> > </snip>
> >
> > Proxy side of ioregionfd in this version uses only one file descriptor:
> > <snip>
> > '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> > </snip>
>
> This raises the question of the ioregionfd file descriptor lifecycle. In
> the end I think it shouldn't be specified on the command-line. Instead
> the remote device should create it and pass it to QEMU over the
> mpqemu/remote fd?
Yes, this will be the same as what vfio-user does.
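As a note, a minimal sketch of that fd-passing step (SCM_RIGHTS over the
already-established mpqemu Unix socket); the message payload and names
here are hypothetical, not the actual mpqemu protocol:

    import os, socket

    # The mpqemu channel between the remote device and the proxy.
    remote_sock, proxy_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    ioregion_rd, ioregion_wr = os.pipe()  # stand-in for the ioregionfd pair

    # Remote side: announce the region and attach the fd to the message.
    # Hypothetical payload; the real protocol messages differ.
    socket.send_fds(remote_sock, [b'IOREGIONFD bar=1'], [ioregion_wr])

    # Proxy side: receive the message and the fd in one step (Python >= 3.9).
    msg, fds, flags, addr = socket.recv_fds(proxy_sock, 1024, 1)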
>
> >
> > This was done for the RFC version, and my thought was that the next
> > version will be for vfio-user, so I have not dedicated much effort to
> > these command-line options.
> >
> > The multiprocess messaging protocol was extended to support inquiries
> > by the proxy about whether the device has any ioregionfds.
> > This RFC implements proxy inquiries about whether a BAR is an
> > ioregionfd and about its type (memory/io).
> >
> > Currently there are a few limitations in this version of ioregionfd:
> > - one ioregionfd per bar, only full bar size is supported;
> > - one file descriptor per device for all of its ioregionfds;
> > - each remote device runs fd handler for all its BARs in one IOThread;
> > - proxy supports only one fd.
> >
> > Some of these limitations will be dropped in a future version.
> > This RFC is meant to gather feedback/suggestions from the community
> > on the general approach.
> >
> > A quick performance test was done for the remote lsi device with and
> > without ioregionfd for both mem BARs (1 and 2), with the help of the
> > fio tool:
> >
> > Random R/W:
> >
> >                read IOPS   read BW      write IOPS   write BW
> > no ioregionfd  889         3559KiB/s    890          3561KiB/s
> > ioregionfd     938         3756KiB/s    939          3757KiB/s
>
> This is extremely slow, even for random I/O. How does this compare to
> QEMU running the LSI device without multi-process mode?
These tests had iodepth=256. I have changed this to 1 and tested without
multiprocess, with multiprocess, and with multiprocess with both mmio
regions as ioregionfds:

                         read IOPS  read BW(KiB/s)  write IOPS  write BW(KiB/s)
no multiprocess          89         358             90          360
multiprocess             138        556             139         557
multiprocess ioregionfd  174        698             173         693
The fio config for randomrw:
[global]
bs=4K
iodepth=1
direct=0
ioengine=libaio
group_reporting
time_based
runtime=240
numjobs=1
name=raw-randreadwrite
rw=randrw
size=8G
[job1]
filename=/fio/randomrw
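The job file above is run in the guest as, e.g.:

    fio randomrw.fio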
And the QEMU command line for non-multiprocess:
/usr/local/bin/qemu-system-x86_64 -name "OL7.4" -machine q35,accel=kvm \
    -smp sockets=1,cores=2,threads=2 -m 2048 \
    -hda /home/homedir/ol7u9boot.img -boot d -vnc :0 \
    -chardev stdio,id=seabios \
    -device isa-debugcon,iobase=0x402,chardev=seabios \
    -device lsi53c895a,id=lsi1 \
    -drive id=drive_image1,if=none,file=/home/homedir/10gb.qcow2 \
    -device scsi-hd,id=drive1,drive=drive_image1,bus=lsi1.0,scsi-id=0
QEMU command line for multiprocess:
remote_cmd = [ PROC_QEMU, \
'-machine', 'x-remote', \
'-device', 'lsi53c895a,id=lsi0', \
'-drive', 'id=drive_image1,file=/home/homedir/10gb.qcow2', \
'-device', 'scsi-hd,id=drive2,drive=drive_image1,bus=lsi0.0,' \
'scsi-id=0', \
'-nographic', \
'-monitor', 'unix:/home/homedir/rem-sock,server,nowait', \
'-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
'-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
'-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
]
proxy_cmd = [ PROC_QEMU, \
'-D', '/tmp/qemu-debug-log', \
'-name', 'OL7.4', \
'-machine', 'pc,accel=kvm', \
'-smp', 'sockets=1,cores=2,threads=2', \
'-m', '2048', \
'-object', 'memory-backend-memfd,id=sysmem-file,size=2G', \
'-numa', 'node,memdev=sysmem-file', \
'-hda','/home/homedir/ol7u9boot.img', \
'-boot', 'd', \
'-vnc', ':0', \
'-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
'-monitor', 'unix:/home/homedir/qemu-sock,server,nowait', \
'-netdev', 'tap,id=mynet0,ifname=tap0,script=no,downscript=no', \
'-device', 'e1000,netdev=mynet0,mac=52:55:00:d1:55:01',\
]
For the test without ioregionfds, the ioregionfd-object options are
commented out.
I am doing more testing, as I see some inconsistent results.
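For context, a sketch of how the fds referenced above (remote, proxy,
iord, iowr) could be created before spawning the two QEMU processes; this
is an assumption based on the command lines, not the exact test harness:

    import os, socket

    # mpqemu channel between the proxy and the remote device process.
    remote, proxy = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Channel carrying the ioregionfd commands/responses.
    iord, iowr = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Let the child QEMU processes inherit the descriptors.
    for f in (remote, proxy, iord, iowr):
        os.set_inheritable(f.fileno(), True)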
>
> > Sequential Read and Sequential Write:
> >
> >                  Sequential read          Sequential write
> >                  read IOPS   read BW      write IOPS   write BW
> >
> > no ioregionfd    367k        1434MiB/s    76k          297MiB/s
> > ioregionfd       374k        1459MiB/s    77.3k        302MiB/s
>
> It's normal for read and write IOPS to differ, but the read IOPS are
> very high. I wonder if caching and read-ahead are hiding the LSI
> device's actual performance here.
>
> What are the fio and QEMU command-lines?
>
> In order to benchmark ioregionfd it's best to run a benchmark where the
> bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
> bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
> dispatch cost doesn't affect IOPS significantly.
>
> I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
> instead of a file or host block device. The fio block size should be 4k
> to minimize the amount of time spent on I/O buffer contents and
> iodepth=1 because batching multiple requests with iodepth > 1 hides the
> MMIO/PIO dispatch bottleneck.
The queue depth in the tests above was 256; I will try what you have
suggested. The block size is 4k.
I am also looking at another system issue that can interfere with the
test; I will rerun the tests on a fresh install with the settings you
mentioned above.
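For reference, the disk configuration I plan to switch to per your
suggestion (assuming the scsi-hd drive= property can reference the
blockdev node-name directly):

    -blockdev driver=null-co,node-name=null0,size=64G \
    -device scsi-hd,id=drive1,drive=null0,bus=lsi1.0,scsi-id=0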
Thank you!
>
> Stefan