From: Peter-Jan Gootzen <pgootzen@nvidia.com>
To: "stefanha@redhat.com" <stefanha@redhat.com>,
"mszeredi@redhat.com" <mszeredi@redhat.com>
Cc: Idan Zach <izach@nvidia.com>,
"virtualization@lists.linux.dev" <virtualization@lists.linux.dev>,
"jefflexu@linux.alibaba.com" <jefflexu@linux.alibaba.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Max Gurtovoy <mgurtovoy@nvidia.com>,
"vgoyal@redhat.com" <vgoyal@redhat.com>,
Oren Duer <oren@nvidia.com>, Yoray Zack <yorayz@nvidia.com>
Subject: Re: [PATCH] fuse: cleanup request queuing towards virtiofs
Date: Wed, 5 Jun 2024 10:40:44 +0000
Message-ID: <02a56c0d80c2fab16b7d3b536727ff6865aded40.camel@nvidia.com>
In-Reply-To: <20240529183231.GC1203999@fedora.redhat.com>
On Wed, 2024-05-29 at 14:32 -0400, Stefan Hajnoczi wrote:
> On Wed, May 29, 2024 at 05:52:07PM +0200, Miklos Szeredi wrote:
> > Virtiofs has its own queuing mechanism, but still requests are first queued
> > on fiq->pending to be immediately dequeued and queued onto the virtio
> > queue.
> >
> > The queuing on fiq->pending is unnecessary and might even have some
> > performance impact due to being a contention point.
> >
> > Forget requests are handled similarly.
> >
> > Move the queuing of requests and forgets into the fiq->ops->*.
> > fuse_iqueue_ops are renamed to reflect the new semantics.
> >
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > ---
> >  fs/fuse/dev.c       | 159 ++++++++++++++++++++++++--------------------
> >  fs/fuse/fuse_i.h    |  19 ++----
> >  fs/fuse/virtio_fs.c |  41 ++++--------
> >  3 files changed, 106 insertions(+), 113 deletions(-)
>
> This is a little scary but I can't think of a scenario where directly
> dispatching requests to virtqueues is a problem.
>
> Is there someone who can run single and multiqueue virtiofs performance
> benchmarks?
>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
I ran some tests and experiments on the patch (on top of v6.10-rc2) with
our multi-queue capable virtio-fs device. No issues were found.
Experimental system setup (which is not the fastest possible setup nor
the most optimized setup!):
# Host:
- Dell PowerEdge R7525
- CPU: 2x AMD EPYC 7413 24-Core
- VM: QEMU KVM with 24 cores, vCPUs locked to the NUMA nodes on which
  the DPU is attached. VFIO-pci device to pass through the DPU.
  Running a default x86_64 ext4 buildroot with fio installed.
# Virtio-fs device:
- BlueField-3 DPU
- CPU: ARM Cortex-A78AE, 16 cores
- One thread per queue, each busy polling on one request queue
- Each queue is 1024 descriptors deep
# Workload (deviations are specified in the table):
- fio 3.34
- sequential read
- ioengine=io_uring, single 4GiB file, iodepth=128, bs=256KiB,
runtime=30s, ramp_time=10s, direct=1
- T is the number of threads (numjobs=T with thread=1)
- Q is the number of request queues
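As a rough aid for reproducing the workload above, the listed parameters can be turned into a fio command line. This is a minimal sketch, not the exact invocation used for these measurements: the helper name `fio_args`, the job name `seqread` and the file path `/mnt/virtiofs/testfile` are hypothetical. Options the mail does not state (for example whether `time_based` was set) are deliberately left out. Q, the number of request queues, is a property of the virtio-fs device setup and so does not appear among the fio options.

```python
def fio_args(threads, filename="/mnt/virtiofs/testfile"):
    """Build a fio argv for the sequential-read workload described above.

    threads is T from the table; filename is a hypothetical path on the
    virtio-fs mount.
    """
    return [
        "fio",
        "--name=seqread",           # hypothetical job name
        f"--filename={filename}",   # single file shared by all threads
        "--rw=read",                # sequential read
        "--ioengine=io_uring",
        "--size=4g",                # 4GiB with fio's default 1024-based units
        "--iodepth=128",
        "--bs=256k",                # 256KiB blocks
        "--runtime=30",             # seconds
        "--ramp_time=10",           # seconds
        "--direct=1",
        f"--numjobs={threads}",     # T
        "--thread",                 # thread=1: use threads, not processes
    ]


# Print the command line for the T=8 rows.
print(" ".join(fio_args(threads=8)))
```

The function only builds the argument list, so it can be inspected or logged before being handed to `subprocess.run`.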
| Workload | Before patch | After patch |
| ------------------ | ------------ | ----------- |
| T=1 Q=1 | 9216MiB/s | 9201MiB/s |
| T=2 Q=2 | 10.8GiB/s | 10.7GiB/s |
| T=4 Q=4 | 12.6GiB/s | 12.2GiB/s |
| T=8 Q=8 | 19.5GiB/s | 19.7GiB/s |
| T=16 Q=1 | 9451MiB/s | 9558MiB/s |
| T=16 Q=2 | 13.5GiB/s | 13.4GiB/s |
| T=16 Q=4 | 11.8GiB/s | 11.4GiB/s |
| T=16 Q=8 | 11.1GiB/s | 10.8GiB/s |
| T=24 Q=24 | 26.5GiB/s | 26.5GiB/s |
| T=24 Q=24 24 files | 26.5GiB/s | 26.6GiB/s |
| T=24 Q=24 4k | 948MiB/s | 955MiB/s |
Averaging out those results, the difference is within a reasonable
margin of error (less than 1%). So in this setup we see no difference
in performance.
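The averaging can be checked directly from the table. The sketch below only illustrates the arithmetic and is not the script used for this mail. It assumes an unweighted mean of the per-row relative changes, since the mail does not say how the average was taken, and the names `results` and `relative_change` are hypothetical.

```python
# (workload, before, after) copied from the table. Both columns of a row
# use the same unit (MiB/s or GiB/s), so the relative change per row is
# independent of the unit.
results = [
    ("T=1 Q=1", 9216, 9201),
    ("T=2 Q=2", 10.8, 10.7),
    ("T=4 Q=4", 12.6, 12.2),
    ("T=8 Q=8", 19.5, 19.7),
    ("T=16 Q=1", 9451, 9558),
    ("T=16 Q=2", 13.5, 13.4),
    ("T=16 Q=4", 11.8, 11.4),
    ("T=16 Q=8", 11.1, 10.8),
    ("T=24 Q=24", 26.5, 26.5),
    ("T=24 Q=24 24 files", 26.5, 26.6),
    ("T=24 Q=24 4k", 948, 955),
]


def relative_change(before, after):
    """Fractional change from before to after (negative means slower)."""
    return (after - before) / before


changes = {name: relative_change(b, a) for name, b, a in results}
mean_change = sum(changes.values()) / len(changes)
largest_drop = min(changes, key=changes.get)

for name, change in changes.items():
    print(f"{name:20s} {change:+.2%}")
print(f"mean change: {mean_change:+.2%}")
print(f"largest drop: {largest_drop} ({changes[largest_drop]:+.2%})")
```

Under that assumption the mean stays below 1% in magnitude, although individual rows move by more than that in either direction.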
However, if the virtio-fs device were more optimized, e.g. if it didn't
copy the data to its memory, then the bottleneck could possibly be on
the driver side, and this patch could show some benefit at those higher
message rates.
So although I would have hoped for some performance increase already
with this setup, I still think this is a good patch and a logical
optimization for high performance virtio-fs devices that might show a
benefit in the future.
Tested-by: Peter-Jan Gootzen <pgootzen@nvidia.com>
Reviewed-by: Peter-Jan Gootzen <pgootzen@nvidia.com>