From: Bernd Schubert <bschubert@ddn.com>
To: "lsf-pc@lists.linux-foundation.org" <lsf-pc@lists.linux-foundation.org>
Cc: Ming Lei <ming.lei@redhat.com>,
Amir Goldstein <amir73il@gmail.com>,
Miklos Szeredi <miklos@szeredi.hu>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: [LSF/MM/BFP ATTEND][LSF/MM/BFP TOPIC] fuse uring communication
Date: Sun, 5 Feb 2023 00:59:58 +0000
Message-ID: <7038cabf-e9bb-394a-e084-11bc23813fc7@ddn.com>
Hello,
I have been working for some time on fuse uring based communication
that is NUMA aware and core-affine.
In the current /dev/fuse based IO model, requests are queued on lists
that are neither core-affine nor NUMA aware. Every request needs a
round trip between userspace and kernel.
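To illustrate what core-affine queuing could look like, here is a
minimal sketch. It is not taken from the fuse-uring branch: struct
queue_set, queue_for_cpu() and the per-core queue layout are assumptions
made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical layout: ideally one request queue per CPU core, so that
 * a request is queued and processed on the core that issued it.
 */
struct queue_set {
	size_t nr_queues;
};

/*
 * Map a CPU id (for example as returned by sched_getcpu(3)) to a queue
 * index. With fewer queues than cores, several cores share a queue.
 */
static size_t queue_for_cpu(const struct queue_set *qs, unsigned int cpu)
{
	assert(qs != NULL && qs->nr_queues > 0);
	return cpu % qs->nr_queues;
}
```

With one queue per core the mapping is the identity, so a request never
has to cross cores on its way to the daemon thread pinned to that core.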
When we benchmarked our atomic-open patches (also still WIP), initially
confusing findings came up [1], which could be tracked down to multiple
threads reading from /dev/fuse. After switching to a single thread that
reads from /dev/fuse we got consistent and expected results.
Later we also figured out that adding a polling spin in
fuse_dev_do_read() before going into a waitq sleep when no request is
available greatly improved metadata benchmark performance [2].
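The spin-before-sleep idea can be sketched in plain C as follows. This
is not the patch from [2]: struct req_queue, spin_for_request() and the
spin limit are hypothetical names chosen for illustration, and the
kernel code would sleep on a waitq where this sketch merely returns
false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue state: number of requests ready to be read. */
struct req_queue {
	atomic_int pending;
};

/*
 * Poll the queue up to max_spins times.
 * Returns true if a request showed up while spinning, false if the
 * caller should fall back to a blocking wait.
 */
static bool spin_for_request(struct req_queue *q, unsigned int max_spins)
{
	assert(q != NULL);
	for (unsigned int i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(&q->pending,
					 memory_order_acquire) > 0)
			return true;
	}
	return false;
}
```

The trade-off is CPU time against wakeup latency: a short spin avoids
the sleep/wakeup cost when the next request arrives quickly, which is
typical for metadata-heavy workloads.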
That made us think about the current communication and look into a
ring based queuing model. Around that time IORING_OP_URING_CMD was added
to uring, and the new userspace block device driver (ublk) uses that
command to send requests from kernel to userspace.
I started to look at how ublk works and to adapt a similar model to
fuse. The state as of today is that it is basically working, but I'm
still fixing issues found by xfstests. Benchmarks and patch cleanup for
submission follow next.
https://github.com/bsbernd/linux/tree/fuse-uring
https://github.com/bsbernd/libfuse/tree/uring
(these branches will _not_ be used for upstream submission, these are
purely for base development)
A fuse design documentation update will also be added in the 1st RFC
request; basic details follow:
- Initial mount setup goes over /dev/fuse
- fuse.ko queues FUSE_INIT in the existing /dev/fuse (background) queue
- User space sets up the ring and all queues with a new ioctl
- fuse.ko sets up the ring and allocates request queues/request memory
per queue/request
- Userspace mmaps these buffers and assigns them per queue/request
- Data are sent through these mmapped buffers; no kmap is involved
(unlike ublk)
- Similar to ublk, user space first submits SQEs as
FUSE_URING_REQ_FETCH, then later as FUSE_URING_REQ_COMMIT_AND_FETCH,
which commits the result of the current request and fetches the next one.
- FUSE_URING_REQ_FETCH also takes the FUSE_INIT request; later these
lists are not checked anymore, as nothing is supposed to be on them
- The ring currently only handles fuse pending and background
requests (with credits assigned)
- Forget requests still require libfuse to read /dev/fuse (handling
will be added to the ring later)
- In the WIP state request interrupts are not supported (yet)
- Userspace still needs to send fuse notifications to /dev/fuse; this
needs to be handled by the ring as well (or maybe by a separate ring)
- My goal was to keep compatibility with existing fuse file systems;
apart from the still missing interrupt handling, that should work so far.
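The fetch / commit-and-fetch cycle described in the list above can be
sketched as a small per-entry state machine. Only the two command names
come from this mail; their numeric values, the entry states and the
helpers entry_submit() and entry_deliver() are hypothetical and do not
reflect the actual code in the branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Command names are from the text; the numeric values are placeholders. */
enum ring_cmd {
	FUSE_URING_REQ_FETCH,
	FUSE_URING_REQ_COMMIT_AND_FETCH,
};

/* Hypothetical per-entry states. */
enum entry_state {
	ENTRY_IDLE,		/* nothing submitted yet */
	ENTRY_WAITING,		/* SQE submitted, waiting for a request */
	ENTRY_IN_USERSPACE,	/* request delivered, daemon processes it */
};

struct ring_entry {
	enum entry_state state;
	unsigned int committed;	/* results committed so far */
};

/* Userspace submits an SQE for this entry; false if not allowed. */
static bool entry_submit(struct ring_entry *e, enum ring_cmd cmd)
{
	assert(e != NULL);
	switch (cmd) {
	case FUSE_URING_REQ_FETCH:
		/* Only valid as the very first submission. */
		if (e->state != ENTRY_IDLE)
			return false;
		break;
	case FUSE_URING_REQ_COMMIT_AND_FETCH:
		/* Needs a request in flight whose result is committed. */
		if (e->state != ENTRY_IN_USERSPACE)
			return false;
		e->committed++;
		break;
	default:
		return false;
	}
	e->state = ENTRY_WAITING;
	return true;
}

/* Kernel side hands a fuse request to a waiting entry. */
static bool entry_deliver(struct ring_entry *e)
{
	assert(e != NULL);
	if (e->state != ENTRY_WAITING)
		return false;
	e->state = ENTRY_IN_USERSPACE;
	return true;
}
```

In this model, after the initial fetch every request costs a single
submission: the commit of the previous result and the fetch of the next
request travel in the same SQE.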
There are certainly some questionable design decisions and longer
discussion threads might come up in the next weeks/months. Debating and
resolving some of these in person might be very helpful.
Ming is also working on zero-copy for ublk and I'm going to look into
that next. Splice and zero-copy are not yet supported in my
uring branch [3]
Thanks,
Bernd
[1]
https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
[2]
https://lore.kernel.org/lkml/6ba14287-336d-cdcd-0d39-680f288ca776@ddn.com/
[3]
https://patchwork.kernel.org/project/linux-block/cover/20221103085004.1029763-1-ming.lei@redhat.com/