From: David Wei <dw@davidwei.uk>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, Bernd Schubert <bschubert@ddn.com>,
	Keith Busch <kbusch@kernel.org>, Ming Lei <ming.lei@redhat.com>,
	Jens Axboe <axboe@kernel.dk>, Josef Bacik <josef@toxicpanda.com>,
	Joanne Koong <joannelkoong@gmail.com>
Subject: [LSF/MM/BPF TOPIC] FUSE io_uring zero copy
Date: Thu, 30 Jan 2025 13:28:55 -0800
Message-ID: <dc3a5c7d-b254-48ea-9749-2c464bfd3931@davidwei.uk>

Hi folks, I want to propose a discussion on adding zero copy to FUSE
io_uring in the kernel. The source is a userspace buffer or device
memory, e.g. GPU VRAM. The destination is a FUSE server in userspace,
which then forwards the data either over the network or to an
underlying FS/block device. The FUSE server may also want to read the
data itself.

My goal is to eliminate copies along this entire data path, including
the initial hop between the userspace client and the kernel. I know Ming
and Keith are working on adding ublk zero copy, but it does not cover
this initial hop and it does not allow the ublk/FUSE server to read the
data.

My idea is to use shared memory or dma-buf, i.e. the source data is
encapsulated in an mmap()able fd. The client and FUSE server exchange
this fd through a back channel with no kernel involvement. The FUSE
server maps the fd into its address space and registers the fd with
io_uring via the io_uring_register() infra. When the client does e.g. a
DIO write, the pages are pinned and forwarded to the FUSE kernel driver,
which does a lookup and recognizes that the pages belong to the fd that
the FUSE server registered. io_uring then tells the FUSE server that the
data is already in the fd it registered, so there is no need to copy
anything at all.
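
The fd exchange and mapping steps above can be sketched in userspace
with existing interfaces. This is only an illustration under stated
assumptions, not the proposed design: it uses memfd_create() as the
shared-memory source, an AF_UNIX socketpair with SCM_RIGHTS as the back
channel, and runs the "client" and "server" roles in one process for
brevity. The helper names (send_fd, recv_fd, share_buffer_demo) are
hypothetical. The io_uring registration step is left out because the
fd-based registration described here does not exist yet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BUF_SIZE 4096

/* Control-message buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one fd over a connected AF_UNIX socket using SCM_RIGHTS. */
int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd = -1;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Returns 0 if data written via the client mapping is visible,
 * without any copy, through the server's mapping of the passed fd. */
int share_buffer_demo(void)
{
	int sv[2] = { -1, -1 }, client_fd, server_fd = -1, ret = -1;
	char *cbuf = MAP_FAILED, *sbuf = MAP_FAILED;

	client_fd = memfd_create("fuse-zc-demo", MFD_CLOEXEC);
	if (client_fd < 0)
		return -1;
	if (ftruncate(client_fd, BUF_SIZE) < 0 ||
	    socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		goto out;
	if (send_fd(sv[0], client_fd) < 0 ||
	    (server_fd = recv_fd(sv[1])) < 0)
		goto out;

	cbuf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, client_fd, 0);
	sbuf = mmap(NULL, BUF_SIZE, PROT_READ, MAP_SHARED, server_fd, 0);
	if (cbuf == MAP_FAILED || sbuf == MAP_FAILED)
		goto out;

	strcpy(cbuf, "payload");                 /* "client" fills the buffer */
	ret = strcmp(sbuf, "payload") == 0 ? 0 : -1; /* "server" sees it */
out:
	if (cbuf != MAP_FAILED) munmap(cbuf, BUF_SIZE);
	if (sbuf != MAP_FAILED) munmap(sbuf, BUF_SIZE);
	if (server_fd >= 0) close(server_fd);
	if (sv[0] >= 0) { close(sv[0]); close(sv[1]); }
	close(client_fd);
	return ret;
}
```

Calling share_buffer_demo() from a main() should return 0 on a Linux
system with memfd_create() support. In a real setup the two roles would
be separate processes, and the server would additionally register its
mapping with io_uring.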

I would like to discuss this and get feedback from the community. My top
question is: why do this in the kernel at all? It is entirely possible
to bypass the kernel by having the client and FUSE server exchange the
fd and then do the I/O purely through IPC.

David
