All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <tom.leiming@gmail.com>
To: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org, Pavel Begunkov <asml.silence@gmail.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Joanne Koong <joannelkoong@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>
Subject: Re: Large CQE for fuse headers
Date: Mon, 14 Oct 2024 10:44:33 +0800	[thread overview]
Message-ID: <ZwyFke6PayyOznP_@fedora> (raw)
In-Reply-To: <2fe2a3d3-4720-4d33-871e-5408ba44a543@fastmail.fm>

On Sun, Oct 13, 2024 at 11:20:53PM +0200, Bernd Schubert wrote:
> 
> 
> On 10/12/24 16:38, Jens Axboe wrote:
> > On 10/11/24 7:55 PM, Ming Lei wrote:
> >> On Fri, Oct 11, 2024 at 4:56?AM Bernd Schubert
> >> <bernd.schubert@fastmail.fm> wrote:
> >>>
> >>> Hello,
> >>>
> >>> as discussed during LPC, we would like to have large CQE sizes, at least
> >>> 256B. Ideally 256B for fuse, but CQE512 might be a bit too much...
> >>>
> >>> Pavel said that this should be ok, but it would be better to have the CQE
> >>> size as function argument.
> >>> Could you give me some hints how this should look like and especially how
> >>> we are going to communicate the CQE size to the kernel? I guess just adding
> >>> IORING_SETUP_CQE256 / IORING_SETUP_CQE512 would be much easier.
> >>>
> >>> I'm basically through with other changes Miklos had been asking for and
> >>> moving fuse headers into the CQE is next.
> >>
> >> Big CQE may not be efficient,  there are copy from kernel to CQE and
> >> from CQE to userspace. And not flexible, it is one ring-wide property,
> >> if it is big,
> >> any CQE from this ring has to be big.
> > 
> > There isn't really a copy - the kernel fills it in, generally the
> > application itself, just in the kernel, and then the application can
> > read it on that side. It's the same memory, and it'll also generally be
> > cache hot when the applicatio reaps it. Unless a lot of time has passed,
> > obviously.
> > 
> > That said, yeah bigger sqe/cqe is less ideal than smaller ones,
> > obviously. Currently you can fit 4 normal cqes in a cache line, or a
> > single sqe. Making either of them bigger will obviously bloat that.
> > 
> >> If you are saying uring_cmd,  another way is to mapped one area for
> >> this purpose, the fuse driver can write fuse headers to this indexed
> >> mmap buffer, and userspace read it, which is just efficient, without
> >> io_uring core changes. ublk uses this way to fill IO request header.
> >> But it requires each command to have a unique tag.
> > 
> > That may indeed be a decent idea for this too. You don't even need fancy
> > tagging, you can just use the cqe index for your tag too, as it should
> > not be bigger than the the cq ring space. Then you can get away with
> > just using normal cqe sizes, and just have a shared region between the
> > two where data gets written by the uring_cmd completion, and the app can
> > access it directly from userspace.
> 
> Would be good if Miklos could chime in here, adding back mmap for headers
> wouldn't be difficult, but would add back more fuse-uring startup and
> tear-down code.
> 
> From performance point of view, I don't know anything about CPU cache
> prefetching, but shouldn't the cpu cache logic be able to easily prefetch 
> larger linear io-uring rings into 2nd/3rd level caches? And if if the
> fuse header is in a separated buffer, it can't auto prefetch that
> without additional instructions? I.e. how would the cpu cache logic
> auto know about these additional memory areas?

It also depends on how fuse user code consumes the big CQE payload, if
fuse header needs to keep in memory a bit long, you may have to copy it
somewhere for post-processing since io_uring(kernel) needs CQE to be
returned back asap.


Thanks,
Ming

  reply	other threads:[~2024-10-14  2:44 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-10 20:56 Large CQE for fuse headers Bernd Schubert
2024-10-11 17:57 ` Jens Axboe
2024-10-11 18:35   ` Bernd Schubert
2024-10-11 18:39     ` Jens Axboe
2024-10-11 19:03       ` Bernd Schubert
2024-10-11 19:24         ` Jens Axboe
2024-10-11 21:38 ` Pavel Begunkov
2024-10-12  1:55 ` Ming Lei
2024-10-12 14:38   ` Jens Axboe
2024-10-13 21:20     ` Bernd Schubert
2024-10-14  2:44       ` Ming Lei [this message]
2024-10-14 11:10         ` Miklos Szeredi
2024-10-14 12:47           ` Bernd Schubert
2024-10-14 13:34             ` Pavel Begunkov
2024-10-14 15:21               ` Bernd Schubert
2024-10-14 17:48                 ` Pavel Begunkov
2024-10-14 21:27                   ` Bernd Schubert
2024-10-16 10:54                     ` Miklos Szeredi
2024-10-16 11:53                       ` Bernd Schubert
2024-10-16 12:24                         ` Miklos Szeredi
2024-10-17  0:59                         ` Ming Lei
2024-10-14 13:20           ` Bernd Schubert
2024-10-14 10:31       ` Miklos Szeredi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZwyFke6PayyOznP_@fedora \
    --to=tom.leiming@gmail.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bernd.schubert@fastmail.fm \
    --cc=io-uring@vger.kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.