Linux block layer
From: Bernd Schubert <bschubert@ddn.com>
To: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Ziyang Zhang <ZiyangZhang@linux.alibaba.com>,
	Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Cc: "lsf-pc@lists.linux-foundation.org" 
	<lsf-pc@lists.linux-foundation.org>,
	"io-uring@vger.kernel.org" <io-uring@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] ublk & io_uring: ublk zero copy support
Date: Fri, 5 May 2023 21:57:47 +0000
Message-ID: <41cfb9c2-9774-e9e1-d8e7-4999a710f2e7@ddn.com>
In-Reply-To: <ZEx+h/iFf46XiWG1@ovpn-8-24.pek2.redhat.com>

Hi Ming,

On 4/29/23 04:18, Ming Lei wrote:
> Hello,
> 
> ublk zero copy is observed to improve large-chunk (64KB+) sequential IO performance a
> lot. For example, IOPS of ublk-loop over tmpfs increases by 1~2X[1], and Jens also observed
> that IOPS of ublk-qcow2 can increase by ~1X[2]. Meanwhile, it saves memory bandwidth.
> 
> So this is one important performance improvement.
> 
> So far there are three proposals:

It looks like there is no dedicated session. Could we still have a
discussion in a free slot, if possible?

Thanks,
Bernd


> 
> 1) splice based
> 
> - spliced page from ->splice_read() can't be written
> 
> ublk READ request can't be handled because spliced page can't be written
> to, and extending splice for ublk zero copy isn't one good solution[3]
> 
> - it is very hard to meet the above requirements wrt. request buffer lifetime
> 
> splice/pipe focuses on page reference lifetime, but ublk zero copy pays more
> attention to ublk request buffer lifetime. It is very inefficient to respect
> request buffer lifetime by using all pipe buffer's ->release() which requires
> all pipe buffers and pipe to be kept when ublk server handles IO. That means
> one single dedicated ``pipe_inode_info`` has to be allocated runtime for each
> provided buffer, and the pipe needs to be populated with pages in ublk request
> buffer.
> 
> IMO, splice isn't a good approach from either a correctness or a performance
> viewpoint.
> 
> 2) io_uring register buffer based
> 
> - the main idea is to register one runtime buffer in fast io path, and
>    unregister it after the buffer is used by the following OPs
> 
> - the main problem is the bad performance caused by the io_uring link model
> 
> registering buffer has to be one OP, same with unregistering buffer; the
> following normal OPs(such as FS IO) have to depend on the registering
> buffer OP, then io_uring link has to be used.
> 
> It is normal to see more than one normal OPs which depend on the registering
> buffer OP, so all these OPs(registering buffer, normal (FS IO) OPs and
> unregistering buffer) have to be linked together, then normal(FS IO) OPs
> have to be submitted one by one, and this way is slow, because there is
> often no dependency among all these normal FS OPs. Basically io_uring
> link model does not support this kind of 1:N dependency.
> 
> No one posted code for showing this approach yet.
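
To make the 1:N dependency point concrete, here is a toy latency model. It
is not io_uring code: the function names and the time units are made up for
illustration, and it only assumes that a link chain starts each SQE after
the previous one completes.

```python
# Toy latency model, not io_uring code: all numbers are made-up time units.

def linked_chain_latency(register, fs_ops, unregister):
    """Link chain: each OP starts only after the previous one completes."""
    return register + sum(fs_ops) + unregister


def fan_out_latency(register, fs_ops, unregister):
    """Hypothetical 1:N dependency: FS OPs wait for the buffer, not each other."""
    return register + max(fs_ops, default=0) + unregister


if __name__ == "__main__":
    fs_ops = [40, 35, 50, 45]  # four independent FS IOs on one request buffer
    print(linked_chain_latency(1, fs_ops, 1))  # 172
    print(fan_out_latency(1, fs_ops, 1))       # 52
```

In this model the chain pays the sum of the FS OP latencies, while a 1:N
dependency would pay only the slowest one.
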
> 
> 3) io_uring fused command[1]
> 
> - fused command extends current io_uring usage by allowing the following
> FS OPs (called secondary OPs) to be submitted after the primary command provides
> the buffer, and the primary command won't be completed until all secondary OPs are done.
> 
> This way solves the problem in 2), and meantime avoids the buffer register cost in
> both submission and completion IO fast code path because the primary command won't
> be completed until all secondary OPs are done, so no need to write/read the
> buffer into per-context global data structure.
> 
> Meantime buffer lifetime problem is addressed simply, so correctness gets guaranteed,
> and performance is pretty good, and even IOPS of 4k IO gets a little
> improved in some workloads, or at least no perf regression is observed
> for small size IO.
> 
> A fused command can be thought of as one single request logically; it just has more
> than one SQE (all sharing the same link flag), which is why it is named fused command.
> 
> - the only concern is that fused command starts one new usage of io_uring, but I have
> still not seen comments wrt. what/why is bad with this kind of new usage/interface.
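
The lifetime rule described for fused commands (the primary command is not
completed until all secondary OPs are done) can be sketched as a simple
counter. This is only a toy model under that one assumption: the class and
method names are hypothetical and are not taken from the patchset in [1].

```python
class FusedCommand:
    """Toy model: a primary command lends one buffer to N secondary OPs."""

    def __init__(self, buffer, nr_secondary):
        if nr_secondary < 1:
            raise ValueError("need at least one secondary OP")
        self.buffer = buffer          # stays valid while secondary OPs run
        self.pending = nr_secondary   # secondary OPs still in flight
        self.completed = False        # primary command completion state

    def secondary_done(self):
        """Called once per secondary OP, in any order."""
        if self.completed:
            raise RuntimeError("primary command already completed")
        self.pending -= 1
        if self.pending == 0:
            # Last secondary OP finished: only now does the primary
            # command complete and the buffer get handed back.
            self.completed = True
            self.buffer = None


if __name__ == "__main__":
    cmd = FusedCommand(bytearray(4096), nr_secondary=3)
    for _ in range(3):
        print(cmd.completed)  # False on every pass
        cmd.secondary_done()
    print(cmd.completed)      # True
```

Because the buffer is owned by the primary command for the whole time, the
model needs no separate register/unregister step and no ordering among the
secondary OPs.
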
> 
> I propose this topic and want to discuss how to move forward with this
> feature.
> 
> 
> [1] https://lore.kernel.org/linux-block/20230330113630.1388860-1-ming.lei@redhat.com/
> [2] https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/
> [3] https://lore.kernel.org/linux-block/CAHk-=wgJsi7t7YYpuo6ewXGnHz2nmj67iWR6KPGoz5TBu34mWQ@mail.gmail.com/
> 
> 
> Thanks,
> Ming
> 


Thread overview: 4+ messages
2023-04-29  2:18 [LSF/MM/BPF TOPIC] ublk & io_uring: ublk zero copy support Ming Lei
2023-05-05 21:57 ` Bernd Schubert [this message]
2023-05-06  1:38   ` Ming Lei
2023-05-08  2:16     ` Pavel Begunkov