From: Stefan Hajnoczi <stefanha@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: linux-block@vger.kernel.org, lsf-pc@lists.linux-foundation.org,
"Liu Xiaodong" <xiaodong.liu@intel.com>,
"Jim Harris" <james.r.harris@intel.com>,
"Hans Holmberg" <Hans.Holmberg@wdc.com>,
"Matias Bjørling" <Matias.Bjorling@wdc.com>,
"hch@lst.de" <hch@lst.de>,
ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Subject: Re: [LSF/MM/BPF BoF]: extend UBLK to cover real storage hardware
Date: Thu, 16 Feb 2023 10:28:23 -0500
Message-ID: <Y+5Ll3agvKFnvJGv@fedora>
In-Reply-To: <Y+19AM8zuU9+abQS@T590>
On Thu, Feb 16, 2023 at 08:46:56AM +0800, Ming Lei wrote:
> On Wed, Feb 15, 2023 at 10:27:07AM -0500, Stefan Hajnoczi wrote:
> > On Wed, Feb 15, 2023 at 08:51:27AM +0800, Ming Lei wrote:
> > > On Mon, Feb 13, 2023 at 02:13:59PM -0500, Stefan Hajnoczi wrote:
> > > > On Mon, Feb 13, 2023 at 11:47:31AM +0800, Ming Lei wrote:
> > > > > On Wed, Feb 08, 2023 at 07:17:10AM -0500, Stefan Hajnoczi wrote:
> > > > > > On Wed, Feb 08, 2023 at 10:12:19AM +0800, Ming Lei wrote:
> > > > > > > On Mon, Feb 06, 2023 at 03:27:09PM -0500, Stefan Hajnoczi wrote:
> > > > > > > > On Mon, Feb 06, 2023 at 11:00:27PM +0800, Ming Lei wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > So far UBLK is only used for implementing virtual block devices from
> > > > > > > > > userspace, such as loop, nbd, qcow2, ...[1].
> > > > > > > >
> > > > > > > > I won't be at LSF/MM so here are my thoughts:
> > > > > > >
> > > > > > > Thanks for the thoughts, :-)
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > It could be useful for UBLK to cover real storage hardware too:
> > > > > > > > >
> > > > > > > > > - for fast prototype or performance evaluation
> > > > > > > > >
> > > > > > > > > - some network storage is attached to the host, such as iscsi and nvme-tcp;
> > > > > > > > > the current UBLK interface doesn't support such devices, since they need
> > > > > > > > > all LUNs/Namespaces to share host resources (such as tags)
> > > > > > > >
> > > > > > > > Can you explain this in more detail? It seems like an iSCSI or
> > > > > > > > NVMe-over-TCP initiator could be implemented as a ublk server today.
> > > > > > > > What am I missing?
> > > > > > >
> > > > > > > The current ublk can't do that yet, because the interface doesn't
> > > > > > > support multiple ublk disks sharing a single host, which is exactly
> > > > > > > the case for scsi and nvme.
> > > > > >
> > > > > > Can you give an example that shows exactly where a problem is hit?
> > > > > >
> > > > > > I took a quick look at the ublk source code and didn't spot a place
> > > > > > where it prevents a single ublk server process from handling multiple
> > > > > > devices.
> > > > > >
> > > > > > Regarding "host resources(such as tag)", can the ublk server deal with
> > > > > > that in userspace? The Linux block layer doesn't have the concept of a
> > > > > > "host", that would come in at the SCSI/NVMe level that's implemented in
> > > > > > userspace.
> > > > > >
> > > > > > I don't understand yet...
> > > > >
> > > > > blk_mq_tag_set is embedded in the driver's host structure and referenced
> > > > > by each queue via q->tag_set. Both scsi and nvme allocate tags host-wide /
> > > > > queue-wide, that is, all LUNs/NSs share the host/queue tags. Currently every
> > > > > ublk device is independent, and can't share tags.
> > > >
> > > > Does this actually prevent ublk servers with multiple ublk devices or is
> > > > it just sub-optimal?
> > >
> > > It is the former: ublk can't support multiple devices sharing a single host,
> > > because duplicate tags can be seen on the host side, and then the io fails.
> >
> > The kernel sees two independent block devices so there is no issue
> > within the kernel.
>
> This approach either wastes memory or performs badly, since we can't
> set an ideal queue depth for each ublk device.
>
> >
> > Userspace can do its own hw tag allocation if there are shared storage
> > controller resources (e.g. NVMe CIDs) to avoid duplicating tags.
> >
> > Have I missed something?
>
> Please look at lib/sbitmap.c and block/blk-mq-tag.c and see how many
> hard issues have been fixed/reported in the past, and how much
> optimization has been done in this area.
>
> In theory hw tag allocation can be done in userspace, but it is hard to
> do efficiently:
>
> 1) sharing data efficiently across SMP has proven to be a hard problem,
> so don't reinvent the wheel in userspace; this work could take much
> more effort than extending the current ublk interface, and may turn
> out fruitless
>
> 2) allocating tags twice slows down the io path significantly
>
> 3) it is even worse for userspace allocation, because a task can be
> killed without any cleanup being done, so tags can easily be leaked
So then it is not "the former" after all?
Stefan