public inbox for linux-block@vger.kernel.org
From: Jared Holzman <jholzman@nvidia.com>
To: Caleb Sander Mateos <csander@purestorage.com>,
	Ofer Oshri <ofer@nvidia.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"ming.lei@redhat.com" <ming.lei@redhat.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>, Yoav Cohen <yoav@nvidia.com>,
	Guy Eisenberg <geisenberg@nvidia.com>,
	Omri Levi <omril@nvidia.com>
Subject: Re: ublk: RFC fetch_req_multishot
Date: Fri, 25 Apr 2025 00:07:14 +0300	[thread overview]
Message-ID: <5dca544d-5d23-4269-b447-6fbcda5de56e@nvidia.com> (raw)
In-Reply-To: <CADUfDZqUQ+n5tr=XG+sJWR_q55fzNSzLHvUZXkysOw=c+vfVGg@mail.gmail.com>

On 24/04/2025 22:07, Caleb Sander Mateos wrote:
> On Thu, Apr 24, 2025 at 11:58 AM Ofer Oshri <ofer@nvidia.com> wrote:
>>
>>
>>
>> ________________________________
>> From: Caleb Sander Mateos <csander@purestorage.com>
>> Sent: Thursday, April 24, 2025 9:28 PM
>> To: Ofer Oshri <ofer@nvidia.com>
>> Cc: linux-block@vger.kernel.org <linux-block@vger.kernel.org>; ming.lei@redhat.com <ming.lei@redhat.com>; axboe@kernel.dk <axboe@kernel.dk>; Jared Holzman <jholzman@nvidia.com>; Yoav Cohen <yoav@nvidia.com>; Guy Eisenberg <geisenberg@nvidia.com>; Omri Levi <omril@nvidia.com>
>> Subject: Re: ublk: RFC fetch_req_multishot
>>
>> On Thu, Apr 24, 2025 at 11:19 AM Ofer Oshri <ofer@nvidia.com> wrote:
>>>
>>> Hi,
>>>
>>> Our code uses a single io_uring per core, which is shared among all block devices - meaning each block device on a core uses the same io_uring.
>>>
>>> Let’s say the size of the io_uring is N. Each block device submits M UBLK_U_IO_FETCH_REQ requests. As a result, with the current implementation, we can only support up to P block devices, where P = N / M. This means that when we attempt to support block device P+1, it will fail due to io_uring exhaustion.
>>
>> What do you mean by "size of the io_uring", the submission queue size?
>> Why can't you submit all P * M UBLK_U_IO_FETCH_REQ operations in
>> batches of N?
>>
>> Best,
>> Caleb
>>
>> N is the size of the submission queue, and P is not fixed and is unknown at the time of ring initialization.
> 
> I don't think it matters whether P (the number of ublk devices) is
> known ahead of time or changes dynamically. My point is that you can
> submit the UBLK_U_IO_FETCH_REQ operations in batches of N to avoid
> exceeding the io_uring SQ depth. (If there are other operations
> potentially interleaved with the UBLK_U_IO_FETCH_REQ ones, then just
> submit each time the io_uring SQ fills up.) Any values of P, M, and N
> should work. Perhaps I'm misunderstanding you, because I don't know
> what "io_uring exhaustion" refers to.
> 
> Multishot ublk io_uring operations don't seem like a trivial feature
> to implement. Currently, incoming ublk requests are posted to the ublk
> server using io_uring's "task work" mechanism, which inserts the
> io_uring operation into an intrusive linked list. If you wanted a
> single ublk io_uring operation to post multiple completions, it would
> need to allocate some structure for each incoming request to insert
> into the task work list. There is also an assumption that the ublk
> io_uring operations correspond 1-1 with the blk-mq requests for the
> ublk device, which would be broken by multishot ublk io_uring
> operations.
> 
> Best,
> Caleb

Hi Caleb,

I think what Ofer is trying to say is that we have a scaling issue. 

Our deployment could consist of 100s of ublk devices, not all of which will be dispatching IO at the same time. If we were to submit the maximum number of IO requests that our application can handle for every ublk device we need to deploy, the memory requirements would be excessive.

For this reason, we would prefer a global pool of IO requests that can be registered with the ublk-control device and shared by all the ublk devices registered to it.

We understand this is a complex undertaking and would be willing to do the work ourselves, but before we start we want to know if the requirement is reasonable enough for our changes to be accepted upstream.

Regards,

Jared


Thread overview: 9+ messages
2025-04-24 18:19 ublk: RFC fetch_req_multishot Ofer Oshri
2025-04-24 18:28 ` Caleb Sander Mateos
2025-04-24 19:07   ` Ofer Oshri
     [not found]   ` <IA1PR12MB60672D37508D641368D211B8B6852@IA1PR12MB6067.namprd12.prod.outlook.com>
2025-04-24 19:07     ` Caleb Sander Mateos
2025-04-24 21:07       ` Jared Holzman [this message]
2025-04-24 21:52         ` Caleb Sander Mateos
2025-04-25  5:23       ` Ming Lei
2025-06-06 12:03         ` Ming Lei
2025-04-25  4:10 ` Ming Lei
