From: Kanchan Joshi <joshi.k@samsung.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@meta.com>,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
axboe@kernel.dk, hch@lst.de, xiaoguang.wang@linux.alibaba.com
Subject: Re: [RFC 0/3] nvme uring passthrough diet
Date: Fri, 5 May 2023 13:44:55 +0530 [thread overview]
Message-ID: <20230505081455.GA32732@green245> (raw)
In-Reply-To: <ZFJ7pAuTY6ESCVgp@kbusch-mbp.dhcp.thefacebook.com>
On Wed, May 03, 2023 at 09:20:04AM -0600, Keith Busch wrote:
>On Wed, May 03, 2023 at 12:57:17PM +0530, Kanchan Joshi wrote:
>> On Mon, May 01, 2023 at 08:33:03AM -0700, Keith Busch wrote:
>> > From: Keith Busch <kbusch@kernel.org>
>> >
>> > When you disable all the optional features in your kernel config and
>> > request queue, it looks like the normal request dispatching is just as
>> > fast as any attempts to bypass it. So let's do that instead of
>> > reinventing everything.
>> >
>> > This doesn't require additional queues or user setup. It continues to
>> > work with multiple threads and processes, and relies on the well tested
>> > queueing mechanisms that track timeouts, handle tag exhaustion, and sync
>> > with controller state needed for reset control, hotplug events, and
>> > other error handling.
>>
>> I agree with your point that there are some functional holes in
>> the complete-bypass approach. Yet the work needed to be done to
>> figure out the gain of that approach, and to see whether the effort
>> to fill these holes is worthwhile.
>>
>> On your specific points
>> - requiring additional queues: not a showstopper IMO.
>> If queues are sitting unused in the hardware, we can reap more
>> performance by giving those to applications. If not, we fall back to
>> the existing path. No disruption as such.
>
>The current way we're reserving special queues is bad and we should
>try not to extend it further. It applies to the whole module and
>would steal resources from some devices that don't want poll queues.
>If you have a mix of device types in your system, the low end ones
>don't want to split their resources this way.
>
>NVMe has no problem creating new queues on the fly. Queue allocation
>doesn't have to be an initialization thing, but you would need to
>reserve the QIDs ahead of time.
Totally in agreement with that. Jens also mentioned this point.
And I had added preallocation to my to-be-killed list. Thanks for
expanding.
Related to that, I think the one-qid-per-ring restriction also needs to
be lifted. That should allow doing IO on two or more devices with a
single ring, and we can see how well that scales.
>> - tag exhaustion: that is not missing, a retry will be made. I actually
>> wanted to do single command-id management at the io_uring level itself,
>> and that would have cleaned things up. But it did not fit in
>> because of submission/completion lifetime differences.
>> - timeout and other bits you mentioned: yes, those need more work.
>>
>> Now with the alternative proposed in this series, I doubt whether
>> similar gains are possible. Happy to be proven wrong.
>
>One other thing: the pure-bypass does appear better at low queue
>depths, but utilizing the plug for aggregated sq doorbell writes
>is a real win at higher queue depths from this series. Batching
>submissions at 4 deep is the tipping point on my test box; this
>series outperforms pure bypass at any higher batch count.
I see.
I hit the 5M IOPS cliff without plug/batching primarily because pure
bypass reduces the amount of code executed to do the IO. But
plug/batching is needed to do better than this.
If we create space for a pointer in io_uring_cmd, it can be added to the
plug list (in place of struct request). That would be one way to sort
out the plugging.
Thread overview: 15+ messages
2023-05-01 15:33 ` [RFC 0/3] nvme uring passthrough diet Keith Busch
2023-05-01 15:33 ` [RFC 1/3] nvme: skip block cgroups for passthrough commands Keith Busch
2023-05-03 5:04 ` Christoph Hellwig
2023-05-03 15:25 ` Keith Busch
2023-05-15 15:47 ` Keith Busch
2023-05-01 15:33 ` [RFC 2/3] nvme: fix cdev name leak Keith Busch
2023-05-01 15:33 ` [RFC 3/3] nvme: create special request queue for cdev Keith Busch
2023-05-02 12:20 ` Johannes Thumshirn
2023-05-03 5:04 ` Christoph Hellwig
2023-05-03 14:56 ` Keith Busch
2023-05-01 19:01 ` [RFC 0/3] nvme uring passthrough diet Kanchan Joshi
2023-05-01 19:31 ` Keith Busch
2023-05-03 7:27 ` Kanchan Joshi
2023-05-03 15:20 ` Keith Busch
2023-05-05 8:14 ` Kanchan Joshi [this message]