* Increase maxsize of io_queue_depth for nvme driver?
@ 2025-09-14 12:45 Apachez
2025-09-14 14:31 ` Keith Busch
0 siblings, 1 reply; 3+ messages in thread
From: Apachez @ 2025-09-14 12:45 UTC (permalink / raw)
To: linux-nvme
Hi,
According to the current version of the nvme driver in Linux Kernel
6.17-rc5, the boundaries for io_queue_depth are set to >= 2 and <=
4095:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v6.17-rc5#n93
However according to
https://nvmexpress.org/wp-content/uploads/NVMe-NVM-Express-2.0a-2021.07.26-Ratified.pdf
the Maximum Queue Entries Supported (MQES) is a 15 bit value (so 32767
should be the largest possible), while other "internet sources" often
refer to the max queue of an NVMe as being up to 64k (65535).
Using nvme-cli on a Micron 7450 NVMe SSD 800GB I get this result:
# nvme show-regs -H /dev/nvme0 | grep -i 'Maximum Queue Entries Supported'
Maximum Queue Entries Supported (MQES): 8192
I would like to propose that NVME_PCI_MAX_QUEUE_SIZE should be
increased from 4095 to 32767 to match the current NVMe specification
regarding MQES and to give the sysop ability to fully utilize the
performance of the hardware being used, or am I missing something
here?
A spinoff of this would be that the default io_queue_depth should
first attempt to use the MQES value reported by the device, and if
that doesn't exist, fall back to the current default of 1000?
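
To make the proposed default concrete, here is a minimal sketch in Python.
It is a hypothetical model, not driver code: the function name and the
device_max_entries argument are invented for illustration, and only the
2..4095 bounds and the default of 1000 are taken from the text above.

```python
from typing import Optional

MIN_DEPTH = 2         # lower bound for io_queue_depth quoted above
MAX_DEPTH = 4095      # NVME_PCI_MAX_QUEUE_SIZE as quoted above
DEFAULT_DEPTH = 1000  # current io_queue_depth default as quoted above


def proposed_queue_depth(device_max_entries: Optional[int],
                         requested: Optional[int] = None) -> int:
    """Pick a queue depth following the proposal (hypothetical helper).

    device_max_entries: number of entries the device reports it supports,
                        or None if no value is available (an assumption).
    requested:          explicit io_queue_depth from the admin, if any.
    """
    if requested is not None:
        wanted = requested
    elif device_max_entries is not None:
        wanted = device_max_entries   # proposal: default to the device value
    else:
        wanted = DEFAULT_DEPTH        # fallback: current default

    # Never ask for more entries than the device says it supports.
    if device_max_entries is not None:
        wanted = min(wanted, device_max_entries)

    # Clamp to the bounds the driver accepts today.
    return max(MIN_DEPTH, min(wanted, MAX_DEPTH))
```

With the Micron 7450 value above, proposed_queue_depth(8192) is clamped to
4095 by the current upper bound, which is the limitation this mail is
asking about.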
Kind Regards
Apachez
* Re: Increase maxsize of io_queue_depth for nvme driver?
2025-09-14 12:45 Increase maxsize of io_queue_depth for nvme driver? Apachez
@ 2025-09-14 14:31 ` Keith Busch
[not found] ` <5836da67-1fdd-43ac-a039-58705ba1f370@nvidia.com>
0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2025-09-14 14:31 UTC (permalink / raw)
To: Apachez; +Cc: linux-nvme
On Sun, Sep 14, 2025 at 02:45:35PM +0200, Apachez wrote:
> I would like to propose that NVME_PCI_MAX_QUEUE_SIZE should be
> increased from 4095 to 32767 to match the current NVMe specification
> regarding MQES and to give the sysop ability to fully utilize the
> performance of the hardware being used, or am I missing something
> here?
We use the upper bits of the command id to detect duplicate completions,
so we don't have enough bits to tag commands beyond 4095. As far as I
know, though, we haven't seen such breakage in a *long* time. It's more
of a sanity thing to know with high certainty that duplicate completions
are not occurring.
But I don't think you'll see any performance difference by increasing
the queue depth. Devices saturate the link at far lower already, so
going higher just increases completion latency.
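
The command id constraint described above can be sketched as follows. This
is an illustrative Python model, not the driver source: it assumes a 16-bit
command id split into a 4-bit generation counter in the upper bits and a
12-bit tag in the lower bits, a split inferred from the 4095 limit, and all
function names are hypothetical.

```python
TAG_BITS = 12                    # assumed: low bits carry the tag
TAG_MASK = (1 << TAG_BITS) - 1   # 0xfff, i.e. largest tag is 4095
GEN_MASK = 0xF                   # assumed: 4 upper bits for the generation


def make_cid(genctr: int, tag: int) -> int:
    """Pack a generation counter and a tag into one 16-bit command id."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the low 12 bits")
    return ((genctr & GEN_MASK) << TAG_BITS) | tag


def split_cid(cid: int) -> tuple:
    """Return (generation, tag) from a 16-bit command id."""
    return (cid >> TAG_BITS) & GEN_MASK, cid & TAG_MASK


def is_stale_completion(cid: int, expected_genctr: int) -> bool:
    """A completion whose generation does not match the one currently
    expected for that tag is treated as a duplicate or stale completion."""
    gen, _tag = split_cid(cid)
    return gen != (expected_genctr & GEN_MASK)
```

Under these assumptions, a queue deeper than 4095 would need tag bits that
are currently spent on the generation counter, so raising the limit means
giving up some or all of the duplicate-completion check.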
* Re: Increase maxsize of io_queue_depth for nvme driver?
[not found] ` <5836da67-1fdd-43ac-a039-58705ba1f370@nvidia.com>
@ 2025-09-19 8:13 ` Apachez
0 siblings, 0 replies; 3+ messages in thread
From: Apachez @ 2025-09-19 8:13 UTC (permalink / raw)
To: Chaitanya Kulkarni; +Cc: Keith Busch, linux-nvme@lists.infradead.org
On Fri, Sep 19, 2025 at 2:29 AM Chaitanya Kulkarni
<chaitanyak@nvidia.com> wrote:
>
> On 9/14/25 7:31 AM, Keith Busch wrote:
>
> On Sun, Sep 14, 2025 at 02:45:35PM +0200, Apachez wrote:
>
> I would like to propose that NVME_PCI_MAX_QUEUE_SIZE should be
> increased from 4095 to 32767 to match the current NVMe specification
> regarding MQES and to give the sysop ability to fully utilize the
> performance of the hardware being used, or am I missing something
> here?
>
> We use the upper bits of the command id to detect duplicate completions,
> so we don't have enough bits to tag commands beyond 4095. As far as I
> know, though, we haven't seen such breakage in a *long* time. It's more
> of a sanity thing to know with high certainty that duplicate completions
> are not occurring.
>
> But I don't think you'll see any performance difference by increasing
> the queue depth. Devices saturate the link at far lower already, so
> going higher just increases completion latency.
>
> Apachez, can you share performance numbers that show a clear win
> with the above-mentioned queue depth?
>
> -ck
Hi,
I currently don't have any hard data showing that an io_queue_depth larger
than 4095 would improve anything. My point is only that the NVMe
specification defines MQES as a 15 bit number, meaning (as I interpret it)
that NVME_PCI_MAX_QUEUE_SIZE should allow up to 32767 rather than 4095.
And one of the advantages of NVMe storage over SATA or SAS HDDs and SSDs
is support for a larger queue size.
It's often claimed to be something like:
SATA <= 32, SAS <= 256, NVMe <= 65535.
As an example, the Micron 7450 MAX datasheet states that the drive
supports a queue size of 8192, but Linux currently only lets me set
this to 4095 at most.
Using a large io_queue_depth doesn't seem to hurt on the above NVMe drives.
I'm currently using this for ZFS and NVMe:
# Set maximum number of I/Os active to each device
# Should be equal or greater than sum of each queues *_max_active
# Normally SATA <= 32, SAS <= 256, NVMe <= 65535.
# To find out supported max queue for NVMe:
# nvme show-regs -H /dev/nvmeX | grep -i 'Maximum Queue Entries Supported'
# For NVMe should match /sys/module/nvme/parameters/io_queue_depth
# nvme.io_queue_depth limits are >= 2 and <= 4095
options zfs zfs_vdev_max_active=4095
options nvme io_queue_depth=4095
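
As a rough way to sanity-check a modprobe config like the one above, the
following Python sketch parses the "options" lines and flags an nvme
io_queue_depth outside the 2..4095 range mentioned earlier. The helper
names are hypothetical and the parser only handles the simple
"options <module> <name>=<value>" form shown here.

```python
EXAMPLE = """\
# nvme.io_queue_depth limits are >= 2 and <= 4095
options zfs zfs_vdev_max_active=4095
options nvme io_queue_depth=4095
"""


def parse_modprobe_options(text: str) -> dict:
    """Map (module, parameter) -> value string for each 'options' line."""
    params = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, tokenize
        if len(fields) < 3 or fields[0] != "options":
            continue
        module = fields[1]
        for assignment in fields[2:]:
            name, sep, value = assignment.partition("=")
            if sep:
                params[(module, name)] = value
    return params


def check_depths(text: str, lo: int = 2, hi: int = 4095) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    depth = parse_modprobe_options(text).get(("nvme", "io_queue_depth"))
    if depth is not None and not lo <= int(depth) <= hi:
        problems.append(f"nvme io_queue_depth={depth} is outside {lo}..{hi}")
    return problems
```

For the config above, check_depths(EXAMPLE) returns an empty list, while a
value such as 8192 would be reported as out of range.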
Kind Regards
Apachez