* [PATCH v1] mpt3sas: Limit NVMe request size to 2 MiB
@ 2026-04-09 18:42 Ranjan Kumar
2026-04-10 4:11 ` Damien Le Moal
0 siblings, 1 reply; 4+ messages in thread
From: Ranjan Kumar @ 2026-04-09 18:42 UTC (permalink / raw)
To: linux-scsi, martin.petersen
Cc: sathya.prakash, chandrakanth.patil, dlemoal, Ranjan Kumar, stable,
Mira Limbeck, Keith Busch
Some firmware reports NVMe maximum transfer sizes that follow the drive
capability. When those values are very large, the block layer may build
I/O that this driver cannot handle, which can cause a kernel oops.
When an NVMe device is set up, cap how large a single transfer may be
to the smaller of the firmware-reported limit and roughly two mebibytes
with a small margin. If no valid limit is reported, apply the same
upper bound.
Cc: stable@vger.kernel.org
Fixes: 9b8b84879d4a ("block: Increase BLK_DEF_MAX_SECTORS_CAP")
Reported-by: Mira Limbeck <m.limbeck@proxmox.com>
Closes: https://lore.kernel.org/r/291f78bf-4b4a-40dd-867d-053b36c564b3@proxmox.com
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b8b84879d4a
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 6ff788557294..b6abc83d8121 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -54,6 +54,7 @@
#include <linux/interrupt.h>
#include <linux/raid_class.h>
#include <linux/unaligned.h>
+#include <linux/sizes.h>
#include "mpt3sas_base.h"
@@ -2738,8 +2739,17 @@ scsih_sdev_configure(struct scsi_device *sdev, struct queue_limits *lim)
pcie_device->enclosure_level,
pcie_device->connector_name);
+ /*
+ * Firmware may report NVMe MDTS from the drive; values above
+ * what the driver can handle can cause a kernel oops. Cap queue
+ * I/O in sectors to min(MDTS, 2 MiB - 4096 B).
+ */
if (pcie_device->nvme_mdts)
- lim->max_hw_sectors = pcie_device->nvme_mdts / 512;
+ lim->max_hw_sectors = min_t(u32,
+ pcie_device->nvme_mdts / 512,
+ (SZ_2M / 512) - 8);
+ else
+ lim->max_hw_sectors = (SZ_2M / 512) - 8;
pcie_device_put(pcie_device);
spin_unlock_irqrestore(&ioc->pcie_device_lock, flags);
--
2.47.3
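
[Editor's note] The clamp in the hunk above can be sketched as a standalone function to make the arithmetic explicit. This is a minimal illustration, not driver code: nvme_cap_sectors() and SECTOR_SIZE are hypothetical names, and SZ_2M is redefined locally with the same value as the kernel macro.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u
#define SZ_2M 0x00200000u /* local copy; same value as the kernel's SZ_2M */

/*
 * Hypothetical helper mirroring the v1 logic: return the sector limit
 * for a firmware-reported MDTS given in bytes (0 means "not reported").
 */
static uint32_t nvme_cap_sectors(uint32_t mdts_bytes)
{
	/* 4096 sectors (2 MiB) minus 8 sectors (4096 B) = 4088 sectors */
	const uint32_t cap = (SZ_2M / SECTOR_SIZE) - 8;
	uint32_t sectors;

	if (!mdts_bytes)
		return cap;

	sectors = mdts_bytes / SECTOR_SIZE;
	return sectors < cap ? sectors : cap;
}
```

With these definitions, an MDTS of 1 MiB passes through unchanged as 2048 sectors, while an MDTS of 8 MiB or a missing MDTS both yield 4088 sectors.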
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v1] mpt3sas: Limit NVMe request size to 2 MiB
2026-04-09 18:42 [PATCH v1] mpt3sas: Limit NVMe request size to 2 MiB Ranjan Kumar
@ 2026-04-10 4:11 ` Damien Le Moal
2026-04-10 16:51 ` Keith Busch
0 siblings, 1 reply; 4+ messages in thread
From: Damien Le Moal @ 2026-04-10 4:11 UTC (permalink / raw)
To: Ranjan Kumar, linux-scsi, martin.petersen
Cc: sathya.prakash, chandrakanth.patil, stable, Mira Limbeck,
Keith Busch
On 2026/04/09 20:42, Ranjan Kumar wrote:
> Some firmware reports NVMe maximum transfer sizes that follow the drive
> capability. When those values are very large, the block layer may build
> I/O that this driver cannot handle, which can cause a kernel oops.
>
> When an NVMe device is set up, cap how large a single transfer may be
> to the smaller of the firmware-reported limit and roughly two mebibytes
> with a small margin. If no valid limit is reported, apply the same
> upper bound.
>
> Cc: stable@vger.kernel.org
> Fixes: 9b8b84879d4a ("block: Increase BLK_DEF_MAX_SECTORS_CAP")
> Reported-by: Mira Limbeck <m.limbeck@proxmox.com>
> Closes: https://lore.kernel.org/r/291f78bf-4b4a-40dd-867d-053b36c564b3@proxmox.com
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b8b84879d4a
> Suggested-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
> ---
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 6ff788557294..b6abc83d8121 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -54,6 +54,7 @@
> #include <linux/interrupt.h>
> #include <linux/raid_class.h>
> #include <linux/unaligned.h>
> +#include <linux/sizes.h>
>
> #include "mpt3sas_base.h"
>
> @@ -2738,8 +2739,17 @@ scsih_sdev_configure(struct scsi_device *sdev, struct queue_limits *lim)
> pcie_device->enclosure_level,
> pcie_device->connector_name);
>
> + /*
> + * Firmware may report NVMe MDTS from the drive; values above
> + * what the driver can handle can cause a kernel oops. Cap queue
> + * I/O in sectors to min(MDTS, 2 MiB - 4096 B).
> + */
This comment has some grammar issues and is really hard to parse/understand.
> if (pcie_device->nvme_mdts)
> - lim->max_hw_sectors = pcie_device->nvme_mdts / 512;
> + lim->max_hw_sectors = min_t(u32,
> + pcie_device->nvme_mdts / 512,
> + (SZ_2M / 512) - 8);
> + else
> + lim->max_hw_sectors = (SZ_2M / 512) - 8;
I am very confused here: SZ_2M assumes that you have an SSD with a minimum page
size of 4K, which can fit 4K / 8 = 512 PRP entries, each referencing 4K (one
page), so a maximum of 2MiB. However, if I am not mistaken, there is nothing in
nvme specs that forces the MPS field to be 0 (which leads to a page size of 4K).
So this seems incorrect to me, even though that will probably work for the vast
majority of SSDs out there, some exotic ones will not be correctly supported.
Keith ? Am I missing something here ?
Or do we simply not care about SSDs with a minimum page size > 4K having
their maximum command size truncated ?
>
> pcie_device_put(pcie_device);
> spin_unlock_irqrestore(&ioc->pcie_device_lock, flags);
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v1] mpt3sas: Limit NVMe request size to 2 MiB
2026-04-10 4:11 ` Damien Le Moal
@ 2026-04-10 16:51 ` Keith Busch
2026-04-11 5:49 ` Ranjan Kumar
0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2026-04-10 16:51 UTC (permalink / raw)
To: Damien Le Moal
Cc: Ranjan Kumar, linux-scsi, martin.petersen, sathya.prakash,
chandrakanth.patil, stable, Mira Limbeck
On Fri, Apr 10, 2026 at 06:11:22AM +0200, Damien Le Moal wrote:
> On 2026/04/09 20:42, Ranjan Kumar wrote:
>
> > if (pcie_device->nvme_mdts)
> > - lim->max_hw_sectors = pcie_device->nvme_mdts / 512;
> > + lim->max_hw_sectors = min_t(u32,
> > + pcie_device->nvme_mdts / 512,
> > + (SZ_2M / 512) - 8);
> > + else
> > + lim->max_hw_sectors = (SZ_2M / 512) - 8;
>
> I am very confused here: SZ_2M assumes that you have an SSD with a minimum page
> size of 4K, which can fit 4K / 8 = 512 PRP entries, each referencing 4K (one
> page), so a maximum of 2MiB. However, if I am not mistaken, there is nothing in
> nvme specs that forces the MPS field to be 0 (which leads to a page size of 4K).
>
> So this seems incorrect to me, even though that will probably work for the vast
> majority of SSDs out there, some exotic ones will not be correctly supported.
>
> Keith ? Am I missing something here ?
>
> Or do we simply not care about SSDs with a minimum page size > 4K having
> their maximum command size truncated ?
Spec doesn't require it, but industry converged on that as always being
the minimum supported page size. The nvme driver rejects any device that
doesn't support 4k pages because they can't be reliably supported on a
lot of archs, even ones with larger page sizes. So it should be a safe
assumption that everyone supports 4k since no one is complaining. :)
On the patch, I initially left the "- 8" in the calculation to account
for page offsets. But it's not necessary because that gets absorbed in
PRP1 within the command, so we'd have at most 512 entries in the PRP
list for a 2M transfer.
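
[Editor's note] The PRP arithmetic discussed above can be checked with a short sketch. It assumes 4 KiB controller pages (MPS = 0) and 8-byte PRP entries, as stated in the thread; prp_list_entries() and prp_entries_per_page() are hypothetical helpers, not functions from mpt3sas or the nvme driver.

```c
#include <stdint.h>

#define NVME_PAGE_SIZE 4096u /* assumes MPS = 0, i.e. 4 KiB pages */
#define PRP_ENTRY_SIZE 8u    /* each PRP entry is a 64-bit address */

/* Number of PRP entries that fit in one page-sized PRP list. */
static uint32_t prp_entries_per_page(void)
{
	return NVME_PAGE_SIZE / PRP_ENTRY_SIZE;
}

/*
 * PRP list entries needed for a transfer of len bytes whose buffer
 * starts at the given byte offset into a page. PRP1 in the command
 * covers the first, possibly partial, page; only the remainder needs
 * list entries.
 */
static uint32_t prp_list_entries(uint32_t len, uint32_t offset)
{
	uint32_t first = NVME_PAGE_SIZE - (offset % NVME_PAGE_SIZE);

	if (len <= first)
		return 0;

	return (len - first + NVME_PAGE_SIZE - 1) / NVME_PAGE_SIZE;
}
```

Under these assumptions a page-aligned 2 MiB transfer needs 511 list entries and an unaligned one needs 512, which does not exceed the 512 entries of a single list page. That is consistent with the point that the "- 8" margin is not needed.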
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v1] mpt3sas: Limit NVMe request size to 2 MiB
2026-04-10 16:51 ` Keith Busch
@ 2026-04-11 5:49 ` Ranjan Kumar
0 siblings, 0 replies; 4+ messages in thread
From: Ranjan Kumar @ 2026-04-11 5:49 UTC (permalink / raw)
To: Keith Busch, Damien Le Moal
Cc: linux-scsi, martin.petersen, sathya.prakash, chandrakanth.patil,
stable, Mira Limbeck
Hi Damien and Keith,
Thank you both for the review and the clarifications. I will submit the v2 patch.
On Fri, Apr 10, 2026 at 10:21 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Apr 10, 2026 at 06:11:22AM +0200, Damien Le Moal wrote:
> > On 2026/04/09 20:42, Ranjan Kumar wrote:
> >
> > > if (pcie_device->nvme_mdts)
> > > - lim->max_hw_sectors = pcie_device->nvme_mdts / 512;
> > > + lim->max_hw_sectors = min_t(u32,
> > > + pcie_device->nvme_mdts / 512,
> > > + (SZ_2M / 512) - 8);
> > > + else
> > > + lim->max_hw_sectors = (SZ_2M / 512) - 8;
> >
> > I am very confused here: SZ_2M assumes that you have an SSD with a minimum page
> > size of 4K, which can fit 4K / 8 = 512 PRP entries, each referencing 4K (one
> > page), so a maximum of 2MiB. However, if I am not mistaken, there is nothing in
> > nvme specs that forces the MPS field to be 0 (which leads to a page size of 4K).
> >
> > So this seems incorrect to me, even though that will probably work for the vast
> > majority of SSDs out there, some exotic ones will not be correctly supported.
> >
> > Keith ? Am I missing something here ?
> >
> > Or do we simply not care about SSDs with a minimum page size > 4K having
> > their maximum command size truncated ?
>
> Spec doesn't require it, but industry converged on that as always being
> the minimum supported page size. The nvme driver rejects any device that
> doesn't support 4k pages because they can't be reliably supported on a
> lot of archs, even ones with larger page sizes. So it should be a safe
> assumption that everyone supports 4k since no one is complaining. :)
>
> On the patch, I initially left the "- 8" in the calculation to account
> for page offsets. But it's not necessary because that gets absorbed in
> PRP1 within the command, so we'd have at most 512 entries in the PRP
> list for a 2M transfer.
^ permalink raw reply [flat|nested] 4+ messages in thread