public inbox for linux-scsi@vger.kernel.org
* [PATCH] scsi: megaraid_sas: fix PRP list out-of-bounds write
@ 2026-03-27  3:20 me
  2026-04-01  7:02 ` Thorsten Leemhuis
  2026-04-21  2:47 ` Martin K. Petersen
  0 siblings, 2 replies; 4+ messages in thread
From: me @ 2026-03-27  3:20 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org
  Cc: Kashyap Desai, Sumit Saxena, Shivasharan S, Chandrakanth patil,
	megaraidlinux.pdl@broadcom.com, regressions@lists.linux.dev

megasas_make_prp_nvme() builds NVMe PRP lists in cmd->sg_frame,
which is a DMA-pool allocation sized by instance->max_chain_frame_sz.

On the affected controller, max_chain_frame_sz is 4096 bytes. The
function stores 64-bit PRP entries in that buffer and, at each PRP-list
page boundary, uses the last slot for a chain pointer to the next page.

When ptr_sgl reaches offset 4088, the code stores the chain pointer
there, increments ptr_sgl, and then writes the next PRP entry at offset
4096, past the end of the allocation.

On an affected system this reproduces reliably on 6.19.10 during normal
I/O to an NVMe device behind a MegaRAID SAS39xx controller:

  BUG: unable to handle page fault for address: ff76d0e56380c000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  RIP: 0010:megasas_make_prp_nvme.isra.0+0x12f/0x220 [megaraid_sas]
  RAX: 0000000000000200

RAX=0x200 indicates the fault happens at slot index 512 (counting 8-byte
slots from zero), i.e. at byte offset 4096, the first byte past the end
of the 4096-byte chain frame.
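
The offset arithmetic above can be checked with a small userspace
sketch. It assumes the 4096-byte frame size reported for the affected
controller and, following the reading above, treats RAX as a slot index.
The macro and helper names are illustrative and do not exist in the
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: max_chain_frame_sz on the affected controller. */
#define CHAIN_FRAME_SZ 4096u
/* Each PRP entry is one 64-bit little-endian address. */
#define PRP_ENTRY_SZ   sizeof(uint64_t)

/* Number of 8-byte PRP slots that fit in a frame of frame_sz bytes. */
static size_t prp_slots(size_t frame_sz)
{
	assert(frame_sz % PRP_ENTRY_SZ == 0);
	return frame_sz / PRP_ENTRY_SZ;
}

/* Byte offset of the slot with zero-based index idx. */
static size_t slot_offset(size_t idx)
{
	return idx * PRP_ENTRY_SZ;
}

/* Nonzero if writing slot idx stays inside a frame of frame_sz bytes. */
static int slot_in_bounds(size_t idx, size_t frame_sz)
{
	return slot_offset(idx) + PRP_ENTRY_SZ <= frame_sz;
}
```

With these helpers, slot 511 sits at offset 4088 (the last valid slot),
while slot 0x200 sits at offset 4096 and is out of bounds.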

Fix this by checking that the chain frame still has room before writing:

- the page-boundary chain pointer plus at least one following PRP entry
- each PRP entry itself

If either check fails, return false and let the caller use the existing
IEEE SGL fallback path.

Tested on:
- ASUS ESC8000A-E13
- 2x AMD EPYC 9335
- Broadcom MegaRAID 9560-16i / SAS39xx
- KIOXIA 14TB NVMe behind the controller

Before this patch, 6.19.10 crashed repeatedly during boot and normal
disk I/O. After applying it, the system boots cleanly and completes 4GB
direct-I/O reads without crashes.

Cc: stable@vger.kernel.org
Signed-off-by: Lukasz Magiera <me@magik.net>
---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 4e498a6..29fa7f1 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2225,6 +2225,18 @@ megasas_make_prp_nvme(struct megasas_instance *instance, struct scsi_cmnd *scmd,
 		/* Put PRP pointer due to page boundary*/
 		page_mask_result = (uintptr_t)(ptr_sgl + 1) & page_mask;
 		if (unlikely(!page_mask_result)) {
+			/*
+			 * Bounds check: if the chain frame buffer cannot
+			 * fit the chain pointer plus at least one more
+			 * PRP entry, bail out to IEEE SGL fallback.
+			 * This prevents writing past the end of the
+			 * DMA-allocated chain frame buffer.
+			 */
+			if ((num_prp_in_chain + 2) * sizeof(u64) >
+			    instance->max_chain_frame_sz) {
+				build_prp = false;
+				break;
+			}
 			scmd_printk(KERN_NOTICE,
 				    scmd, "page boundary ptr_sgl: 0x%p\n",
 				    ptr_sgl);
@@ -2234,6 +2246,13 @@ megasas_make_prp_nvme(struct megasas_instance *instance, struct scsi_cmnd *scmd,
 			num_prp_in_chain++;
 		}
 
+		/* Bounds check: ensure space for this PRP entry */
+		if ((num_prp_in_chain + 1) * sizeof(u64) >
+		    instance->max_chain_frame_sz) {
+			build_prp = false;
+			break;
+		}
+
 		*ptr_sgl = cpu_to_le64(sge_addr);
 		ptr_sgl++;
 		ptr_sgl_phys += 8;
-- 
2.43.0
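
For readers following the thread, the two bounds checks in the patch
above can be modelled in userspace. This is only a sketch, not driver
code: build_prp_model() and its parameters are hypothetical, the frame
is assumed to start on a PRP-list page boundary (the driver tests the
real ptr_sgl address instead), and the stored values are placeholders
rather than DMA addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SZ sizeof(uint64_t)

/*
 * Fill `frame` (frame_sz bytes) with nr_entries PRP entries, using the
 * last slot of every page_sz-sized page for a chain pointer.
 * Returns false, without writing out of bounds, when the list does not
 * fit; the driver would then fall back to the IEEE SGL path.
 */
static bool build_prp_model(uint64_t *frame, size_t frame_sz,
			    size_t page_sz, size_t nr_entries,
			    size_t *used)
{
	size_t n = 0;	/* slots written so far, like num_prp_in_chain */

	assert(frame && used);

	for (size_t i = 0; i < nr_entries; i++) {
		/* Is slot n the last slot of a PRP-list page? */
		if (((n + 1) * ENTRY_SZ) % page_sz == 0) {
			/* Need the chain pointer plus one more entry. */
			if ((n + 2) * ENTRY_SZ > frame_sz)
				return false;
			/* Placeholder chain pointer: offset of next page. */
			frame[n] = (n + 1) * ENTRY_SZ;
			n++;
		}

		/* Need room for this PRP entry itself. */
		if ((n + 1) * ENTRY_SZ > frame_sz)
			return false;
		frame[n++] = 0x1000 * (uint64_t)(i + 1); /* dummy address */
	}

	*used = n;
	return true;
}
```

In this model a 4096-byte frame with 4096-byte pages accepts 511
entries and rejects the 512th, which is the point where the unpatched
code would have written at offset 4096.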





* Re: [PATCH] scsi: megaraid_sas: fix PRP list out-of-bounds write
From: Thorsten Leemhuis @ 2026-04-01  7:02 UTC (permalink / raw)
  To: me, linux-scsi@vger.kernel.org
  Cc: Kashyap Desai, Sumit Saxena, Shivasharan S, Chandrakanth patil,
	megaraidlinux.pdl@broadcom.com, regressions@lists.linux.dev

On 3/27/26 04:20, me@magik.net wrote:
> megasas_make_prp_nvme() builds NVMe PRP lists in cmd->sg_frame,
> which is a DMA-pool allocation sized by instance->max_chain_frame_sz.
> [...]
> Before this patch, 6.19.10 crashed repeatedly during boot and normal
> disk I/O. After applying it, the system boots cleanly and completes 4GB
> direct-I/O reads without crashes.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Lukasz Magiera <me@magik.net>

You CCed the regression list, but this lacks a Fixes: tag, which makes
me wonder: what change caused the problem? That tag would also help the
stable team to see where this needs to be applied, so it most likely is
needed.

> [...]

Ciao, Thorsten


* Re: [PATCH] scsi: megaraid_sas: fix PRP list out-of-bounds write
From: Daniel Fernau @ 2026-04-02 13:25 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: me, linux-scsi@vger.kernel.org, Kashyap Desai, Sumit Saxena,
	Shivasharan S, Chandrakanth patil, megaraidlinux.pdl@broadcom.com,
	regressions@lists.linux.dev


I tested this on an HPE MR416i-o Gen11 with 6.17.13-2-pve and can
reproduce the crash reliably with large direct-I/O through Ceph.

I also rebuilt and loaded the posted patched megaraid_sas module on that
system and verified that the patched module was actually in use after
reboot (module path, hash, srcversion, and vermagic all matched the test
build).

With the patch loaded, the crash still occurs for my reproducer, but the
failure mode appears to have changed.

Before the patch, I was hitting the PRP-list boundary condition, with
the driver logging:

  page boundary ptr_sgl: ...

and then faulting in the PRP construction path.

With the patch loaded, addr2line now resolves the fault into
megasas_make_prp_nvme() at the SG-advance path:

  megasas_make_prp_nvme()
    drivers/scsi/megaraid/megaraid_sas_fusion.c:2271

which corresponds to:

  sg_scmd = sg_next(sg_scmd);
  sge_addr = sg_dma_address(sg_scmd);
  sge_len = sg_dma_len(sg_scmd);

The relevant disassembly also lines up with that sequence. So at least on
my hardware and workload, the current bounds-check patch is not
sufficient by itself. It looks like there is an additional unsafe path
when advancing to the next SG entry and dereferencing it.

In other words, the posted patch may address the PRP-frame boundary
write, but my reproducer still reaches another failure in
megasas_make_prp_nvme() afterwards.

I am going to investigate this further over the next few days, including
whether an additional guard is needed around sg_next() / sg_dma_address()
/ sg_dma_len() in the PRP builder, and I will report back with results.
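
As a purely illustrative userspace sketch of the kind of guard meant
here: in the kernel, sg_next() returns NULL once the last scatterlist
entry has been reached, so an unconditional sg_dma_address() on its
result would dereference NULL. Whether that is what happens in this
reproducer is not established. struct seg and advance_seg() below are
hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA-mapped scatterlist entry. */
struct seg {
	uint64_t dma_addr;
	uint32_t dma_len;
	struct seg *next;	/* NULL at the end, like sg_next() */
};

/*
 * Advance *cur to the next segment and load its address and length.
 * Returns false, leaving *cur, *addr and *len untouched, when there is
 * no further segment, instead of dereferencing a NULL pointer.
 */
static bool advance_seg(struct seg **cur, uint64_t *addr, uint32_t *len)
{
	struct seg *next;

	assert(cur && *cur && addr && len);

	next = (*cur)->next;
	if (!next)
		return false;

	*cur = next;
	*addr = next->dma_addr;
	*len = next->dma_len;
	return true;
}
```

A caller in this model would treat a false return the same way as a
failed bounds check, i.e. stop building the PRP list.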

For reference, this is on an HPE MR416i-o Gen11, and I can still
reproduce it with the patched module definitely loaded.


Best,
Daniel







* Re: [PATCH] scsi: megaraid_sas: fix PRP list out-of-bounds write
From: Martin K. Petersen @ 2026-04-21  2:47 UTC (permalink / raw)
  To: me
  Cc: linux-scsi@vger.kernel.org, Kashyap Desai, Sumit Saxena,
	Shivasharan S, Chandrakanth patil, megaraidlinux.pdl@broadcom.com,
	regressions@lists.linux.dev


> megasas_make_prp_nvme() builds NVMe PRP lists in cmd->sg_frame,
> which is a DMA-pool allocation sized by instance->max_chain_frame_sz.

Broadcom: Please comment and review!

-- 
Martin K. Petersen
