[PATCH v2] block: Increase BLK_DEF_MAX_SECTORS

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
@ 2025-06-18  6:00 Damien Le Moal
  2025-06-18  6:17 ` Hannes Reinecke
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Damien Le Moal @ 2025-06-18  6:00 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, Martin K . Petersen

Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
2560") increased the default maximum size of a block device I/O to 2560
sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
chunk size 128k". This choice is rather arbitrary and since then,
improvements to the block layer have software RAID drivers correctly
advertize their stripe width through chunk_sectors and abuses of
BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
default user controlled maximum I/O size) have been fixed.

Since many block devices can benefit from a larger value of
BLK_DEF_MAX_SECTORS_CAP, and in particular HDDs, increase this value to
be 4MiB, or 8192 sectors.

And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
and should not be used by drivers directly, move this macro definition
to the block layer internal header file block/blk.h.

Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
Changes from v1:
 - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h
 - Define the macro value using SZ_4M to make it more readable
 - Added review tag

 block/blk.h            | 9 +++++++++
 include/linux/blkdev.h | 9 ---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 37ec459fe656..1141b343d0b5 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -13,6 +13,15 @@
 
 struct elevator_type;
 
+/*
+ * Default upper limit for the software max_sectors limit used for regular I/Os.
+ * This can be increased through sysfs.
+ *
+ * This should not be confused with the max_hw_sector limit that is entirely
+ * controlled by the block device driver, usually based on hardware limits.
+ */
+#define BLK_DEF_MAX_SECTORS_CAP	(SZ_4M >> SECTOR_SHIFT)
+
 #define	BLK_DEV_MAX_SECTORS	(LLONG_MAX >> 9)
 #define	BLK_MIN_SEGMENT_SIZE	4096
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 85aab8bc96e7..c2b3ddea8b6d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1231,15 +1231,6 @@ enum blk_default_limits {
 	BLK_SEG_BOUNDARY_MASK	= 0xFFFFFFFFUL,
 };
 
-/*
- * Default upper limit for the software max_sectors limit used for
- * regular file system I/O.  This can be increased through sysfs.
- *
- * Not to be confused with the max_hw_sector limit that is entirely
- * controlled by the driver, usually based on hardware limits.
- */
-#define BLK_DEF_MAX_SECTORS_CAP	2560u
-
 static inline struct queue_limits *bdev_limits(struct block_device *bdev)
 {
 	return &bdev_get_queue(bdev)->limits;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
@ 2025-06-18  6:17 ` Hannes Reinecke
  2025-06-18  8:51 ` Johannes Thumshirn
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2025-06-18  6:17 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, linux-block
  Cc: Christoph Hellwig, Martin K . Petersen

On 6/18/25 08:00, Damien Le Moal wrote:
> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
> 2560") increased the default maximum size of a block device I/O to 2560
> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
> chunk size 128k". This choice is rather arbitrary and since then,
> improvements to the block layer have software RAID drivers correctly
> advertize their stripe width through chunk_sectors and abuses of
> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
> default user controlled maximum I/O size) have been fixed.
> 
> Since many block devices can benefit from a larger value of
> BLK_DEF_MAX_SECTORS_CAP, and in particular HDDs, increase this value to
> be 4MiB, or 8192 sectors.
> 
> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
> and should not be used by drivers directly, move this macro definition
> to the block layer internal header file block/blk.h.
> 
> Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> Changes from v1:
>   - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h
>   - Define the macro value using SZ_4M to make it more readable
>   - Added review tag
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
  2025-06-18  6:17 ` Hannes Reinecke
@ 2025-06-18  8:51 ` Johannes Thumshirn
  2025-06-18  9:06 ` John Garry
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Johannes Thumshirn @ 2025-06-18  8:51 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, linux-block@vger.kernel.org
  Cc: hch, Martin K . Petersen

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
  2025-06-18  6:17 ` Hannes Reinecke
  2025-06-18  8:51 ` Johannes Thumshirn
@ 2025-06-18  9:06 ` John Garry
  2025-06-18  9:47   ` Damien Le Moal
  2025-06-18 10:19 ` Martin K. Petersen
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: John Garry @ 2025-06-18  9:06 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, linux-block
  Cc: Christoph Hellwig, Martin K . Petersen

On 18/06/2025 07:00, Damien Le Moal wrote:
> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
> 2560") increased the default maximum size of a block device I/O to 2560
> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
> chunk size 128k". This choice is rather arbitrary and since then,
> improvements to the block layer have software RAID drivers correctly
> advertize their stripe width through chunk_sectors and abuses of
> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
> default user controlled maximum I/O size) have been fixed.
> 
> Since many block devices can benefit from a larger value of
> BLK_DEF_MAX_SECTORS_CAP, and in particular HDDs, increase this value to
> be 4MiB, or 8192 sectors.
> 
> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
> and should not be used by drivers directly, move this macro definition
> to the block layer internal header file block/blk.h.
> 
> Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

Regardless of comment below:
Reviewed-by: John Garry <john.g.garry@oracle.com>

> ---
> Changes from v1:
>   - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h

it's only referenced in blk-settings.c, so I don't know why it doesn't 
live there.

However it is co-located with enum blk_default_limits and the same 
comment goes for members of enum blk_default_limits. I think all those 
in enum blk_default_limits could potentially be moved to blk-settings.c 
after Christoph's work for atomic queue limit updates.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  9:06 ` John Garry
@ 2025-06-18  9:47   ` Damien Le Moal
  0 siblings, 0 replies; 26+ messages in thread
From: Damien Le Moal @ 2025-06-18  9:47 UTC (permalink / raw)
  To: John Garry, Jens Axboe, linux-block
  Cc: Christoph Hellwig, Martin K . Petersen

On 6/18/25 18:06, John Garry wrote:
> On 18/06/2025 07:00, Damien Le Moal wrote:
>> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
>> 2560") increased the default maximum size of a block device I/O to 2560
>> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
>> chunk size 128k". This choice is rather arbitrary and since then,
>> improvements to the block layer have software RAID drivers correctly
>> advertize their stripe width through chunk_sectors and abuses of
>> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
>> default user controlled maximum I/O size) have been fixed.
>>
>> Since many block devices can benefit from a larger value of
>> BLK_DEF_MAX_SECTORS_CAP, and in particular HDDs, increase this value to
>> be 4MiB, or 8192 sectors.
>>
>> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
>> and should not be used by drivers directly, move this macro definition
>> to the block layer internal header file block/blk.h.
>>
>> Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> Regardless of comment below:
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> 
>> ---
>> Changes from v1:
>>   - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h
> 
> it's only referenced in blk-settings.c, so I don't know why it doesn't 
> live there.
> 
> However it is co-located with enum blk_default_limits and the same 
> comment goes for members of enum blk_default_limits. I think all those 
> in enum blk_default_limits could potentially be moved to blk-settings.c 
> after Christoph's work for atomic queue limit updates.

I actually checked that and a few drivers are still using 2 of the 4 enum defaults.

Jens,

DO you prefer we move BLK_DEF_MAX_SECTORS_CAP to blk-settings.c ? blk.h has a
couple of settings macro at the top, it is together with that for now.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
                   ` (2 preceding siblings ...)
  2025-06-18  9:06 ` John Garry
@ 2025-06-18 10:19 ` Martin K. Petersen
  2025-06-23 13:40 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2025-06-18 10:19 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen


Damien,

> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
> and should not be used by drivers directly, move this macro definition
> to the block layer internal header file block/blk.h.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
                   ` (3 preceding siblings ...)
  2025-06-18 10:19 ` Martin K. Petersen
@ 2025-06-23 13:40 ` Christoph Hellwig
  2025-06-24 16:49 ` Jens Axboe
  2026-03-31 12:02 ` Mira Limbeck
  6 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2025-06-23 13:40 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
                   ` (4 preceding siblings ...)
  2025-06-23 13:40 ` Christoph Hellwig
@ 2025-06-24 16:49 ` Jens Axboe
  2025-08-27  7:07   ` Sebastian Andrzej Siewior
  2026-03-31 12:02 ` Mira Limbeck
  6 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2025-06-24 16:49 UTC (permalink / raw)
  To: linux-block, Damien Le Moal; +Cc: Christoph Hellwig, Martin K . Petersen


On Wed, 18 Jun 2025 15:00:45 +0900, Damien Le Moal wrote:
> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
> 2560") increased the default maximum size of a block device I/O to 2560
> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
> chunk size 128k". This choice is rather arbitrary and since then,
> improvements to the block layer have software RAID drivers correctly
> advertize their stripe width through chunk_sectors and abuses of
> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
> default user controlled maximum I/O size) have been fixed.
> 
> [...]

Applied, thanks!

[1/1] block: Increase BLK_DEF_MAX_SECTORS_CAP
      commit: 345c5091ffec5d4d53d7fe572fef3bcc3805824b

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-24 16:49 ` Jens Axboe
@ 2025-08-27  7:07   ` Sebastian Andrzej Siewior
  2025-08-27  7:38     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27  7:07 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Damien Le Moal, Christoph Hellwig,
	Martin K . Petersen

On 2025-06-24 10:49:16 [-0600], Jens Axboe wrote:
> Applied, thanks!
> 
> [1/1] block: Increase BLK_DEF_MAX_SECTORS_CAP
>       commit: 345c5091ffec5d4d53d7fe572fef3bcc3805824b

I have here a PowerEdge R6525 which exposes a "DELLBOSS VD" device with
firmware MV.R00-0. I updated firmware left and right of the components I
could find but started with this commit I get:
|[   10.894688] ata1: SATA max UDMA/133 abar m2048@0xa6300000 port 0xa6300100 irq 97 lpm-pol 1
|[   11.233656] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
|[   11.267540] ata1.00: ATA-8: DELLBOSS VD, MV.R00-0, max UDMA7
|[   11.279106] ata1.00: 468731008 sectors, multi 0: LBA48 NCQ (depth 32)
|[   11.309332] ata1.00: Invalid log directory version 0x0000
|[   11.324380] ata1.00: Security Log not supported
|[   11.336514] ata1.00: Security Log not supported
|[   11.350523] ata1.00: configured for UDMA/133
|[   11.351026] scsi 0:0:0:0: Direct-Access     ATA      DELLBOSS VD      00-0 PQ: 0 ANSI: 5
|[   11.361416] scsi 0:0:0:0: Attached scsi generic sg0 type 0
|[   11.361928] sd 0:0:0:0: [sda] 468731008 512-byte logical blocks: (240 GB/224 GiB)
|[   11.361932] sd 0:0:0:0: [sda] 4096-byte physical blocks
|[   11.361942] sd 0:0:0:0: [sda] Write Protect is off
|[   11.361944] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
|[   11.361957] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
|[   11.361979] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
|[   12.654692] EXT4-fs (sda2): mounted filesystem abc6d95d-c676-442b-b22a-ce59dbdc47d3 ro with ordered data mode. Quota mode: none.
|[   14.497319] EXT4-fs (sda2): re-mounted abc6d95d-c676-442b-b22a-ce59dbdc47d3 r/w.
|[   67.619838] ata1: illegal qc_active transition (100000000->180005576)
|[   67.627051] ata1.00: Read log 0x10 page 0x00 failed, Emask 0x100
|[   67.633773] ata1: failed to read log page 10h (errno=-5)
|[   67.639802] ata1.00: exception Emask 0x1 SAct 0x80005576 SErr 0x0 action 0x6 frozen
|[   67.648156] ata1.00: irq_stat 0x40000008
|[   67.652785] ata1.00: failed command: WRITE FPDMA QUEUED
|[   67.658707] ata1.00: cmd 61/08:08:d0:85:12/00:00:00:00:00/40 tag 1 ncq dma 4096 out
|                         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation) 
…
|[   67.881878] ata1: hard resetting link
|[   68.194853] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
|[   68.201931] ata1.00: Security Log not supported
|[   68.207468] ata1.00: Security Log not supported
|[   68.212723] ata1.00: configured for UDMA/133
|[   68.527427] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
|[   68.534478] ata1.00: Security Log not supported
|[   68.539924] ata1.00: Security Log not supported
|[   68.545218] ata1.00: configured for UDMA/133
|[   68.550246] sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=50s
|[   68.560502] sd 0:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 00 50 08 00 00 00 10 00
|[   68.568502] I/O error, dev sda, sector 5244928 op 0x0:(READ) flags 0x83700 phys_seg 2 prio class 2

and this never recovers. After reverting 9b8b84879d4ad ("block: Increase
BLK_DEF_MAX_SECTORS_CAP") on top of v6.17-rc3 things are back to normal.

Did I forget to update firmware somewhere or is this "normal" and this
device requires a quirk?

> Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:07   ` Sebastian Andrzej Siewior
@ 2025-08-27  7:38     ` Christoph Hellwig
  2025-08-27  7:52       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-08-27  7:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jens Axboe, linux-block, Damien Le Moal, Christoph Hellwig,
	Martin K . Petersen

On Wed, Aug 27, 2025 at 09:07:05AM +0200, Sebastian Andrzej Siewior wrote:
> 
> I have here a PowerEdge R6525 which exposes a "DELLBOSS VD" device with
> firmware MV.R00-0. I updated firmware left and right of the components I
> could find but started with this commit I get:

...

> Did I forget to update firmware somewhere or is this "normal" and this
> device requires a quirk?

Looks like it needs a quirk.  Note that if the above commit triggered
this for you, you could also reproduce it before by say doing a large
direct I/O read.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:38     ` Christoph Hellwig
@ 2025-08-27  7:52       ` Sebastian Andrzej Siewior
  2025-08-27  8:00         ` Christoph Hellwig
  2025-08-27  8:01         ` Damien Le Moal
  0 siblings, 2 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27  7:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Damien Le Moal, Martin K . Petersen

On 2025-08-27 09:38:36 [+0200], Christoph Hellwig wrote:
> > Did I forget to update firmware somewhere or is this "normal" and this
> > device requires a quirk?
> 
> Looks like it needs a quirk.  

Just wanted to make sure I did not forget to update firmware somewhere…
It should be easy to fix this one the firmware's side (in case someone
capable is reading this).

>                               Note that if the above commit triggered
> this for you, you could also reproduce it before by say doing a large
> direct I/O read.

On a kernel without that commit in question? Booting Debian's current
v6.12 and
|  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct

works like a charm. According to strace it does
| openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
| dup2(3, 0)                              = 0
| lseek(0, 0, SEEK_CUR)                   = 0
| read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992

so it should be what you asked for. Asked for 1G, got ~800M.

Sebastian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:52       ` Sebastian Andrzej Siewior
@ 2025-08-27  8:00         ` Christoph Hellwig
  2025-08-27  8:03           ` Damien Le Moal
  2025-08-27  8:01         ` Damien Le Moal
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-08-27  8:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Damien Le Moal,
	Martin K . Petersen

On Wed, Aug 27, 2025 at 09:52:21AM +0200, Sebastian Andrzej Siewior wrote:
> > this for you, you could also reproduce it before by say doing a large
> > direct I/O read.
> 
> On a kernel without that commit in question? Booting Debian's current
> v6.12 and
> |  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct
> 
> works like a charm. According to strace it does
> | openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
> | dup2(3, 0)                              = 0
> | lseek(0, 0, SEEK_CUR)                   = 0
> | read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992
> 
> so it should be what you asked for. Asked for 1G, got ~800M.

This is probably splitting thing up into multiple bios because your
output memory is fragmented.  You'd have to do it into hugetlbfs or
vma otherwise backed by very larger folios.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  8:00         ` Christoph Hellwig
@ 2025-08-27  8:03           ` Damien Le Moal
  0 siblings, 0 replies; 26+ messages in thread
From: Damien Le Moal @ 2025-08-27  8:03 UTC (permalink / raw)
  To: Christoph Hellwig, Sebastian Andrzej Siewior
  Cc: Jens Axboe, linux-block, Martin K . Petersen

On 8/27/25 5:00 PM, Christoph Hellwig wrote:
> On Wed, Aug 27, 2025 at 09:52:21AM +0200, Sebastian Andrzej Siewior wrote:
>>> this for you, you could also reproduce it before by say doing a large
>>> direct I/O read.
>>
>> On a kernel without that commit in question? Booting Debian's current
>> v6.12 and
>> |  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct
>>
>> works like a charm. According to strace it does
>> | openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
>> | dup2(3, 0)                              = 0
>> | lseek(0, 0, SEEK_CUR)                   = 0
>> | read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992
>>
>> so it should be what you asked for. Asked for 1G, got ~800M.
> 
> This is probably splitting thing up into multiple bios because your
> output memory is fragmented.  You'd have to do it into hugetlbfs or
> vma otherwise backed by very larger folios.

and also need:

echo 4096 > /sys/block/sdX/queue/max_sectors_kb

or some large number.

But given that commit 345c5091ffec sets the default to 4MiB, I/Os are split to
4M and trigger the issue. So there is likely a cut-off command size < 4M where
things stop working with this adapter.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:52       ` Sebastian Andrzej Siewior
  2025-08-27  8:00         ` Christoph Hellwig
@ 2025-08-27  8:01         ` Damien Le Moal
  2025-08-27  8:42           ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 26+ messages in thread
From: Damien Le Moal @ 2025-08-27  8:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen

On 8/27/25 4:52 PM, Sebastian Andrzej Siewior wrote:
> On 2025-08-27 09:38:36 [+0200], Christoph Hellwig wrote:
>>> Did I forget to update firmware somewhere or is this "normal" and this
>>> device requires a quirk?
>>
>> Looks like it needs a quirk.  
> 
> Just wanted to make sure I did not forget to update firmware somewhere…
> It should be easy to fix this one the firmware's side (in case someone
> capable is reading this).
> 
>>                               Note that if the above commit triggered
>> this for you, you could also reproduce it before by say doing a large
>> direct I/O read.
> 
> On a kernel without that commit in question? Booting Debian's current
> v6.12 and
> |  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct

Don't read a file. Read the disk directly. So please use "if=/dev/sdX".
Also, there is no way that a 1GiB I/O will be done as a single large command.
That is not going to happen.

With 345c5091ffec reverted, what does:

cat /sys/block/sdX/queue/max_sectors_kb
cat /sys/block/sdX/queue/max_hw_sectors_kb

say ?

Likely, the first one is "1280". So before running dd, you need to do:

echo 4096 > /sys/block/sdX/queue/max_sectors_kb

and then

dd if=/dev/sdX of=/dev/null bs=4M count=1 iflag=direct

And you will likely trigger the issue, even with 345c5091ffec reverted.
The issue is likely caused by a FW bug handling large commands.
Please try.



> 
> works like a charm. According to strace it does
> | openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
> | dup2(3, 0)                              = 0
> | lseek(0, 0, SEEK_CUR)                   = 0
> | read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992
> 
> so it should be what you asked for. Asked for 1G, got ~800M.
> 
> Sebastian


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  8:01         ` Damien Le Moal
@ 2025-08-27  8:42           ` Sebastian Andrzej Siewior
  2025-08-27  9:01             ` Damien Le Moal
  0 siblings, 1 reply; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27  8:42 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen

On 2025-08-27 17:01:49 [+0900], Damien Le Moal wrote:
> Don't read a file. Read the disk directly. So please use "if=/dev/sdX".
> Also, there is no way that a 1GiB I/O will be done as a single large command.
> That is not going to happen.
> 
> With 345c5091ffec reverted, what does:
> 
> cat /sys/block/sdX/queue/max_sectors_kb
> cat /sys/block/sdX/queue/max_hw_sectors_kb
> 
> say ?

| # cat /sys/block/sda/queue/max_sectors_kb
| 1280
| # cat /sys/block/sda/queue/max_hw_sectors_kb
| 32767

> Likely, the first one is "1280". So before running dd, you need to do:
> 
> echo 4096 > /sys/block/sdX/queue/max_sectors_kb
> 
> and then
> 
> dd if=/dev/sdX of=/dev/null bs=4M count=1 iflag=direct

| # echo 4096 > /sys/block/sda/queue/max_sectors_kb
| # cat /sys/block/sda/queue/max_sectors_kb 
| 4096
| # dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.00966543 s, 434 MB/s

It passed.
After a reboot I issued the same dd command five times and all came
back. Then I increased the sector size and issued it again. The first
two came back and then

| root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 33.1699 s, 126 kB/s
| root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 57.3711 s, 73.1 kB/s
| root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0264171 s, 159 MB/s

They all came back but as you see on the speed side, it took while. And
I see
| [  191.641315] ata1.00: exception Emask 0x0 SAct 0x800000 SErr 0x0 action 0x6 frozen
| [  191.648839] ata1.00: failed command: READ FPDMA QUEUED
| [  191.653995] ata1.00: cmd 60/00:b8:00:00:00/20:00:00:00:00/40 tag 23 ncq dma 4194304 in
|                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
| [  191.669306] ata1.00: status: { DRDY }
| [  191.672981] ata1: hard resetting link
| [  192.702763] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
| [  192.702964] ata1.00: Security Log not supported
| [  192.703207] ata1.00: Security Log not supported
| [  192.703215] ata1.00: configured for UDMA/133
| [  192.703282] ata1: EH complete
| [  248.985303] ata1.00: exception Emask 0x0 SAct 0x10001 SErr 0x0 action 0x6 frozen
| [  248.992733] ata1.00: failed command: READ FPDMA QUEUED
| [  248.997889] ata1.00: cmd 60/00:00:00:00:00/20:00:00:00:00/40 tag 0 ncq dma 4194304 in
|                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
| [  249.013107] ata1.00: status: { DRDY }
| [  249.016775] ata1.00: failed command: WRITE FPDMA QUEUED
| [  249.022011] ata1.00: cmd 61/08:80:40:d1:18/00:00:00:00:00/40 tag 16 ncq dma 4096 out
|                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
| [  249.037135] ata1.00: status: { DRDY }
| [  249.040802] ata1: hard resetting link
| [  250.076059] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
| [  250.076258] ata1.00: Security Log not supported
| [  250.076471] ata1.00: Security Log not supported
| [  250.076478] ata1.00: configured for UDMA/133
| [  250.076537] ata1: EH complete

> And you will likely trigger the issue, even with 345c5091ffec reverted.
> The issue is likely caused by a FW bug handling large commands.
> Please try.
Done. It seems the firmware is not always dedicated to fulfill larger
requests.

Sebastian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  8:42           ` Sebastian Andrzej Siewior
@ 2025-08-27  9:01             ` Damien Le Moal
  2025-08-27 10:16               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 26+ messages in thread
From: Damien Le Moal @ 2025-08-27  9:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen

On 8/27/25 5:42 PM, Sebastian Andrzej Siewior wrote:
> On 2025-08-27 17:01:49 [+0900], Damien Le Moal wrote:
>> Don't read a file. Read the disk directly. So please use "if=/dev/sdX".
>> Also, there is no way that a 1GiB I/O will be done as a single large command.
>> That is not going to happen.
>>
>> With 345c5091ffec reverted, what does:
>>
>> cat /sys/block/sdX/queue/max_sectors_kb
>> cat /sys/block/sdX/queue/max_hw_sectors_kb
>>
>> say ?
> 
> | # cat /sys/block/sda/queue/max_sectors_kb
> | 1280
> | # cat /sys/block/sda/queue/max_hw_sectors_kb
> | 32767
> 
>> Likely, the first one is "1280". So before running dd, you need to do:
>>
>> echo 4096 > /sys/block/sdX/queue/max_sectors_kb
>>
>> and then
>>
>> dd if=/dev/sdX of=/dev/null bs=4M count=1 iflag=direct
> 
> | # echo 4096 > /sys/block/sda/queue/max_sectors_kb
> | # cat /sys/block/sda/queue/max_sectors_kb 
> | 4096
> | # dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.00966543 s, 434 MB/s
> 
> It passed.
> After a reboot I issued the same dd command five times and all came
> back. Then I increased the sector size and issued it again. The first
> two came back and then
> 
> | root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 33.1699 s, 126 kB/s
> | root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 57.3711 s, 73.1 kB/s
> | root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0264171 s, 159 MB/s
> 
> They all came back but as you see on the speed side, it took while. And
> I see
> | [  191.641315] ata1.00: exception Emask 0x0 SAct 0x800000 SErr 0x0 action 0x6 frozen
> | [  191.648839] ata1.00: failed command: READ FPDMA QUEUED
> | [  191.653995] ata1.00: cmd 60/00:b8:00:00:00/20:00:00:00:00/40 tag 23 ncq dma 4194304 in
> |                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> | [  191.669306] ata1.00: status: { DRDY }
> | [  191.672981] ata1: hard resetting link
> | [  192.702763] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> | [  192.702964] ata1.00: Security Log not supported
> | [  192.703207] ata1.00: Security Log not supported
> | [  192.703215] ata1.00: configured for UDMA/133
> | [  192.703282] ata1: EH complete
> | [  248.985303] ata1.00: exception Emask 0x0 SAct 0x10001 SErr 0x0 action 0x6 frozen
> | [  248.992733] ata1.00: failed command: READ FPDMA QUEUED
> | [  248.997889] ata1.00: cmd 60/00:00:00:00:00/20:00:00:00:00/40 tag 0 ncq dma 4194304 in
> |                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> | [  249.013107] ata1.00: status: { DRDY }
> | [  249.016775] ata1.00: failed command: WRITE FPDMA QUEUED
> | [  249.022011] ata1.00: cmd 61/08:80:40:d1:18/00:00:00:00:00/40 tag 16 ncq dma 4096 out
> |                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

The drive is not responding. So likely a drive/adapter FW bug.

> | [  249.037135] ata1.00: status: { DRDY }
> | [  249.040802] ata1: hard resetting link
> | [  250.076059] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> | [  250.076258] ata1.00: Security Log not supported
> | [  250.076471] ata1.00: Security Log not supported
> | [  250.076478] ata1.00: configured for UDMA/133
> | [  250.076537] ata1: EH complete
> 
>> And you will likely trigger the issue, even with 345c5091ffec reverted.
>> The issue is likely caused by a FW bug handling large commands.
>> Please try.
> Done. It seems the firmware is not always dedicated to fulfill larger
> requests.

Yep, looks like it.
What is the driver used for that "PowerEdge R6525 which exposes a "DELLBOSS VD"
device with firmware MV.R00-0" ? Is it the regular ahci driver ?
If yes, we can quirk it to limit the max command size, but we would need to
know what the limit is. That means repeating that test with varying max command
sise (max_sectors_kb) to try to figure out the threshold.

And maybe contact Dell support too if this is still a supported device.
(I have zero experience with this, no idea what that DELLBOSS VD is...)


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  9:01             ` Damien Le Moal
@ 2025-08-27 10:16               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27 10:16 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen

On 2025-08-27 18:01:58 [+0900], Damien Le Moal wrote:
> Yep, looks like it.
> What is the driver used for that "PowerEdge R6525 which exposes a "DELLBOSS VD"
> device with firmware MV.R00-0" ? Is it the regular ahci driver ?
> If yes, we can quirk it to limit the max command size, but we would need to
> know what the limit is. That means repeating that test with varying max command
> sise (max_sectors_kb) to try to figure out the threshold.

issued 'dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct'

| echo 3584 > /sys/block/sda/queue/max_sectors_kb

survives

| echo 4096 > /sys/block/sda/queue/max_sectors_kb
dies after a few attempts.

| lrwxrwxrwx 1 root root 0 Aug 27 11:01 0:0:0:0 -> ../../../devices/pci0000:a0/0000:a0:03.1/0000:a1:00.0/ata1/host0/target0:0:0/0:0:0:0
| a1:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 11)
|         Subsystem: Dell BOSS-S1 Adapter
|         Kernel driver in use: ahci
|         Kernel modules: ahci

so a Marvell one. Interesting.

> And maybe contact Dell support too if this is still a supported device.
> (I have zero experience with this, no idea what that DELLBOSS VD is...)

There are two physical disks which are behind this controller and
exposed as one virtual device after applying some raid magic. I have no
idea if this limitation is due to the physical device or the controller.

Now, how do I put this. The disk behind it is an INTEL SSDSCKKB240G8R.
The firmware XC31DL6P exposed this problem. After an update to XC31DL6R
the problem is gone.

Sebastian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
                   ` (5 preceding siblings ...)
  2025-06-24 16:49 ` Jens Axboe
@ 2026-03-31 12:02 ` Mira Limbeck
  2026-03-31 12:30   ` Mira Limbeck
  2026-03-31 19:48   ` Damien Le Moal
  6 siblings, 2 replies; 26+ messages in thread
From: Mira Limbeck @ 2026-03-31 12:02 UTC (permalink / raw)
  To: dlemoal; +Cc: axboe, hch, linux-block, martin.petersen, Friedrich Weber

Hi,


Some of our Proxmox VE users started seeing `unable to handle page
fault` after switching to our downstream kernel 6.17, and after
bisecting with the mainline kernel we've identified this patch as the
first commit (9b8b84879d4adc506b0d3944e20b28d9f3f6994b) where we see
those errors.

It requires a certain combination of hardware though, so far we've seen
this with a combination of:
Broadcom/LSI HBAs with NVMe support (9400, 9500)
KIOXIA KCD8 NVMes

The Hardware of our test machine consists of:

81:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI
Fusion-MPT 12GSAS/PCIe Secure SAS38xx [1000:00e6]
	Subsystem: Broadcom / LSI 9500-16i Tri-Mode HBA [1000:4050]
	Kernel driver in use: mpt3sas
	Kernel modules: mpt3sas

FW Package Ver(37.00.00.00)
SAS3816: FWVersion(37.00.00.00), ChipRevision(0x00)

and

Mar 31 10:52:44 pve-test-hba kernel: scsi 16:2:0:0: Direct-Access
NVMe     KIOXIA KCD8XRUG7 0105 PQ: 0 ANSI: 6

The 4 NVMes are exposed as SCSI devices via the Broadcom controller.

```
Mar 27 15:03:11 pve-test-hba kernel: sd 3:2:2:0: [sdc] tag#2463 page boundary curr_buff: 0x000000002ce04d26
Mar 27 15:03:11 pve-test-hba kernel: BUG: unable to handle page fault for address: ff76b049c31fe000
Mar 27 15:03:11 pve-test-hba kernel: #PF: supervisor write access in kernel mode
Mar 27 15:03:11 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
Mar 27 15:03:11 pve-test-hba kernel: PGD 100010067 P4D 1008c7067 PUD 1008c8067 PMD 11cd9f067 PTE 0
Mar 27 15:03:11 pve-test-hba kernel: Oops: Oops: 0002 [#1] SMP NOPTI
Mar 27 15:03:11 pve-test-hba kernel: CPU: 7 UID: 0 PID: 4385 Comm: dmcrypt_write/2 Tainted: G E 6.16.0-rc4-step14-00001-g9b8b84879d4a #16 PREEMPT(voluntary)
Mar 27 15:03:11 pve-test-hba kernel: Tainted: [E]=UNSIGNED_MODULE
Mar 27 15:03:11 pve-test-hba kernel: Hardware name: <snip>
Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
Mar 27 15:03:11 pve-test-hba kernel: Code: 20 48 83 c3 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 63 18 4c 89 e9 4c 8d 69 08 44 85 e8 74 31 45 29 d7 <4c> 89 31 49 83 c1 08 41 83 c0 01 45 29 d4 45 85 ff 7f af 4c 8b 75
Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170b8a8 EFLAGS: 00010206
Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000fff RBX: ff220bd05e8a0270 RCX: ff76b049c31fe000
Mar 27 15:03:11 pve-test-hba kernel: RDX: ff76b049c31fe008 RSI: 0000000000000000 RDI: 0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170b908 R08: 0000000000000200 R09: 00000000ff161000
Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000001a0000
Mar 27 15:03:11 pve-test-hba kernel: R13: ff76b049c31fe008 R14: 00000000f9600000 R15: 000000000019f000
Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
Mar 27 15:03:11 pve-test-hba kernel: Call Trace:
Mar 27 15:03:11 pve-test-hba kernel: <TASK>
Mar 27 15:03:11 pve-test-hba kernel: scsih_qcmd+0x37c/0x620 [mpt3sas]
Mar 27 15:03:11 pve-test-hba kernel: scsi_queue_rq+0x3ec/0xd30
Mar 27 15:03:11 pve-test-hba kernel: blk_mq_dispatch_rq_list+0x118/0x740
Mar 27 15:03:11 pve-test-hba kernel: ? sbitmap_get+0x73/0x180
Mar 27 15:03:11 pve-test-hba kernel: ? sbitmap_get+0x73/0x180
Mar 27 15:03:11 pve-test-hba kernel: __blk_mq_sched_dispatch_requests+0x3fc/0x5b0
Mar 27 15:03:11 pve-test-hba kernel: ? elv_attempt_insert_merge+0xa6/0x100
Mar 27 15:03:11 pve-test-hba kernel: blk_mq_sched_dispatch_requests+0x2d/0x70
Mar 27 15:03:11 pve-test-hba kernel: blk_mq_run_hw_queue+0x250/0x340
Mar 27 15:03:11 pve-test-hba kernel: blk_mq_dispatch_list+0x16c/0x450
Mar 27 15:03:11 pve-test-hba kernel: blk_mq_flush_plug_list+0x62/0x1e0
Mar 27 15:03:11 pve-test-hba kernel: blk_add_rq_to_plug+0xff/0x1f0
Mar 27 15:03:11 pve-test-hba kernel: blk_mq_submit_bio+0x616/0x7e0
Mar 27 15:03:11 pve-test-hba kernel: __submit_bio+0x74/0x290
Mar 27 15:03:11 pve-test-hba kernel: submit_bio_noacct_nocheck+0x1a2/0x3b0
Mar 27 15:03:11 pve-test-hba kernel: submit_bio_noacct+0x1a0/0x5b0
Mar 27 15:03:11 pve-test-hba kernel: dm_submit_bio_remap+0x49/0xb0
Mar 27 15:03:11 pve-test-hba kernel: dmcrypt_write+0x120/0x150 [dm_crypt]
Mar 27 15:03:11 pve-test-hba kernel: ? __pfx_dmcrypt_write+0x10/0x10 [dm_crypt]
Mar 27 15:03:11 pve-test-hba kernel: kthread+0x10a/0x230
Mar 27 15:03:11 pve-test-hba kernel: ? __pfx_kthread+0x10/0x10
Mar 27 15:03:11 pve-test-hba kernel: ret_from_fork+0x1d1/0x200
Mar 27 15:03:11 pve-test-hba kernel: ? __pfx_kthread+0x10/0x10
Mar 27 15:03:11 pve-test-hba kernel: ret_from_fork_asm+0x1a/0x30
Mar 27 15:03:11 pve-test-hba kernel: </TASK>
Mar 27 15:03:11 pve-test-hba kernel: Modules linked in: dm_crypt(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) nf_tables(E) sunrpc(E) iptable_raw(E) xt_CT(E) iptable_nat(E) xt>
Mar 27 15:03:11 pve-test-hba kernel: blake2b_generic(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) rndis_host(E) usbhid(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci>
Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000
Mar 27 15:03:11 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---
Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
Mar 27 15:03:11 pve-test-hba kernel: Code: 20 48 83 c3 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 63 18 4c 89 e9 4c 8d 69 08 44 85 e8 74 31 45 29 d7 <4c> 89 31 49 83 c1 08 41 83 c0 01 45 29 d4 45 85 ff 7f af 4c 8b 75
Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170b8a8 EFLAGS: 00010206
Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000fff RBX: ff220bd05e8a0270 RCX: ff76b049c31fe000
Mar 27 15:03:11 pve-test-hba kernel: RDX: ff76b049c31fe008 RSI: 0000000000000000 RDI: 0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170b908 R08: 0000000000000200 R09: 00000000ff161000
Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000001a0000
Mar 27 15:03:11 pve-test-hba kernel: R13: ff76b049c31fe008 R14: 00000000f9600000 R15: 000000000019f000
Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
Mar 27 15:03:11 pve-test-hba kernel: note: dmcrypt_write/2[4385] exited with irqs disabled
Mar 27 15:03:11 pve-test-hba kernel: ------------[ cut here ]------------
Mar 27 15:03:11 pve-test-hba kernel: WARNING: CPU: 7 PID: 4385 at kernel/exit.c:902 do_exit+0x7d3/0xa30
Mar 27 15:03:11 pve-test-hba kernel: Modules linked in: dm_crypt(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) nf_tables(E) sunrpc(E) iptable_raw(E) xt_CT(E) iptable_nat(E) xt>
Mar 27 15:03:11 pve-test-hba kernel: blake2b_generic(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) rndis_host(E) usbhid(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci>
Mar 27 15:03:11 pve-test-hba kernel: CPU: 7 UID: 0 PID: 4385 Comm: dmcrypt_write/2 Tainted: G D E 6.16.0-rc4-step14-00001-g9b8b84879d4a #16 PREEMPT(voluntary)
Mar 27 15:03:11 pve-test-hba kernel: Tainted: [D]=DIE, [E]=UNSIGNED_MODULE
Mar 27 15:03:11 pve-test-hba kernel: Hardware name: <snip>
Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:do_exit+0x7d3/0xa30
Mar 27 15:03:11 pve-test-hba kernel: Code: 48 89 45 c0 48 8b 83 50 0d 00 00 e9 44 fe ff ff 48 8b bb 10 0b 00 00 31 f6 e8 69 e2 ff ff e9 f7 fd ff ff 0f 0b e9 6b f8 ff ff <0f> 0b e9 72 f8 ff ff 4c 89 e6 bf 05 06 00 00 e8 49 46 01 00 e9 ab
Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170bec0 EFLAGS: 00010282
Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000246 RBX: ff220bd04a4e0000 RCX: 0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: RDX: 000000000000270f RSI: 0000000000002710 RDI: 0000000000000009
Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170bf10 R08: 0000000000000000 R09: 0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000001a0000
Mar 27 15:03:11 pve-test-hba kernel: R13: ff76b049c31fe008 R14: 00000000f9600000 R15: 000000000019f000
Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
Mar 27 15:03:11 pve-test-hba kernel: note: dmcrypt_write/2[4385] exited with irqs disabled
Mar 27 15:03:11 pve-test-hba kernel: ------------[ cut here ]------------
Mar 27 15:03:11 pve-test-hba kernel: WARNING: CPU: 7 PID: 4385 at kernel/exit.c:902 do_exit+0x7d3/0xa30
Mar 27 15:03:11 pve-test-hba kernel: Modules linked in: dm_crypt(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) nf_tables(E) sunrpc(E) iptable_raw(E) xt_CT(E) iptable_nat(E) xt>
Mar 27 15:03:11 pve-test-hba kernel: blake2b_generic(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) rndis_host(E) usbhid(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci>
Mar 27 15:03:11 pve-test-hba kernel: CPU: 7 UID: 0 PID: 4385 Comm: dmcrypt_write/2 Tainted: G D E 6.16.0-rc4-step14-00001-g9b8b84879d4a #16 PREEMPT(voluntary)
Mar 27 15:03:11 pve-test-hba kernel: Tainted: [D]=DIE, [E]=UNSIGNED_MODULE
Mar 27 15:03:11 pve-test-hba kernel: Hardware name: <snip>
Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:do_exit+0x7d3/0xa30
Mar 27 15:03:11 pve-test-hba kernel: Code: 48 89 45 c0 48 8b 83 50 0d 00 00 e9 44 fe ff ff 48 8b bb 10 0b 00 00 31 f6 e8 69 e2 ff ff e9 f7 fd ff ff 0f 0b e9 6b f8 ff ff <0f> 0b e9 72 f8 ff ff 4c 89 e6 bf 05 06 00 00 e8 49 46 01 00 e9 ab
Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170bec0 EFLAGS: 00010282
Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000246 RBX: ff220bd04a4e0000 RCX: 0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: RDX: 000000000000270f RSI: 0000000000002710 RDI: 0000000000000009
Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170bf10 R08: 0000000000000000 R09: 0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
Mar 27 15:03:11 pve-test-hba kernel: R13: ff220bd04a4e0000 R14: ff220bd04a4e0000 R15: 0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
Mar 27 15:03:11 pve-test-hba kernel: Call Trace:
Mar 27 15:03:11 pve-test-hba kernel: <TASK>
Mar 27 15:03:11 pve-test-hba kernel: make_task_dead+0x81/0x160
Mar 27 15:03:11 pve-test-hba kernel: rewind_stack_and_make_dead+0x16/0x20
Mar 27 15:03:11 pve-test-hba kernel: </TASK>
Mar 27 15:03:11 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---
```

Please note that the kernel used here was the last build during
bisecting, having this patch as the last commit.
The stack trace looks similar in all tested (bad) versions.
We've also tested 7.0-rc5, which also triggered the issue.

The easiest way we found to trigger this was to create a Ceph OSD on the
disks. When they were started on boot, the error was triggered.

So far we are not sure if it's the Broadcom controller, or the disk that
is causing it in the end.

Since we saw the quirks added for certain devices [0][1], we also tried
changing the sector size on an unaffected kernel to 8191, 8192 and
16384, but could not trigger the issue.

Any ideas what could be the cause for this, or how to troubleshoot this
further?

Happy to provide any further information if needed.


Thanks,
Mira

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e983271363108b3813b38754eb96d9b1cb252bb
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f64ae1ef639a2bab7e39497c55f76cc0682f108


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-03-31 12:02 ` Mira Limbeck
@ 2026-03-31 12:30   ` Mira Limbeck
  2026-03-31 19:48   ` Damien Le Moal
  1 sibling, 0 replies; 26+ messages in thread
From: Mira Limbeck @ 2026-03-31 12:30 UTC (permalink / raw)
  To: m.limbeck; +Cc: axboe, dlemoal, f.weber, hch, linux-block, martin.petersen

> Since we saw the quirks added for certain devices [0][1], we also tried
> changing the sector size on an unaffected kernel to 8191, 8192 and
> 16384, but could not trigger the issue.

Small correction here, we just tried it again on the other disk that is
encrypted and the issue was triggered with 16384.
Testing different values, the maximum seems to be somewhere between 1281
(works) and 2560 (crashes).

We are happy to gather the maximum value that still works if needed.

Thanks,
Mira

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-03-31 12:02 ` Mira Limbeck
  2026-03-31 12:30   ` Mira Limbeck
@ 2026-03-31 19:48   ` Damien Le Moal
  2026-04-01 10:32     ` Mira Limbeck
  1 sibling, 1 reply; 26+ messages in thread
From: Damien Le Moal @ 2026-03-31 19:48 UTC (permalink / raw)
  To: Mira Limbeck; +Cc: axboe, hch, linux-block, martin.petersen, Friedrich Weber

On 3/31/26 21:02, Mira Limbeck wrote:
> Hi,
> 
> 
> Some of our Proxmox VE users started seeing `unable to handle page
> fault` after switching to our downstream kernel 6.17, and after
> bisecting with the mainline kernel we've identified this patch as the
> first commit (9b8b84879d4adc506b0d3944e20b28d9f3f6994b) where we see
> those errors.

Please test with the current mainline kernel. We do not deal with non-standard
kernels.

> It requires a certain combination of hardware though, so far we've seen
> this with a combination of:
> Broadcom/LSI HBAs with NVMe support (9400, 9500)
> KIOXIA KCD8 NVMes
> 
> The Hardware of our test machine consists of:
> 
> 81:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI
> Fusion-MPT 12GSAS/PCIe Secure SAS38xx [1000:00e6]
> 	Subsystem: Broadcom / LSI 9500-16i Tri-Mode HBA [1000:4050]
> 	Kernel driver in use: mpt3sas
> 	Kernel modules: mpt3sas
> 
> FW Package Ver(37.00.00.00)
> SAS3816: FWVersion(37.00.00.00), ChipRevision(0x00)
> 
> and
> 
> Mar 31 10:52:44 pve-test-hba kernel: scsi 16:2:0:0: Direct-Access
> NVMe     KIOXIA KCD8XRUG7 0105 PQ: 0 ANSI: 6
> 
> The 4 NVMes are exposed as SCSI devices via the Broadcom controller.
> 
> ```
> Mar 27 15:03:11 pve-test-hba kernel: sd 3:2:2:0: [sdc] tag#2463 page boundary curr_buff: 0x000000002ce04d26
> Mar 27 15:03:11 pve-test-hba kernel: BUG: unable to handle page fault for address: ff76b049c31fe000
> Mar 27 15:03:11 pve-test-hba kernel: #PF: supervisor write access in kernel mode
> Mar 27 15:03:11 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
> Mar 27 15:03:11 pve-test-hba kernel: PGD 100010067 P4D 1008c7067 PUD 1008c8067 PMD 11cd9f067 PTE 0
> Mar 27 15:03:11 pve-test-hba kernel: Oops: Oops: 0002 [#1] SMP NOPTI
> Mar 27 15:03:11 pve-test-hba kernel: CPU: 7 UID: 0 PID: 4385 Comm: dmcrypt_write/2 Tainted: G E 6.16.0-rc4-step14-00001-g9b8b84879d4a #16 PREEMPT(voluntary)
> Mar 27 15:03:11 pve-test-hba kernel: Tainted: [E]=UNSIGNED_MODULE
> Mar 27 15:03:11 pve-test-hba kernel: Hardware name: <snip>
> Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
> Mar 27 15:03:11 pve-test-hba kernel: Code: 20 48 83 c3 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 63 18 4c 89 e9 4c 8d 69 08 44 85 e8 74 31 45 29 d7 <4c> 89 31 49 83 c1 08 41 83 c0 01 45 29 d4 45 85 ff 7f af 4c 8b 75
> Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170b8a8 EFLAGS: 00010206
> Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000fff RBX: ff220bd05e8a0270 RCX: ff76b049c31fe000
> Mar 27 15:03:11 pve-test-hba kernel: RDX: ff76b049c31fe008 RSI: 0000000000000000 RDI: 0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170b908 R08: 0000000000000200 R09: 00000000ff161000
> Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000001a0000
> Mar 27 15:03:11 pve-test-hba kernel: R13: ff76b049c31fe008 R14: 00000000f9600000 R15: 000000000019f000
> Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
> Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
> Mar 27 15:03:11 pve-test-hba kernel: Call Trace:
> Mar 27 15:03:11 pve-test-hba kernel: <TASK>
> Mar 27 15:03:11 pve-test-hba kernel: scsih_qcmd+0x37c/0x620 [mpt3sas]
> Mar 27 15:03:11 pve-test-hba kernel: scsi_queue_rq+0x3ec/0xd30
> Mar 27 15:03:11 pve-test-hba kernel: blk_mq_dispatch_rq_list+0x118/0x740
> Mar 27 15:03:11 pve-test-hba kernel: ? sbitmap_get+0x73/0x180
> Mar 27 15:03:11 pve-test-hba kernel: ? sbitmap_get+0x73/0x180
> Mar 27 15:03:11 pve-test-hba kernel: __blk_mq_sched_dispatch_requests+0x3fc/0x5b0
> Mar 27 15:03:11 pve-test-hba kernel: ? elv_attempt_insert_merge+0xa6/0x100
> Mar 27 15:03:11 pve-test-hba kernel: blk_mq_sched_dispatch_requests+0x2d/0x70
> Mar 27 15:03:11 pve-test-hba kernel: blk_mq_run_hw_queue+0x250/0x340
> Mar 27 15:03:11 pve-test-hba kernel: blk_mq_dispatch_list+0x16c/0x450
> Mar 27 15:03:11 pve-test-hba kernel: blk_mq_flush_plug_list+0x62/0x1e0
> Mar 27 15:03:11 pve-test-hba kernel: blk_add_rq_to_plug+0xff/0x1f0
> Mar 27 15:03:11 pve-test-hba kernel: blk_mq_submit_bio+0x616/0x7e0
> Mar 27 15:03:11 pve-test-hba kernel: __submit_bio+0x74/0x290
> Mar 27 15:03:11 pve-test-hba kernel: submit_bio_noacct_nocheck+0x1a2/0x3b0
> Mar 27 15:03:11 pve-test-hba kernel: submit_bio_noacct+0x1a0/0x5b0
> Mar 27 15:03:11 pve-test-hba kernel: dm_submit_bio_remap+0x49/0xb0
> Mar 27 15:03:11 pve-test-hba kernel: dmcrypt_write+0x120/0x150 [dm_crypt]
> Mar 27 15:03:11 pve-test-hba kernel: ? __pfx_dmcrypt_write+0x10/0x10 [dm_crypt]
> Mar 27 15:03:11 pve-test-hba kernel: kthread+0x10a/0x230
> Mar 27 15:03:11 pve-test-hba kernel: ? __pfx_kthread+0x10/0x10
> Mar 27 15:03:11 pve-test-hba kernel: ret_from_fork+0x1d1/0x200
> Mar 27 15:03:11 pve-test-hba kernel: ? __pfx_kthread+0x10/0x10
> Mar 27 15:03:11 pve-test-hba kernel: ret_from_fork_asm+0x1a/0x30
> Mar 27 15:03:11 pve-test-hba kernel: </TASK>
> Mar 27 15:03:11 pve-test-hba kernel: Modules linked in: dm_crypt(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) nf_tables(E) sunrpc(E) iptable_raw(E) xt_CT(E) iptable_nat(E) xt>
> Mar 27 15:03:11 pve-test-hba kernel: blake2b_generic(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) rndis_host(E) usbhid(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci>
> Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000
> Mar 27 15:03:11 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---
> Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
> Mar 27 15:03:11 pve-test-hba kernel: Code: 20 48 83 c3 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 63 18 4c 89 e9 4c 8d 69 08 44 85 e8 74 31 45 29 d7 <4c> 89 31 49 83 c1 08 41 83 c0 01 45 29 d4 45 85 ff 7f af 4c 8b 75
> Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170b8a8 EFLAGS: 00010206
> Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000fff RBX: ff220bd05e8a0270 RCX: ff76b049c31fe000
> Mar 27 15:03:11 pve-test-hba kernel: RDX: ff76b049c31fe008 RSI: 0000000000000000 RDI: 0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170b908 R08: 0000000000000200 R09: 00000000ff161000
> Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000001a0000
> Mar 27 15:03:11 pve-test-hba kernel: R13: ff76b049c31fe008 R14: 00000000f9600000 R15: 000000000019f000
> Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
> Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
> Mar 27 15:03:11 pve-test-hba kernel: note: dmcrypt_write/2[4385] exited with irqs disabled
> Mar 27 15:03:11 pve-test-hba kernel: ------------[ cut here ]------------
> Mar 27 15:03:11 pve-test-hba kernel: WARNING: CPU: 7 PID: 4385 at kernel/exit.c:902 do_exit+0x7d3/0xa30
> Mar 27 15:03:11 pve-test-hba kernel: Modules linked in: dm_crypt(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) nf_tables(E) sunrpc(E) iptable_raw(E) xt_CT(E) iptable_nat(E) xt>
> Mar 27 15:03:11 pve-test-hba kernel: blake2b_generic(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) rndis_host(E) usbhid(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci>
> Mar 27 15:03:11 pve-test-hba kernel: CPU: 7 UID: 0 PID: 4385 Comm: dmcrypt_write/2 Tainted: G D E 6.16.0-rc4-step14-00001-g9b8b84879d4a #16 PREEMPT(voluntary)
> Mar 27 15:03:11 pve-test-hba kernel: Tainted: [D]=DIE, [E]=UNSIGNED_MODULE
> Mar 27 15:03:11 pve-test-hba kernel: Hardware name: <snip>
> Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:do_exit+0x7d3/0xa30
> Mar 27 15:03:11 pve-test-hba kernel: Code: 48 89 45 c0 48 8b 83 50 0d 00 00 e9 44 fe ff ff 48 8b bb 10 0b 00 00 31 f6 e8 69 e2 ff ff e9 f7 fd ff ff 0f 0b e9 6b f8 ff ff <0f> 0b e9 72 f8 ff ff 4c 89 e6 bf 05 06 00 00 e8 49 46 01 00 e9 ab
> Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170bec0 EFLAGS: 00010282
> Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000246 RBX: ff220bd04a4e0000 RCX: 0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: RDX: 000000000000270f RSI: 0000000000002710 RDI: 0000000000000009
> Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170bf10 R08: 0000000000000000 R09: 0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000001a0000
> Mar 27 15:03:11 pve-test-hba kernel: R13: ff76b049c31fe008 R14: 00000000f9600000 R15: 000000000019f000
> Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
> Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
> Mar 27 15:03:11 pve-test-hba kernel: note: dmcrypt_write/2[4385] exited with irqs disabled
> Mar 27 15:03:11 pve-test-hba kernel: ------------[ cut here ]------------
> Mar 27 15:03:11 pve-test-hba kernel: WARNING: CPU: 7 PID: 4385 at kernel/exit.c:902 do_exit+0x7d3/0xa30
> Mar 27 15:03:11 pve-test-hba kernel: Modules linked in: dm_crypt(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) nf_tables(E) sunrpc(E) iptable_raw(E) xt_CT(E) iptable_nat(E) xt>
> Mar 27 15:03:11 pve-test-hba kernel: blake2b_generic(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) rndis_host(E) usbhid(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci>
> Mar 27 15:03:11 pve-test-hba kernel: CPU: 7 UID: 0 PID: 4385 Comm: dmcrypt_write/2 Tainted: G D E 6.16.0-rc4-step14-00001-g9b8b84879d4a #16 PREEMPT(voluntary)
> Mar 27 15:03:11 pve-test-hba kernel: Tainted: [D]=DIE, [E]=UNSIGNED_MODULE
> Mar 27 15:03:11 pve-test-hba kernel: Hardware name: <snip>
> Mar 27 15:03:11 pve-test-hba kernel: RIP: 0010:do_exit+0x7d3/0xa30
> Mar 27 15:03:11 pve-test-hba kernel: Code: 48 89 45 c0 48 8b 83 50 0d 00 00 e9 44 fe ff ff 48 8b bb 10 0b 00 00 31 f6 e8 69 e2 ff ff e9 f7 fd ff ff 0f 0b e9 6b f8 ff ff <0f> 0b e9 72 f8 ff ff 4c 89 e6 bf 05 06 00 00 e8 49 46 01 00 e9 ab
> Mar 27 15:03:11 pve-test-hba kernel: RSP: 0018:ff76b049c170bec0 EFLAGS: 00010282
> Mar 27 15:03:11 pve-test-hba kernel: RAX: 0000000000000246 RBX: ff220bd04a4e0000 RCX: 0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: RDX: 000000000000270f RSI: 0000000000002710 RDI: 0000000000000009
> Mar 27 15:03:11 pve-test-hba kernel: RBP: ff76b049c170bf10 R08: 0000000000000000 R09: 0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
> Mar 27 15:03:11 pve-test-hba kernel: R13: ff220bd04a4e0000 R14: ff220bd04a4e0000 R15: 0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: FS: 0000000000000000(0000) GS:ff220bd3d1ee7000(0000) knlGS:0000000000000000
> Mar 27 15:03:11 pve-test-hba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 27 15:03:11 pve-test-hba kernel: CR2: ff76b049c31fe000 CR3: 0000000109f8c007 CR4: 0000000000f71ef0
> Mar 27 15:03:11 pve-test-hba kernel: PKRU: 55555554
> Mar 27 15:03:11 pve-test-hba kernel: Call Trace:
> Mar 27 15:03:11 pve-test-hba kernel: <TASK>
> Mar 27 15:03:11 pve-test-hba kernel: make_task_dead+0x81/0x160
> Mar 27 15:03:11 pve-test-hba kernel: rewind_stack_and_make_dead+0x16/0x20
> Mar 27 15:03:11 pve-test-hba kernel: </TASK>
> Mar 27 15:03:11 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---
> ```
> 
> Please note that the kernel used here was the last build during
> bisecting, having this patch as the last commit.
> The stack trace looks similar in all tested (bad) versions.
> We've also tested 7.0-rc5, which also triggered the issue.
> 
> The easiest way we found to trigger this was to create a Ceph OSD on the
> disks. When they were started on boot, the error was triggered.
> 
> So far we are not sure if it's the Broadcom controller, or the disk that
> is causing it in the end.
> 
> Since we saw the quirks added for certain devices [0][1], we also tried
> changing the sector size on an unaffected kernel to 8191, 8192 and
> 16384, but could not trigger the issue.
> 
> Any ideas what could be the cause for this, or how to troubleshoot this
> further?
> 
> Happy to provide any further information if needed.
> 
> 
> Thanks,
> Mira
> 
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e983271363108b3813b38754eb96d9b1cb252bb
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f64ae1ef639a2bab7e39497c55f76cc0682f108
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-03-31 19:48   ` Damien Le Moal
@ 2026-04-01 10:32     ` Mira Limbeck
  2026-04-01 20:02       ` Damien Le Moal
  0 siblings, 1 reply; 26+ messages in thread
From: Mira Limbeck @ 2026-04-01 10:32 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: axboe, hch, linux-block, martin.petersen, Friedrich Weber

On 3/31/26 9:48 PM, Damien Le Moal wrote:
> On 3/31/26 21:02, Mira Limbeck wrote:
>> Hi,
>>
>>
>> Some of our Proxmox VE users started seeing `unable to handle page
>> fault` after switching to our downstream kernel 6.17, and after
>> bisecting with the mainline kernel we've identified this patch as the
>> first commit (9b8b84879d4adc506b0d3944e20b28d9f3f6994b) where we see
>> those errors.
> 
> Please test with the current mainline kernel. We do not deal with non-standard
> kernels.
> 

Sorry if I wasn't clear enough, we did test the following mainline
kernels without any downstream patches (git tags):
v6.16 (unaffected)
v6.17 (affected)
v7.0-rc5 (affected)

Afterwards we started to bisect between mainline 6.16
(038d61fd642278bab63ee8ef722c50d10ab01e8f) and mainline 6.17
(e5f0a698b34ed76002dc5cff3804a61c80233a7a) without any downstream
patches, which led us to this commit as the first bad one:
9b8b84879d4adc506b0d3944e20b28d9f3f6994b

Building our downstream kernel 6.17 with this commit reverted, fixed it.

To make sure that's also the case for the current mainline kernel, we've
tried 7.0-rc6 today (v7.0-rc6, 7aaa8047eafd0bd628065b15757d9b48c5f9c07d,
affected), and again with this commit reverted (unaffected).

Here the logs from 7.0-rc6:

Apr 01 11:41:19 pve-test-hba kernel: sd 9:2:2:0: [sdc] tag#3962 page boundary curr_buff: 0x00000000f4d7cfce
Apr 01 11:41:19 pve-test-hba kernel: BUG: unable to handle page fault for address: ff3a241243d70000
Apr 01 11:41:19 pve-test-hba kernel: #PF: supervisor write access in kernel mode
Apr 01 11:41:19 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
Apr 01 11:41:19 pve-test-hba kernel: PGD 100010067 P4D 10066d067 PUD 10066e067 PMD 11f0fa067 PTE 0
Apr 01 11:41:19 pve-test-hba kernel: Oops: Oops: 0002 [#3] SMP NOPTI
Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
Apr 01 11:41:19 pve-test-hba kernel: Code: 20 48 83 c3 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 63 18 4c 89 e9 4c 8d 69 08 44 85 e8 74 31 45 29 d7 <4c> 89 31 49 83 c1 08 41 83 c0 01 45 29 d4 45 85 ff 7f af 4c 8b 75
Apr 01 11:41:19 pve-test-hba kernel: RSP: 0018:ff3a2412497ff370 EFLAGS: 00010206
Apr 01 11:41:19 pve-test-hba kernel: RAX: 0000000000000fff RBX: ff1cd46a5ff2c578 RCX: ff3a241243d70000
Apr 01 11:41:19 pve-test-hba kernel: RDX: ff3a241243d70008 RSI: 0000000000000000 RDI: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: RBP: ff3a2412497ff3d0 R08: 0000000000000200 R09: 00000000fea86000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000000b0000
Apr 01 11:41:19 pve-test-hba kernel: R13: ff3a241243d70008 R14: 00000000fa200000 R15: 00000000000af000
Apr 01 11:41:19 pve-test-hba kernel: FS:  00007f3fbd181940(0000) GS:ff1cd46e06c7e000(0000) knlGS:0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 01 11:41:19 pve-test-hba kernel: CR2: ff3a241243d70000 CR3: 000000010c6d2006 CR4: 0000000000f71ef0
Apr 01 11:41:19 pve-test-hba kernel: PKRU: 55555554
Apr 01 11:41:19 pve-test-hba kernel: Call Trace:
Apr 01 11:41:19 pve-test-hba kernel:  <TASK>
Apr 01 11:41:19 pve-test-hba kernel:  scsih_qcmd+0x4a1/0x5f0 [mpt3sas]
Apr 01 11:41:19 pve-test-hba kernel:  scsi_queue_rq+0x71c/0xd10
Apr 01 11:41:19 pve-test-hba kernel:  blk_mq_dispatch_rq_list+0x11b/0x740
Apr 01 11:41:19 pve-test-hba kernel:  __blk_mq_sched_dispatch_requests+0xb4/0x5c0
Apr 01 11:41:19 pve-test-hba kernel:  ? elv_attempt_insert_merge+0xa6/0x100
Apr 01 11:41:19 pve-test-hba kernel:  blk_mq_sched_dispatch_requests+0x2d/0x70
Apr 01 11:41:19 pve-test-hba kernel:  blk_mq_run_hw_queue+0x250/0x340
Apr 01 11:41:19 pve-test-hba kernel:  blk_mq_dispatch_list+0x16f/0x470
Apr 01 11:41:19 pve-test-hba kernel:  blk_mq_flush_plug_list+0x62/0x1e0
Apr 01 11:41:19 pve-test-hba kernel:  __blk_flush_plug+0xde/0x140
Apr 01 11:41:19 pve-test-hba kernel:  __submit_bio+0x19b/0x260
Apr 01 11:41:19 pve-test-hba kernel:  submit_bio_noacct_nocheck+0x275/0x340
Apr 01 11:41:19 pve-test-hba kernel:  submit_bio_noacct+0x1eb/0x630
Apr 01 11:41:19 pve-test-hba kernel:  submit_bio+0xb1/0x110
Apr 01 11:41:19 pve-test-hba kernel:  blkdev_direct_IO+0x2fd/0x760
Apr 01 11:41:19 pve-test-hba kernel:  blkdev_read_iter+0xe5/0x180
Apr 01 11:41:19 pve-test-hba kernel:  aio_read+0x136/0x210
Apr 01 11:41:19 pve-test-hba kernel:  ? dput.part.0+0x39/0x120
Apr 01 11:41:19 pve-test-hba kernel:  io_submit_one+0x635/0x9c0
Apr 01 11:41:19 pve-test-hba kernel:  ? io_submit_one+0x635/0x9c0
Apr 01 11:41:19 pve-test-hba kernel:  ? path_openat+0x530/0x1290
Apr 01 11:41:19 pve-test-hba kernel:  __x64_sys_io_submit+0x90/0x1c0
Apr 01 11:41:19 pve-test-hba kernel:  x64_sys_call+0x175b/0x25e0
Apr 01 11:41:19 pve-test-hba kernel:  do_syscall_64+0xcb/0x14b0
Apr 01 11:41:19 pve-test-hba kernel:  ? __do_sys_newfstat+0x4c/0x80
Apr 01 11:41:19 pve-test-hba kernel:  ? __x64_sys_newfstat+0x15/0x20
Apr 01 11:41:19 pve-test-hba kernel:  ? x64_sys_call+0x1b1b/0x25e0
Apr 01 11:41:19 pve-test-hba kernel:  ? do_syscall_64+0x109/0x14b0
Apr 01 11:41:19 pve-test-hba kernel:  ? do_syscall_64+0x109/0x14b0
Apr 01 11:41:19 pve-test-hba kernel:  ? __x64_sys_newfstat+0x15/0x20
Apr 01 11:41:19 pve-test-hba kernel:  ? x64_sys_call+0x1b1b/0x25e0
Apr 01 11:41:19 pve-test-hba kernel:  ? do_syscall_64+0x109/0x14b0
Apr 01 11:41:19 pve-test-hba kernel:  ? handle_mm_fault+0xbb/0x380
Apr 01 11:41:19 pve-test-hba kernel:  ? do_user_addr_fault+0x2ec/0x880
Apr 01 11:41:19 pve-test-hba kernel:  ? irqentry_exit+0xb2/0x780
Apr 01 11:41:19 pve-test-hba kernel:  ? exc_page_fault+0x92/0x1c0
Apr 01 11:41:19 pve-test-hba kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0033:0x7f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 66 0d 00 f7 d8 64 89 01 48
Apr 01 11:41:19 pve-test-hba kernel: RSP: 002b:00007ffde2893c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Apr 01 11:41:19 pve-test-hba kernel: RAX: ffffffffffffffda RBX: 00007f3fbd1816f0 RCX: 00007f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: RDX: 00007ffde2893cc0 RSI: 0000000000000001 RDI: 00007f3fbd82f000
Apr 01 11:41:19 pve-test-hba kernel: RBP: 00007f3fbd82f000 R08: 0000596c2e67f000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000001
Apr 01 11:41:19 pve-test-hba kernel: R13: 0000000000000002 R14: 00007ffde2893cc0 R15: 0000596c12ab5e50
Apr 01 11:41:19 pve-test-hba kernel:  </TASK>
Apr 01 11:41:19 pve-test-hba kernel: Modules linked in: dm_crypt(E) nf_tables(E) sunrpc(E) bonding(E) tls(E) softdog(E) nfnetlink_log(E) binfmt_misc(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) dax_hmem(E) cxl_acpi(E) kvm(E) cxl_port(E) irqbypass(E) cxl_pmem(E) cxl_core(E) ghash_clmulni_intel(E) joydev(E) acpi_ipmi(E) aesni_intel(E) ast(E) input_leds(E) fwctl(E) ses(E) i2c_algo_bit(E) enclosure(E) rapl(E) einj(E) ccp(E) pcspkr(E) ipmi_si(E) wmi_bmof(E) spd5118(E) ipmi_devintf(E) hsmp_acpi(E) k10temp(E) hsmp_common(E) ipmi_msghandler(E) mac_hid(E) sch_fq_codel(E) msr(E) vhost_net(E) vhost(E) nvme_fabrics(E) vhost_iotlb(E) tap(E) nvme_core(E) nvme_keyring(E) nvme_auth(E) hkdf(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) autofs4(E) btrfs(E) libblake2b(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) usbhid(E) rndis_host(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci_renesas(E) mpt3sas(E) xhci_pci(E) raid_class(E)
Apr 01 11:41:19 pve-test-hba kernel:  ahci(E) tg3(E) scsi_transport_sas(E) libahci(E) xhci_hcd(E) i2c_piix4(E) i2c_smbus(E) wmi(E) 8250_dw(E)
Apr 01 11:41:19 pve-test-hba kernel: CR2: ff3a241243d70000
Apr 01 11:41:19 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
Apr 01 11:41:19 pve-test-hba kernel: Code: 20 48 83 c3 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 63 18 4c 89 e9 4c 8d 69 08 44 85 e8 74 31 45 29 d7 <4c> 89 31 49 83 c1 08 41 83 c0 01 45 29 d4 45 85 ff 7f af 4c 8b 75
Apr 01 11:41:19 pve-test-hba kernel: RSP: 0018:ff3a241249717a68 EFLAGS: 00010206
Apr 01 11:41:19 pve-test-hba kernel: RAX: 0000000000000fff RBX: ff1cd46a5ff2c578 RCX: ff3a241243d68000
Apr 01 11:41:19 pve-test-hba kernel: RDX: ff3a241243d68008 RSI: 0000000000000000 RDI: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: RBP: ff3a241249717ac8 R08: 0000000000000200 R09: 00000000fea8a000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000000b0000
Apr 01 11:41:19 pve-test-hba kernel: R13: ff3a241243d68008 R14: 00000000fa200000 R15: 00000000000af000
Apr 01 11:41:19 pve-test-hba kernel: FS:  00007f3fbd181940(0000) GS:ff1cd46e06c7e000(0000) knlGS:0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 01 11:41:19 pve-test-hba kernel: CR2: ff3a241243d70000 CR3: 000000010c6d2006 CR4: 0000000000f71ef0
Apr 01 11:41:19 pve-test-hba kernel: PKRU: 55555554
Apr 01 11:41:19 pve-test-hba kernel: note: vgs[6695] exited with irqs disabled
Apr 01 11:41:19 pve-test-hba kernel: ------------[ cut here ]------------
Apr 01 11:41:19 pve-test-hba kernel: WARNING: kernel/exit.c:903 at do_exit+0x81a/0xa80, CPU#15: vgs/6695
Apr 01 11:41:19 pve-test-hba kernel: Modules linked in: dm_crypt(E) nf_tables(E) sunrpc(E) bonding(E) tls(E) softdog(E) nfnetlink_log(E) binfmt_misc(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) dax_hmem(E) cxl_acpi(E) kvm(E) cxl_port(E) irqbypass(E) cxl_pmem(E) cxl_core(E) ghash_clmulni_intel(E) joydev(E) acpi_ipmi(E) aesni_intel(E) ast(E) input_leds(E) fwctl(E) ses(E) i2c_algo_bit(E) enclosure(E) rapl(E) einj(E) ccp(E) pcspkr(E) ipmi_si(E) wmi_bmof(E) spd5118(E) ipmi_devintf(E) hsmp_acpi(E) k10temp(E) hsmp_common(E) ipmi_msghandler(E) mac_hid(E) sch_fq_codel(E) msr(E) vhost_net(E) vhost(E) nvme_fabrics(E) vhost_iotlb(E) tap(E) nvme_core(E) nvme_keyring(E) nvme_auth(E) hkdf(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) autofs4(E) btrfs(E) libblake2b(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) usbhid(E) rndis_host(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci_renesas(E) mpt3sas(E) xhci_pci(E) raid_class(E)
Apr 01 11:41:19 pve-test-hba kernel:  ahci(E) tg3(E) scsi_transport_sas(E) libahci(E) xhci_hcd(E) i2c_piix4(E) i2c_smbus(E) wmi(E) 8250_dw(E)
Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:do_exit+0x81a/0xa80
Apr 01 11:41:19 pve-test-hba kernel: Code: 4c 89 ab 50 0b 00 00 48 89 45 c0 48 8b 83 78 0d 00 00 e9 2b fe ff ff 48 8b bb 30 0b 00 00 31 f6 e8 3b e2 ff ff e9 de fd ff ff <0f> 0b e9 2b f8 ff ff 4c 89 e6 bf 05 06 00 00 e8 d2 44 01 00 e9 7a
Apr 01 11:41:19 pve-test-hba kernel: RSP: 0018:ff3a2412497ffec0 EFLAGS: 00010286
Apr 01 11:41:19 pve-test-hba kernel: RAX: 0000000000000246 RBX: ff1cd46a581f3100 RCX: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: RDX: 000000000000270f RSI: 0000000000002710 RDI: 0000000000000009
Apr 01 11:41:19 pve-test-hba kernel: RBP: ff3a2412497fff10 R08: 0000000000000000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
Apr 01 11:41:19 pve-test-hba kernel: R13: ff1cd46a581f3100 R14: ff1cd46a581f3100 R15: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: FS:  00007f3fbd181940(0000) GS:ff1cd46e06c7e000(0000) knlGS:0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 01 11:41:19 pve-test-hba kernel: CR2: ff3a241243d70000 CR3: 000000010c6d2006 CR4: 0000000000f71ef0
Apr 01 11:41:19 pve-test-hba kernel: PKRU: 55555554
Apr 01 11:41:19 pve-test-hba kernel: Call Trace:
Apr 01 11:41:19 pve-test-hba kernel:  <TASK>
Apr 01 11:41:19 pve-test-hba kernel:  make_task_dead+0x81/0x160
Apr 01 11:41:19 pve-test-hba kernel:  rewind_stack_and_make_dead+0x16/0x20
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0033:0x7f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 66 0d 00 f7 d8 64 89 01 48
Apr 01 11:41:19 pve-test-hba kernel: RSP: 002b:00007ffde2893c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Apr 01 11:41:19 pve-test-hba kernel: RAX: ffffffffffffffda RBX: 00007f3fbd1816f0 RCX: 00007f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: RDX: 00007ffde2893cc0 RSI: 0000000000000001 RDI: 00007f3fbd82f000
Apr 01 11:41:19 pve-test-hba kernel: RBP: 00007f3fbd82f000 R08: 0000596c2e67f000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000001
Apr 01 11:41:19 pve-test-hba kernel: R13: 0000000000000002 R14: 00007ffde2893cc0 R15: 0000596c12ab5e50
Apr 01 11:41:19 pve-test-hba kernel:  </TASK>
Apr 01 11:41:19 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---
Apr 01 11:41:19 pve-test-hba kernel: BUG: kernel NULL pointer dereference, address: 0000000000000300
Apr 01 11:41:19 pve-test-hba kernel: #PF: supervisor write access in kernel mode
Apr 01 11:41:19 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
Apr 01 11:41:19 pve-test-hba kernel: PGD 11f55f067 P4D 0
Apr 01 11:41:19 pve-test-hba kernel: Oops: Oops: 0002 [#4] SMP NOPTI
Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:__blk_flush_plug+0x8a/0x140
Apr 01 11:41:19 pve-test-hba kernel: Code: 48 89 5d c8 48 39 c1 74 6a 49 8b 47 30 48 8b 75 b8 48 39 c6 74 4a 49 8b 4f 30 49 8b 57 38 48 8b 45 c0 48 89 59 08 48 89 4d c0 <48> 89 02 48 89 50 08 49 89 77 30 49 89 77 38 eb 25 48 8b 7d c0 44
Apr 01 11:41:19 pve-test-hba kernel: RSP: 0018:ff3a2412497ffce8 EFLAGS: 00010206
Apr 01 11:41:19 pve-test-hba kernel: RAX: ff3a2412497ffcf0 RBX: ff3a2412497ffcf0 RCX: ff1cd46a42ef3a00
Apr 01 11:41:19 pve-test-hba kernel: RDX: 0000000000000300 RSI: ff3a2412497ff760 RDI: ff3a2412497ff730
Apr 01 11:41:19 pve-test-hba kernel: RBP: ff3a2412497ffd30 R08: 0000000000000000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Apr 01 11:41:19 pve-test-hba kernel: R13: dead000000000122 R14: dead000000000100 R15: ff3a2412497ff730
Apr 01 11:41:19 pve-test-hba kernel: FS:  0000000000000000(0000) GS:ff1cd46e06c7e000(0000) knlGS:0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 01 11:41:19 pve-test-hba kernel: CR2: 0000000000000300 CR3: 000000010c6d2006 CR4: 0000000000f71ef0
Apr 01 11:41:19 pve-test-hba kernel: PKRU: 55555554
Apr 01 11:41:19 pve-test-hba kernel: Call Trace:
Apr 01 11:41:19 pve-test-hba kernel:  <TASK>
Apr 01 11:41:19 pve-test-hba kernel:  schedule+0xa3/0xf0
Apr 01 11:41:19 pve-test-hba kernel:  schedule_timeout+0x104/0x110
Apr 01 11:41:19 pve-test-hba kernel:  __wait_for_common+0x98/0x180
Apr 01 11:41:19 pve-test-hba kernel:  ? __pfx_schedule_timeout+0x10/0x10
Apr 01 11:41:19 pve-test-hba kernel:  wait_for_completion+0x24/0x40
Apr 01 11:41:19 pve-test-hba kernel:  exit_aio+0x116/0x120
Apr 01 11:41:19 pve-test-hba kernel:  __mmput+0x15/0x120
Apr 01 11:41:19 pve-test-hba kernel:  mmput+0x31/0x40
Apr 01 11:41:19 pve-test-hba kernel:  do_exit+0x28c/0xa80
Apr 01 11:41:19 pve-test-hba kernel:  make_task_dead+0x81/0x160
Apr 01 11:41:19 pve-test-hba kernel:  rewind_stack_and_make_dead+0x16/0x20
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0033:0x7f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: Code: Unable to access opcode bytes at 0x7f3fbd41e78f.
Apr 01 11:41:19 pve-test-hba kernel: RSP: 002b:00007ffde2893c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Apr 01 11:41:19 pve-test-hba kernel: RAX: ffffffffffffffda RBX: 00007f3fbd1816f0 RCX: 00007f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: RDX: 00007ffde2893cc0 RSI: 0000000000000001 RDI: 00007f3fbd82f000
Apr 01 11:41:19 pve-test-hba kernel: RBP: 00007f3fbd82f000 R08: 0000596c2e67f000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000001
Apr 01 11:41:19 pve-test-hba kernel: R13: 0000000000000002 R14: 00007ffde2893cc0 R15: 0000596c12ab5e50
Apr 01 11:41:19 pve-test-hba kernel:  </TASK>
Apr 01 11:41:19 pve-test-hba kernel: Modules linked in: dm_crypt(E) nf_tables(E) sunrpc(E) bonding(E) tls(E) softdog(E) nfnetlink_log(E) binfmt_misc(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) dax_hmem(E) cxl_acpi(E) kvm(E) cxl_port(E) irqbypass(E) cxl_pmem(E) cxl_core(E) ghash_clmulni_intel(E) joydev(E) acpi_ipmi(E) aesni_intel(E) ast(E) input_leds(E) fwctl(E) ses(E) i2c_algo_bit(E) enclosure(E) rapl(E) einj(E) ccp(E) pcspkr(E) ipmi_si(E) wmi_bmof(E) spd5118(E) ipmi_devintf(E) hsmp_acpi(E) k10temp(E) hsmp_common(E) ipmi_msghandler(E) mac_hid(E) sch_fq_codel(E) msr(E) vhost_net(E) vhost(E) nvme_fabrics(E) vhost_iotlb(E) tap(E) nvme_core(E) nvme_keyring(E) nvme_auth(E) hkdf(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) autofs4(E) btrfs(E) libblake2b(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) usbhid(E) rndis_host(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci_renesas(E) mpt3sas(E) xhci_pci(E) raid_class(E)
Apr 01 11:41:19 pve-test-hba kernel:  ahci(E) tg3(E) scsi_transport_sas(E) libahci(E) xhci_hcd(E) i2c_piix4(E) i2c_smbus(E) wmi(E) 8250_dw(E)
Apr 01 11:41:19 pve-test-hba kernel: CR2: 0000000000000300
Apr 01 11:41:19 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
Apr 01 11:41:19 pve-test-hba kernel: Code: 20 48 83 c3 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 63 18 4c 89 e9 4c 8d 69 08 44 85 e8 74 31 45 29 d7 <4c> 89 31 49 83 c1 08 41 83 c0 01 45 29 d4 45 85 ff 7f af 4c 8b 75
Apr 01 11:41:19 pve-test-hba kernel: RSP: 0018:ff3a241249717a68 EFLAGS: 00010206
Apr 01 11:41:19 pve-test-hba kernel: RAX: 0000000000000fff RBX: ff1cd46a5ff2c578 RCX: ff3a241243d68000
Apr 01 11:41:19 pve-test-hba kernel: RDX: ff3a241243d68008 RSI: 0000000000000000 RDI: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: RBP: ff3a241249717ac8 R08: 0000000000000200 R09: 00000000fea8a000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000001000 R11: 0000000000001000 R12: 00000000000b0000
Apr 01 11:41:19 pve-test-hba kernel: R13: ff3a241243d68008 R14: 00000000fa200000 R15: 00000000000af000
Apr 01 11:41:19 pve-test-hba kernel: FS:  0000000000000000(0000) GS:ff1cd46e06c7e000(0000) knlGS:0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 01 11:41:19 pve-test-hba kernel: CR2: 0000000000000300 CR3: 000000010c6d2006 CR4: 0000000000f71ef0
Apr 01 11:41:19 pve-test-hba kernel: PKRU: 55555554
Apr 01 11:41:19 pve-test-hba kernel: note: vgs[6695] exited with irqs disabled
Apr 01 11:41:19 pve-test-hba kernel: Fixing recursive fault but reboot is needed!
Apr 01 11:41:19 pve-test-hba kernel: BUG: scheduling while atomic: vgs/6695/0x00000000
Apr 01 11:41:19 pve-test-hba kernel: Modules linked in: dm_crypt(E) nf_tables(E) sunrpc(E) bonding(E) tls(E) softdog(E) nfnetlink_log(E) binfmt_misc(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) dax_hmem(E) cxl_acpi(E) kvm(E) cxl_port(E) irqbypass(E) cxl_pmem(E) cxl_core(E) ghash_clmulni_intel(E) joydev(E) acpi_ipmi(E) aesni_intel(E) ast(E) input_leds(E) fwctl(E) ses(E) i2c_algo_bit(E) enclosure(E) rapl(E) einj(E) ccp(E) pcspkr(E) ipmi_si(E) wmi_bmof(E) spd5118(E) ipmi_devintf(E) hsmp_acpi(E) k10temp(E) hsmp_common(E) ipmi_msghandler(E) mac_hid(E) sch_fq_codel(E) msr(E) vhost_net(E) vhost(E) nvme_fabrics(E) vhost_iotlb(E) tap(E) nvme_core(E) nvme_keyring(E) nvme_auth(E) hkdf(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) autofs4(E) btrfs(E) libblake2b(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) usbhid(E) rndis_host(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci_renesas(E) mpt3sas(E) xhci_pci(E) raid_class(E)
Apr 01 11:41:19 pve-test-hba kernel:  ahci(E) tg3(E) scsi_transport_sas(E) libahci(E) xhci_hcd(E) i2c_piix4(E) i2c_smbus(E) wmi(E) 8250_dw(E)
Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
Apr 01 11:41:19 pve-test-hba kernel: Call Trace:
Apr 01 11:41:19 pve-test-hba kernel:  <TASK>
Apr 01 11:41:19 pve-test-hba kernel:  dump_stack_lvl+0x76/0xa0
Apr 01 11:41:19 pve-test-hba kernel:  dump_stack+0x10/0x20
Apr 01 11:41:19 pve-test-hba kernel:  __schedule_bug+0x64/0x80
Apr 01 11:41:19 pve-test-hba kernel:  __schedule+0x1168/0x1760
Apr 01 11:41:19 pve-test-hba kernel:  ? vprintk_default+0x1d/0x30
Apr 01 11:41:19 pve-test-hba kernel:  do_task_dead+0x4c/0xb0
Apr 01 11:41:19 pve-test-hba kernel:  make_task_dead+0x142/0x160
Apr 01 11:41:19 pve-test-hba kernel:  rewind_stack_and_make_dead+0x16/0x20
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0033:0x7f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: Code: Unable to access opcode bytes at 0x7f3fbd41e78f.
Apr 01 11:41:19 pve-test-hba kernel: RSP: 002b:00007ffde2893c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Apr 01 11:41:19 pve-test-hba kernel: RAX: ffffffffffffffda RBX: 00007f3fbd1816f0 RCX: 00007f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: RDX: 00007ffde2893cc0 RSI: 0000000000000001 RDI: 00007f3fbd82f000
Apr 01 11:41:19 pve-test-hba kernel: RBP: 00007f3fbd82f000 R08: 0000596c2e67f000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000001
Apr 01 11:41:19 pve-test-hba kernel: R13: 0000000000000002 R14: 00007ffde2893cc0 R15: 0000596c12ab5e50
Apr 01 11:41:19 pve-test-hba kernel:  </TASK>
Apr 01 11:41:19 pve-test-hba kernel: ------------[ cut here ]------------
Apr 01 11:41:19 pve-test-hba kernel: Voluntary context switch within RCU read-side critical section!
Apr 01 11:41:19 pve-test-hba kernel: WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x4e/0x530, CPU#15: vgs/6695
Apr 01 11:41:19 pve-test-hba kernel: Modules linked in: dm_crypt(E) nf_tables(E) sunrpc(E) bonding(E) tls(E) softdog(E) nfnetlink_log(E) binfmt_misc(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) dax_hmem(E) cxl_acpi(E) kvm(E) cxl_port(E) irqbypass(E) cxl_pmem(E) cxl_core(E) ghash_clmulni_intel(E) joydev(E) acpi_ipmi(E) aesni_intel(E) ast(E) input_leds(E) fwctl(E) ses(E) i2c_algo_bit(E) enclosure(E) rapl(E) einj(E) ccp(E) pcspkr(E) ipmi_si(E) wmi_bmof(E) spd5118(E) ipmi_devintf(E) hsmp_acpi(E) k10temp(E) hsmp_common(E) ipmi_msghandler(E) mac_hid(E) sch_fq_codel(E) msr(E) vhost_net(E) vhost(E) nvme_fabrics(E) vhost_iotlb(E) tap(E) nvme_core(E) nvme_keyring(E) nvme_auth(E) hkdf(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) autofs4(E) btrfs(E) libblake2b(E) xor(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) hid_generic(E) usbmouse(E) usbhid(E) rndis_host(E) cdc_ether(E) hid(E) usbnet(E) mii(E) xhci_pci_renesas(E) mpt3sas(E) xhci_pci(E) raid_class(E)
Apr 01 11:41:19 pve-test-hba kernel:  ahci(E) tg3(E) scsi_transport_sas(E) libahci(E) xhci_hcd(E) i2c_piix4(E) i2c_smbus(E) wmi(E) 8250_dw(E)
Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:rcu_note_context_switch+0x4e/0x530
Apr 01 11:41:19 pve-test-hba kernel: Code: 65 48 8b 1d fc c2 ab 02 4c 8d a0 80 65 b3 86 0f 1f 44 00 00 45 84 ed 75 16 8b 83 7c 09 00 00 85 c0 7e 0c 48 8d 3d 02 78 5a 02 <67> 48 0f b9 3a 8b 83 7c 09 00 00 85 c0 7e 09 80 bb 80 09 00 00 00
Apr 01 11:41:19 pve-test-hba kernel: RSP: 0018:ff3a2412497ffe08 EFLAGS: 00010002
Apr 01 11:41:19 pve-test-hba kernel: RAX: 0000000000000001 RBX: ff1cd46a581f3100 RCX: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: RDX: 0000000080000001 RSI: ffffffff8591fa97 RDI: ffffffff86605540
Apr 01 11:41:19 pve-test-hba kernel: RBP: ff3a2412497ffe30 R08: 0000000000000000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff1cd46d8d7b4580
Apr 01 11:41:19 pve-test-hba kernel: R13: 0000000000000000 R14: ff1cd46a581f3100 R15: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: FS:  0000000000000000(0000) GS:ff1cd46e06c7e000(0000) knlGS:0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 01 11:41:19 pve-test-hba kernel: CR2: 0000000000000300 CR3: 000000010c6d2006 CR4: 0000000000f71ef0
Apr 01 11:41:19 pve-test-hba kernel: PKRU: 55555554
Apr 01 11:41:19 pve-test-hba kernel: Call Trace:
Apr 01 11:41:19 pve-test-hba kernel:  <TASK>
Apr 01 11:41:19 pve-test-hba kernel:  __schedule+0xda/0x1760
Apr 01 11:41:19 pve-test-hba kernel:  ? vprintk_default+0x1d/0x30
Apr 01 11:41:19 pve-test-hba kernel:  do_task_dead+0x4c/0xb0
Apr 01 11:41:19 pve-test-hba kernel:  make_task_dead+0x142/0x160
Apr 01 11:41:19 pve-test-hba kernel:  rewind_stack_and_make_dead+0x16/0x20
Apr 01 11:41:19 pve-test-hba kernel: RIP: 0033:0x7f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: Code: Unable to access opcode bytes at 0x7f3fbd41e78f.
Apr 01 11:41:19 pve-test-hba kernel: RSP: 002b:00007ffde2893c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Apr 01 11:41:19 pve-test-hba kernel: RAX: ffffffffffffffda RBX: 00007f3fbd1816f0 RCX: 00007f3fbd41e7b9
Apr 01 11:41:19 pve-test-hba kernel: RDX: 00007ffde2893cc0 RSI: 0000000000000001 RDI: 00007f3fbd82f000
Apr 01 11:41:19 pve-test-hba kernel: RBP: 00007f3fbd82f000 R08: 0000596c2e67f000 R09: 0000000000000000
Apr 01 11:41:19 pve-test-hba kernel: R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000001
Apr 01 11:41:19 pve-test-hba kernel: R13: 0000000000000002 R14: 00007ffde2893cc0 R15: 0000596c12ab5e50
Apr 01 11:41:19 pve-test-hba kernel:  </TASK>
Apr 01 11:41:19 pve-test-hba kernel: ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-04-01 10:32     ` Mira Limbeck
@ 2026-04-01 20:02       ` Damien Le Moal
  2026-04-01 20:55         ` Keith Busch
  2026-04-02 14:33         ` Friedrich Weber
  0 siblings, 2 replies; 26+ messages in thread
From: Damien Le Moal @ 2026-04-01 20:02 UTC (permalink / raw)
  To: Mira Limbeck; +Cc: axboe, hch, linux-block, martin.petersen, Friedrich Weber

On 4/1/26 19:32, Mira Limbeck wrote:
> Sorry if I wasn't clear enough, we did test the following mainline
> kernels without any downstream patches (git tags):
> v6.16 (unaffected)
> v6.17 (affected)
> v7.0-rc5 (affected)
> 
> Afterwards we started to bisect between mainline 6.16
> (038d61fd642278bab63ee8ef722c50d10ab01e8f) and mainline 6.17
> (e5f0a698b34ed76002dc5cff3804a61c80233a7a) without any downstream
> patches, which led us to this commit as the first bad one:
> 9b8b84879d4adc506b0d3944e20b28d9f3f6994b

Note: the proper way to reference a patch is to use 12-digits commit ID and
patch title:

9b8b84879d4a ("block: Increase BLK_DEF_MAX_SECTORS_CAP")

as that make it easier to know what one is talking about without having to go
look what patch that ID references.

> Building our downstream kernel 6.17 with this commit reverted, fixed it.

Nope, this is likely not fixing anything but rather hiding the issue. With this
patch reverted, the default max_sectors_kb will be 1280, so all requests will be
chunked to that size at most, and your devices will not see large commands.
However, simply doing something like:

echo 4096 > /sys/block/<dev>/queue/max_sectors_kb

will put your system in a state that is equivalent to the patch being applied
and you will likely see the issue again. Try.

> To make sure that's also the case for the current mainline kernel, we've
> tried 7.0-rc6 today (v7.0-rc6, 7aaa8047eafd0bd628065b15757d9b48c5f9c07d,
> affected), and again with this commit reverted (unaffected).

Same comment as above.

> Here the logs from 7.0-rc6:
> 
> Apr 01 11:41:19 pve-test-hba kernel: sd 9:2:2:0: [sdc] tag#3962 page boundary curr_buff: 0x00000000f4d7cfce
> Apr 01 11:41:19 pve-test-hba kernel: BUG: unable to handle page fault for address: ff3a241243d70000
> Apr 01 11:41:19 pve-test-hba kernel: #PF: supervisor write access in kernel mode
> Apr 01 11:41:19 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
> Apr 01 11:41:19 pve-test-hba kernel: PGD 100010067 P4D 10066d067 PUD 10066e067 PMD 11f0fa067 PTE 0
> Apr 01 11:41:19 pve-test-hba kernel: Oops: Oops: 0002 [#3] SMP NOPTI
> Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
> Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
> Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
> Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]

There may be an issue with the mpt3sas driver with large commands.

However, I am using that driver all day long and doing lots of testing with
gigantic read/write commands all the time. I have never seen any issues.
The difference is that I am using the SAS-SATA FW for the Broadcom HBA, so no
NVMe support, and my target devices are SAS or SATA HDDs, not SSDs.

Something may be wrong with the NVMe support in that HBA, or, your SSDs do not
like large commands and cause issues. That is easy to test: try connecting your
SSDs directly to PCI and test them by issuing large read/write commands with fio
(you will need to use iomem=mmaphuge option to use hugepages for the IO buffers
to ensure that you do not get the IOs chunked into small commands due to memory
fragmentation).

At least from my point of view and my tests, that commit is perfectly fine. As
mentioned above, it is only changing the default value, and that's something
that can be done manually even without this patch. So this is definitely not the
root cause and it is simply exposing a problem that already existed.
We have seen that for several devices in the ata subsystem and had to quirk many
drives that are choking on large commands.

We just need to figure out where the problem is (HBA and/or SSD) and can then
look into how to avoid that problem.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-04-01 20:02       ` Damien Le Moal
@ 2026-04-01 20:55         ` Keith Busch
  2026-04-01 23:31           ` Damien Le Moal
  2026-04-02 14:33         ` Friedrich Weber
  1 sibling, 1 reply; 26+ messages in thread
From: Keith Busch @ 2026-04-01 20:55 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Mira Limbeck, axboe, hch, linux-block, martin.petersen,
	Friedrich Weber

On Thu, Apr 02, 2026 at 05:02:22AM +0900, Damien Le Moal wrote:
> On 4/1/26 19:32, Mira Limbeck wrote:
> > 
> > Apr 01 11:41:19 pve-test-hba kernel: sd 9:2:2:0: [sdc] tag#3962 page boundary curr_buff: 0x00000000f4d7cfce
> > Apr 01 11:41:19 pve-test-hba kernel: BUG: unable to handle page fault for address: ff3a241243d70000
> > Apr 01 11:41:19 pve-test-hba kernel: #PF: supervisor write access in kernel mode
> > Apr 01 11:41:19 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
> > Apr 01 11:41:19 pve-test-hba kernel: PGD 100010067 P4D 10066d067 PUD 10066e067 PMD 11f0fa067 PTE 0
> > Apr 01 11:41:19 pve-test-hba kernel: Oops: Oops: 0002 [#3] SMP NOPTI
> > Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
> > Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
> > Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
> > Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
> 
> There may be an issue with the mpt3sas driver with large commands.
> 
> However, I am using that driver all day long and doing lots of testing with
> gigantic read/write commands all the time. I have never seen any issues.
> The difference is that I am using the SAS-SATA FW for the Broadcom HBA, so no
> NVMe support, and my target devices are SAS or SATA HDDs, not SSDs.

It's only the NVMe attached ones that use base_make_prp_nvme, and it
looks like the ioc->pcie_sgl_dma_pool that provides the buffer for the
entries is too small for a transfer larger than 2MB. So it's overrunning
the allocated buffer.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-04-01 20:55         ` Keith Busch
@ 2026-04-01 23:31           ` Damien Le Moal
  0 siblings, 0 replies; 26+ messages in thread
From: Damien Le Moal @ 2026-04-01 23:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: Mira Limbeck, axboe, hch, linux-block, martin.petersen,
	Friedrich Weber

On 2026/04/02 5:55, Keith Busch wrote:
> On Thu, Apr 02, 2026 at 05:02:22AM +0900, Damien Le Moal wrote:
>> On 4/1/26 19:32, Mira Limbeck wrote:
>>>
>>> Apr 01 11:41:19 pve-test-hba kernel: sd 9:2:2:0: [sdc] tag#3962 page boundary curr_buff: 0x00000000f4d7cfce
>>> Apr 01 11:41:19 pve-test-hba kernel: BUG: unable to handle page fault for address: ff3a241243d70000
>>> Apr 01 11:41:19 pve-test-hba kernel: #PF: supervisor write access in kernel mode
>>> Apr 01 11:41:19 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
>>> Apr 01 11:41:19 pve-test-hba kernel: PGD 100010067 P4D 10066d067 PUD 10066e067 PMD 11f0fa067 PTE 0
>>> Apr 01 11:41:19 pve-test-hba kernel: Oops: Oops: 0002 [#3] SMP NOPTI
>>> Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
>>> Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
>>> Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
>>> Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
>>
>> There may be an issue with the mpt3sas driver with large commands.
>>
>> However, I am using that driver all day long and doing lots of testing with
>> gigantic read/write commands all the time. I have never seen any issues.
>> The difference is that I am using the SAS-SATA FW for the Broadcom HBA, so no
>> NVMe support, and my target devices are SAS or SATA HDDs, not SSDs.
> 
> It's only the NVMe attached ones that use base_make_prp_nvme, and it
> looks like the ioc->pcie_sgl_dma_pool that provides the buffer for the
> entries is too small for a transfer larger than 2MB. So it's overrunning
> the allocated buffer.

Keith,

Thanks for checking. I had not look at the code yet. Given this, this should be
fairly straightforward to fix. I will have a look and send a patch.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-04-01 20:02       ` Damien Le Moal
  2026-04-01 20:55         ` Keith Busch
@ 2026-04-02 14:33         ` Friedrich Weber
  2026-04-02 15:03           ` Keith Busch
  1 sibling, 1 reply; 26+ messages in thread
From: Friedrich Weber @ 2026-04-02 14:33 UTC (permalink / raw)
  To: Damien Le Moal, Mira Limbeck; +Cc: axboe, hch, linux-block, martin.petersen

Hi, Mira and I have been looking into this issue together.

On 01/04/2026 22:01, Damien Le Moal wrote:
> On 4/1/26 19:32, Mira Limbeck wrote:
>> Sorry if I wasn't clear enough, we did test the following mainline
>> kernels without any downstream patches (git tags):
>> v6.16 (unaffected)
>> v6.17 (affected)
>> v7.0-rc5 (affected)
>>
>> Afterwards we started to bisect between mainline 6.16
>> (038d61fd642278bab63ee8ef722c50d10ab01e8f) and mainline 6.17
>> (e5f0a698b34ed76002dc5cff3804a61c80233a7a) without any downstream
>> patches, which led us to this commit as the first bad one:
>> 9b8b84879d4adc506b0d3944e20b28d9f3f6994b
> 
> Note: the proper way to reference a patch is to use 12-digits commit ID and
> patch title:
> 
> 9b8b84879d4a ("block: Increase BLK_DEF_MAX_SECTORS_CAP")
> 
> as that make it easier to know what one is talking about without having to go
> look what patch that ID references.

Thanks for the hint, noted!

>> Building our downstream kernel 6.17 with this commit reverted, fixed it.
> 
> Nope, this is likely not fixing anything but rather hiding the issue. With this
> patch reverted, the default max_sectors_kb will be 1280, so all requests will be
> chunked to that size at most, and your devices will not see large commands.
> However, simply doing something like:
> 
> echo 4096 > /sys/block/<dev>/queue/max_sectors_kb
> 
> will put your system in a state that is equivalent to the patch being applied
> and you will likely see the issue again. Try.

Yes, looks like on a system without this patch, setting the max_sectors_kb to
4096 on the encrypted Ceph OSD disk will produce the crash.

> [...]
> 
>> Here the logs from 7.0-rc6:
>>
>> Apr 01 11:41:19 pve-test-hba kernel: sd 9:2:2:0: [sdc] tag#3962 page boundary curr_buff: 0x00000000f4d7cfce
>> Apr 01 11:41:19 pve-test-hba kernel: BUG: unable to handle page fault for address: ff3a241243d70000
>> Apr 01 11:41:19 pve-test-hba kernel: #PF: supervisor write access in kernel mode
>> Apr 01 11:41:19 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
>> Apr 01 11:41:19 pve-test-hba kernel: PGD 100010067 P4D 10066d067 PUD 10066e067 PMD 11f0fa067 PTE 0
>> Apr 01 11:41:19 pve-test-hba kernel: Oops: Oops: 0002 [#3] SMP NOPTI
>> Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G      D W   E       7.0.0-rc6 #19 PREEMPT(full)
>> Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
>> Apr 01 11:41:19 pve-test-hba kernel: Hardware name: <snip>
>> Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]
> 
> There may be an issue with the mpt3sas driver with large commands.
> 
> However, I am using that driver all day long and doing lots of testing with
> gigantic read/write commands all the time. I have never seen any issues.
> The difference is that I am using the SAS-SATA FW for the Broadcom HBA, so no
> NVMe support, and my target devices are SAS or SATA HDDs, not SSDs.
> 
> Something may be wrong with the NVMe support in that HBA, or, your SSDs do not
> like large commands and cause issues. That is easy to test: try connecting your
> SSDs directly to PCI and test them by issuing large read/write commands with fio
> (you will need to use iomem=mmaphuge option to use hugepages for the IO buffers
> to ensure that you do not get the IOs chunked into small commands due to memory
> fragmentation).

We only have limited access to the test machine, so testing this is not
trivial. If I understand correctly, there is a lead pointing in the direction
of mpt3sas [1], so I'd postpone this test for now. But if needed, we're happy
to look into it.

On a possibly related note, we also got some reports from users with HBAs using
the megaraid_sas (not mpt3sas) driver with NVMes and crashes that look similar,
at least to my eyes. We do not have a test system with the right hardware, so
currently cannot test with a mainline kernel. I am aware that downstream kernels
are not supported here, but wanted to mention it anyway on the off-chance that
there might be a similar issue in megaraid_sas.

The user at [2] has the following hardware:

> 2x Micron NVMe via a Broadcom SAS 3808 iMR (HBA mode)

And reports the following crash (quoted from the attachment in [2]):

Mar 20 14:52:36 pmx01rdm kernel: sd 0:0:5:0: [sdb] tag#1136 page boundary ptr_sgl: 0x00000000dd27511c
Mar 20 14:52:36 pmx01rdm kernel: BUG: unable to handle page fault for address: ff46467d429e0000
Mar 20 14:52:36 pmx01rdm kernel: #PF: supervisor write access in kernel mode
Mar 20 14:52:36 pmx01rdm kernel: #PF: error_code(0x0002) - not-present page
Mar 20 14:52:36 pmx01rdm kernel: PGD 100000067 P4D 1002fe067 PUD 1002ff067 PMD 1037d9067 PTE 0
Mar 20 14:52:36 pmx01rdm kernel: Oops: Oops: 0002 [#1] SMP NOPTI
Mar 20 14:52:36 pmx01rdm kernel: CPU: 22 UID: 0 PID: 263834 Comm: kworker/u130:3 Tainted: P           O        6.17.4-2-pve #1 PREEMPT(voluntary) 
Mar 20 14:52:36 pmx01rdm kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Mar 20 14:52:36 pmx01rdm kernel: Hardware name: Supermicro AS -2015CS-TNR/H13SSW, BIOS 3.7 10/09/2025
Mar 20 14:52:36 pmx01rdm kernel: Workqueue: writeback wb_workfn (flush-8:16)
Mar 20 14:52:36 pmx01rdm kernel: RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
Mar 20 14:52:36 pmx01rdm kernel: Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
Mar 20 14:52:36 pmx01rdm kernel: RSP: 0018:ff46467dc4647300 EFLAGS: 00010206
Mar 20 14:52:36 pmx01rdm kernel: RAX: 00000000ff690000 RBX: ff29d11d2c357040 RCX: ff46467d429e0000
Mar 20 14:52:36 pmx01rdm kernel: RDX: ff46467d429e0008 RSI: ff29d11d2c356f08 RDI: 0000000000000000
Mar 20 14:52:36 pmx01rdm kernel: RBP: ff46467dc46473d0 R08: 0000000000000200 R09: 0000000000001000
Mar 20 14:52:36 pmx01rdm kernel: R10: 0000000000000fff R11: 0000000000001000 R12: 00000000001ff000
Mar 20 14:52:36 pmx01rdm kernel: R13: 0000000000200000 R14: 00000000b6600000 R15: ff46467d429e0008
Mar 20 14:52:36 pmx01rdm kernel: FS:  0000000000000000(0000) GS:ff29d15bc8e86000(0000) knlGS:0000000000000000
Mar 20 14:52:36 pmx01rdm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 14:52:36 pmx01rdm kernel: CR2: ff46467d429e0000 CR3: 0000001385c3a005 CR4: 0000000000f71ef0
Mar 20 14:52:36 pmx01rdm kernel: PKRU: 55555554
Mar 20 14:52:36 pmx01rdm kernel: Call Trace:
Mar 20 14:52:36 pmx01rdm kernel:  <TASK>
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  ? scsi_alloc_sgtables+0xa3/0x3a0
Mar 20 14:52:36 pmx01rdm kernel:  megasas_queue_command+0x122/0x1d0 [megaraid_sas]
Mar 20 14:52:36 pmx01rdm kernel:  scsi_queue_rq+0x409/0xcc0
Mar 20 14:52:36 pmx01rdm kernel:  blk_mq_dispatch_rq_list+0x121/0x740
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  ? sbitmap_get+0x73/0x180
Mar 20 14:52:36 pmx01rdm kernel:  __blk_mq_sched_dispatch_requests+0x408/0x600
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  blk_mq_sched_dispatch_requests+0x2d/0x80
Mar 20 14:52:36 pmx01rdm kernel:  blk_mq_run_hw_queue+0x2c3/0x330
Mar 20 14:52:36 pmx01rdm kernel:  blk_mq_dispatch_list+0x13e/0x460
Mar 20 14:52:36 pmx01rdm kernel:  blk_mq_flush_plug_list+0x62/0x1e0
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  blk_add_rq_to_plug+0xfc/0x1c0
Mar 20 14:52:36 pmx01rdm kernel:  blk_mq_submit_bio+0x61f/0x890
Mar 20 14:52:36 pmx01rdm kernel:  __submit_bio+0x74/0x290
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  submit_bio_noacct_nocheck+0x28d/0x370
Mar 20 14:52:36 pmx01rdm kernel:  submit_bio_noacct+0x19b/0x5b0
Mar 20 14:52:36 pmx01rdm kernel:  submit_bio+0xb1/0x110
Mar 20 14:52:36 pmx01rdm kernel:  mpage_write_folio+0x76a/0x7c0
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  ? mod_memcg_lruvec_state+0xd3/0x1f0
Mar 20 14:52:36 pmx01rdm kernel:  mpage_writepages+0x87/0x110
Mar 20 14:52:36 pmx01rdm kernel:  ? __pfx_fat_get_block+0x10/0x10
Mar 20 14:52:36 pmx01rdm kernel:  ? update_curr+0x187/0x1b0
Mar 20 14:52:36 pmx01rdm kernel:  fat_writepages+0x15/0x30
Mar 20 14:52:36 pmx01rdm kernel:  do_writepages+0xc1/0x180
Mar 20 14:52:36 pmx01rdm kernel:  __writeback_single_inode+0x44/0x350
Mar 20 14:52:36 pmx01rdm kernel:  writeback_sb_inodes+0x24e/0x550
Mar 20 14:52:36 pmx01rdm kernel:  wb_writeback+0x98/0x330
Mar 20 14:52:36 pmx01rdm kernel:  wb_workfn+0xb6/0x410
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  process_one_work+0x188/0x370
Mar 20 14:52:36 pmx01rdm kernel:  worker_thread+0x33a/0x480
Mar 20 14:52:36 pmx01rdm kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 20 14:52:36 pmx01rdm kernel:  ? __pfx_worker_thread+0x10/0x10
Mar 20 14:52:36 pmx01rdm kernel:  kthread+0x108/0x220
Mar 20 14:52:36 pmx01rdm kernel:  ? __pfx_kthread+0x10/0x10
Mar 20 14:52:36 pmx01rdm kernel:  ret_from_fork+0x205/0x240
Mar 20 14:52:36 pmx01rdm kernel:  ? __pfx_kthread+0x10/0x10
Mar 20 14:52:36 pmx01rdm kernel:  ret_from_fork_asm+0x1a/0x30
Mar 20 14:52:36 pmx01rdm kernel:  </TASK>
[...]

[1] https://lore.kernel.org/all/ac2GOOLzui0U0BTJ@kbusch-mbp/
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=7438


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2026-04-02 14:33         ` Friedrich Weber
@ 2026-04-02 15:03           ` Keith Busch
  0 siblings, 0 replies; 26+ messages in thread
From: Keith Busch @ 2026-04-02 15:03 UTC (permalink / raw)
  To: Friedrich Weber
  Cc: Damien Le Moal, Mira Limbeck, axboe, hch, linux-block,
	martin.petersen

On Thu, Apr 02, 2026 at 04:33:47PM +0200, Friedrich Weber wrote:
> We only have limited access to the test machine, so testing this is not
> trivial. If I understand correctly, there is a lead pointing in the direction
> of mpt3sas [1], so I'd postpone this test for now. But if needed, we're happy
> to look into it.

Yeah, the mpt3sas driver isn't using an appropriate sized buffer for
nvme prp handling. The easy option is just force the block layer to
split requests so the driver never sees anything bigger than what it can
currently handle. This should do it. I don't have any such device to
test on, though.

---
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 6ff7885572942..c76f5b958c56f 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2739,7 +2739,10 @@ scsih_sdev_configure(struct scsi_device *sdev, struct queue_limits *lim)
 				pcie_device->connector_name);
 
 		if (pcie_device->nvme_mdts)
-			lim->max_hw_sectors = pcie_device->nvme_mdts / 512;
+			lim->max_hw_sectors = min(pcie_device->nvme_mdts / 512,
+						(SZ_2M / 512) - 8);
+		else
+			lim->max_hw_sectors = (SZ_2M / 512) - 8;
 
 		pcie_device_put(pcie_device);
 		spin_unlock_irqrestore(&ioc->pcie_device_lock, flags);
--

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2026-04-02 15:03 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
2025-06-18  6:17 ` Hannes Reinecke
2025-06-18  8:51 ` Johannes Thumshirn
2025-06-18  9:06 ` John Garry
2025-06-18  9:47   ` Damien Le Moal
2025-06-18 10:19 ` Martin K. Petersen
2025-06-23 13:40 ` Christoph Hellwig
2025-06-24 16:49 ` Jens Axboe
2025-08-27  7:07   ` Sebastian Andrzej Siewior
2025-08-27  7:38     ` Christoph Hellwig
2025-08-27  7:52       ` Sebastian Andrzej Siewior
2025-08-27  8:00         ` Christoph Hellwig
2025-08-27  8:03           ` Damien Le Moal
2025-08-27  8:01         ` Damien Le Moal
2025-08-27  8:42           ` Sebastian Andrzej Siewior
2025-08-27  9:01             ` Damien Le Moal
2025-08-27 10:16               ` Sebastian Andrzej Siewior
2026-03-31 12:02 ` Mira Limbeck
2026-03-31 12:30   ` Mira Limbeck
2026-03-31 19:48   ` Damien Le Moal
2026-04-01 10:32     ` Mira Limbeck
2026-04-01 20:02       ` Damien Le Moal
2026-04-01 20:55         ` Keith Busch
2026-04-01 23:31           ` Damien Le Moal
2026-04-02 14:33         ` Friedrich Weber
2026-04-02 15:03           ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox