[PATCH v2] block: Increase BLK_DEF_MAX_SECTORS

public inbox for linux-block@vger.kernel.org

* [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
@ 2025-06-18  6:00 Damien Le Moal
  2025-06-18  6:17 ` Hannes Reinecke
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Damien Le Moal @ 2025-06-18  6:00 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, Martin K . Petersen

Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
2560") increased the default maximum size of a block device I/O to 2560
sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
chunk size 128k". This choice is rather arbitrary. Since then,
improvements to the block layer have made software RAID drivers correctly
advertise their stripe width through chunk_sectors, and abuses of
BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
default user-controlled maximum I/O size) have been fixed.

Since many block devices, and in particular HDDs, can benefit from a
larger value of BLK_DEF_MAX_SECTORS_CAP, increase this value to 4 MiB
(8192 sectors).
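
For reference, the sizes quoted above convert as follows. This is a minimal standalone sketch: SECTOR_SHIFT and SZ_4M are redefined locally with the values the kernel uses (512-byte sectors, 4 MiB), while OLD_CAP_SECTORS, NEW_CAP_SECTORS and sectors_to_kib() are illustrative names, not kernel symbols.

```c
#include <assert.h>

/* Kernel values: 512-byte sectors (SECTOR_SHIFT 9) and SZ_4M = 4 MiB. */
#define SECTOR_SHIFT	9
#define SZ_4M		0x00400000

/* Old cap: one full stripe of 10 data disks with 128 KiB chunks, in sectors. */
#define OLD_CAP_SECTORS	((10u * 128u * 1024u) >> SECTOR_SHIFT)

/* New cap, written the same way as in the patch. */
#define NEW_CAP_SECTORS	(SZ_4M >> SECTOR_SHIFT)

static_assert(OLD_CAP_SECTORS == 2560, "10 x 128 KiB is 2560 sectors");
static_assert(NEW_CAP_SECTORS == 8192, "4 MiB is 8192 sectors");

/* Sector count to KiB, the unit used by max_sectors_kb in sysfs. */
static unsigned int sectors_to_kib(unsigned int sectors)
{
	return sectors >> (10 - SECTOR_SHIFT);	/* 2 sectors per KiB */
}
```

sectors_to_kib() gives the value that would show up in /sys/block/sdX/queue/max_sectors_kb: 1280 for the old cap and 4096 for the new one.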

Finally, given that BLK_DEF_MAX_SECTORS_CAP is only used in the block
layer and should not be used by drivers directly, move this macro
definition to the block layer internal header file block/blk.h.

Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
Changes from v1:
 - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h
 - Define the macro value using SZ_4M to make it more readable
 - Added review tag

 block/blk.h            | 9 +++++++++
 include/linux/blkdev.h | 9 ---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 37ec459fe656..1141b343d0b5 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -13,6 +13,15 @@
 
 struct elevator_type;
 
+/*
+ * Default upper limit for the software max_sectors limit used for regular I/Os.
+ * This can be increased through sysfs.
+ *
+ * This should not be confused with the max_hw_sector limit that is entirely
+ * controlled by the block device driver, usually based on hardware limits.
+ */
+#define BLK_DEF_MAX_SECTORS_CAP	(SZ_4M >> SECTOR_SHIFT)
+
 #define	BLK_DEV_MAX_SECTORS	(LLONG_MAX >> 9)
 #define	BLK_MIN_SEGMENT_SIZE	4096
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 85aab8bc96e7..c2b3ddea8b6d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1231,15 +1231,6 @@ enum blk_default_limits {
 	BLK_SEG_BOUNDARY_MASK	= 0xFFFFFFFFUL,
 };
 
-/*
- * Default upper limit for the software max_sectors limit used for
- * regular file system I/O.  This can be increased through sysfs.
- *
- * Not to be confused with the max_hw_sector limit that is entirely
- * controlled by the driver, usually based on hardware limits.
- */
-#define BLK_DEF_MAX_SECTORS_CAP	2560u
-
 static inline struct queue_limits *bdev_limits(struct block_device *bdev)
 {
 	return &bdev_get_queue(bdev)->limits;
-- 
2.49.0

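To illustrate the distinction drawn in the patch comment between the software max_sectors limit and the driver-controlled max_hw_sectors limit, here is a deliberately simplified model. It assumes that the default software limit is the cap clamped to the hardware limit, and that a value set by the user through sysfs replaces the cap; effective_max_sectors() and its parameters are hypothetical names, and the real code in block/blk-settings.c handles more cases.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model (not the kernel code): the default software
 * limit is the cap clamped to what the driver reports as the hardware limit,
 * and a value set by the user replaces the cap. All values are in 512-byte
 * sectors; user_sectors == 0 means "not set by the user".
 */
static unsigned int effective_max_sectors(unsigned int max_hw_sectors,
					  unsigned int user_sectors,
					  unsigned int cap_sectors)
{
	unsigned int limit = user_sectors ? user_sectors : cap_sectors;

	assert(max_hw_sectors > 0);
	return limit < max_hw_sectors ? limit : max_hw_sectors;
}
```

With a hardware limit of 65534 sectors (the 32767 KiB max_hw_sectors_kb reported later in this thread), the model yields 2560 sectors (1280 KiB) with the old cap and 8192 sectors (4096 KiB) with the new one.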

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
@ 2025-06-18  6:17 ` Hannes Reinecke
  2025-06-18  8:51 ` Johannes Thumshirn
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2025-06-18  6:17 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, linux-block
  Cc: Christoph Hellwig, Martin K . Petersen

On 6/18/25 08:00, Damien Le Moal wrote:
> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
> 2560") increased the default maximum size of a block device I/O to 2560
> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
> chunk size 128k". This choice is rather arbitrary and since then,
> improvements to the block layer have software RAID drivers correctly
> advertize their stripe width through chunk_sectors and abuses of
> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
> default user controlled maximum I/O size) have been fixed.
> 
> Since many block devices can benefit from a larger value of
> BLK_DEF_MAX_SECTORS_CAP, and in particular HDDs, increase this value to
> be 4MiB, or 8192 sectors.
> 
> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
> and should not be used by drivers directly, move this macro definition
> to the block layer internal header file block/blk.h.
> 
> Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> Changes from v1:
>   - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h
>   - Define the macro value using SZ_4M to make it more readable
>   - Added review tag
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
  2025-06-18  6:17 ` Hannes Reinecke
@ 2025-06-18  8:51 ` Johannes Thumshirn
  2025-06-18  9:06 ` John Garry
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Johannes Thumshirn @ 2025-06-18  8:51 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, linux-block@vger.kernel.org
  Cc: hch, Martin K . Petersen

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
  2025-06-18  6:17 ` Hannes Reinecke
  2025-06-18  8:51 ` Johannes Thumshirn
@ 2025-06-18  9:06 ` John Garry
  2025-06-18  9:47   ` Damien Le Moal
  2025-06-18 10:19 ` Martin K. Petersen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2025-06-18  9:06 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, linux-block
  Cc: Christoph Hellwig, Martin K . Petersen

On 18/06/2025 07:00, Damien Le Moal wrote:
> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
> 2560") increased the default maximum size of a block device I/O to 2560
> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
> chunk size 128k". This choice is rather arbitrary and since then,
> improvements to the block layer have software RAID drivers correctly
> advertize their stripe width through chunk_sectors and abuses of
> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
> default user controlled maximum I/O size) have been fixed.
> 
> Since many block devices can benefit from a larger value of
> BLK_DEF_MAX_SECTORS_CAP, and in particular HDDs, increase this value to
> be 4MiB, or 8192 sectors.
> 
> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
> and should not be used by drivers directly, move this macro definition
> to the block layer internal header file block/blk.h.
> 
> Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

Regardless of comment below:
Reviewed-by: John Garry <john.g.garry@oracle.com>

> ---
> Changes from v1:
>   - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h

it's only referenced in blk-settings.c, so I don't know why it doesn't 
live there.

However it is co-located with enum blk_default_limits and the same 
comment goes for members of enum blk_default_limits. I think all those 
in enum blk_default_limits could potentially be moved to blk-settings.c 
after Christoph's work for atomic queue limit updates.



* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  9:06 ` John Garry
@ 2025-06-18  9:47   ` Damien Le Moal
  0 siblings, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2025-06-18  9:47 UTC (permalink / raw)
  To: John Garry, Jens Axboe, linux-block
  Cc: Christoph Hellwig, Martin K . Petersen

On 6/18/25 18:06, John Garry wrote:
> On 18/06/2025 07:00, Damien Le Moal wrote:
>> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
>> 2560") increased the default maximum size of a block device I/O to 2560
>> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
>> chunk size 128k". This choice is rather arbitrary and since then,
>> improvements to the block layer have software RAID drivers correctly
>> advertize their stripe width through chunk_sectors and abuses of
>> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
>> default user controlled maximum I/O size) have been fixed.
>>
>> Since many block devices can benefit from a larger value of
>> BLK_DEF_MAX_SECTORS_CAP, and in particular HDDs, increase this value to
>> be 4MiB, or 8192 sectors.
>>
>> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
>> and should not be used by drivers directly, move this macro definition
>> to the block layer internal header file block/blk.h.
>>
>> Suggested-by: Martin K . Petersen <martin.petersen@oracle.com>
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> Regardless of comment below:
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> 
>> ---
>> Changes from v1:
>>   - Move BLK_DEF_MAX_SECTORS_CAP definition to block/blk.h
> 
> it's only referenced in blk-settings.c, so I don't know why it doesn't 
> live there.
> 
> However it is co-located with enum blk_default_limits and the same 
> comment goes for members of enum blk_default_limits. I think all those 
> in enum blk_default_limits could potentially be moved to blk-settings.c 
> after Christoph's work for atomic queue limit updates.

I actually checked that and a few drivers are still using 2 of the 4 enum defaults.

Jens,

Do you prefer that we move BLK_DEF_MAX_SECTORS_CAP to blk-settings.c? blk.h has
a couple of settings macros at the top, so for now it sits together with those.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
                   ` (2 preceding siblings ...)
  2025-06-18  9:06 ` John Garry
@ 2025-06-18 10:19 ` Martin K. Petersen
  2025-06-23 13:40 ` Christoph Hellwig
  2025-06-24 16:49 ` Jens Axboe
  5 siblings, 0 replies; 17+ messages in thread
From: Martin K. Petersen @ 2025-06-18 10:19 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen


Damien,

> And given that BLK_DEF_MAX_SECTORS_CAP is only used in the block layer
> and should not be used by drivers directly, move this macro definition
> to the block layer internal header file block/blk.h.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
                   ` (3 preceding siblings ...)
  2025-06-18 10:19 ` Martin K. Petersen
@ 2025-06-23 13:40 ` Christoph Hellwig
  2025-06-24 16:49 ` Jens Axboe
  5 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2025-06-23 13:40 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
                   ` (4 preceding siblings ...)
  2025-06-23 13:40 ` Christoph Hellwig
@ 2025-06-24 16:49 ` Jens Axboe
  2025-08-27  7:07   ` Sebastian Andrzej Siewior
  5 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2025-06-24 16:49 UTC (permalink / raw)
  To: linux-block, Damien Le Moal; +Cc: Christoph Hellwig, Martin K . Petersen


On Wed, 18 Jun 2025 15:00:45 +0900, Damien Le Moal wrote:
> Back in 2015, commit d2be537c3ba3 ("block: bump BLK_DEF_MAX_SECTORS to
> 2560") increased the default maximum size of a block device I/O to 2560
> sectors (1280 KiB) to "accommodate a 10-data-disk stripe write with
> chunk size 128k". This choice is rather arbitrary and since then,
> improvements to the block layer have software RAID drivers correctly
> advertize their stripe width through chunk_sectors and abuses of
> BLK_DEF_MAX_SECTORS_CAP by drivers (to set the HW limit rather than the
> default user controlled maximum I/O size) have been fixed.
> 
> [...]

Applied, thanks!

[1/1] block: Increase BLK_DEF_MAX_SECTORS_CAP
      commit: 345c5091ffec5d4d53d7fe572fef3bcc3805824b

Best regards,
-- 
Jens Axboe





* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-06-24 16:49 ` Jens Axboe
@ 2025-08-27  7:07   ` Sebastian Andrzej Siewior
  2025-08-27  7:38     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27  7:07 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Damien Le Moal, Christoph Hellwig,
	Martin K . Petersen

On 2025-06-24 10:49:16 [-0600], Jens Axboe wrote:
> Applied, thanks!
> 
> [1/1] block: Increase BLK_DEF_MAX_SECTORS_CAP
>       commit: 345c5091ffec5d4d53d7fe572fef3bcc3805824b

I have a PowerEdge R6525 here which exposes a "DELLBOSS VD" device with
firmware MV.R00-0. I updated the firmware of every component I could
find, but starting with this commit I get:
|[   10.894688] ata1: SATA max UDMA/133 abar m2048@0xa6300000 port 0xa6300100 irq 97 lpm-pol 1
|[   11.233656] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
|[   11.267540] ata1.00: ATA-8: DELLBOSS VD, MV.R00-0, max UDMA7
|[   11.279106] ata1.00: 468731008 sectors, multi 0: LBA48 NCQ (depth 32)
|[   11.309332] ata1.00: Invalid log directory version 0x0000
|[   11.324380] ata1.00: Security Log not supported
|[   11.336514] ata1.00: Security Log not supported
|[   11.350523] ata1.00: configured for UDMA/133
|[   11.351026] scsi 0:0:0:0: Direct-Access     ATA      DELLBOSS VD      00-0 PQ: 0 ANSI: 5
|[   11.361416] scsi 0:0:0:0: Attached scsi generic sg0 type 0
|[   11.361928] sd 0:0:0:0: [sda] 468731008 512-byte logical blocks: (240 GB/224 GiB)
|[   11.361932] sd 0:0:0:0: [sda] 4096-byte physical blocks
|[   11.361942] sd 0:0:0:0: [sda] Write Protect is off
|[   11.361944] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
|[   11.361957] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
|[   11.361979] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
|[   12.654692] EXT4-fs (sda2): mounted filesystem abc6d95d-c676-442b-b22a-ce59dbdc47d3 ro with ordered data mode. Quota mode: none.
|[   14.497319] EXT4-fs (sda2): re-mounted abc6d95d-c676-442b-b22a-ce59dbdc47d3 r/w.
|[   67.619838] ata1: illegal qc_active transition (100000000->180005576)
|[   67.627051] ata1.00: Read log 0x10 page 0x00 failed, Emask 0x100
|[   67.633773] ata1: failed to read log page 10h (errno=-5)
|[   67.639802] ata1.00: exception Emask 0x1 SAct 0x80005576 SErr 0x0 action 0x6 frozen
|[   67.648156] ata1.00: irq_stat 0x40000008
|[   67.652785] ata1.00: failed command: WRITE FPDMA QUEUED
|[   67.658707] ata1.00: cmd 61/08:08:d0:85:12/00:00:00:00:00/40 tag 1 ncq dma 4096 out
|                         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation) 
…
|[   67.881878] ata1: hard resetting link
|[   68.194853] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
|[   68.201931] ata1.00: Security Log not supported
|[   68.207468] ata1.00: Security Log not supported
|[   68.212723] ata1.00: configured for UDMA/133
|[   68.527427] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
|[   68.534478] ata1.00: Security Log not supported
|[   68.539924] ata1.00: Security Log not supported
|[   68.545218] ata1.00: configured for UDMA/133
|[   68.550246] sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=50s
|[   68.560502] sd 0:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 00 50 08 00 00 00 10 00
|[   68.568502] I/O error, dev sda, sector 5244928 op 0x0:(READ) flags 0x83700 phys_seg 2 prio class 2

and this never recovers. After reverting 9b8b84879d4ad ("block: Increase
BLK_DEF_MAX_SECTORS_CAP") on top of v6.17-rc3 things are back to normal.
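
The CDB in the timeout message above can be decoded by hand. A minimal sketch, assuming the standard SCSI READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8); struct read10 and decode_read10() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a SCSI READ(10) CDB (opcode 0x28). */
struct read10 {
	uint32_t lba;		/* bytes 2..5, big-endian */
	uint16_t nblocks;	/* bytes 7..8, big-endian */
};

/* Decode a 10-byte READ(10) CDB; returns 0 on success, -1 if not READ(10). */
static int decode_read10(const uint8_t cdb[10], struct read10 *out)
{
	if (cdb[0] != 0x28)
		return -1;
	out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		   ((uint32_t)cdb[4] << 8) | cdb[5];
	out->nblocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
	return 0;
}
```

Fed with the bytes 28 00 00 50 08 00 00 00 10 00, it yields LBA 5244928, matching the sector in the I/O error line, and a transfer length of 16 blocks (8 KiB with 512-byte blocks), so that particular timed-out read is small.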

Did I forget to update firmware somewhere or is this "normal" and this
device requires a quirk?

> Best regards,

Sebastian


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:07   ` Sebastian Andrzej Siewior
@ 2025-08-27  7:38     ` Christoph Hellwig
  2025-08-27  7:52       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-08-27  7:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jens Axboe, linux-block, Damien Le Moal, Christoph Hellwig,
	Martin K . Petersen

On Wed, Aug 27, 2025 at 09:07:05AM +0200, Sebastian Andrzej Siewior wrote:
> 
> I have here a PowerEdge R6525 which exposes a "DELLBOSS VD" device with
> firmware MV.R00-0. I updated firmware left and right of the components I
> could find but started with this commit I get:

...

> Did I forget to update firmware somewhere or is this "normal" and this
> device requires a quirk?

Looks like it needs a quirk.  Note that if the above commit triggered
this for you, you could also have reproduced it before that commit,
e.g. by doing a large direct I/O read.



* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:38     ` Christoph Hellwig
@ 2025-08-27  7:52       ` Sebastian Andrzej Siewior
  2025-08-27  8:00         ` Christoph Hellwig
  2025-08-27  8:01         ` Damien Le Moal
  0 siblings, 2 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27  7:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Damien Le Moal, Martin K . Petersen

On 2025-08-27 09:38:36 [+0200], Christoph Hellwig wrote:
> > Did I forget to update firmware somewhere or is this "normal" and this
> > device requires a quirk?
> 
> Looks like it needs a quirk.  

Just wanted to make sure I did not forget to update firmware somewhere…
This should be easy to fix on the firmware side (in case someone
capable is reading this).

>                               Note that if the above commit triggered
> this for you, you could also reproduce it before by say doing a large
> direct I/O read.

On a kernel without that commit in question? Booting Debian's current
v6.12 and
|  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct

works like a charm. According to strace it does
| openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
| dup2(3, 0)                              = 0
| lseek(0, 0, SEEK_CUR)                   = 0
| read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992

So it should be what you asked for: I asked for 1G and got ~800M.

Sebastian


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:52       ` Sebastian Andrzej Siewior
@ 2025-08-27  8:00         ` Christoph Hellwig
  2025-08-27  8:03           ` Damien Le Moal
  2025-08-27  8:01         ` Damien Le Moal
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-08-27  8:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Damien Le Moal,
	Martin K . Petersen

On Wed, Aug 27, 2025 at 09:52:21AM +0200, Sebastian Andrzej Siewior wrote:
> > this for you, you could also reproduce it before by say doing a large
> > direct I/O read.
> 
> On a kernel without that commit in question? Booting Debian's current
> v6.12 and
> |  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct
> 
> works like a charm. According to strace it does
> | openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
> | dup2(3, 0)                              = 0
> | lseek(0, 0, SEEK_CUR)                   = 0
> | read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992
> 
> so it should be what you asked for. Asked for 1G, got ~800M.

This is probably being split up into multiple bios because your
output memory is fragmented.  You'd have to read into hugetlbfs or a
VMA otherwise backed by very large folios.



* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  8:00         ` Christoph Hellwig
@ 2025-08-27  8:03           ` Damien Le Moal
  0 siblings, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2025-08-27  8:03 UTC (permalink / raw)
  To: Christoph Hellwig, Sebastian Andrzej Siewior
  Cc: Jens Axboe, linux-block, Martin K . Petersen

On 8/27/25 5:00 PM, Christoph Hellwig wrote:
> On Wed, Aug 27, 2025 at 09:52:21AM +0200, Sebastian Andrzej Siewior wrote:
>>> this for you, you could also reproduce it before by say doing a large
>>> direct I/O read.
>>
>> On a kernel without that commit in question? Booting Debian's current
>> v6.12 and
>> |  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct
>>
>> works like a charm. According to strace it does
>> | openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
>> | dup2(3, 0)                              = 0
>> | lseek(0, 0, SEEK_CUR)                   = 0
>> | read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992
>>
>> so it should be what you asked for. Asked for 1G, got ~800M.
> 
> This is probably splitting thing up into multiple bios because your
> output memory is fragmented.  You'd have to do it into hugetlbfs or
> vma otherwise backed by very larger folios.

You would also need:

echo 4096 > /sys/block/sdX/queue/max_sectors_kb

or some other large value.

But given that commit 345c5091ffec sets the default to 4 MiB, I/Os are now
split into 4 MiB commands, which trigger the issue. So there is likely a
cut-off command size below 4 MiB above which this adapter stops working.
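
One way to look for that cut-off is a binary search over max_sectors_kb. The sketch below is hypothetical: probe_fn stands for "set max_sectors_kb, run the dd test, and report whether it completed without timeouts", and mock_probe() only exists to exercise the search without hardware. It also assumes failures are monotonic in the command size.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical probe: should set max_sectors_kb to 'kib', issue a direct
 * read of that size (e.g. with dd) and return true if it completed without
 * timeouts.
 */
typedef bool (*probe_fn)(unsigned int kib);

/*
 * Largest size in [lo, hi] KiB, in multiples of 'step', for which probe()
 * succeeds. Assumes that if a size works, every smaller size works too.
 * Returns 0 if even 'lo' fails.
 */
static unsigned int find_cutoff_kib(probe_fn probe, unsigned int lo,
				    unsigned int hi, unsigned int step)
{
	unsigned int good = 0;

	assert(step > 0 && lo >= step && lo % step == 0 && hi % step == 0);
	while (lo <= hi) {
		/* Midpoint, rounded down to a multiple of step. */
		unsigned int mid = lo + (hi - lo) / step / 2 * step;

		if (probe(mid)) {
			good = mid;		/* works: look for something larger */
			lo = mid + step;
		} else {
			hi = mid - step;	/* fails: look below; mid >= step */
		}
	}
	return good;
}

/* Stand-in for real hardware: every size up to mock_limit_kib "works". */
static unsigned int mock_limit_kib;

static bool mock_probe(unsigned int kib)
{
	return kib <= mock_limit_kib;
}
```

Since the failures reported in this thread are intermittent (the first 4 MiB reads completed), a real probe should repeat the test several times before declaring a size good.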


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  7:52       ` Sebastian Andrzej Siewior
  2025-08-27  8:00         ` Christoph Hellwig
@ 2025-08-27  8:01         ` Damien Le Moal
  2025-08-27  8:42           ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2025-08-27  8:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen

On 8/27/25 4:52 PM, Sebastian Andrzej Siewior wrote:
> On 2025-08-27 09:38:36 [+0200], Christoph Hellwig wrote:
>>> Did I forget to update firmware somewhere or is this "normal" and this
>>> device requires a quirk?
>>
>> Looks like it needs a quirk.  
> 
> Just wanted to make sure I did not forget to update firmware somewhere…
> It should be easy to fix this one the firmware's side (in case someone
> capable is reading this).
> 
>>                               Note that if the above commit triggered
>> this for you, you could also reproduce it before by say doing a large
>> direct I/O read.
> 
> On a kernel without that commit in question? Booting Debian's current
> v6.12 and
> |  dd if=vmlinux.o of=/dev/null bs=1G count=1 iflag=direct

Don't read a file. Read the disk directly. So please use "if=/dev/sdX".
Also, a 1 GiB I/O will never be issued to the device as a single command.
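
As a rough lower bound on the number of commands, a contiguous direct I/O is split at max_sectors_kb. A minimal sketch of that arithmetic; commands_after_split() is an illustrative name, and the model ignores other reasons for splitting, such as segment limits or the memory fragmentation mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough model: number of commands a single contiguous direct I/O of
 * 'bytes' turns into when it is split at max_sectors_kb. Ignores other
 * split reasons (segment count/size limits, memory fragmentation, ...).
 */
static uint64_t commands_after_split(uint64_t bytes, unsigned int max_sectors_kb)
{
	uint64_t max_bytes = (uint64_t)max_sectors_kb * 1024;

	assert(max_bytes > 0);
	return (bytes + max_bytes - 1) / max_bytes;	/* round up */
}
```

With max_sectors_kb at 1280, a 1 GiB read becomes at least 820 commands, and a bs=4M read becomes a single 4 MiB command only once max_sectors_kb is 4096.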

With 345c5091ffec reverted, what does:

cat /sys/block/sdX/queue/max_sectors_kb
cat /sys/block/sdX/queue/max_hw_sectors_kb

say ?

Likely, the first one is "1280". So before running dd, you need to do:

echo 4096 > /sys/block/sdX/queue/max_sectors_kb

and then

dd if=/dev/sdX of=/dev/null bs=4M count=1 iflag=direct

And you will likely trigger the issue, even with 345c5091ffec reverted.
The issue is likely caused by a FW bug handling large commands.
Please try.



> 
> works like a charm. According to strace it does
> | openat(AT_FDCWD, "vmlinux.o", O_RDONLY|O_DIRECT) = 3
> | dup2(3, 0)                              = 0
> | lseek(0, 0, SEEK_CUR)                   = 0
> | read(0, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 841980992
> 
> so it should be what you asked for. Asked for 1G, got ~800M.
> 
> Sebastian


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  8:01         ` Damien Le Moal
@ 2025-08-27  8:42           ` Sebastian Andrzej Siewior
  2025-08-27  9:01             ` Damien Le Moal
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27  8:42 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen

On 2025-08-27 17:01:49 [+0900], Damien Le Moal wrote:
> Don't read a file. Read the disk directly. So please use "if=/dev/sdX".
> Also, there is no way that a 1GiB I/O will be done as a single large command.
> That is not going to happen.
> 
> With 345c5091ffec reverted, what does:
> 
> cat /sys/block/sdX/queue/max_sectors_kb
> cat /sys/block/sdX/queue/max_hw_sectors_kb
> 
> say ?

| # cat /sys/block/sda/queue/max_sectors_kb
| 1280
| # cat /sys/block/sda/queue/max_hw_sectors_kb
| 32767

> Likely, the first one is "1280". So before running dd, you need to do:
> 
> echo 4096 > /sys/block/sdX/queue/max_sectors_kb
> 
> and then
> 
> dd if=/dev/sdX of=/dev/null bs=4M count=1 iflag=direct

| # echo 4096 > /sys/block/sda/queue/max_sectors_kb
| # cat /sys/block/sda/queue/max_sectors_kb 
| 4096
| # dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.00966543 s, 434 MB/s

It passed.
After a reboot I issued the same dd command five times and all of them
completed. Then I increased max_sectors_kb and issued it again. The first
two came back and then

| root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 33.1699 s, 126 kB/s
| root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 57.3711 s, 73.1 kB/s
| root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
| 1+0 records in
| 1+0 records out
| 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0264171 s, 159 MB/s

They all completed, but as you can see from the throughput, some took a
while. And I see:
| [  191.641315] ata1.00: exception Emask 0x0 SAct 0x800000 SErr 0x0 action 0x6 frozen
| [  191.648839] ata1.00: failed command: READ FPDMA QUEUED
| [  191.653995] ata1.00: cmd 60/00:b8:00:00:00/20:00:00:00:00/40 tag 23 ncq dma 4194304 in
|                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
| [  191.669306] ata1.00: status: { DRDY }
| [  191.672981] ata1: hard resetting link
| [  192.702763] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
| [  192.702964] ata1.00: Security Log not supported
| [  192.703207] ata1.00: Security Log not supported
| [  192.703215] ata1.00: configured for UDMA/133
| [  192.703282] ata1: EH complete
| [  248.985303] ata1.00: exception Emask 0x0 SAct 0x10001 SErr 0x0 action 0x6 frozen
| [  248.992733] ata1.00: failed command: READ FPDMA QUEUED
| [  248.997889] ata1.00: cmd 60/00:00:00:00:00/20:00:00:00:00/40 tag 0 ncq dma 4194304 in
|                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
| [  249.013107] ata1.00: status: { DRDY }
| [  249.016775] ata1.00: failed command: WRITE FPDMA QUEUED
| [  249.022011] ata1.00: cmd 61/08:80:40:d1:18/00:00:00:00:00/40 tag 16 ncq dma 4096 out
|                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
| [  249.037135] ata1.00: status: { DRDY }
| [  249.040802] ata1: hard resetting link
| [  250.076059] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
| [  250.076258] ata1.00: Security Log not supported
| [  250.076471] ata1.00: Security Log not supported
| [  250.076478] ata1.00: configured for UDMA/133
| [  250.076537] ata1: EH complete
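
The libata "cmd" lines above can be decoded to confirm the size of the commands that time out. This sketch assumes libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and that for FPDMA QUEUED (NCQ) commands the sector count is carried in the two feature bytes while nsect bits 7:3 carry the tag; struct ncq_cmd and parse_ncq_cmd() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an NCQ command as printed by libata (assumed layout, see above). */
struct ncq_cmd {
	unsigned int command;	/* e.g. 0x60 READ / 0x61 WRITE FPDMA QUEUED */
	unsigned int tag;	/* NCQ tag, carried in nsect bits 7:3 */
	uint32_t sectors;	/* transfer length, carried in the feature bytes */
	uint64_t lba;		/* 48-bit LBA */
};

/* Parse the start of an FPDMA QUEUED "cmd ..." line. Returns 0 on success. */
static int parse_ncq_cmd(const char *s, struct ncq_cmd *out)
{
	unsigned int b[12];

	if (sscanf(s, "cmd %x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5],
		   &b[6], &b[7], &b[8], &b[9], &b[10], &b[11]) != 12)
		return -1;

	out->command = b[0];
	out->tag = b[2] >> 3;
	out->sectors = (b[6] << 8) | b[1];	/* hob_feature:feature */
	if (out->sectors == 0)
		out->sectors = 65536;	/* assumption: per ACS, 0 means 65536 */
	out->lba = ((uint64_t)b[10] << 40) | ((uint64_t)b[9] << 32) |
		   ((uint64_t)b[8] << 24) | ((uint64_t)b[5] << 16) |
		   ((uint64_t)b[4] << 8) | b[3];
	return 0;
}
```

For "cmd 60/00:b8:00:00:00/20:..." this gives command 0x60 (READ FPDMA QUEUED), tag 23 and 0x2000 = 8192 sectors, i.e. the 4194304-byte transfer printed on the same line, while the WRITE FPDMA QUEUED with tag 16 is a single 8-sector (4 KiB) write. The decoded tags match the tag numbers libata prints, which is a useful sanity check of the assumed layout.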

> And you will likely trigger the issue, even with 345c5091ffec reverted.
> The issue is likely caused by a FW bug handling large commands.
> Please try.
Done. It seems the firmware does not always manage to complete larger
requests.

Sebastian


* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  8:42           ` Sebastian Andrzej Siewior
@ 2025-08-27  9:01             ` Damien Le Moal
  2025-08-27 10:16               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2025-08-27  9:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen

On 8/27/25 5:42 PM, Sebastian Andrzej Siewior wrote:
> On 2025-08-27 17:01:49 [+0900], Damien Le Moal wrote:
>> Don't read a file. Read the disk directly. So please use "if=/dev/sdX".
>> Also, there is no way that a 1GiB I/O will be done as a single large command.
>> That is not going to happen.
>>
>> With 345c5091ffec reverted, what does:
>>
>> cat /sys/block/sdX/queue/max_sectors_kb
>> cat /sys/block/sdX/queue/max_hw_sectors_kb
>>
>> say ?
> 
> | # cat /sys/block/sda/queue/max_sectors_kb
> | 1280
> | # cat /sys/block/sda/queue/max_hw_sectors_kb
> | 32767
> 
>> Likely, the first one is "1280". So before running dd, you need to do:
>>
>> echo 4096 > /sys/block/sdX/queue/max_sectors_kb
>>
>> and then
>>
>> dd if=/dev/sdX of=/dev/null bs=4M count=1 iflag=direct
> 
> | # echo 4096 > /sys/block/sda/queue/max_sectors_kb
> | # cat /sys/block/sda/queue/max_sectors_kb 
> | 4096
> | # dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.00966543 s, 434 MB/s
> 
> It passed.
> After a reboot I issued the same dd command five times and all came
> back. Then I increased the sector size and issued it again. The first
> two came back and then
> 
> | root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 33.1699 s, 126 kB/s
> | root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 57.3711 s, 73.1 kB/s
> | root@zen3:~# dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct
> | 1+0 records in
> | 1+0 records out
> | 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0264171 s, 159 MB/s
> 
> They all came back but as you see on the speed side, it took while. And
> I see
> | [  191.641315] ata1.00: exception Emask 0x0 SAct 0x800000 SErr 0x0 action 0x6 frozen
> | [  191.648839] ata1.00: failed command: READ FPDMA QUEUED
> | [  191.653995] ata1.00: cmd 60/00:b8:00:00:00/20:00:00:00:00/40 tag 23 ncq dma 4194304 in
> |                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> | [  191.669306] ata1.00: status: { DRDY }
> | [  191.672981] ata1: hard resetting link
> | [  192.702763] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> | [  192.702964] ata1.00: Security Log not supported
> | [  192.703207] ata1.00: Security Log not supported
> | [  192.703215] ata1.00: configured for UDMA/133
> | [  192.703282] ata1: EH complete
> | [  248.985303] ata1.00: exception Emask 0x0 SAct 0x10001 SErr 0x0 action 0x6 frozen
> | [  248.992733] ata1.00: failed command: READ FPDMA QUEUED
> | [  248.997889] ata1.00: cmd 60/00:00:00:00:00/20:00:00:00:00/40 tag 0 ncq dma 4194304 in
> |                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> | [  249.013107] ata1.00: status: { DRDY }
> | [  249.016775] ata1.00: failed command: WRITE FPDMA QUEUED
> | [  249.022011] ata1.00: cmd 61/08:80:40:d1:18/00:00:00:00:00/40 tag 16 ncq dma 4096 out
> |                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

The drive is not responding, so this is likely a drive or adapter FW bug.

> | [  249.037135] ata1.00: status: { DRDY }
> | [  249.040802] ata1: hard resetting link
> | [  250.076059] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> | [  250.076258] ata1.00: Security Log not supported
> | [  250.076471] ata1.00: Security Log not supported
> | [  250.076478] ata1.00: configured for UDMA/133
> | [  250.076537] ata1: EH complete
> 
>> And you will likely trigger the issue, even with 345c5091ffec reverted.
>> The issue is likely caused by a FW bug handling large commands.
>> Please try.
> Done. It seems the firmware does not always manage to fulfill larger
> requests.

Yep, looks like it.
Which driver is used for that PowerEdge R6525, which exposes a "DELLBOSS VD"
device with firmware MV.R00-0? Is it the regular ahci driver?
If yes, we can quirk it to limit the max command size, but we would need to
know what the limit is. That means repeating that test with varying max command
sizes (max_sectors_kb) to figure out the threshold.

And maybe contact Dell support too if this is still a supported device.
(I have zero experience with this, no idea what that DELLBOSS VD is...)


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
  2025-08-27  9:01             ` Damien Le Moal
@ 2025-08-27 10:16               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-27 10:16 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen

On 2025-08-27 18:01:58 [+0900], Damien Le Moal wrote:
> Yep, looks like it.
> Which driver is used for that PowerEdge R6525, which exposes a "DELLBOSS VD"
> device with firmware MV.R00-0? Is it the regular ahci driver?
> If yes, we can quirk it to limit the max command size, but we would need to
> know what the limit is. That means repeating that test with varying max command
> sizes (max_sectors_kb) to figure out the threshold.

I issued 'dd if=/dev/sda of=/dev/null bs=4M count=1 iflag=direct' with each setting:

| echo 3584 > /sys/block/sda/queue/max_sectors_kb

survives.

| echo 4096 > /sys/block/sda/queue/max_sectors_kb

dies after a few attempts.

| lrwxrwxrwx 1 root root 0 Aug 27 11:01 0:0:0:0 -> ../../../devices/pci0000:a0/0000:a0:03.1/0000:a1:00.0/ata1/host0/target0:0:0/0:0:0:0
| a1:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 11)
|         Subsystem: Dell BOSS-S1 Adapter
|         Kernel driver in use: ahci
|         Kernel modules: ahci

So it is a Marvell controller. Interesting.

> And maybe contact Dell support too if this is still a supported device.
> (I have zero experience with this, no idea what that DELLBOSS VD is...)

There are two physical disks behind this controller, exposed as one
virtual device after applying some RAID magic. I have no idea whether
this limitation comes from the physical disks or from the controller.

Now, how do I put this? The disk behind it is an INTEL SSDSCKKB240G8R.
Firmware XC31DL6P exhibited this problem. After an update to XC31DL6R,
the problem is gone.

Sebastian

Thread overview: 17+ messages
2025-06-18  6:00 [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP Damien Le Moal
2025-06-18  6:17 ` Hannes Reinecke
2025-06-18  8:51 ` Johannes Thumshirn
2025-06-18  9:06 ` John Garry
2025-06-18  9:47   ` Damien Le Moal
2025-06-18 10:19 ` Martin K. Petersen
2025-06-23 13:40 ` Christoph Hellwig
2025-06-24 16:49 ` Jens Axboe
2025-08-27  7:07   ` Sebastian Andrzej Siewior
2025-08-27  7:38     ` Christoph Hellwig
2025-08-27  7:52       ` Sebastian Andrzej Siewior
2025-08-27  8:00         ` Christoph Hellwig
2025-08-27  8:03           ` Damien Le Moal
2025-08-27  8:01         ` Damien Le Moal
2025-08-27  8:42           ` Sebastian Andrzej Siewior
2025-08-27  9:01             ` Damien Le Moal
2025-08-27 10:16               ` Sebastian Andrzej Siewior
