* [PATCH v4 0/2] Improve optimal IO size initialization
@ 2025-06-13 6:29 Damien Le Moal
2025-06-13 6:29 ` [PATCH v4 1/2] scsi: sd: Prevent logical_to_bytes() from returning overflowed values Damien Le Moal
2025-06-13 6:29 ` [PATCH v4 2/2] scsi: sd: Set a default optimal IO size if one is not defined Damien Le Moal
0 siblings, 2 replies; 7+ messages in thread
From: Damien Le Moal @ 2025-06-13 6:29 UTC (permalink / raw)
To: Martin K . Petersen, linux-scsi
A couple of patches to improve setting the optimal I/O size limit of
SCSI disks. A fallback default is added to make sure we always have a
non-zero optimal I/O size so that file systems operate with a
reasonably sized default read_ahead_kb value, which improves buffered
read performance.
Changes from v1:
- Changed message level from wrong WARNING level to INFO level
- Added review tag
Changes from v2:
- Added patch 1
- Make sure we do not overflow variables and limits in patch 2
Changes from v3:
- Change logical_to_bytes() to return a u64 in patch 1
- Added review tag to patch 2
Damien Le Moal (2):
scsi: sd: Prevent logical_to_bytes() from returning overflowed values
scsi: sd: Set a default optimal IO size if one is not defined
drivers/scsi/sd.c | 45 +++++++++++++++++++++++++++++++++++----------
drivers/scsi/sd.h | 4 ++--
2 files changed, 37 insertions(+), 12 deletions(-)
--
2.49.0
* [PATCH v4 1/2] scsi: sd: Prevent logical_to_bytes() from returning overflowed values
2025-06-13 6:29 [PATCH v4 0/2] Improve optimal IO size initialization Damien Le Moal
@ 2025-06-13 6:29 ` Damien Le Moal
2025-06-13 16:17 ` Bart Van Assche
2025-06-13 6:29 ` [PATCH v4 2/2] scsi: sd: Set a default optimal IO size if one is not defined Damien Le Moal
1 sibling, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2025-06-13 6:29 UTC (permalink / raw)
To: Martin K . Petersen, linux-scsi
Make sure that logical_to_bytes() does not return an overflowed value
by changing its return type from unsigned int (32 bits) to u64
(64 bits). While at it, also use a bit shift instead of a
multiplication, similar to logical_to_sectors() and bytes_to_logical().
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/scsi/sd.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 36382eca941c..53658679e063 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -213,9 +213,9 @@ static inline sector_t logical_to_sectors(struct scsi_device *sdev, sector_t blo
 	return blocks << (ilog2(sdev->sector_size) - 9);
 }
 
-static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t blocks)
+static inline u64 logical_to_bytes(struct scsi_device *sdev, sector_t blocks)
 {
-	return blocks * sdev->sector_size;
+	return (u64)blocks << ilog2(sdev->sector_size);
 }
 
 static inline sector_t bytes_to_logical(struct scsi_device *sdev, unsigned int bytes)
--
2.49.0
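
The truncation that patch 1 addresses can be reproduced in userspace. The sketch below is illustrative only and is not the kernel code: ilog2_u32(), logical_to_bytes_old() and logical_to_bytes_new() are hypothetical stand-ins, and the sector size is passed directly instead of through a struct scsi_device.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's ilog2(); assumes v is a non-zero power of two. */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/* Old behaviour: the 64-bit product is truncated to 32 bits on return. */
static uint32_t logical_to_bytes_old(uint32_t sector_size, uint64_t blocks)
{
	return (uint32_t)(blocks * sector_size);
}

/* New behaviour: the shift is done in 64 bits and the result stays 64 bits wide. */
static uint64_t logical_to_bytes_new(uint32_t sector_size, uint64_t blocks)
{
	return blocks << ilog2_u32(sector_size);
}
```

With 4096-byte logical blocks, 1 << 20 blocks is exactly 4 GiB: the old variant returns 0 for that input, while the new one returns 4294967296.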
* [PATCH v4 2/2] scsi: sd: Set a default optimal IO size if one is not defined
2025-06-13 6:29 [PATCH v4 0/2] Improve optimal IO size initialization Damien Le Moal
2025-06-13 6:29 ` [PATCH v4 1/2] scsi: sd: Prevent logical_to_bytes() from returning overflowed values Damien Le Moal
@ 2025-06-13 6:29 ` Damien Le Moal
2025-06-13 14:31 ` John Garry
1 sibling, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2025-06-13 6:29 UTC (permalink / raw)
To: Martin K . Petersen, linux-scsi
Introduce the helper function sd_set_io_opt() to set a disk io_opt
limit. This new way of setting this limit falls back to using the
max_sectors limit if the host does not define an optimal sector limit
and the device did not indicate an optimal transfer size (e.g. as is
the case for ATA devices). The io_opt calculation is done using a local
64-bit variable to avoid overflows. The final value is clamped to
UINT_MAX and aligned down to the device physical block size.
This fallback io_opt limit avoids setting up the disk with a zero
io_opt limit, which results in the rather small 128 KB read_ahead_kb
attribute. The larger read_ahead_kb value set with the default non-zero
io_opt limit significantly improves buffered read performance with file
systems without any intervention from the user.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/sd.c | 45 +++++++++++++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 10 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index daddef2e9e87..8070356285a7 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3681,6 +3681,40 @@ static void sd_read_block_zero(struct scsi_disk *sdkp)
 	kfree(buffer);
 }
 
+/*
+ * Set the optimal I/O size: limit the default to the SCSI host optimal sector
+ * limit if it is set. There may be an impact on performance when the size of
+ * a request exceeds this host limit. If the host did not set any optimal
+ * sector limit and the device did not indicate an optimal transfer size
+ * (e.g. ATA devices), default to using the device max_sectors limit.
+ */
+static void sd_set_io_opt(struct scsi_disk *sdkp, unsigned int dev_max,
+			  struct queue_limits *lim)
+{
+	struct scsi_device *sdp = sdkp->device;
+	struct Scsi_Host *shost = sdp->host;
+	u64 io_opt;
+
+	io_opt = (u64)shost->opt_sectors << SECTOR_SHIFT;
+	if (sd_validate_opt_xfer_size(sdkp, dev_max))
+		io_opt = min_not_zero(io_opt,
+				logical_to_bytes(sdp, sdkp->opt_xfer_blocks));
+	if (io_opt) {
+		lim->io_opt = ALIGN_DOWN(min_t(u64, io_opt, UINT_MAX),
+					 sdkp->physical_block_size - 1);
+		return;
+	}
+
+	/* Set default */
+	io_opt = (u64)lim->max_sectors << SECTOR_SHIFT;
+	lim->io_opt = ALIGN_DOWN(min_t(u64, io_opt, UINT_MAX),
+				 sdkp->physical_block_size - 1);
+
+	sd_first_printk(KERN_INFO, sdkp,
+		"Using default optimal transfer size of %u bytes\n",
+		lim->io_opt);
+}
+
 /**
  * sd_revalidate_disk - called the first time a new disk is seen,
  *	performs disk spin up, read_capacity, etc.
@@ -3777,16 +3811,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	else
 		lim.io_min = 0;
 
-	/*
-	 * Limit default to SCSI host optimal sector limit if set. There may be
-	 * an impact on performance for when the size of a request exceeds this
-	 * host limit.
-	 */
-	lim.io_opt = sdp->host->opt_sectors << SECTOR_SHIFT;
-	if (sd_validate_opt_xfer_size(sdkp, dev_max)) {
-		lim.io_opt = min_not_zero(lim.io_opt,
-				logical_to_bytes(sdp, sdkp->opt_xfer_blocks));
-	}
+	sd_set_io_opt(sdkp, dev_max, &lim);
 
 	sdkp->first_scan = 0;
 
--
2.49.0
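
The selection order described in the commit message (host limit, then device-reported size, then the max_sectors fallback, followed by a clamp and an align-down) can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: pick_io_opt(), min_not_zero_u64() and clamp_and_align() are hypothetical names, a dev_opt_bytes of 0 stands for "the device reported no valid optimal transfer size", and the physical block size is assumed to be a power of two. The sketch rounds down using the block size itself as the alignment, which is how the kernel's ALIGN_DOWN() macro interprets its second argument.

```c
#include <limits.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Smaller of a and b, ignoring a zero operand (models min_not_zero()). */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Clamp to UINT_MAX, then round down to a power-of-two block size. */
static unsigned int clamp_and_align(uint64_t bytes, unsigned int phys_block_size)
{
	if (bytes > UINT_MAX)
		bytes = UINT_MAX;
	return (unsigned int)(bytes & ~((uint64_t)phys_block_size - 1));
}

/* Hypothetical model of the io_opt choice; all sizes except sectors are in bytes. */
static unsigned int pick_io_opt(unsigned int host_opt_sectors,
				uint64_t dev_opt_bytes,
				unsigned int max_sectors,
				unsigned int phys_block_size)
{
	uint64_t io_opt = (uint64_t)host_opt_sectors << SECTOR_SHIFT;

	io_opt = min_not_zero_u64(io_opt, dev_opt_bytes);
	if (io_opt == 0)
		io_opt = (uint64_t)max_sectors << SECTOR_SHIFT;	/* fallback default */

	return clamp_and_align(io_opt, phys_block_size);
}
```

For example, a disk with no host or device hint, max_sectors of 2560 and 4096-byte physical blocks ends up with an io_opt of 1310720 bytes (1280 KiB) in this model.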
* Re: [PATCH v4 2/2] scsi: sd: Set a default optimal IO size if one is not defined
2025-06-13 6:29 ` [PATCH v4 2/2] scsi: sd: Set a default optimal IO size if one is not defined Damien Le Moal
@ 2025-06-13 14:31 ` John Garry
2025-06-16 5:34 ` Damien Le Moal
0 siblings, 1 reply; 7+ messages in thread
From: John Garry @ 2025-06-13 14:31 UTC (permalink / raw)
To: Damien Le Moal, Martin K . Petersen, linux-scsi
On 13/06/2025 07:29, Damien Le Moal wrote:
> Introduce the helper function sd_set_io_opt() to set a disk io_opt
> limit. This new way of setting this limit falls back to using the
> max_sectors limit if the host does not define an optimal sector limit
> and the device did not indicate an optimal transfer size (e.g. as is
> the case for ATA devices). The io_opt calculation is done using a local
> 64-bit variable to avoid overflows. The final value is clamped to
> UINT_MAX and aligned down to the device physical block size.
>
> This fallback io_opt limit avoids setting up the disk with a zero
> io_opt limit, which results in the rather small 128 KB read_ahead_kb
> attribute. The larger read_ahead_kb value set with the default non-zero
> io_opt limit significantly improves buffered read performance with file
> systems without any intervention from the user.
Out of curiosity, why do this just for sd.c and not always set up the
default like this in blk_validate_limits()?
>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/scsi/sd.c | 45 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index daddef2e9e87..8070356285a7 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3681,6 +3681,40 @@ static void sd_read_block_zero(struct scsi_disk *sdkp)
> kfree(buffer);
> }
>
> +/*
> + * Set the optimal I/O size: limit the default to the SCSI host optimal sector
> + * limit if it is set. There may be an impact on performance when the size of
> + * a request exceeds this host limit. If the host did not set any optimal
> + * sector limit and the device did not indicate an optimal transfer size
> + * (e.g. ATA devices), default to using the device max_sectors limit.
> + */
> +static void sd_set_io_opt(struct scsi_disk *sdkp, unsigned int dev_max,
> + struct queue_limits *lim)
> +{
> + struct scsi_device *sdp = sdkp->device;
> + struct Scsi_Host *shost = sdp->host;
> + u64 io_opt;
> +
> + io_opt = (u64)shost->opt_sectors << SECTOR_SHIFT;
> + if (sd_validate_opt_xfer_size(sdkp, dev_max))
> + io_opt = min_not_zero(io_opt,
> + logical_to_bytes(sdp, sdkp->opt_xfer_blocks));
> + if (io_opt) {
> + lim->io_opt = ALIGN_DOWN(min_t(u64, io_opt, UINT_MAX),
> + sdkp->physical_block_size - 1);
> + return;
> + }
> +
> + /* Set default */
> + io_opt = (u64)lim->max_sectors << SECTOR_SHIFT;
> + lim->io_opt = ALIGN_DOWN(min_t(u64, io_opt, UINT_MAX),
Can lim->max_sectors << SECTOR_SHIFT really overflow? I guess
that is the reason for the min_t() call.
> + sdkp->physical_block_size - 1);
blk_validate_limits() has the following:
lim->io_opt = round_down(lim->io_opt, lim->physical_block_size)
Does that do what we want already? I do realize that we want to print
the used value in lim->io_opt, below.
> +
> + sd_first_printk(KERN_INFO, sdkp,
> + "Using default optimal transfer size of %u bytes\n",
> + lim->io_opt);
> +}
> +
> /**
> * sd_revalidate_disk - called the first time a new disk is seen,
> * performs disk spin up, read_capacity, etc.
> @@ -3777,16 +3811,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
> else
> lim.io_min = 0;
>
> - /*
> - * Limit default to SCSI host optimal sector limit if set. There may be
> - * an impact on performance for when the size of a request exceeds this
> - * host limit.
> - */
> - lim.io_opt = sdp->host->opt_sectors << SECTOR_SHIFT;
> - if (sd_validate_opt_xfer_size(sdkp, dev_max)) {
> - lim.io_opt = min_not_zero(lim.io_opt,
> - logical_to_bytes(sdp, sdkp->opt_xfer_blocks));
> - }
> + sd_set_io_opt(sdkp, dev_max, &lim);
>
> sdkp->first_scan = 0;
>
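
On the overflow question raised in this reply: max_sectors is an unsigned int, so a shift by SECTOR_SHIFT performed in 32 bits wraps once max_sectors reaches 1 << 23 sectors (4 GiB). The sketch below only illustrates that arithmetic. The helper names are hypothetical, and whether a real device ever reports such a limit is not something the sketch can answer.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Shift performed in 32 bits: bits above bit 31 are lost. */
static uint32_t sectors_to_bytes_u32(uint32_t sectors)
{
	return sectors << SECTOR_SHIFT;
}

/* Widen first, then shift: the full byte count is preserved. */
static uint64_t sectors_to_bytes_u64(uint32_t sectors)
{
	return (uint64_t)sectors << SECTOR_SHIFT;
}
```

The 32-bit variant is exact up to (1 << 23) - 1 sectors and returns 0 for 1 << 23 sectors, while the 64-bit variant returns 4294967296 for the same input.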
* Re: [PATCH v4 1/2] scsi: sd: Prevent logical_to_bytes() from returning overflowed values
2025-06-13 6:29 ` [PATCH v4 1/2] scsi: sd: Prevent logical_to_bytes() from returning overflowed values Damien Le Moal
@ 2025-06-13 16:17 ` Bart Van Assche
0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2025-06-13 16:17 UTC (permalink / raw)
To: Damien Le Moal, Martin K . Petersen, linux-scsi
On 6/12/25 11:29 PM, Damien Le Moal wrote:
> Make sure that logical_to_bytes() does not return an overflowed value
> by changing its return type from unsigned int (32 bits) to u64
> (64 bits). While at it, also use a bit shift instead of a
> multiplication, similar to logical_to_sectors() and bytes_to_logical().
>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> drivers/scsi/sd.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 36382eca941c..53658679e063 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -213,9 +213,9 @@ static inline sector_t logical_to_sectors(struct scsi_device *sdev, sector_t blo
> return blocks << (ilog2(sdev->sector_size) - 9);
> }
>
> -static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t blocks)
> +static inline u64 logical_to_bytes(struct scsi_device *sdev, sector_t blocks)
> {
> - return blocks * sdev->sector_size;
> + return (u64)blocks << ilog2(sdev->sector_size);
> }
From <linux/types.h>:
typedef u64 sector_t;
Hence, casting 'blocks' from type sector_t to type u64 is not necessary.
Since 'blocks' represents an LBA instead of a byte offset divided by
512, please consider changing "sector_t blocks" into "u64 logical_blocks".
Thanks,
Bart.
* Re: [PATCH v4 2/2] scsi: sd: Set a default optimal IO size if one is not defined
2025-06-13 14:31 ` John Garry
@ 2025-06-16 5:34 ` Damien Le Moal
2025-06-16 6:26 ` Damien Le Moal
0 siblings, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2025-06-16 5:34 UTC (permalink / raw)
To: John Garry, Martin K . Petersen, linux-scsi
On 6/13/25 23:31, John Garry wrote:
> On 13/06/2025 07:29, Damien Le Moal wrote:
>> Introduce the helper function sd_set_io_opt() to set a disk io_opt
>> limit. This new way of setting this limit falls back to using the
>> max_sectors limit if the host does not define an optimal sector limit
>> and the device did not indicate an optimal transfer size (e.g. as is
>> the case for ATA devices). The io_opt calculation is done using a local
>> 64-bit variable to avoid overflows. The final value is clamped to
>> UINT_MAX and aligned down to the device physical block size.
>>
>> This fallback io_opt limit avoids setting up the disk with a zero
>> io_opt limit, which results in the rather small 128 KB read_ahead_kb
>> attribute. The larger read_ahead_kb value set with the default non-zero
>> io_opt limit significantly improves buffered read performance with file
>> systems without any intervention from the user.
>
> Out of curiosity, why do this just for sd.c and not always set up the
> default like this in blk_validate_limits()?
Good point. Though I think we do not want to have a large io_opt for slow
devices like MMC/SD Cards. So something like this, which is indeed simpler than
hacking lim->io_opt in sd.c.
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..d3ec6f4100f4 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -58,16 +58,24 @@ EXPORT_SYMBOL(blk_set_stacking_limits);
 void blk_apply_bdi_limits(struct backing_dev_info *bdi,
 		struct queue_limits *lim)
 {
+	u64 io_opt = lim->io_opt;
+
 	/*
 	 * For read-ahead of large files to be effective, we need to read ahead
-	 * at least twice the optimal I/O size.
+	 * at least twice the optimal I/O size. For rotational devices that do
+	 * not report an optimal I/O size (e.g. ATA HDDs), use the maximum I/O
+	 * size to avoid falling back to the (rather inefficient) small default
+	 * read-ahead size.
 	 *
 	 * There is no hardware limitation for the read-ahead size and the user
 	 * might have increased the read-ahead size through sysfs, so don't ever
 	 * decrease it.
 	 */
+	if (!io_opt && (lim->features & BLK_FEAT_ROTATIONAL))
+		io_opt = lim->max_sectors;
+
 	bdi->ra_pages = max3(bdi->ra_pages,
-				lim->io_opt * 2 / PAGE_SIZE,
+				io_opt * 2 >> PAGE_SHIFT,
 				VM_READAHEAD_PAGES);
 	bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT;
 }
I will make a proper patch of this and send it out as a replacement.
--
Damien Le Moal
Western Digital Research
* Re: [PATCH v4 2/2] scsi: sd: Set a default optimal IO size if one is not defined
2025-06-16 5:34 ` Damien Le Moal
@ 2025-06-16 6:26 ` Damien Le Moal
0 siblings, 0 replies; 7+ messages in thread
From: Damien Le Moal @ 2025-06-16 6:26 UTC (permalink / raw)
To: John Garry, Martin K . Petersen, linux-scsi
On 6/16/25 14:34, Damien Le Moal wrote:
> On 6/13/25 23:31, John Garry wrote:
>> On 13/06/2025 07:29, Damien Le Moal wrote:
>>> Introduce the helper function sd_set_io_opt() to set a disk io_opt
>>> limit. This new way of setting this limit falls back to using the
>>> max_sectors limit if the host does not define an optimal sector limit
>>> and the device did not indicate an optimal transfer size (e.g. as is
>>> the case for ATA devices). The io_opt calculation is done using a local
>>> 64-bit variable to avoid overflows. The final value is clamped to
>>> UINT_MAX and aligned down to the device physical block size.
>>>
>>> This fallback io_opt limit avoids setting up the disk with a zero
>>> io_opt limit, which results in the rather small 128 KB read_ahead_kb
>>> attribute. The larger read_ahead_kb value set with the default non-zero
>>> io_opt limit significantly improves buffered read performance with file
>>> systems without any intervention from the user.
>>
>> Out of curiosity, why do this just for sd.c and not always set up the
>> default like this in blk_validate_limits()?
>
> Good point. Though I think we do not want to have a large io_opt for slow
> devices like MMC/SD Cards. So something like this, which is indeed simpler than
> hacking lim->io_opt in sd.c.
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a000daafbfb4..d3ec6f4100f4 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -58,16 +58,24 @@ EXPORT_SYMBOL(blk_set_stacking_limits);
> void blk_apply_bdi_limits(struct backing_dev_info *bdi,
> struct queue_limits *lim)
> {
> + u64 io_opt = lim->io_opt;
> +
> /*
> * For read-ahead of large files to be effective, we need to read ahead
> - * at least twice the optimal I/O size.
> + * at least twice the optimal I/O size. For rotational devices that do
> + * not report an optimal I/O size (e.g. ATA HDDs), use the maximum I/O
> + * size to avoid falling back to the (rather inefficient) small default
> + * read-ahead size.
> *
> * There is no hardware limitation for the read-ahead size and the user
> * might have increased the read-ahead size through sysfs, so don't ever
> * decrease it.
> */
> + if (!io_opt && (lim->features & BLK_FEAT_ROTATIONAL))
> + io_opt = lim->max_sectors;
Oops... This should of course be:
io_opt = (u64)lim->max_sectors << SECTOR_SHIFT;
> +
> bdi->ra_pages = max3(bdi->ra_pages,
> - lim->io_opt * 2 / PAGE_SIZE,
> + io_opt * 2 >> PAGE_SHIFT,
> VM_READAHEAD_PAGES);
> bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT;
> }
>
> I will make a proper patch of this and send it out as a replacement.
>
--
Damien Le Moal
Western Digital Research
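
The effect of the proposed read-ahead fallback, with the sector-to-byte correction from this message applied, can be modelled in userspace. This is a sketch under assumptions rather than the block layer code: ra_pages_sketch() and max3_u64() are hypothetical names, 4 KiB pages are assumed, the rotational flag replaces the BLK_FEAT_ROTATIONAL feature test, and VM_READAHEAD_PAGES is taken to be 128 KiB worth of pages.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT		9
#define PAGE_SHIFT		12	/* assumption: 4 KiB pages */
#define VM_READAHEAD_PAGES	((128 * 1024) >> PAGE_SHIFT)	/* 128 KiB default */

/* Largest of three values (models the kernel's max3()). */
static uint64_t max3_u64(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a > b ? a : b;

	return m > c ? m : c;
}

/* Hypothetical model of the read-ahead page count chosen for a device. */
static uint64_t ra_pages_sketch(uint64_t cur_ra_pages, unsigned int io_opt_bytes,
				unsigned int max_sectors, bool rotational)
{
	uint64_t io_opt = io_opt_bytes;

	/* Rotational device without a reported io_opt: use the maximum I/O size. */
	if (!io_opt && rotational)
		io_opt = (uint64_t)max_sectors << SECTOR_SHIFT;

	/* Read ahead at least twice the optimal I/O size, and never shrink. */
	return max3_u64(cur_ra_pages, io_opt * 2 >> PAGE_SHIFT, VM_READAHEAD_PAGES);
}
```

Under these assumptions, a rotational disk with no reported io_opt and max_sectors of 2560 gets 640 pages (2560 KiB) of read-ahead, while the same disk without the fallback stays at the 32-page (128 KiB) default.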