* [PATCH v15 01/19] block: Introduce more member variables related to zone write locking
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-19 23:29 ` Damien Le Moal
2023-11-14 21:16 ` [PATCH v15 02/19] block: Only use write locking if necessary Bart Van Assche
` (18 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Hannes Reinecke, Nitesh Shetty,
Ming Lei
Many but not all storage controllers require serialization of zoned writes.
Introduce two new request queue limit member variables related to write
serialization. 'driver_preserves_write_order' allows block drivers to
indicate that the order of write commands is preserved and hence that
serialization of writes per zone is not required. 'use_zone_write_lock' is
set by disk_set_zoned() if and only if the block device has zones and if
the block driver does not preserve the order of write requests.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-settings.c | 15 +++++++++++++++
block/blk-zoned.c | 1 +
include/linux/blkdev.h | 10 ++++++++++
3 files changed, 26 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 0046b447268f..4c776c08f190 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -56,6 +56,8 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->alignment_offset = 0;
lim->io_opt = 0;
lim->misaligned = 0;
+ lim->driver_preserves_write_order = false;
+ lim->use_zone_write_lock = false;
lim->zoned = BLK_ZONED_NONE;
lim->zone_write_granularity = 0;
lim->dma_alignment = 511;
@@ -82,6 +84,8 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_dev_sectors = UINT_MAX;
lim->max_write_zeroes_sectors = UINT_MAX;
lim->max_zone_append_sectors = UINT_MAX;
+ /* Request-based stacking drivers do not reorder requests. */
+ lim->driver_preserves_write_order = true;
}
EXPORT_SYMBOL(blk_set_stacking_limits);
@@ -685,6 +689,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
b->max_secure_erase_sectors);
t->zone_write_granularity = max(t->zone_write_granularity,
b->zone_write_granularity);
+ t->driver_preserves_write_order = t->driver_preserves_write_order &&
+ b->driver_preserves_write_order;
+ t->use_zone_write_lock = t->use_zone_write_lock ||
+ b->use_zone_write_lock;
t->zoned = max(t->zoned, b->zoned);
return ret;
}
@@ -949,6 +957,13 @@ void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
}
q->limits.zoned = model;
+ /*
+ * Use the zone write lock only for zoned block devices and only if
+ * the block driver does not preserve the order of write commands.
+ */
+ q->limits.use_zone_write_lock = model != BLK_ZONED_NONE &&
+ !q->limits.driver_preserves_write_order;
+
if (model != BLK_ZONED_NONE) {
/*
* Set the zone write granularity to the device logical block
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 619ee41a51cc..112620985bff 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -631,6 +631,7 @@ void disk_clear_zone_settings(struct gendisk *disk)
q->limits.chunk_sectors = 0;
q->limits.zone_write_granularity = 0;
q->limits.max_zone_append_sectors = 0;
+ q->limits.use_zone_write_lock = false;
blk_mq_unfreeze_queue(q);
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 51fa7ffdee83..2d452f5a36c8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,16 @@ struct queue_limits {
unsigned char misaligned;
unsigned char discard_misaligned;
unsigned char raid_partial_stripes_expensive;
+ /*
+ * Whether or not the block driver preserves the order of write
+ * requests. Set by the block driver.
+ */
+ bool driver_preserves_write_order;
+ /*
+ * Whether or not zone write locking should be used. Set by
+ * disk_set_zoned().
+ */
+ bool use_zone_write_lock;
enum blk_zoned_model zoned;
/*
* Re: [PATCH v15 01/19] block: Introduce more member variables related to zone write locking
2023-11-14 21:16 ` [PATCH v15 01/19] block: Introduce more member variables related to zone write locking Bart Van Assche
@ 2023-11-19 23:29 ` Damien Le Moal
2023-11-20 20:44 ` Bart Van Assche
0 siblings, 1 reply; 32+ messages in thread
From: Damien Le Moal @ 2023-11-19 23:29 UTC (permalink / raw)
To: Bart Van Assche, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Hannes Reinecke, Nitesh Shetty, Ming Lei
On 11/15/23 06:16, Bart Van Assche wrote:
> Many but not all storage controllers require serialization of zoned writes.
> Introduce two new request queue limit member variables related to write
> serialization. 'driver_preserves_write_order' allows block drivers to
> indicate that the order of write commands is preserved and hence that
> serialization of writes per zone is not required. 'use_zone_write_lock' is
> set by disk_set_zoned() if and only if the block device has zones and if
> the block driver does not preserve the order of write requests.
>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/blk-settings.c | 15 +++++++++++++++
> block/blk-zoned.c | 1 +
> include/linux/blkdev.h | 10 ++++++++++
> 3 files changed, 26 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 0046b447268f..4c776c08f190 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -56,6 +56,8 @@ void blk_set_default_limits(struct queue_limits *lim)
> lim->alignment_offset = 0;
> lim->io_opt = 0;
> lim->misaligned = 0;
> + lim->driver_preserves_write_order = false;
> + lim->use_zone_write_lock = false;
> lim->zoned = BLK_ZONED_NONE;
> lim->zone_write_granularity = 0;
> lim->dma_alignment = 511;
> @@ -82,6 +84,8 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> lim->max_dev_sectors = UINT_MAX;
> lim->max_write_zeroes_sectors = UINT_MAX;
> lim->max_zone_append_sectors = UINT_MAX;
> + /* Request-based stacking drivers do not reorder requests. */
Rereading this patch, I do not think this statement is correct. I seriously
doubt that multipath will preserve write command order in all cases...
> + lim->driver_preserves_write_order = true;
... so it is likely much safer to set the default to "false" as that is the
default for all requests in general.
> }
> EXPORT_SYMBOL(blk_set_stacking_limits);
>
> @@ -685,6 +689,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> b->max_secure_erase_sectors);
> t->zone_write_granularity = max(t->zone_write_granularity,
> b->zone_write_granularity);
> + t->driver_preserves_write_order = t->driver_preserves_write_order &&
> + b->driver_preserves_write_order;
> + t->use_zone_write_lock = t->use_zone_write_lock ||
> + b->use_zone_write_lock;
Very minor nit: splitting the line after the equal would make this more readable.
> t->zoned = max(t->zoned, b->zoned);
> return ret;
> }
> @@ -949,6 +957,13 @@ void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
> }
>
> q->limits.zoned = model;
> + /*
> + * Use the zone write lock only for zoned block devices and only if
> + * the block driver does not preserve the order of write commands.
> + */
> + q->limits.use_zone_write_lock = model != BLK_ZONED_NONE &&
> + !q->limits.driver_preserves_write_order;
> +
> if (model != BLK_ZONED_NONE) {
> /*
> * Set the zone write granularity to the device logical block
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 619ee41a51cc..112620985bff 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -631,6 +631,7 @@ void disk_clear_zone_settings(struct gendisk *disk)
> q->limits.chunk_sectors = 0;
> q->limits.zone_write_granularity = 0;
> q->limits.max_zone_append_sectors = 0;
> + q->limits.use_zone_write_lock = false;
>
> blk_mq_unfreeze_queue(q);
> }
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 51fa7ffdee83..2d452f5a36c8 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -316,6 +316,16 @@ struct queue_limits {
> unsigned char misaligned;
> unsigned char discard_misaligned;
> unsigned char raid_partial_stripes_expensive;
> + /*
> + * Whether or not the block driver preserves the order of write
> + * requests. Set by the block driver.
> + */
> + bool driver_preserves_write_order;
> + /*
> + * Whether or not zone write locking should be used. Set by
> + * disk_set_zoned().
> + */
> + bool use_zone_write_lock;
> enum blk_zoned_model zoned;
>
> /*
--
Damien Le Moal
Western Digital Research
* Re: [PATCH v15 01/19] block: Introduce more member variables related to zone write locking
2023-11-19 23:29 ` Damien Le Moal
@ 2023-11-20 20:44 ` Bart Van Assche
2023-11-20 23:02 ` Damien Le Moal
0 siblings, 1 reply; 32+ messages in thread
From: Bart Van Assche @ 2023-11-20 20:44 UTC (permalink / raw)
To: Damien Le Moal, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Hannes Reinecke, Nitesh Shetty, Ming Lei
On 11/19/23 15:29, Damien Le Moal wrote:
> On 11/15/23 06:16, Bart Van Assche wrote:
>> @@ -82,6 +84,8 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>> lim->max_dev_sectors = UINT_MAX;
>> lim->max_write_zeroes_sectors = UINT_MAX;
>> lim->max_zone_append_sectors = UINT_MAX;
>> + /* Request-based stacking drivers do not reorder requests. */
>
> Rereading this patch, I do not think this statement is correct. I seriously
> doubt that multipath will preserve write command order in all cases...
>
>> + lim->driver_preserves_write_order = true;
>
> ... so it is likely much safer to set the default to "false" as that is the
> default for all requests in general.
How about applying this (untested) patch on top of this patch series?
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 4c776c08f190..aba1972e9767 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -84,8 +84,6 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_dev_sectors = UINT_MAX;
lim->max_write_zeroes_sectors = UINT_MAX;
lim->max_zone_append_sectors = UINT_MAX;
- /* Request-based stacking drivers do not reorder requests. */
- lim->driver_preserves_write_order = true;
}
EXPORT_SYMBOL(blk_set_stacking_limits);
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 2d3e186ca87e..cb9abe4bd065 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -147,6 +147,11 @@ static int linear_report_zones(struct dm_target *ti,
#define linear_report_zones NULL
#endif
+static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
+{
+ limits->driver_preserves_write_order = true;
+}
+
static int linear_iterate_devices(struct dm_target *ti,
iterate_devices_callout_fn fn, void *data)
{
@@ -208,6 +213,7 @@ static struct target_type linear_target = {
.map = linear_map,
.status = linear_status,
.prepare_ioctl = linear_prepare_ioctl,
+ .io_hints = linear_io_hints,
.iterate_devices = linear_iterate_devices,
.direct_access = linear_dax_direct_access,
.dax_zero_page_range = linear_dax_zero_page_range,
>> @@ -685,6 +689,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>> b->max_secure_erase_sectors);
>> t->zone_write_granularity = max(t->zone_write_granularity,
>> b->zone_write_granularity);
>> + t->driver_preserves_write_order = t->driver_preserves_write_order &&
>> + b->driver_preserves_write_order;
>> + t->use_zone_write_lock = t->use_zone_write_lock ||
>> + b->use_zone_write_lock;
>
> Very minor nit: splitting the line after the equal would make this more readable.
Hmm ... I have often seen other reviewers asking to maximize the use of each
source code line as much as reasonably possible.
Thanks,
Bart.
* Re: [PATCH v15 01/19] block: Introduce more member variables related to zone write locking
2023-11-20 20:44 ` Bart Van Assche
@ 2023-11-20 23:02 ` Damien Le Moal
2023-11-20 23:58 ` Bart Van Assche
0 siblings, 1 reply; 32+ messages in thread
From: Damien Le Moal @ 2023-11-20 23:02 UTC (permalink / raw)
To: Bart Van Assche, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Hannes Reinecke, Nitesh Shetty, Ming Lei
On 11/21/23 05:44, Bart Van Assche wrote:
> On 11/19/23 15:29, Damien Le Moal wrote:
>> On 11/15/23 06:16, Bart Van Assche wrote:
>>> @@ -82,6 +84,8 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>>> lim->max_dev_sectors = UINT_MAX;
>>> lim->max_write_zeroes_sectors = UINT_MAX;
>>> lim->max_zone_append_sectors = UINT_MAX;
>>> + /* Request-based stacking drivers do not reorder requests. */
>>
>> Rereading this patch, I do not think this statement is correct. I seriously
>> doubt that multipath will preserve write command order in all cases...
>>
>>> + lim->driver_preserves_write_order = true;
>>
>> ... so it is likely much safer to set the default to "false" as that is the
>> default for all requests in general.
>
> How about applying this (untested) patch on top of this patch series?
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 4c776c08f190..aba1972e9767 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -84,8 +84,6 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> lim->max_dev_sectors = UINT_MAX;
> lim->max_write_zeroes_sectors = UINT_MAX;
> lim->max_zone_append_sectors = UINT_MAX;
> - /* Request-based stacking drivers do not reorder requests. */
> - lim->driver_preserves_write_order = true;
> }
> EXPORT_SYMBOL(blk_set_stacking_limits);
>
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 2d3e186ca87e..cb9abe4bd065 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -147,6 +147,11 @@ static int linear_report_zones(struct dm_target *ti,
> #define linear_report_zones NULL
> #endif
>
> +static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
> +{
> + limits->driver_preserves_write_order = true;
> +}
Hmm, but does dm-linear preserve write order? I am not convinced. And what
about dm-flakey, dm-error and dm-crypt ? All of these also support zoned
devices. I do not think that we can say that any of these preserve write order.
> +
> static int linear_iterate_devices(struct dm_target *ti,
> iterate_devices_callout_fn fn, void *data)
> {
> @@ -208,6 +213,7 @@ static struct target_type linear_target = {
> .map = linear_map,
> .status = linear_status,
> .prepare_ioctl = linear_prepare_ioctl,
> + .io_hints = linear_io_hints,
> .iterate_devices = linear_iterate_devices,
> .direct_access = linear_dax_direct_access,
> .dax_zero_page_range = linear_dax_zero_page_range,
>
>>> @@ -685,6 +689,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>> b->max_secure_erase_sectors);
>>> t->zone_write_granularity = max(t->zone_write_granularity,
>>> b->zone_write_granularity);
>>> + t->driver_preserves_write_order = t->driver_preserves_write_order &&
>>> + b->driver_preserves_write_order;
>>> + t->use_zone_write_lock = t->use_zone_write_lock ||
>>> + b->use_zone_write_lock;
>>
>> Very minor nit: splitting the line after the equal would make this more readable.
>
> Hmm ... I have often seen other reviewers asking to maximize the use of each
> source code line as much as reasonably possible.
As I said, very minor nit :) Feel free to ignore.
>
> Thanks,
>
> Bart.
>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH v15 01/19] block: Introduce more member variables related to zone write locking
2023-11-20 23:02 ` Damien Le Moal
@ 2023-11-20 23:58 ` Bart Van Assche
2023-11-21 1:21 ` Damien Le Moal
0 siblings, 1 reply; 32+ messages in thread
From: Bart Van Assche @ 2023-11-20 23:58 UTC (permalink / raw)
To: Damien Le Moal, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Hannes Reinecke, Nitesh Shetty, Ming Lei
On 11/20/23 15:02, Damien Le Moal wrote:
> On 11/21/23 05:44, Bart Van Assche wrote:
>> How about applying this (untested) patch on top of this patch series?
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 4c776c08f190..aba1972e9767 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -84,8 +84,6 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>> lim->max_dev_sectors = UINT_MAX;
>> lim->max_write_zeroes_sectors = UINT_MAX;
>> lim->max_zone_append_sectors = UINT_MAX;
>> - /* Request-based stacking drivers do not reorder requests. */
>> - lim->driver_preserves_write_order = true;
>> }
>> EXPORT_SYMBOL(blk_set_stacking_limits);
>>
>> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
>> index 2d3e186ca87e..cb9abe4bd065 100644
>> --- a/drivers/md/dm-linear.c
>> +++ b/drivers/md/dm-linear.c
>> @@ -147,6 +147,11 @@ static int linear_report_zones(struct dm_target *ti,
>> #define linear_report_zones NULL
>> #endif
>>
>> +static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
>> +{
>> + limits->driver_preserves_write_order = true;
>> +}
>
> Hmm, but does dm-linear preserve write order? I am not convinced. And what
> about dm-flakey, dm-error and dm-crypt ? All of these also support zoned
> devices. I do not think that we can say that any of these preserve write order.
Hi Damien,
I propose to keep any changes for files in the drivers/md/ directory for
later. This patch series is already big enough. Additionally, I don't
need the dm changes myself since Android does not use dm-linear or
dm-verity to access a zoned logical unit. All we need to know right now
is that the approach of this patch series can be extended to dm drivers.
Thanks,
Bart.
* Re: [PATCH v15 01/19] block: Introduce more member variables related to zone write locking
2023-11-20 23:58 ` Bart Van Assche
@ 2023-11-21 1:21 ` Damien Le Moal
2023-11-21 2:12 ` Damien Le Moal
0 siblings, 1 reply; 32+ messages in thread
From: Damien Le Moal @ 2023-11-21 1:21 UTC (permalink / raw)
To: Bart Van Assche, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Hannes Reinecke, Nitesh Shetty, Ming Lei
On 11/21/23 08:58, Bart Van Assche wrote:
> On 11/20/23 15:02, Damien Le Moal wrote:
>> On 11/21/23 05:44, Bart Van Assche wrote:
>>> How about applying this (untested) patch on top of this patch series?
>>>
>>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>>> index 4c776c08f190..aba1972e9767 100644
>>> --- a/block/blk-settings.c
>>> +++ b/block/blk-settings.c
>>> @@ -84,8 +84,6 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>>> lim->max_dev_sectors = UINT_MAX;
>>> lim->max_write_zeroes_sectors = UINT_MAX;
>>> lim->max_zone_append_sectors = UINT_MAX;
>>> - /* Request-based stacking drivers do not reorder requests. */
>>> - lim->driver_preserves_write_order = true;
>>> }
>>> EXPORT_SYMBOL(blk_set_stacking_limits);
>>>
>>> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
>>> index 2d3e186ca87e..cb9abe4bd065 100644
>>> --- a/drivers/md/dm-linear.c
>>> +++ b/drivers/md/dm-linear.c
>>> @@ -147,6 +147,11 @@ static int linear_report_zones(struct dm_target *ti,
>>> #define linear_report_zones NULL
>>> #endif
>>>
>>> +static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
>>> +{
>>> + limits->driver_preserves_write_order = true;
>>> +}
>>
>> Hmm, but does dm-linear preserve write order ? I am not convinced. And what
>> about dm-flakey, dm-error and dm-crypt ? All of these also support zoned
>> devices. I do not think that we can say that any of these preserve write order.
>
> Hi Damien,
>
> I propose to keep any changes for files in the drivers/md/ directory for
> later. This patch series is already big enough. Additionally, I don't
>> need the dm changes myself since Android does not use dm-linear or
> dm-verity to access a zoned logical unit. All we need to know right now
> is that the approach of this patch series can be extended to dm drivers.
Agree. For now, dm will keep working as usual, using zone write locking. We
can optimize that later as needed and if possible. So initializing the
driver_preserves_write_order limit to false (the default) is the way to go.
--
Damien Le Moal
Western Digital Research
* Re: [PATCH v15 01/19] block: Introduce more member variables related to zone write locking
2023-11-21 1:21 ` Damien Le Moal
@ 2023-11-21 2:12 ` Damien Le Moal
0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2023-11-21 2:12 UTC (permalink / raw)
To: Bart Van Assche, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Hannes Reinecke, Nitesh Shetty, Ming Lei
On 11/21/23 10:21, Damien Le Moal wrote:
> On 11/21/23 08:58, Bart Van Assche wrote:
>> On 11/20/23 15:02, Damien Le Moal wrote:
>>> On 11/21/23 05:44, Bart Van Assche wrote:
>>>> How about applying this (untested) patch on top of this patch series?
>>>>
>>>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>>>> index 4c776c08f190..aba1972e9767 100644
>>>> --- a/block/blk-settings.c
>>>> +++ b/block/blk-settings.c
>>>> @@ -84,8 +84,6 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>>>> lim->max_dev_sectors = UINT_MAX;
>>>> lim->max_write_zeroes_sectors = UINT_MAX;
>>>> lim->max_zone_append_sectors = UINT_MAX;
>>>> - /* Request-based stacking drivers do not reorder requests. */
>>>> - lim->driver_preserves_write_order = true;
>>>> }
>>>> EXPORT_SYMBOL(blk_set_stacking_limits);
>>>>
>>>> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
>>>> index 2d3e186ca87e..cb9abe4bd065 100644
>>>> --- a/drivers/md/dm-linear.c
>>>> +++ b/drivers/md/dm-linear.c
>>>> @@ -147,6 +147,11 @@ static int linear_report_zones(struct dm_target *ti,
>>>> #define linear_report_zones NULL
>>>> #endif
>>>>
>>>> +static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
>>>> +{
>>>> + limits->driver_preserves_write_order = true;
>>>> +}
>>>
>>> Hmm, but does dm-linear preserve write order? I am not convinced. And what
>>> about dm-flakey, dm-error and dm-crypt ? All of these also support zoned
>>> devices. I do not think that we can say that any of these preserve write order.
>>
>> Hi Damien,
>>
>> I propose to keep any changes for files in the drivers/md/ directory for
>> later. This patch series is already big enough. Additionally, I don't
>> need the dm changes myself since Android does not use dm-linear or
>> dm-verity to access a zoned logical unit. All we need to know right now
>> is that the approach of this patch series can be extended to dm drivers.
>
> Agree. For now, dm will keep working as usual using the zone write locking. We
> can optimize that later as needed and if possible. So initializing the limits
> driver_preserves_write_order to false (default) is the way to go.
Actually, I do not think it matters since DM devices do not have an I/O
scheduler... So I do not think any optimization is really needed at all. Whether
or not the zone write lock is used, based on driver_preserves_write_order, is
decided at the lowest physical device level only. So for BIO-based DM, we should
not need to do anything at all.
--
Damien Le Moal
Western Digital Research
* [PATCH v15 02/19] block: Only use write locking if necessary
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 01/19] block: Introduce more member variables related to zone write locking Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 03/19] block: Preserve the order of requeued zoned writes Bart Van Assche
` (17 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Hannes Reinecke, Nitesh Shetty, Damien Le Moal,
Ming Lei
Make blk_req_needs_zone_write_lock() return false if
q->limits.use_zone_write_lock is false.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-zoned.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 112620985bff..d8a80cce832f 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -53,11 +53,16 @@ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond)
EXPORT_SYMBOL_GPL(blk_zone_cond_str);
/*
- * Return true if a request is a write requests that needs zone write locking.
+ * Return true if a request is a write request that needs zone write locking.
*/
bool blk_req_needs_zone_write_lock(struct request *rq)
{
- if (!rq->q->disk->seq_zones_wlock)
+ struct request_queue *q = rq->q;
+
+ if (!q->limits.use_zone_write_lock)
+ return false;
+
+ if (!q->disk->seq_zones_wlock)
return false;
return blk_rq_is_seq_zoned_write(rq);
* [PATCH v15 03/19] block: Preserve the order of requeued zoned writes
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 01/19] block: Introduce more member variables related to zone write locking Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 02/19] block: Only use write locking if necessary Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 04/19] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
` (16 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Ming Lei, Hannes Reinecke
blk_mq_requeue_work() inserts requeued requests in front of other
requests. This is fine for all request types except for sequential zoned
writes. Hence this patch.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-mq.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e2d11183f62e..e678edca3fa8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1484,8 +1484,12 @@ static void blk_mq_requeue_work(struct work_struct *work)
list_del_init(&rq->queuelist);
blk_mq_request_bypass_insert(rq, 0);
} else {
+ blk_insert_t insert_flags = BLK_MQ_INSERT_AT_HEAD;
+
list_del_init(&rq->queuelist);
- blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
+ if (blk_rq_is_seq_zoned_write(rq))
+ insert_flags = 0;
+ blk_mq_insert_request(rq, insert_flags);
}
}
* [PATCH v15 04/19] block/mq-deadline: Only use zone locking if necessary
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (2 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 03/19] block: Preserve the order of requeued zoned writes Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 05/19] scsi: Pass SCSI host pointer to scsi_eh_flush_done_q() Bart Van Assche
` (15 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Hannes Reinecke, Nitesh Shetty,
Ming Lei
Measurements have shown that limiting the queue depth to one per zone for
zoned writes has a significant negative performance impact on zoned UFS
devices. Hence this patch that disables zone locking by the mq-deadline
scheduler if the storage controller preserves the command order. This
patch is based on the following assumptions:
- It happens infrequently that zoned write requests are reordered by the
block layer.
- The I/O priority of all write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that
serializes write requests per zone.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/mq-deadline.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..082ccf3186f4 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -353,7 +353,7 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
return NULL;
rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
- if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+ if (data_dir == DD_READ || !rq->q->limits.use_zone_write_lock)
return rq;
/*
@@ -398,7 +398,7 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
if (!rq)
return NULL;
- if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+ if (data_dir == DD_READ || !rq->q->limits.use_zone_write_lock)
return rq;
/*
@@ -526,8 +526,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
}
/*
- * For a zoned block device, if we only have writes queued and none of
- * them can be dispatched, rq will be NULL.
+ * For a zoned block device that requires write serialization, if we
+ * only have writes queued and none of them can be dispatched, rq will
+ * be NULL.
*/
if (!rq)
return NULL;
@@ -934,7 +935,7 @@ static void dd_finish_request(struct request *rq)
atomic_inc(&per_prio->stats.completed);
- if (blk_queue_is_zoned(q)) {
+ if (rq->q->limits.use_zone_write_lock) {
unsigned long flags;
spin_lock_irqsave(&dd->zone_lock, flags);
* [PATCH v15 05/19] scsi: Pass SCSI host pointer to scsi_eh_flush_done_q()
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (3 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 04/19] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 06/19] scsi: core: Introduce a mechanism for reordering requests in the error handler Bart Van Assche
` (14 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, James E.J. Bottomley, Jason Yan,
John Garry, Wenchao Hao
This patch prepares for using the host pointer directly in
scsi_eh_flush_done_q() in a later patch.
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ata/libata-eh.c | 2 +-
drivers/scsi/libsas/sas_scsi_host.c | 2 +-
drivers/scsi/scsi_error.c | 5 +++--
include/scsi/scsi_eh.h | 3 ++-
4 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index b0d6e69c4a5b..ff03c4a6bad9 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -768,7 +768,7 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
spin_unlock_irqrestore(ap->lock, flags);
ata_eh_release(ap);
- scsi_eh_flush_done_q(&ap->eh_done_q);
+ scsi_eh_flush_done_q(host, &ap->eh_done_q);
/* clean up */
spin_lock_irqsave(ap->lock, flags);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 9047cfcd1072..dd4fb97fdc4b 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -730,7 +730,7 @@ void sas_scsi_recover_host(struct Scsi_Host *shost)
/* now link into libata eh --- if we have any ata devices */
sas_ata_strategy_handler(shost);
- scsi_eh_flush_done_q(&ha->eh_done_q);
+ scsi_eh_flush_done_q(shost, &ha->eh_done_q);
/* check if any new eh work was scheduled during the last run */
spin_lock_irq(&ha->lock);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..7390131e7f0a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2188,9 +2188,10 @@ EXPORT_SYMBOL_GPL(scsi_eh_ready_devs);
/**
* scsi_eh_flush_done_q - finish processed commands or retry them.
+ * @shost: SCSI host pointer.
* @done_q: list_head of processed commands.
*/
-void scsi_eh_flush_done_q(struct list_head *done_q)
+void scsi_eh_flush_done_q(struct Scsi_Host *shost, struct list_head *done_q)
{
struct scsi_cmnd *scmd, *next;
@@ -2265,7 +2266,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
if (shost->eh_deadline != -1)
shost->last_reset = 0;
spin_unlock_irqrestore(shost->host_lock, flags);
- scsi_eh_flush_done_q(&eh_done_q);
+ scsi_eh_flush_done_q(shost, &eh_done_q);
}
/**
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 1ae08e81339f..d2807d799fda 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -11,7 +11,8 @@ struct Scsi_Host;
extern void scsi_eh_finish_cmd(struct scsi_cmnd *scmd,
struct list_head *done_q);
-extern void scsi_eh_flush_done_q(struct list_head *done_q);
+extern void scsi_eh_flush_done_q(struct Scsi_Host *shost,
+ struct list_head *done_q);
extern void scsi_report_bus_reset(struct Scsi_Host *, int);
extern void scsi_report_device_reset(struct Scsi_Host *, int, int);
extern int scsi_block_when_processing_errors(struct scsi_device *);
* [PATCH v15 06/19] scsi: core: Introduce a mechanism for reordering requests in the error handler
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley
Introduce the .eh_prepare_resubmit function pointer in struct scsi_driver
and the needs_prepare_resubmit flag in struct scsi_host_template. Make the
error handler call .eh_prepare_resubmit() before resubmitting commands if
the needs_prepare_resubmit flag has been set. A later patch will use this
functionality to sort SCSI commands by LBA from inside the SCSI disk
driver before these are resubmitted by the error handler.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/scsi_error.c | 51 ++++++++++++++++++++++++++++++++++++++
include/scsi/scsi_driver.h | 1 +
include/scsi/scsi_host.h | 6 +++++
3 files changed, 58 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7390131e7f0a..4214d7b79b06 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -27,6 +27,7 @@
#include <linux/blkdev.h>
#include <linux/delay.h>
#include <linux/jiffies.h>
+#include <linux/list_sort.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -2186,6 +2187,54 @@ void scsi_eh_ready_devs(struct Scsi_Host *shost,
}
EXPORT_SYMBOL_GPL(scsi_eh_ready_devs);
+/*
+ * Comparison function for sorting SCSI commands by ULD driver.
+ */
+static int scsi_cmp_uld(void *priv, const struct list_head *_a,
+ const struct list_head *_b)
+{
+ struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+ struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+
+ /* See also the comment above the list_sort() definition. */
+ return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
+}
+
+static void scsi_call_prepare_resubmit(struct Scsi_Host *shost,
+ struct list_head *done_q)
+{
+ struct scsi_cmnd *scmd, *next;
+
+ if (!shost->hostt->needs_prepare_resubmit)
+ return;
+
+ if (list_empty(done_q))
+ return;
+
+ /* Sort pending SCSI commands by ULD. */
+ list_sort(NULL, done_q, scsi_cmp_uld);
+
+ /*
+ * Call .eh_prepare_resubmit for each range of commands with identical
+ * ULD driver pointer.
+ */
+ list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
+ struct scsi_driver *uld =
+ scmd->device ? scsi_cmd_to_driver(scmd) : NULL;
+ struct list_head *prev, uld_cmd_list;
+
+ while (&next->eh_entry != done_q &&
+ scsi_cmd_to_driver(next) == uld)
+ next = list_next_entry(next, eh_entry);
+ if (!uld->eh_prepare_resubmit)
+ continue;
+ prev = scmd->eh_entry.prev;
+ list_cut_position(&uld_cmd_list, prev, next->eh_entry.prev);
+ uld->eh_prepare_resubmit(&uld_cmd_list);
+ list_splice(&uld_cmd_list, prev);
+ }
+}
+
/**
* scsi_eh_flush_done_q - finish processed commands or retry them.
* @shost: SCSI host pointer.
@@ -2195,6 +2244,8 @@ void scsi_eh_flush_done_q(struct Scsi_Host *shost, struct list_head *done_q)
{
struct scsi_cmnd *scmd, *next;
+ scsi_call_prepare_resubmit(shost, done_q);
+
list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
list_del_init(&scmd->eh_entry);
if (scsi_device_online(scmd->device) &&
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index 4ce1988b2ba0..2b11be896eee 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -18,6 +18,7 @@ struct scsi_driver {
int (*done)(struct scsi_cmnd *);
int (*eh_action)(struct scsi_cmnd *, int);
void (*eh_reset)(struct scsi_cmnd *);
+ void (*eh_prepare_resubmit)(struct list_head *cmd_list);
};
#define to_scsi_driver(drv) \
container_of((drv), struct scsi_driver, gendrv)
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 3b907fc2ef08..150ae619c4d2 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -464,6 +464,12 @@ struct scsi_host_template {
/* The queuecommand callback may block. See also BLK_MQ_F_BLOCKING. */
unsigned queuecommand_may_block:1;
+ /*
+ * The scsi_driver .eh_prepare_resubmit function must be called by
+ * the SCSI error handler.
+ */
+ unsigned needs_prepare_resubmit:1;
+
/*
* Countdown for host blocking with no commands outstanding.
*/
* [PATCH v15 07/19] scsi: core: Add unit tests for scsi_call_prepare_resubmit()
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley
Triggering all code paths in scsi_call_prepare_resubmit() via manual
testing is difficult. Hence add unit tests for this function.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/Kconfig | 5 +
drivers/scsi/scsi_error.c | 4 +
drivers/scsi/scsi_error_test.c | 233 +++++++++++++++++++++++++++++++++
3 files changed, 242 insertions(+)
create mode 100644 drivers/scsi/scsi_error_test.c
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index addac7fbe37b..2e57afdbbc4d 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -232,6 +232,11 @@ config SCSI_SCAN_ASYNC
Note that this setting also affects whether resuming from
system suspend will be performed asynchronously.
+config SCSI_ERROR_TEST
+ tristate "scsi_error.c unit tests" if !KUNIT_ALL_TESTS
+ depends on SCSI && KUNIT
+ default KUNIT_ALL_TESTS
+
menu "SCSI Transports"
depends on SCSI
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4214d7b79b06..3a2643293abf 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2621,3 +2621,7 @@ bool scsi_get_sense_info_fld(const u8 *sense_buffer, int sb_len,
}
}
EXPORT_SYMBOL(scsi_get_sense_info_fld);
+
+#ifdef CONFIG_SCSI_ERROR_TEST
+#include "scsi_error_test.c"
+#endif
diff --git a/drivers/scsi/scsi_error_test.c b/drivers/scsi/scsi_error_test.c
new file mode 100644
index 000000000000..46362766ad48
--- /dev/null
+++ b/drivers/scsi/scsi_error_test.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2023 Google LLC
+ */
+#include <kunit/test.h>
+#include <linux/cleanup.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_host.h>
+
+#define ALLOC(type, ...) \
+ ({ \
+ type *obj; \
+ obj = kmalloc(sizeof(*obj), GFP_KERNEL); \
+ if (obj) \
+ *obj = (type){ __VA_ARGS__ }; \
+ obj; \
+ })
+
+#define ALLOC_DISK(...) ALLOC(struct gendisk, __VA_ARGS__)
+
+#define ALLOC_Q(...) ALLOC(struct request_queue, __VA_ARGS__)
+
+#define ALLOC_SDEV(...) ALLOC(struct scsi_device, __VA_ARGS__)
+
+#define ALLOC_CMD(...) ALLOC(struct rq_and_cmd, __VA_ARGS__)
+
+static struct kunit *kunit_test;
+
+static void uld_prepare_resubmit(struct list_head *cmd_list)
+{
+ /* This function must not be called. */
+ KUNIT_EXPECT_TRUE(kunit_test, false);
+}
+
+/*
+ * Verify that .eh_prepare_resubmit() is not called if needs_prepare_resubmit is
+ * false.
+ */
+static void test_prepare_resubmit1(struct kunit *test)
+{
+ struct gendisk *disk __free(kfree) = ALLOC_DISK();
+ struct request_queue *q __free(kfree) = ALLOC_Q(
+ .limits = {
+ .driver_preserves_write_order = false,
+ .use_zone_write_lock = true,
+ .zoned = BLK_ZONED_HM,
+ }
+ );
+ static struct scsi_driver uld = {
+ .eh_prepare_resubmit = uld_prepare_resubmit,
+ };
+ static const struct scsi_host_template host_template;
+ static struct Scsi_Host host = {
+ .hostt = &host_template,
+ };
+ struct scsi_device *dev __free(kfree) = ALLOC_SDEV(
+ .request_queue = q,
+ .sdev_gendev.driver = &uld.gendrv,
+ .host = &host,
+ );
+ struct rq_and_cmd {
+ struct request rq;
+ struct scsi_cmnd cmd;
+ } *cmd1 __free(kfree) = NULL, *cmd2 __free(kfree);
+ LIST_HEAD(cmd_list);
+
+ BUILD_BUG_ON(scsi_cmd_to_rq(&cmd1->cmd) != &cmd1->rq);
+
+ q->disk = disk;
+ disk->queue = q;
+ cmd1 = ALLOC_CMD(
+ .rq = {
+ .q = q,
+ .cmd_flags = REQ_OP_WRITE,
+ .__sector = 2,
+ },
+ .cmd.device = dev,
+ );
+ cmd2 = ALLOC_CMD(
+ .rq = {
+ .q = q,
+ .cmd_flags = REQ_OP_WRITE,
+ .__sector = 1,
+ },
+ .cmd.device = dev,
+ );
+ list_add_tail(&cmd1->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd2->cmd.eh_entry, &cmd_list);
+
+ KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 2);
+ kunit_test = test;
+ scsi_call_prepare_resubmit(&host, &cmd_list);
+ kunit_test = NULL;
+ KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 2);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next, &cmd1->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next, &cmd2->cmd.eh_entry);
+}
+
+static struct scsi_driver *uld1, *uld2, *uld3;
+
+static void uld1_prepare_resubmit(struct list_head *cmd_list)
+{
+ struct scsi_cmnd *cmd;
+
+ KUNIT_EXPECT_EQ(kunit_test, list_count_nodes(cmd_list), 2);
+ list_for_each_entry(cmd, cmd_list, eh_entry)
+ KUNIT_EXPECT_PTR_EQ(kunit_test, scsi_cmd_to_driver(cmd), uld1);
+}
+
+static void uld2_prepare_resubmit(struct list_head *cmd_list)
+{
+ struct scsi_cmnd *cmd;
+
+ KUNIT_EXPECT_EQ(kunit_test, list_count_nodes(cmd_list), 2);
+ list_for_each_entry(cmd, cmd_list, eh_entry)
+ KUNIT_EXPECT_PTR_EQ(kunit_test, scsi_cmd_to_driver(cmd), uld2);
+}
+
+static void test_prepare_resubmit2(struct kunit *test)
+{
+ static const struct scsi_host_template host_template = {
+ .needs_prepare_resubmit = true,
+ };
+ static struct Scsi_Host host = {
+ .hostt = &host_template,
+ };
+ struct gendisk *disk __free(kfree);
+ struct request_queue *q __free(kfree) =
+ ALLOC_Q(.limits = {
+ .driver_preserves_write_order = true,
+ .use_zone_write_lock = false,
+ .zoned = BLK_ZONED_HM,
+ });
+ struct rq_and_cmd {
+ struct request rq;
+ struct scsi_cmnd cmd;
+ } *cmd1 __free(kfree), *cmd2 __free(kfree), *cmd3 __free(kfree),
+ *cmd4 __free(kfree), *cmd5 __free(kfree), *cmd6 __free(kfree);
+ struct scsi_device *dev1 __free(kfree), *dev2 __free(kfree),
+ *dev3 __free(kfree);
+ struct scsi_driver *uld __free(kfree);
+ LIST_HEAD(cmd_list);
+
+ BUILD_BUG_ON(scsi_cmd_to_rq(&cmd1->cmd) != &cmd1->rq);
+
+ uld = kzalloc(3 * sizeof(*uld), GFP_KERNEL);
+ uld1 = &uld[0];
+ uld1->eh_prepare_resubmit = uld1_prepare_resubmit;
+ uld2 = &uld[1];
+ uld2->eh_prepare_resubmit = uld2_prepare_resubmit;
+ uld3 = &uld[2];
+ disk = ALLOC_DISK();
+ disk->queue = q;
+ q->disk = disk;
+ dev1 = ALLOC_SDEV(.sdev_gendev.driver = &uld1->gendrv,
+ .request_queue = q, .host = &host);
+ dev2 = ALLOC_SDEV(.sdev_gendev.driver = &uld2->gendrv,
+ .request_queue = q, .host = &host);
+ dev3 = ALLOC_SDEV(.sdev_gendev.driver = &uld3->gendrv,
+ .request_queue = q, .host = &host);
+ cmd1 = ALLOC_CMD(
+ .rq = {
+ .q = q,
+ .cmd_flags = REQ_OP_WRITE,
+ .__sector = 3,
+ },
+ .cmd.device = dev1,
+ );
+ cmd2 = ALLOC_CMD();
+ *cmd2 = *cmd1;
+ cmd2->rq.__sector = 4;
+ cmd3 = ALLOC_CMD(
+ .rq = {
+ .q = q,
+ .cmd_flags = REQ_OP_WRITE,
+ .__sector = 1,
+ },
+ .cmd.device = dev2,
+ );
+ cmd4 = kmemdup(cmd3, sizeof(*cmd3), GFP_KERNEL);
+ cmd4->rq.__sector = 2;
+ cmd5 = ALLOC_CMD(
+ .rq = {
+ .q = q,
+ .cmd_flags = REQ_OP_WRITE,
+ .__sector = 5,
+ },
+ .cmd.device = dev3,
+ );
+	cmd6 = kmemdup(cmd5, sizeof(*cmd5), GFP_KERNEL);
+ cmd6->rq.__sector = 6;
+ list_add_tail(&cmd3->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd1->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd2->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd5->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd6->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd4->cmd.eh_entry, &cmd_list);
+
+ KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 6);
+ kunit_test = test;
+ scsi_call_prepare_resubmit(&host, &cmd_list);
+ kunit_test = NULL;
+ KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 6);
+ KUNIT_EXPECT_TRUE(test, uld1 < uld2);
+ KUNIT_EXPECT_TRUE(test, uld2 < uld3);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next, &cmd1->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next, &cmd2->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next,
+ &cmd3->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next->next,
+ &cmd4->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next->next->next,
+ &cmd5->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next->next->next->next,
+ &cmd6->cmd.eh_entry);
+}
+
+static struct kunit_case prepare_resubmit_test_cases[] = {
+ KUNIT_CASE(test_prepare_resubmit1),
+ KUNIT_CASE(test_prepare_resubmit2),
+ {}
+};
+
+static struct kunit_suite prepare_resubmit_test_suite = {
+ .name = "prepare_resubmit",
+ .test_cases = prepare_resubmit_test_cases,
+};
+kunit_test_suite(prepare_resubmit_test_suite);
+
+MODULE_DESCRIPTION("scsi_call_prepare_resubmit() unit tests");
+MODULE_AUTHOR("Bart Van Assche");
+MODULE_LICENSE("GPL");
* [PATCH v15 08/19] scsi: sd: Support sorting commands by LBA before resubmitting
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley
Support sorting SCSI commands by LBA before the SCSI error handler
resubmits these commands. This is necessary when resubmitting zoned writes
(REQ_OP_WRITE) if multiple writes have been queued for a single zone.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/sd.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 530918cbfce2..63bb01ddadde 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -47,6 +47,7 @@
#include <linux/blkpg.h>
#include <linux/blk-pm.h>
#include <linux/delay.h>
+#include <linux/list_sort.h>
#include <linux/major.h>
#include <linux/mutex.h>
#include <linux/string_helpers.h>
@@ -2058,6 +2059,38 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
return eh_disp;
}
+static int sd_cmp_sector(void *priv, const struct list_head *_a,
+ const struct list_head *_b)
+{
+ struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+ struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+ struct request *rq_a = scsi_cmd_to_rq(a);
+ struct request *rq_b = scsi_cmd_to_rq(b);
+ bool use_zwl_a = rq_a->q->limits.use_zone_write_lock;
+ bool use_zwl_b = rq_b->q->limits.use_zone_write_lock;
+
+ /*
+ * Order the commands that need zone write locking after the commands
+ * that do not need zone write locking. Order the commands that do not
+ * need zone write locking by LBA. Do not reorder the commands that
+ * need zone write locking. See also the comment above the list_sort()
+ * definition.
+ */
+ if (use_zwl_a || use_zwl_b)
+ return use_zwl_a > use_zwl_b;
+ return blk_rq_pos(rq_a) > blk_rq_pos(rq_b);
+}
+
+static void sd_prepare_resubmit(struct list_head *cmd_list)
+{
+ /*
+ * Sort pending SCSI commands in starting sector order. This is
+ * important if one of the SCSI devices associated with @shost is a
+ * zoned block device for which zone write locking is disabled.
+ */
+ list_sort(NULL, cmd_list, sd_cmp_sector);
+}
+
static unsigned int sd_completed_bytes(struct scsi_cmnd *scmd)
{
struct request *req = scsi_cmd_to_rq(scmd);
@@ -4014,6 +4047,7 @@ static struct scsi_driver sd_template = {
.done = sd_done,
.eh_action = sd_eh_action,
.eh_reset = sd_eh_reset,
+ .eh_prepare_resubmit = sd_prepare_resubmit,
};
/**
* [PATCH v15 09/19] scsi: sd: Add a unit test for sd_cmp_sector()
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley
Make it easier to test sd_cmp_sector() by adding a unit test for this
function.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/Kconfig | 5 +++
drivers/scsi/sd.c | 4 ++
drivers/scsi/sd_test.c | 86 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+)
create mode 100644 drivers/scsi/sd_test.c
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 2e57afdbbc4d..ba2b81ddd7f8 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -237,6 +237,11 @@ config SCSI_ERROR_TEST
depends on SCSI && KUNIT
default KUNIT_ALL_TESTS
+config SD_TEST
+ tristate "sd.c unit tests" if !KUNIT_ALL_TESTS
+ depends on SCSI && BLK_DEV_SD && KUNIT
+ default KUNIT_ALL_TESTS
+
menu "SCSI Transports"
depends on SCSI
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 63bb01ddadde..d52ea605ada8 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -4141,3 +4141,7 @@ void sd_print_result(const struct scsi_disk *sdkp, const char *msg, int result)
"%s: Result: hostbyte=0x%02x driverbyte=%s\n",
msg, host_byte(result), "DRIVER_OK");
}
+
+#ifdef CONFIG_SD_TEST
+#include "sd_test.c"
+#endif
diff --git a/drivers/scsi/sd_test.c b/drivers/scsi/sd_test.c
new file mode 100644
index 000000000000..b9c3d2bf311e
--- /dev/null
+++ b/drivers/scsi/sd_test.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2023 Google LLC
+ */
+#include <kunit/test.h>
+#include <linux/cleanup.h>
+#include <linux/list_sort.h>
+#include <linux/slab.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_driver.h>
+#include "sd.h"
+
+#define ALLOC(type, ...) \
+ ({ \
+ type *obj; \
+ obj = kmalloc(sizeof(*obj), GFP_KERNEL); \
+ if (obj) \
+ *obj = (type){ __VA_ARGS__ }; \
+ obj; \
+ })
+
+#define ALLOC_Q(...) ALLOC(struct request_queue, __VA_ARGS__)
+
+#define ALLOC_CMD(...) ALLOC(struct rq_and_cmd, __VA_ARGS__)
+
+struct rq_and_cmd {
+ struct request rq;
+ struct scsi_cmnd cmd;
+};
+
+/*
+ * Verify that sd_cmp_sector() does what it is expected to do.
+ */
+static void test_sd_cmp_sector(struct kunit *test)
+{
+ struct request_queue *q1 __free(kfree) =
+ ALLOC_Q(.limits.use_zone_write_lock = true);
+ struct request_queue *q2 __free(kfree) =
+ ALLOC_Q(.limits.use_zone_write_lock = false);
+ struct rq_and_cmd *cmd1 __free(kfree) = ALLOC_CMD(.rq = {
+ .q = q1,
+ .__sector = 7,
+ });
+ struct rq_and_cmd *cmd2 __free(kfree) = ALLOC_CMD(.rq = {
+ .q = q1,
+ .__sector = 5,
+ });
+ struct rq_and_cmd *cmd3 __free(kfree) = ALLOC_CMD(.rq = {
+ .q = q2,
+ .__sector = 7,
+ });
+ struct rq_and_cmd *cmd4 __free(kfree) = ALLOC_CMD(.rq = {
+ .q = q2,
+ .__sector = 5,
+ });
+ LIST_HEAD(cmd_list);
+
+ list_add_tail(&cmd1->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd2->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd3->cmd.eh_entry, &cmd_list);
+ list_add_tail(&cmd4->cmd.eh_entry, &cmd_list);
+ KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 4);
+ list_sort(NULL, &cmd_list, sd_cmp_sector);
+ KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 4);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next, &cmd4->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next, &cmd3->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next,
+ &cmd1->cmd.eh_entry);
+ KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next->next,
+ &cmd2->cmd.eh_entry);
+}
+
+static struct kunit_case sd_test_cases[] = {
+ KUNIT_CASE(test_sd_cmp_sector),
+ {}
+};
+
+static struct kunit_suite sd_test_suite = {
+ .name = "sd",
+ .test_cases = sd_test_cases,
+};
+kunit_test_suite(sd_test_suite);
+
+MODULE_DESCRIPTION("SCSI disk (sd) driver unit tests");
+MODULE_AUTHOR("Bart Van Assche");
+MODULE_LICENSE("GPL");
* [PATCH v15 10/19] scsi: core: Retry unaligned zoned writes
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley
If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
a starting LBA that differs from the write pointer, e.g. because zoned
writes have been reordered, then the storage device will respond with an
UNALIGNED WRITE COMMAND error. Send commands that failed with an
unaligned write error to the SCSI error handler if zone write locking is
disabled. The SCSI error handler will sort the SCSI commands by LBA
before resubmitting them.
If zone write locking is disabled, increase the number of retries for
write commands sent to a sequential zone to the maximum number of
outstanding commands because in the worst case the number of times
reordered zoned writes have to be retried is (number of outstanding
writes per sequential zone) - 1.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/constants.c | 1 +
drivers/scsi/scsi_error.c | 17 +++++++++++++++++
drivers/scsi/scsi_lib.c | 1 +
drivers/scsi/sd.c | 6 ++++++
include/scsi/scsi.h | 1 +
5 files changed, 26 insertions(+)
diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
index 340785536998..8ddb30741999 100644
--- a/drivers/scsi/constants.c
+++ b/drivers/scsi/constants.c
@@ -419,6 +419,7 @@ EXPORT_SYMBOL(scsi_hostbyte_string);
#define scsi_mlreturn_name(result) { result, #result }
static const struct value_name_pair scsi_mlreturn_arr[] = {
+ scsi_mlreturn_name(NEEDS_DELAYED_RETRY),
scsi_mlreturn_name(NEEDS_RETRY),
scsi_mlreturn_name(SUCCESS),
scsi_mlreturn_name(FAILED),
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 3a2643293abf..4e9a35866a0d 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -699,6 +699,23 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
fallthrough;
case ILLEGAL_REQUEST:
+ /*
+ * Unaligned write command. This may indicate that zoned writes
+ * have been received by the device in the wrong order. If zone
+ * write locking is disabled, retry after all pending commands
+ * have completed.
+ */
+ if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+ !req->q->limits.use_zone_write_lock &&
+ blk_rq_is_seq_zoned_write(req) &&
+ scsi_cmd_retry_allowed(scmd)) {
+ SCSI_LOG_ERROR_RECOVERY(1,
+ sdev_printk(KERN_WARNING, scmd->device,
+ "Retrying unaligned write at LBA %#llx.\n",
+ scsi_get_lba(scmd)));
+ return NEEDS_DELAYED_RETRY;
+ }
+
if (sshdr.asc == 0x20 || /* Invalid command operation code */
sshdr.asc == 0x21 || /* Logical block address out of range */
sshdr.asc == 0x22 || /* Invalid function */
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index cf3864f72093..2e28a1aeedd0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1442,6 +1442,7 @@ static void scsi_complete(struct request *rq)
case ADD_TO_MLQUEUE:
scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
break;
+ case NEEDS_DELAYED_RETRY:
default:
scsi_eh_scmd_add(cmd);
break;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d52ea605ada8..7e71f9f42036 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1276,6 +1276,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
cmd->transfersize = sdp->sector_size;
cmd->underflow = nr_blocks << 9;
cmd->allowed = sdkp->max_retries;
+ /*
+ * Increase the number of allowed retries for zoned writes if zone
+ * write locking is disabled.
+ */
+ if (!rq->q->limits.use_zone_write_lock && blk_rq_is_seq_zoned_write(rq))
+ cmd->allowed += rq->q->nr_requests;
cmd->sdb.length = nr_blocks * sdp->sector_size;
SCSI_LOG_HLQUEUE(1,
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 4498f845b112..5eb8b6e3f85a 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
* Internal return values.
*/
enum scsi_disposition {
+ NEEDS_DELAYED_RETRY = 0x2000,
NEEDS_RETRY = 0x2001,
SUCCESS = 0x2002,
FAILED = 0x2003,
* [PATCH v15 11/19] scsi: sd_zbc: Only require an I/O scheduler if needed
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley
An I/O scheduler that serializes zoned writes is only needed if the SCSI
LLD does not preserve the write order. Hence only set
ELEVATOR_F_ZBD_SEQ_WRITE if the LLD does not preserve the write order.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/sd_zbc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index a25215507668..718b31bed878 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -955,7 +955,9 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, u8 buf[SD_BUF_SIZE])
/* The drive satisfies the kernel restrictions: set it up */
blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
- blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
+ if (!q->limits.driver_preserves_write_order)
+ blk_queue_required_elevator_features(q,
+ ELEVATOR_F_ZBD_SEQ_WRITE);
if (sdkp->zones_max_open == U32_MAX)
disk_set_max_open_zones(disk, 0);
else
* [PATCH v15 12/19] scsi: scsi_debug: Add the preserves_write_order module parameter
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Douglas Gilbert, Damien Le Moal, Ming Lei,
James E.J. Bottomley
Zone write locking is not used for zoned devices if the block driver
reports that it preserves the order of write commands. Make it easier to
test the code paths that skip zone write locking by adding support for
setting the driver_preserves_write_order flag via a module parameter.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/scsi_debug.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 67922e2c4c19..6f0c78e727ec 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -884,6 +884,7 @@ static int dix_reads;
static int dif_errors;
/* ZBC global data */
+static bool sdeb_preserves_write_order;
static bool sdeb_zbc_in_use; /* true for host-aware and host-managed disks */
static int sdeb_zbc_zone_cap_mb;
static int sdeb_zbc_zone_size_mb;
@@ -5451,10 +5452,14 @@ static struct sdebug_dev_info *find_build_dev_info(struct scsi_device *sdev)
static int scsi_debug_slave_alloc(struct scsi_device *sdp)
{
+ struct request_queue *q = sdp->request_queue;
+
if (sdebug_verbose)
pr_info("slave_alloc <%u %u %u %llu>\n",
sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
+ q->limits.driver_preserves_write_order = sdeb_preserves_write_order;
+
return 0;
}
@@ -6189,6 +6194,8 @@ module_param_named(statistics, sdebug_statistics, bool, S_IRUGO | S_IWUSR);
module_param_named(strict, sdebug_strict, bool, S_IRUGO | S_IWUSR);
module_param_named(submit_queues, submit_queues, int, S_IRUGO);
module_param_named(poll_queues, poll_queues, int, S_IRUGO);
+module_param_named(preserves_write_order, sdeb_preserves_write_order, bool,
+ S_IRUGO);
module_param_named(tur_ms_to_ready, sdeb_tur_ms_to_ready, int, S_IRUGO);
module_param_named(unmap_alignment, sdebug_unmap_alignment, int, S_IRUGO);
module_param_named(unmap_granularity, sdebug_unmap_granularity, int, S_IRUGO);
@@ -6255,6 +6262,8 @@ MODULE_PARM_DESC(opts, "1->noise, 2->medium_err, 4->timeout, 8->recovered_err...
MODULE_PARM_DESC(per_host_store, "If set, next positive add_host will get new store (def=0)");
MODULE_PARM_DESC(physblk_exp, "physical block exponent (def=0)");
MODULE_PARM_DESC(poll_queues, "support for iouring iopoll queues (1 to max(submit_queues - 1))");
+MODULE_PARM_DESC(preserves_write_order,
+ "Whether or not to inform the block layer that this driver preserves the order of WRITE commands (def=0)");
MODULE_PARM_DESC(ptype, "SCSI peripheral type(def=0[disk])");
MODULE_PARM_DESC(random, "If set, uniformly randomize command duration between 0 and delay_in_ns");
MODULE_PARM_DESC(removable, "claim to have removable media (def=0)");
* [PATCH v15 13/19] scsi: scsi_debug: Support injecting unaligned write errors
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (11 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 12/19] scsi: scsi_debug: Add the preserves_write_order module parameter Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 14/19] scsi: ufs: hisi: Rework the code that disables auto-hibernation Bart Van Assche
` (6 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Douglas Gilbert, Damien Le Moal, Ming Lei,
James E.J. Bottomley
Allow user space software, e.g. a blktests test, to inject unaligned
write errors.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/scsi/scsi_debug.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 6f0c78e727ec..23ea090698df 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -183,6 +183,7 @@ static const char *sdebug_version_date = "20210520";
#define SDEBUG_OPT_NO_CDB_NOISE 0x4000
#define SDEBUG_OPT_HOST_BUSY 0x8000
#define SDEBUG_OPT_CMD_ABORT 0x10000
+#define SDEBUG_OPT_UNALIGNED_WRITE 0x20000
#define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
SDEBUG_OPT_RESET_NOISE)
#define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
@@ -190,7 +191,8 @@ static const char *sdebug_version_date = "20210520";
SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
SDEBUG_OPT_SHORT_TRANSFER | \
SDEBUG_OPT_HOST_BUSY | \
- SDEBUG_OPT_CMD_ABORT)
+ SDEBUG_OPT_CMD_ABORT | \
+ SDEBUG_OPT_UNALIGNED_WRITE)
#define SDEBUG_OPT_RECOV_DIF_DIX (SDEBUG_OPT_RECOVERED_ERR | \
SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR)
@@ -3898,6 +3900,14 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
struct sdeb_store_info *sip = devip2sip(devip, true);
u8 *cmd = scp->cmnd;
+ if (unlikely(sdebug_opts & SDEBUG_OPT_UNALIGNED_WRITE &&
+ atomic_read(&sdeb_inject_pending))) {
+ atomic_set(&sdeb_inject_pending, 0);
+ mk_sense_buffer(scp, ILLEGAL_REQUEST, LBA_OUT_OF_RANGE,
+ UNALIGNED_WRITE_ASCQ);
+ return check_condition_result;
+ }
+
switch (cmd[0]) {
case WRITE_16:
ei_lba = 0;
* [PATCH v15 14/19] scsi: ufs: hisi: Rework the code that disables auto-hibernation
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (12 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 13/19] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 15/19] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static Bart Van Assche
` (5 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Bao D . Nguyen, Can Guo, Avri Altman,
James E.J. Bottomley, Bean Huo, Keoseong Park, Adrian Hunter,
Krzysztof Kozlowski, Uwe Kleine-König
The host driver link startup callback is called indirectly by
ufshcd_probe_hba(). That function applies the auto-hibernation
settings by writing hba->ahit into the auto-hibernation control
register. Simplify the code for disabling auto-hibernation by
setting hba->ahit instead of writing into the auto-hibernation
control register. This patch is part of an effort to move all
auto-hibernation register changes into the UFSHCI driver core.
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Can Guo <quic_cang@quicinc.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ufs/host/ufs-hisi.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/ufs/host/ufs-hisi.c b/drivers/ufs/host/ufs-hisi.c
index 0229ac0a8dbe..2e032eaf9e93 100644
--- a/drivers/ufs/host/ufs-hisi.c
+++ b/drivers/ufs/host/ufs-hisi.c
@@ -142,7 +142,6 @@ static int ufs_hisi_link_startup_pre_change(struct ufs_hba *hba)
struct ufs_hisi_host *host = ufshcd_get_variant(hba);
int err;
uint32_t value;
- uint32_t reg;
/* Unipro VS_mphy_disable */
ufshcd_dme_set(hba, UIC_ARG_MIB_SEL(0xD0C1, 0x0), 0x1);
@@ -232,9 +231,7 @@ static int ufs_hisi_link_startup_pre_change(struct ufs_hba *hba)
ufshcd_writel(hba, UFS_HCLKDIV_NORMAL_VALUE, UFS_REG_HCLKDIV);
/* disable auto H8 */
- reg = ufshcd_readl(hba, REG_AUTO_HIBERNATE_IDLE_TIMER);
- reg = reg & (~UFS_AHIT_AH8ITV_MASK);
- ufshcd_writel(hba, reg, REG_AUTO_HIBERNATE_IDLE_TIMER);
+ hba->ahit = 0;
/* Unipro PA_Local_TX_LCC_Enable */
ufshcd_disable_host_tx_lcc(hba);
* [PATCH v15 15/19] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (13 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 14/19] scsi: ufs: hisi: Rework the code that disables auto-hibernation Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 16/19] scsi: ufs: Change the return type of ufshcd_auto_hibern8_update() Bart Van Assche
` (4 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Bao D . Nguyen, Can Guo, Avri Altman,
James E.J. Bottomley, Stanley Chu, Manivannan Sadhasivam,
Asutosh Das, Peter Wang, Bean Huo, Arthur Simchaev, Po-Wen Kao,
Eric Biggers
Rename ufshcd_auto_hibern8_enable() to ufshcd_configure_auto_hibern8()
since this function can either enable or disable auto-hibernation.
Because ufshcd_auto_hibern8_enable() is only used inside the UFSHCI
driver core, declare it static. Additionally, move the definition of
this function to just before its first caller.
Suggested-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ufs/core/ufshcd.c | 24 +++++++++++-------------
include/ufs/ufshcd.h | 1 -
2 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 8b1031fb0a44..e80878a5d4e6 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4401,6 +4401,14 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
}
EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
+static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
+{
+ if (!ufshcd_is_auto_hibern8_supported(hba))
+ return;
+
+ ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
+}
+
void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
{
unsigned long flags;
@@ -4420,21 +4428,13 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
ufshcd_rpm_get_sync(hba);
ufshcd_hold(hba);
- ufshcd_auto_hibern8_enable(hba);
+ ufshcd_configure_auto_hibern8(hba);
ufshcd_release(hba);
ufshcd_rpm_put_sync(hba);
}
}
EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
-void ufshcd_auto_hibern8_enable(struct ufs_hba *hba)
-{
- if (!ufshcd_is_auto_hibern8_supported(hba))
- return;
-
- ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
-}
-
/**
* ufshcd_init_pwr_info - setting the POR (power on reset)
* values in hba power info
@@ -8864,8 +8864,7 @@ static int ufshcd_probe_hba(struct ufs_hba *hba, bool init_dev_params)
if (hba->ee_usr_mask)
ufshcd_write_ee_control(hba);
- /* Enable Auto-Hibernate if configured */
- ufshcd_auto_hibern8_enable(hba);
+ ufshcd_configure_auto_hibern8(hba);
out:
spin_lock_irqsave(hba->host->host_lock, flags);
@@ -9862,8 +9861,7 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
}
- /* Enable Auto-Hibernate if configured */
- ufshcd_auto_hibern8_enable(hba);
+ ufshcd_configure_auto_hibern8(hba);
goto out;
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 7f0b2c5599cd..4156dc2b389b 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -1360,7 +1360,6 @@ static inline int ufshcd_disable_host_tx_lcc(struct ufs_hba *hba)
return ufshcd_dme_set(hba, UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
}
-void ufshcd_auto_hibern8_enable(struct ufs_hba *hba);
void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
void ufshcd_fixup_dev_quirks(struct ufs_hba *hba,
const struct ufs_dev_quirk *fixups);
* [PATCH v15 16/19] scsi: ufs: Change the return type of ufshcd_auto_hibern8_update()
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (14 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 15/19] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 17/19] scsi: ufs: Simplify ufshcd_auto_hibern8_update() Bart Van Assche
` (3 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Bao D . Nguyen, Can Guo, Peter Wang, Avri Altman,
James E.J. Bottomley, Matthias Brugger,
AngeloGioacchino Del Regno, Bean Huo, Lu Hongfei, Stanley Chu,
Manivannan Sadhasivam, Asutosh Das, Arthur Simchaev, zhanghui,
Keoseong Park, Po-Wen Kao, Eric Biggers
A later patch will introduce an error path in ufshcd_auto_hibern8_update().
Change the return type of that function before introducing calls to that
function in the host drivers such that the host drivers only have to be
modified once.
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ufs/core/ufs-sysfs.c | 2 +-
drivers/ufs/core/ufshcd-priv.h | 1 -
drivers/ufs/core/ufshcd.c | 6 ++++--
include/ufs/ufshcd.h | 2 +-
4 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/ufs/core/ufs-sysfs.c b/drivers/ufs/core/ufs-sysfs.c
index c95906443d5f..a1554eac9bbc 100644
--- a/drivers/ufs/core/ufs-sysfs.c
+++ b/drivers/ufs/core/ufs-sysfs.c
@@ -203,7 +203,7 @@ static ssize_t auto_hibern8_store(struct device *dev,
goto out;
}
- ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
+ ret = ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
out:
up(&hba->host_sem);
diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
index f42d99ce5bf1..de8e891da36a 100644
--- a/drivers/ufs/core/ufshcd-priv.h
+++ b/drivers/ufs/core/ufshcd-priv.h
@@ -60,7 +60,6 @@ int ufshcd_query_attr(struct ufs_hba *hba, enum query_opcode opcode,
enum attr_idn idn, u8 index, u8 selector, u32 *attr_val);
int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
enum flag_idn idn, u8 index, bool *flag_res);
-void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
struct cq_entry *cqe);
int ufshcd_mcq_init(struct ufs_hba *hba);
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e80878a5d4e6..a4f25131a78c 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4409,13 +4409,13 @@ static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
}
-void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
+int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
{
unsigned long flags;
bool update = false;
if (!ufshcd_is_auto_hibern8_supported(hba))
- return;
+ return 0;
spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->ahit != ahit) {
@@ -4432,6 +4432,8 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
ufshcd_release(hba);
ufshcd_rpm_put_sync(hba);
}
+
+ return 0;
}
EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 4156dc2b389b..513ecd485bb0 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -1360,7 +1360,7 @@ static inline int ufshcd_disable_host_tx_lcc(struct ufs_hba *hba)
return ufshcd_dme_set(hba, UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
}
-void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
+int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
void ufshcd_fixup_dev_quirks(struct ufs_hba *hba,
const struct ufs_dev_quirk *fixups);
#define SD_ASCII_STD true
* [PATCH v15 17/19] scsi: ufs: Simplify ufshcd_auto_hibern8_update()
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (15 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 16/19] scsi: ufs: Change the return type of ufshcd_auto_hibern8_update() Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 18/19] scsi: ufs: Forbid auto-hibernation without I/O scheduler Bart Van Assche
` (2 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Bao D . Nguyen, Can Guo, Avri Altman,
James E.J. Bottomley, Stanley Chu, Manivannan Sadhasivam,
Asutosh Das, Peter Wang, Bean Huo, Arthur Simchaev
Calls to ufshcd_auto_hibern8_update() are already serialized: this
function is called either while user space software is not running
(when preparing to suspend) or from a single sysfs store callback
function. Kernfs serializes sysfs .store() callbacks.
No functionality is changed. This patch makes the next patch in this
series easier to read.
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ufs/core/ufshcd.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a4f25131a78c..73cdf9917e02 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4411,21 +4411,13 @@ static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
{
- unsigned long flags;
- bool update = false;
+ const u32 cur_ahit = READ_ONCE(hba->ahit);
- if (!ufshcd_is_auto_hibern8_supported(hba))
+ if (!ufshcd_is_auto_hibern8_supported(hba) || cur_ahit == ahit)
return 0;
- spin_lock_irqsave(hba->host->host_lock, flags);
- if (hba->ahit != ahit) {
- hba->ahit = ahit;
- update = true;
- }
- spin_unlock_irqrestore(hba->host->host_lock, flags);
-
- if (update &&
- !pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
+ WRITE_ONCE(hba->ahit, ahit);
+ if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
ufshcd_rpm_get_sync(hba);
ufshcd_hold(hba);
ufshcd_configure_auto_hibern8(hba);
* [PATCH v15 18/19] scsi: ufs: Forbid auto-hibernation without I/O scheduler
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (16 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 17/19] scsi: ufs: Simplify ufshcd_auto_hibern8_update() Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-14 21:16 ` [PATCH v15 19/19] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
2023-11-27 7:09 ` [PATCH v15 00/19] Improve write performance for zoned UFS devices Christoph Hellwig
19 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Bao D . Nguyen, Can Guo, Avri Altman,
James E.J. Bottomley, Stanley Chu, Manivannan Sadhasivam,
Asutosh Das, Peter Wang, Bean Huo, Arthur Simchaev
UFSHCI controllers in legacy mode do not preserve the write order if
auto-hibernation is enabled. If the write order is not preserved, an I/O
scheduler is required to serialize zoned writes. Hence, do not allow
auto-hibernation to be enabled without an I/O scheduler if a zoned
logical unit is present and the controller is operating in legacy mode.
This patch has been tested with the following shell script:
show_ah8() {
echo -n "auto_hibern8: "
adb shell "cat /sys/devices/platform/13200000.ufs/auto_hibern8"
}
set_ah8() {
local rc
adb shell "echo $1 > /sys/devices/platform/13200000.ufs/auto_hibern8"
rc=$?
show_ah8
return $rc
}
set_iosched() {
adb shell "echo $1 >/sys/class/block/$zoned_bdev/queue/scheduler &&
echo -n 'I/O scheduler: ' &&
cat /sys/class/block/sde/queue/scheduler"
}
adb root
zoned_bdev=$(adb shell grep -lvw 0 /sys/class/block/sd*/queue/chunk_sectors |&
sed 's|/sys/class/block/||g;s|/queue/chunk_sectors||g')
[ -n "$zoned_bdev" ]
show_ah8
set_ah8 0
set_iosched none
if set_ah8 150000; then
echo "Error: enabled AH8 without I/O scheduler"
fi
set_iosched mq-deadline
set_ah8 150000
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ufs/core/ufshcd.c | 60 +++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 73cdf9917e02..732509289165 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4401,6 +4401,30 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
}
EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
+static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
+ bool preserves_write_order)
+{
+ struct scsi_device *sdev;
+
+ if (!preserves_write_order) {
+ shost_for_each_device(sdev, hba->host) {
+ struct request_queue *q = sdev->request_queue;
+
+ /*
+ * Refuse to enable auto-hibernation if no I/O scheduler
+ * is present. This code does not check whether the
+ * attached I/O scheduler serializes zoned writes
+ * (ELEVATOR_F_ZBD_SEQ_WRITE) because this cannot be
+ * checked from outside the block layer core.
+ */
+ if (blk_queue_is_zoned(q) && !q->elevator)
+ return -EPERM;
+ }
+ }
+
+ return 0;
+}
+
static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
{
if (!ufshcd_is_auto_hibern8_supported(hba))
@@ -4409,13 +4433,42 @@ static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
}
+/**
+ * ufshcd_auto_hibern8_update() - Modify the auto-hibernation control register
+ * @hba: per-adapter instance
+ * @ahit: New auto-hibernate settings. Includes the scale and the value of the
+ * auto-hibernation timer. See also the UFSHCI_AHIBERN8_TIMER_MASK and
+ * UFSHCI_AHIBERN8_SCALE_MASK constants.
+ *
+ * Notes:
+ * - UFSHCI controllers do not preserve the command order in legacy mode
+ * if auto-hibernation is enabled. If the command order is not preserved, an
+ * I/O scheduler that serializes zoned writes (mq-deadline) is required if a
+ * zoned logical unit is present. Enabling auto-hibernation without attaching
+ * the mq-deadline scheduler first may cause unaligned write errors for the
+ * zoned logical unit if a zoned logical unit is present.
+ * - Calls of this function must be serialized.
+ */
int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
{
const u32 cur_ahit = READ_ONCE(hba->ahit);
+ bool prev_state, new_state;
+ int ret;
if (!ufshcd_is_auto_hibern8_supported(hba) || cur_ahit == ahit)
return 0;
+ prev_state = FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, cur_ahit);
+ new_state = FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, ahit);
+
+ if (!is_mcq_enabled(hba) && !prev_state && new_state) {
+ /*
+ * Auto-hibernation will be enabled for legacy UFSHCI mode.
+ */
+ ret = ufshcd_update_preserves_write_order(hba, false);
+ if (ret)
+ return ret;
+ }
WRITE_ONCE(hba->ahit, ahit);
if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
ufshcd_rpm_get_sync(hba);
@@ -4424,6 +4477,13 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
ufshcd_release(hba);
ufshcd_rpm_put_sync(hba);
}
+ if (!is_mcq_enabled(hba) && prev_state && !new_state) {
+ /*
+ * Auto-hibernation has been disabled.
+ */
+ ret = ufshcd_update_preserves_write_order(hba, true);
+ WARN_ON_ONCE(ret);
+ }
return 0;
}
* [PATCH v15 19/19] scsi: ufs: Inform the block layer about write ordering
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (17 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 18/19] scsi: ufs: Forbid auto-hibernation without I/O scheduler Bart Van Assche
@ 2023-11-14 21:16 ` Bart Van Assche
2023-11-28 1:45 ` Can Guo
2023-11-27 7:09 ` [PATCH v15 00/19] Improve write performance for zoned UFS devices Christoph Hellwig
19 siblings, 1 reply; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 21:16 UTC (permalink / raw)
To: Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bart Van Assche, Bao D . Nguyen, Can Guo, Avri Altman,
James E.J. Bottomley, Stanley Chu, Manivannan Sadhasivam,
Asutosh Das, Peter Wang, Bean Huo, Arthur Simchaev
From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."
From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer
Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
pointer
4. Host controller sends COMMAND UPIU to UFS device"
In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.
Notes:
- For legacy mode this is only correct if the host submits one
command at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
auto-hibernation is enabled in the UFS controller. Hence, enable
zone write locking if auto-hibernation is enabled.
This patch improves performance as follows on my test setup:
- With the mq-deadline scheduler: 2.5x more IOPS for small writes.
- When not using an I/O scheduler compared to using mq-deadline with
zone locking: 4x more IOPS for small writes.
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ufs/core/ufshcd.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 732509289165..e78954cda3ae 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4421,6 +4421,20 @@ static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
return -EPERM;
}
}
+ shost_for_each_device(sdev, hba->host)
+ blk_freeze_queue_start(sdev->request_queue);
+ shost_for_each_device(sdev, hba->host) {
+ struct request_queue *q = sdev->request_queue;
+
+ blk_mq_freeze_queue_wait(q);
+ q->limits.driver_preserves_write_order = preserves_write_order;
+ blk_queue_required_elevator_features(q,
+ !preserves_write_order && blk_queue_is_zoned(q) ?
+ ELEVATOR_F_ZBD_SEQ_WRITE : 0);
+ if (q->disk)
+ disk_set_zoned(q->disk, q->limits.zoned);
+ blk_mq_unfreeze_queue(q);
+ }
return 0;
}
@@ -4463,7 +4477,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
if (!is_mcq_enabled(hba) && !prev_state && new_state) {
/*
- * Auto-hibernation will be enabled for legacy UFSHCI mode.
+ * Auto-hibernation will be enabled for legacy UFSHCI mode. Tell
+ * the block layer that write requests may be reordered.
*/
ret = ufshcd_update_preserves_write_order(hba, false);
if (ret)
@@ -4479,7 +4494,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
}
if (!is_mcq_enabled(hba) && prev_state && !new_state) {
/*
- * Auto-hibernation has been disabled.
+ * Auto-hibernation has been disabled. Tell the block layer that
+ * the order of write requests is preserved.
*/
ret = ufshcd_update_preserves_write_order(hba, true);
WARN_ON_ONCE(ret);
@@ -5247,6 +5263,10 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
struct ufs_hba *hba = shost_priv(sdev->host);
struct request_queue *q = sdev->request_queue;
+ q->limits.driver_preserves_write_order =
+ !ufshcd_is_auto_hibern8_supported(hba) ||
+ FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;
+
blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
/*
@@ -9026,6 +9046,7 @@ static const struct scsi_host_template ufshcd_driver_template = {
.max_host_blocked = 1,
.track_queue_depth = 1,
.skip_settle_delay = 1,
+ .needs_prepare_resubmit = 1,
.sdev_groups = ufshcd_driver_groups,
.rpm_autosuspend_delay = RPM_AUTOSUSPEND_DELAY_MS,
};
* Re: [PATCH v15 19/19] scsi: ufs: Inform the block layer about write ordering
2023-11-14 21:16 ` [PATCH v15 19/19] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
@ 2023-11-28 1:45 ` Can Guo
2023-11-28 21:49 ` Bart Van Assche
0 siblings, 1 reply; 32+ messages in thread
From: Can Guo @ 2023-11-28 1:45 UTC (permalink / raw)
To: Bart Van Assche, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bao D . Nguyen, Avri Altman, James E.J. Bottomley, Stanley Chu,
Manivannan Sadhasivam, Asutosh Das, Peter Wang, Bean Huo,
Arthur Simchaev
Hi Bart,
On 11/15/2023 5:16 AM, Bart Van Assche wrote:
> From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
> "The host controller always process transfer requests in-order according
> to the order submitted to the list. In case of multiple commands with
> single doorbell register ringing (batch mode), The dispatch order for
> these transfer requests by host controller will base on their index in
> the List. A transfer request with lower index value will be executed
> before a transfer request with higher index value."
>
> From the UFSHCI 4.0 specification, about the MCQ mode:
> "Command Submission
> 1. Host SW writes an Entry to SQ
> 2. Host SW updates SQ doorbell tail pointer
>
> Command Processing
> 3. After fetching the Entry, Host Controller updates SQ doorbell head
> pointer
> 4. Host controller sends COMMAND UPIU to UFS device"
>
> In other words, for both legacy and MCQ mode, UFS controllers are
> required to forward commands to the UFS device in the order these
> commands have been received from the host.
>
> Notes:
> - For legacy mode this is only correct if the host submits one
> command at a time. The UFS driver does this.
> - Also in legacy mode, the command order is not preserved if
> auto-hibernation is enabled in the UFS controller. Hence, enable
> zone write locking if auto-hibernation is enabled.
>
> This patch improves performance as follows on my test setup:
> - With the mq-deadline scheduler: 2.5x more IOPS for small writes.
> - When not using an I/O scheduler compared to using mq-deadline with
> zone locking: 4x more IOPS for small writes.
>
> Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
> Reviewed-by: Can Guo <quic_cang@quicinc.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/ufs/core/ufshcd.c | 25 +++++++++++++++++++++++--
> 1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 732509289165..e78954cda3ae 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -4421,6 +4421,20 @@ static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
> return -EPERM;
> }
> }
> + shost_for_each_device(sdev, hba->host)
> + blk_freeze_queue_start(sdev->request_queue);
> + shost_for_each_device(sdev, hba->host) {
> + struct request_queue *q = sdev->request_queue;
> +
> + blk_mq_freeze_queue_wait(q);
> + q->limits.driver_preserves_write_order = preserves_write_order;
> + blk_queue_required_elevator_features(q,
> + !preserves_write_order && blk_queue_is_zoned(q) ?
> + ELEVATOR_F_ZBD_SEQ_WRITE : 0);
> + if (q->disk)
> + disk_set_zoned(q->disk, q->limits.zoned);
> + blk_mq_unfreeze_queue(q);
> + }
>
> return 0;
> }
> @@ -4463,7 +4477,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>
> if (!is_mcq_enabled(hba) && !prev_state && new_state) {
> /*
> - * Auto-hibernation will be enabled for legacy UFSHCI mode.
> + * Auto-hibernation will be enabled for legacy UFSHCI mode. Tell
> + * the block layer that write requests may be reordered.
> */
> ret = ufshcd_update_preserves_write_order(hba, false);
> if (ret)
> @@ -4479,7 +4494,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
> }
> if (!is_mcq_enabled(hba) && prev_state && !new_state) {
> /*
> - * Auto-hibernation has been disabled.
> + * Auto-hibernation has been disabled. Tell the block layer that
> + * the order of write requests is preserved.
> */
> ret = ufshcd_update_preserves_write_order(hba, true);
> WARN_ON_ONCE(ret);
> @@ -5247,6 +5263,10 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
> struct ufs_hba *hba = shost_priv(sdev->host);
> struct request_queue *q = sdev->request_queue;
>
> + q->limits.driver_preserves_write_order =
> + !ufshcd_is_auto_hibern8_supported(hba) ||
> + FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;
> +
I got some time to test these changes on an SM8650 with MCQ enabled. I
found that with these changes in place (and with AH8 disabled), even
though we can make sure the UFS driver does not re-order requests in
MCQ mode, reordering still happens while running FIO, as can be seen
from the ftrace logs. I think it is related to the logic below in
blk-mq-sched.c; please correct me if I am wrong.
static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
...
} else if (multi_hctxs) {
/*
* Requests from different hctx may be dequeued from some
* schedulers, such as bfq and deadline.
*
* Sort the requests in the list according to their hctx,
* dispatch batching requests from same hctx at a time.
*/
list_sort(NULL, &rq_list, sched_rq_cmp);
...
}
Thanks,
Can Guo.
> blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
>
> /*
> @@ -9026,6 +9046,7 @@ static const struct scsi_host_template ufshcd_driver_template = {
> .max_host_blocked = 1,
> .track_queue_depth = 1,
> .skip_settle_delay = 1,
> + .needs_prepare_resubmit = 1,
> .sdev_groups = ufshcd_driver_groups,
> .rpm_autosuspend_delay = RPM_AUTOSUSPEND_DELAY_MS,
> };
^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v15 19/19] scsi: ufs: Inform the block layer about write ordering
2023-11-28 1:45 ` Can Guo
@ 2023-11-28 21:49 ` Bart Van Assche
0 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-28 21:49 UTC (permalink / raw)
To: Can Guo, Martin K . Petersen
Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig,
Bao D . Nguyen, Avri Altman, James E.J. Bottomley, Stanley Chu,
Manivannan Sadhasivam, Asutosh Das, Peter Wang, Bean Huo,
Arthur Simchaev
On 11/27/23 17:45, Can Guo wrote:
> I got some time to test these changes on an SM8650 with MCQ enabled. I
> found that with these changes in place (and with AH8 disabled), even
> though we can make sure the UFS driver does not re-order requests in
> MCQ mode, reordering still happens while running FIO, as can be seen
> from the ftrace logs.
Hi Can,
Thank you for taking the time to run this test and for sharing your
findings. I have not yet had the chance to test this patch series
myself on an MCQ setup. I will try to locate such a setup and rerun
the tests on it.
Thanks,
Bart.
* Re: [PATCH v15 00/19] Improve write performance for zoned UFS devices
2023-11-14 21:16 [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
` (18 preceding siblings ...)
2023-11-14 21:16 ` [PATCH v15 19/19] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
@ 2023-11-27 7:09 ` Christoph Hellwig
2023-11-27 19:35 ` [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-11-27 7:09 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K . Petersen, linux-scsi, linux-block, Jens Axboe,
Christoph Hellwig
As this keeps getting reposted:
I still think it is a very bad idea to add this amount of complexity to
the SCSI code, for a model that can't work for the general case and
diverges from the established NVMe model.
So I do not think we should support this.
* Re: [PATCH v15 00/19] Improve write performance for zoned UFS devices
2023-11-27 7:09 ` [PATCH v15 00/19] Improve write performance for zoned UFS devices Christoph Hellwig
@ 2023-11-27 19:35 ` Bart Van Assche
2023-11-28 12:53 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Bart Van Assche @ 2023-11-27 19:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K . Petersen, linux-scsi, linux-block, Jens Axboe
On 11/26/23 23:09, Christoph Hellwig wrote:
> I still think it is a very bad idea to add this amount of complexity to
> the SCSI code, for a model that can't work for the general case and
> diverges from the established NVMe model.
Hi Christoph,
Here is some additional background information:
* UFS vendors prefer the SCSI command set because they combine it with the
M-PHY transport layer. This combination is more power efficient than NVMe
over PCIe. According to the information I have available power consumption
in the M-PHY hibernation state is lower than in the PCIe L2 state. I have
not yet heard about any attempts to combine the NVMe command set with the
M-PHY transport layer. Even if this would be possible, it would fragment
the mobile storage market. This would increase the price of mobile storage
devices which is undesirable.
* I think that the "established NVMe model" in your email refers to the NVMe
zone append command. As you know there is no zone append in the SCSI ZBC
standard.
* Using the software implementation of REQ_OP_ZONE_APPEND in drivers/scsi/sd_zbc.c
is not an option. REQ_OP_ZONE_APPEND commands are serialized by that
implementation. This serialization is unavoidable because a SCSI device
may respond with a unit attention condition to any SCSI command. Hence,
even if REQ_OP_ZONE_APPEND commands are submitted in order, these may be
executed out-of-order. We do not want any serialization of SCSI commands
because this has a significant negative performance impact on IOPS for UFS
devices. The latest UFS devices support more than 300 K IOPS.
* Serialization in the I/O scheduler of zoned writes also reduces IOPS more
than what is acceptable.
Hence the approach of this patch series to support pipelining of zoned writes
even if no I/O scheduler has been configured.
I think the amount of complexity introduced by this patch series in the SCSI
core is reasonable. No new states are introduced in the SCSI core. A single
call to a function that reorders pending SCSI commands is introduced in the
SCSI error handler (scsi_call_prepare_resubmit()).
Thanks,
Bart.
* Re: [PATCH v15 00/19] Improve write performance for zoned UFS devices
2023-11-27 19:35 ` [PATCH v15 00/19] Improve write performance for zoned UFS devices Bart Van Assche
@ 2023-11-28 12:53 ` Christoph Hellwig
2023-11-28 17:36 ` Bart Van Assche
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-11-28 12:53 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Martin K . Petersen, linux-scsi, linux-block,
Jens Axboe
On Mon, Nov 27, 2023 at 11:35:48AM -0800, Bart Van Assche wrote:
> Here is some additional background information:
I know the background. I also know that JEDEC did all this against
better judgement and knowing the situation. We should not give them
their carrot after they haven't even been interested in engaging.
* Re: [PATCH v15 00/19] Improve write performance for zoned UFS devices
2023-11-28 12:53 ` Christoph Hellwig
@ 2023-11-28 17:36 ` Bart Van Assche
0 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-28 17:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K . Petersen, linux-scsi, linux-block, Jens Axboe
On 11/28/23 04:53, Christoph Hellwig wrote:
> I know the background. I also know that JEDEC did all this against
> better judgement and knowing the situation. We should not give them
> their carrot after they haven't even been interested in engaging.
That statement is overly negative. The JEDEC Zoned Storage for UFS
standard was published last week [1]. It can be downloaded by
anyone for free after having created a JEDEC account, which is also
free. As one can see in this standard, nothing excludes using a zone
append command. Once T10 standardizes a zone append command, it can
be implemented by UFS vendors. However, I do not know whether T10
plans to standardize a zone append command.
Bart.
[1] https://www.jedec.org/system/files/docs/JESD220-5.pdf