linux-block.vger.kernel.org archive mirror
* [PATCH v24 00/18] Improve write performance for zoned UFS devices
@ 2025-08-27 21:29 Bart Van Assche
From: Bart Van Assche @ 2025-08-27 21:29 UTC
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Damien Le Moal,
	Bart Van Assche

Hi Jens,

This patch series improves small write IOPS by a factor of two for zoned UFS
devices on my test setup. The changes included in this patch series are as
follows:
 - A new request queue limits flag is introduced that allows block drivers to
   declare whether or not the request order is preserved per hardware queue.
 - The order of zoned writes is preserved in the block layer by submitting all
   zoned writes from the same CPU core as long as any zoned writes are pending.
 - A new member 'from_cpu' is introduced in the per-zone data structure
   'blk_zone_wplug' to track from which CPU to submit zoned writes. This data
   member is reset to -1 after all pending zoned writes for a zone have
   completed.
 - The retry count for zoned writes is increased in the SCSI core to deal with
   reordering caused by unit attention conditions or the SCSI error handler.
 - New functionality is added in the null_blk and scsi_debug drivers to make it
   easier to test the changes introduced by this patch series.
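
The 'from_cpu' bookkeeping described in the list above can be modeled in user
space roughly as follows. This is only a sketch under stated assumptions, not
the code from the patches: the struct zone_wplug_model type, its 'pending'
counter and the three helper names are hypothetical. Only the member name
'from_cpu' and the reset to -1 come from the description above.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the per-zone state. This is not the
 * kernel's struct blk_zone_wplug.
 */
struct zone_wplug_model {
	int from_cpu;         /* CPU that submits zoned writes, -1 if none */
	unsigned int pending; /* assumed count of zoned writes in flight */
};

static void zone_wplug_model_init(struct zone_wplug_model *z)
{
	z->from_cpu = -1;
	z->pending = 0;
}

/*
 * Account for a new zoned write issued on 'this_cpu' and return the CPU
 * from which it has to be submitted. The first write for an idle zone
 * pins the zone to the issuing CPU; later writes reuse that CPU.
 */
static int zone_wplug_model_submit(struct zone_wplug_model *z, int this_cpu)
{
	if (z->from_cpu < 0)
		z->from_cpu = this_cpu;
	z->pending++;
	return z->from_cpu;
}

/* Account for a completion; unpin the zone once nothing is pending. */
static void zone_wplug_model_complete(struct zone_wplug_model *z)
{
	assert(z->pending > 0);
	if (--z->pending == 0)
		z->from_cpu = -1;
}
```

In this model a write issued on CPU 5 while a write from CPU 3 is still
pending is directed to CPU 3, and the zone accepts a new submitting CPU only
after both writes have completed.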

Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v23:
 - Removed the sysfs attribute for configuring write pipelining.
 - Split patch "Run all hwqs for sq scheds if write pipelining is enabled" into
   two patches to make it easier to review.
 - Added patch "blk-zoned: Document disk_zone_wplug_schedule_bio_work() locking".
 - Rebased on top of Jens' for-next branch.

Changes compared to v22:
 - Made write pipelining configurable via sysfs.
 - Fixed sporadic write errors observed with the mq-deadline I/O scheduler.

Changes compared to v21:
 - Added a patch that makes the block layer preserve the request order when
   inserting a request.
 - Restored a warning statement in block/blk-zoned.c.
 - Reworked the code that selects a CPU to queue zoned writes from such that no
   changes have to be undone if blk_zone_wplug_prepare_bio() fails.
 - Removed the "plug" label in block/blk-zoned.c and retained the
   "add_to_bio_list" label.
 - Changed scoped_guard() back into spin_lock_*() calls.
 - Fixed a recently introduced reference count leak in
   disk_zone_wplug_schedule_bio_work().
 - Restored the patch for the null_blk driver.

Changes compared to v20:
 - Converted a struct queue_limits member variable into a queue_limits feature
   flag.
 - Optimized performance of blk_mq_requeue_work().
 - Instead of splitting blk_zone_wplug_bio_work(), introduce a loop in that
   function.
 - Reworked patch "blk-zoned: Support pipelining of zoned writes".
 - Dropped the null_blk driver patch.
 - Improved several patch descriptions.

Changes compared to v19:
 - Dropped patch 2/11 "block: Support allocating from a specific software queue"
 - Implemented Damien's proposal to always add pipelined bios to the plug list
   and to submit all pipelined bios from the bio work for a zone.
 - Added three refactoring patches to make this patch series easier to review.

Changes compared to v18:
 - Dropped patch 2/12 "block: Rework request allocation in blk_mq_submit_bio()".
 - Improved patch descriptions.

Changes compared to v17:
 - Rebased the patch series on top of kernel v6.16-rc1.
 - Dropped support for UFSHCI 3.0 controllers because the UFSHCI 3.0 auto-
   hibernation mechanism causes request reordering. UFSHCI 4.0 controllers
   remain supported.
 - Removed the error handling and write pointer tracking mechanisms again
   from block/blk-zoned.c.
 - Dropped the dm-linear patch from this patch series since I'm not aware of
   any use cases for write pipelining and dm-linear.

Changes compared to v16:
 - Rebased the entire patch series on top of Jens' for-next branch. Compared
   to when v16 of this series was posted, the BLK_ZONE_WPLUG_NEED_WP_UPDATE
   flag has been introduced and support for REQ_NOWAIT has been fixed.
 - The behavior for SMR disks is preserved: if .driver_preserves_write_order
   has not been set, BLK_ZONE_WPLUG_NEED_WP_UPDATE is still set if a write
   error has been encountered. If .driver_preserves_write_order has been
   set, the write pointer is restored and the failed zoned writes are retried.
 - The superfluous "disk->zone_wplugs_hash_bits != 0" tests have been removed.

Changes compared to v15:
 - Reworked this patch series on top of the zone write plugging approach.
 - Moved support for requeuing requests from the SCSI core into the block
   layer core.
 - In the UFS driver, instead of disabling write pipelining if
   auto-hibernation is enabled, rely on the requeuing mechanism to handle
   reordering caused by resuming from auto-hibernation.

Changes compared to v14:
 - Removed the drivers/scsi/Kconfig.kunit and drivers/scsi/Makefile.kunit
   files. Instead, modified drivers/scsi/Kconfig and added #include "*_test.c"
   directives in the appropriate .c files. Removed the EXPORT_SYMBOL()
   directives that were added to make the unit tests link.
 - Fixed a double free in a unit test.

Changes compared to v13:
 - Reworked patch "block: Preserve the order of requeued zoned writes".
 - Addressed a performance concern by removing the eh_needs_prepare_resubmit
   SCSI driver callback and by introducing the SCSI host template flag
   .needs_prepare_resubmit instead.
 - Added a patch that adds a 'host' argument to scsi_eh_flush_done_q().
 - Made the code in unit tests less repetitive.

Changes compared to v12:
 - Added two new patches: "block: Preserve the order of requeued zoned writes"
   and "scsi: sd: Add a unit test for sd_cmp_sector()"
 - Restricted the number of zoned write retries. To my surprise I had to add
   "&& scmd->retries <= scmd->allowed" in the SCSI error handler to limit the
   number of retries.
 - In patch "scsi: ufs: Inform the block layer about write ordering", only set
   ELEVATOR_F_ZBD_SEQ_WRITE for zoned block devices.
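
The retry limit mentioned above ("scmd->retries <= scmd->allowed") can be
illustrated with a small user-space model. This is a sketch, not the SCSI
error handler code: struct scmd_model and scmd_model_retry_allowed() are
hypothetical names, and incrementing the counter inside the check is an
assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two retry-related command fields. */
struct scmd_model {
	int retries; /* number of retries counted so far */
	int allowed; /* maximum number of retries */
};

/*
 * Count one retry and report whether the command may be resubmitted.
 * Without the "retries <= allowed" bound, a write that keeps failing
 * would be resubmitted forever.
 */
static bool scmd_model_retry_allowed(struct scmd_model *scmd)
{
	assert(scmd->allowed >= 0);
	return ++scmd->retries <= scmd->allowed;
}
```

With allowed set to 2, the first two calls permit a retry and the third one
does not.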

Changes compared to v11:
 - Fixed a NULL pointer dereference that happened when booting from an ATA
   device by adding an scmd->device != NULL check in scsi_needs_preparation().
 - Updated Reviewed-by tags.

Changes compared to v10:
 - Dropped the UFS MediaTek and HiSilicon patches because these are not correct
   and because it is safe to drop these patches.
 - Updated Acked-by / Reviewed-by tags.

Changes compared to v9:
 - Introduced an additional scsi_driver callback: .eh_needs_prepare_resubmit().
 - Renamed the scsi_debug kernel module parameter 'no_zone_write_lock' into
   'preserves_write_order'.
 - Fixed an out-of-bounds access in the scsi_call_prepare_resubmit() unit
   test.
 - Wrapped ufshcd_auto_hibern8_update() calls in UFS host drivers with
   WARN_ON_ONCE() such that a kernel stack appears in case an error code is
   returned.
 - Elaborated a comment in the UFSHCI driver.

Changes compared to v8:
 - Fixed handling of 'driver_preserves_write_order' and 'use_zone_write_lock'
   in blk_stack_limits().
 - Added a comment in disk_set_zoned().
 - Modified blk_req_needs_zone_write_lock() such that it returns false if
   q->limits.use_zone_write_lock is false.
 - Modified disk_clear_zone_settings() such that it clears
   q->limits.use_zone_write_lock.
 - Left out one change from the mq-deadline patch that became superfluous due to
   the blk_req_needs_zone_write_lock() change.
 - Modified scsi_call_prepare_resubmit() such that it only calls list_sort() if
   zoned writes have to be resubmitted for which zone write locking is disabled.
 - Added an additional unit test for scsi_call_prepare_resubmit().
 - Modified the sorting code in the sd driver such that only those SCSI commands
   are sorted for which write locking is disabled.
 - Modified sd_zbc.c such that ELEVATOR_F_ZBD_SEQ_WRITE is only set if the
   write order is not preserved.
 - Included three patches for UFS host drivers that rework code that wrote
   directly to the auto-hibernation controller register.
 - Modified the UFS driver such that enabling auto-hibernation is not allowed
   if a zoned logical unit is present and if the controller operates in legacy
   mode.
 - Also in the UFS driver, simplified ufshcd_auto_hibern8_update().

Changes compared to v7:
 - Split the queue_limits member variable `use_zone_write_lock' into two member
   variables: `use_zone_write_lock' (set by disk_set_zoned()) and
   `driver_preserves_write_order' (set by the block driver or SCSI LLD). This
   should clear up the confusion about the purpose of this variable.
 - Moved the code for sorting SCSI commands by LBA from the SCSI error handler
   into the SCSI disk (sd) driver as requested by Christoph.
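
Sorting commands by LBA before resubmitting them, as mentioned above, can be
sketched in user space as shown below. The names struct cmd_model,
cmd_model_cmp_sector() and cmd_model_sort() are hypothetical, and qsort()
stands in for the kernel's list_sort(). Unlike list_sort(), qsort() is not
guaranteed to be stable, which does not matter here as long as the starting
sectors are distinct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a write command that has to be resubmitted. */
struct cmd_model {
	uint64_t sector; /* starting sector of the write */
	int tag;         /* identifies the command in this example */
};

/*
 * Compare two commands by starting sector. Comparing instead of
 * subtracting avoids truncating a 64-bit difference to int.
 */
static int cmd_model_cmp_sector(const void *pa, const void *pb)
{
	const struct cmd_model *a = pa;
	const struct cmd_model *b = pb;

	return (a->sector > b->sector) - (a->sector < b->sector);
}

/* Sort the commands such that they are resubmitted in ascending LBA order. */
static void cmd_model_sort(struct cmd_model *cmds, size_t n)
{
	assert(cmds != NULL || n == 0);
	if (n > 1)
		qsort(cmds, n, sizeof(*cmds), cmd_model_cmp_sector);
}
```

Writes that were reordered to sectors 24, 8 and 16 come out of
cmd_model_sort() in the order 8, 16, 24, which is the order a sequential
write required zone expects.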
   
Changes compared to v6:
 - Removed QUEUE_FLAG_NO_ZONE_WRITE_LOCK and instead introduced a flag in
   the request queue limits data structure.

Changes compared to v5:
 - Renamed scsi_cmp_lba() into scsi_cmp_sector().
 - Improved several source code comments.

Changes compared to v4:
 - Dropped the patch that introduces the REQ_NO_ZONE_WRITE_LOCK flag.
 - Dropped the null_blk patch and added two scsi_debug patches instead.
 - Dropped the f2fs patch.
 - Split the patch for the UFS driver into two patches.
 - Modified several patch descriptions and source code comments.
 - Renamed dd_use_write_locking() into dd_use_zone_write_locking().
 - Moved the list_sort() call from scsi_unjam_host() into scsi_eh_flush_done_q()
   such that sorting happens just before reinserting.
 - Removed the scsi_cmd_retry_allowed() call from scsi_check_sense() to make
   sure that the retry counter is adjusted once per retry instead of twice.

Changes compared to v3:
 - Restored the patch that introduces QUEUE_FLAG_NO_ZONE_WRITE_LOCK. That patch
   had accidentally been left out from v2.
 - In patch "block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK", improved the
   patch description and added the function blk_no_zone_write_lock().
 - In patch "block/mq-deadline: Only use zone locking if necessary", moved the
   blk_queue_is_zoned() call into dd_use_write_locking().
 - In patch "fs/f2fs: Disable zone write locking", set REQ_NO_ZONE_WRITE_LOCK
   from inside __bio_alloc() instead of in f2fs_submit_write_bio().

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligne

Bart Van Assche (18):
  block: Support block devices that preserve the order of write requests
  blk-mq: Always insert sequential zoned writes into a software queue
  blk-mq: Restore the zone write order when requeuing
  blk-mq: Move the blk_queue_sq_sched() calls
  blk-mq: Run all hwqs for sq scheds if write pipelining is enabled
  block/mq-deadline: Enable zoned write pipelining
  blk-zoned: Add an argument to blk_zone_plug_bio()
  blk-zoned: Split an if-statement
  blk-zoned: Move code from disk_zone_wplug_add_bio() into its caller
  blk-zoned: Introduce a loop in blk_zone_wplug_bio_work()
  blk-zoned: Document disk_zone_wplug_schedule_bio_work() locking
  blk-zoned: Support pipelining of zoned writes
  null_blk: Add the preserves_write_order attribute
  scsi: core: Retry unaligned zoned writes
  scsi: sd: Increase retry count for zoned writes
  scsi: scsi_debug: Add the preserves_write_order module parameter
  scsi: scsi_debug: Support injecting unaligned write errors
  ufs: core: Inform the block layer about write ordering

 block/bfq-iosched.c               |   2 +
 block/blk-mq.c                    |  88 +++++++++---
 block/blk-mq.h                    |   2 +
 block/blk-settings.c              |   2 +
 block/blk-zoned.c                 | 220 ++++++++++++++++++++----------
 block/elevator.h                  |   1 +
 block/kyber-iosched.c             |   2 +
 block/mq-deadline.c               | 107 +++++++++++----
 drivers/block/null_blk/main.c     |   4 +
 drivers/block/null_blk/null_blk.h |   1 +
 drivers/md/dm.c                   |   5 +-
 drivers/scsi/scsi_debug.c         |  22 ++-
 drivers/scsi/scsi_error.c         |  16 +++
 drivers/scsi/sd.c                 |   6 +
 drivers/ufs/core/ufshcd.c         |   7 +
 include/linux/blk-mq.h            |  13 +-
 include/linux/blkdev.h            |  18 ++-
 17 files changed, 392 insertions(+), 124 deletions(-)

