linux-scsi.vger.kernel.org archive mirror
From: Bart Van Assche <bvanassche@acm.org>
To: Damien Le Moal <dlemoal@kernel.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v25 07/20] block/mq-deadline: Enable zoned write pipelining
Date: Wed, 22 Oct 2025 11:26:34 -0700	[thread overview]
Message-ID: <03dd02cd-6a08-42ce-9b06-f9968038faee@acm.org> (raw)
In-Reply-To: <d667eb93-6ced-4b36-963c-e6906413aee9@kernel.org>

On 10/21/25 2:01 PM, Damien Le Moal wrote:
> On 10/21/25 03:28, Bart Van Assche wrote:
>> On 10/17/25 10:31 PM, Damien Le Moal wrote:
>>> Maybe we need to rethink this, restarting from your main use case and why
>>> performance is not good. I think that said main use case is f2fs. So what
>>> happens with write throughput with it? Why doesn't merging of small writes in
>>> the zone write plugs improve performance? Wouldn't small modifications to the
>>> f2fs zone write path improve things?
>>
>> F2FS typically generates large writes if the I/O bandwidth is high (100
>> MiB or more). Write pipelining improves write performance even for large
>> writes but not by a spectacular percentage. Write pipelining only
>> results in a drastic performance improvement if the write size is kept
>> small (e.g. 4 KiB).
> 
> But you are talking about a high queue depth 4K write pattern, right? And if yes,
> BIO merging in the zone write plugs should generate much larger commands anyway.
> Have you verified that this is working as expected?

Write pipelining improves performance even if bio merging is enabled,
because with pipelining the Linux kernel doesn't wait for a prior write
to complete before sending the next write to the storage device.
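As a concrete illustration, below is a minimal sketch of how the qd=1 vs
qd=64 comparison could be reproduced on a zoned null_blk device. This is
only a sketch: the preserves_write_order configfs attribute is the one
introduced by patch 15/20 of this series, and the device path and fio
options are assumptions about a typical setup, not a verified recipe.

```shell
# Sketch only (requires root and a kernel with this patch series applied):
# create a zoned null_blk device that preserves write order.
modprobe null_blk nr_devices=0
mkdir -p /sys/kernel/config/nullb/nullb0
echo 1 > /sys/kernel/config/nullb/nullb0/zoned
# Attribute added by patch 15/20 of this series:
echo 1 > /sys/kernel/config/nullb/nullb0/preserves_write_order
echo 1 > /sys/kernel/config/nullb/nullb0/power

# 4 KiB sequential zoned writes; rerun with --iodepth=1 to compare
# against the pipelined --iodepth=64 case.
fio --name=zwp --filename=/dev/nullb0 --direct=1 --ioengine=io_uring \
    --zonemode=zbd --rw=write --bs=4k --iodepth=64 --runtime=30 --time_based
```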

>>> If the answers to all of the above are "no/does not work", what about a different
>>> approach: zone write plugging v2 with a single thread per CPU that does the
>>> pipelining without forcing changes to other layers or changing the API all over
>>> the block layer?
>>
>> The block layer changes that I'm proposing are small, easy to maintain
>> and not invasive. Using a mutex only when pipelining writes, as I
>> proposed in a previous email, is a solution that will yield better
>> performance than delegating work to another thread: obtaining an
>> uncontended mutex takes less than a microsecond, whereas delegating
>> work to another thread introduces a delay of 10 to 100 microseconds.
>>
>>> Unless you have a neat way to recreate the problem without Zoned UFS devices ?
>>
>> This patch series adds support in both the scsi_debug and null_blk
>> drivers for write pipelining. If the mq-deadline patches from this
>> series are reverted then the attached shell script sporadically reports
>> a write error on my test setup for the mq-deadline test cases.
> 
> I am not trying to check the correctness of your patches. I was wondering if
> there is an easy way to recreate the performance difference you are seeing
> with a zoned UFS device, e.g. the 4 KiB write case you are describing above.

Yes, there is an easy way to recreate the performance difference. The
shell script attached to my previous email tests multiple combinations
of I/O schedulers, queue depths and write pipelining enabled/disabled.
The script that I shared disables I/O merging. Even if I make the
following changes in that shell script:

--- a/test-pipelining-zoned-writes
+++ b/test-pipelining-zoned-writes
@@ -147,11 +147,11 @@ for mode in "none 0" "none 1" "mq-deadline 0" "mq-deadline 1"; do
         # Disable block layer request merging.
         dev="/dev/block/${basename}"
     fi
-    run_cmd "echo 4096 > /sys/class/block/${basename}/queue/max_sectors_kb"
+    #run_cmd "echo 4096 > /sys/class/block/${basename}/queue/max_sectors_kb"
     # 0: disable I/O statistics
     run_cmd "echo 0 > /sys/class/block/${basename}/queue/iostats"
     # 2: do not attempt any merges
-    run_cmd "echo 2 > /sys/class/block/${basename}/queue/nomerges"
+    #run_cmd "echo 2 > /sys/class/block/${basename}/queue/nomerges"
     # 2: complete on the requesting CPU
     run_cmd "echo 2 > /sys/class/block/${basename}/queue/rq_affinity"
     for iopattern in write randwrite; do

then I still see a significant performance improvement for the null_blk
driver (command-line option -n):

==== iosched=none preserves_write_order=0 iopattern=write qd=1
   write: IOPS=6503, BW=25.4MiB/s (26.6MB/s)(762MiB/30003msec); 762 zone resets
==== iosched=none preserves_write_order=0 iopattern=randwrite qd=1
   write: IOPS=6469, BW=25.3MiB/s (26.5MB/s)(758MiB/30003msec); 758 zone resets
==== iosched=none preserves_write_order=1 iopattern=write qd=1
   write: IOPS=5566, BW=21.7MiB/s (22.8MB/s)(652MiB/30010msec); 652 zone resets
==== iosched=none preserves_write_order=1 iopattern=write qd=64
   write: IOPS=15.3k, BW=59.9MiB/s (62.8MB/s)(1797MiB/30001msec); 1796 zone resets
==== iosched=none preserves_write_order=1 iopattern=randwrite qd=1
   write: IOPS=5575, BW=21.8MiB/s (22.8MB/s)(653MiB/30005msec); 653 zone resets
==== iosched=none preserves_write_order=1 iopattern=randwrite qd=64
   write: IOPS=15.5k, BW=60.7MiB/s (63.7MB/s)(1821MiB/30001msec); 1821 zone resets

As one can see above, at queue depth 64 with write pipelining enabled,
IOPS are roughly 2.4x higher than at queue depth 1 with write pipelining
disabled.
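For reference, the speedup can be computed directly from the IOPS
numbers in the fio output above (qd=64 with pipelining vs qd=1 without):

```shell
# Speedup of qd=64 pipelined writes over qd=1 non-pipelined writes,
# using the IOPS numbers from the fio runs above.
awk 'BEGIN { printf "sequential: %.2fx\n", 15300 / 6503 }'   # 2.35x
awk 'BEGIN { printf "random:     %.2fx\n", 15500 / 6469 }'   # 2.40x
```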

Thanks,

Bart.

Thread overview: 33+ messages
2025-10-14 21:54 [PATCH v25 00/20] Improve write performance for zoned UFS devices Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 01/20] block: Support block devices that preserve the order of write requests Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 02/20] blk-mq: Always insert sequential zoned writes into a software queue Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 03/20] blk-mq: Restore the zone write order when requeuing Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 04/20] blk-mq: Move the blk_queue_sq_sched() calls Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 05/20] blk-mq: Run all hwqs for sq scheds if write pipelining is enabled Bart Van Assche
2025-10-15  7:25   ` Damien Le Moal
2025-10-15 16:35     ` Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 06/20] block/mq-deadline: Make locking IRQ-safe Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 07/20] block/mq-deadline: Enable zoned write pipelining Bart Van Assche
2025-10-15  7:31   ` Damien Le Moal
2025-10-15 16:32     ` Bart Van Assche
2025-10-16 20:50     ` Bart Van Assche
2025-10-18  5:31       ` Damien Le Moal
2025-10-20 18:28         ` Bart Van Assche
2025-10-21 21:01           ` Damien Le Moal
2025-10-22 18:26             ` Bart Van Assche [this message]
2025-10-22  7:07           ` Christoph Hellwig
2025-10-14 21:54 ` [PATCH v25 08/20] blk-zoned: Fix a typo in a source code comment Bart Van Assche
2025-10-15  7:32   ` Damien Le Moal
2025-10-15 16:33     ` Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 09/20] blk-zoned: Add an argument to blk_zone_plug_bio() Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 10/20] blk-zoned: Split an if-statement Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 11/20] blk-zoned: Move code from disk_zone_wplug_add_bio() into its caller Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 12/20] blk-zoned: Introduce a loop in blk_zone_wplug_bio_work() Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 13/20] blk-zoned: Document disk_zone_wplug_schedule_bio_work() locking Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 14/20] blk-zoned: Support pipelining of zoned writes Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 15/20] null_blk: Add the preserves_write_order attribute Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 16/20] scsi: core: Retry unaligned zoned writes Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 17/20] scsi: sd: Increase retry count for " Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 18/20] scsi: scsi_debug: Add the preserves_write_order module parameter Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 19/20] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
2025-10-14 21:54 ` [PATCH v25 20/20] ufs: core: Inform the block layer about write ordering Bart Van Assche
