From: Bart Van Assche <bvanassche@acm.org>
To: Damien Le Moal <dlemoal@kernel.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v25 07/20] block/mq-deadline: Enable zoned write pipelining
Date: Wed, 22 Oct 2025 11:26:34 -0700 [thread overview]
Message-ID: <03dd02cd-6a08-42ce-9b06-f9968038faee@acm.org> (raw)
In-Reply-To: <d667eb93-6ced-4b36-963c-e6906413aee9@kernel.org>
On 10/21/25 2:01 PM, Damien Le Moal wrote:
> On 10/21/25 03:28, Bart Van Assche wrote:
>> On 10/17/25 10:31 PM, Damien Le Moal wrote:
>>> Maybe we need to rethink this, restarting from your main use case and why
>>> performance is not good. I think that said main use case is f2fs. So what
>>> happens with write throughput with it ? Why doesn't merging of small writes in
>>> the zone write plugs improve performance ? Wouldn't small modifications to f2fs
>>> zone write path improve things ?
>>
>> F2FS typically generates large writes if the I/O bandwidth is high (100
>> MiB or more). Write pipelining improves write performance even for large
>> writes but not by a spectacular percentage. Write pipelining only
>> results in a drastic performance improvement if the write size is kept
>> small (e.g. 4 KiB).
>
> But you are talking about a high queue depth 4K write pattern, right ? And if yes,
> BIO merging in the zone write plugs should generate much larger commands anyway.
> Have you verified that this is working as expected ?
Write pipelining improves performance even when bio merging is enabled,
because with pipelining the Linux kernel does not wait for a prior write
to complete before sending the next write to the storage device.
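The latency-hiding effect described above can be sketched with a toy model (this is an illustration of the queuing argument, not the kernel implementation; the write count, latency, and queue depths are made-up numbers):

```python
import math

def total_time_us(num_writes, latency_us, queue_depth):
    # Idealized model: the device services up to `queue_depth` writes
    # concurrently, each taking `latency_us`. Without pipelining the
    # submitter waits for each write to complete before issuing the
    # next one, which is the queue_depth == 1 case.
    batches = math.ceil(num_writes / queue_depth)
    return batches * latency_us

serial = total_time_us(1024, 100, 1)      # pipelining disabled: qd=1
pipelined = total_time_us(1024, 100, 64)  # pipelining enabled: qd=64
print(serial, pipelined, serial / pipelined)
```

In this idealized model the speedup equals the queue depth; on real hardware other bottlenecks cap the gain well below that, which is consistent with the measured results further down in this message.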
>>> If the answers to all of the above is "no/does not work", what about a different
>>> approach: zone write plugging v2 with a single thread per CPU that does the
>>> pipelining without to force changes to other layers/change the API all over the
>>> block layer ?
>>
>> The block layer changes that I'm proposing are small, easy to maintain
>> and not invasive. Using a mutex when pipelining writes only as I
>> proposed in a previous email is a solution that will yield better
>> performance than delegating work to another thread. Obtaining an
>> uncontended mutex takes less than a microsecond. Delegating work to
>> another thread introduces a delay of 10 to 100 microseconds.
>>
>>> Unless you have a neat way to recreate the problem without Zoned UFS devices ?
>>
>> This patch series adds support in both the scsi_debug and null_blk
>> drivers for write pipelining. If the mq-deadline patches from this
>> series are reverted then the attached shell script sporadically reports
>> a write error on my test setup for the mq-deadline test cases.
>
> I am not trying to check the correctness of your patches. I was wondering if
> there is an easy way to recreate the performance difference you are seeing with
> zoned UFS device easily. E.g. the 4 K write case you are describing above.
Yes, there is an easy way to recreate the performance difference. The
shell script attached to my previous email tests multiple combinations
of I/O schedulers, queue depths and write pipelining enabled/disabled.
The script that I shared disables I/O merging. Even if I make the
following changes in that shell script:
--- a/test-pipelining-zoned-writes
+++ b/test-pipelining-zoned-writes
@@ -147,11 +147,11 @@ for mode in "none 0" "none 1" "mq-deadline 0" "mq-deadline 1"; do
         # Disable block layer request merging.
         dev="/dev/block/${basename}"
     fi
-    run_cmd "echo 4096 > /sys/class/block/${basename}/queue/max_sectors_kb"
+    #run_cmd "echo 4096 > /sys/class/block/${basename}/queue/max_sectors_kb"
     # 0: disable I/O statistics
     run_cmd "echo 0 > /sys/class/block/${basename}/queue/iostats"
     # 2: do not attempt any merges
-    run_cmd "echo 2 > /sys/class/block/${basename}/queue/nomerges"
+    #run_cmd "echo 2 > /sys/class/block/${basename}/queue/nomerges"
     # 2: complete on the requesting CPU
     run_cmd "echo 2 > /sys/class/block/${basename}/queue/rq_affinity"
     for iopattern in write randwrite; do
then I still see a significant performance improvement for the null_blk
driver (command-line option -n):
==== iosched=none preserves_write_order=0 iopattern=write qd=1
  write: IOPS=6503, BW=25.4MiB/s (26.6MB/s)(762MiB/30003msec); 762 zone resets
==== iosched=none preserves_write_order=0 iopattern=randwrite qd=1
  write: IOPS=6469, BW=25.3MiB/s (26.5MB/s)(758MiB/30003msec); 758 zone resets
==== iosched=none preserves_write_order=1 iopattern=write qd=1
  write: IOPS=5566, BW=21.7MiB/s (22.8MB/s)(652MiB/30010msec); 652 zone resets
==== iosched=none preserves_write_order=1 iopattern=write qd=64
  write: IOPS=15.3k, BW=59.9MiB/s (62.8MB/s)(1797MiB/30001msec); 1796 zone resets
==== iosched=none preserves_write_order=1 iopattern=randwrite qd=1
  write: IOPS=5575, BW=21.8MiB/s (22.8MB/s)(653MiB/30005msec); 653 zone resets
==== iosched=none preserves_write_order=1 iopattern=randwrite qd=64
  write: IOPS=15.5k, BW=60.7MiB/s (63.7MB/s)(1821MiB/30001msec); 1821 zone resets
As the results above show, with queue depth 64 and write pipelining
enabled, IOPS are about 2.4x higher than with queue depth 1 and write
pipelining disabled.
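The speedup can be checked against the fio numbers above (the values below are copied from the output; "15.5k" is read as 15500):

```python
# IOPS from the runs above: randwrite at qd=1 with pipelining disabled
# versus randwrite at qd=64 with pipelining enabled.
qd1_no_pipelining = 6469
qd64_pipelining = 15500

ratio = qd64_pipelining / qd1_no_pipelining
print(f"speedup: {ratio:.1f}x")  # → speedup: 2.4x
```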
Thanks,
Bart.