From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org
Subject: Re: [PATCH 0/2] New zoned loop block device driver
Date: Thu, 6 Feb 2025 11:24:33 +0800
Message-ID: <Z6QrceGGAJl_X_BM@fedora>
In-Reply-To: <f6d82d47-ff27-43e8-a772-0ab90a2f86c4@kernel.org>
On Wed, Feb 05, 2025 at 03:07:51PM +0900, Damien Le Moal wrote:
> On 2/5/25 12:43 PM, Ming Lei wrote:
> >>> Can you share how you create rublk/zoned and zloop and the underlying
> >>> device info? Especially queue depth and nr_queues(both rublk/zloop &
> >>> underlying disk) plays a big role.
> >>
> >> rublk:
> >>
> >> cargo run -r -- add zoned --size 524288 --zone-size 256 --conv-zones 0 \
> >> --logical-block-size 4096 --queue ${nrq} --depth 128 \
> >> --path /mnt/zloop/0
> >>
> >> zloop:
> >>
> >> echo "add conv_zones=0,capacity_mb=524288,zone_size_mb=256,\
> >> base_dir=/mnt/zloop,nr_queues=${nrq},queue_depth=128" > /dev/zloop-control
> >
> > zone is actually stateful, maybe it is better to use standalone backing
> > directory/files.
>
> I do not understand what you are saying... I reformat the backing FS and
> recreate the same /mnt/zloop/0 directory for every test, to be sure I am not
> seeing an artifact from the FS.
I meant that the same backing files are shared by the two devices.
But I guess it may not be a big deal.
>
> >> The backing storage is using XFS on a PCIe Gen4 4TB M.2 SSD (my Xeon machine is
> >> PCIe Gen3 though). This drive has a large enough max_qid to provide one IO queue
> >> pair per CPU for up to 32 CPUs (16-cores / 32-threads).
> >
> > I just setup one XFS over nvme in real hardware, still can't reproduce the big gap in
> > your test result. Kernel is v6.13 with zloop patch v2.
> >
> > `8 queues` should only make a difference for the test of "QD=32, 4K rnd wr, 8 jobs".
> > For other single job test, single queue supposes to be same with 8 queues.
> >
> > The big gap is mainly in test of 'QD=32, 128K seq wr, 1 job ', maybe your local
> > change improves zloop's merge? In my test:
> >
> > - ublk/zoned : 912 MiB/s
> > - zloop(v2) : 960 MiB/s.
> >
> > BTW, my test is over btrfs, and follows the test script:
> >
> > fio --size=32G --time_based --bsrange=128K-128K --runtime=40 --numjobs=1 \
> > --ioengine=libaio --iodepth=32 --directory=./ublk --group_reporting=1 --direct=1 \
> > --fsync=0 --name=f1 --stonewall --rw=write
>
> If you add an FS on top of the emulated zoned device, you are testing the FS
> perf as much as the backing dev. I focused on the backing dev so I ran fio
> directly on top of the emulated drive. E.g.:
>
> fio --name=test --filename=${dev} --rw=randwrite \
> --ioengine=libaio --iodepth=32 --direct=1 --bs=4096 \
> --zonemode=zbd --numjobs=8 --group_reporting --norandommap \
> --cpus_allowed=0-7 --cpus_allowed_policy=split \
> --runtime=${runtime} --ramp_time=5 --time_based
>
> (you must use libaio here)
Thanks for sharing '--zonemode=zbd'.

I can reproduce the perf issue with the above script. The cause is related
to io_uring emulation and zone space pre-allocation.

When a FS WRITE IO needs to allocate space, .write_iter() returns -EAGAIN
for each io_uring write, so the write always falls back to io-wq, which
causes very bad sequential write perf.

It can be fixed[1] simply by pre-allocating space before writing to the
beginning of each seq-zone.
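
To illustrate the idea only (this is not the code from [1]), here is a minimal Python sketch. It assumes one backing file per sequential zone and a hypothetical helper name, write_to_zone(). The whole zone is reserved when the first write lands at the zone start, so later writes to that zone do not have to allocate blocks:

```python
import os

# 256 MiB, matching the zone size used in the commands quoted above.
ZONE_SIZE = 256 * 1024 * 1024


def write_to_zone(fd: int, zone_offset: int, data: bytes,
                  zone_size: int = ZONE_SIZE) -> int:
    """Write data at zone_offset inside a per-zone backing file.

    Assumption: fd refers to the backing file of a single zone, so
    zone_offset == 0 means "writing to the beginning of the zone".
    """
    if zone_offset == 0:
        # Reserve blocks for the whole zone up front, so that the
        # following writes do not need block allocation in the FS.
        os.posix_fallocate(fd, 0, zone_size)
    # Positioned write; returns the number of bytes written.
    return os.pwrite(fd, data, zone_offset)
```

The trade-off in this sketch is that space for a zone is consumed as soon as the zone is first written, not as data arrives.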
Here are the results of my test[2] over real nvme/XFS:
+ ./zfio /dev/zloop0 write 1 40
write /dev/zloop0: jobs 1 io_depth 32 time 40sec
BS 4k: IOPS 171383 BW 685535KiB/s fio_cpu_util(25% 38%)
BS 128k: IOPS 7669 BW 981846KiB/s fio_cpu_util( 5% 11%)
+ ./zfio /dev/ublkb0 write 1 40
write /dev/ublkb0: jobs 1 io_depth 32 time 40sec
BS 4k: IOPS 179861 BW 719448KiB/s fio_cpu_util(29% 42%)
BS 128k: IOPS 7239 BW 926786KiB/s fio_cpu_util( 6% 9%)
+ ./zfio /dev/zloop0 randwrite 1 40
randwrite /dev/zloop0: jobs 1 io_depth 32 time 40sec
BS 4k: IOPS 8909 BW 35642KiB/s fio_cpu_util( 2% 5%)
BS 128k: IOPS 210 BW 27035KiB/s fio_cpu_util( 0% 0%)
+ ./zfio /dev/ublkb0 randwrite 1 40
randwrite /dev/ublkb0: jobs 1 io_depth 32 time 40sec
BS 4k: IOPS 20500 BW 82001KiB/s fio_cpu_util( 5% 12%)
BS 128k: IOPS 5622 BW 719792KiB/s fio_cpu_util( 6% 8%)
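
For easier comparison, the bandwidth numbers above can be reduced to ublk/zloop ratios. The short script below only re-uses the BW figures from this output; the dictionary layout and the function name are just for illustration. With these numbers the sequential write results are within about 6% of each other, while for randwrite ublk is roughly 2.3x (4k) and 26.6x (128k) of zloop:

```python
# Bandwidth in KiB/s, copied from the zfio output above.
results = {
    ("write", "4k"): {"zloop": 685535, "ublk": 719448},
    ("write", "128k"): {"zloop": 981846, "ublk": 926786},
    ("randwrite", "4k"): {"zloop": 35642, "ublk": 82001},
    ("randwrite", "128k"): {"zloop": 27035, "ublk": 719792},
}


def ublk_over_zloop(bw: dict) -> float:
    """Return ublk bandwidth divided by zloop bandwidth."""
    return bw["ublk"] / bw["zloop"]


for (rw, bs), bw in results.items():
    print(f"{rw:9s} BS {bs:>4s}: ublk/zloop = {ublk_over_zloop(bw):.2f}")
```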
[1] https://github.com/ublk-org/rublk/commit/fd01a87abb2f9b8e94c8da24e73683e4bb12659b
[2] `zfio` (zone fio test script) https://github.com/ublk-org/rublk/blob/main/scripts/zfio
Thanks,
Ming