From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org
Subject: Re: [PATCH 0/2] New zoned loop block device driver
Date: Wed, 5 Feb 2025 11:43:26 +0800
Message-ID: <Z6LeXsYw_qq4hqoC@fedora>
In-Reply-To: <a63406f1-6a45-4d07-b998-504bd2d6d0d7@kernel.org>
On Tue, Feb 04, 2025 at 12:22:53PM +0900, Damien Le Moal wrote:
> On 1/31/25 12:54, Ming Lei wrote:
> > On Wed, Jan 29, 2025 at 05:10:32PM +0900, Damien Le Moal wrote:
> >> On 1/24/25 21:30, Ming Lei wrote:
> >>>> 1 queue:
> >>>> ========
> >>>>                              +-------------------+-------------------+
> >>>>                              | ublk (IOPS / BW)  | zloop (IOPS / BW) |
> >>>> +----------------------------+-------------------+-------------------+
> >>>> | QD=1, 4K rnd wr, 1 job     | 11.7k / 47.8 MB/s | 15.8k / 53.0 MB/s |
> >>>> | QD=32, 4K rnd wr, 8 jobs   | 63.4k / 260 MB/s  | 101k / 413 MB/s   |
> >>>
> >>> I can't reproduce the above two; I actually don't observe an obvious
> >>> difference between rublk/zoned and zloop in my test VM.
> >>
> >> I am using bare-metal machines for these tests as I do not want any
> >> noise from a VM/hypervisor in the numbers. And I did say that this is with a
> >> tweaked version of zloop that I have not posted yet (I was waiting for rc1 to
> >> repost as a rebase is needed to correct a compilation failure due to the nomerge
> >> tag set flag being removed). I am attaching the patch I used here (it applies
> >> on top of current Linus tree)
> >>
> >>> Maybe rublk works at debug mode, which reduces perf by half usually.
> >>> And you need to add device via 'cargo run -r -- add zoned' for using
> >>> release mode.
> >>
> >> Well, that is not an obvious thing for someone who does not know rust well. The
> >> README file of rublk also does not mention that. So no, I did not run it like
> >> this. I followed the README and call rublk directly. It would be great to
> >> document that.
> >
> > OK, that is fine, and now you can install rublk/zoned with 'cargo
> > install rublk' directly, which always build & install the binary of
> > release version.
> >
> >>
> >>> Actually there is just single io_uring_enter() running in each ublk queue
> >>> pthread, perf should be similar with kernel IO handling, and the main extra
> >>> load is from the single syscall kernel/user context switch and IO data copy,
> >>> and data copy effect can be neglected in small io size usually(< 64KB).
> >>>
> >>>> | QD=32, 128K rnd wr, 1 job | 5008 / 656 MB/s | 5993 / 786 MB/s |
> >>>> | QD=32, 128K seq wr, 1 job | 2636 / 346 MB/s | 5393 / 707 MB/s |
> >>>
> >>> ublk 128K BS may be a little slower since there is one extra copy.
> >>
> >> Here are newer numbers running rublk as you suggested (using cargo run -r).
> >> The backend storage is on an XFS file system using a PCI gen4 4TB M.2 SSD that
> >> is empty (the FS is empty on start). The emulated zoned disk has a capacity of
> >> 512GB with sequential zones only of 256 MB (that is, there are 2048
> >> zones/files). Each data point is from a 1min run of fio.
> >
> > Can you share how you create rublk/zoned and zloop and the underlying
> > device info? Especially queue depth and nr_queues(both rublk/zloop &
> > underlying disk) plays a big role.
>
> rublk:
>
> cargo run -r -- add zoned --size 524288 --zone-size 256 --conv-zones 0 \
> --logical-block-size 4096 --queue ${nrq} --depth 128 \
> --path /mnt/zloop/0
>
> zloop:
>
> echo "add conv_zones=0,capacity_mb=524288,zone_size_mb=256,\
> base_dir=/mnt/zloop,nr_queues=${nrq},queue_depth=128" > /dev/zloop-control

Zones are actually stateful, so it may be better to use separate backing
directories/files for rublk and zloop.

>
> The backing storage is using XFS on a PCIe Gen4 4TB M.2 SSD (my Xeon machine is
> PCIe Gen3 though). This drive has a large enough max_qid to provide one IO queue
> pair per CPU for up to 32 CPUs (16-cores / 32-threads).

I just set up XFS on an NVMe drive on real hardware, and I still can't
reproduce the big gap in your test results. The kernel is v6.13 with the
zloop v2 patch.

`8 queues` should only make a difference for the "QD=32, 4K rnd wr, 8 jobs"
test. For the other single-job tests, a single queue is supposed to perform
the same as 8 queues.

The big gap is mainly in the 'QD=32, 128K seq wr, 1 job' test; maybe your
local change improves zloop's merging? In my test:

- ublk/zoned : 912 MiB/s
- zloop(v2)  : 960 MiB/s

BTW, my test is over btrfs, and uses the following test script:

fio --size=32G --time_based --bsrange=128K-128K --runtime=40 --numjobs=1 \
    --ioengine=libaio --iodepth=32 --directory=./ublk --group_reporting=1 --direct=1 \
    --fsync=0 --name=f1 --stonewall --rw=write
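
To line these MiB/s numbers up against the IOPS / MB/s columns in the quoted
tables, the conversion is plain arithmetic. A rough sketch in shell follows.
The helper names are made up for illustration, and it assumes a fixed block
size given in KiB and that the tables report decimal MB/s:

```shell
# Hypothetical helpers, for illustration only.

# IOPS = (MiB/s * 1024 KiB/MiB) / block size in KiB
mib_to_iops() { echo $(( $1 * 1024 / $2 )); }

# MB/s (decimal) = IOPS * block size in bytes / 1000000, truncated to an integer
iops_to_mb() { echo $(( $1 * $2 * 1024 / 1000000 )); }

mib_to_iops 912 128    # prints 7296
mib_to_iops 960 128    # prints 7680
iops_to_mb 5393 128    # prints 706 (truncated; the quoted table rounds to 707)
```

So with a 128K block size, 912 MiB/s is roughly 7.3k IOPS and 960 MiB/s is
roughly 7.7k IOPS.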
>
> > I will take your setting on real hardware and re-run the test after I
> > return from the Spring Festival holiday.
> >
> >>
> >> On a 8-cores Intel Xeon test box, which has PCI gen 3 only, I get:
> >>
> >> Single queue:
> >> =============
> >>                              +-------------------+-------------------+
> >>                              | ublk (IOPS / BW)  | zloop (IOPS / BW) |
> >> +----------------------------+-------------------+-------------------+
> >> | QD=1, 4K rnd wr, 1 job     | 2859 / 11.7 MB/s  | 5535 / 22.7 MB/s  |
> >> | QD=32, 4K rnd wr, 8 jobs   | 24.5k / 100 MB/s  | 24.6k / 101 MB/s  |
> >> | QD=32, 128K rnd wr, 1 job  | 14.9k / 1954 MB/s | 19.6k / 2571 MB/s |
> >> | QD=32, 128K seq wr, 1 job  | 1516 / 199 MB/s   | 10.6k / 1385 MB/s |
> >> +----------------------------+-------------------+-------------------+
> >>
> >> 8 queues:
> >> =========
> >>                              +-------------------+-------------------+
> >>                              | ublk (IOPS / BW)  | zloop (IOPS / BW) |
> >> +----------------------------+-------------------+-------------------+
> >> | QD=1, 4K rnd wr, 1 job     | 5387 / 22.1 MB/s  | 5436 / 22.3 MB/s  |
> >> | QD=32, 4K rnd wr, 8 jobs   | 16.4k / 67.0 MB/s | 26.3k / 108 MB/s  |
> >> | QD=32, 128K rnd wr, 1 job  | 6101 / 800 MB/s   | 19.8k / 2591 MB/s |
> >> | QD=32, 128K seq wr, 1 job  | 3987 / 523 MB/s   | 10.6k / 1391 MB/s |
> >> +----------------------------+-------------------+-------------------+
> >>
> >> I have no idea why ublk is generally slower when set up with 8 I/O queues. The
> >> qd=32 4K random write with 8 jobs is generally faster with ublk than zloop, but
> >> that varies. I tracked that down to CPU utilization which is generally much
> >> better (all CPUs used) with ublk compared to zloop, as zloop is at the mercy of
> >> the workqueue code and how it schedules unbound work items.
> >
> > Maybe it is related with queue depth? The default ublk queue depth is
> > 128, and 8jobs actually causes 256 in-flight IOs, and default ublk nr_queue
> > is 1.
>
> See above: both rublk and zloop are set up with the exact same number of queues
> and max qd.
>
> > Another thing I mentioned is that ublk has one extra IO data copy, which
> > slows IO especially when IO size is > 64K usually.
>
> Yes. I do keep this in mind when looking at the results.
>
> [...]
>
> >>> Simplicity need to be observed from multiple dimensions, 300 vs. 1500 LoC has
> >>> shown something already, IMO.
> >>
> >> Sure. But given the very complicated syntax of rust, a lower LoC for rust
> >> compared to C is very subjective in my opinion.
> >>
> >> I said "simplicity" in the context of the driver use. And rublk is not as
> >> simple to use as zloop as it needs rust/cargo installed which is not an
> >> acceptable dependency for xfstests. Furthermore, it is very annoying to have to
> >
> > xfstests just need user to pass the zoned block device, so the same test can
> > cover any zoned device.
>
> Sure. But the environment that allows that still needs to have the rust
> dependency to pull-in and build rublk before using it to run the tests. That is
> more dependencies for a CI system or minimal VMs that are not necessarily based
> on a full distro but used to run xfstests.

OK, that isn't too hard to solve:

- install `cargo` from the distribution if `cargo` doesn't exist
- run 'cargo install rublk' if rublk isn't installed
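
The two steps above could be scripted roughly as below. This is only a
sketch: the helper names are hypothetical, the `cargo` package name and the
dnf/apt-get branches are assumptions about the distribution, and the package
install step is assumed to run as root. Note that `cargo install` places the
binary in ~/.cargo/bin by default, which may need to be added to PATH.

```shell
# Hypothetical helper, not part of xfstests or rublk.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# True if the named command is found in PATH.
have() { command -v "$1" >/dev/null 2>&1; }

ensure_rublk() {
    if ! have cargo; then
        # Assumption: the distribution package is named "cargo".
        if have dnf; then
            run dnf install -y cargo || return 1
        elif have apt-get; then
            run apt-get install -y cargo || return 1
        else
            echo "no known package manager; install cargo manually" >&2
            return 1
        fi
    fi
    if ! have rublk; then
        run cargo install rublk || return 1
    fi
}
```

A test setup script would then call `ensure_rublk` once before creating the
zoned device, or `DRY_RUN=1 ensure_rublk` to see what it would do.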
Thanks,
Ming