Re: [PATCH 0/2] New zoned loop block device driver

From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 0/2] New zoned loop block device driver
Date: Wed, 5 Feb 2025 11:43:26 +0800
Message-ID: <Z6LeXsYw_qq4hqoC@fedora>
In-Reply-To: <a63406f1-6a45-4d07-b998-504bd2d6d0d7@kernel.org>

On Tue, Feb 04, 2025 at 12:22:53PM +0900, Damien Le Moal wrote:
> On 1/31/25 12:54, Ming Lei wrote:
> > On Wed, Jan 29, 2025 at 05:10:32PM +0900, Damien Le Moal wrote:
> >> On 1/24/25 21:30, Ming Lei wrote:
> >>>> 1 queue:
> >>>> ========
> >>>>                               +-------------------+-------------------+
> >>>>                               | ublk (IOPS / BW)  | zloop (IOPS / BW) |
> >>>>  +----------------------------+-------------------+-------------------+
> >>>>  | QD=1,    4K rnd wr, 1 job  | 11.7k / 47.8 MB/s | 15.8k / 53.0 MB/s |
> >>>>  | QD=32,   4K rnd wr, 8 jobs | 63.4k / 260 MB/s  | 101k / 413 MB/s   |
> >>>
> >>> I can't reproduce the above two, actually not observe obvious difference
> >>> between rublk/zoned and zloop in my test VM.
> >>
> >> I am using bare-metal machines for these tests as I do not want any
> >> noise from a VM/hypervisor in the numbers. And I did say that this is with a
> >> tweaked version of zloop that I have not posted yet (I was waiting for rc1 to
> >> repost as a rebase is needed to correct a compilation failure due to the nomerge
> >> tag set flag being removed). I am attaching the patch I used here (it applies
> >> on top of current Linus tree)
> >>
> >>> Maybe rublk works at debug mode, which reduces perf by half usually.
> >>> And you need to add device via 'cargo run -r -- add zoned' for using
> >>> release mode.
> >>
> >> Well, that is not an obvious thing for someone who does not know rust well. The
> >> README file of rublk also does not mention that. So no, I did not run it like
> >> this. I followed the README and call rublk directly. It would be great to
> >> document that.
> > 
> > OK, that is fine, and now you can install rublk/zoned with 'cargo
> > install rublk' directly, which always build & install the binary of
> > release version.
> > 
> >>
> >>> Actually there is just single io_uring_enter() running in each ublk queue
> >>> pthread, perf should be similar with kernel IO handling, and the main extra
> >>> load is from the single syscall kernel/user context switch and IO data copy,
> >>> and data copy effect can be neglected in small io size usually(< 64KB).
> >>>
> >>>>  | QD=32, 128K rnd wr, 1 job  | 5008 / 656 MB/s   | 5993 / 786 MB/s   |
> >>>>  | QD=32, 128K seq wr, 1 job  | 2636 / 346 MB/s   | 5393 / 707 MB/s   |
> >>>
> >>> ublk 128K BS may be a little slower since there is one extra copy.
> >>
> >> Here are newer numbers running rublk as you suggested (using cargo run -r).
> >> The backend storage is on an XFS file system using a PCI gen4 4TB M.2 SSD that
> >> is empty (the FS is empty on start). The emulated zoned disk has a capacity of
> >> 512GB with sequential zones only of 256 MB (that is, there are 2048
> >> zones/files). Each data point is from a 1min run of fio.
> > 
> > Can you share how you create rublk/zoned and zloop and the underlying
> > device info? Especially queue depth and nr_queues(both rublk/zloop &
> > underlying disk) plays a big role.
> 
> rublk:
> 
> cargo run -r -- add zoned --size 524288 --zone-size 256 --conv-zones 0 \
> 		--logical-block-size 4096 --queue ${nrq} --depth 128 \
> 		--path /mnt/zloop/0
> 
> zloop:
> 
> echo "add conv_zones=0,capacity_mb=524288,zone_size_mb=256,\
> base_dir=/mnt/zloop,nr_queues=${nrq},queue_depth=128" > /dev/zloop-control

Zones are actually stateful, so it may be better to use a standalone backing
directory/files for each device.

> 
> The backing storage is using XFS on a PCIe Gen4 4TB M.2 SSD (my Xeon machine is
> PCIe Gen3 though). This drive has a large enough max_qid to provide one IO queue
> pair per CPU for up to 32 CPUs (16-cores / 32-threads).

I just set up XFS over NVMe on real hardware, and I still can't reproduce the
big gap in your test results. The kernel is v6.13 with the zloop v2 patch.

`8 queues` should only make a difference for the "QD=32,   4K rnd wr, 8 jobs" test.
For the other single-job tests, a single queue is supposed to perform the same as
8 queues.

The big gap is mainly in the 'QD=32, 128K seq wr, 1 job' test; maybe your local
change improves zloop's merging? In my test:

	- ublk/zoned : 912 MiB/s
	- zloop(v2) : 960 MiB/s.
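
To make the size of the two gaps easier to compare, a quick ratio helps. This is only a sketch: it assumes awk is available, and `ratio` is an illustrative helper name, not part of any tool discussed here.

```shell
# ratio A B: print B/A rounded to two decimals (unitless, so MB/s vs MiB/s
# does not matter as long as both values in a pair use the same unit).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

ratio 912 960     # my 128K seq wr run, ublk/zoned vs zloop(v2)      -> 1.05
ratio 523 1391    # the 8-queue 128K seq wr row in the quoted table  -> 2.66
```

So the difference I see is about 5%, against roughly 2.7x in the quoted 8-queue numbers.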

BTW, my test runs over btrfs and uses the following script:

 fio --size=32G --time_based --bsrange=128K-128K --runtime=40 --numjobs=1 \
 	--ioengine=libaio --iodepth=32 --directory=./ublk --group_reporting=1 --direct=1 \
 	--fsync=0 --name=f1 --stonewall --rw=write

> 
> > I will take your setting on real hardware and re-run the test after I
> > return from the Spring Festival holiday.
> > 
> >>
> >> On a 8-cores Intel Xeon test box, which has PCI gen 3 only, I get:
> >>
> >> Single queue:
> >> =============
> >>                               +-------------------+-------------------+
> >>                               | ublk (IOPS / BW)  | zloop (IOPS / BW) |
> >>  +----------------------------+-------------------+-------------------+
> >>  | QD=1,    4K rnd wr, 1 job  | 2859 / 11.7 MB/s  | 5535 / 22.7 MB/s  |
> >>  | QD=32,   4K rnd wr, 8 jobs | 24.5k / 100 MB/s  | 24.6k / 101 MB/s  |
> >>  | QD=32, 128K rnd wr, 1 job  | 14.9k / 1954 MB/s | 19.6k / 2571 MB/s |
> >>  | QD=32, 128K seq wr, 1 job  | 1516 / 199 MB/s   | 10.6k / 1385 MB/s |
> >>  +----------------------------+-------------------+-------------------+
> >>
> >> 8 queues:
> >> =========
> >>                               +-------------------+-------------------+
> >>                               | ublk (IOPS / BW)  | zloop (IOPS / BW) |
> >>  +----------------------------+-------------------+-------------------+
> >>  | QD=1,    4K rnd wr, 1 job  | 5387 / 22.1 MB/s  | 5436 / 22.3 MB/s  |
> >>  | QD=32,   4K rnd wr, 8 jobs | 16.4k / 67.0 MB/s | 26.3k / 108 MB/s  |
> >>  | QD=32, 128K rnd wr, 1 job  | 6101 / 800 MB/s   | 19.8k / 2591 MB/s |
> >>  | QD=32, 128K seq wr, 1 job  | 3987 / 523 MB/s   | 10.6k / 1391 MB/s |
> >>  +----------------------------+-------------------+-------------------+
> >>
> >> I have no idea why ublk is generally slower when setup with 8 I/O queues. The
> >> qd=32 4K random write with 8 jobs is generally faster with ublk than zloop, but
> >> that varies. I tracked that down to CPU utilization which is generally much
> >> better (all CPUs used) with ublk compared to zloop, as zloop is at the mercy of
> >> the workqueue code and how it schedules unbound work items.
> > 
> > Maybe it is related with queue depth? The default ublk queue depth is
> > 128, and 8jobs actually causes 256 in-flight IOs, and default ublk nr_queue
> > is 1.
> 
> See above: both rublk and zloop are setup with the exact same number of queues
> and max qd.
> 
> > Another thing I mentioned is that ublk has one extra IO data copy, which
> > slows IO especially when IO size is > 64K usually.
> 
> Yes. I do keep this in mind when looking at the results.
> 
> [...]
> 
> >>> Simplicity need to be observed from multiple dimensions, 300 vs. 1500 LoC has
> >>> shown something already, IMO.
> >>
> >> Sure. But given the very complicated syntax of rust, a lower LoC for rust
> >> compared to C is very subjective in my opinion.
> >>
> >> I said "simplicity" in the context of the driver use. And rublk is not as
> >> simple to use as zloop as it needs rust/cargo installed which is not an
> >> acceptable dependency for xfstests. Furthermore, it is very annoying to have to
> > 
> > xfstests just need user to pass the zoned block device, so the same test can
> > cover any zoned device.
> 
> Sure. But the environment that allows that still needs to have the rust
> dependency to pull-in and build rublk before using it to run the tests. That is
> more dependencies for a CI system or minimal VMs that are not necessarily based
> on a full distro but used to run xfstests.

OK, that isn't too hard to solve:

- install `cargo` from the distribution's packages if `cargo` isn't present

- run 'cargo install rublk' if rublk isn't installed
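
The two steps above could be scripted roughly as follows. This is a minimal sketch, not something xfstests ships: `PKG_INSTALL`, `DRY_RUN`, `have`, `run` and `ensure_rublk` are hypothetical names, and the default package command assumes a dnf-based distribution.

```shell
#!/bin/sh
# Assumption: PKG_INSTALL holds the distribution's package install command.
PKG_INSTALL=${PKG_INSTALL:-"dnf install -y"}
# DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-0}

# have CMD: succeed if CMD is found in PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# run CMD...: execute the command, or only print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Install cargo and rublk only when they are missing.
ensure_rublk() {
    if ! have cargo; then
        # PKG_INSTALL is left unquoted on purpose so it splits into words.
        run $PKG_INSTALL cargo
    fi
    if ! have rublk; then
        run cargo install rublk
    fi
}
```

Calling `ensure_rublk` before the test run is enough. Note that `cargo install` places the binary under cargo's bin directory (`~/.cargo/bin` by default), so that directory needs to be in PATH for the later `rublk` check to succeed.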



Thanks,
Ming
