From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 0/2] New zoned loop block device driver
Date: Thu, 6 Feb 2025 11:24:33 +0800
Message-ID: <Z6QrceGGAJl_X_BM@fedora>
In-Reply-To: <f6d82d47-ff27-43e8-a772-0ab90a2f86c4@kernel.org>

On Wed, Feb 05, 2025 at 03:07:51PM +0900, Damien Le Moal wrote:
> On 2/5/25 12:43 PM, Ming Lei wrote:
> >>> Can you share how you create rublk/zoned and zloop and the underlying
> >>> device info? Especially queue depth and nr_queues(both rublk/zloop &
> >>> underlying disk) plays a big role.
> >>
> >> rublk:
> >>
> >> cargo run -r -- add zoned --size 524288 --zone-size 256 --conv-zones 0 \
> >> 		--logical-block-size 4096 --queue ${nrq} --depth 128 \
> >> 		--path /mnt/zloop/0
> >>
> >> zloop:
> >>
> >> echo "add conv_zones=0,capacity_mb=524288,zone_size_mb=256,\
> >> base_dir=/mnt/zloop,nr_queues=${nrq},queue_depth=128" > /dev/zloop-control
> > 
> > zone is actually stateful, maybe it is better to use standalone backing
> > directory/files.
> 
> I do not understand what you are saying... I reformat the backing FS and
> recreate the same /mnt/zloop/0 directory for every test, to be sure I am not
> seeing an artifact from the FS.

I meant that the same backing files are shared by the two devices.

But I guess it may not be a big deal.

> 
> >> The backing storage is using XFS on a PCIe Gen4 4TB M.2 SSD (my Xeon machine is
> >> PCIe Gen3 though). This drive has a large enough max_qid to provide one IO queue
> >> pair per CPU for up to 32 CPUs (16-cores / 32-threads).
> > 
> > I just set up XFS over NVMe on real hardware, and I still can't reproduce the big gap in
> > your test results. The kernel is v6.13 with zloop patch v2.
> > 
> > `8 queues` should only make a difference for the "QD=32,   4K rnd wr, 8 jobs" test.
> > For the other single-job tests, a single queue is supposed to perform the same as 8 queues.
> > 
> > The big gap is mainly in test of 'QD=32, 128K seq wr, 1 job ', maybe your local
> > change improves zloop's merge? In my test:
> > 
> > 	- ublk/zoned : 912 MiB/s
> > 	- zloop(v2) : 960 MiB/s.
> > 
> > BTW, my test is over btrfs, and follows the test script:
> > 
> >  fio --size=32G --time_based --bsrange=128K-128K --runtime=40 --numjobs=1 \
> >  	--ioengine=libaio --iodepth=32 --directory=./ublk --group_reporting=1 --direct=1 \
> > 	--fsync=0 --name=f1 --stonewall --rw=write
> 
> If you add an FS on top of the emulated zoned device, you are testing the FS
> perf as much as the backing dev. I focused on the backing dev so I ran fio
> directly on top of the emulated drive. E.g.:
> 
> fio --name=test --filename=${dev} --rw=randwrite \
>                 --ioengine=libaio --iodepth=32 --direct=1 --bs=4096 \
>                 --zonemode=zbd --numjobs=8 --group_reporting --norandommap \
>                 --cpus_allowed=0-7 --cpus_allowed_policy=split \
>                 --runtime=${runtime} --ramp_time=5 --time_based
> 
> (you must use libaio here)

Thanks for sharing the '--zonemode=zbd'.

I can reproduce the perf issue with the above script, and the reason is related
to io-uring emulation and zone space pre-allocation.

When an FS write needs to allocate space, .write_iter() returns -EAGAIN
for each io-uring write, so the write always falls back to io-wq, which
causes very bad sequential write perf.

It can be fixed[1] simply by pre-allocating space before writing to the
beginning of each seq-zone.
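
As a rough illustration of the idea (this is not the code from commit [1]),
the sketch below reserves space for a whole zone when a write lands at the
start of that zone's backing file. The helper name write_to_seq_zone, the
one-file-per-zone layout, and the use of posix_fallocate are assumptions made
for the example; the 256 MiB zone size matches the setup quoted above.

```python
import os

ZONE_SIZE = 256 * 1024 * 1024  # 256 MiB, as in the quoted device setup


def write_to_seq_zone(fd: int, zone_offset: int, data: bytes,
                      zone_size: int = ZONE_SIZE) -> int:
    """Write data at zone_offset within one zone's backing file.

    Hypothetical helper: when the write starts at the beginning of the
    zone, reserve the whole zone up front so that later writes into the
    zone do not need the filesystem to allocate blocks.
    """
    if zone_offset == 0:
        # Allocate blocks for the full zone before the first write.
        os.posix_fallocate(fd, 0, zone_size)
    return os.pwrite(fd, data, zone_offset)
```

Note that posix_fallocate also grows the apparent file size to the zone size.
A target that derives the zone write pointer from the backing file size would
need a size-preserving allocation (fallocate(2) with FALLOC_FL_KEEP_SIZE)
instead.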

Here are the results of my test over real NVMe/XFS, using the zfio script[2]:

+ ./zfio /dev/zloop0 write 1 40
    write /dev/zloop0: jobs   1 io_depth   32 time   40sec
	BS   4k: IOPS   171383 BW   685535KiB/s fio_cpu_util(25% 38%)
	BS 128k: IOPS     7669 BW   981846KiB/s fio_cpu_util( 5% 11%)
+ ./zfio /dev/ublkb0 write 1 40
    write /dev/ublkb0: jobs   1 io_depth   32 time   40sec
	BS   4k: IOPS   179861 BW   719448KiB/s fio_cpu_util(29% 42%)
	BS 128k: IOPS     7239 BW   926786KiB/s fio_cpu_util( 6%  9%)

+ ./zfio /dev/zloop0 randwrite 1 40
randwrite /dev/zloop0: jobs   1 io_depth   32 time   40sec
	BS   4k: IOPS     8909 BW    35642KiB/s fio_cpu_util( 2%  5%)
	BS 128k: IOPS      210 BW    27035KiB/s fio_cpu_util( 0%  0%)
+ ./zfio /dev/ublkb0 randwrite 1 40
randwrite /dev/ublkb0: jobs   1 io_depth   32 time   40sec
	BS   4k: IOPS    20500 BW    82001KiB/s fio_cpu_util( 5% 12%)
	BS 128k: IOPS     5622 BW   719792KiB/s fio_cpu_util( 6%  8%)
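
To compare these figures with the MiB/s numbers quoted earlier in the thread,
the bandwidth columns can be converted and the ublk/zloop ratio computed per
workload. The snippet below is only a convenience sketch; the numbers are
copied from the output above, and the names RESULTS, mib_per_s and
ublk_over_zloop are made up for the example.

```python
# (workload, device, block size) -> (IOPS, bandwidth in KiB/s),
# copied from the zfio output above.
RESULTS = {
    ("write", "zloop0", "4k"): (171383, 685535),
    ("write", "zloop0", "128k"): (7669, 981846),
    ("write", "ublkb0", "4k"): (179861, 719448),
    ("write", "ublkb0", "128k"): (7239, 926786),
    ("randwrite", "zloop0", "4k"): (8909, 35642),
    ("randwrite", "zloop0", "128k"): (210, 27035),
    ("randwrite", "ublkb0", "4k"): (20500, 82001),
    ("randwrite", "ublkb0", "128k"): (5622, 719792),
}


def mib_per_s(kib_per_s: int) -> float:
    """Convert a bandwidth in KiB/s to MiB/s."""
    return kib_per_s / 1024


def ublk_over_zloop(workload: str, bs: str) -> float:
    """Bandwidth of ublkb0 divided by bandwidth of zloop0 for one test."""
    ublk_bw = RESULTS[(workload, "ublkb0", bs)][1]
    zloop_bw = RESULTS[(workload, "zloop0", bs)][1]
    return ublk_bw / zloop_bw


for workload in ("write", "randwrite"):
    for bs in ("4k", "128k"):
        for dev in ("zloop0", "ublkb0"):
            bw = RESULTS[(workload, dev, bs)][1]
            print(f"{workload:9} {dev} {bs:>4}: {mib_per_s(bw):8.1f} MiB/s")
        print(f"{workload:9} ublk/zloop {bs:>4}: {ublk_over_zloop(workload, bs):.2f}x")
```

With these inputs, the 128K sequential write case works out to roughly
959 MiB/s for zloop and 905 MiB/s for ublk.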



[1] https://github.com/ublk-org/rublk/commit/fd01a87abb2f9b8e94c8da24e73683e4bb12659b

[2] `zfio` (zone fio test script) https://github.com/ublk-org/rublk/blob/main/scripts/zfio

Thanks,
Ming

