From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
linux-scsi@vger.kernel.org,
"Martin K . Petersen" <martin.petersen@oracle.com>,
dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 06/26] block: Introduce zone write plugging
Date: Mon, 5 Feb 2024 11:38:47 +0800 [thread overview]
Message-ID: <ZcBYRzAqwoQIaR3r@fedora> (raw)
In-Reply-To: <58fa0123-e884-4321-9b9b-8575cc7b4e1d@kernel.org>
On Mon, Feb 05, 2024 at 11:41:04AM +0900, Damien Le Moal wrote:
> On 2/5/24 11:19, Ming Lei wrote:
> >>>> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
> >>>> + struct bio *bio, unsigned int nr_segs)
> >>>> +{
> >>>> + /*
> >>>> + * Keep a reference on the BIO request queue usage. This reference will
> >>>> + * be dropped either if the BIO is failed or after it is issued and
> >>>> + * completes.
> >>>> + */
> >>>> + percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
> >>>
> >>> It is fragile to get nested usage_counter, and same with grabbing/releasing it
> >>> from different contexts or even functions, and it could be much better to just
> >>> let block layer maintain it.
> >>>
> >>> From patch 23's change:
> >>>
> >>> + * Zoned block device information. Reads of this information must be
> >>> + * protected with blk_queue_enter() / blk_queue_exit(). Modifying this
> >>>
> >>> Anytime if there is in-flight bio, the block device is opened, so both gendisk and
> >>> request_queue are live, so not sure if this .q_usage_counter protection
> >>> is needed.
> >>
> >> Hannes also commented about this. Let me revisit this.
> >
> > I think only queue re-configuration(blk_revalidate_zone) requires the
> > queue usage counter. Otherwise, bdev open()/close() should work just
> > fine.
>
> I want to check FS case though. No clear if mounting FS that supports zone
> (btrfs) also uses bdev open ?
btrfs '-O zoned' shouldn't be one exception:
mount -O zoned /dev/ublkb0 /mnt
b'blkdev_get_whole'
b'bdev_open_by_dev'
b'bdev_open_by_path'
b'btrfs_scan_one_device'
b'btrfs_get_tree'
b'vfs_get_tree'
b'fc_mount'
b'btrfs_get_tree'
b'vfs_get_tree'
b'path_mount'
b'__x64_sys_mount'
b'do_syscall_64'
b'entry_SYSCALL_64_after_hwframe'
b'[unknown]'
1
>
> >>>> + /*
> >>>> + * blk-mq devices will reuse the reference on the request queue usage
> >>>> + * we took when the BIO was plugged, but the submission path for
> >>>> + * BIO-based devices will not do that. So drop this reference here.
> >>>> + */
> >>>> + if (bio->bi_bdev->bd_has_submit_bio)
> >>>> + blk_queue_exit(bio->bi_bdev->bd_disk->queue);
> >>>
> >>> But I don't see where this reference is reused for blk-mq in this patch,
> >>> care to point it out?
> >>
> >> This patch modifies blk_mq_submit_bio() to add a "goto new_request" at the top
> >> for any BIO flagged with BIO_FLAG_ZONE_WRITE_PLUGGING. So when a plugged BIO is
> >> unplugged and submitted again, the reference that was taken in
> >> blk_zone_wplug_add_bio() is reused for the new request for that BIO.
> >
> > OK, this reference reuse may be worse, because queue freeze can't prevent new
> > write zoned bio from being submitted any more given only percpu_ref_get() is
> > called for all write zoned bios.
>
> New BIOs (BIOS that have never been plugged yet) will go through the normal
> blk_queue_enter() in blk_mq_submit_bio(), so they will be stopped there if
> another context asked for a queue freeze. I do not think there is any issue with
> how things are currently done (we tested that *a lot* with many different drives
> and drive configs with DM etc). Reference counting as it is is OK, even though
> it most likely be simplified. I am looking at that now.
Indeed, new zoned write bio is still covered by blk-mq's queue reference, and the
trick is just played on old plugged bio.
Thanks,
Ming
next prev parent reply other threads:[~2024-02-05 3:39 UTC|newest]
Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-02 7:30 [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-02 7:30 ` [PATCH 01/26] block: Restore sector of flush requests Damien Le Moal
2024-02-04 11:55 ` Hannes Reinecke
2024-02-05 17:22 ` Bart Van Assche
2024-02-05 23:42 ` Damien Le Moal
2024-02-02 7:30 ` [PATCH 02/26] block: Remove req_bio_endio() Damien Le Moal
2024-02-04 11:57 ` Hannes Reinecke
2024-02-05 17:28 ` Bart Van Assche
2024-02-05 23:45 ` Damien Le Moal
2024-02-09 6:53 ` Damien Le Moal
2024-02-02 7:30 ` [PATCH 03/26] block: Introduce bio_straddle_zones() and bio_offset_from_zone_start() Damien Le Moal
2024-02-03 4:09 ` Bart Van Assche
2024-02-04 11:58 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 04/26] block: Introduce blk_zone_complete_request_bio() Damien Le Moal
2024-02-04 11:59 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 05/26] block: Allow using bio_attempt_back_merge() internally Damien Le Moal
2024-02-03 4:11 ` Bart Van Assche
2024-02-04 12:00 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 06/26] block: Introduce zone write plugging Damien Le Moal
2024-02-04 3:56 ` Ming Lei
2024-02-04 23:57 ` Damien Le Moal
2024-02-05 2:19 ` Ming Lei
2024-02-05 2:41 ` Damien Le Moal
2024-02-05 3:38 ` Ming Lei [this message]
2024-02-05 5:11 ` Christoph Hellwig
2024-02-05 5:37 ` Damien Le Moal
2024-02-05 5:50 ` Christoph Hellwig
2024-02-05 6:14 ` Damien Le Moal
2024-02-05 10:06 ` Ming Lei
2024-02-05 12:20 ` Damien Le Moal
2024-02-05 12:43 ` Damien Le Moal
2024-02-04 12:14 ` Hannes Reinecke
2024-02-05 17:48 ` Bart Van Assche
2024-02-05 23:48 ` Damien Le Moal
2024-02-06 0:52 ` Bart Van Assche
2024-02-02 7:30 ` [PATCH 07/26] block: Allow zero value of max_zone_append_sectors queue limit Damien Le Moal
2024-02-04 12:15 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 08/26] block: Implement zone append emulation Damien Le Moal
2024-02-04 12:24 ` Hannes Reinecke
2024-02-05 0:10 ` Damien Le Moal
2024-02-05 17:58 ` Bart Van Assche
2024-02-05 23:57 ` Damien Le Moal
2024-02-02 7:30 ` [PATCH 09/26] block: Allow BIO-based drivers to use blk_revalidate_disk_zones() Damien Le Moal
2024-02-04 12:26 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 10/26] dm: Use the block layer zone append emulation Damien Le Moal
2024-02-03 17:58 ` Mike Snitzer
2024-02-05 5:38 ` Damien Le Moal
2024-02-05 20:33 ` Mike Snitzer
2024-02-05 23:40 ` Damien Le Moal
2024-02-06 20:41 ` Mike Snitzer
2024-02-04 12:30 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 11/26] scsi: sd: " Damien Le Moal
2024-02-04 12:29 ` Hannes Reinecke
2024-02-06 1:55 ` Martin K. Petersen
2024-02-02 7:30 ` [PATCH 12/26] ublk_drv: Do not request ELEVATOR_F_ZBD_SEQ_WRITE elevator feature Damien Le Moal
2024-02-04 12:31 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 13/26] null_blk: " Damien Le Moal
2024-02-04 12:31 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 14/26] null_blk: Introduce zone_append_max_sectors attribute Damien Le Moal
2024-02-04 12:32 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 15/26] null_blk: Introduce fua attribute Damien Le Moal
2024-02-04 12:33 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 16/26] nvmet: zns: Do not reference the gendisk conv_zones_bitmap Damien Le Moal
2024-02-04 12:34 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 17/26] block: Remove BLK_STS_ZONE_RESOURCE Damien Le Moal
2024-02-04 12:34 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 18/26] block: Simplify blk_revalidate_disk_zones() interface Damien Le Moal
2024-02-04 12:35 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 19/26] block: mq-deadline: Remove support for zone write locking Damien Le Moal
2024-02-04 12:36 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 20/26] block: Remove elevator required features Damien Le Moal
2024-02-04 12:36 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 21/26] block: Do not check zone type in blk_check_zone_append() Damien Le Moal
2024-02-04 12:37 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 22/26] block: Move zone related debugfs attribute to blk-zoned.c Damien Le Moal
2024-02-04 12:38 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 23/26] block: Remove zone write locking Damien Le Moal
2024-02-04 12:38 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 24/26] block: Do not special-case plugging of zone write operations Damien Le Moal
2024-02-04 12:39 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 25/26] block: Reduce zone write plugging memory usage Damien Le Moal
2024-02-04 12:42 ` Hannes Reinecke
2024-02-05 17:51 ` Bart Van Assche
2024-02-05 23:55 ` Damien Le Moal
2024-02-06 21:20 ` Bart Van Assche
2024-02-09 3:58 ` Damien Le Moal
2024-02-09 19:36 ` Bart Van Assche
2024-02-10 0:06 ` Damien Le Moal
2024-02-11 3:40 ` Bart Van Assche
2024-02-12 1:09 ` Damien Le Moal
2024-02-12 18:58 ` Bart Van Assche
2024-02-12 8:23 ` Damien Le Moal
2024-02-12 8:47 ` Damien Le Moal
2024-02-12 18:40 ` Bart Van Assche
2024-02-13 0:05 ` Damien Le Moal
2024-02-02 7:31 ` [PATCH 26/26] block: Add zone_active_wplugs debugfs entry Damien Le Moal
2024-02-04 12:43 ` Hannes Reinecke
2024-02-02 7:37 ` [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-03 12:11 ` Jens Axboe
2024-02-09 5:28 ` Damien Le Moal
2024-02-05 17:21 ` Bart Van Assche
2024-02-05 23:42 ` Damien Le Moal
2024-02-06 0:57 ` Bart Van Assche
2024-02-05 18:18 ` Bart Van Assche
2024-02-06 0:07 ` Damien Le Moal
2024-02-06 1:25 ` Bart Van Assche
2024-02-09 4:03 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZcBYRzAqwoQIaR3r@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=dlemoal@kernel.org \
--cc=dm-devel@lists.linux.dev \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).