From: Damien Le Moal <dlemoal@kernel.org>
To: Hannes Reinecke <hare@suse.de>,
linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
linux-scsi@vger.kernel.org,
"Martin K . Petersen" <martin.petersen@oracle.com>,
dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 08/26] block: Implement zone append emulation
Date: Mon, 5 Feb 2024 09:10:14 +0900 [thread overview]
Message-ID: <ff4018f0-7a51-4cec-82de-f105483dfde2@kernel.org> (raw)
In-Reply-To: <9026cc14-b107-41de-994f-6b46858b6601@suse.de>
On 2/4/24 21:24, Hannes Reinecke wrote:
> On 2/2/24 15:30, Damien Le Moal wrote:
>> +/*
>> + * Set a zone write plug write pointer offset to either 0 (zone reset case)
>> + * or to the zone size (zone finish case). This aborts all plugged BIOs, which
>> + * is fine to do as doing a zone reset or zone finish while writes are in-flight
>> + * is a mistake from the user which will most likely cause all plugged BIOs to
>> + * fail anyway.
>> + */
>> +static void blk_zone_wplug_set_wp_offset(struct gendisk *disk,
>> + struct blk_zone_wplug *zwplug,
>> + unsigned int wp_offset)
>> +{
>> + /*
>> + * Updating the write pointer offset puts back the zone
>> + * in a good state. So clear the error flag and decrement the
>> + * error count if we were in error state.
>> + */
>> + if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) {
>> + zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR;
>> + atomic_dec(&disk->zone_nr_wplugs_with_error);
>> + }
>> +
>> + /* Update the zone write pointer and abort all plugged BIOs. */
>> + zwplug->wp_offset = wp_offset;
>> + blk_zone_wplug_abort(disk, zwplug);
>> +}
[...]
>> +static void blk_zone_wplug_handle_error(struct gendisk *disk,
>> + struct blk_zone_wplug *zwplug)
>> +{
>> + unsigned int zno = zwplug - disk->zone_wplugs;
>> + sector_t zone_start_sector = bdev_zone_sectors(disk->part0) * zno;
>> + unsigned int noio_flag;
>> + struct blk_zone zone;
>> + unsigned long flags;
>> + int ret;
>> +
>> + /* Check if we have an error and clear it if we do. */
>> + blk_zone_wplug_lock(zwplug, flags);
>> + if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR))
>> + goto unlock;
>> + zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR;
>> + atomic_dec(&disk->zone_nr_wplugs_with_error);
>> + blk_zone_wplug_unlock(zwplug, flags);
>> +
>
> Don't you need to quiesce the drive here?
> After all, I/O (or a reset zone) might be executed after the call to
> report zones, but before we lock the zone, no?
Indeed, this is racy with reset zone. But there is no race with IOs because when
the error flag is set, we always plug incoming BIOs.
But the race with reset (and finish) is actually easy to fix. All I need to do
is not clear the error flag above and check for it after the report zones and
locking the zone plug. Given that blk_zone_wplug_set_wp_offset() clears the
error flag, we end up restoring a known good wp either from the reset or from
the report zones.
>> struct blk_revalidate_zone_args {
>> @@ -890,6 +1292,8 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
>> if (!args->seq_zones_wlock)
>> return -ENOMEM;
>> }
>> + args->zone_wplugs[idx].capacity = zone->capacity;
>> + args->zone_wplugs[idx].wp_offset = blk_zone_wp_offset(zone);
>> break;
>> case BLK_ZONE_TYPE_SEQWRITE_PREF:
>> default:
>> @@ -964,6 +1368,13 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>> if (!args.zone_wplugs)
>> goto out_restore_noio;
>>
>> + if (!disk->zone_wplugs) {
>> + mutex_init(&disk->zone_wplugs_mutex);
>> + atomic_set(&disk->zone_nr_wplugs_with_error, 0);
>> + INIT_DELAYED_WORK(&disk->zone_wplugs_work,
>> + disk_zone_wplugs_work);
>> + }
>> +
>
> Same question here about device quiesce ...
Yes, I need to check this, together with revisiting the queue usage counter
handling.
>> ret = disk->fops->report_zones(disk, 0, UINT_MAX,
>> blk_revalidate_zone_cb, &args);
>> if (!ret) {
>> @@ -989,12 +1400,14 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>> */
>> blk_mq_freeze_queue(q);
>
> And this, I guess, comes to late.
> We've already read the zone list, so any write I/O submitted after the
> report zone but befor here will cause things to be iffy.
Yes, but the driver is supposed to guarantee that this function is being called
while there are no writes in flight. DM is OK with that. I think scsi is too,
but need to check again. Not sure about NVMe and null_blk. But Ming had a good
point about the usage ref coming from the device being open. So a quiesce may be
enough here. Need to revisit this.
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2024-02-05 0:10 UTC|newest]
Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-02 7:30 [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-02 7:30 ` [PATCH 01/26] block: Restore sector of flush requests Damien Le Moal
2024-02-04 11:55 ` Hannes Reinecke
2024-02-05 17:22 ` Bart Van Assche
2024-02-05 23:42 ` Damien Le Moal
2024-02-02 7:30 ` [PATCH 02/26] block: Remove req_bio_endio() Damien Le Moal
2024-02-04 11:57 ` Hannes Reinecke
2024-02-05 17:28 ` Bart Van Assche
2024-02-05 23:45 ` Damien Le Moal
2024-02-09 6:53 ` Damien Le Moal
2024-02-02 7:30 ` [PATCH 03/26] block: Introduce bio_straddle_zones() and bio_offset_from_zone_start() Damien Le Moal
2024-02-03 4:09 ` Bart Van Assche
2024-02-04 11:58 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 04/26] block: Introduce blk_zone_complete_request_bio() Damien Le Moal
2024-02-04 11:59 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 05/26] block: Allow using bio_attempt_back_merge() internally Damien Le Moal
2024-02-03 4:11 ` Bart Van Assche
2024-02-04 12:00 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 06/26] block: Introduce zone write plugging Damien Le Moal
2024-02-04 3:56 ` Ming Lei
2024-02-04 23:57 ` Damien Le Moal
2024-02-05 2:19 ` Ming Lei
2024-02-05 2:41 ` Damien Le Moal
2024-02-05 3:38 ` Ming Lei
2024-02-05 5:11 ` Christoph Hellwig
2024-02-05 5:37 ` Damien Le Moal
2024-02-05 5:50 ` Christoph Hellwig
2024-02-05 6:14 ` Damien Le Moal
2024-02-05 10:06 ` Ming Lei
2024-02-05 12:20 ` Damien Le Moal
2024-02-05 12:43 ` Damien Le Moal
2024-02-04 12:14 ` Hannes Reinecke
2024-02-05 17:48 ` Bart Van Assche
2024-02-05 23:48 ` Damien Le Moal
2024-02-06 0:52 ` Bart Van Assche
2024-02-02 7:30 ` [PATCH 07/26] block: Allow zero value of max_zone_append_sectors queue limit Damien Le Moal
2024-02-04 12:15 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 08/26] block: Implement zone append emulation Damien Le Moal
2024-02-04 12:24 ` Hannes Reinecke
2024-02-05 0:10 ` Damien Le Moal [this message]
2024-02-05 17:58 ` Bart Van Assche
2024-02-05 23:57 ` Damien Le Moal
2024-02-02 7:30 ` [PATCH 09/26] block: Allow BIO-based drivers to use blk_revalidate_disk_zones() Damien Le Moal
2024-02-04 12:26 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 10/26] dm: Use the block layer zone append emulation Damien Le Moal
2024-02-03 17:58 ` Mike Snitzer
2024-02-05 5:38 ` Damien Le Moal
2024-02-05 20:33 ` Mike Snitzer
2024-02-05 23:40 ` Damien Le Moal
2024-02-06 20:41 ` Mike Snitzer
2024-02-04 12:30 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 11/26] scsi: sd: " Damien Le Moal
2024-02-04 12:29 ` Hannes Reinecke
2024-02-06 1:55 ` Martin K. Petersen
2024-02-02 7:30 ` [PATCH 12/26] ublk_drv: Do not request ELEVATOR_F_ZBD_SEQ_WRITE elevator feature Damien Le Moal
2024-02-04 12:31 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 13/26] null_blk: " Damien Le Moal
2024-02-04 12:31 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 14/26] null_blk: Introduce zone_append_max_sectors attribute Damien Le Moal
2024-02-04 12:32 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 15/26] null_blk: Introduce fua attribute Damien Le Moal
2024-02-04 12:33 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 16/26] nvmet: zns: Do not reference the gendisk conv_zones_bitmap Damien Le Moal
2024-02-04 12:34 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 17/26] block: Remove BLK_STS_ZONE_RESOURCE Damien Le Moal
2024-02-04 12:34 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 18/26] block: Simplify blk_revalidate_disk_zones() interface Damien Le Moal
2024-02-04 12:35 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 19/26] block: mq-deadline: Remove support for zone write locking Damien Le Moal
2024-02-04 12:36 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 20/26] block: Remove elevator required features Damien Le Moal
2024-02-04 12:36 ` Hannes Reinecke
2024-02-02 7:30 ` [PATCH 21/26] block: Do not check zone type in blk_check_zone_append() Damien Le Moal
2024-02-04 12:37 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 22/26] block: Move zone related debugfs attribute to blk-zoned.c Damien Le Moal
2024-02-04 12:38 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 23/26] block: Remove zone write locking Damien Le Moal
2024-02-04 12:38 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 24/26] block: Do not special-case plugging of zone write operations Damien Le Moal
2024-02-04 12:39 ` Hannes Reinecke
2024-02-02 7:31 ` [PATCH 25/26] block: Reduce zone write plugging memory usage Damien Le Moal
2024-02-04 12:42 ` Hannes Reinecke
2024-02-05 17:51 ` Bart Van Assche
2024-02-05 23:55 ` Damien Le Moal
2024-02-06 21:20 ` Bart Van Assche
2024-02-09 3:58 ` Damien Le Moal
2024-02-09 19:36 ` Bart Van Assche
2024-02-10 0:06 ` Damien Le Moal
2024-02-11 3:40 ` Bart Van Assche
2024-02-12 1:09 ` Damien Le Moal
2024-02-12 18:58 ` Bart Van Assche
2024-02-12 8:23 ` Damien Le Moal
2024-02-12 8:47 ` Damien Le Moal
2024-02-12 18:40 ` Bart Van Assche
2024-02-13 0:05 ` Damien Le Moal
2024-02-02 7:31 ` [PATCH 26/26] block: Add zone_active_wplugs debugfs entry Damien Le Moal
2024-02-04 12:43 ` Hannes Reinecke
2024-02-02 7:37 ` [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-03 12:11 ` Jens Axboe
2024-02-09 5:28 ` Damien Le Moal
2024-02-05 17:21 ` Bart Van Assche
2024-02-05 23:42 ` Damien Le Moal
2024-02-06 0:57 ` Bart Van Assche
2024-02-05 18:18 ` Bart Van Assche
2024-02-06 0:07 ` Damien Le Moal
2024-02-06 1:25 ` Bart Van Assche
2024-02-09 4:03 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ff4018f0-7a51-4cec-82de-f105483dfde2@kernel.org \
--to=dlemoal@kernel.org \
--cc=axboe@kernel.dk \
--cc=dm-devel@lists.linux.dev \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox