linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Bart Van Assche <bvanassche@acm.org>,
	linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v2 07/28] block: Introduce zone write plugging
Date: Tue, 26 Mar 2024 12:12:03 +0900	[thread overview]
Message-ID: <839ebf2a-4dc6-433b-bc47-fd7915ed0ecf@kernel.org> (raw)
In-Reply-To: <f3b298bb-b68a-4375-a3b4-fc91229740c1@acm.org>

On 3/26/24 06:53, Bart Van Assche wrote:
> On 3/24/24 21:44, Damien Le Moal wrote:
>> +/*
>> + * Per-zone write plug.
>> + */
>> +struct blk_zone_wplug {
>> +	struct hlist_node	node;
>> +	struct list_head	err;
>> +	atomic_t		ref;
>> +	spinlock_t		lock;
>> +	unsigned int		flags;
>> +	unsigned int		zone_no;
>> +	unsigned int		wp_offset;
>> +	struct bio_list		bio_list;
>> +	struct work_struct	bio_work;
>> +};
> 
> Please document what 'lock' protects. Please also document the unit of
> wp_offset.
> 
> Since there is an atomic reference count in this data structure, why is
> the flag BLK_ZONE_WPLUG_FREEING required? Can that flag be replaced by
> checking whether or not 'ref' is zero?

Nope, we cannot. The reason is that BIO issuing and zone reset/finish can be
concurrently processed and we need to be ready for a user doing really stupid
things like resetting or finishing a zone while BIOs for that zone are being
issued. When zone reset/finish is processed, the plug is removed from the hash
table, but disk_get_zone_wplug_locked() may still get a reference to it because
we do not have the plug locked yet. Hence the flag, to prevent reusing the plug
for the reset/finished zone that was already removed from the hash table. This
is mentioned with a comment in disk_get_zone_wplug_locked():

	/*
	 * Check that a BIO completion or a zone reset or finish
	 * operation has not already flagged the zone write plug for
	 * freeing and dropped its reference count. In such case, we
	 * need to get a new plug so start over from the beginning.
	 */

The reference count dropping to 0 will then be the trigger for actually freeing
the plug, after all in-flight or plugged BIOs are completed (most likely failed).

>> -void disk_free_zone_bitmaps(struct gendisk *disk)
>> +static bool disk_insert_zone_wplug(struct gendisk *disk,
>> +				   struct blk_zone_wplug *zwplug)
>> +{
>> +	struct blk_zone_wplug *zwplg;
>> +	unsigned long flags;
>> +	unsigned int idx =
>> +		hash_32(zwplug->zone_no, disk->zone_wplugs_hash_bits);
>> +
>> +	/*
>> +	 * Add the new zone write plug to the hash table, but carefully as we
>> +	 * are racing with other submission context, so we may already have a
>> +	 * zone write plug for the same zone.
>> +	 */
>> +	spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> +	hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) {
>> +		if (zwplg->zone_no == zwplug->zone_no) {
>> +			spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +			return false;
>> +		}
>> +	}
>> +	hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]);
>> +	spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +
>> +	return true;
>> +}
> 
> Since this function inserts an element into disk->zone_wplugs_hash[],
> can it happen that another thread removes that element from the hash
> list before this function returns?

No, that cannot happen. Both insertion and deletion of plugs in the hash table
are serialized with disk->zone_wplugs_lock. See disk_remove_zone_wplug().

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2024-03-26  3:12 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-25  4:44 [PATCH v2 00/28] Zone write plugging Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 01/28] block: Restore sector of flush requests Damien Le Moal
2024-03-25 19:30   ` Bart Van Assche
2024-03-26  6:05   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 02/28] block: Remove req_bio_endio() Damien Le Moal
2024-03-25 19:39   ` Bart Van Assche
2024-03-26  1:54     ` Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 03/28] block: Introduce blk_zone_update_request_bio() Damien Le Moal
2024-03-25 19:52   ` Bart Van Assche
2024-03-25 23:23     ` Damien Le Moal
2024-03-26  6:37       ` Christoph Hellwig
2024-03-26  7:47         ` Damien Le Moal
2024-03-27  7:01   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 04/28] block: Introduce bio_straddle_zones() and bio_offset_from_zone_start() Damien Le Moal
2024-03-25 19:55   ` Bart Van Assche
2024-03-26  6:39   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 05/28] block: Allow using bio_attempt_back_merge() internally Damien Le Moal
2024-03-25 20:00   ` Bart Van Assche
2024-03-26  6:39   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 06/28] block: Remember zone capacity when revalidating zones Damien Le Moal
2024-03-25 21:53   ` Bart Van Assche
2024-03-25 23:20     ` Damien Le Moal
2024-03-26  6:40   ` Christoph Hellwig
2024-03-27  7:05   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 07/28] block: Introduce zone write plugging Damien Le Moal
2024-03-25 21:53   ` Bart Van Assche
2024-03-26  3:12     ` Damien Le Moal [this message]
2024-03-26  6:51       ` Christoph Hellwig
2024-03-26 17:23       ` Bart Van Assche
2024-03-27  7:18   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 08/28] block: Use a mempool to allocate zone write plugs Damien Le Moal
2024-03-27  7:19   ` Hannes Reinecke
2024-03-27  7:22     ` Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 09/28] block: Fake max open zones limit when there is no limit Damien Le Moal
2024-03-26  6:57   ` Christoph Hellwig
2024-03-27  7:21   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 10/28] block: Allow zero value of max_zone_append_sectors queue limit Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 11/28] block: Implement zone append emulation Damien Le Moal
2024-03-27  7:28   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 12/28] block: Allow BIO-based drivers to use blk_revalidate_disk_zones() Damien Le Moal
2024-03-26  7:08   ` Christoph Hellwig
2024-03-26  8:12     ` Damien Le Moal
2024-03-27  7:29   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 13/28] dm: Use the block layer zone append emulation Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 14/28] scsi: sd: " Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 15/28] ublk_drv: Do not request ELEVATOR_F_ZBD_SEQ_WRITE elevator feature Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 16/28] null_blk: " Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 17/28] null_blk: Introduce zone_append_max_sectors attribute Damien Le Moal
2024-03-27  7:31   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 18/28] null_blk: Introduce fua attribute Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 19/28] nvmet: zns: Do not reference the gendisk conv_zones_bitmap Damien Le Moal
2024-03-26  6:45   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 20/28] block: Remove BLK_STS_ZONE_RESOURCE Damien Le Moal
2024-03-26  6:45   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 21/28] block: Simplify blk_revalidate_disk_zones() interface Damien Le Moal
2024-03-26  6:45   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 22/28] block: mq-deadline: Remove support for zone write locking Damien Le Moal
2024-03-25 22:13   ` Bart Van Assche
2024-03-25  4:44 ` [PATCH v2 23/28] block: Remove elevator required features Damien Le Moal
2024-03-26  6:45   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 24/28] block: Do not check zone type in blk_check_zone_append() Damien Le Moal
2024-03-26  6:46   ` Christoph Hellwig
2024-03-25  4:44 ` [PATCH v2 25/28] block: Move zone related debugfs attribute to blk-zoned.c Damien Le Moal
2024-03-25 22:20   ` Bart Van Assche
2024-03-25 23:17     ` Damien Le Moal
2024-03-25  4:44 ` [PATCH v2 26/28] block: Remove zone write locking Damien Le Moal
2024-03-25 22:27   ` Bart Van Assche
2024-03-27  7:32   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 27/28] block: Do not force select mq-deadline with CONFIG_BLK_DEV_ZONED Damien Le Moal
2024-03-25 22:29   ` Bart Van Assche
2024-03-27  7:33   ` Hannes Reinecke
2024-03-25  4:44 ` [PATCH v2 28/28] block: Do not special-case plugging of zone write operations Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=839ebf2a-4dc6-433b-bc47-fd7915ed0ecf@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).