public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Damien Le Moal <dlemoal@kernel.org>,
	linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH v3 04/14] block: Fix reference counting for zone write plugs in error state
Date: Thu, 2 May 2024 08:01:35 +0200	[thread overview]
Message-ID: <d4f71b64-b2d3-4350-b502-bbcfcc9614ce@suse.de> (raw)
In-Reply-To: <20240501110907.96950-5-dlemoal@kernel.org>

On 5/1/24 13:08, Damien Le Moal wrote:
> When zone is reset or finished, disk_zone_wplug_set_wp_offset() is
> called to update the zone write plug write pointer offset and to clear
> the zone error state (BLK_ZONE_WPLUG_ERROR flag) if it is set.
> However, this processing is missing dropping the reference to the zone
> write plug that was taken in disk_zone_wplug_set_error() when the error
> flag was first set. Furthermore, the error state handling must release
> the zone write plug lock to first execute a report zones command. When
> the report zone races with a reset or finish operation that clears the
> error, we can end up decrementing the zone write plug reference count
> twice: once in disk_zone_wplug_set_wp_offset() for the reset/finish
> operation and one more time in disk_zone_wplugs_work() once
> disk_zone_wplug_handle_error() completes.
> 
> Fix this by introducing disk_zone_wplug_clear_error() as the symmetric
> function of disk_zone_wplug_set_error(). disk_zone_wplug_clear_error()
> decrements the zone write plug reference count obtained in
> disk_zone_wplug_set_error() only if the error handling has not started
> yet, that is, only if disk_zone_wplugs_work() has not yet taken the zone
> write plug off the error list. This ensure that either
> disk_zone_wplug_clear_error() or disk_zone_wplugs_work() drop the zone
> write plug reference count.
> 
> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>   block/blk-zoned.c | 75 +++++++++++++++++++++++++++++++----------------
>   1 file changed, 49 insertions(+), 26 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 7824bd52c82c..23ad1de0da62 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -658,6 +658,54 @@ static void disk_zone_wplug_abort_unaligned(struct gendisk *disk,
>   	bio_list_merge(&zwplug->bio_list, &bl);
>   }
>   
> +static inline void disk_zone_wplug_set_error(struct gendisk *disk,
> +					     struct blk_zone_wplug *zwplug)
> +{
> +	unsigned long flags;
> +
> +	if (zwplug->flags & BLK_ZONE_WPLUG_ERROR)
> +		return;
> +

I still get nervous when I see an unprotected flag being set.
Especially in code which is known to race with error handling.
Wouldn't it be better to check the flag under the lock or at
least use 'test_and_set_bit' here?

> +	/*
> +	 * At this point, we already have a reference on the zone write plug.
> +	 * However, since we are going to add the plug to the disk zone write
> +	 * plugs work list, increase its reference count. This reference will
> +	 * be dropped in disk_zone_wplugs_work() once the error state is
> +	 * handled, or in disk_zone_wplug_clear_error() if the zone is reset or
> +	 * finished.
> +	 */
> +	zwplug->flags |= BLK_ZONE_WPLUG_ERROR;

And that is even worse. We might have been interrupted between these
two lines, invalidating the first check.

Please consider using 'test_and_set_bit()' here.

> +	atomic_inc(&zwplug->ref);
> +
> +	spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
> +	list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list);
> +	spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
> +}
> +
> +static inline void disk_zone_wplug_clear_error(struct gendisk *disk,
> +					       struct blk_zone_wplug *zwplug)
> +{
> +	unsigned long flags;
> +
> +	if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR))
> +		return;
> +
> +	/*
> +	 * We are racing with the error handling work which drops the reference
> +	 * on the zone write plug after handling the error state. So remove the
> +	 * plug from the error list and drop its reference count only if the
> +	 * error handling has not yet started, that is, if the zone write plug
> +	 * is still listed.
> +	 */
> +	spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
> +	if (!list_empty(&zwplug->link)) {
> +		list_del_init(&zwplug->link);
> +		zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR;
> +		disk_put_zone_wplug(zwplug);
> +	}
> +	spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
> +}
> +

Similar comments to above: you are clearing the flag under the lock,
but don't check or set under the lock...

>   /*
>    * Set a zone write plug write pointer offset to either 0 (zone reset case)
>    * or to the zone size (zone finish case). This aborts all plugged BIOs, which
> @@ -691,12 +739,7 @@ static void disk_zone_wplug_set_wp_offset(struct gendisk *disk,
>   	 * in a good state. So clear the error flag and decrement the
>   	 * error count if we were in error state.
>   	 */
> -	if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) {
> -		zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR;
> -		spin_lock(&disk->zone_wplugs_lock);
> -		list_del_init(&zwplug->link);
> -		spin_unlock(&disk->zone_wplugs_lock);
> -	}
> +	disk_zone_wplug_clear_error(disk, zwplug);
>   
>   	/*
>   	 * The zone write plug now has no BIO plugged: remove it from the
> @@ -885,26 +928,6 @@ void blk_zone_write_plug_attempt_merge(struct request *req)
>   	spin_unlock_irqrestore(&zwplug->lock, flags);
>   }
>   
> -static inline void disk_zone_wplug_set_error(struct gendisk *disk,
> -					     struct blk_zone_wplug *zwplug)
> -{
> -	if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR)) {
> -		unsigned long flags;
> -
> -		/*
> -		 * Increase the plug reference count. The reference will be
> -		 * dropped in disk_zone_wplugs_work() once the error state
> -		 * is handled.
> -		 */
> -		zwplug->flags |= BLK_ZONE_WPLUG_ERROR;
> -		atomic_inc(&zwplug->ref);
> -
> -		spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
> -		list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list);
> -		spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
> -	}
> -}
> -
>   /*
>    * Check and prepare a BIO for submission by incrementing the write pointer
>    * offset of its zone write plug and changing zone append operations into

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


  reply	other threads:[~2024-05-02  6:01 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-01 11:08 [PATCH v3 00/14] Zone write plugging fixes and cleanup Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 01/14] dm: Check that a zoned table leads to a valid mapped device Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 02/14] block: Exclude conventional zones when faking max open limit Damien Le Moal
2024-05-02  5:52   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 03/14] block: Fix zone write plug initialization from blk_revalidate_zone_cb() Damien Le Moal
2024-05-02  5:53   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 04/14] block: Fix reference counting for zone write plugs in error state Damien Le Moal
2024-05-02  6:01   ` Hannes Reinecke [this message]
2024-05-02  9:37     ` Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 05/14] block: Hold a reference on zone write plugs to schedule submission Damien Le Moal
2024-05-02  6:04   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 06/14] block: Unhash a zone write plug only if needed Damien Le Moal
2024-05-02  6:05   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 07/14] block: Do not remove zone write plugs still in use Damien Le Moal
2024-05-02  5:38   ` Christoph Hellwig
2024-05-02  6:09   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 08/14] block: Fix flush request sector restore Damien Le Moal
2024-05-02  6:10   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 09/14] block: Fix handling of non-empty flush write requests to zones Damien Le Moal
2024-05-02  6:11   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 10/14] block: Improve blk_zone_write_plug_bio_merged() Damien Le Moal
2024-05-02  6:13   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 11/14] block: Improve zone write request completion handling Damien Le Moal
2024-05-02  6:14   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 12/14] block: Simplify blk_zone_write_plug_bio_endio() Damien Le Moal
2024-05-02  6:15   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 13/14] block: Simplify zone write plug BIO abort Damien Le Moal
2024-05-02  6:15   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 14/14] block: Cleanup blk_revalidate_zone_cb() Damien Le Moal
2024-05-02  6:17   ` Hannes Reinecke
2024-05-01 14:08 ` [PATCH v3 00/14] Zone write plugging fixes and cleanup Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d4f71b64-b2d3-4350-b502-bbcfcc9614ce@suse.de \
    --to=hare@suse.de \
    --cc=axboe@kernel.dk \
    --cc=dlemoal@kernel.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=linux-block@vger.kernel.org \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox