Re: [PATCH] dm: Fix deadlock when reloading a multipath table

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
       [not found] <20251009030431.2895495-1-bmarzins@redhat.com>
@ 2025-10-09  9:57 ` Martin Wilck
  2025-10-09 22:25   ` Benjamin Marzinski
  2025-10-09 23:29   ` Bart Van Assche
  0 siblings, 2 replies; 10+ messages in thread
From: Martin Wilck @ 2025-10-09  9:57 UTC (permalink / raw)
  To: Benjamin Marzinski, Mikulas Patocka, Mike Snitzer; +Cc: dm-devel, linux-block

On Wed, 2025-10-08 at 23:04 -0400, Benjamin Marzinski wrote:
> Request-based devices (dm-multipath) queue I/O in blk-mq on noflush
> suspends. Any queued IO will make it impossible to freeze the queue.
> If
> a process attempts to update the queue limits while there is queued
> IO,
> it can be get stuck holding the limits lock, while unable to freeze
> the
> queue. If device-mapper then attempts to update the limits during a
> table swap, it will deadlock trying to grab the limits lock while
> making
> it impossible to flush the IO.
> 
> Disallow updating the queue limits during a table swap, when updating
> an
> immutable request-based dm device (dm-multipath) during a noflush
> suspend. It is userspace's responsibility to make sure that the new
> table uses the same limits as the existing table if it asks for a
> noflush suspend.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  drivers/md/dm-table.c |  4 ++++
>  drivers/md/dm-thin.c  |  7 ++-----
>  drivers/md/dm.c       | 35 +++++++++++++++++++++++------------
>  3 files changed, 29 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index ad0a60a07b93..0522cd700e0e 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -2043,6 +2043,10 @@ bool dm_table_supports_size_change(struct
> dm_table *t, sector_t old_size,
>  	return true;
>  }
>  
> +/*
> + * This function will be skipped by noflush reloads of immutable
> request
> + * based devices (dm-mpath).
> + */
>  int dm_table_set_restrictions(struct dm_table *t, struct
> request_queue *q,
>  			      struct queue_limits *limits)
>  {
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index c84149ba4e38..6f98936f0e05 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -4383,11 +4383,8 @@ static void thin_postsuspend(struct dm_target
> *ti)
>  {
>  	struct thin_c *tc = ti->private;
>  
> -	/*
> -	 * The dm_noflush_suspending flag has been cleared by now,
> so
> -	 * unfortunately we must always run this.
> -	 */
> -	noflush_work(tc, do_noflush_stop);
> +	if (dm_noflush_suspending(ti))
> +		noflush_work(tc, do_noflush_stop);
>  }
>  
>  static int thin_preresume(struct dm_target *ti)
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3ede5ba4cf7e..87e65c48dd04 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2439,7 +2439,6 @@ static struct dm_table *__bind(struct
> mapped_device *md, struct dm_table *t,
>  {
>  	struct dm_table *old_map;
>  	sector_t size, old_size;
> -	int ret;
>  
>  	lockdep_assert_held(&md->suspend_lock);
>  
> @@ -2454,11 +2453,13 @@ static struct dm_table *__bind(struct
> mapped_device *md, struct dm_table *t,
>  
>  	set_capacity(md->disk, size);
>  
> -	ret = dm_table_set_restrictions(t, md->queue, limits);
> -	if (ret) {
> -		set_capacity(md->disk, old_size);
> -		old_map = ERR_PTR(ret);
> -		goto out;
> +	if (limits) {
> +		int ret = dm_table_set_restrictions(t, md->queue,
> limits);
> +		if (ret) {
> +			set_capacity(md->disk, old_size);
> +			old_map = ERR_PTR(ret);
> +			goto out;
> +		}
>  	}
>  
>  	/*
> @@ -2836,6 +2837,7 @@ static void dm_wq_work(struct work_struct
> *work)
>  
>  static void dm_queue_flush(struct mapped_device *md)
>  {
> +	clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
>  	clear_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
>  	smp_mb__after_atomic();
>  	queue_work(md->wq, &md->work);
> @@ -2848,6 +2850,7 @@ struct dm_table *dm_swap_table(struct
> mapped_device *md, struct dm_table *table)
>  {
>  	struct dm_table *live_map = NULL, *map = ERR_PTR(-EINVAL);
>  	struct queue_limits limits;
> +	bool update_limits = true;
>  	int r;
>  
>  	mutex_lock(&md->suspend_lock);
> @@ -2856,20 +2859,31 @@ struct dm_table *dm_swap_table(struct
> mapped_device *md, struct dm_table *table)
>  	if (!dm_suspended_md(md))
>  		goto out;
>  
> +	/*
> +	 * To avoid a potential deadlock locking the queue limits,
> disallow
> +	 * updating the queue limits during a table swap, when
> updating an
> +	 * immutable request-based dm device (dm-multipath) during a
> noflush
> +	 * suspend. It is userspace's responsibility to make sure
> that the new
> +	 * table uses the same limits as the existing table, if it
> asks for a
> +	 * noflush suspend.
> +	 */
> +	if (dm_request_based(md) && md->immutable_target &&
> +	    __noflush_suspending(md))
> +		update_limits = false;
>  	/*
>  	 * If the new table has no data devices, retain the existing
> limits.
>  	 * This helps multipath with queue_if_no_path if all paths
> disappear,
>  	 * then new I/O is queued based on these limits, and then
> some paths
>  	 * reappear.
>  	 */
> -	if (dm_table_has_no_data_devices(table)) {
> +	else if (dm_table_has_no_data_devices(table)) {
>  		live_map = dm_get_live_table_fast(md);
>  		if (live_map)
>  			limits = md->queue->limits;
>  		dm_put_live_table_fast(md);
>  	}
>  
> -	if (!live_map) {
> +	if (update_limits && !live_map) {
>  		r = dm_calculate_queue_limits(table, &limits);
>  		if (r) {
>  			map = ERR_PTR(r);
> @@ -2877,7 +2891,7 @@ struct dm_table *dm_swap_table(struct
> mapped_device *md, struct dm_table *table)
>  		}
>  	}
>  
> -	map = __bind(md, table, &limits);
> +	map = __bind(md, table, update_limits ? &limits : NULL);
>  	dm_issue_global_event();
>  
>  out:
> @@ -2930,7 +2944,6 @@ static int __dm_suspend(struct mapped_device
> *md, struct dm_table *map,
>  
>  	/*
>  	 * DMF_NOFLUSH_SUSPENDING must be set before presuspend.
> -	 * This flag is cleared before dm_suspend returns.
>  	 */
>  	if (noflush)
>  		set_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> @@ -2993,8 +3006,6 @@ static int __dm_suspend(struct mapped_device
> *md, struct dm_table *map,
>  	if (!r)
>  		set_bit(dmf_suspended_flag, &md->flags);
>  
> -	if (noflush)
> -		clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
>  	if (map)
>  		synchronize_srcu(&md->io_barrier);

This moves the clear_bit() behind synchronize_rcu(). Is that safe?

In general, I'm wondering whether we need a more generic solution to
this problem. Therefore I've added linux-block to cc.

The way I see it, if a device has queued IO without any means to
perform the IO, it can't be frozen. We'd either need to fail all queued
IO in this case, or refuse attempts to freeze the queue.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-09  9:57 ` [PATCH] dm: Fix deadlock when reloading a multipath table Martin Wilck
@ 2025-10-09 22:25   ` Benjamin Marzinski
  2025-10-10 10:19     ` Martin Wilck
  2025-10-09 23:29   ` Bart Van Assche
  1 sibling, 1 reply; 10+ messages in thread
From: Benjamin Marzinski @ 2025-10-09 22:25 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Mikulas Patocka, Mike Snitzer, dm-devel, linux-block

On Thu, Oct 09, 2025 at 11:57:08AM +0200, Martin Wilck wrote:
> On Wed, 2025-10-08 at 23:04 -0400, Benjamin Marzinski wrote:
> > Request-based devices (dm-multipath) queue I/O in blk-mq on noflush
> > suspends. Any queued IO will make it impossible to freeze the queue.
> > If
> > a process attempts to update the queue limits while there is queued
> > IO,
> > it can be get stuck holding the limits lock, while unable to freeze
> > the
> > queue. If device-mapper then attempts to update the limits during a
> > table swap, it will deadlock trying to grab the limits lock while
> > making
> > it impossible to flush the IO.
> > 
> > Disallow updating the queue limits during a table swap, when updating
> > an
> > immutable request-based dm device (dm-multipath) during a noflush
> > suspend. It is userspace's responsibility to make sure that the new
> > table uses the same limits as the existing table if it asks for a
> > noflush suspend.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  drivers/md/dm-table.c |  4 ++++
> >  drivers/md/dm-thin.c  |  7 ++-----
> >  drivers/md/dm.c       | 35 +++++++++++++++++++++++------------
> >  3 files changed, 29 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index ad0a60a07b93..0522cd700e0e 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -2043,6 +2043,10 @@ bool dm_table_supports_size_change(struct
> > dm_table *t, sector_t old_size,
> >  	return true;
> >  }
> >  
> > +/*
> > + * This function will be skipped by noflush reloads of immutable
> > request
> > + * based devices (dm-mpath).
> > + */
> >  int dm_table_set_restrictions(struct dm_table *t, struct
> > request_queue *q,
> >  			      struct queue_limits *limits)
> >  {
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index c84149ba4e38..6f98936f0e05 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
> > @@ -4383,11 +4383,8 @@ static void thin_postsuspend(struct dm_target
> > *ti)
> >  {
> >  	struct thin_c *tc = ti->private;
> >  
> > -	/*
> > -	 * The dm_noflush_suspending flag has been cleared by now,
> > so
> > -	 * unfortunately we must always run this.
> > -	 */
> > -	noflush_work(tc, do_noflush_stop);
> > +	if (dm_noflush_suspending(ti))
> > +		noflush_work(tc, do_noflush_stop);
> >  }
> >  
> >  static int thin_preresume(struct dm_target *ti)
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 3ede5ba4cf7e..87e65c48dd04 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -2439,7 +2439,6 @@ static struct dm_table *__bind(struct
> > mapped_device *md, struct dm_table *t,
> >  {
> >  	struct dm_table *old_map;
> >  	sector_t size, old_size;
> > -	int ret;
> >  
> >  	lockdep_assert_held(&md->suspend_lock);
> >  
> > @@ -2454,11 +2453,13 @@ static struct dm_table *__bind(struct
> > mapped_device *md, struct dm_table *t,
> >  
> >  	set_capacity(md->disk, size);
> >  
> > -	ret = dm_table_set_restrictions(t, md->queue, limits);
> > -	if (ret) {
> > -		set_capacity(md->disk, old_size);
> > -		old_map = ERR_PTR(ret);
> > -		goto out;
> > +	if (limits) {
> > +		int ret = dm_table_set_restrictions(t, md->queue,
> > limits);
> > +		if (ret) {
> > +			set_capacity(md->disk, old_size);
> > +			old_map = ERR_PTR(ret);
> > +			goto out;
> > +		}
> >  	}
> >  
> >  	/*
> > @@ -2836,6 +2837,7 @@ static void dm_wq_work(struct work_struct
> > *work)
> >  
> >  static void dm_queue_flush(struct mapped_device *md)
> >  {
> > +	clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> >  	clear_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
> >  	smp_mb__after_atomic();
> >  	queue_work(md->wq, &md->work);
> > @@ -2848,6 +2850,7 @@ struct dm_table *dm_swap_table(struct
> > mapped_device *md, struct dm_table *table)
> >  {
> >  	struct dm_table *live_map = NULL, *map = ERR_PTR(-EINVAL);
> >  	struct queue_limits limits;
> > +	bool update_limits = true;
> >  	int r;
> >  
> >  	mutex_lock(&md->suspend_lock);
> > @@ -2856,20 +2859,31 @@ struct dm_table *dm_swap_table(struct
> > mapped_device *md, struct dm_table *table)
> >  	if (!dm_suspended_md(md))
> >  		goto out;
> >  
> > +	/*
> > +	 * To avoid a potential deadlock locking the queue limits,
> > disallow
> > +	 * updating the queue limits during a table swap, when
> > updating an
> > +	 * immutable request-based dm device (dm-multipath) during a
> > noflush
> > +	 * suspend. It is userspace's responsibility to make sure
> > that the new
> > +	 * table uses the same limits as the existing table, if it
> > asks for a
> > +	 * noflush suspend.
> > +	 */
> > +	if (dm_request_based(md) && md->immutable_target &&
> > +	    __noflush_suspending(md))
> > +		update_limits = false;
> >  	/*
> >  	 * If the new table has no data devices, retain the existing
> > limits.
> >  	 * This helps multipath with queue_if_no_path if all paths
> > disappear,
> >  	 * then new I/O is queued based on these limits, and then
> > some paths
> >  	 * reappear.
> >  	 */
> > -	if (dm_table_has_no_data_devices(table)) {
> > +	else if (dm_table_has_no_data_devices(table)) {
> >  		live_map = dm_get_live_table_fast(md);
> >  		if (live_map)
> >  			limits = md->queue->limits;
> >  		dm_put_live_table_fast(md);
> >  	}
> >  
> > -	if (!live_map) {
> > +	if (update_limits && !live_map) {
> >  		r = dm_calculate_queue_limits(table, &limits);
> >  		if (r) {
> >  			map = ERR_PTR(r);
> > @@ -2877,7 +2891,7 @@ struct dm_table *dm_swap_table(struct
> > mapped_device *md, struct dm_table *table)
> >  		}
> >  	}
> >  
> > -	map = __bind(md, table, &limits);
> > +	map = __bind(md, table, update_limits ? &limits : NULL);
> >  	dm_issue_global_event();
> >  
> >  out:
> > @@ -2930,7 +2944,6 @@ static int __dm_suspend(struct mapped_device
> > *md, struct dm_table *map,
> >  
> >  	/*
> >  	 * DMF_NOFLUSH_SUSPENDING must be set before presuspend.
> > -	 * This flag is cleared before dm_suspend returns.
> >  	 */
> >  	if (noflush)
> >  		set_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> > @@ -2993,8 +3006,6 @@ static int __dm_suspend(struct mapped_device
> > *md, struct dm_table *map,
> >  	if (!r)
> >  		set_bit(dmf_suspended_flag, &md->flags);
> >  
> > -	if (noflush)
> > -		clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> >  	if (map)
> >  		synchronize_srcu(&md->io_barrier);
> 
> This moves the clear_bit() behind synchronize_rcu(). Is that safe?

It's not just moved behind the synchronize_rcu(). DMF_NOFLUSH_SUSPENDING
is now set for the entire time the device is suspended. If people would
like, I'll update the patch to rename it to DMF_NOFLUSH_SUSPEND.

I did check to see if holding it for the entire suspend would cause
issues, but I didn't see any case where it would. If I missed a 
case where __noflush_suspending() should only return true if we are
actually in the process of suspending, I can easily update that
function to do that.
 
> In general, I'm wondering whether we need a more generic solution to
> this problem. Therefore I've added linux-block to cc.
> 
> The way I see it, if a device has queued IO without any means to
> perform the IO, it can't be frozen. We'd either need to fail all queued
> IO in this case, or refuse attempts to freeze the queue.

In general, it's perfectly normal to start freezing the queue while
there are still outstanding requests. I'm not sure that there is a good
way to distinguish between multipath's queued requests and other cases.
I'd be happy to be proved wrong, however.

-Ben
 
> Thanks,
> Martin


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-09  9:57 ` [PATCH] dm: Fix deadlock when reloading a multipath table Martin Wilck
  2025-10-09 22:25   ` Benjamin Marzinski
@ 2025-10-09 23:29   ` Bart Van Assche
  2025-10-10  0:39     ` Benjamin Marzinski
  2025-10-10  9:23     ` Martin Wilck
  1 sibling, 2 replies; 10+ messages in thread
From: Bart Van Assche @ 2025-10-09 23:29 UTC (permalink / raw)
  To: Martin Wilck, Benjamin Marzinski, Mikulas Patocka, Mike Snitzer
  Cc: dm-devel, linux-block

On 10/9/25 2:57 AM, Martin Wilck wrote:
> In general, I'm wondering whether we need a more generic solution to
> this problem. Therefore I've added linux-block to cc.
> 
> The way I see it, if a device has queued IO without any means to
> perform the IO, it can't be frozen. We'd either need to fail all queued
> IO in this case, or refuse attempts to freeze the queue.

If a device has queued I/O and the I/O can't make progress then it isn't
necessary to call blk_mq_freeze_queue(), isn't it? See also "[PATCH 0/3] 
Fix a deadlock related to modifying queue attributes"
(https://lore.kernel.org/linux-block/20250702182430.3764163-1-bvanassche@acm.org/).

BTW, that patch series is not upstream. I apply it manually every time
before I run blktests.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-09 23:29   ` Bart Van Assche
@ 2025-10-10  0:39     ` Benjamin Marzinski
  2025-10-10  9:23     ` Martin Wilck
  1 sibling, 0 replies; 10+ messages in thread
From: Benjamin Marzinski @ 2025-10-10  0:39 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin Wilck, Mikulas Patocka, Mike Snitzer, dm-devel,
	linux-block

On Thu, Oct 09, 2025 at 04:29:21PM -0700, Bart Van Assche wrote:
> On 10/9/25 2:57 AM, Martin Wilck wrote:
> > In general, I'm wondering whether we need a more generic solution to
> > this problem. Therefore I've added linux-block to cc.
> > 
> > The way I see it, if a device has queued IO without any means to
> > perform the IO, it can't be frozen. We'd either need to fail all queued
> > IO in this case, or refuse attempts to freeze the queue.
> 
> If a device has queued I/O and the I/O can't make progress then it isn't
> necessary to call blk_mq_freeze_queue(), isn't it? See also "[PATCH 0/3] Fix
> a deadlock related to modifying queue attributes"
> (https://lore.kernel.org/linux-block/20250702182430.3764163-1-bvanassche@acm.org/).
> 
> BTW, that patch series is not upstream. I apply it manually every time
> before I run blktests.

Timing out the queue freeze looks like a good solution to me.

Also, looking at that thread, I should point out that bio-based
multipath does not suffer from this, since it doesn't queue requests. It
queues bios. The reason you still got the hang with your blktests patch
is that it didn't actually configure bio-based multipath ("queue_mode"
is not a valid multipath.conf option). That would look something like
this:

===================
diff --git a/common/multipath-over-rdma b/common/multipath-over-rdma
index e157e0a..1084f80 100644
--- a/common/multipath-over-rdma
+++ b/common/multipath-over-rdma
@@ -267,9 +267,7 @@ mpath_has_stale_dev() {
 # Check whether multipath definition $1 includes the queue_if_no_path keyword.
 is_qinp_def() {
 	case "$1" in
-		*" 3 queue_if_no_path queue_mode mq "*)
-			return 0;;
-		*" 1 queue_if_no_path "*)
+		*" queue_if_no_path "*)
 			return 0;;
 		*)
 			return 1;;
diff --git a/tests/srp/multipath.conf b/tests/srp/multipath.conf
index e0da32e..aad8e56 100644
--- a/tests/srp/multipath.conf
+++ b/tests/srp/multipath.conf
@@ -9,6 +9,7 @@ devices {
 		product		".*"
 		no_path_retry	queue
 		path_checker	tur
+		features "2 queue_mode bio"
 	}
 }
 blacklist {
==================

But you are correct that unless we can get bio-based multipath to
something like performance parity with request-based multipath,
customers aren't interested it in.

-Ben

> 
> Bart.


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-09 23:29   ` Bart Van Assche
  2025-10-10  0:39     ` Benjamin Marzinski
@ 2025-10-10  9:23     ` Martin Wilck
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Wilck @ 2025-10-10  9:23 UTC (permalink / raw)
  To: Bart Van Assche, Benjamin Marzinski, Mikulas Patocka,
	Mike Snitzer
  Cc: dm-devel, linux-block

Hi Bart,

On Thu, 2025-10-09 at 16:29 -0700, Bart Van Assche wrote:
> On 10/9/25 2:57 AM, Martin Wilck wrote:
> > In general, I'm wondering whether we need a more generic solution
> > to
> > this problem. Therefore I've added linux-block to cc.
> > 
> > The way I see it, if a device has queued IO without any means to
> > perform the IO, it can't be frozen. We'd either need to fail all
> > queued
> > IO in this case, or refuse attempts to freeze the queue.
> 
> If a device has queued I/O and the I/O can't make progress then it
> isn't
> necessary to call blk_mq_freeze_queue(), isn't it? 

Good point. Even if the queue limits were changed while the IO was in
the queue,  they'd be re-checked in blk_insert_cloned_request(). That
might cause IO failures, but if you modify queue limits while IO is in
flight, that's part of the risk, I suppose.

> See also "[PATCH 0/3] 
> Fix a deadlock related to modifying queue attributes"
> (
> https://lore.kernel.org/linux-block/20250702182430.3764163-1-bvanassche@acm.org
> /).
> 
> BTW, that patch series is not upstream. I apply it manually every
> time
> before I run blktests.

Will you make another attempt to get it merged?

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-09 22:25   ` Benjamin Marzinski
@ 2025-10-10 10:19     ` Martin Wilck
  2025-10-21 19:16       ` Benjamin Marzinski
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Wilck @ 2025-10-10 10:19 UTC (permalink / raw)
  To: Benjamin Marzinski, Bart Van Assche
  Cc: Mikulas Patocka, Mike Snitzer, dm-devel, linux-block

On Thu, 2025-10-09 at 18:25 -0400, Benjamin Marzinski wrote:
> On Thu, Oct 09, 2025 at 11:57:08AM +0200, Martin Wilck wrote:
> > On Wed, 2025-10-08 at 23:04 -0400, Benjamin Marzinski wrote:
> > > Request-based devices (dm-multipath) queue I/O in blk-mq on
> > > noflush
> > > suspends. Any queued IO will make it impossible to freeze the
> > > queue.
> > > If
> > > a process attempts to update the queue limits while there is
> > > queued
> > > IO,
> > > it can be get stuck holding the limits lock, while unable to
> > > freeze
> > > the
> > > queue. If device-mapper then attempts to update the limits during
> > > a
> > > table swap, it will deadlock trying to grab the limits lock while
> > > making
> > > it impossible to flush the IO.
> > > 
> > > Disallow updating the queue limits during a table swap, when
> > > updating
> > > an
> > > immutable request-based dm device (dm-multipath) during a noflush
> > > suspend. It is userspace's responsibility to make sure that the
> > > new
> > > table uses the same limits as the existing table if it asks for a
> > > noflush suspend.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > >  drivers/md/dm-table.c |  4 ++++
> > >  drivers/md/dm-thin.c  |  7 ++-----
> > >  drivers/md/dm.c       | 35 +++++++++++++++++++++++------------
> > >  3 files changed, 29 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > index ad0a60a07b93..0522cd700e0e 100644
> > > --- a/drivers/md/dm-table.c
> > > +++ b/drivers/md/dm-table.c
> > > @@ -2043,6 +2043,10 @@ bool dm_table_supports_size_change(struct
> > > dm_table *t, sector_t old_size,
> > >  	return true;
> > >  }
> > >  
> > > +/*
> > > + * This function will be skipped by noflush reloads of immutable
> > > request
> > > + * based devices (dm-mpath).
> > > + */
> > >  int dm_table_set_restrictions(struct dm_table *t, struct
> > > request_queue *q,
> > >  			      struct queue_limits *limits)
> > >  {
> > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > > index c84149ba4e38..6f98936f0e05 100644
> > > --- a/drivers/md/dm-thin.c
> > > +++ b/drivers/md/dm-thin.c
> > > @@ -4383,11 +4383,8 @@ static void thin_postsuspend(struct
> > > dm_target
> > > *ti)
> > >  {
> > >  	struct thin_c *tc = ti->private;
> > >  
> > > -	/*
> > > -	 * The dm_noflush_suspending flag has been cleared by
> > > now,
> > > so
> > > -	 * unfortunately we must always run this.
> > > -	 */
> > > -	noflush_work(tc, do_noflush_stop);
> > > +	if (dm_noflush_suspending(ti))
> > > +		noflush_work(tc, do_noflush_stop);
> > >  }
> > >  
> > >  static int thin_preresume(struct dm_target *ti)
> > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > index 3ede5ba4cf7e..87e65c48dd04 100644
> > > --- a/drivers/md/dm.c
> > > +++ b/drivers/md/dm.c
> > > @@ -2439,7 +2439,6 @@ static struct dm_table *__bind(struct
> > > mapped_device *md, struct dm_table *t,
> > >  {
> > >  	struct dm_table *old_map;
> > >  	sector_t size, old_size;
> > > -	int ret;
> > >  
> > >  	lockdep_assert_held(&md->suspend_lock);
> > >  
> > > @@ -2454,11 +2453,13 @@ static struct dm_table *__bind(struct
> > > mapped_device *md, struct dm_table *t,
> > >  
> > >  	set_capacity(md->disk, size);
> > >  
> > > -	ret = dm_table_set_restrictions(t, md->queue, limits);
> > > -	if (ret) {
> > > -		set_capacity(md->disk, old_size);
> > > -		old_map = ERR_PTR(ret);
> > > -		goto out;
> > > +	if (limits) {
> > > +		int ret = dm_table_set_restrictions(t, md-
> > > >queue,
> > > limits);
> > > +		if (ret) {
> > > +			set_capacity(md->disk, old_size);
> > > +			old_map = ERR_PTR(ret);
> > > +			goto out;
> > > +		}
> > >  	}
> > >  
> > >  	/*
> > > @@ -2836,6 +2837,7 @@ static void dm_wq_work(struct work_struct
> > > *work)
> > >  
> > >  static void dm_queue_flush(struct mapped_device *md)
> > >  {
> > > +	clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> > >  	clear_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
> > >  	smp_mb__after_atomic();
> > >  	queue_work(md->wq, &md->work);
> > > @@ -2848,6 +2850,7 @@ struct dm_table *dm_swap_table(struct
> > > mapped_device *md, struct dm_table *table)
> > >  {
> > >  	struct dm_table *live_map = NULL, *map = ERR_PTR(-
> > > EINVAL);
> > >  	struct queue_limits limits;
> > > +	bool update_limits = true;
> > >  	int r;
> > >  
> > >  	mutex_lock(&md->suspend_lock);
> > > @@ -2856,20 +2859,31 @@ struct dm_table *dm_swap_table(struct
> > > mapped_device *md, struct dm_table *table)
> > >  	if (!dm_suspended_md(md))
> > >  		goto out;
> > >  
> > > +	/*
> > > +	 * To avoid a potential deadlock locking the queue
> > > limits,
> > > disallow
> > > +	 * updating the queue limits during a table swap, when
> > > updating an
> > > +	 * immutable request-based dm device (dm-multipath)
> > > during a
> > > noflush
> > > +	 * suspend. It is userspace's responsibility to make
> > > sure
> > > that the new
> > > +	 * table uses the same limits as the existing table, if
> > > it
> > > asks for a
> > > +	 * noflush suspend.
> > > +	 */
> > > +	if (dm_request_based(md) && md->immutable_target &&
> > > +	    __noflush_suspending(md))
> > > +		update_limits = false;
> > >  	/*
> > >  	 * If the new table has no data devices, retain the
> > > existing
> > > limits.
> > >  	 * This helps multipath with queue_if_no_path if all
> > > paths
> > > disappear,
> > >  	 * then new I/O is queued based on these limits, and
> > > then
> > > some paths
> > >  	 * reappear.
> > >  	 */
> > > -	if (dm_table_has_no_data_devices(table)) {
> > > +	else if (dm_table_has_no_data_devices(table)) {
> > >  		live_map = dm_get_live_table_fast(md);
> > >  		if (live_map)
> > >  			limits = md->queue->limits;
> > >  		dm_put_live_table_fast(md);
> > >  	}
> > >  
> > > -	if (!live_map) {
> > > +	if (update_limits && !live_map) {
> > >  		r = dm_calculate_queue_limits(table, &limits);
> > >  		if (r) {
> > >  			map = ERR_PTR(r);
> > > @@ -2877,7 +2891,7 @@ struct dm_table *dm_swap_table(struct
> > > mapped_device *md, struct dm_table *table)
> > >  		}
> > >  	}
> > >  
> > > -	map = __bind(md, table, &limits);
> > > +	map = __bind(md, table, update_limits ? &limits : NULL);
> > >  	dm_issue_global_event();
> > >  
> > >  out:
> > > @@ -2930,7 +2944,6 @@ static int __dm_suspend(struct
> > > mapped_device
> > > *md, struct dm_table *map,
> > >  
> > >  	/*
> > >  	 * DMF_NOFLUSH_SUSPENDING must be set before presuspend.
> > > -	 * This flag is cleared before dm_suspend returns.
> > >  	 */
> > >  	if (noflush)
> > >  		set_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> > > @@ -2993,8 +3006,6 @@ static int __dm_suspend(struct
> > > mapped_device
> > > *md, struct dm_table *map,
> > >  	if (!r)
> > >  		set_bit(dmf_suspended_flag, &md->flags);
> > >  
> > > -	if (noflush)
> > > -		clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> > >  	if (map)
> > >  		synchronize_srcu(&md->io_barrier);
> > 
> > This moves the clear_bit() behind synchronize_rcu(). Is that safe?
> 
> It's not just moved behind the synchronize_rcu().
> DMF_NOFLUSH_SUSPENDING
> is now set for the entire time the device is suspended. If people
> would
> like, I'll update the patch to rename it to DMF_NOFLUSH_SUSPEND.

Right. I realize now that there's a smp_mb__after_atomic() in
dm_queue_flush().

> I did check to see if holding it for the entire suspend would cause
> issues, but I didn't see any case where it would. If I missed a 
> case where __noflush_suspending() should only return true if we are
> actually in the process of suspending, I can easily update that
> function to do that.

If this is necessary, I agree that the flag an related function should
be renamed. But there are already generic DM flags to indicate that a
queue is suspend*ed*. Perhaps, instead of changing the semantics of
DMF_NOFLUSH_SUSPENDING, it would make more sense to test 

  (__noflush_suspending || test_bit(DMF_BLOCK_IO_FOR_SUSPEND)

in dm_swap_table()?

Anyway, I am still not convinced that we want to have this specific
exception for multipath only.

> > In general, I'm wondering whether we need a more generic solution
> > to
> > this problem. Therefore I've added linux-block to cc.
> > 
> > The way I see it, if a device has queued IO without any means to
> > perform the IO, it can't be frozen. We'd either need to fail all
> > queued
> > IO in this case, or refuse attempts to freeze the queue.
> 
> In general, it's perfectly normal to start freezing the queue while
> there are still outstanding requests.

Sure, but my point was "without any means to perform the IO". As you
pointed out yourself, a freeze attempt in this situation would get
stuck.

I find Bart's approach very attractive; freezing might not be necessary
at all in that case. We'dd just need to avoid a race where paths get
reinstated while the operation that would normally have required a
freeze is ongoing.

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-10 10:19     ` Martin Wilck
@ 2025-10-21 19:16       ` Benjamin Marzinski
  2025-10-22 14:11         ` Martin Wilck
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Marzinski @ 2025-10-21 19:16 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Bart Van Assche, Mikulas Patocka, Mike Snitzer, dm-devel,
	linux-block

On Fri, Oct 10, 2025 at 12:19:51PM +0200, Martin Wilck wrote:
> On Thu, 2025-10-09 at 18:25 -0400, Benjamin Marzinski wrote:
> > On Thu, Oct 09, 2025 at 11:57:08AM +0200, Martin Wilck wrote:
> > > On Wed, 2025-10-08 at 23:04 -0400, Benjamin Marzinski wrote:
> > > > Request-based devices (dm-multipath) queue I/O in blk-mq on
> > > > noflush
> > > > suspends. Any queued IO will make it impossible to freeze the
> > > > queue.
> > > > If
> > > > a process attempts to update the queue limits while there is
> > > > queued
> > > > IO,
> > > > it can be get stuck holding the limits lock, while unable to
> > > > freeze
> > > > the
> > > > queue. If device-mapper then attempts to update the limits during
> > > > a
> > > > table swap, it will deadlock trying to grab the limits lock while
> > > > making
> > > > it impossible to flush the IO.
> > > > 
> > > > Disallow updating the queue limits during a table swap, when
> > > > updating
> > > > an
> > > > immutable request-based dm device (dm-multipath) during a noflush
> > > > suspend. It is userspace's responsibility to make sure that the
> > > > new
> > > > table uses the same limits as the existing table if it asks for a
> > > > noflush suspend.
> > > > 
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > ---
> > > >  drivers/md/dm-table.c |  4 ++++
> > > >  drivers/md/dm-thin.c  |  7 ++-----
> > > >  drivers/md/dm.c       | 35 +++++++++++++++++++++++------------
> > > >  3 files changed, 29 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > > index ad0a60a07b93..0522cd700e0e 100644
> > > > --- a/drivers/md/dm-table.c
> > > > +++ b/drivers/md/dm-table.c
> > > > @@ -2043,6 +2043,10 @@ bool dm_table_supports_size_change(struct
> > > > dm_table *t, sector_t old_size,
> > > >  	return true;
> > > >  }
> > > >  
> > > > +/*
> > > > + * This function will be skipped by noflush reloads of immutable
> > > > request
> > > > + * based devices (dm-mpath).
> > > > + */
> > > >  int dm_table_set_restrictions(struct dm_table *t, struct
> > > > request_queue *q,
> > > >  			      struct queue_limits *limits)
> > > >  {
> > > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > > > index c84149ba4e38..6f98936f0e05 100644
> > > > --- a/drivers/md/dm-thin.c
> > > > +++ b/drivers/md/dm-thin.c
> > > > @@ -4383,11 +4383,8 @@ static void thin_postsuspend(struct
> > > > dm_target
> > > > *ti)
> > > >  {
> > > >  	struct thin_c *tc = ti->private;
> > > >  
> > > > -	/*
> > > > -	 * The dm_noflush_suspending flag has been cleared by
> > > > now,
> > > > so
> > > > -	 * unfortunately we must always run this.
> > > > -	 */
> > > > -	noflush_work(tc, do_noflush_stop);
> > > > +	if (dm_noflush_suspending(ti))
> > > > +		noflush_work(tc, do_noflush_stop);
> > > >  }
> > > >  
> > > >  static int thin_preresume(struct dm_target *ti)
> > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > > index 3ede5ba4cf7e..87e65c48dd04 100644
> > > > --- a/drivers/md/dm.c
> > > > +++ b/drivers/md/dm.c
> > > > @@ -2439,7 +2439,6 @@ static struct dm_table *__bind(struct
> > > > mapped_device *md, struct dm_table *t,
> > > >  {
> > > >  	struct dm_table *old_map;
> > > >  	sector_t size, old_size;
> > > > -	int ret;
> > > >  
> > > >  	lockdep_assert_held(&md->suspend_lock);
> > > >  
> > > > @@ -2454,11 +2453,13 @@ static struct dm_table *__bind(struct
> > > > mapped_device *md, struct dm_table *t,
> > > >  
> > > >  	set_capacity(md->disk, size);
> > > >  
> > > > -	ret = dm_table_set_restrictions(t, md->queue, limits);
> > > > -	if (ret) {
> > > > -		set_capacity(md->disk, old_size);
> > > > -		old_map = ERR_PTR(ret);
> > > > -		goto out;
> > > > +	if (limits) {
> > > > +		int ret = dm_table_set_restrictions(t, md-
> > > > >queue,
> > > > limits);
> > > > +		if (ret) {
> > > > +			set_capacity(md->disk, old_size);
> > > > +			old_map = ERR_PTR(ret);
> > > > +			goto out;
> > > > +		}
> > > >  	}
> > > >  
> > > >  	/*
> > > > @@ -2836,6 +2837,7 @@ static void dm_wq_work(struct work_struct
> > > > *work)
> > > >  
> > > >  static void dm_queue_flush(struct mapped_device *md)
> > > >  {
> > > > +	clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> > > >  	clear_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
> > > >  	smp_mb__after_atomic();
> > > >  	queue_work(md->wq, &md->work);
> > > > @@ -2848,6 +2850,7 @@ struct dm_table *dm_swap_table(struct
> > > > mapped_device *md, struct dm_table *table)
> > > >  {
> > > >  	struct dm_table *live_map = NULL, *map = ERR_PTR(-
> > > > EINVAL);
> > > >  	struct queue_limits limits;
> > > > +	bool update_limits = true;
> > > >  	int r;
> > > >  
> > > >  	mutex_lock(&md->suspend_lock);
> > > > @@ -2856,20 +2859,31 @@ struct dm_table *dm_swap_table(struct
> > > > mapped_device *md, struct dm_table *table)
> > > >  	if (!dm_suspended_md(md))
> > > >  		goto out;
> > > >  
> > > > +	/*
> > > > +	 * To avoid a potential deadlock locking the queue
> > > > limits,
> > > > disallow
> > > > +	 * updating the queue limits during a table swap, when
> > > > updating an
> > > > +	 * immutable request-based dm device (dm-multipath)
> > > > during a
> > > > noflush
> > > > +	 * suspend. It is userspace's responsibility to make
> > > > sure
> > > > that the new
> > > > +	 * table uses the same limits as the existing table, if
> > > > it
> > > > asks for a
> > > > +	 * noflush suspend.
> > > > +	 */
> > > > +	if (dm_request_based(md) && md->immutable_target &&
> > > > +	    __noflush_suspending(md))
> > > > +		update_limits = false;
> > > >  	/*
> > > >  	 * If the new table has no data devices, retain the
> > > > existing
> > > > limits.
> > > >  	 * This helps multipath with queue_if_no_path if all
> > > > paths
> > > > disappear,
> > > >  	 * then new I/O is queued based on these limits, and
> > > > then
> > > > some paths
> > > >  	 * reappear.
> > > >  	 */
> > > > -	if (dm_table_has_no_data_devices(table)) {
> > > > +	else if (dm_table_has_no_data_devices(table)) {
> > > >  		live_map = dm_get_live_table_fast(md);
> > > >  		if (live_map)
> > > >  			limits = md->queue->limits;
> > > >  		dm_put_live_table_fast(md);
> > > >  	}
> > > >  
> > > > -	if (!live_map) {
> > > > +	if (update_limits && !live_map) {
> > > >  		r = dm_calculate_queue_limits(table, &limits);
> > > >  		if (r) {
> > > >  			map = ERR_PTR(r);
> > > > @@ -2877,7 +2891,7 @@ struct dm_table *dm_swap_table(struct
> > > > mapped_device *md, struct dm_table *table)
> > > >  		}
> > > >  	}
> > > >  
> > > > -	map = __bind(md, table, &limits);
> > > > +	map = __bind(md, table, update_limits ? &limits : NULL);
> > > >  	dm_issue_global_event();
> > > >  
> > > >  out:
> > > > @@ -2930,7 +2944,6 @@ static int __dm_suspend(struct
> > > > mapped_device
> > > > *md, struct dm_table *map,
> > > >  
> > > >  	/*
> > > >  	 * DMF_NOFLUSH_SUSPENDING must be set before presuspend.
> > > > -	 * This flag is cleared before dm_suspend returns.
> > > >  	 */
> > > >  	if (noflush)
> > > >  		set_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> > > > @@ -2993,8 +3006,6 @@ static int __dm_suspend(struct
> > > > mapped_device
> > > > *md, struct dm_table *map,
> > > >  	if (!r)
> > > >  		set_bit(dmf_suspended_flag, &md->flags);
> > > >  
> > > > -	if (noflush)
> > > > -		clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
> > > >  	if (map)
> > > >  		synchronize_srcu(&md->io_barrier);
> > > 
> > > This moves the clear_bit() behind synchronize_rcu(). Is that safe?
> > 
> > It's not just moved behind the synchronize_rcu().
> > DMF_NOFLUSH_SUSPENDING
> > is now set for the entire time the device is suspended. If people
> > would
> > like, I'll update the patch to rename it to DMF_NOFLUSH_SUSPEND.
> 
> Right. I realize now that there's a smp_mb__after_atomic() in
> dm_queue_flush().
> 
> > I did check to see if holding it for the entire suspend would cause
> > issues, but I didn't see any case where it would. If I missed a 
> > case where __noflush_suspending() should only return true if we are
> > actually in the process of suspending, I can easily update that
> > function to do that.
> 
> If this is necessary, I agree that the flag an related function should
> be renamed. But there are already generic DM flags to indicate that a
> queue is suspend*ed*. Perhaps, instead of changing the semantics of
> DMF_NOFLUSH_SUSPENDING, it would make more sense to test 
> 
>   (__noflush_suspending || test_bit(DMF_BLOCK_IO_FOR_SUSPEND)
> 
> in dm_swap_table()?

Won't we ALWAYS be suspended when we are in dm_swap_table()? We do need
to refresh the limits in some cases (the cases where multipath-tools
currently reloads the table without setting noflush). What we need to
know is "is this table swap happening in a noflush suspend, where
userspace understands that it can't modify the device table in a way
that would change the limits". For multipath, this is almost always the
case. 

> 
> Anyway, I am still not convinced that we want to have this specific
> exception for multipath only.

I agree that Bart's solution looks better to me, if it can get in.
 
> > > In general, I'm wondering whether we need a more generic solution
> > > to
> > > this problem. Therefore I've added linux-block to cc.
> > > 
> > > The way I see it, if a device has queued IO without any means to
> > > perform the IO, it can't be frozen. We'd either need to fail all
> > > queued
> > > IO in this case, or refuse attempts to freeze the queue.
> > 
> > In general, it's perfectly normal to start freezing the queue while
> > there are still outstanding requests.
> 
> Sure, but my point was "without any means to perform the IO". As you
> pointed out yourself, a freeze attempt in this situation would get
> stuck.
> 
> I find Bart's approach very attractive; freezing might not be necessary
> at all in that case. We'dd just need to avoid a race where paths get
> reinstated while the operation that would normally have required a
> freeze is ongoing.

I agree. Even just the timing out of freezes, his
"[PATCH 2/3] block: Restrict the duration of sysfs attribute changes"
would be enough to keep this from deadlocking the system.

-Ben

> Martin


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-21 19:16       ` Benjamin Marzinski
@ 2025-10-22 14:11         ` Martin Wilck
  2025-10-23 17:39           ` Benjamin Marzinski
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Wilck @ 2025-10-22 14:11 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Bart Van Assche, Mikulas Patocka, Mike Snitzer, dm-devel,
	linux-block

On Tue, 2025-10-21 at 15:16 -0400, Benjamin Marzinski wrote:
> On Fri, Oct 10, 2025 at 12:19:51PM +0200, Martin Wilck wrote:
> > On Thu, 2025-10-09 at 18:25 -0400, Benjamin Marzinski wrote:
> > 
> > 
> > > I did check to see if holding it for the entire suspend would
> > > cause
> > > issues, but I didn't see any case where it would. If I missed a 
> > > case where __noflush_suspending() should only return true if we
> > > are
> > > actually in the process of suspending, I can easily update that
> > > function to do that.
> > 
> > If this is necessary, I agree that the flag an related function
> > should
> > be renamed. But there are already generic DM flags to indicate that
> > a
> > queue is suspend*ed*. Perhaps, instead of changing the semantics of
> > DMF_NOFLUSH_SUSPENDING, it would make more sense to test 
> > 
> >   (__noflush_suspending || test_bit(DMF_BLOCK_IO_FOR_SUSPEND)
> > 
> > in dm_swap_table()?
> 
> Won't we ALWAYS be suspended when we are in dm_swap_table()? We do
> need
> to refresh the limits in some cases (the cases where multipath-tools
> currently reloads the table without setting noflush). What we need to
> know is "is this table swap happening in a noflush suspend, where
> userspace understands that it can't modify the device table in a way
> that would change the limits". For multipath, this is almost always
> the
> case. 

Ok, getting it now. The semantics of the flag are changed from "device
is noflush-suspending" to "device is either noflush-suspending or
noflush-suspended". It isn't easy to express this in a simple flag
name. I'm fine with not renaming the flag, if a comment is added that
explains the semantics clearly.

> > 
> > I find Bart's approach very attractive; freezing might not be
> > necessary
> > at all in that case. We'dd just need to avoid a race where paths
> > get
> > reinstated while the operation that would normally have required a
> > freeze is ongoing.
> 
> I agree. Even just the timing out of freezes, his
> "[PATCH 2/3] block: Restrict the duration of sysfs attribute changes"
> would be enough to keep this from deadlocking the system.
> 

OK, let's see how it goes. Given your explanations, I'm ok with your
patch, too.

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-22 14:11         ` Martin Wilck
@ 2025-10-23 17:39           ` Benjamin Marzinski
  2025-10-23 17:54             ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Marzinski @ 2025-10-23 17:39 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Bart Van Assche, Mikulas Patocka, Mike Snitzer, dm-devel,
	linux-block

On Wed, Oct 22, 2025 at 04:11:58PM +0200, Martin Wilck wrote:
> On Tue, 2025-10-21 at 15:16 -0400, Benjamin Marzinski wrote:
> > On Fri, Oct 10, 2025 at 12:19:51PM +0200, Martin Wilck wrote:
> > > On Thu, 2025-10-09 at 18:25 -0400, Benjamin Marzinski wrote:
> > > 
> > > 
> > > > I did check to see if holding it for the entire suspend would
> > > > cause
> > > > issues, but I didn't see any case where it would. If I missed a 
> > > > case where __noflush_suspending() should only return true if we
> > > > are
> > > > actually in the process of suspending, I can easily update that
> > > > function to do that.
> > > 
> > > If this is necessary, I agree that the flag an related function
> > > should
> > > be renamed. But there are already generic DM flags to indicate that
> > > a
> > > queue is suspend*ed*. Perhaps, instead of changing the semantics of
> > > DMF_NOFLUSH_SUSPENDING, it would make more sense to test 
> > > 
> > >   (__noflush_suspending || test_bit(DMF_BLOCK_IO_FOR_SUSPEND)
> > > 
> > > in dm_swap_table()?
> > 
> > Won't we ALWAYS be suspended when we are in dm_swap_table()? We do
> > need
> > to refresh the limits in some cases (the cases where multipath-tools
> > currently reloads the table without setting noflush). What we need to
> > know is "is this table swap happening in a noflush suspend, where
> > userspace understands that it can't modify the device table in a way
> > that would change the limits". For multipath, this is almost always
> > the
> > case. 
> 
> Ok, getting it now. The semantics of the flag are changed from "device
> is noflush-suspending" to "device is either noflush-suspending or
> noflush-suspended". It isn't easy to express this in a simple flag
> name. I'm fine with not renaming the flag, if a comment is added that
> explains the semantics clearly.
> 
> > > 
> > > I find Bart's approach very attractive; freezing might not be
> > > necessary
> > > at all in that case. We'dd just need to avoid a race where paths
> > > get
> > > reinstated while the operation that would normally have required a
> > > freeze is ongoing.
> > 
> > I agree. Even just the timing out of freezes, his
> > "[PATCH 2/3] block: Restrict the duration of sysfs attribute changes"
> > would be enough to keep this from deadlocking the system.
> > 
> 
> OK, let's see how it goes. Given your explanations, I'm ok with your
> patch, too.

I see Mikulas pulled this commit into linux-dm. Bart, does this solve
your issue? Looking at your hang, it should. Also, do you have any
interest in attempting again to get your fixes upstream?

-Ben

> 
> Martin


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm: Fix deadlock when reloading a multipath table
  2025-10-23 17:39           ` Benjamin Marzinski
@ 2025-10-23 17:54             ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2025-10-23 17:54 UTC (permalink / raw)
  To: Benjamin Marzinski, Martin Wilck
  Cc: Bart Van Assche, Mikulas Patocka, Mike Snitzer, dm-devel,
	linux-block

On 10/23/25 10:39 AM, Benjamin Marzinski wrote:
> On Wed, Oct 22, 2025 at 04:11:58PM +0200, Martin Wilck wrote:
>> On Tue, 2025-10-21 at 15:16 -0400, Benjamin Marzinski wrote:
>>> On Fri, Oct 10, 2025 at 12:19:51PM +0200, Martin Wilck wrote:
>>>> I find Bart's approach very attractive; freezing might not be
>>>> necessary
>>>> at all in that case. We'dd just need to avoid a race where paths
>>>> get
>>>> reinstated while the operation that would normally have required a
>>>> freeze is ongoing.
>>>
>>> I agree. Even just the timing out of freezes, his
>>> "[PATCH 2/3] block: Restrict the duration of sysfs attribute changes"
>>> would be enough to keep this from deadlocking the system.
>>>
>>
>> OK, let's see how it goes. Given your explanations, I'm ok with your
>> patch, too.
> 
> I see Mikulas pulled this commit into linux-dm. Bart, does this solve
> your issue? Looking at your hang, it should. Also, do you have any
> interest in attempting again to get your fixes upstream?

I think that I have done what I could to get this regression fixed. As
one can see here I reported on June 30 that a regression got introduced
in the block layer:
https://lore.kernel.org/linux-block/765d62a8-bb5d-4f1d-8996-afc005bc2d1d@acm.org/

Although Linus Torvalds does not tolerate regressions in the Linux
kernel, the people who introduced this regression were not impressed
by my email, refused my patches, did not revert their own patches that
introduced the regression and did not suggest an alternative approach
for fixing the regression.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-10-23 17:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20251009030431.2895495-1-bmarzins@redhat.com>
2025-10-09  9:57 ` [PATCH] dm: Fix deadlock when reloading a multipath table Martin Wilck
2025-10-09 22:25   ` Benjamin Marzinski
2025-10-10 10:19     ` Martin Wilck
2025-10-21 19:16       ` Benjamin Marzinski
2025-10-22 14:11         ` Martin Wilck
2025-10-23 17:39           ` Benjamin Marzinski
2025-10-23 17:54             ` Bart Van Assche
2025-10-09 23:29   ` Bart Van Assche
2025-10-10  0:39     ` Benjamin Marzinski
2025-10-10  9:23     ` Martin Wilck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).