linux-block.vger.kernel.org archive mirror
* [PATCH 0/2] block: fix deadlock caused by atomic limits update
@ 2024-12-16  8:02 Ming Lei
  2024-12-16  8:02 ` [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits Ming Lei
  2024-12-16  8:02 ` [PATCH 2/2] block: remove queue_limits_cancel_update() Ming Lei
  0 siblings, 2 replies; 23+ messages in thread
From: Ming Lei @ 2024-12-16  8:02 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, Ming Lei

Hello,

The 1st patch fixes a deadlock caused by the atomic limits update.

The 2nd patch removes queue_limits_cancel_update(), which is now a no-op.


Ming Lei (2):
  block: avoid to hold q->limits_lock across APIs for atomic update
    queue limits
  block: remove queue_limits_cancel_update()

 block/blk-settings.c     |  2 +-
 drivers/md/md.c          |  1 -
 drivers/md/raid0.c       |  4 +---
 drivers/md/raid1.c       |  4 +---
 drivers/md/raid10.c      |  4 +---
 drivers/scsi/scsi_scan.c |  1 -
 include/linux/blkdev.h   | 20 ++++++--------------
 7 files changed, 10 insertions(+), 26 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-16  8:02 [PATCH 0/2] block: fix deadlock caused by atomic limits update Ming Lei
@ 2024-12-16  8:02 ` Ming Lei
  2024-12-16 15:49   ` Christoph Hellwig
  2024-12-16  8:02 ` [PATCH 2/2] block: remove queue_limits_cancel_update() Ming Lei
  1 sibling, 1 reply; 23+ messages in thread
From: Ming Lei @ 2024-12-16  8:02 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, Ming Lei

Commit d690cb8ae14b ("block: add an API to atomically update queue limits")
added APIs for updating queue limits atomically: q->limits_lock is grabbed
in queue_limits_start_update() and released in queue_limits_commit_update().

This approach is fragile and makes it easy to introduce deadlocks [1][2].

More importantly, queue_limits_start_update() returns a local copy of
q->limits, the API user then modifies that local copy, and finally
commits it via queue_limits_commit_update().

Holding q->limits_lock therefore protects nothing for the local copy,
and preventing a new update & commit from happening provides no real
benefit, since what matters is that validation & commit are done
atomically.

Change the API so that q->limits_lock is not held across the atomic queue
limits update APIs, fixing the deadlock and making the API easier to use.

[1] https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
[2] https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/

Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c   | 2 +-
 include/linux/blkdev.h | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8f09e33f41f6..b737428c6084 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -422,6 +422,7 @@ int queue_limits_commit_update(struct request_queue *q,
 {
 	int error;
 
+	mutex_lock(&q->limits_lock);
 	error = blk_validate_limits(lim);
 	if (error)
 		goto out_unlock;
@@ -456,7 +457,6 @@ EXPORT_SYMBOL_GPL(queue_limits_commit_update);
  */
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
 {
-	mutex_lock(&q->limits_lock);
 	return queue_limits_commit_update(q, lim);
 }
 EXPORT_SYMBOL_GPL(queue_limits_set);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 378d3a1a22fc..6cc20ca79adc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -944,8 +944,13 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
 static inline struct queue_limits
 queue_limits_start_update(struct request_queue *q)
 {
+	struct queue_limits lim;
+
 	mutex_lock(&q->limits_lock);
-	return q->limits;
+	lim = q->limits;
+	mutex_unlock(&q->limits_lock);
+
+	return lim;
 }
 int queue_limits_commit_update(struct request_queue *q,
 		struct queue_limits *lim);
@@ -962,7 +967,6 @@ int blk_validate_limits(struct queue_limits *lim);
  */
 static inline void queue_limits_cancel_update(struct request_queue *q)
 {
-	mutex_unlock(&q->limits_lock);
 }
 
 /*
-- 
2.47.0



* [PATCH 2/2] block: remove queue_limits_cancel_update()
  2024-12-16  8:02 [PATCH 0/2] block: fix deadlock caused by atomic limits update Ming Lei
  2024-12-16  8:02 ` [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits Ming Lei
@ 2024-12-16  8:02 ` Ming Lei
  1 sibling, 0 replies; 23+ messages in thread
From: Ming Lei @ 2024-12-16  8:02 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, Ming Lei

queue_limits_cancel_update() is now a no-op, so remove it.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/md.c          |  1 -
 drivers/md/raid0.c       |  4 +---
 drivers/md/raid1.c       |  4 +---
 drivers/md/raid10.c      |  4 +---
 drivers/scsi/scsi_scan.c |  1 -
 include/linux/blkdev.h   | 12 ------------
 6 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index aebe12b0ee27..4a3e109dfa11 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5788,7 +5788,6 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
 	if (!queue_limits_stack_integrity_bdev(&lim, rdev->bdev)) {
 		pr_err("%s: incompatible integrity profile for %pg\n",
 		       mdname(mddev), rdev->bdev);
-		queue_limits_cancel_update(mddev->gendisk->queue);
 		return -ENXIO;
 	}
 
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 7049ec7fb8eb..e8802309ed60 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -386,10 +386,8 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.io_opt = lim.io_min * mddev->raid_disks;
 	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
-	if (err) {
-		queue_limits_cancel_update(mddev->gendisk->queue);
+	if (err)
 		return err;
-	}
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 519c56f0ee3d..c6e53cc57440 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3241,10 +3241,8 @@ static int raid1_set_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = 0;
 	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
-	if (err) {
-		queue_limits_cancel_update(mddev->gendisk->queue);
+	if (err)
 		return err;
-	}
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7d7a8a2524dc..6acc96be77aa 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4042,10 +4042,8 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
 	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
-	if (err) {
-		queue_limits_cancel_update(mddev->gendisk->queue);
+	if (err)
 		return err;
-	}
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 042329b74c6e..3e3f64cec9ee 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1079,7 +1079,6 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	else if (hostt->slave_configure)
 		ret = hostt->slave_configure(sdev);
 	if (ret) {
-		queue_limits_cancel_update(sdev->request_queue);
 		/*
 		 * If the LLDD reports device not present, don't clutter the
 		 * console with failure messages.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6cc20ca79adc..b2542d3dcc23 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -957,18 +957,6 @@ int queue_limits_commit_update(struct request_queue *q,
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
 int blk_validate_limits(struct queue_limits *lim);
 
-/**
- * queue_limits_cancel_update - cancel an atomic update of queue limits
- * @q:		queue to update
- *
- * This functions cancels an atomic update of the queue limits started by
- * queue_limits_start_update() and should be used when an error occurs after
- * starting update.
- */
-static inline void queue_limits_cancel_update(struct request_queue *q)
-{
-}
-
 /*
  * These helpers are for drivers that have sloppy feature negotiation and might
  * have to disable DISCARD, WRITE_ZEROES or SECURE_DISCARD from the I/O
-- 
2.47.0



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-16  8:02 ` [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits Ming Lei
@ 2024-12-16 15:49   ` Christoph Hellwig
  2024-12-17  1:52     ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2024-12-16 15:49 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On Mon, Dec 16, 2024 at 04:02:03PM +0800, Ming Lei wrote:
> More importantly, queue_limits_start_update() returns a local copy of
> q->limits, the API user then modifies that local copy, and finally
> commits it via queue_limits_commit_update().
> 
> Holding q->limits_lock therefore protects nothing for the local copy,
> and preventing a new update & commit from happening provides no real
> benefit, since what matters is that validation & commit are done
> atomically.

It protects against someone else changing the copy in the queue while
the current thread is updating the local copy, and thus incoherent
updates.  So no, we can't just remove it.



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-16 15:49   ` Christoph Hellwig
@ 2024-12-17  1:52     ` Ming Lei
  2024-12-17  4:40       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2024-12-17  1:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Mon, Dec 16, 2024 at 04:49:01PM +0100, Christoph Hellwig wrote:
> On Mon, Dec 16, 2024 at 04:02:03PM +0800, Ming Lei wrote:
> > More importantly, queue_limits_start_update() returns a local copy of
> > q->limits, the API user then modifies that local copy, and finally
> > commits it via queue_limits_commit_update().
> > 
> > Holding q->limits_lock therefore protects nothing for the local copy,
> > and preventing a new update & commit from happening provides no real
> > benefit, since what matters is that validation & commit are done
> > atomically.
> 
> It protects against someone else changing the copy in the queue while
> the current thread is updating the local copy, and thus incoherent
> updates.  So no, we can't just remove it.

The local copy can be updated in any way with any data, so does another
concurrent update on q->limits really matter?


thanks,
Ming



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-17  1:52     ` Ming Lei
@ 2024-12-17  4:40       ` Christoph Hellwig
  2024-12-17  7:05         ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2024-12-17  4:40 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> The local copy can be updated in any way with any data, so does another
> concurrent update on q->limits really matter?

Yes, because that means one of the updates get lost even if it is
for entirely separate fields.



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-17  4:40       ` Christoph Hellwig
@ 2024-12-17  7:05         ` Ming Lei
  2024-12-17  7:19           ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2024-12-17  7:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> > The local copy can be updated in any way with any data, so does another
> > concurrent update on q->limits really matter?
> 
> Yes, because that means one of the updates get lost even if it is
> for entirely separate fields.

Right, but the limits are still valid anytime.

Any suggestion for fixing this deadlock?

One idea is to add queue_limits_start_try_update() and use it in
sysfs ->store(), where an update failure is tolerable.


Thanks,
Ming



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-17  7:05         ` Ming Lei
@ 2024-12-17  7:19           ` Christoph Hellwig
  2024-12-17  7:30             ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2024-12-17  7:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> > On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> > > The local copy can be updated in any way with any data, so does another
> > > concurrent update on q->limits really matter?
> > 
> > Yes, because that means one of the updates get lost even if it is
> > for entirely separate fields.
> 
> Right, but the limits are still valid anytime.
> 
> Any suggestion for fixing this deadlock?

What is "this deadlock"?



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-17  7:19           ` Christoph Hellwig
@ 2024-12-17  7:30             ` Ming Lei
  2024-12-17 16:07               ` Damien Le Moal
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2024-12-17  7:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
> > On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> > > On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> > > > The local copy can be updated in any way with any data, so does another
> > > > concurrent update on q->limits really matter?
> > > 
> > > Yes, because that means one of the updates get lost even if it is
> > > for entirely separate fields.
> > 
> > Right, but the limits are still valid anytime.
> > 
> > Any suggestion for fixing this deadlock?
> 
> What is "this deadlock"?

The commit log provides two reports:

- lockdep warning

https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/

- real deadlock report

https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/

It is actually one simple ABBA lock:

1) queue_attr_store()

      blk_mq_freeze_queue(q);					//queue freeze lock
      res = entry->store(disk, page, length);
	  			queue_limits_start_update		//->limits_lock
				...
				queue_limits_commit_update
      blk_mq_unfreeze_queue(q);

2) sd_revalidate_disk()

queue_limits_start_update					//->limits_lock
	sd_read_capacity()
		scsi_execute_cmd
			scsi_alloc_request
				blk_queue_enter					//queue freeze lock


Thanks,
Ming



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-17  7:30             ` Ming Lei
@ 2024-12-17 16:07               ` Damien Le Moal
  2024-12-18  2:09                 ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2024-12-17 16:07 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig; +Cc: Jens Axboe, linux-block

On 2024/12/16 23:30, Ming Lei wrote:
> On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
>> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
>>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
>>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
>>>>> The local copy can be updated in any way with any data, so does another
>>>>> concurrent update on q->limits really matter?
>>>>
>>>> Yes, because that means one of the updates get lost even if it is
>>>> for entirely separate fields.
>>>
>>> Right, but the limits are still valid anytime.
>>>
>>> Any suggestion for fixing this deadlock?
>>
>> What is "this deadlock"?
> 
> The commit log provides two reports:
> 
> - lockdep warning
> 
> https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
> 
> - real deadlock report
> 
> https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
> 
> It is actually one simple ABBA lock:
> 
> 1) queue_attr_store()
> 
>       blk_mq_freeze_queue(q);					//queue freeze lock
>       res = entry->store(disk, page, length);
> 	  			queue_limits_start_update		//->limits_lock
> 				...
> 				queue_limits_commit_update
>       blk_mq_unfreeze_queue(q);

The locking + freeze pattern should be:

	lim = queue_limits_start_update(q);
	...
	blk_mq_freeze_queue(q);
	ret = queue_limits_commit_update(q, &lim);
	blk_mq_unfreeze_queue(q);

This pattern is used in most places and anything that does not use it is likely
susceptible to a similar ABBA deadlock. We should probably look into trying to
integrate the freeze/unfreeze calls directly into queue_limits_commit_update().

Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
sd_revalidate_disk().

> 
> 2) sd_revalidate_disk()
> 
> queue_limits_start_update					//->limits_lock
> 	sd_read_capacity()
> 		scsi_execute_cmd
> 			scsi_alloc_request
> 				blk_queue_enter					//queue freeze lock
> 
> 
> Thanks,
> Ming
> 
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-17 16:07               ` Damien Le Moal
@ 2024-12-18  2:09                 ` Ming Lei
  2024-12-18 11:33                   ` Nilay Shroff
  2024-12-19  6:15                   ` Christoph Hellwig
  0 siblings, 2 replies; 23+ messages in thread
From: Ming Lei @ 2024-12-18  2:09 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On Tue, Dec 17, 2024 at 08:07:06AM -0800, Damien Le Moal wrote:
> On 2024/12/16 23:30, Ming Lei wrote:
> > On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
> >> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
> >>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> >>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> >>>>> The local copy can be updated in any way with any data, so does another
> >>>>> concurrent update on q->limits really matter?
> >>>>
> >>>> Yes, because that means one of the updates get lost even if it is
> >>>> for entirely separate fields.
> >>>
> >>> Right, but the limits are still valid anytime.
> >>>
> >>> Any suggestion for fixing this deadlock?
> >>
> >> What is "this deadlock"?
> > 
> > The commit log provides two reports:
> > 
> > - lockdep warning
> > 
> > https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
> > 
> > - real deadlock report
> > 
> > https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
> > 
> > It is actually one simple ABBA lock:
> > 
> > 1) queue_attr_store()
> > 
> >       blk_mq_freeze_queue(q);					//queue freeze lock
> >       res = entry->store(disk, page, length);
> > 	  			queue_limits_start_update		//->limits_lock
> > 				...
> > 				queue_limits_commit_update
> >       blk_mq_unfreeze_queue(q);
> 
> The locking + freeze pattern should be:
> 
> 	lim = queue_limits_start_update(q);
> 	...
> 	blk_mq_freeze_queue(q);
> 	ret = queue_limits_commit_update(q, &lim);
> 	blk_mq_unfreeze_queue(q);
> 
> This pattern is used in most places and anything that does not use it is likely
> susceptible to a similar ABBA deadlock. We should probably look into trying to
> integrate the freeze/unfreeze calls directly into queue_limits_commit_update().
> 
> Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
> sd_revalidate_disk().

This way looks good, just commit af2814149883 ("block: freeze the queue in
queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
added to each queue attribute .store() handler.



Thanks,
Ming



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-18  2:09                 ` Ming Lei
@ 2024-12-18 11:33                   ` Nilay Shroff
  2024-12-18 13:40                     ` Ming Lei
  2024-12-19  6:15                   ` Christoph Hellwig
  1 sibling, 1 reply; 23+ messages in thread
From: Nilay Shroff @ 2024-12-18 11:33 UTC (permalink / raw)
  To: Ming Lei, Damien Le Moal; +Cc: Christoph Hellwig, Jens Axboe, linux-block



On 12/18/24 07:39, Ming Lei wrote:
> On Tue, Dec 17, 2024 at 08:07:06AM -0800, Damien Le Moal wrote:
>> On 2024/12/16 23:30, Ming Lei wrote:
>>> On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
>>>> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
>>>>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
>>>>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
>>>>>>> The local copy can be updated in any way with any data, so does another
>>>>>>> concurrent update on q->limits really matter?
>>>>>>
>>>>>> Yes, because that means one of the updates get lost even if it is
>>>>>> for entirely separate fields.
>>>>>
>>>>> Right, but the limits are still valid anytime.
>>>>>
>>>>> Any suggestion for fixing this deadlock?
>>>>
>>>> What is "this deadlock"?
>>>
>>> The commit log provides two reports:
>>>
>>> - lockdep warning
>>>
>>> https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
>>>
>>> - real deadlock report
>>>
>>> https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
>>>
>>> It is actually one simple ABBA lock:
>>>
>>> 1) queue_attr_store()
>>>
>>>       blk_mq_freeze_queue(q);					//queue freeze lock
>>>       res = entry->store(disk, page, length);
>>> 	  			queue_limits_start_update		//->limits_lock
>>> 				...
>>> 				queue_limits_commit_update
>>>       blk_mq_unfreeze_queue(q);
>>
>> The locking + freeze pattern should be:
>>
>> 	lim = queue_limits_start_update(q);
>> 	...
>> 	blk_mq_freeze_queue(q);
>> 	ret = queue_limits_commit_update(q, &lim);
>> 	blk_mq_unfreeze_queue(q);
>>
>> This pattern is used in most places and anything that does not use it is likely
>> susceptible to a similar ABBA deadlock. We should probably look into trying to
>> integrate the freeze/unfreeze calls directly into queue_limits_commit_update().
>>
>> Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
>> sd_revalidate_disk().
> 
> This way looks good, just commit af2814149883 ("block: freeze the queue in
> queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
> added to each queue attribute .store() handler.
> 
Wouldn't it be feasible to add blk-mq freeze in queue_limits_start_update()
and blk-mq unfreeze in queue_limits_commit_update()? If we do this then 
the pattern would be, 

queue_limits_start_update(): limit-lock + freeze
queue_limits_commit_update() : unfreeze + limit-unlock  

Then in queue_attr_store() we shall just remove freeze/unfreeze.

We also need to fix a few call sites that have this code block:

{
    blk_mq_freeze_queue()
    ...
    queue_limits_start_update()
    ...    
    queue_limits_commit_update()
    ...
    blk_mq_unfreeze_queue()
    
}

In the above code block, we may then replace blk_mq_freeze_queue() with
queue_limits_start_update() and similarly replace blk_mq_unfreeze_queue()
with queue_limits_commit_update().

{
    queue_limits_start_update()
    ...
    ...
    ...
    queue_limits_commit_update()

}

Thanks,
--Nilay




* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-18 11:33                   ` Nilay Shroff
@ 2024-12-18 13:40                     ` Ming Lei
  2024-12-18 14:05                       ` Nilay Shroff
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2024-12-18 13:40 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: Damien Le Moal, Christoph Hellwig, Jens Axboe, linux-block

On Wed, Dec 18, 2024 at 05:03:00PM +0530, Nilay Shroff wrote:
> 
> 
> On 12/18/24 07:39, Ming Lei wrote:
> > On Tue, Dec 17, 2024 at 08:07:06AM -0800, Damien Le Moal wrote:
> >> On 2024/12/16 23:30, Ming Lei wrote:
> >>> On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
> >>>> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
> >>>>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
> >>>>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
> >>>>>>> The local copy can be updated in any way with any data, so does another
> >>>>>>> concurrent update on q->limits really matter?
> >>>>>>
> >>>>>> Yes, because that means one of the updates get lost even if it is
> >>>>>> for entirely separate fields.
> >>>>>
> >>>>> Right, but the limits are still valid anytime.
> >>>>>
> >>>>> Any suggestion for fixing this deadlock?
> >>>>
> >>>> What is "this deadlock"?
> >>>
> >>> The commit log provides two reports:
> >>>
> >>> - lockdep warning
> >>>
> >>> https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
> >>>
> >>> - real deadlock report
> >>>
> >>> https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
> >>>
> >>> It is actually one simple ABBA lock:
> >>>
> >>> 1) queue_attr_store()
> >>>
> >>>       blk_mq_freeze_queue(q);					//queue freeze lock
> >>>       res = entry->store(disk, page, length);
> >>> 	  			queue_limits_start_update		//->limits_lock
> >>> 				...
> >>> 				queue_limits_commit_update
> >>>       blk_mq_unfreeze_queue(q);
> >>
> >> The locking + freeze pattern should be:
> >>
> >> 	lim = queue_limits_start_update(q);
> >> 	...
> >> 	blk_mq_freeze_queue(q);
> >> 	ret = queue_limits_commit_update(q, &lim);
> >> 	blk_mq_unfreeze_queue(q);
> >>
> >> This pattern is used in most places and anything that does not use it is likely
> >> susceptible to a similar ABBA deadlock. We should probably look into trying to
> >> integrate the freeze/unfreeze calls directly into queue_limits_commit_update().
> >>
> >> Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
> >> sd_revalidate_disk().
> > 
> > This way looks good, just commit af2814149883 ("block: freeze the queue in
> > queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
> > added to each queue attribute .store() handler.
> > 
> Wouldn't it be feasible to add blk-mq freeze in queue_limits_start_update()
> and blk-mq unfreeze in queue_limits_commit_update()? If we do this then 
> the pattern would be, 
> 
> queue_limits_start_update(): limit-lock + freeze
> queue_limits_commit_update() : unfreeze + limit-unlock  
> 
> Then in queue_attr_store() we shall just remove freeze/unfreeze.
> 
> We also need to fix few call sites where we've code block,
> 
> {
>     blk_mq_freeze_queue()
>     ...
>     queue_limits_start_update()
>     ...    
>     queue_limits_commit_update()
>     ...
>     blk_mq_unfreeze_queue()
>     
> }
> 
> In the above code block, we may then replace blk_mq_freeze_queue() with
> queue_limits_commit_update() and similarly replace blk_mq_unfreeze_queue() 
> with queue_limits_commit_update().
> 
> {
>     queue_limits_start_update()
>     ...
>     ...
>     ...
>     queue_limits_commit_update()

In sd_revalidate_disk(), a blk-mq request is allocated between
queue_limits_start_update() and queue_limits_commit_update(), i.e. with
->limits_lock held, so the ABBA deadlock is triggered because
blk_queue_enter() acquires the same lock (the freeze lock) as
blk_mq_freeze_queue().


Thanks,
Ming



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-18 13:40                     ` Ming Lei
@ 2024-12-18 14:05                       ` Nilay Shroff
  2024-12-18 14:57                         ` Damien Le Moal
  2024-12-19  6:17                         ` Christoph Hellwig
  0 siblings, 2 replies; 23+ messages in thread
From: Nilay Shroff @ 2024-12-18 14:05 UTC (permalink / raw)
  To: Ming Lei; +Cc: Damien Le Moal, Christoph Hellwig, Jens Axboe, linux-block



On 12/18/24 19:10, Ming Lei wrote:
> On Wed, Dec 18, 2024 at 05:03:00PM +0530, Nilay Shroff wrote:
>>
>>
>> On 12/18/24 07:39, Ming Lei wrote:
>>> On Tue, Dec 17, 2024 at 08:07:06AM -0800, Damien Le Moal wrote:
>>>> On 2024/12/16 23:30, Ming Lei wrote:
>>>>> On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
>>>>>> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
>>>>>>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
>>>>>>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
>>>>>>>>> The local copy can be updated in any way with any data, so does another
>>>>>>>>> concurrent update on q->limits really matter?
>>>>>>>>
>>>>>>>> Yes, because that means one of the updates get lost even if it is
>>>>>>>> for entirely separate fields.
>>>>>>>
>>>>>>> Right, but the limits are still valid anytime.
>>>>>>>
>>>>>>> Any suggestion for fixing this deadlock?
>>>>>>
>>>>>> What is "this deadlock"?
>>>>>
>>>>> The commit log provides two reports:
>>>>>
>>>>> - lockdep warning
>>>>>
>>>>> https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@hovoldconsulting.com/
>>>>>
>>>>> - real deadlock report
>>>>>
>>>>> https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
>>>>>
>>>>> It is actually one simple ABBA lock:
>>>>>
>>>>> 1) queue_attr_store()
>>>>>
>>>>>       blk_mq_freeze_queue(q);					//queue freeze lock
>>>>>       res = entry->store(disk, page, length);
>>>>> 	  			queue_limits_start_update		//->limits_lock
>>>>> 				...
>>>>> 				queue_limits_commit_update
>>>>>       blk_mq_unfreeze_queue(q);
>>>>
>>>> The locking + freeze pattern should be:
>>>>
>>>> 	lim = queue_limits_start_update(q);
>>>> 	...
>>>> 	blk_mq_freeze_queue(q);
>>>> 	ret = queue_limits_commit_update(q, &lim);
>>>> 	blk_mq_unfreeze_queue(q);
>>>>
>>>> This pattern is used in most places and anything that does not use it is likely
>>>> susceptible to a similar ABBA deadlock. We should probably look into trying to
>>>> integrate the freeze/unfreeze calls directly into queue_limits_commit_update().
>>>>
>>>> Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
>>>> sd_revalidate_disk().
>>>
>>> This way looks good, just commit af2814149883 ("block: freeze the queue in
>>> queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
>>> added to each queue attribute .store() handler.
>>>
>> Wouldn't it be feasible to add blk-mq freeze in queue_limits_start_update()
>> and blk-mq unfreeze in queue_limits_commit_update()? If we do this then 
>> the pattern would be, 
>>
>> queue_limits_start_update(): limit-lock + freeze
>> queue_limits_commit_update() : unfreeze + limit-unlock  
>>
>> Then in queue_attr_store() we shall just remove freeze/unfreeze.
>>
>> We also need to fix few call sites where we've code block,
>>
>> {
>>     blk_mq_freeze_queue()
>>     ...
>>     queue_limits_start_update()
>>     ...    
>>     queue_limits_commit_update()
>>     ...
>>     blk_mq_unfreeze_queue()
>>     
>> }
>>
>> In the above code block, we may then replace blk_mq_freeze_queue() with
>> queue_limits_start_update() and similarly replace blk_mq_unfreeze_queue() 
>> with queue_limits_commit_update().
>>
>> {
>>     queue_limits_start_update()
>>     ...
>>     ...
>>     ...
>>     queue_limits_commit_update()
> 
> In sd_revalidate_disk(), blk-mq request is allocated under queue_limits_start_update(),
> then ABBA deadlock is triggered since blk_queue_enter() implies same lock(freeze lock)
> from blk_mq_freeze_queue().
> 
Yeah, agreed, but I see sd_revalidate_disk() is probably the only exception 
that allocates a blk-mq request. Can't we fix it? 

One simple way I could think of would be updating the queue_limits_xxx() APIs
and then using them,

queue_limits_start_update(struct request_queue *q, bool freeze) 
{
    ...
    mutex_lock(&q->limits_lock);
    if(freeze)
        blk_mq_freeze_queue(q);
    ...
}

queue_limits_commit_update(struct request_queue *q,
		struct queue_limits *lim, bool unfreeze)
{
    ...
    ...
    if(unfreeze)
        blk_mq_unfreeze_queue(q);
    mutex_unlock(&q->limits_lock);
}

sd_revalidate_disk()
{
    ...
    ...
    queue_limits_start_update(q, false);

    ...
    ...

    blk_mq_freeze_queue(q);
    queue_limits_commit_update(q, lim, true);        
}

Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-18 14:05                       ` Nilay Shroff
@ 2024-12-18 14:57                         ` Damien Le Moal
  2024-12-19  6:20                           ` Christoph Hellwig
  2024-12-19  6:17                         ` Christoph Hellwig
  1 sibling, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2024-12-18 14:57 UTC (permalink / raw)
  To: Nilay Shroff, Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On 2024/12/18 6:05, Nilay Shroff wrote:
>>> In the above code block, we may then replace blk_mq_freeze_queue() with
>>> queue_limits_start_update() and similarly replace blk_mq_unfreeze_queue() 
>>> with queue_limits_commit_update().
>>>
>>> {
>>>     queue_limits_start_update()
>>>     ...
>>>     ...
>>>     ...
>>>     queue_limits_commit_update()
>>
>> In sd_revalidate_disk(), blk-mq request is allocated under queue_limits_start_update(),
>> then ABBA deadlock is triggered since blk_queue_enter() implies same lock(freeze lock)
>> from blk_mq_freeze_queue().
>>
> Yeah agreed but I see sd_revalidate_disk() is probably the only exception 
> which allocates the blk-mq request. Can't we fix it? 

If we change where limits_lock is taken now, we will again introduce races
between user config and discovery/revalidation, which is what
queue_limits_start_update() and queue_limits_commit_update() intended to fix in
the first place.

So changing sd_revalidate_disk() is not the right approach.

> One simple way I could think of would be updating queue_limit_xxx() APIs and
> then use it,
> 
> queue_limits_start_update(struct request_queue *q, bool freeze) 
> {
>     ...
>     mutex_lock(&q->limits_lock);
>     if(freeze)
>         blk_mq_freeze_queue(q);
>     ...
> }
> 
> queue_limits_commit_update(struct request_queue *q,
> 		struct queue_limits *lim, bool unfreeze)
> {
>     ...
>     ...
>     if(unfreeze)
>         blk_mq_unfreeze_queue(q);
>     mutex_unlock(&q->limits_lock);
> }
> 
> sd_revalidate_disk()
> {
>     ...
>     ...
>     queue_limits_start_update(q, false);
> 
>     ...
>     ...
> 
>     blk_mq_freeze_queue(q);
>     queue_limits_commit_update(q, lim, true);    

This is overly complicated ... As I suggested, I think that a simpler approach
is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside
queue_limits_commit_update(). Doing so, no driver should need to directly call
freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances
that have the update/freeze order wrong. As mentioned, the pattern simply needs
to be:

	queue_limits_start_update();
	...
	blk_mq_freeze_queue();
	queue_limits_commit_update();
	blk_mq_unfreeze_queue();

I fixed blk-zoned code last week like this. But I had not noticed that other
places had a similar issue as well.

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-18  2:09                 ` Ming Lei
  2024-12-18 11:33                   ` Nilay Shroff
@ 2024-12-19  6:15                   ` Christoph Hellwig
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-12-19  6:15 UTC (permalink / raw)
  To: Ming Lei; +Cc: Damien Le Moal, Christoph Hellwig, Jens Axboe, linux-block

On Wed, Dec 18, 2024 at 10:09:24AM +0800, Ming Lei wrote:
> This way looks good, just commit af2814149883 ("block: freeze the queue in
> queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
> added to each queue attribute .store() handler.

Actually the sysfs handlers need a lot more work, including sorting
out the sysfs_lock mess.  Gimme a little time to work on it.



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-18 14:05                       ` Nilay Shroff
  2024-12-18 14:57                         ` Damien Le Moal
@ 2024-12-19  6:17                         ` Christoph Hellwig
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-12-19  6:17 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Ming Lei, Damien Le Moal, Christoph Hellwig, Jens Axboe,
	linux-block

On Wed, Dec 18, 2024 at 07:35:36PM +0530, Nilay Shroff wrote:
> One simple way I could think of would be updating queue_limit_xxx() APIs and
> then use it,
> 
> queue_limits_start_update(struct request_queue *q, bool freeze) 
> {
>     ...
>     mutex_lock(&q->limits_lock);
>     if(freeze)
>         blk_mq_freeze_queue(q);
>     ...

Conditional locking (and the freeze sort of is a lock) is always a bad
idea.  But we can add a lower-level no-freeze API for this use case.



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-18 14:57                         ` Damien Le Moal
@ 2024-12-19  6:20                           ` Christoph Hellwig
  2024-12-19  7:16                             ` Nilay Shroff
  2024-12-21 13:03                             ` Nilay Shroff
  0 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-12-19  6:20 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Nilay Shroff, Ming Lei, Christoph Hellwig, Jens Axboe,
	linux-block

On Wed, Dec 18, 2024 at 06:57:45AM -0800, Damien Le Moal wrote:
> > Yeah agreed but I see sd_revalidate_disk() is probably the only exception 
> > which allocates the blk-mq request. Can't we fix it? 
> 
> If we change where limits_lock is taken now, we will again introduce races
> between user config and discovery/revalidation, which is what
> queue_limits_start_update() and queue_limits_commit_update() intended to fix in
> the first place.
> 
> So changing sd_revalidate_disk() is not the right approach.

Well, sd_revalidate_disk is a bit special in that it needs a command
on the same queue to query the information.  So it needs to be able
to issue commands without the queue frozen.  Freezing the queue inside
the limits lock supports that; sd just can't use the convenience helpers
that lock and freeze.

> This is overly complicated ... As I suggested, I think that a simpler approach
> is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside
> queue_limits_commit_update(). Doing so, no driver should need to directly call
> freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances
> that have the update/freeze order wrong. As mentioned, the pattern simply needs

Yes, the queue only needs to be frozen for the actual update,
which would remove the need for the locking.  The big question for both
variants is if we can get rid of all the callers that have the queue
already frozen and then start an update.



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-19  6:20                           ` Christoph Hellwig
@ 2024-12-19  7:16                             ` Nilay Shroff
  2024-12-21 13:03                             ` Nilay Shroff
  1 sibling, 0 replies; 23+ messages in thread
From: Nilay Shroff @ 2024-12-19  7:16 UTC (permalink / raw)
  To: Christoph Hellwig, Damien Le Moal; +Cc: Ming Lei, Jens Axboe, linux-block



On 12/19/24 11:50, Christoph Hellwig wrote:
> On Wed, Dec 18, 2024 at 06:57:45AM -0800, Damien Le Moal wrote:
>>> Yeah agreed but I see sd_revalidate_disk() is probably the only exception 
>>> which allocates the blk-mq request. Can't we fix it? 
>>
>> If we change where limits_lock is taken now, we will again introduce races
>> between user config and discovery/revalidation, which is what
>> queue_limits_start_update() and queue_limits_commit_update() intended to fix in
>> the first place.
>>
>> So changing sd_revalidate_disk() is not the right approach.
> 
> Well, sd_revalidate_disk is a bit special in that it needs a command
> on the same queue to query the information.  So it needs to be able
> to issue commands without the queue frozen.  Freezing the queue inside
> the limits lock support that, sd just can't use the convenience helpers
> that lock and freeze.
> 
>> This is overly complicated ... As I suggested, I think that a simpler approach
>> is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside
>> queue_limits_commit_update(). Doing so, no driver should need to directly call
>> freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances
>> that have the update/freeze order wrong. As mentioned, the pattern simply needs
> 
> Yes, the queue only needs to be frozen for the actual update,
> which would remove the need for the locking.  The big question for both
> variants is if we can get rid of all the callers that have the queue
> already frozen and then start an update.
> 
Yes, agreed that in most cases we only need the queue to be frozen while 
committing the update; however, we do have a few call sites (in the nvme
driver) where I see we freeze the queue before actually starting the update.
And looking at those call sites it seems that we probably do require freezing
the queue. One example from the NVMe driver,

nvme_update_ns_info_block()
{
    ...
    ...

    blk_mq_freeze_queue(ns->disk->queue);
    ns->head->lba_shift = id->lbaf[lbaf].ds;
    ns->head->nuse = le64_to_cpu(id->nuse);
    capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));

    lim = queue_limits_start_update(ns->disk->queue);
    ...
    ...
    queue_limits_commit_update();
    ...
    set_capacity_and_notify(ns->disk, capacity);
    ...
    set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
    set_bit(NVME_NS_READY, &ns->flags);
    blk_mq_unfreeze_queue(ns->disk->queue);
    ...
}

So looking at the above example, I earlier proposed freezing the queue 
in queue_limits_start_update() and then unfreezing the queue in 
queue_limits_commit_update(). In the above code then we could replace 
blk_mq_freeze_queue() with queue_limits_start_update() and 
blk_mq_unfreeze_queue() with queue_limits_commit_update() and get rid 
of the original call sites of the start/commit update APIs. Having said 
that, I am open to any better suggestions; one suggestion, from Damien, 
is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside 
queue_limits_commit_update(). But then I wonder how we would fix call 
sites like the one shown above with that approach.

Thanks,
--Nilay


* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-19  6:20                           ` Christoph Hellwig
  2024-12-19  7:16                             ` Nilay Shroff
@ 2024-12-21 13:03                             ` Nilay Shroff
  2024-12-30  9:02                               ` Ming Lei
  1 sibling, 1 reply; 23+ messages in thread
From: Nilay Shroff @ 2024-12-21 13:03 UTC (permalink / raw)
  To: Christoph Hellwig, Damien Le Moal; +Cc: Ming Lei, Jens Axboe, linux-block

[-- Attachment #1: Type: text/plain, Size: 4142 bytes --]



On 12/19/24 11:50, Christoph Hellwig wrote:
> On Wed, Dec 18, 2024 at 06:57:45AM -0800, Damien Le Moal wrote:
>>> Yeah agreed but I see sd_revalidate_disk() is probably the only exception 
>>> which allocates the blk-mq request. Can't we fix it? 
>>
>> If we change where limits_lock is taken now, we will again introduce races
>> between user config and discovery/revalidation, which is what
>> queue_limits_start_update() and queue_limits_commit_update() intended to fix in
>> the first place.
>>
>> So changing sd_revalidate_disk() is not the right approach.
> 
> Well, sd_revalidate_disk is a bit special in that it needs a command
> on the same queue to query the information.  So it needs to be able
> to issue commands without the queue frozen.  Freezing the queue inside
> the limits lock support that, sd just can't use the convenience helpers
> that lock and freeze.
> 
>> This is overly complicated ... As I suggested, I think that a simpler approach
>> is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside
>> queue_limits_commit_update(). Doing so, no driver should need to directly call
>> freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances
>> that have the update/freeze order wrong. As mentioned, the pattern simply needs
> 
> Yes, the queue only needs to be frozen for the actual update,
> which would remove the need for the locking.  The big question for both
> variants is if we can get rid of all the callers that have the queue
> already frozen and then start an update.
> 
After thinking for a while, I found that broadly we have four categories of users
which need this pattern of limits-lock and/or queue-freeze:

1. Callers which need to acquire the limits-lock when starting the update, and to
   freeze the queue only when committing the update:
   - sd_revalidate_disk
   - nvme_init_identify
   - loop_clear_limits
   - few more...

2. Callers which need both to freeze the queue and to acquire the limits-lock when
   starting the update:
   - nvme_update_ns_info_block
   - nvme_update_ns_info_generic
   - few more... 

3. Callers which need neither to acquire the limits-lock nor to freeze the queue,
   because for this set of callers the limits-lock is already acquired and the
   queue is already frozen higher up in the call stack:
   - __blk_mq_update_nr_hw_queues
   - queue_xxx_store and helpers

4. Callers which only need to acquire the limits-lock; freezing the queue may not
   be needed for such callers even while committing the update:
   - scsi_add_lun
   - nvme_init_identify
   - few more...

IMO, we may convert category #4 users into category #1, as it should not harm even
if we momentarily freeze the queue while committing the update. 

Then, for each of the above categories we may define the helpers shown below:

// For category-3:

static inline struct queue_limits
get_queue_limits(struct request_queue *q)
{
	return q->limits;
}
int set_queue_limits(struct request_queue *q,
		struct queue_limits *lim)
{
	int error;

	error = blk_validate_limits(lim);
        ...
        ...
	q->limits = *lim;
	if (q->disk)
		blk_apply_bdi_limits(q->disk->bdi, lim);

	return error;
}

// For category-1:

static inline struct queue_limits
__queue_limits_start_update(struct request_queue *q)
{
	mutex_lock(&q->limits_lock);
	return q->limits;
}
int __queue_limits_commit_update(struct request_queue *q,
		struct queue_limits *lim)
{
	int error;

	blk_mq_freeze_queue(q);
	error = set_queue_limits(q, lim);
	blk_mq_unfreeze_queue(q);
	mutex_unlock(&q->limits_lock);

	return error;
}

// For category-2 :
static inline struct queue_limits
queue_limits_start_update(struct request_queue *q)
{
	mutex_lock(&q->limits_lock);
	blk_mq_freeze_queue(q);
	return q->limits;
}
int queue_limits_commit_update(struct request_queue *q,
		struct queue_limits *lim)
{
	int error;

	error = set_queue_limits(q, lim);
	blk_mq_unfreeze_queue(q);
	mutex_unlock(&q->limits_lock);

	return error;
}

With the above helpers, I updated each caller based on which category it fits in. 
For reference, I attached the full diff. With this change, I ran blktests to ensure
that we don't see any lockdep splats or failures. 

Thanks,
--Nilay 

[-- Attachment #2: blk-mq-limits-lock.diff --]
[-- Type: text/x-patch, Size: 23570 bytes --]

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index b180cac61a9d..6d5f3664bb91 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -212,15 +212,13 @@ static ssize_t flag_store(struct device *dev, const char *page, size_t count,
 		return err;
 
 	/* note that the flags are inverted vs the values in the sysfs files */
-	lim = queue_limits_start_update(q);
+	lim = __queue_limits_start_update(q);
 	if (val)
 		lim.integrity.flags &= ~flag;
 	else
 		lim.integrity.flags |= flag;
 
-	blk_mq_freeze_queue(q);
-	err = queue_limits_commit_update(q, &lim);
-	blk_mq_unfreeze_queue(q);
+	err = __queue_limits_commit_update(q, &lim);
 	if (err)
 		return err;
 	return count;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6b6111513986..0d9efe543b41 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4994,6 +4994,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		mutex_lock(&q->sysfs_dir_lock);
 		mutex_lock(&q->sysfs_lock);
+		mutex_lock(&q->limits_lock);
 		blk_mq_freeze_queue(q);
 	}
 	/*
@@ -5031,12 +5032,12 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 			set->nr_hw_queues = prev_nr_hw_queues;
 			goto fallback;
 		}
-		lim = queue_limits_start_update(q);
+		lim = get_queue_limits(q);
 		if (blk_mq_can_poll(set))
 			lim.features |= BLK_FEAT_POLL;
 		else
 			lim.features &= ~BLK_FEAT_POLL;
-		if (queue_limits_commit_update(q, &lim) < 0)
+		if (set_queue_limits(q, &lim) < 0)
 			pr_warn("updating the poll flag failed\n");
 		blk_mq_map_swqueue(q);
 	}
@@ -5053,6 +5054,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_unfreeze_queue(q);
+		mutex_unlock(&q->limits_lock);
 		mutex_unlock(&q->sysfs_lock);
 		mutex_unlock(&q->sysfs_dir_lock);
 	}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8f09e33f41f6..ca50bede1fb5 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -406,43 +406,75 @@ int blk_set_default_limits(struct queue_limits *lim)
 	lim->max_user_discard_sectors = UINT_MAX;
 	return blk_validate_limits(lim);
 }
-
-/**
- * queue_limits_commit_update - commit an atomic update of queue limits
- * @q:		queue to update
- * @lim:	limits to apply
- *
- * Apply the limits in @lim that were obtained from queue_limits_start_update()
- * and updated by the caller to @q.
- *
- * Returns 0 if successful, else a negative error code.
+/*
+ * Non-atomic version of committing queue limits. For atomicity, it's the
+ * caller's responsibility to ensure that ->limits_lock has been acquired and
+ * the queue has been frozen before calling this API.  Please also see
+ * queue_limits_commit_update() and __queue_limits_commit_update().
  */
-int queue_limits_commit_update(struct request_queue *q,
+int set_queue_limits(struct request_queue *q,
 		struct queue_limits *lim)
 {
 	int error;
 
 	error = blk_validate_limits(lim);
 	if (error)
-		goto out_unlock;
+		return error;
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 	if (q->crypto_profile && lim->integrity.tag_size) {
 		pr_warn("blk-integrity: Integrity and hardware inline encryption are not supported together.\n");
-		error = -EINVAL;
-		goto out_unlock;
+		return -EINVAL;
 	}
 #endif
 
 	q->limits = *lim;
 	if (q->disk)
 		blk_apply_bdi_limits(q->disk->bdi, lim);
-out_unlock:
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(set_queue_limits);
+/**
+ * queue_limits_commit_update - commit an atomic update of queue limits
+ * @q:		queue to update
+ * @lim:	limits to apply
+ *
+ * Apply the limits in @lim that were obtained from queue_limits_start_update()
+ * and updated by the caller to @q.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_commit_update(struct request_queue *q,
+		struct queue_limits *lim)
+{
+	int error;
+
+	error = set_queue_limits(q, lim);
+	blk_mq_unfreeze_queue(q);
 	mutex_unlock(&q->limits_lock);
+
 	return error;
 }
 EXPORT_SYMBOL_GPL(queue_limits_commit_update);
+/*
+ * Same as queue_limits_commit_update() but it first freezes the queue before
+ * setting the limits. It goes hand in hand with, and should be paired with,
+ * __queue_limits_start_update().
+ */
+int __queue_limits_commit_update(struct request_queue *q,
+		struct queue_limits *lim)
+{
+	int error;
+
+	blk_mq_freeze_queue(q);
+	error = set_queue_limits(q, lim);
+	blk_mq_unfreeze_queue(q);
+	mutex_unlock(&q->limits_lock);
 
+	return error;
+}
+EXPORT_SYMBOL_GPL(__queue_limits_commit_update);
 /**
  * queue_limits_set - apply queue limits to queue
  * @q:		queue to update
@@ -456,6 +488,7 @@ EXPORT_SYMBOL_GPL(queue_limits_commit_update);
  */
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
 {
+	blk_mq_freeze_queue(q);
 	mutex_lock(&q->limits_lock);
 	return queue_limits_commit_update(q, lim);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 64f70c713d2f..28eaff3756a1 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -171,9 +171,9 @@ static ssize_t queue_max_discard_sectors_store(struct gendisk *disk,
 	if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
 		return -EINVAL;
 
-	lim = queue_limits_start_update(disk->queue);
+	lim = get_queue_limits(disk->queue);
 	lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
-	err = queue_limits_commit_update(disk->queue, &lim);
+	err = set_queue_limits(disk->queue, &lim);
 	if (err)
 		return err;
 	return ret;
@@ -191,9 +191,9 @@ queue_max_sectors_store(struct gendisk *disk, const char *page, size_t count)
 	if (ret < 0)
 		return ret;
 
-	lim = queue_limits_start_update(disk->queue);
+	lim = get_queue_limits(disk->queue);
 	lim.max_user_sectors = max_sectors_kb << 1;
-	err = queue_limits_commit_update(disk->queue, &lim);
+	err = set_queue_limits(disk->queue, &lim);
 	if (err)
 		return err;
 	return ret;
@@ -210,12 +210,12 @@ static ssize_t queue_feature_store(struct gendisk *disk, const char *page,
 	if (ret < 0)
 		return ret;
 
-	lim = queue_limits_start_update(disk->queue);
+	lim = get_queue_limits(disk->queue);
 	if (val)
 		lim.features |= feature;
 	else
 		lim.features &= ~feature;
-	ret = queue_limits_commit_update(disk->queue, &lim);
+	ret = set_queue_limits(disk->queue, &lim);
 	if (ret)
 		return ret;
 	return count;
@@ -277,13 +277,13 @@ static ssize_t queue_iostats_passthrough_store(struct gendisk *disk,
 	if (ret < 0)
 		return ret;
 
-	lim = queue_limits_start_update(disk->queue);
+	lim = get_queue_limits(disk->queue);
 	if (ios)
 		lim.flags |= BLK_FLAG_IOSTATS_PASSTHROUGH;
 	else
 		lim.flags &= ~BLK_FLAG_IOSTATS_PASSTHROUGH;
 
-	ret = queue_limits_commit_update(disk->queue, &lim);
+	ret = set_queue_limits(disk->queue, &lim);
 	if (ret)
 		return ret;
 
@@ -407,12 +407,12 @@ static ssize_t queue_wc_store(struct gendisk *disk, const char *page,
 		return -EINVAL;
 	}
 
-	lim = queue_limits_start_update(disk->queue);
+	lim = get_queue_limits(disk->queue);
 	if (disable)
 		lim.flags |= BLK_FLAG_WRITE_CACHE_DISABLED;
 	else
 		lim.flags &= ~BLK_FLAG_WRITE_CACHE_DISABLED;
-	err = queue_limits_commit_update(disk->queue, &lim);
+	err = set_queue_limits(disk->queue, &lim);
 	if (err)
 		return err;
 	return count;
@@ -707,10 +707,15 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 		entry->load_module(disk, page, length);
 
 	mutex_lock(&q->sysfs_lock);
+	mutex_lock(&q->limits_lock);
 	blk_mq_freeze_queue(q);
+
 	res = entry->store(disk, page, length);
+
 	blk_mq_unfreeze_queue(q);
+	mutex_unlock(&q->limits_lock);
 	mutex_unlock(&q->sysfs_lock);
+
 	return res;
 }
 
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 84da1eadff64..366704b6e2a2 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -1459,7 +1459,7 @@ static int disk_update_zone_resources(struct gendisk *disk,
 		return -ENODEV;
 	}
 
-	lim = queue_limits_start_update(q);
+	lim = __queue_limits_start_update(q);
 
 	/*
 	 * Some devices can advertize zone resource limits that are larger than
@@ -1497,9 +1497,7 @@ static int disk_update_zone_resources(struct gendisk *disk,
 	}
 
 commit:
-	blk_mq_freeze_queue(q);
-	ret = queue_limits_commit_update(q, &lim);
-	blk_mq_unfreeze_queue(q);
+	ret = __queue_limits_commit_update(q, &lim);
 
 	return ret;
 }
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 720fc30e2ecc..7c132f748429 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1290,7 +1290,7 @@ void drbd_reconsider_queue_parameters(struct drbd_device *device,
 		drbd_info(device, "max BIO size = %u\n", new);
 	}
 
-	lim = queue_limits_start_update(q);
+	lim = __queue_limits_start_update(q);
 	if (bdev) {
 		blk_set_stacking_limits(&lim);
 		lim.max_segments = drbd_backing_dev_max_segments(device);
@@ -1337,7 +1337,7 @@ void drbd_reconsider_queue_parameters(struct drbd_device *device,
 		lim.max_hw_discard_sectors = 0;
 	}
 
-	if (queue_limits_commit_update(q, &lim))
+	if (__queue_limits_commit_update(q, &lim))
 		drbd_err(device, "setting new queue limits failed\n");
 }
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 8f6761c27c68..b443b7092158 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -301,7 +301,7 @@ static int lo_read_simple(struct loop_device *lo, struct request *rq,
 
 static void loop_clear_limits(struct loop_device *lo, int mode)
 {
-	struct queue_limits lim = queue_limits_start_update(lo->lo_queue);
+	struct queue_limits lim = __queue_limits_start_update(lo->lo_queue);
 
 	if (mode & FALLOC_FL_ZERO_RANGE)
 		lim.max_write_zeroes_sectors = 0;
@@ -311,7 +311,7 @@ static void loop_clear_limits(struct loop_device *lo, int mode)
 		lim.discard_granularity = 0;
 	}
 
-	queue_limits_commit_update(lo->lo_queue, &lim);
+	__queue_limits_commit_update(lo->lo_queue, &lim);
 }
 
 static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
@@ -995,7 +995,7 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned int bsize)
 
 	loop_get_discard_config(lo, &granularity, &max_discard_sectors);
 
-	lim = queue_limits_start_update(lo->lo_queue);
+	lim = __queue_limits_start_update(lo->lo_queue);
 	lim.logical_block_size = bsize;
 	lim.physical_block_size = bsize;
 	lim.io_min = bsize;
@@ -1010,7 +1010,7 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned int bsize)
 		lim.discard_granularity = granularity;
 	else
 		lim.discard_granularity = 0;
-	return queue_limits_commit_update(lo->lo_queue, &lim);
+	return __queue_limits_commit_update(lo->lo_queue, &lim);
 }
 
 static int loop_configure(struct loop_device *lo, blk_mode_t mode,
@@ -1151,11 +1151,11 @@ static void __loop_clr_fd(struct loop_device *lo)
 	memset(lo->lo_file_name, 0, LO_NAME_SIZE);
 
 	/* reset the block size to the default */
-	lim = queue_limits_start_update(lo->lo_queue);
+	lim = __queue_limits_start_update(lo->lo_queue);
 	lim.logical_block_size = SECTOR_SIZE;
 	lim.physical_block_size = SECTOR_SIZE;
 	lim.io_min = SECTOR_SIZE;
-	queue_limits_commit_update(lo->lo_queue, &lim);
+	__queue_limits_commit_update(lo->lo_queue, &lim);
 
 	invalidate_disk(lo->lo_disk);
 	loop_sysfs_exit(lo);
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b852050d8a96..a73f11f0a2f5 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -348,7 +348,7 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
 	if (!nbd->pid)
 		return 0;
 
-	lim = queue_limits_start_update(nbd->disk->queue);
+	lim = __queue_limits_start_update(nbd->disk->queue);
 	if (nbd->config->flags & NBD_FLAG_SEND_TRIM)
 		lim.max_hw_discard_sectors = UINT_MAX >> SECTOR_SHIFT;
 	else
@@ -368,7 +368,7 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
 
 	lim.logical_block_size = blksize;
 	lim.physical_block_size = blksize;
-	error = queue_limits_commit_update(nbd->disk->queue, &lim);
+	error = __queue_limits_commit_update(nbd->disk->queue, &lim);
 	if (error)
 		return error;
 
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 3efe378f1386..cb4ca7dcce26 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1101,14 +1101,12 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
 
 	virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
 
-	lim = queue_limits_start_update(disk->queue);
+	lim = __queue_limits_start_update(disk->queue);
 	if (virtblk_get_cache_mode(vdev))
 		lim.features |= BLK_FEAT_WRITE_CACHE;
 	else
 		lim.features &= ~BLK_FEAT_WRITE_CACHE;
-	blk_mq_freeze_queue(disk->queue);
-	i = queue_limits_commit_update(disk->queue, &lim);
-	blk_mq_unfreeze_queue(disk->queue);
+	i = __queue_limits_commit_update(disk->queue, &lim);
 	if (i)
 		return i;
 	return count;
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 59ce113b882a..b802a17abaef 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2013,10 +2013,10 @@ static int blkif_recover(struct blkfront_info *info)
 	struct bio *bio;
 	struct blkfront_ring_info *rinfo;
 
-	lim = queue_limits_start_update(info->rq);
+	lim = __queue_limits_start_update(info->rq);
 	blkfront_gather_backend_features(info);
 	blkif_set_queue_limits(info, &lim);
-	rc = queue_limits_commit_update(info->rq, &lim);
+	rc = __queue_limits_commit_update(info->rq, &lim);
 	if (rc)
 		return rc;
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index df4cc8a27385..b62bcc71c76c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2128,13 +2128,10 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
 	struct queue_limits lim;
 	int ret;
 
-	blk_mq_freeze_queue(ns->disk->queue);
 	lim = queue_limits_start_update(ns->disk->queue);
 	nvme_set_ctrl_limits(ns->ctrl, &lim);
-	ret = queue_limits_commit_update(ns->disk->queue, &lim);
 	set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
-	blk_mq_unfreeze_queue(ns->disk->queue);
-
+	ret = queue_limits_commit_update(ns->disk->queue, &lim);
 	/* Hide the block-interface for these devices */
 	if (!ret)
 		ret = -ENODEV;
@@ -2177,12 +2174,11 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 			goto out;
 	}
 
-	blk_mq_freeze_queue(ns->disk->queue);
+	lim = queue_limits_start_update(ns->disk->queue);
 	ns->head->lba_shift = id->lbaf[lbaf].ds;
 	ns->head->nuse = le64_to_cpu(id->nuse);
 	capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
 
-	lim = queue_limits_start_update(ns->disk->queue);
 	nvme_set_ctrl_limits(ns->ctrl, &lim);
 	nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
 	nvme_set_chunk_sectors(ns, id, &lim);
@@ -2210,12 +2206,6 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 	if (!nvme_init_integrity(ns->head, &lim, info))
 		capacity = 0;
 
-	ret = queue_limits_commit_update(ns->disk->queue, &lim);
-	if (ret) {
-		blk_mq_unfreeze_queue(ns->disk->queue);
-		goto out;
-	}
-
 	set_capacity_and_notify(ns->disk, capacity);
 
 	/*
@@ -2228,7 +2218,9 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 		ns->head->features |= NVME_NS_DEAC;
 	set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
 	set_bit(NVME_NS_READY, &ns->flags);
-	blk_mq_unfreeze_queue(ns->disk->queue);
+	ret = queue_limits_commit_update(ns->disk->queue, &lim);
+	if (ret)
+		goto out;
 
 	if (blk_queue_is_zoned(ns->queue)) {
 		ret = blk_revalidate_disk_zones(ns->disk);
@@ -2285,7 +2277,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 		struct queue_limits *ns_lim = &ns->disk->queue->limits;
 		struct queue_limits lim;
 
-		blk_mq_freeze_queue(ns->head->disk->queue);
+		lim = queue_limits_start_update(ns->head->disk->queue);
 		/*
 		 * queue_limits mixes values that are the hardware limitations
 		 * for bio splitting with what is the device configuration.
@@ -2301,7 +2293,6 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 		 * the splitting limits in to make sure we still obey possibly
 		 * lower limitations of other controllers.
 		 */
-		lim = queue_limits_start_update(ns->head->disk->queue);
 		lim.logical_block_size = ns_lim->logical_block_size;
 		lim.physical_block_size = ns_lim->physical_block_size;
 		lim.io_min = ns_lim->io_min;
@@ -2312,13 +2303,12 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 			ns->head->disk->flags |= GENHD_FL_HIDDEN;
 		else
 			nvme_init_integrity(ns->head, &lim, info);
-		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
 
 		set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
 		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
 		nvme_mpath_revalidate_paths(ns);
 
-		blk_mq_unfreeze_queue(ns->head->disk->queue);
+		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
 	}
 
 	return ret;
@@ -3338,9 +3328,9 @@ static int nvme_init_identify(struct nvme_ctrl *ctrl)
 	ctrl->max_hw_sectors =
 		min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
 
-	lim = queue_limits_start_update(ctrl->admin_q);
+	lim = __queue_limits_start_update(ctrl->admin_q);
 	nvme_set_ctrl_limits(ctrl, &lim);
-	ret = queue_limits_commit_update(ctrl->admin_q, &lim);
+	ret = __queue_limits_commit_update(ctrl->admin_q, &lim);
 	if (ret)
 		goto out_free;
 
diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c
index 1bef88130d0c..62ca096daad1 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_os.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_os.c
@@ -1064,9 +1064,9 @@ mpi3mr_update_sdev(struct scsi_device *sdev, void *data)
 
 	mpi3mr_change_queue_depth(sdev, tgtdev->q_depth);
 
-	lim = queue_limits_start_update(sdev->request_queue);
+	lim = __queue_limits_start_update(sdev->request_queue);
 	mpi3mr_configure_tgt_dev(tgtdev, &lim);
-	WARN_ON_ONCE(queue_limits_commit_update(sdev->request_queue, &lim));
+	WARN_ON_ONCE(__queue_limits_commit_update(sdev->request_queue, &lim));
 }
 
 /**
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 042329b74c6e..18d76d852cbe 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1068,7 +1068,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	/*
 	 * No need to freeze the queue as it isn't reachable to anyone else yet.
 	 */
-	lim = queue_limits_start_update(sdev->request_queue);
+	lim = __queue_limits_start_update(sdev->request_queue);
 	if (*bflags & BLIST_MAX_512)
 		lim.max_hw_sectors = 512;
 	else if (*bflags & BLIST_MAX_1024)
@@ -1090,7 +1090,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 		return SCSI_SCAN_NO_RESPONSE;
 	}
 
-	ret = queue_limits_commit_update(sdev->request_queue, &lim);
+	ret = __queue_limits_commit_update(sdev->request_queue, &lim);
 	if (ret) {
 		sdev_printk(KERN_ERR, sdev, "failed to apply queue limits.\n");
 		return SCSI_SCAN_NO_RESPONSE;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8947dab132d7..9bf2d75cf37b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -175,10 +175,9 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
 		sdkp->WCE = wce;
 		sdkp->RCD = rcd;
 
-		lim = queue_limits_start_update(sdkp->disk->queue);
+		lim = __queue_limits_start_update(sdkp->disk->queue);
 		sd_set_flush_flag(sdkp, &lim);
-		blk_mq_freeze_queue(sdkp->disk->queue);
-		ret = queue_limits_commit_update(sdkp->disk->queue, &lim);
+		ret = __queue_limits_commit_update(sdkp->disk->queue, &lim);
 		blk_mq_unfreeze_queue(sdkp->disk->queue);
 		if (ret)
 			return ret;
@@ -481,11 +480,9 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (mode < 0)
 		return -EINVAL;
 
-	lim = queue_limits_start_update(sdkp->disk->queue);
+	lim = __queue_limits_start_update(sdkp->disk->queue);
 	sd_config_discard(sdkp, &lim, mode);
-	blk_mq_freeze_queue(sdkp->disk->queue);
-	err = queue_limits_commit_update(sdkp->disk->queue, &lim);
-	blk_mq_unfreeze_queue(sdkp->disk->queue);
+	err = __queue_limits_commit_update(sdkp->disk->queue, &lim);
 	if (err)
 		return err;
 	return count;
@@ -592,11 +589,9 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 		sdkp->max_ws_blocks = max;
 	}
 
-	lim = queue_limits_start_update(sdkp->disk->queue);
+	lim = __queue_limits_start_update(sdkp->disk->queue);
 	sd_config_write_same(sdkp, &lim);
-	blk_mq_freeze_queue(sdkp->disk->queue);
-	err = queue_limits_commit_update(sdkp->disk->queue, &lim);
-	blk_mq_unfreeze_queue(sdkp->disk->queue);
+	err = __queue_limits_commit_update(sdkp->disk->queue, &lim);
 	if (err)
 		return err;
 	return count;
@@ -3724,7 +3719,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 
 	sd_spinup_disk(sdkp);
 
-	lim = queue_limits_start_update(sdkp->disk->queue);
+	lim = __queue_limits_start_update(sdkp->disk->queue);
 
 	/*
 	 * Without media there is no reason to ask; moreover, some devices
@@ -3803,9 +3798,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	sd_config_write_same(sdkp, &lim);
 	kfree(buffer);
 
-	blk_mq_freeze_queue(sdkp->disk->queue);
-	err = queue_limits_commit_update(sdkp->disk->queue, &lim);
-	blk_mq_unfreeze_queue(sdkp->disk->queue);
+	err = __queue_limits_commit_update(sdkp->disk->queue, &lim);
 	if (err)
 		return err;
 
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 198bec87bb8e..de562a17cbd4 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -795,11 +795,9 @@ static int get_sectorsize(struct scsi_cd *cd)
 		set_capacity(cd->disk, cd->capacity);
 	}
 
-	lim = queue_limits_start_update(q);
+	lim = __queue_limits_start_update(q);
 	lim.logical_block_size = sector_size;
-	blk_mq_freeze_queue(q);
-	err = queue_limits_commit_update(q, &lim);
-	blk_mq_unfreeze_queue(q);
+	err = __queue_limits_commit_update(q, &lim);
 	return err;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 378d3a1a22fc..5218cc90937b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -46,6 +46,8 @@ extern const struct device_type disk_type;
 extern const struct device_type part_type;
 extern const struct class block_class;
 
+extern void blk_mq_freeze_queue(struct request_queue *q);
+
 /*
  * Maximum number of blkcg policies allowed to be registered concurrently.
  * Defined here to simplify include dependency.
@@ -945,8 +947,36 @@ static inline struct queue_limits
 queue_limits_start_update(struct request_queue *q)
 {
 	mutex_lock(&q->limits_lock);
+	blk_mq_freeze_queue(q);
 	return q->limits;
 }
+/*
+ * Same as queue_limits_start_update() but without freezing the queue. It's
+ * appropriate for callers that don't require the queue to be frozen while
+ * reading the queue limits, and it pairs with __queue_limits_commit_update(),
+ * which likewise commits the update without freezing the queue.
+ */
+static inline struct queue_limits
+__queue_limits_start_update(struct request_queue *q)
+{
+	mutex_lock(&q->limits_lock);
+	return q->limits;
+}
+/*
+ * Same as queue_limits_start_update() but without acquiring ->limits_lock
+ * or freezing the queue. It's assumed that the caller has acquired
+ * ->limits_lock and frozen the queue before calling this function.
+ */
+static inline struct queue_limits
+get_queue_limits(struct request_queue *q)
+{
+	return q->limits;
+}
+
+int set_queue_limits(struct request_queue *q,
+		struct queue_limits *lim);
+int __queue_limits_commit_update(struct request_queue *q,
+		struct queue_limits *lim);
 int queue_limits_commit_update(struct request_queue *q,
 		struct queue_limits *lim);
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim);


* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-21 13:03                             ` Nilay Shroff
@ 2024-12-30  9:02                               ` Ming Lei
  2024-12-30 23:29                                 ` Damien Le Moal
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2024-12-30  9:02 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: Christoph Hellwig, Damien Le Moal, Jens Axboe, linux-block

On Sat, Dec 21, 2024 at 06:33:13PM +0530, Nilay Shroff wrote:
> 
> 
> On 12/19/24 11:50, Christoph Hellwig wrote:
> > On Wed, Dec 18, 2024 at 06:57:45AM -0800, Damien Le Moal wrote:
> >>> Yeah agreed but I see sd_revalidate_disk() is probably the only exception 
> >>> which allocates the blk-mq request. Can't we fix it? 
> >>
> >> If we change where limits_lock is taken now, we will again introduce races
> >> between user config and discovery/revalidation, which is what
> >> queue_limits_start_update() and queue_limits_commit_update() intended to fix in
> >> the first place.
> >>
> >> So changing sd_revalidate_disk() is not the right approach.
> > 
> > Well, sd_revalidate_disk is a bit special in that it needs a command
> > on the same queue to query the information.  So it needs to be able
> > to issue commands without the queue frozen.  Freezing the queue inside
> > the limits lock support that, sd just can't use the convenience helpers
> > that lock and freeze.
> > 
> >> This is overly complicated ... As I suggested, I think that a simpler approach
> >> is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside
> >> queue_limits_commit_update(). Doing so, no driver should need to directly call
> >> freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances
> >> that have the update/freeze order wrong. As mentioned, the pattern simply needs
> > 
> > Yes, the queue only needs to be frozen for the actual update,
> > which would remove the need for the locking.  The big question for both
> > variants is if we can get rid of all the callers that have the queue
> > already frozen and then start an update.
> > 
> After thinking for a while I found that broadly we've four categories of users
> which need this pattern of limits-lock and/or queue-freeze:
> 
> 1. Callers which need acquiring limits-lock while starting the update; and freezing 
>    queue only when committing the update:
>    - sd_revalidate_disk

sd_revalidate_disk() is probably the strangest one: it has to issue
passthrough I/O commands, so it can't take any lock that depends on
queue freeze, such as q->limits_lock.

Actually the current queue_limits structure isn't well organized,
otherwise the limits lock wouldn't be needed for reading queue limits
from hardware, since sd_revalidate_disk() only overwrites part of the
limits. Alternatively it could be done by refactoring
sd_revalidate_disk(). However, that change might be a little big, and I
guess that is why Damien doesn't like it.

>    - nvme_init_identify
>    - loop_clear_limits
>    - few more...
> 
> 2. Callers which need both freezing the queue and acquiring limits-lock while starting
>    the update:
>    - nvme_update_ns_info_block
>    - nvme_update_ns_info_generic
>    - few more... 
> 
> 3. Callers which neither need acquiring limits-lock nor require freezing queue as for 
>    these set of callers in the call stack limits-lock is already acquired and queue is 
>    already frozen:
>    - __blk_mq_update_nr_hw_queues
>    - queue_xxx_store and helpers

I don't think that is correct.

The queue limits are applied in the fast I/O path; in theory, anywhere
that updates q->limits needs to drain I/Os in the submission path, at
least after the gendisk is added.

Also, Christoph added the limits lock to avoid losing other concurrent
updates, which makes the problem harder to solve.



Thanks, 
Ming



* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-30  9:02                               ` Ming Lei
@ 2024-12-30 23:29                                 ` Damien Le Moal
  2025-01-01 11:17                                   ` Nilay Shroff
  0 siblings, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2024-12-30 23:29 UTC (permalink / raw)
  To: Ming Lei, Nilay Shroff; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On 2024/12/30 18:02, Ming Lei wrote:
>> 1. Callers which need acquiring limits-lock while starting the update; and freezing 
>>    queue only when committing the update:
>>    - sd_revalidate_disk
> 
> sd_revalidate_disk() is probably the strangest one: it has to issue
> passthrough I/O commands, so it can't take any lock that depends on
> queue freeze, such as q->limits_lock.
> 
> Actually the current queue_limits structure isn't well organized,
> otherwise the limits lock wouldn't be needed for reading queue limits
> from hardware, since sd_revalidate_disk() only overwrites part of the
> limits. Alternatively it could be done by refactoring
> sd_revalidate_disk(). However, that change might be a little big, and I
> guess that is why Damien doesn't like it.

That was not the reason, but yes, modifying sd_revalidate_disk() is not without
risks of introducing regressions. The reason I proposed to simply move the queue
freeze around or inside queue_limits_commit_update() is that:

1) It is the right thing to do as that is the only place where it is actually
needed to avoid losing concurrent limits changes.

2) It clarifies the locking order between queue freeze and the limits lock.

3) The current issues should mostly all be solved with some refactoring of the
->store() calls in blk-sysfs.c, resolving the current ABBA deadlocks between
queue freeze and limits lock.

With that, we should be able to fix the issue for all block drivers with changes
to the block layer sysfs code only. But... I have not looked into the details of
all limits commit calls in all block drivers. So there may be some bad apples in
there that will also need some tweaking.

>>    - nvme_init_identify
>>    - loop_clear_limits
>>    - few more...
>>
>> 2. Callers which need both freezing the queue and acquiring limits-lock while starting
>>    the update:
>>    - nvme_update_ns_info_block
>>    - nvme_update_ns_info_generic
>>    - few more... 

The queue freeze should not be necessary anywhere when starting the update. The
queue freeze is only needed when applying the limits so that IOs that are in
flight are not affected by the limits change while still being processed.

>>
>> 3. Callers which neither need acquiring limits-lock nor require freezing queue as for 
>>    these set of callers in the call stack limits-lock is already acquired and queue is 
>>    already frozen:
>>    - __blk_mq_update_nr_hw_queues
>>    - queue_xxx_store and helpers
> 
> I don't think that is correct.
> 
> The queue limits are applied in the fast I/O path; in theory, anywhere
> that updates q->limits needs to drain I/Os in the submission path, at
> least after the gendisk is added.

Yes !

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits
  2024-12-30 23:29                                 ` Damien Le Moal
@ 2025-01-01 11:17                                   ` Nilay Shroff
  0 siblings, 0 replies; 23+ messages in thread
From: Nilay Shroff @ 2025-01-01 11:17 UTC (permalink / raw)
  To: Damien Le Moal, Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block



On 12/31/24 04:59, Damien Le Moal wrote:
> On 2024/12/30 18:02, Ming Lei wrote:
>>> 1. Callers which need acquiring limits-lock while starting the update; and freezing 
>>>    queue only when committing the update:
>>>    - sd_revalidate_disk
>>
>> sd_revalidate_disk() is probably the strangest one: it has to issue
>> passthrough I/O commands, so it can't take any lock that depends on
>> queue freeze, such as q->limits_lock.
>>
>> Actually the current queue_limits structure isn't well organized,
>> otherwise the limits lock wouldn't be needed for reading queue limits
>> from hardware, since sd_revalidate_disk() only overwrites part of the
>> limits. Alternatively it could be done by refactoring
>> sd_revalidate_disk(). However, that change might be a little big, and I
>> guess that is why Damien doesn't like it.
> 
> That was not the reason, but yes, modifying sd_revalidate_disk() is not without
> risks of introducing regressions. The reason I proposed to simply move the queue
> freeze around or inside queue_limits_commit_update() is that:
> 
> 1) It is the right thing to do as that is the only place where it is actually
> needed to avoid losing concurrent limits changes.
> 
> 2) It clarifies the locking order between queue freeze and the limits lock.
> 
> 3) The current issues should mostly all be solved with some refactoring of the
> ->store() calls in blk-sysfs.c, resolving the current ABBA deadlocks between
> queue freeze and limits lock.
> 
> With that, we should be able to fix the issue for all block drivers with changes
> to the block layer sysfs code only. But... I have not looked into the details of
> all limits commit calls in all block drivers. So there may be some bad apples in
> there that will also need some tweaking.
Yes, I think we have places other than blk-sysfs, like nvme, that need fixing.
> 
>>>    - nvme_init_identify
>>>    - loop_clear_limits
>>>    - few more...
>>>
>>> 2. Callers which need both freezing the queue and acquiring limits-lock while starting
>>>    the update:
>>>    - nvme_update_ns_info_block
>>>    - nvme_update_ns_info_generic
>>>    - few more... 
> 
> The queue freeze should not be necessary anywhere when starting the update. The
> queue freeze is only needed when applying the limits so that IOs that are in
> flight are not affected by the limits change while still being processed.
Hmm, but as mentioned, for the nvme limits update this is not always true, so this needs to be tweaked.
>>>
>>> 3. Callers which neither need acquiring limits-lock nor require freezing queue as for 
>>>    these set of callers in the call stack limits-lock is already acquired and queue is 
>>>    already frozen:
>>>    - __blk_mq_update_nr_hw_queues
>>>    - queue_xxx_store and helpers
>>
>> I don't think that is correct.
>>
>> The queue limits are applied in the fast I/O path; in theory, anywhere
>> that updates q->limits needs to drain I/Os in the submission path, at
>> least after the gendisk is added.
> 
> Yes !
> 
OK, I think it'd be tricky to fix __blk_mq_update_nr_hw_queues and queue_xxx_store
with the current proposal of acquiring the limits lock first and then freezing the
queue while committing the limits update.

Thanks,
--Nilay

