* Re: [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep
       [not found] <20241018013542.3013963-1-ming.lei@redhat.com>
@ 2024-10-22  6:18 ` Christoph Hellwig
  2024-10-22  7:19   ` Peter Zijlstra
  2024-10-23  3:22   ` Ming Lei
  0 siblings, 2 replies; 5+ messages in thread
From: Christoph Hellwig @ 2024-10-22  6:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel

On Fri, Oct 18, 2024 at 09:35:42AM +0800, Ming Lei wrote:
> Recently we got several deadlock reports[1][2][3] caused by blk_mq_freeze_queue
> and blk_enter_queue().
> 
> Turns out the two are just like one rwsem, so model them as rwsem for
> supporting lockdep:
> 
> 1) model blk_mq_freeze_queue() as down_write_trylock()
> - it is an exclusive lock, so the dependency with blk_enter_queue() is covered
> - it is a trylock because blk_mq_freeze_queue() is allowed to run concurrently

Is this using the right terminology?  down_write and other locking
primitives obviously can run concurrently, the whole point is to
synchronize the code run inside the critical section.

I think what you mean here is blk_mq_freeze_queue can be called more
than once due to a global recursion counter.

Not sure modelling it as a trylock is the right approach here,
I've added the lockdep maintainers if they have an idea.

> 
> 2) model blk_enter_queue() as down_read()
> - it is a shared lock, so concurrent blk_enter_queue() calls are allowed
> - it is a read lock, so the dependency with blk_mq_freeze_queue() is modeled
> - blk_queue_exit() is often called from other contexts (such as irq), and
> it can't be annotated as rwsem_release(), so simply do it in
> blk_enter_queue(); this way still covers as many cases as possible
> 
> NVMe is the only subsystem which may call blk_mq_freeze_queue() and
> blk_mq_unfreeze_queue() from different context, so it is the only
> exception for the modeling. Add one tagset flag to exclude it from
> the lockdep support.

rwsems have a non_owner variant for these kinds of use cases,
we should do the same for blk_mq_freeze_queue to annotate the callsite
instead of a global flag.


* Re: [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep
  2024-10-22  6:18 ` [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep Christoph Hellwig
@ 2024-10-22  7:19   ` Peter Zijlstra
  2024-10-22  7:21     ` Christoph Hellwig
  2024-10-23  3:22   ` Ming Lei
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2024-10-22  7:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-block, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel

On Tue, Oct 22, 2024 at 08:18:05AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 18, 2024 at 09:35:42AM +0800, Ming Lei wrote:
> > Recently we got several deadlock reports[1][2][3] caused by blk_mq_freeze_queue
> > and blk_enter_queue().
> > 
> > Turns out the two are just like one rwsem, so model them as rwsem for
> > supporting lockdep:
> > 
> > 1) model blk_mq_freeze_queue() as down_write_trylock()
> > - it is an exclusive lock, so the dependency with blk_enter_queue() is covered
> > - it is a trylock because blk_mq_freeze_queue() is allowed to run concurrently
> 
> Is this using the right terminology?  down_write and other locking
> primitives obviously can run concurrently, the whole point is to
> synchronize the code run inside the critical section.
> 
> I think what you mean here is blk_mq_freeze_queue can be called more
> than once due to a global recursion counter.
> 
> Not sure modelling it as a trylock is the right approach here,
> I've added the lockdep maintainers if they have an idea.

So lockdep supports recursive reader state, but you're looking at
recursive exclusive state?

If you achieve this using an external nest count, then it is probably (I
haven't yet had morning juice) sufficient to use the regular exclusive
state on the outermost lock / unlock pair and simply ignore the inner
locks.
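
The outermost-only idea above can be sketched in userspace C. This is an
illustrative model, not kernel code: struct toy_queue, toy_freeze(),
toy_unfreeze() and the toy_annotate_*() hooks are all hypothetical names, and
the hooks only count calls where a real implementation would presumably issue
lockdep annotations. The sketch is single-threaded and leaves out whatever
locking protects the nest count in practice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not the kernel's struct. */
struct toy_queue {
	int freeze_depth;      /* external nest count */
	int annotated_acquire; /* times the "exclusive acquire" hook fired */
	int annotated_release; /* times the "release" hook fired */
};

/* Stand-ins for lock annotations (assumption): they only count calls. */
static void toy_annotate_acquire(struct toy_queue *q)
{
	q->annotated_acquire++;
}

static void toy_annotate_release(struct toy_queue *q)
{
	q->annotated_release++;
}

/* Returns true only for the outermost freeze, which is the one annotated. */
static bool toy_freeze(struct toy_queue *q)
{
	if (q->freeze_depth++ == 0) {
		toy_annotate_acquire(q);
		return true;
	}
	return false; /* nested freeze: ignored by the annotation */
}

/* Returns true only for the unfreeze that drops the count back to zero. */
static bool toy_unfreeze(struct toy_queue *q)
{
	assert(q->freeze_depth > 0);
	if (--q->freeze_depth == 0) {
		toy_annotate_release(q);
		return true;
	}
	return false; /* still frozen by an outer caller */
}
```

With this shape, two nested freeze calls produce a single acquire event and a
single release event, so the annotation sees one plain exclusive lock and
never has to represent recursive exclusive state.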

* Re: [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep
  2024-10-22  7:19   ` Peter Zijlstra
@ 2024-10-22  7:21     ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2024-10-22  7:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Ming Lei, Jens Axboe, linux-block, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, linux-kernel

On Tue, Oct 22, 2024 at 09:19:05AM +0200, Peter Zijlstra wrote:
> > Not sure modelling it as a trylock is the right approach here,
> > I've added the lockdep maintainers if they have an idea.
> 
> So lockdep supports recursive reader state, but you're looking at
> recursive exclusive state?

Yes.

> If you achieve this using an external nest count, then it is probably (I
> haven't yet had morning juice) sufficient to use the regular exclusive
> state on the outermost lock / unlock pair and simply ignore the inner
> locks.

There obviously is an external nest count here, so I guess that should
do the job.  Ming?

* Re: [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep
  2024-10-22  6:18 ` [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep Christoph Hellwig
  2024-10-22  7:19   ` Peter Zijlstra
@ 2024-10-23  3:22   ` Ming Lei
  2024-10-23  6:07     ` Christoph Hellwig
  1 sibling, 1 reply; 5+ messages in thread
From: Ming Lei @ 2024-10-23  3:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel

On Tue, Oct 22, 2024 at 08:18:05AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 18, 2024 at 09:35:42AM +0800, Ming Lei wrote:
> > Recently we got several deadlock reports[1][2][3] caused by blk_mq_freeze_queue
> > and blk_enter_queue().
> > 
> > Turns out the two are just like one rwsem, so model them as rwsem for
> > supporting lockdep:
> > 
> > 1) model blk_mq_freeze_queue() as down_write_trylock()
> > - it is an exclusive lock, so the dependency with blk_enter_queue() is covered
> > - it is a trylock because blk_mq_freeze_queue() is allowed to run concurrently
> 
> Is this using the right terminology?  down_write and other locking
> primitives obviously can run concurrently, the whole point is to
> synchronize the code run inside the critical section.
> 
> I think what you mean here is blk_mq_freeze_queue can be called more
> than once due to a global recursion counter.
> 
> Not sure modelling it as a trylock is the right approach here,
> I've added the lockdep maintainers if they have an idea.

Yeah, looks like we can just call lock_acquire for the outermost
freeze/unfreeze.

> 
> > 
> > 2) model blk_enter_queue() as down_read()
> > - it is a shared lock, so concurrent blk_enter_queue() calls are allowed
> > - it is a read lock, so the dependency with blk_mq_freeze_queue() is modeled
> > - blk_queue_exit() is often called from other contexts (such as irq), and
> > it can't be annotated as rwsem_release(), so simply do it in
> > blk_enter_queue(); this way still covers as many cases as possible
> > 
> > NVMe is the only subsystem which may call blk_mq_freeze_queue() and
> > blk_mq_unfreeze_queue() from different context, so it is the only
> > exception for the modeling. Add one tagset flag to exclude it from
> > the lockdep support.
> 
> rwsems have a non_owner variant for these kinds of use cases,
> we should do the same for blk_mq_freeze_queue to annotate the callsite
> instead of a global flag.
 
Here it isn't a real rwsem, and lockdep doesn't have a non_owner variant
for rwsem_acquire() and rwsem_release().

Another corner case is blk_mark_disk_dead() in which freeze & unfreeze
may be run from different task contexts too.


thanks,
Ming


* Re: [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep
  2024-10-23  3:22   ` Ming Lei
@ 2024-10-23  6:07     ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2024-10-23  6:07 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, linux-kernel

On Wed, Oct 23, 2024 at 11:22:55AM +0800, Ming Lei wrote:
> > > 2) model blk_enter_queue() as down_read()
> > > - it is a shared lock, so concurrent blk_enter_queue() calls are allowed
> > > - it is a read lock, so the dependency with blk_mq_freeze_queue() is modeled
> > > - blk_queue_exit() is often called from other contexts (such as irq), and
> > > it can't be annotated as rwsem_release(), so simply do it in
> > > blk_enter_queue(); this way still covers as many cases as possible
> > > 
> > > NVMe is the only subsystem which may call blk_mq_freeze_queue() and
> > > blk_mq_unfreeze_queue() from different context, so it is the only
> > > exception for the modeling. Add one tagset flag to exclude it from
> > > the lockdep support.
> > 
> > rwsems have a non_owner variant for these kinds of use cases,
> > we should do the same for blk_mq_freeze_queue to annotate the callsite
> > instead of a global flag.
>  
> Here it isn't a real rwsem, and lockdep doesn't have a non_owner variant
> for rwsem_acquire() and rwsem_release().

Hmm, it looks like down_read_non_owner completely skips lockdep,
which seems rather problematic.  Sure we can't really track an
owner, but having it take part in the lock chain would be extremely
useful.  Whatever we're using there should work for the freeze
protection.

> Another corner case is blk_mark_disk_dead() in which freeze & unfreeze
> may be run from different task contexts too.

Yes, this is a pretty questionable one though as we should be able
to unfreeze as soon as the dying bit is set.  Separate discussion,
though.

Either way the non-ownership should be per call and not a queue or
tagset flag.
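
As a rough illustration of "per call" versus a queue or tagset flag, here is a
userspace C sketch. Everything in it is hypothetical: struct demo_queue,
demo_freeze() and demo_freeze_non_owner() are made-up names, and the counters
only model which bookkeeping each kind of call would take part in. It assumes
a non-owner acquisition still enters the dependency model but skips the owner
tracking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names do not correspond to kernel symbols. */
struct demo_queue {
	int chain_events; /* acquisitions that entered the dependency model */
	int owner_checks; /* acquisitions that also recorded an owner */
};

/* Common helper: the caller decides per call whether an owner is tracked. */
static void demo_freeze_annotate(struct demo_queue *q, bool track_owner)
{
	q->chain_events++;         /* always take part in the lock chain */
	if (track_owner)
		q->owner_checks++; /* only when the same context unfreezes */
}

/* Ordinary call site: freeze and unfreeze happen in the same context. */
static void demo_freeze(struct demo_queue *q)
{
	demo_freeze_annotate(q, true);
}

/* Call site whose unfreeze runs in a different context. */
static void demo_freeze_non_owner(struct demo_queue *q)
{
	demo_freeze_annotate(q, false);
}
```

The point of the shape is that the same queue can be frozen from both kinds of
call site: only the callers that really hand the unfreeze to another context
give up the owner check, and every other caller on that queue keeps full
coverage.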

