linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* qrwlock && read-after-read
@ 2015-08-04 13:00 Oleg Nesterov
  2015-08-04 13:10 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-08-04 13:00 UTC (permalink / raw)
  To: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
	Peter Zijlstra, Waiman Long
  Cc: linux-kernel

I am working on the (off-topic) bug report which motivated me to
look at locking/qrwlock.c and it seems to me there is a problem
with the queued rwlocks.

Unless I am totally confused read-after-read is no longer valid,
write_lock() stops the new readers. And lockdep doesn't know this,
read_lock()->rwlock_acquire_read() doesn't match the reality. The
code doing

	read_lock(X);
	read_lock(X);

can deadlock if another CPU does write_lock(X) in between. This
was fine before rwlock_t was changed to use qrwlock.

A nested read_lock() in interrupt should be fine though, and this
is because queue_read_lock_slowpath() "ignores" _QW_WAITING if
in_interrupt().

This means that rwlock_t has the really strange semantics imho,
and again, it is not lockdep-friendly.

What do you think we can/should do? Or did I misread this code?

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: qrwlock && read-after-read
  2015-08-04 13:00 qrwlock && read-after-read Oleg Nesterov
@ 2015-08-04 13:10 ` Peter Zijlstra
  2015-08-04 13:40   ` Oleg Nesterov
  2015-08-04 13:51   ` Linus Torvalds
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2015-08-04 13:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
	Waiman Long, linux-kernel

On Tue, Aug 04, 2015 at 03:00:53PM +0200, Oleg Nesterov wrote:
> I am working on the (off-topic) bug report which motivated me to
> look at locking/qrwlock.c and it seems to me there is a problem
> with the queued rwlocks.
> 
> Unless I am totally confused read-after-read is no longer valid,
> write_lock() stops the new readers. And lockdep doesn't know this,
> read_lock()->rwlock_acquire_read() doesn't match the reality. The
> code doing
> 
> 	read_lock(X);
> 	read_lock(X);
> 
> can deadlock if another CPU does write_lock(X) in between. This
> was fine before rwlock_t was changed to use qrwlock.
> 
> A nested read_lock() in interrupt should be fine though, and this
> is because queue_read_lock_slowpath() "ignores" _QW_WAITING if
> in_interrupt().
> 
> This means that rwlock_t has the really strange semantics imho,
> and again, it is not lockdep-friendly.
> 
> What do you think we can/should do? Or did I misread this code?

Fix lockdep, although that's non trivial from what I remember.

These (new) semantics were very much on purpose and suggested by Linus
IIRC.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: qrwlock && read-after-read
  2015-08-04 13:10 ` Peter Zijlstra
@ 2015-08-04 13:40   ` Oleg Nesterov
  2015-08-04 17:32     ` Peter Zijlstra
  2015-08-04 13:51   ` Linus Torvalds
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-08-04 13:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
	Waiman Long, linux-kernel

On 08/04, Peter Zijlstra wrote:
>
> On Tue, Aug 04, 2015 at 03:00:53PM +0200, Oleg Nesterov wrote:
> > I am working on the (off-topic) bug report which motivated me to
> > look at locking/qrwlock.c and it seems to me there is a problem
> > with the queued rwlocks.
> >
> > Unless I am totally confused read-after-read is no longer valid,
> > write_lock() stops the new readers. And lockdep doesn't know this,
> > read_lock()->rwlock_acquire_read() doesn't match the reality. The
> > code doing
> >
> > 	read_lock(X);
> > 	read_lock(X);
> >
> > can deadlock if another CPU does write_lock(X) in between. This
> > was fine before rwlock_t was changed to use qrwlock.
> >
> > A nested read_lock() in interrupt should be fine though, and this
> > is because queue_read_lock_slowpath() "ignores" _QW_WAITING if
> > in_interrupt().
> >
> > This means that rwlock_t has the really strange semantics imho,
> > and again, it is not lockdep-friendly.
> >
> > What do you think we can/should do? Or did I misread this code?
>
> Fix lockdep, although that's non trivial from what I remember.
>
> These (new) semantics were very much on purpose and suggested by Linus
> IIRC.

Hmm, OK.

Lets fix the lockdep annotaions?

Oleg.

--- x/include/linux/rwlock_api_smp.h
+++ x/include/linux/rwlock_api_smp.h
@@ -146,7 +146,7 @@ static inline int __raw_write_trylock(rw
 static inline void __raw_read_lock(rwlock_t *lock)
 {
 	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	lock_acquire(..., /* read */ in_interrupt() 2 : 1, ...);
 	LOCK_CONTENDED(lock, do_raw_read_trylock, do_raw_read_lock);
 }
 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: qrwlock && read-after-read
  2015-08-04 13:10 ` Peter Zijlstra
  2015-08-04 13:40   ` Oleg Nesterov
@ 2015-08-04 13:51   ` Linus Torvalds
  2015-08-05 13:08     ` Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2015-08-04 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Arnd Bergmann, Ingo Molnar, Paul E.McKenney,
	Waiman Long, Linux Kernel Mailing List

On Tue, Aug 4, 2015 at 6:10 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> These (new) semantics were very much on purpose and suggested by Linus
> IIRC.

Well, I'll not take all the blame.

I refused to have something that broke the tasklist lock, so the "irq
users nest" was a requirement.

I also refused to have the original version that made this a per-lock
explicit and confusing choice, which in turn required changes to
existing users, and that made the interface more complex in ways that
didn't actually help anybody.

So it's not like I love the current semantics, but at least they are
realistic and can work. I agree that teaching lockdep to check for
this would be a good idea, because the semantics _are_ subtle.

(And I'm not 100% convinced we needed the fair model at all, but
fairness does end up being a good thing _if_ it works).

                   Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: qrwlock && read-after-read
  2015-08-04 13:40   ` Oleg Nesterov
@ 2015-08-04 17:32     ` Peter Zijlstra
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2015-08-04 17:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
	Waiman Long, linux-kernel

On Tue, Aug 04, 2015 at 03:40:43PM +0200, Oleg Nesterov wrote:
> Lets fix the lockdep annotaions?
> 
> Oleg.
> 
> --- x/include/linux/rwlock_api_smp.h
> +++ x/include/linux/rwlock_api_smp.h
> @@ -146,7 +146,7 @@ static inline int __raw_write_trylock(rw
>  static inline void __raw_read_lock(rwlock_t *lock)
>  {
>  	preempt_disable();
> -	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
> +	lock_acquire(..., /* read */ in_interrupt() 2 : 1, ...);
>  	LOCK_CONTENDED(lock, do_raw_read_trylock, do_raw_read_lock);
>  }

I think that suffers the same problems we had before; see:

  8acd91e86208 ("locking/lockdep: Revert qrwlock recusive stuff")

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: qrwlock && read-after-read
  2015-08-04 13:51   ` Linus Torvalds
@ 2015-08-05 13:08     ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2015-08-05 13:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Arnd Bergmann, Ingo Molnar, Paul E.McKenney,
	Waiman Long, Linux Kernel Mailing List

On 08/04, Linus Torvalds wrote:
>
> I refused to have something that broke the tasklist lock, so the "irq
> users nest" was a requirement.

And I was going to reply that this breaks tasklist lock anyway but
failed to find anything wrong after the quick grep.

> So it's not like I love the current semantics, but at least they are
> realistic and can work. I agree that teaching lockdep to check for
> this would be a good idea, because the semantics _are_ subtle.

Yes... Just for example, the comment above task_lock(),

	Nests both inside and outside of read_lock(&tasklist_lock).

is no longer correct. Fortunately task_lock() is not irq-safe, and
iirc nobody does task_lock() + read_lock(&tasklist_lock) in process
context, so we are probably fine. Still, qrwlock changed the rules
and now it can only nest inside of read_lock(tasklist_lock).

Hmm. And afaics this in turn means that the next sentence

	It must not be nested with write_lock_irq(&tasklist_lock),
	neither inside nor outside.

also becomes wrong. So task_lock() can nest inside tasklist_lock,
write-or-read doesn't matter.


So it would be really nice to fix lockdep, but as Peter explains
(thanks Peter!) this is not simple.


> (And I'm not 100% convinced we needed the fair model at all, but
> fairness does end up being a good thing _if_ it works).

Yes. At least this automatically fixes the easy-to-trigger problems
with write_lock(tasklist) starvation/lockup.

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-08-05 13:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-04 13:00 qrwlock && read-after-read Oleg Nesterov
2015-08-04 13:10 ` Peter Zijlstra
2015-08-04 13:40   ` Oleg Nesterov
2015-08-04 17:32     ` Peter Zijlstra
2015-08-04 13:51   ` Linus Torvalds
2015-08-05 13:08     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).