* qrwlock && read-after-read
@ 2015-08-04 13:00 Oleg Nesterov
2015-08-04 13:10 ` Peter Zijlstra
0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-08-04 13:00 UTC (permalink / raw)
To: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
Peter Zijlstra, Waiman Long
Cc: linux-kernel
I am working on the (off-topic) bug report which motivated me to
look at locking/qrwlock.c and it seems to me there is a problem
with the queued rwlocks.
Unless I am totally confused read-after-read is no longer valid,
write_lock() stops the new readers. And lockdep doesn't know this,
read_lock()->rwlock_acquire_read() doesn't match the reality. The
code doing
read_lock(X);
read_lock(X);
can deadlock if another CPU does write_lock(X) in between. This
was fine before rwlock_t was changed to use qrwlock.
A nested read_lock() in interrupt should be fine though, and this
is because queue_read_lock_slowpath() "ignores" _QW_WAITING if
in_interrupt().
This means that rwlock_t has the really strange semantics imho,
and again, it is not lockdep-friendly.
What do you think we can/should do? Or did I misread this code?
Oleg.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read
2015-08-04 13:00 qrwlock && read-after-read Oleg Nesterov
@ 2015-08-04 13:10 ` Peter Zijlstra
2015-08-04 13:40 ` Oleg Nesterov
2015-08-04 13:51 ` Linus Torvalds
0 siblings, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2015-08-04 13:10 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
Waiman Long, linux-kernel
On Tue, Aug 04, 2015 at 03:00:53PM +0200, Oleg Nesterov wrote:
> I am working on the (off-topic) bug report which motivated me to
> look at locking/qrwlock.c and it seems to me there is a problem
> with the queued rwlocks.
>
> Unless I am totally confused read-after-read is no longer valid,
> write_lock() stops the new readers. And lockdep doesn't know this,
> read_lock()->rwlock_acquire_read() doesn't match the reality. The
> code doing
>
> read_lock(X);
> read_lock(X);
>
> can deadlock if another CPU does write_lock(X) in between. This
> was fine before rwlock_t was changed to use qrwlock.
>
> A nested read_lock() in interrupt should be fine though, and this
> is because queue_read_lock_slowpath() "ignores" _QW_WAITING if
> in_interrupt().
>
> This means that rwlock_t has the really strange semantics imho,
> and again, it is not lockdep-friendly.
>
> What do you think we can/should do? Or did I misread this code?
Fix lockdep, although that's non trivial from what I remember.
These (new) semantics were very much on purpose and suggested by Linus
IIRC.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read
2015-08-04 13:10 ` Peter Zijlstra
@ 2015-08-04 13:40 ` Oleg Nesterov
2015-08-04 17:32 ` Peter Zijlstra
2015-08-04 13:51 ` Linus Torvalds
1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-08-04 13:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
Waiman Long, linux-kernel
On 08/04, Peter Zijlstra wrote:
>
> On Tue, Aug 04, 2015 at 03:00:53PM +0200, Oleg Nesterov wrote:
> > I am working on the (off-topic) bug report which motivated me to
> > look at locking/qrwlock.c and it seems to me there is a problem
> > with the queued rwlocks.
> >
> > Unless I am totally confused read-after-read is no longer valid,
> > write_lock() stops the new readers. And lockdep doesn't know this,
> > read_lock()->rwlock_acquire_read() doesn't match the reality. The
> > code doing
> >
> > read_lock(X);
> > read_lock(X);
> >
> > can deadlock if another CPU does write_lock(X) in between. This
> > was fine before rwlock_t was changed to use qrwlock.
> >
> > A nested read_lock() in interrupt should be fine though, and this
> > is because queue_read_lock_slowpath() "ignores" _QW_WAITING if
> > in_interrupt().
> >
> > This means that rwlock_t has the really strange semantics imho,
> > and again, it is not lockdep-friendly.
> >
> > What do you think we can/should do? Or did I misread this code?
>
> Fix lockdep, although that's non trivial from what I remember.
>
> These (new) semantics were very much on purpose and suggested by Linus
> IIRC.
Hmm, OK.
Lets fix the lockdep annotaions?
Oleg.
--- x/include/linux/rwlock_api_smp.h
+++ x/include/linux/rwlock_api_smp.h
@@ -146,7 +146,7 @@ static inline int __raw_write_trylock(rw
static inline void __raw_read_lock(rwlock_t *lock)
{
preempt_disable();
- rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+ lock_acquire(..., /* read */ in_interrupt() 2 : 1, ...);
LOCK_CONTENDED(lock, do_raw_read_trylock, do_raw_read_lock);
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read
2015-08-04 13:10 ` Peter Zijlstra
2015-08-04 13:40 ` Oleg Nesterov
@ 2015-08-04 13:51 ` Linus Torvalds
2015-08-05 13:08 ` Oleg Nesterov
1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2015-08-04 13:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Oleg Nesterov, Arnd Bergmann, Ingo Molnar, Paul E.McKenney,
Waiman Long, Linux Kernel Mailing List
On Tue, Aug 4, 2015 at 6:10 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> These (new) semantics were very much on purpose and suggested by Linus
> IIRC.
Well, I'll not take all the blame.
I refused to have something that broke the tasklist lock, so the "irq
users nest" was a requirement.
I also refused to have the original version that made this a per-lock
explicit and confusing choice, which in turn required changes to
existing users, and that made the interface more complex in ways that
didn't actually help anybody.
So it's not like I love the current semantics, but at least they are
realistic and can work. I agree that teaching lockdep to check for
this would be a good idea, because the semantics _are_ subtle.
(And I'm not 100% convinced we needed the fair model at all, but
fairness does end up being a good thing _if_ it works).
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read
2015-08-04 13:40 ` Oleg Nesterov
@ 2015-08-04 17:32 ` Peter Zijlstra
0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2015-08-04 17:32 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney,
Waiman Long, linux-kernel
On Tue, Aug 04, 2015 at 03:40:43PM +0200, Oleg Nesterov wrote:
> Lets fix the lockdep annotaions?
>
> Oleg.
>
> --- x/include/linux/rwlock_api_smp.h
> +++ x/include/linux/rwlock_api_smp.h
> @@ -146,7 +146,7 @@ static inline int __raw_write_trylock(rw
> static inline void __raw_read_lock(rwlock_t *lock)
> {
> preempt_disable();
> - rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
> + lock_acquire(..., /* read */ in_interrupt() 2 : 1, ...);
> LOCK_CONTENDED(lock, do_raw_read_trylock, do_raw_read_lock);
> }
I think that suffers the same problems we had before; see:
8acd91e86208 ("locking/lockdep: Revert qrwlock recusive stuff")
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read
2015-08-04 13:51 ` Linus Torvalds
@ 2015-08-05 13:08 ` Oleg Nesterov
0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2015-08-05 13:08 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Arnd Bergmann, Ingo Molnar, Paul E.McKenney,
Waiman Long, Linux Kernel Mailing List
On 08/04, Linus Torvalds wrote:
>
> I refused to have something that broke the tasklist lock, so the "irq
> users nest" was a requirement.
And I was going to reply that this breaks tasklist lock anyway but
failed to find anything wrong after the quick grep.
> So it's not like I love the current semantics, but at least they are
> realistic and can work. I agree that teaching lockdep to check for
> this would be a good idea, because the semantics _are_ subtle.
Yes... Just for example, the comment above task_lock(),
Nests both inside and outside of read_lock(&tasklist_lock).
is no longer correct. Fortunately task_lock() is not irq-safe, and
iirc nobody does task_lock() + read_lock(&tasklist_lock) in process
context, so we are probably fine. Still, qrwlock changed the rules
and now it can only nest inside of read_lock(tasklist_lock).
Hmm. And afaics this in turn means that the next sentence
It must not be nested with write_lock_irq(&tasklist_lock),
neither inside nor outside.
also becomes wrong. So task_lock() can nest inside tasklist_lock,
write-or-read doesn't matter.
So it would be really nice to fix lockdep, but as Peter explains
(thanks Peter!) this is not simple.
> (And I'm not 100% convinced we needed the fair model at all, but
> fairness does end up being a good thing _if_ it works).
Yes. At least this automatically fixes the easy-to-trigger problems
with write_lock(tasklist) starvation/lockup.
Oleg.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-08-05 13:10 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-04 13:00 qrwlock && read-after-read Oleg Nesterov
2015-08-04 13:10 ` Peter Zijlstra
2015-08-04 13:40 ` Oleg Nesterov
2015-08-04 17:32 ` Peter Zijlstra
2015-08-04 13:51 ` Linus Torvalds
2015-08-05 13:08 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).