* qrwlock && read-after-read @ 2015-08-04 13:00 Oleg Nesterov 2015-08-04 13:10 ` Peter Zijlstra 0 siblings, 1 reply; 6+ messages in thread From: Oleg Nesterov @ 2015-08-04 13:00 UTC (permalink / raw) To: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney, Peter Zijlstra, Waiman Long Cc: linux-kernel I am working on the (off-topic) bug report which motivated me to look at locking/qrwlock.c and it seems to me there is a problem with the queued rwlocks. Unless I am totally confused read-after-read is no longer valid, write_lock() stops the new readers. And lockdep doesn't know this, read_lock()->rwlock_acquire_read() doesn't match the reality. The code doing read_lock(X); read_lock(X); can deadlock if another CPU does write_lock(X) in between. This was fine before rwlock_t was changed to use qrwlock. A nested read_lock() in interrupt should be fine though, and this is because queue_read_lock_slowpath() "ignores" _QW_WAITING if in_interrupt(). This means that rwlock_t has the really strange semantics imho, and again, it is not lockdep-friendly. What do you think we can/should do? Or did I misread this code? Oleg. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read 2015-08-04 13:00 qrwlock && read-after-read Oleg Nesterov @ 2015-08-04 13:10 ` Peter Zijlstra 2015-08-04 13:40 ` Oleg Nesterov 2015-08-04 13:51 ` Linus Torvalds 0 siblings, 2 replies; 6+ messages in thread From: Peter Zijlstra @ 2015-08-04 13:10 UTC (permalink / raw) To: Oleg Nesterov Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney, Waiman Long, linux-kernel On Tue, Aug 04, 2015 at 03:00:53PM +0200, Oleg Nesterov wrote: > I am working on the (off-topic) bug report which motivated me to > look at locking/qrwlock.c and it seems to me there is a problem > with the queued rwlocks. > > Unless I am totally confused read-after-read is no longer valid, > write_lock() stops the new readers. And lockdep doesn't know this, > read_lock()->rwlock_acquire_read() doesn't match the reality. The > code doing > > read_lock(X); > read_lock(X); > > can deadlock if another CPU does write_lock(X) in between. This > was fine before rwlock_t was changed to use qrwlock. > > A nested read_lock() in interrupt should be fine though, and this > is because queue_read_lock_slowpath() "ignores" _QW_WAITING if > in_interrupt(). > > This means that rwlock_t has the really strange semantics imho, > and again, it is not lockdep-friendly. > > What do you think we can/should do? Or did I misread this code? Fix lockdep, although that's non trivial from what I remember. These (new) semantics were very much on purpose and suggested by Linus IIRC. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read 2015-08-04 13:10 ` Peter Zijlstra @ 2015-08-04 13:40 ` Oleg Nesterov 2015-08-04 17:32 ` Peter Zijlstra 2015-08-04 13:51 ` Linus Torvalds 1 sibling, 1 reply; 6+ messages in thread From: Oleg Nesterov @ 2015-08-04 13:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney, Waiman Long, linux-kernel On 08/04, Peter Zijlstra wrote: > > On Tue, Aug 04, 2015 at 03:00:53PM +0200, Oleg Nesterov wrote: > > I am working on the (off-topic) bug report which motivated me to > > look at locking/qrwlock.c and it seems to me there is a problem > > with the queued rwlocks. > > > > Unless I am totally confused read-after-read is no longer valid, > > write_lock() stops the new readers. And lockdep doesn't know this, > > read_lock()->rwlock_acquire_read() doesn't match the reality. The > > code doing > > > > read_lock(X); > > read_lock(X); > > > > can deadlock if another CPU does write_lock(X) in between. This > > was fine before rwlock_t was changed to use qrwlock. > > > > A nested read_lock() in interrupt should be fine though, and this > > is because queue_read_lock_slowpath() "ignores" _QW_WAITING if > > in_interrupt(). > > > > This means that rwlock_t has the really strange semantics imho, > > and again, it is not lockdep-friendly. > > > > What do you think we can/should do? Or did I misread this code? > > Fix lockdep, although that's non trivial from what I remember. > > These (new) semantics were very much on purpose and suggested by Linus > IIRC. Hmm, OK. Lets fix the lockdep annotaions? Oleg. --- x/include/linux/rwlock_api_smp.h +++ x/include/linux/rwlock_api_smp.h @@ -146,7 +146,7 @@ static inline int __raw_write_trylock(rw static inline void __raw_read_lock(rwlock_t *lock) { preempt_disable(); - rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); + lock_acquire(..., /* read */ in_interrupt() 2 : 1, ...); LOCK_CONTENDED(lock, do_raw_read_trylock, do_raw_read_lock); } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read 2015-08-04 13:40 ` Oleg Nesterov @ 2015-08-04 17:32 ` Peter Zijlstra 0 siblings, 0 replies; 6+ messages in thread From: Peter Zijlstra @ 2015-08-04 17:32 UTC (permalink / raw) To: Oleg Nesterov Cc: Arnd Bergmann, Ingo Molnar, Linus Torvalds, Paul E.McKenney, Waiman Long, linux-kernel On Tue, Aug 04, 2015 at 03:40:43PM +0200, Oleg Nesterov wrote: > Lets fix the lockdep annotaions? > > Oleg. > > --- x/include/linux/rwlock_api_smp.h > +++ x/include/linux/rwlock_api_smp.h > @@ -146,7 +146,7 @@ static inline int __raw_write_trylock(rw > static inline void __raw_read_lock(rwlock_t *lock) > { > preempt_disable(); > - rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); > + lock_acquire(..., /* read */ in_interrupt() 2 : 1, ...); > LOCK_CONTENDED(lock, do_raw_read_trylock, do_raw_read_lock); > } I think that suffers the same problems we had before; see: 8acd91e86208 ("locking/lockdep: Revert qrwlock recusive stuff") ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read 2015-08-04 13:10 ` Peter Zijlstra 2015-08-04 13:40 ` Oleg Nesterov @ 2015-08-04 13:51 ` Linus Torvalds 2015-08-05 13:08 ` Oleg Nesterov 1 sibling, 1 reply; 6+ messages in thread From: Linus Torvalds @ 2015-08-04 13:51 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, Arnd Bergmann, Ingo Molnar, Paul E.McKenney, Waiman Long, Linux Kernel Mailing List On Tue, Aug 4, 2015 at 6:10 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > These (new) semantics were very much on purpose and suggested by Linus > IIRC. Well, I'll not take all the blame. I refused to have something that broke the tasklist lock, so the "irq users nest" was a requirement. I also refused to have the original version that made this a per-lock explicit and confusing choice, which in turn required changes to existing users, and that made the interface more complex in ways that didn't actually help anybody. So it's not like I love the current semantics, but at least they are realistic and can work. I agree that teaching lockdep to check for this would be a good idea, because the semantics _are_ subtle. (And I'm not 100% convinced we needed the fair model at all, but fairness does end up being a good thing _if_ it works). Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qrwlock && read-after-read 2015-08-04 13:51 ` Linus Torvalds @ 2015-08-05 13:08 ` Oleg Nesterov 0 siblings, 0 replies; 6+ messages in thread From: Oleg Nesterov @ 2015-08-05 13:08 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Arnd Bergmann, Ingo Molnar, Paul E.McKenney, Waiman Long, Linux Kernel Mailing List On 08/04, Linus Torvalds wrote: > > I refused to have something that broke the tasklist lock, so the "irq > users nest" was a requirement. And I was going to reply that this breaks tasklist lock anyway but failed to find anything wrong after the quick grep. > So it's not like I love the current semantics, but at least they are > realistic and can work. I agree that teaching lockdep to check for > this would be a good idea, because the semantics _are_ subtle. Yes... Just for example, the comment above task_lock(), Nests both inside and outside of read_lock(&tasklist_lock). is no longer correct. Fortunately task_lock() is not irq-safe, and iirc nobody does task_lock() + read_lock(&tasklist_lock) in process context, so we are probably fine. Still, qrwlock changed the rules and now it can only nest inside of read_lock(tasklist_lock). Hmm. And afaics this in turn means that the next sentence It must not be nested with write_lock_irq(&tasklist_lock), neither inside nor outside. also becomes wrong. So task_lock() can nest inside tasklist_lock, write-or-read doesn't matter. So it would be really nice to fix lockdep, but as Peter explains (thanks Peter!) this is not simple. > (And I'm not 100% convinced we needed the fair model at all, but > fairness does end up being a good thing _if_ it works). Yes. At least this automatically fixes the easy-to-trigger problems with write_lock(tasklist) starvation/lockup. Oleg. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-08-05 13:10 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-04 13:00 qrwlock && read-after-read Oleg Nesterov 2015-08-04 13:10 ` Peter Zijlstra 2015-08-04 13:40 ` Oleg Nesterov 2015-08-04 17:32 ` Peter Zijlstra 2015-08-04 13:51 ` Linus Torvalds 2015-08-05 13:08 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).