From: Byungchul Park <byungchul.park@lge.com>
To: torvalds@linux-foundation.org, peterz@infradead.org,
mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
rostedt@goodmis.org, joel@joelfernandes.org,
alexander.levin@microsoft.com, daniel.vetter@ffwll.ch,
chris@chris-wilson.co.uk, duyuyang@gmail.com,
johannes.berg@intel.com, tj@kernel.org, tytso@mit.edu,
willy@infradead.org, david@fromorbit.com, amir73il@gmail.com,
bfields@fieldses.org, gregkh@linuxfoundation.org,
kernel-team@lge.com
Subject: [RFC] Are you good with Lockdep?
Date: Wed, 11 Nov 2020 14:05:59 +0900 [thread overview]
Message-ID: <20201111050559.GA24438@X58A-UD3R> (raw)
Hello folks,
We have no choise but to use Lockdep to track dependencies for deadlock
detection with the current kernel. I'm wondering if they are satifsied
in that tool. Lockdep has too big problems to continue to use.
---
PROBLEM 1) First of all, Lockdep gets disabled on the first detection.
What if there are more than two problems? We cannot get reported
other than the first one. So the one who has introduced the first one
should fix it as soon as possible so that the other problems can be
reported and fixed. It will get even worse if it's a false positive
because it's worth nothing but only preventing reporting real ones.
That's why kernel developers are so sensitive to Lockdep's false
positive reporting - I would, too. But precisely speaking, it's a
problem of how Lockdep was designed and implemented, not false
positive itself. Annoying false positives - as WARN()'s messages are
annoying - should be fixed but we don't have to be as sensitive as we
are now if the tool keeps normally working even after reporting.
But it's very hard to achieve it with Lockdep because of the complex
design. Maybe re-designing and re-implementing almost whole code
would be required.
PROBLEM 2) Lockdep forces us to emulate lock acquisition for non-lock.
We add manual annotations for non-lock code in the following way:
At the interest wait,
...
lockdep_acquire(X);
lockdep_release(X);
wait_for_something(X);
...
At begin and end of the region where we expect there's the something,
...
lockdep_acquire(X);
(or lockdep_acquire_read(); to allow recursive annotations.)
function_doing_the_something(X);
lockdep_release(X);
...
This way we try to detect deadlocks by waits for now. But don't you
think it looks ugly? Are you good if it manages to work by some
means? That even doesn't work correctly. Instead it should look like:
At the interest wait,
...
xxx_wait(X);
wait_for_something(X);
...
At the something,
...
xxx_event(X);
do_the_something(X);
...
Or at begin and end of the region for hint,
...
xxx_event_context_enter(X);
function_doing_the_something(X);
xxx_event_context_exit(X);
...
Lockdep had been a not bad tool for detecting deadlock by problematic
acquisition order. But it's worth noting that deadlock is caused by
*waits* and their *events* that never reach. Deadlock detection tool
should focus on waits and events instead of lock acquisition order.
Just FYI, it should look like for locks:
At the interest lock acquisition,
...
xxx_wait(X);
xxx_event_context_enter(X);
lock(X);
...
At the lock acquisition using trylock type,
...
xxx_event_context_enter(X);
lock(X);
...
At the lock release,
...
xxx_event(X);
xxx_event_context_exit(X);
unlock(X);
...
---
These two are big-why we should not keep using Lockdep as a deadlock
detection tool. Other small things can be fixed by modifying Lockdep but
these two are not.
Fine. What could we do for it? Options that I've considered are:
---
OPTION 1) Revert reverting cross-release locking checks (e966eaeeb62
locking/lockdep: Remove the cross-release locking checks) or implement
another Lockdep extention like cross-release.
The reason cross-release was reverted was a few false positives -
someone was lying like there were too many false positives though -
leading people to disable Lockdep. I admit it had to be done that way.
Folks still don't like Lockdep's false positive that stops the tool.
OPTION 2) Newally design and implement another tool for deadlock
detection based on wait-event model. And replace Lockdep right away.
Lockdep definitely includes all the efforts great developers have
made for a long time as as to be quite stable enough. But the new one
is not. It's not good idea to replace Lockdep right away.
OPTION 3) Newally design and implement another tool for deadlock
detection based on wait-event model. And keep both Lockdep and the new
tool until the new one gets considered stable.
For people who need stronger capacity for deadlock detection, the new
tool needs to be introduced but de-coupled with Lockdep so as to be
getting matured independently. I think this option is the best.
I have the patch set. Let me share it with you in a few days.
---
Thanks,
Byungchul
next reply other threads:[~2020-11-11 5:37 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-11 5:05 Byungchul Park [this message]
2020-11-11 10:54 ` [RFC] Are you good with Lockdep? Ingo Molnar
2020-11-11 14:36 ` Steven Rostedt
2020-11-11 23:16 ` Thomas Gleixner
2020-11-12 8:10 ` Byungchul Park
2020-11-12 14:26 ` Steven Rostedt
2020-11-12 14:52 ` Matthew Wilcox
2020-11-16 8:57 ` Byungchul Park
2020-11-16 15:37 ` Matthew Wilcox
2020-11-18 1:45 ` Boqun Feng
2020-11-18 3:30 ` Matthew Wilcox
2020-11-23 13:15 ` Byungchul Park
2020-11-12 14:58 ` Byungchul Park
2020-11-16 9:05 ` Byungchul Park
2020-11-23 10:45 ` Byungchul Park
2020-11-12 10:32 ` Byungchul Park
2020-11-12 13:56 ` Daniel Vetter
2020-11-16 8:45 ` Byungchul Park
2020-11-12 6:15 ` Byungchul Park
2020-11-12 8:51 ` Byungchul Park
2020-11-12 9:46 ` Byungchul Park
2020-11-23 11:05 ` [RFC] Dept(Dependency Tracker) Implementation Byungchul Park
2020-11-23 11:36 ` [RFC 1/6] dept: Implement Dept(Dependency Tracker) Byungchul Park
2020-11-23 11:36 ` [RFC 2/6] dept: Apply Dept to spinlock Byungchul Park
2020-11-23 11:36 ` [RFC 3/6] dept: Apply Dept to mutex families Byungchul Park
2020-11-23 11:36 ` [RFC 4/6] dept: Apply Dept to rwlock Byungchul Park
2020-11-23 11:36 ` [RFC 5/6] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
2020-11-23 11:36 ` [RFC 6/6] dept: Assign custom dept_keys or disable to avoid false positives Byungchul Park
2020-11-23 12:29 ` [RFC] Dept(Dependency Tracker) Implementation Byungchul Park
2020-11-23 11:13 ` [RFC] Dept(Dependency Tracker) Report Example Byungchul Park
2020-11-23 12:14 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201111050559.GA24438@X58A-UD3R \
--to=byungchul.park@lge.com \
--cc=alexander.levin@microsoft.com \
--cc=amir73il@gmail.com \
--cc=bfields@fieldses.org \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=david@fromorbit.com \
--cc=duyuyang@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=joel@joelfernandes.org \
--cc=johannes.berg@intel.com \
--cc=kernel-team@lge.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=will@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox