From: Oleg Nesterov <oleg@tv-sign.ru>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Ingo Molnar <mingo@redhat.com>,
Arjan van de Ven <arjan@linux.intel.com>,
Linux Kernel list <linux-kernel@vger.kernel.org>,
linux-wireless <linux-wireless@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
mingo@elte.hu, Thomas Sattler <tsattler@gmx.de>
Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep
Date: Tue, 3 Jul 2007 21:31:12 +0400 [thread overview]
Message-ID: <20070703173112.GA108@tv-sign.ru> (raw)
In-Reply-To: <1183381393.4089.117.camel@johannes.berg>
On 07/02, Johannes Berg wrote:
>
> On Sat, 2007-06-30 at 15:46 +0400, Oleg Nesterov wrote:
>
> > Johannes, could you change wait_on_work() as well? Most users of
> > flush_workqueue() should be converted to use it.
(to avoid a possible confusion: I meant, most users of flush_workqueue()
should be converted to use cancel_work_sync/cancel_delayed_work_sync
which in turn use wait_on_work()).
> I think this could lead to false positives, but then I think we
> shouldn't care about those. Let me explain. The thing is that with this
> it can happen that the code using the workqueue somehow obeys an
> ordering in the work it queues, so as far as I can tell it'd be fine to
> have two work items A and B where only B takes a lock L1, and then have
> a wait_on_work(A) under L1, if and only if B was always queued right
> after A (or something like that).
Not sure I understand. Yes, we can have false positives, but I think the
ordering in the workqueue doesn't matter.
If A does NOT take a lock L1, then it is OK to do cancel_work_sync(A)
under L1, regardless of which other work_structs this workqueue has,
before or after A.
Now we have a false positive if some time we queue B into that workqueue,
and this is not good.
We can avoid this problem if we put lockdep_map into work_struct, so
that wait_on_work() "locks" work->lockdep_map, while flush_workqueue()
takes wq->lockdep_map.
But probably we don't need this right now, at least until we really
have a lot of false positives while converting from flush_workqueue()
to cancel_work_sync/cancel_delayed_work_sync.
> However, since this invariant is
> rather likely to be broken by subsequent changes to the code, I think
> such code should probably use two workqueues instead, and I doubt it
> ever happens anyway since then people could in most cases just put both
> works into one function.
Well, I am not sure, think about the shared workqueues like keventd_wq...
> Hence I included it in my patch below.
a couple of minor nits below,
> @@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
>
> BUG_ON(get_wq_data(work) != cwq);
> work_clear_pending(work);
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> f(work);
> + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
^^^
Isn't it better to call lock_release() with nested == 1 ?
> +#define create_workqueue(name) \
> +({ \
> + static struct lock_class_key __key; \
> + struct workqueue_struct *__wq; \
> + \
> + __wq = __create_workqueue((name), 0, 0, &__key); \
> + __wq; \
> +})
Why do we need __wq ?
+#define create_workqueue(name) \
({ \
static struct lock_class_key __key; \
__create_workqueue((name), 0, 0, &__key); \
})
Actually, I'd suggest to rename __create_workqueue() to __create_workqueue_key(),
and then you need the only change in linux/workqueue.h
- extern struct workqueue_struct *__create_workqueue(...);
+ extern struct workqueue_struct *__create_workqueue_key(..., key);
+ #define __create_workqueue(...) \
+ static struct lock_class_key __key; \
+ __create_workqueue_key(..., key); \
but this is a matter of taste.
Btw, I think your patch found a real bug in net/mac80211/, cool!
Oleg.
next prev parent reply other threads:[~2007-07-03 17:31 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-27 18:40 [RFC/PATCH] debug workqueue deadlocks with lockdep Johannes Berg
2007-06-28 16:33 ` Johannes Berg
2007-06-28 17:33 ` Johannes Berg
2007-06-30 8:05 ` Ingo Molnar
2007-06-30 11:46 ` Oleg Nesterov
2007-07-02 8:37 ` Johannes Berg
2007-07-02 13:03 ` Johannes Berg
2007-07-03 17:31 ` Oleg Nesterov [this message]
2007-07-03 19:42 ` Ingo Molnar
2007-07-04 11:49 ` Johannes Berg
2007-07-04 12:21 ` Ingo Molnar
2007-07-04 13:59 ` Johannes Berg
2007-07-04 12:25 ` Ingo Molnar
2007-07-04 12:52 ` Oleg Nesterov
2007-07-04 13:57 ` Johannes Berg
2007-07-05 8:43 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070703173112.GA108@tv-sign.ru \
--to=oleg@tv-sign.ru \
--cc=a.p.zijlstra@chello.nl \
--cc=arjan@linux.intel.com \
--cc=johannes@sipsolutions.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mingo@redhat.com \
--cc=tsattler@gmx.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.