From: Tejun Heo <tj@kernel.org>
To: Hugh Dickins <hughd@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>,
Yong Zhang <yong.zhang0@gmail.com>,
linux-kernel@vger.kernel.org
Subject: Re: linux-next oops in __lock_acquire for process_one_work
Date: Mon, 7 May 2012 10:57:43 -0700 [thread overview]
Message-ID: <20120507175743.GC19417@google.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
(cc'ing Peter and Ingo and quoting whole body)
On Mon, May 07, 2012 at 10:19:09AM -0700, Hugh Dickins wrote:
> Running MM load on recent linux-nexts (e.g. 3.4.0-rc5-next-20120504),
> with CONFIG_PROVE_LOCKING=y, I've been hitting an oops in __lock_acquire
> called from lock_acquire called from process_one_work: serving mm/swap.c's
> lru_add_drain_all - schedule_on_each_cpu(lru_add_drain_per_cpu).
>
> In each case the oopsing address has been ffffffff00000198, and the
> oopsing instruction is the "atomic_inc((atomic_t *)&class->ops)" in
> __lock_acquire: so class is ffffffff00000000.
>
> I notice Stephen's commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> workqueue: Catch more locking problems with flush_work()
> in linux-next but not 3.4-rc, adding
> lock_map_acquire(&work->lockdep_map);
> lock_map_release(&work->lockdep_map);
> to flush_work.
>
> I believe that occasionally races with your
> struct lockdep_map lockdep_map = work->lockdep_map;
> in process_one_work, putting an entry into the class_cache
> just as you're copying it, so you end up with half a pointer.
> yes, the structure copy is using "rep movsl" not "rep movsq".
>
> I've reverted Stephen's commit from my testing, and indeed it's
> now run that MM load much longer than I've seen since this bug
> first appeared. Though I suspect that strictly it's your
> unlocked copying of the lockdep_map that's to blame. Probably
> easily fixed by someone who understands lockdep - not me!
The offending commit is 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
"workqueue: Catch more locking problems with flush_work()". It sounds
fancy but all it does is adding the following to flush_work().
lock_map_acquire(&work->lockdep_map);
lock_map_release(&work->lockdep_map);
Which seems correct to me and more importantly not different from what
wait_on_work() does, so if this is broken, flush_work_sync() and
cancel_work_sync() are broken too - probably masked by lower usage
frequency.
It seems the problem stems from how process_one_work() "caches"
lockdep_map. This part predates cmwq changes but it seems necessary
because the work item may be freed during execution but lockdep_map
should be released after execution is complete. Peter, do you
remember how this lockdep_map copying is added? Is (or was) this
correct? If it's broken, how do we fix it? Add a lockdep_map copy
API which does some magic lockdep locking dancing?
Thanks.
--
tejun
next prev parent reply other threads:[~2012-05-07 17:57 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-07 17:19 linux-next oops in __lock_acquire for process_one_work Hugh Dickins
2012-05-07 17:57 ` Tejun Heo [this message]
2012-05-08 13:03 ` Peter Zijlstra
2012-05-08 16:58 ` Tejun Heo
2012-05-08 17:02 ` Peter Zijlstra
2012-05-08 18:11 ` Hugh Dickins
2012-05-08 22:31 ` Peter Zijlstra
2012-05-08 22:58 ` Hugh Dickins
2012-05-09 9:25 ` Ingo Molnar
2012-05-09 20:09 ` Hugh Dickins
2012-05-10 17:52 ` Hugh Dickins
2012-05-14 21:27 ` Tejun Heo
2012-05-15 11:11 ` Peter Zijlstra
2012-05-15 15:10 ` [PATCH] lockdep: fix oops in processing workqueue Tejun Heo
2012-05-15 15:29 ` Dave Jones
2012-05-15 15:31 ` Tejun Heo
2012-05-15 20:36 ` Hugh Dickins
2012-05-08 18:05 ` linux-next oops in __lock_acquire for process_one_work Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120507175743.GC19417@google.com \
--to=tj@kernel.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=sboyd@codeaurora.org \
--cc=yong.zhang0@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).