From: Tejun Heo <tj@kernel.org>
To: Hugh Dickins <hughd@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>,
Yong Zhang <yong.zhang0@gmail.com>,
linux-kernel@vger.kernel.org
Subject: Re: linux-next oops in __lock_acquire for process_one_work
Date: Mon, 7 May 2012 10:57:43 -0700 [thread overview]
Message-ID: <20120507175743.GC19417@google.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
(cc'ing Peter and Ingo and quoting whole body)
On Mon, May 07, 2012 at 10:19:09AM -0700, Hugh Dickins wrote:
> Running MM load on recent linux-nexts (e.g. 3.4.0-rc5-next-20120504),
> with CONFIG_PROVE_LOCKING=y, I've been hitting an oops in __lock_acquire
> called from lock_acquire called from process_one_work: serving mm/swap.c's
> lru_add_drain_all - schedule_on_each_cpu(lru_add_drain_per_cpu).
>
> In each case the oopsing address has been ffffffff00000198, and the
> oopsing instruction is the "atomic_inc((atomic_t *)&class->ops)" in
> __lock_acquire: so class is ffffffff00000000.
>
> I notice Stephen's commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> workqueue: Catch more locking problems with flush_work()
> in linux-next but not 3.4-rc, adding
> lock_map_acquire(&work->lockdep_map);
> lock_map_release(&work->lockdep_map);
> to flush_work.
>
> I believe that occasionally races with your
> struct lockdep_map lockdep_map = work->lockdep_map;
> in process_one_work, putting an entry into the class_cache
> just as you're copying it, so you end up with half a pointer.
> yes, the structure copy is using "rep movsl" not "rep movsq".
>
> I've reverted Stephen's commit from my testing, and indeed it's
> now run that MM load much longer than I've seen since this bug
> first appeared. Though I suspect that strictly it's your
> unlocked copying of the lockdep_map that's to blame. Probably
> easily fixed by someone who understands lockdep - not me!
The offending commit is 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
"workqueue: Catch more locking problems with flush_work()". It sounds
fancy but all it does is adding the following to flush_work().
lock_map_acquire(&work->lockdep_map);
lock_map_release(&work->lockdep_map);
Which seems correct to me and more importantly not different from what
wait_on_work() does, so if this is broken, flush_work_sync() and
cancel_work_sync() are broken too - probably masked by lower usage
frequency.
It seems the problem stems from how process_one_work() "caches"
lockdep_map. This part predates cmwq changes but it seems necessary
because the work item may be freed during execution but lockdep_map
should be released after execution is complete. Peter, do you
remember how this lockdep_map copying is added? Is (or was) this
correct? If it's broken, how do we fix it? Add a lockdep_map copy
API which does some magic lockdep locking dancing?
Thanks.
--
tejun
next prev parent reply other threads:[~2012-05-07 17:57 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-07 17:19 linux-next oops in __lock_acquire for process_one_work Hugh Dickins
2012-05-07 17:57 ` Tejun Heo [this message]
2012-05-08 13:03 ` Peter Zijlstra
2012-05-08 16:58 ` Tejun Heo
2012-05-08 17:02 ` Peter Zijlstra
2012-05-08 18:11 ` Hugh Dickins
2012-05-08 22:31 ` Peter Zijlstra
2012-05-08 22:58 ` Hugh Dickins
2012-05-09 9:25 ` Ingo Molnar
2012-05-09 20:09 ` Hugh Dickins
2012-05-10 17:52 ` Hugh Dickins
2012-05-14 21:27 ` Tejun Heo
2012-05-15 11:11 ` Peter Zijlstra
2012-05-15 15:10 ` [PATCH] lockdep: fix oops in processing workqueue Tejun Heo
2012-05-15 15:29 ` Dave Jones
2012-05-15 15:31 ` Tejun Heo
2012-05-15 20:36 ` Hugh Dickins
2012-05-08 18:05 ` linux-next oops in __lock_acquire for process_one_work Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120507175743.GC19417@google.com \
--to=tj@kernel.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=sboyd@codeaurora.org \
--cc=yong.zhang0@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.