From: Oleg Nesterov <oleg@redhat.com>
To: Davidlohr Bueso <dave@stgolabs.net>, Nicholas Guire <der.herr@hofr.at>
Cc: paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
waiman.long@hp.com, peterz@infradead.org,
raghavendra.kt@linux.vnet.ibm.com
Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9
Date: Thu, 12 Feb 2015 18:28:05 +0100
Message-ID: <20150212172805.GA20850@redhat.com>
In-Reply-To: <1423710911.2046.50.camel@stgolabs.net>
On 02/11, Davidlohr Bueso wrote:
>
> On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > Hello!
> >
> > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > hit the following on x86_64. This happened after about 15 minutes of
> > rcutorture. In contrast, I have been doing successful 15-hour runs
> > on v3.19. I will check reproducibility and try to narrow it down.
> > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> >
> > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> >
> > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: gî<81>ÿÿÿÿ/0, .owner_cpu: -42
> > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > [ 837.287013] Call Trace:
> > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
>
> We did have some recent changes in completions:
>
> 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
>
> The second one being more related (although both appear to make sense).
> Perhaps some subtle implication in the completion_done side that
> disappeared with the spinlock?
At first glance both changes look suspicious, unless you at least document
how these helpers may be used.
Consider this code:
	void xxx(void)
	{
		struct completion c;

		init_completion(&c);
		expose_this_completion(&c);

		while (!completion_done(&c))
			schedule_timeout_uninterruptible(1);
	}
Before that change this code was correct; now it is not. Note that this is
what stop_machine_from_inactive_cpu() does, although I do not know whether
that is related.
Because completion_done() can now race with complete(), the final
spin_unlock() can write to the memory after it was freed/reused. In this
case it can write to the stack after return.
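To make the window concrete, here is a minimal userspace sketch, not kernel code. It assumes a pthread mutex can stand in for ->wait.lock and a C11 atomic for ->done; every model_* name is hypothetical and exists only for illustration. It contrasts a lock-free check, which can return true while the completer is still inside the locked region, with a check that takes the lock and therefore cannot return true before the completer has unlocked.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for struct completion (hypothetical, not the kernel type). */
struct model_completion {
	pthread_mutex_t lock;	/* plays the role of ->wait.lock */
	atomic_uint done;	/* plays the role of ->done */
};

static void model_init(struct model_completion *x)
{
	pthread_mutex_init(&x->lock, NULL);
	atomic_init(&x->done, 0);
}

/* Like complete(): ->done is set under the lock, and the unlock comes last. */
static void model_complete(struct model_completion *x)
{
	pthread_mutex_lock(&x->lock);
	atomic_fetch_add(&x->done, 1);
	/* A lock-free reader can already see done != 0 at this point. */
	pthread_mutex_unlock(&x->lock);	/* still touches *x */
}

/* Lock-free check: may return true before the completer has unlocked. */
static bool model_done_lockless(struct model_completion *x)
{
	return atomic_load(&x->done) != 0;
}

/* Serialized check: can only return true after the completer has unlocked. */
static bool model_done_locked(struct model_completion *x)
{
	bool ret;

	pthread_mutex_lock(&x->lock);
	ret = atomic_load(&x->done) != 0;
	pthread_mutex_unlock(&x->lock);
	return ret;
}

static void *model_completer(void *arg)
{
	model_complete(arg);
	return NULL;
}

/* The xxx() pattern with an on-stack object; returns 1 on success. */
int model_xxx(void)
{
	struct model_completion c;
	pthread_t t;

	model_init(&c);
	if (pthread_create(&t, NULL, model_completer, &c))
		return 0;

	while (!model_done_locked(&c))
		sched_yield();

	/*
	 * The join keeps this demo well-defined in any case. The kernel
	 * pattern has no such join, which is why the check itself has to
	 * serialize with the completer before the frame is released.
	 */
	pthread_join(t, NULL);
	pthread_mutex_destroy(&c.lock);
	return 1;
}
```

If the loop in model_xxx() used model_done_lockless() and returned without the join, the completer's unlock could land on a stack frame that is already gone, which is the pattern described above.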
Add CC's.
Oleg.