From: Nicholas Mc Guire <der.herr@hofr.at>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
waiman.long@hp.com, peterz@infradead.org,
raghavendra.kt@linux.vnet.ibm.com
Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9
Date: Sat, 14 Feb 2015 09:35:55 +0100 [thread overview]
Message-ID: <20150214083555.GA30176@opentech.at> (raw)
In-Reply-To: <20150213185328.GA19746@redhat.com>
On Fri, 13 Feb 2015, Oleg Nesterov wrote:
> On 02/13, Nicholas Mc Guire wrote:
> >
> > On Thu, 12 Feb 2015, Oleg Nesterov wrote:
> >
> > > Nicholas, sorry, I sent the patch but forgot to CC you.
> > > See https://lkml.org/lkml/2015/2/12/587
> > >
> > > And please note that "completion" was specially designed to guarantee
> > > that complete() can't play with this memory after wait_for_completion/etc
> > > returns.
> > >
> >
> > hmmm.... I guess that "falling out of context" can happen in a number of cases
> > with completion - any of the timeout/interruptible variants e.g:
> >
> > void xxx(void)
> > {
> > struct completion c;
> >
> > init_completion(&c);
> >
> > expose_this_completion(&c);
> >
> > wait_for_completion_timeout(&c,A_FEW_JIFFIES);
> > }
> >
> > and if the other side did not call complete() within A_FEW_JIFFIES then
> > it would result in the same failure - I don't think the API can prevent
> > this type of bug.
>
> Yes sure, but in this case the user of wait_for_completion_timeout() should
> blame itself, it is simply buggy.
>
> > Tt has to be ensured by additional locking
>
> Yes, but
>
> > drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolve this issue by resetting
> > the completion to NULL and testing for !NULL before calling complete()
> > with appropriate locking protection access.
>
> I don't understand this code, I can be easily wrong. but at first glance it
> doesn't need completion at all. Exactly because it relies on the additional
> fm->lock. ->finish_me could be "task_struct *", the tifm_7xx1_resume() could
> simply do schedule_timeout(), tifm_7xx1_isr() could do wake_up_process().
> Nevermind, this is off-topic and most probably I misread this code.
>
this is unfortunately true for a few other places as well - so the problem of
going out of scope with the _timeout/interruptible variants is quite general
and there is no clean solution. you are right that its the users code that is
buggy if the struct completion drops out of context - was jsut thinking if it
were not a resonable extension of the competion API to eliminate that need to
mess with locks to resolve this by adding a caccelation mechanism that would
resolve this at the API level.
Basically if you call wait_for_completion_timeout and the timeout condition
occures you always need some way of notifying the completing end that it
should no longer call complete()/complete_all().
> > Never the less of course the proposed change in completion_done() was a bug -
> > many thanks for catching that so quickly !
>
> OK, perhaps you can ack the fix I sent?
>
the only question I still have is that there would be no matching
smp_wmb() to the smp_rmb() you are using (atleast I did not figure out where).
thx!
hofrat
next prev parent reply other threads:[~2015-02-14 8:35 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-12 0:34 BUG: spinlock bad magic on CPU#0, migration/0/9 Paul E. McKenney
2015-02-12 3:15 ` Davidlohr Bueso
2015-02-12 3:43 ` Paul E. McKenney
2015-02-12 17:28 ` Oleg Nesterov
2015-02-12 17:41 ` Oleg Nesterov
2015-02-12 17:58 ` Davidlohr Bueso
2015-02-12 19:10 ` Nicholas Mc Guire
2015-02-12 19:37 ` Oleg Nesterov
2015-02-12 21:27 ` Oleg Nesterov
2015-02-13 18:17 ` Nicholas Mc Guire
2015-02-13 18:53 ` Oleg Nesterov
2015-02-14 8:35 ` Nicholas Mc Guire [this message]
2015-02-14 14:00 ` Oleg Nesterov
2015-02-12 19:59 ` Davidlohr Bueso
2015-02-12 19:32 ` Nicholas Mc Guire
2015-02-12 19:39 ` Oleg Nesterov
2015-02-12 19:59 ` [PATCH] sched/completion: completion_done() should serialize with complete() Oleg Nesterov
2015-02-13 21:09 ` Paul E. McKenney
2015-02-13 21:56 ` Davidlohr Bueso
2015-02-13 22:02 ` Davidlohr Bueso
2015-02-16 8:21 ` Peter Zijlstra
2015-02-16 16:51 ` Oleg Nesterov
2015-02-18 17:06 ` [tip:sched/core] sched/completion: Serialize completion_done() " tip-bot for Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150214083555.GA30176@opentech.at \
--to=der.herr@hofr.at \
--cc=dave@stgolabs.net \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@linux.vnet.ibm.com \
--cc=waiman.long@hp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.