From: Nicholas Mc Guire <der.herr@hofr.at>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
waiman.long@hp.com, peterz@infradead.org,
raghavendra.kt@linux.vnet.ibm.com
Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9
Date: Sat, 14 Feb 2015 09:35:55 +0100 [thread overview]
Message-ID: <20150214083555.GA30176@opentech.at> (raw)
In-Reply-To: <20150213185328.GA19746@redhat.com>
On Fri, 13 Feb 2015, Oleg Nesterov wrote:
> On 02/13, Nicholas Mc Guire wrote:
> >
> > On Thu, 12 Feb 2015, Oleg Nesterov wrote:
> >
> > > Nicholas, sorry, I sent the patch but forgot to CC you.
> > > See https://lkml.org/lkml/2015/2/12/587
> > >
> > > And please note that "completion" was specially designed to guarantee
> > > that complete() can't play with this memory after wait_for_completion/etc
> > > returns.
> > >
> >
> > hmmm.... I guess that "falling out of context" can happen in a number of cases
> > with completion - any of the timeout/interruptible variants e.g:
> >
> > void xxx(void)
> > {
> > struct completion c;
> >
> > init_completion(&c);
> >
> > expose_this_completion(&c);
> >
> > wait_for_completion_timeout(&c,A_FEW_JIFFIES);
> > }
> >
> > and if the other side did not call complete() within A_FEW_JIFFIES then
> > it would result in the same failure - I don't think the API can prevent
> > this type of bug.
>
> Yes sure, but in this case the user of wait_for_completion_timeout() should
> blame itself, it is simply buggy.
>
> > Tt has to be ensured by additional locking
>
> Yes, but
>
> > drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolve this issue by resetting
> > the completion to NULL and testing for !NULL before calling complete()
> > with appropriate locking protection access.
>
> I don't understand this code, I can be easily wrong. but at first glance it
> doesn't need completion at all. Exactly because it relies on the additional
> fm->lock. ->finish_me could be "task_struct *", the tifm_7xx1_resume() could
> simply do schedule_timeout(), tifm_7xx1_isr() could do wake_up_process().
> Nevermind, this is off-topic and most probably I misread this code.
>
this is unfortunately true for a few other places as well - so the problem of
going out of scope with the _timeout/interruptible variants is quite general
and there is no clean solution. you are right that its the users code that is
buggy if the struct completion drops out of context - was jsut thinking if it
were not a resonable extension of the competion API to eliminate that need to
mess with locks to resolve this by adding a caccelation mechanism that would
resolve this at the API level.
Basically if you call wait_for_completion_timeout and the timeout condition
occures you always need some way of notifying the completing end that it
should no longer call complete()/complete_all().
> > Never the less of course the proposed change in completion_done() was a bug -
> > many thanks for catching that so quickly !
>
> OK, perhaps you can ack the fix I sent?
>
the only question I still have is that there would be no matching
smp_wmb() to the smp_rmb() you are using (atleast I did not figure out where).
thx!
hofrat
next prev parent reply other threads:[~2015-02-14 8:35 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-12 0:34 BUG: spinlock bad magic on CPU#0, migration/0/9 Paul E. McKenney
2015-02-12 3:15 ` Davidlohr Bueso
2015-02-12 3:43 ` Paul E. McKenney
2015-02-12 17:28 ` Oleg Nesterov
2015-02-12 17:41 ` Oleg Nesterov
2015-02-12 17:58 ` Davidlohr Bueso
2015-02-12 19:10 ` Nicholas Mc Guire
2015-02-12 19:37 ` Oleg Nesterov
2015-02-12 21:27 ` Oleg Nesterov
2015-02-13 18:17 ` Nicholas Mc Guire
2015-02-13 18:53 ` Oleg Nesterov
2015-02-14 8:35 ` Nicholas Mc Guire [this message]
2015-02-14 14:00 ` Oleg Nesterov
2015-02-12 19:59 ` Davidlohr Bueso
2015-02-12 19:32 ` Nicholas Mc Guire
2015-02-12 19:39 ` Oleg Nesterov
2015-02-12 19:59 ` [PATCH] sched/completion: completion_done() should serialize with complete() Oleg Nesterov
2015-02-13 21:09 ` Paul E. McKenney
2015-02-13 21:56 ` Davidlohr Bueso
2015-02-13 22:02 ` Davidlohr Bueso
2015-02-16 8:21 ` Peter Zijlstra
2015-02-16 16:51 ` Oleg Nesterov
2015-02-18 17:06 ` [tip:sched/core] sched/completion: Serialize completion_done() " tip-bot for Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150214083555.GA30176@opentech.at \
--to=der.herr@hofr.at \
--cc=dave@stgolabs.net \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@linux.vnet.ibm.com \
--cc=waiman.long@hp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox