From: Rusty Russell <rusty@rustcorp.com.au>
To: Ben Greear <greearb@candelatech.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>, Tejun Heo <tj@kernel.org>
Subject: Re: 3.9.x:  Possible race related to stop_machine leads to lockup.
Date: Wed, 05 Jun 2013 14:11:39 +0930
Message-ID: <87mwr5rwxo.fsf@rustcorp.com.au>
In-Reply-To: <51AE667F.6030702@candelatech.com>

Ben Greear <greearb@candelatech.com> writes:
> On 06/04/2013 02:18 PM, Ben Greear wrote:
>> I've been trying to figure out why I see the migration/* processes
>> hang in a busy loop....
>>
>> While reading the stop_machine.c file, I think I might have an
>> answer.
>>
>> The set_state() method sets the thread_ack to the current number
>> of threads.  Each thread's state machine then decrements it down to
>> zero where it bumps the state to the next level.  This lets each
>> cpu stop in lock-step it seems.
>>
>> But, from what I can tell, the __stop_machine() method can
>> (re)set the state to STOPMACHINE_PREPARE while the migration
>> processes are in their loop.  That would explain why they sometimes
>> loop forever.
>>
>> Does this make sense?
>
> Err, no..that doesn't make sense.  'smdata' is on the stack.
>
> More printk debugging makes it look like one thread just
> never notices that smdata->state has been updated by another
> thread.
>
> There is this comment..maybe cpu_relax only does the chill out part
> and we need something else to make sure smdata->state is freshly
> read from the other CPU's cache?
>
> 		/* Chill out and ensure we re-read stopmachine_state. */
> 		cpu_relax();
> 		if (smdata->state != curstate) {
>
> Gah..way out of my league :P
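
The lock-step handshake described in the quoted text (set_state() loads
thread_ack with the thread count, each thread decrements it, and the last
one bumps the state) can be modelled in a few lines of userspace C. This is
only a single-threaded, round-robin simulation, not the kernel code: the
names sm_data, SM_*, cpu_step() and run_lockstep() are hypothetical, the
counter is a plain int rather than an atomic, and the list of states is an
assumption modelled on the STOPMACHINE_PREPARE name mentioned above.

```c
#include <assert.h>

/* Hypothetical state names, modelled on STOPMACHINE_PREPARE from the thread. */
enum sm_state { SM_NONE, SM_PREPARE, SM_DISABLE_IRQ, SM_RUN, SM_EXIT };

#define NR_SIM_CPUS 4

struct sm_data {
	enum sm_state state;	/* shared state that every simulated CPU polls */
	int num_threads;	/* number of participating CPUs */
	int thread_ack;		/* CPUs that have not yet acked `state` */
};

/* Publish a new state and re-arm the ack counter. */
static void set_state(struct sm_data *d, enum sm_state s)
{
	d->thread_ack = d->num_threads;
	d->state = s;
}

/* The last CPU to ack moves everyone on to the next state. */
static void ack_state(struct sm_data *d)
{
	if (--d->thread_ack == 0)
		set_state(d, (enum sm_state)(d->state + 1));
}

/* One polling pass for one CPU; returns 1 if it noticed a new state. */
static int cpu_step(struct sm_data *d, enum sm_state *curstate)
{
	if (d->state == *curstate)
		return 0;
	*curstate = d->state;
	if (*curstate != SM_EXIT)	/* simplification: nothing follows SM_EXIT here */
		ack_state(d);
	return 1;
}

/*
 * Round-robin the simulated CPUs until all of them have seen SM_EXIT.
 * Returns the total number of state changes noticed across all CPUs.
 */
static int run_lockstep(int ncpus)
{
	struct sm_data d = { SM_NONE, ncpus, 0 };
	enum sm_state cur[NR_SIM_CPUS];
	int transitions = 0, done = 0, i;

	assert(ncpus > 0 && ncpus <= NR_SIM_CPUS);
	for (i = 0; i < ncpus; i++)
		cur[i] = SM_NONE;

	set_state(&d, SM_PREPARE);
	while (done < ncpus) {
		done = 0;
		for (i = 0; i < ncpus; i++) {
			transitions += cpu_step(&d, &cur[i]);
			if (cur[i] == SM_EXIT)
				done++;
		}
	}
	return transitions;
}
```

In this model every CPU notices each of the four states exactly once, so
run_lockstep(n) returns 4 * n. A CPU that never notices a state change
never acks, thread_ack never reaches zero, and the others poll forever,
which matches the symptom reported above.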

What architecture?  Maybe someone didn't get the memo; cpu_relax()
should be a read barrier.
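
As a rough userspace illustration of why the polling loop depends on
cpu_relax(): without some barrier, a compiler may read the state once and
keep the value in a register for the whole loop. The sketch below uses a
GCC/Clang inline-asm "memory" clobber as a stand-in; cpu_relax_sketch() and
wait_for_new_state() are hypothetical names, and the max_spins bound exists
only so the example terminates. The clobber is a compiler barrier and
orders nothing in hardware, so real concurrent userspace code should prefer
C11 atomics.

```c
#include <assert.h>

/*
 * Userspace stand-in for cpu_relax() (assumption: GCC or Clang).
 * The "memory" clobber tells the compiler that memory may have changed,
 * so values read before it cannot be reused from registers after it.
 */
static inline void cpu_relax_sketch(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/*
 * Poll *state until it differs from curstate, or until max_spins passes
 * have been made. Returns the newly observed state, or -1 on giving up.
 */
static int wait_for_new_state(const int *state, int curstate, long max_spins)
{
	while (max_spins-- > 0) {
		/* Chill out and force a fresh load of *state. */
		cpu_relax_sketch();
		if (*state != curstate)
			return *state;
	}
	return -1;
}
```

With the barrier in place, each pass reloads *state from memory, so a
store made by another thread between passes can be picked up on the next
check.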

Cheers,
Rusty.
