From: Nathan Lynch <ntl@pobox.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: __cpu_up vs. start_secondary race?
Date: Tue, 2 Dec 2008 20:16:24 -0600
Message-ID: <20081203021624.GE6829@localdomain>
In-Reply-To: <1228169318.7356.146.camel@pasglop>

Benjamin Herrenschmidt wrote:
> On Mon, 2008-12-01 at 15:30 -0600, Nathan Lynch wrote:
> > 
> > cpu_callin_map is used during secondary CPU bootstrap to notify the
> > waiting CPU that the new CPU is coming up.  __cpu_up clears
> > cpu_callin_map[cpu] and then polls the same location, waiting for
> > start_secondary to set it to 1.  But I'm wondering how safe the
> > current implementation is -- start_secondary doesn't have an explicit
> > sync following cpu_callin_map[cpu] = 1, and __cpu_up has no
> > synchronization instructions in its polling loop, so how can we be
> > sure that the waiting cpu will see the update to that location in
> > time?
> 
> I think it works because there's no big ordering problem (though we
> should still probably stick a few barriers here for safety) so it's
> really just a problem of how long it takes for the store to be visible,
> and the duration of the waiting loop is such that in practice, it will
> end up being visible wayyyyy before we timeout.

At least on "real" hardware, yes.  Various 64-bit systems I've tested
see the update after two iterations at most (during boot, didn't check
the hotplug case).
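
For illustration only, the handshake under discussion can be sketched
in userspace C11. This is not the kernel code: callin_map, NR_CPUS and
the three helper names are hypothetical stand-ins, and C11
release/acquire ordering is swapped in for whatever barriers the kernel
might use. As noted above, the ordering does not make the store visible
any sooner; it only guarantees that whatever the new CPU wrote before
setting the flag is visible once the waiter sees the flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8   /* hypothetical size, chosen for this sketch */

/* Stand-in for cpu_callin_map; static storage starts out zeroed. */
static atomic_int callin_map[NR_CPUS];

/* Waiting side: clear the flag before kicking the new CPU. */
static void callin_clear(int cpu)
{
    atomic_store_explicit(&callin_map[cpu], 0, memory_order_relaxed);
}

/* New CPU side: announce arrival. The release store orders all
 * earlier writes by this CPU before the flag update. */
static void callin_signal(int cpu)
{
    atomic_store_explicit(&callin_map[cpu], 1, memory_order_release);
}

/* Waiting side: poll at most max_iters times. The acquire load pairs
 * with the release store above. Returns true if the flag was seen. */
static bool callin_wait(int cpu, long max_iters)
{
    for (long i = 0; i < max_iters; i++) {
        if (atomic_load_explicit(&callin_map[cpu], memory_order_acquire))
            return true;
    }
    return false;
}
```

The waiter would call callin_clear(cpu), kick the CPU, then
callin_wait(cpu, limit); the new CPU would call callin_signal(cpu)
once it is far enough along.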

> IE. It's not like stores get buffered for ever due to absence of
> barriers. They ultimately get out to the bus.

Hrm, "ultimately" :)  Okay, thanks.

Apart from barriers (or lack thereof), the fact that __cpu_up gives up
after a more-or-less arbitrary period seems... well, arbitrary.  If we
get to "Processor X is stuck" then something is seriously wrong:
there's either a kernel bug or a platform issue, and the CPU just
kicked is in an unknown state.  Polling indefinitely seems safer, no?
Especially since some hypervisors allow overcommitting processors and
memory, which can introduce latencies in unexpected places.
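
To make the trade-off concrete, here is a minimal sketch of the two
policies, again in plain C rather than kernel code. The names
wait_for_callin, callin_probe_fn and delayed_probe are hypothetical;
delayed_probe only simulates a CPU that calls in late, for example
because of hypervisor-induced latency.

```c
#include <stdbool.h>

/* Hypothetical probe: returns true once the new CPU has called in. */
typedef bool (*callin_probe_fn)(void *ctx);

/* Poll until the probe succeeds. A negative limit means poll
 * indefinitely; otherwise give up after `limit` polls.
 * Returns the number of polls used, or -1 if the wait gave up. */
static long wait_for_callin(callin_probe_fn probe, void *ctx, long limit)
{
    for (long polls = 1; limit < 0 || polls <= limit; polls++) {
        if (probe(ctx))
            return polls;
    }
    return -1;  /* the "Processor X is stuck" case */
}

/* Simulated slow CPU: reports "not yet" for the first *ctx polls. */
static bool delayed_probe(void *ctx)
{
    long *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With a delay of five polls, a limit of three reports the CPU as stuck
even though it would have called in shortly afterwards, while the
unbounded wait succeeds on the sixth poll. The cost of the unbounded
variant is that it never returns if the CPU really is dead.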
