linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2] ARM: Don't use complete() during __cpu_die
Date: Thu, 5 Feb 2015 09:54:31 -0800	[thread overview]
Message-ID: <20150205175431.GH5370@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150205173440.GR8656@n2100.arm.linux.org.uk>

On Thu, Feb 05, 2015 at 05:34:40PM +0000, Russell King - ARM Linux wrote:
> On Thu, Feb 05, 2015 at 09:02:28AM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 05, 2015 at 04:11:00PM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Feb 05, 2015 at 06:29:18AM -0800, Paul E. McKenney wrote:
> > > > Works for me, assuming no hidden uses of RCU in the IPI code.  ;-)
> > > 
> > > Sigh... I kind'a new it wouldn't be this simple.  The gic code which
> > > actually raises the IPI takes a raw spinlock, so it's not going to be
> > > this simple - there's a small theoretical window where we have taken
> > > this lock, written the register to send the IPI, and then dropped the
> > > lock - the update to the lock to release it could get lost if the
> > > CPU power is quickly cut at that point.
> > > 
> > > Also, we _do_ need the second cache flush in place to ensure that the
> > > unlock is seen to other CPUs.
> > > 
> > > We could work around that by taking and releasing the lock in the IPI
> > > processing function... but this is starting to look less attractive
> > > as the lock is private to irq-gic.c.
> > > 
> > > Well, we're very close to 3.19, we're too close to be trying to sort
> > > this out, so I'm hoping that your changes which cause this RCU error
> > > are *not* going in during this merge window, because we seem to have
> > > something of a problem right now which needs more time to resolve.
> > 
> > Most likely into the 3.20 merge window.  But please keep in mind that
> > RCU is just the messenger here -- the current code will break if any
> > CPU for whatever reason takes more than a jiffy to get from its
> > _stop_machine() handler to the end of its last RCU read-side critical
> > section on its way out.  A jiffy may sound like a lot, but it is not
> > hard to exceed this limit, especially in virtualized environments.
> 
> What I'm saying is that we can't likely get a good fix prepared before
> the 3.20 merge window opens.

And I cannot count.  Or cannot type.  Or something.

I meant do say "Most likely into the 3.21 merge window."  I agree that
this stuff is not ready for next week's merge window.  For one thing,
there are similar issues with a number of other architectures as well.

> I don't term the set_bit/clear_bit solution a "good fix" because it is
> far too complex - I've not done a thorough review on it, but the idea
> of setting and clearing a couple of bits in unison, making sure that
> their state is set appropriately through multiple different code paths
> does not strike me as a provably correct replacement for this completion.
> The reason for that complexity is because there is no pre-notification
> to arch code that a CPU might be going down, so there's no way for the
> "CPU is dead" flag to be properly reset (which is why there's all the
> manipulation in lots of possible failure paths.)
> 
> The idea that we could reset it in the CPU up code doesn't fly - that
> would only work if we had one secondary CPU (which would guarantee a
> strict up/down/up ordering on it) but as soon as you have more than one
> CPU, that doesn't hold true.
> 
> We could hook into the CPU hotplug notifiers - which would be quite a
> lot of additional code to achieve the reset early enough in the hot
> unplug path, though it would probably be the most reliable solution to
> the wait-for-bit solution.
> 
> However, any of those solutions needs writing and thorough testing,
> which, if Linus opens the merge window on Sunday, isn't going to
> happen before hand (and we know Linus doesn't like extra development
> appearing which wasn't in -next prior to the merge window - he's taken
> snapshots of -next to check during the merge window in the past - so
> it's not something I'm going to be adding during that time, not even
> as a "fix" because we know about the problem right now, before the
> merge window.  To me, to treat this as a "fix" would be wilfully
> deceitful.)
> 
> I don't think the existing code is a big problem at the moment - it's
> been like this for about 10 years, and no one has ever reported an
> issue with it, although there have been changes over that time:
> 
> aa033810461ee56abbef6cef10aabd6b97f5caee
> ARM: smp: Drop RCU_NONIDLE usage in cpu_die()
> 
> 	This removed the RCU_NONIDLE() from the completion() call.
> 
> ff081e05bfba3461119cd280201d163b6858eda2
> ARM: 7457/1: smp: Fix suspicious RCU originating from cpu_die()
> 
> 	This added the RCU_NONIDLE() to the completion() call.
> 
> 3c030beabf937b1d3b4ecaedfd1fb2f1e2aa0c70
> ARM: CPU hotplug: move cpu_killed completion to core code
> 
> 	This moved the completion code from Realview (and other ARM
> 	platforms) into core ARM code.
> 
> and 97a63ecff4bd06da5d8feb8c0394a4d020f2d34d
> [ARM SMP] Add CPU hotplug support for Realview MPcore
> 
> 	The earliest current introduction of CPU hotplug in 2005.

Agreed.  It does need to be fixed, but it makes sense to take a few
weeks and get a fix that is more likely to be correct.

							Thanx, Paul

  reply	other threads:[~2015-02-05 17:54 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-05 10:14 [PATCH v2] ARM: Don't use complete() during __cpu_die Krzysztof Kozlowski
2015-02-05 10:50 ` Russell King - ARM Linux
2015-02-05 11:00   ` Krzysztof Kozlowski
2015-02-05 11:08     ` Russell King - ARM Linux
2015-02-05 11:28   ` Mark Rutland
2015-02-05 11:30     ` Russell King - ARM Linux
2015-02-05 14:29   ` Paul E. McKenney
2015-02-05 16:11     ` Russell King - ARM Linux
2015-02-05 17:02       ` Paul E. McKenney
2015-02-05 17:34         ` Russell King - ARM Linux
2015-02-05 17:54           ` Paul E. McKenney [this message]
2015-02-10  1:24       ` Stephen Boyd
2015-02-10  1:37         ` Paul E. McKenney
2015-02-10  2:05           ` Stephen Boyd
2015-02-10  3:05             ` Paul E. McKenney
2015-02-10 15:14         ` Mark Rutland
2015-02-10 20:48           ` Stephen Boyd
2015-02-10 21:04             ` Stephen Boyd
2015-02-10 21:15               ` Russell King - ARM Linux
2015-02-10 21:49                 ` Stephen Boyd
2015-02-10 22:05                   ` Stephen Boyd
2015-02-13 15:52               ` Mark Rutland
2015-02-13 16:27                 ` Russell King - ARM Linux
2015-02-13 17:21                   ` Mark Rutland
2015-02-13 17:30                     ` Russell King - ARM Linux
2015-02-13 16:28                 ` Stephen Boyd
2015-02-13 15:38             ` Mark Rutland
2015-02-10 20:58           ` Russell King - ARM Linux
2015-02-10 15:41         ` Russell King - ARM Linux
2015-02-10 18:33           ` Stephen Boyd
2015-02-25 12:56       ` Russell King - ARM Linux
2015-02-25 16:47         ` Nicolas Pitre
2015-02-25 17:00           ` Russell King - ARM Linux
2015-02-25 18:13             ` Nicolas Pitre
2015-02-25 20:16               ` Nicolas Pitre
2015-02-26  1:05                 ` Paul E. McKenney
2015-03-22 23:30                   ` Paul E. McKenney
2015-03-23 12:55                     ` Russell King - ARM Linux
2015-03-23 13:21                       ` Paul E. McKenney
2015-03-23 14:00                         ` Russell King - ARM Linux
2015-03-23 15:37                           ` Paul E. McKenney
2015-03-23 16:56                             ` Paul E. McKenney
2015-02-26 19:14           ` Daniel Thompson
2015-02-26 19:47             ` Nicolas Pitre
2015-02-05 10:53 ` Mark Rutland
2015-02-05 10:59   ` Krzysztof Kozlowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150205175431.GH5370@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).