From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Guenter Roeck <linux@roeck-us.net>, Rabin Vincent <rabin@rab.in>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)
Date: Mon, 20 Apr 2015 05:17:29 -0700
Message-ID: <20150420121729.GN5561@linux.vnet.ibm.com>
In-Reply-To: <20150420053954.GA9923@gmail.com>

On Mon, Apr 20, 2015 at 07:39:55AM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > That's all fine and good, but why is an IPI sent to a non-existent 
> > > CPU? It's not like we don't know which CPU is up and down.
> > 
> > I agree that the qemu-arm behavior smells like a bug, plain and 
> > simple. Nobody sane should send random IPI's to CPU's that they 
> > don't even know are up or not.
> 
> Yeah, and a warning would have caught that bug a bit earlier, at the 
> cost of restricting the API:
> 
> > That said, I could imagine that we would have some valid case where 
> > we just do a cross-cpu call to (for example) do lock wakeup, and the 
> > code would use some optimistic algorithm that prods the CPU after 
> > the lock has been released, and there could be some random race 
> > where the lock data structure has already been released (ie imagine 
> > the kind of optimistic unlocked racy access to "sem->owner" that we 
> > discussed as part of the rwsem_spin_on_owner() thread recently).
> > 
> > So I _could_ imagine that somebody would want to do optimistic "prod 
> > other cpu" calls that in all normal cases are for existing cpus, but 
> > could be racy in theory.
> 
> Yes, and I don't disagree with such optimizations in principle (it
> allows fewer references to be taken in the fast path), but is it
> really safe?
> 
> If a CPU is going down and we potentially race against that, and send 
> off an IPI, the IPI might be 'in flight' for an indeterminate amount 
> of time, especially on wildly non-deterministic hardware like virtual 
> platforms.
> 
> So if the CPU goes down during that time, the typical way a CPU goes 
> down is that it ignores all IPIs that arrive after that. End result: 
> the sender of the IPI may hang indefinitely (for the synchronous API), 
> waiting for the CSD lock to be released...
> 
> For the non-wait API we could also hang, but in an even more
> interesting way: we won't hang trying to send to the downed CPU, we'd
> hang on the _next_ cross-call, possibly one sent to another CPU,
> because the CSD_FLAG_LOCK of the sending CPU is never released by the
> offlined CPU that was supposed to unlock it.
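> 
> For reference, the lock/wait pair in kernel/smp.c looks roughly like
> this after 8053871d0f7f (paraphrased; surrounding details elided):
> 
>     static void csd_lock_wait(struct call_single_data *csd)
>     {
>             /* Spin until the target CPU's IPI handler has run the
>              * function and cleared CSD_FLAG_LOCK via csd_unlock(). */
>             while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
>                     cpu_relax();
>     }
> 
>     static void csd_lock(struct call_single_data *csd)
>     {
>             csd_lock_wait(csd);     /* the _next_ cross-call blocks here */
>             csd->flags |= CSD_FLAG_LOCK;
>             smp_wmb();              /* order the flag store before the
>                                      * func/info stores that follow */
>     }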
> 
> Also note the existing warning we already have in 
> flush_smp_call_function_queue():
> 
>         /* There shouldn't be any pending callbacks on an offline CPU. */
>         if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
>                      !warned && !llist_empty(head))) {
>                 warned = true;
>                 WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
> 
> Couldn't this trigger in the opportunistic, imprecise IPI case, if 
> IRQs are ever enabled when or after the CPU is marked offline?
> 
> My suggested warning would simply catch this kind of unsafe looking 
> race a bit sooner, instead of silently ignoring the cross-call 
> attempt.
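> 
> Concretely, it could look something like this at the top of
> generic_exec_single() (a sketch only: the cpu_online() check is what
> 8053871d0f7f already added, the WARN_ONCE() would be the new part):
> 
>     if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
>             WARN_ONCE(1, "IPI to offline or nonexistent CPU %d\n", cpu);
>             csd_unlock(csd);        /* fail the call instead of hanging */
>             return -ENXIO;
>     }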
> 
> Now your suggestion could be made to work by:
> 
>   - polling for CPU status in the CSD-lock code as well. (We don't do
>     this currently; see the sketch after this list.)
> 
>   - making sure that if a late IPI arrives to an already-down CPU it
>     does not attempt to execute CSD functions. (I.e. silence the
>     above, already existing warning.)
> 
>   - auditing the CPU-down path to make sure it does not get surprised
>     by late external IPIs. (I think this should already be so, given
>     the existing warning.)
> 
>   - IPIs can be pending indefinitely, so make sure a pending IPI won't 
>     confuse the machinery after a CPU has been onlined again.
> 
>   - pending IPIs may consume hardware resources when not received 
>     properly. For example I think x86 will keep trying to resend it. 
>     Not sure there's any timeout mechanism on the hardware level - the 
>     sending APIC might even get stuck? Audit all hardware for this 
>     kind of possibility.
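> 
> A sketch of that first item (hypothetical code, and not race-free on
> its own, since an IPI may already be in flight; it only shows the
> shape of the idea):
> 
>     static void csd_lock_wait_poll(struct call_single_data *csd, int cpu)
>     {
>             while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK) {
>                     if (!cpu_online(cpu))
>                             break;  /* target went down; stop waiting */
>                     cpu_relax();
>             }
>     }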
> 
> So unless I'm missing something, to me it looks like the current code
> is only safe to use on an 'offline' CPU if that CPU is never actually
> brought online in the first place.
> 
> > It doesn't sound like the qemu-arm case is that kind of situation, 
> > though. That one just sounds like a stupid "let's send an ipi to a 
> > cpu whether it exists or not".
> > 
> > But I committed it without any new warning, because I could in
> > theory see it as being a valid use.
> 
> So my point is that we already have a 'late' warning, and I think it 
> actively hid the qemu-arm bug, because the 'offline' CPU was so 
> offline (due to being non-existent) that it never had a chance to 
> print a warning.
> 
> With an 'early' warning we could flush out such bugs (or unclean code:
> clearly the ARM code was at least functional before) a bit sooner.
> 
> But I don't have strong feelings about it: SMP cross-call users are
> far less common than, say, locking facility users, so the strength of
> the debugging isn't nearly as important, and it's fine to me either
> way!

In the long term, supporting "send an IPI to a CPU that might or might
not be going away" will probably require an API that returns a success
or failure indication.  That should simplify things a bit: currently
the caller has to participate in the IPI's synchronization, whereas
with a conditional primitive, callers could simply act on the return
value.
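
As a rough sketch of what such a conditional primitive might look like
(the wrapper and its name are hypothetical; smp_call_function_single_async()
and cpu_online() are real, but note that the cpu_online() check alone
narrows rather than closes the hotplug race):

	int smp_call_function_single_async_maybe(int cpu,
						 struct call_single_data *csd)
	{
		int ret = -ENXIO;

		preempt_disable();
		if (cpu_online(cpu))	/* best effort; a real version would
					 * coordinate with the CPU-down path */
			ret = smp_call_function_single_async(cpu, csd);
		preempt_enable();

		return ret;	/* the caller acts on the return value
				 * instead of synchronizing with the IPI */
	}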

							Thanx, Paul

