From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
	linux-kernel@vger.kernel.org, cl@linux-foundation.org,
	mingo@elte.hu, akpm@linux-foundation.org, dipankar@in.ibm.com,
	josht@linux.vnet.ibm.com, schamp@sgi.com, niv@us.ibm.com,
	dvhltc@us.ibm.com, ego@in.ibm.com, rostedt@goodmis.org,
	peterz@infradead.org, benh@kernel.crashing.org,
	davem@davemloft.net, tony.luck@intel.com
Subject: Re: [PATCH, RFC, tip/core/rcu] v3 scalable classic RCU implementation
Date: Sun, 31 Aug 2008 10:20:01 -0700
Message-ID: <20080831172001.GA7015@linux.vnet.ibm.com>
In-Reply-To: <48BA7944.8070402@colorfullife.com>

On Sun, Aug 31, 2008 at 12:58:12PM +0200, Manfred Spraul wrote:
> Paul E. McKenney wrote:
>>
>>> Perhaps it's possible to rely on CPU_DYING, but I haven't figured out yet 
>>> how to handle read-side critical sections in CPU_DYING handlers.
>>> Interrupts after CPU_DYING could be handled by rcu_irq_enter(), 
>>> rcu_irq_exit() [yes, they exist on x86: the arch code enables the local 
>>> interrupts in order to process the currently queued interrupts]
>>>     
>>
>> My feeling is that CPU online/offline will be quite rare, so it should
>> be OK to clean up after the races in force_quiescent_state(), which in
>> this version is called every three ticks in a given grace period.
>>   
> If you allow for cpu offline calls that can fail, then the problem appears 
> to be unsolvable:
> If I get it right, the offlining process looks like this:
> * one cpu in the system makes the CPU_DOWN_PREPARE notifier call. These 
> calls can sleep (e.g. slab sleeps on semaphores). The cpu that goes offline 
> is still alive, still doing arbitrary work. cpu_quiet calls on behalf of 
> the cpu would be wrong.
> * stop_machine: all cpus schedule to a special kernel thread [1]; only the 
> dying cpu runs.
> * The cpu that goes offline calls the CPU_DYING notifiers.
> * __cpu_disable(): The cpu that goes offline checks whether it's possible to 
> offline the cpu. At least on i386, this can fail.
> On success:
> * at least on i386: the cpu that goes offline handles outstanding 
> interrupts. I'm not sure; perhaps even softirqs are handled.
> * the cpu stops handling interrupts.
> * stop_machine ends, and the remaining cpus continue their work.

As I understand it, this is the point where the dying CPU disables
interrupts and removes itself from the online masks.  Though I would
feel better if there were an smp_mb() after the last local_irq_disable()
and before the remove_cpu_from_maps()!
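
Something like the following, in other words.  This is an untested
sketch using the 2.6.27-era names; the smp_mb() is my proposed
addition, and the rest paraphrases the existing offline sequence:

	/* Outgoing CPU, late in the offline sequence (sketch only). */
	local_irq_disable();	/* no further irq handlers run here */
	smp_mb();		/* order prior accesses before the map update... */
	cpu_clear(smp_processor_id(), cpu_online_map);
	/* ...so a CPU seeing our bit clear knows that our RCU
	 * read-side critical sections, including those in irq
	 * handlers, have completed. */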

> * The CPU_DEAD notifiers are called. They can sleep.
> On failure:
> * all cpus continue their work. call_rcu, synchronize_rcu(), ...
> * some time later: the CPU_DOWN_FAILED callbacks are called.
>
> Is that description correct?

Gautham?

> Then:
> - treating a cpu as always quiet after the rcu notifier was called with 
> CPU_DOWN_PREPARE is wrong: the target cpu still runs normal code: user 
> space, kernel space, interrupts, whatever. The target cpu still accepts 
> interrupts, thus treating it as "normal" should work.

Indeed!  My current code doesn't declare them offline until the CPU_DEAD
notifiers are called.  And force_quiescent_state() does not consider
them to be offline until after they have cleared their bit in
cpu_online_map, which does not happen until the outgoing CPU has
disabled interrupts, at least on x86.  So my current code should be
OK on x86.

It -looks- like stop_cpu() expects to be called with irqs disabled,
but I don't see what would be disabling irqs.  (Don't kthreads normally
start with irqs enabled?)  Ah, I see it -- the stop_cpu() threads
sequence through a state machine, and one of the states disables
irqs for everyone.

So the only problem would occur in architectures that re-enable irqs
in the middle of __cpu_disable(), as x86 does (but x86 correctly orders
the clearing of the cpu_online_mask bit, so is OK).  This of course
has the added benefit that irq handlers aren't running on a CPU that
is marked offline.

Checking other architectures:

o	ARM arch/arm/kernel/smp.c __cpu_disable() does not re-enable
	irqs, so is OK.

!	arch/ia64/kernel/smpboot.c __cpu_disable() clears itself
	from cpu_online_map before flushing pending irqs, which
	might include RCU read-side critical sections.	I believe that
	the "cpu_clear(cpu, cpu_online_map)" must move to after the
	"fixup_irqs()".

o	arch/powerpc/kernel/smp.c __cpu_disable() does not disable
	irqs directly, but calls subarch-specific functions noted
	below.

o	arch/powerpc/platforms/powermac/smp.c smp_core99_cpu_disable()
	does not appear to re-enable irqs, so should be OK.

!	arch/powerpc/kernel/smp.c generic_cpu_disable() clears itself
	from the cpu_online_mask before invoking fixup_irqs(), which
	momentarily enables irqs.  I believe that the  "cpu_clear(cpu,
	cpu_online_map)" must move to after the "fixup_irqs()".

?	arch/powerpc/platforms/pseries/hotplug-cpu.c pseries_cpu_disable()
	clears itself from the cpu_online_mask before calling
	xics_migrate_irqs_away().  This function rejects already-pending
	irqs, then redirects future irqs.  Not clear to me what happens
	if an irq arrives between the reject and the immediately
	following removal from the global interrupt queue.

o	arch/s390/kernel/smp.c __cpu_disable() does not re-enable irqs,
	so is OK.

!	arch/sparc64/kernel/smp.c __cpu_disable() clears its bit before
	re-enabling interrupts.  I believe that the "cpu_clear(cpu,
	cpu_online_map)" needs to happen after the local_irq_disable().

?	include/asm-parisc/smp.h __cpu_disable() just returns without
	doing anything.  This means pa-risc does not support hotplug
	CPU?  If so, no problem.

I am sending (untested) patches separately for the amusement of the
arch maintainers.
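
The common shape of those patches is a simple reordering, along these
lines (sketch only, hand-waving past the arch-specific details):

	/* Before: RCU can see this CPU as offline while its irq
	 * handlers (and any RCU read-side critical sections in
	 * them) are still running: */
	cpu_clear(cpu, cpu_online_map);
	fixup_irqs();

	/* After: pending irqs drain while the CPU still looks
	 * online, and only then does it vanish from the map that
	 * force_quiescent_state() consults: */
	fixup_irqs();
	cpu_clear(cpu, cpu_online_map);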

> __cpu_disable() success:
> - after CPU_DYING, a cpu is either in an interrupt or outside read-side 
> critical sections. Parallel synchronize_rcu() calls are impossible until 
> the cpu is dead. call_rcu() is probably possible.
> - The CPU_DEAD notifiers are called. a synchronize_rcu() call before the 
> rcu notifier is called is possible.
> __cpu_disable() failure:
> - CPU_DYING is called, but the cpu remains fully alive. The system comes 
> fully alive again.
> - some time later, CPU_DEAD is called.
>
> With the current CPU_DYING callback, it's impossible to be both 
> deadlock-free and race-free with the given conditions. If __cpu_disable() 
> succeeds, then the cpu must be treated as gone and always idle. If 
> __cpu_disable() fails, then the cpu must be treated as fully there. Doing 
> both things at the same time is impossible. Waiting until CPU_DOWN_FAILED 
> or CPU_DEAD is called is impossible, too: Either synchronize_rcu() in a 
> CPU_DEAD notifier [called before the rcu notifier] would deadlock or 
> read-side critical sections on the not-killed cpu would race.

Assuming that the ordering of processing pending irqs and marking the
CPU offline in cpu_online_mask can be resolved as noted above, it should
work fine -- if a CPU's bit is clear, we can safely ignore it.  The race
can be resolved by checking the CPU's bit in force_quiescent_state().
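
Concretely, I have in mind something like the following in
force_quiescent_state()'s scan of the holdout CPUs.  This is a sketch
against my v3 naming, not quoted from the patch, and cpu_quiet()'s
real signature may well differ:

	cpumask_t holdouts;	/* CPUs not yet quiescent this GP (sketch) */
	int cpu;

	for_each_cpu_mask(cpu, holdouts) {
		if (!cpu_online(cpu)) {
			/*
			 * The CPU went offline after we sampled it.
			 * Given the irq-ordering fixes above, its RCU
			 * read-side critical sections have completed,
			 * so report a quiescent state on its behalf.
			 */
			cpu_quiet(cpu);
		}
	}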

Or am I missing something?

> What about moving the CPU_DYING notifier calls behind the __cpu_disable() 
> call?
> Any other solutions?

RCU should ignore the CPU_DYING notifier calls -- only the CPU_DEAD.*
calls should be processed for CPUs being offlined.  Right?

> Btw, as far as I can see, rcupreempt would deadlock if a CPU_DEAD notifier 
> uses synchronize_rcu().
> Probably no one will ever succeed in triggering the deadlock:
> - cpu goes offline.
> - the other cpus in the system are restarted.
> - one cpu does the CPU_DEAD notifier calls.
> - before the rcu notifier is called with CPU_DEAD:
> - one CPU_DEAD notifier sleeps.
> - while that CPU_DEAD notifier sleeps: on the same cpu, kmem_cache_destroy 
> is called. get_online_cpus immediately succeeds.
> - kmem_cache_destroy acquires the cache_chain_mutex.
> - kmem_cache_destroy does synchronize_rcu(), it sleeps.
> - CPU_DEAD processing continues; the slab CPU_DEAD notifier tries to acquire 
> the cache_chain_mutex. It sleeps, too.
> --> deadlock, because the already dead cpu will never signal itself as 
> quiet. Thus synchronize_rcu() will never succeed, thus the slab CPU_DEAD 
> notifier will never return, thus rcu_offline_cpu() is never called.

It is entirely possible that rcu_try_flip_waitack() and
rcu_try_flip_waitmb() need to check the AND of rcu_cpu_online_map and
cpu_online_map.  If this really is a problem (and it might well be),
then the easiest fix is to check for cpu_is_offline(cpu) in both
rcu_try_flip_waitmb_needed() and rcu_try_flip_waitack_needed(), and to
do so in both versions of both functions.  Thoughts?
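
In code, the idea is as follows.  This paraphrases rcupreempt.c rather
than quoting it; the cpu_is_offline() check is the proposed addition,
and it would be needed in both the NO_HZ and non-NO_HZ versions of
this function and of rcu_try_flip_waitmb_needed():

	static int rcu_try_flip_waitack_needed(int cpu)
	{
		if (cpu_is_offline(cpu))
			return 0;	/* Never wait on a dead CPU. */
		/* ... existing dyntick-idle snapshot comparison ... */
		return 1;
	}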

> --
>    Manfred
> [1] open question: with rcu_preempt, is it possible that these cpus could 
> be inside read-side critical sections?

Yes, they could.  Well, tasks that were previously running on them
might be preempted or blocked waiting on locks while still in RCU
read-side critical sections.  However, when a given CPU goes offline,
rcu_preempt moves that CPU's counters to some surviving CPU.  So RCU
knows that these tasks are still in RCU read-side critical sections,
and will therefore wait for them.
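
The handoff itself is along these lines, paraphrasing rcupreempt's
rcu_offline_cpu() from memory, so treat the details as approximate:

	/*
	 * Fold the dead CPU's counts into this surviving CPU, so that
	 * readers that started on the dead CPU (and are now preempted
	 * or blocked elsewhere) remain accounted for until they exit
	 * their RCU read-side critical sections.
	 */
	__get_cpu_var(rcu_flipctr)[0] += per_cpu(rcu_flipctr, cpu)[0];
	per_cpu(rcu_flipctr, cpu)[0] = 0;
	__get_cpu_var(rcu_flipctr)[1] += per_cpu(rcu_flipctr, cpu)[1];
	per_cpu(rcu_flipctr, cpu)[1] = 0;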

							Thanx, Paul

