From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	eric.dumazet@gmail.com, darren@dvhart.com,
	"Paul E. McKenney" <paul.mckenney@linaro.org>
Subject: Re: [PATCH RFC tip/core/rcu 11/11] rcu: move TREE_RCU from softirq to kthread
Date: Mon, 28 Feb 2011 15:51:36 -0800
Message-ID: <20110228235136.GC2331@linux.vnet.ibm.com>
In-Reply-To: <4D6B16A8.4050405@cn.fujitsu.com>

On Mon, Feb 28, 2011 at 11:29:44AM +0800, Lai Jiangshan wrote:
> On 02/26/2011 04:32 AM, Paul E. McKenney wrote:
> >>> +/*
> >>> + * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
> >>> + * This can happen while the corresponding CPU is either coming online
> >>> + * or going offline.  We cannot wait until the CPU is fully online
> >>> + * before starting the kthread, because the various notifier functions
> >>> + * can wait for RCU grace periods.  So we park rcu_cpu_kthread() until
> >>> + * the corresponding CPU is online.
> >>> + *
> >>> + * Return 1 if the kthread needs to stop, 0 otherwise.
> >>> + *
> >>> + * Caller must disable bh.  This function can momentarily enable it.
> >>> + */
> >>> +static int rcu_cpu_kthread_should_stop(int cpu)
> >>> +{
> >>> +	while (cpu_is_offline(cpu) || smp_processor_id() != cpu) {
> >>> +		if (kthread_should_stop())
> >>> +			return 1;
> >>> +		local_bh_enable();
> >>> +		schedule_timeout_uninterruptible(1);
> >>> +		if (smp_processor_id() != cpu)
> >>> +			set_cpus_allowed_ptr(current, cpumask_of(cpu));
> >>
> >> The current task is PF_THREAD_BOUND.
> >> Why call "set_cpus_allowed_ptr(current, cpumask_of(cpu));"?
> > 
> > Because I have seen CPU hotplug operations unbind PF_THREAD_BOUND threads.
> > In addition, I end up having to spawn the kthread at CPU_UP_PREPARE time,
> > at which point the thread must run unbound because its CPU isn't online
> > yet.  I cannot invoke kthread_create() within the stop-machine handler
> > (right?).  I cannot wait until CPU_ONLINE time because that results in
> > hangs when other CPU notifiers wait for grace periods.
> > 
> > Yes, I did find out about the hangs the hard way.  Why do you ask?  ;-)
> 
> The current task is PF_THREAD_BOUND, so "set_cpus_allowed_ptr(current, cpumask_of(cpu))"
> will do nothing even if it runs on the wrong CPU.

You lost me on this one.

Looking at set_cpus_allowed_ptr()...

The "again" loop won't happen because the task is already running.
The CPU is online, so the cpumask_intersects() check won't kick
us out.  We are working with the current task, so the check for
PF_THREAD_BOUND, current, and cpumask_equal() won't kick us out.
If the old and new cpumasks had been the same, we would not have called
set_cpus_allowed_ptr() in the first place.  So we should get to
the call to migrate_task().

What am I missing here?
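
The sequence of checks walked through above can be written down as a small
user-space model. This is only a sketch of the reasoning in this message,
not the kernel's set_cpus_allowed_ptr(): the struct, the enum, and the
function name are hypothetical, CPU masks are reduced to plain bitmasks,
and the "again" loop and the migration step itself are left out.

```c
#include <stdbool.h>

/* Possible outcomes of the modeled affinity change. */
enum affinity_result {
	AFFINITY_REJECTED,	/* one of the early checks bailed out */
	AFFINITY_PROCEEDS	/* control reaches the migration step */
};

/* Hypothetical stand-in for the few task fields the checks read. */
struct model_task {
	bool thread_bound;		/* models PF_THREAD_BOUND */
	bool is_current;		/* models "p == current" */
	unsigned long cpus_allowed;	/* one bit per CPU */
};

static enum affinity_result
model_set_cpus_allowed(const struct model_task *p, unsigned long new_mask,
		       unsigned long online_mask)
{
	/* Models the cpumask_intersects() check: need an online target. */
	if ((new_mask & online_mask) == 0)
		return AFFINITY_REJECTED;

	/*
	 * Models the PF_THREAD_BOUND check as described above: a bound
	 * task is refused only when some other task tries to change its
	 * mask to a different value.
	 */
	if (p->thread_bound && !p->is_current && p->cpus_allowed != new_mask)
		return AFFINITY_REJECTED;

	return AFFINITY_PROCEEDS;
}
```

In this model a bound task changing its own mask to an online CPU gets
AFFINITY_PROCEEDS, which is the case argued for above.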

> If the task runs on the wrong CPU, we have no API to force/migrate the task
> to the bound CPU when the CPU comes online. But wake_up_process() has
> the side effect of moving a sleeping task to the correct online CPU.
> "schedule_timeout_uninterruptible(1);" will call
> wake_up_process() when the timeout expires, so it will do everything you need.
> 
> But "set_cpus_allowed_ptr(current, cpumask_of(cpu));" will do nothing.
> 
> The code is a little nasty, I think. The solution I would prefer:
> give rcu_cpu_notify a proper priority, and wake up the kthread
> in the notifier.

I will be using both belt and suspenders on this one -- too much can
go wrong given slight adjustments in scheduler, CPU hotplug, and so on.

But speaking of paranoia, I should add a check of smp_processor_id()
vs. the local variable "cpu", shouldn't I?

> Steven, any suggestions? I know very little about the scheduler.
> 
> > 
> > Please feel free to suggest improvements in the header comment above
> > for rcu_cpu_kthread_should_stop(), which is my apparently insufficient
> > attempt to explain this.
> > 
> >>> +		local_bh_disable();
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
> >>> + * earlier RCU softirq.
> >>> + */
> >>> +static int rcu_cpu_kthread(void *arg)
> >>> +{
> >>> +	int cpu = (int)(long)arg;
> >>> +	unsigned long flags;
> >>> +	int spincnt = 0;
> >>> +	wait_queue_head_t *wqp = &per_cpu(rcu_cpu_wq, cpu);
> >>> +	char work;
> >>> +	char *workp = &per_cpu(rcu_cpu_has_work, cpu);
> >>> +
> >>> +	for (;;) {
> >>> +		wait_event_interruptible(*wqp,
> >>> +					 *workp != 0 || kthread_should_stop());
> >>> +		local_bh_disable();
> >>> +		if (rcu_cpu_kthread_should_stop(cpu)) {
> >>> +			local_bh_enable();
> >>> +			break;
> >>> +		}
> >>> +		local_irq_save(flags);
> >>> +		work = *workp;
> >>> +		*workp = 0;
> >>> +		local_irq_restore(flags);
> >>> +		if (work)
> >>> +			rcu_process_callbacks();
> >>> +		local_bh_enable();
> >>> +		if (*workp != 0)
> >>> +			spincnt++;
> >>> +		else
> >>> +			spincnt = 0;
> >>> +		if (spincnt > 10) {
> >>
> >> "10" is a magic number here.
> > 
> > It is indeed.  Suggestions for a cpp macro name to hide it behind?
> > 
> >>> +			rcu_yield(cpu);
> >>> +			spincnt = 0;
> >>> +		}
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>
> >>
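
For the "magic number" exchange in rcu_cpu_kthread() above, one way to hide
the constant behind a cpp macro is sketched below as a user-space model.
The macro name RCU_KTHREAD_MAX_SPIN and the helper function are
hypothetical; the thread had not settled on a name at this point, and the
helper only mirrors the spincnt bookkeeping from the quoted loop.

```c
#include <stdbool.h>

/* Hypothetical name for the limit written as a bare "10" in the patch. */
#define RCU_KTHREAD_MAX_SPIN 10

/*
 * Update the consecutive-busy counter after one pass of the loop.
 * Returns true when the caller should yield; the counter is reset
 * in that case, as in the quoted code.
 */
static bool spin_limit_reached(int *spincnt, bool more_work)
{
	if (more_work)
		(*spincnt)++;
	else
		*spincnt = 0;

	if (*spincnt > RCU_KTHREAD_MAX_SPIN) {
		*spincnt = 0;
		return true;
	}
	return false;
}
```

With the limit named, the yield condition reads as a policy choice rather
than an unexplained literal, and the value can be tuned in one place.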
> >>> +/*
> >>> + * Per-rcu_node kthread, which is in charge of waking up the per-CPU
> >>> + * kthreads when needed.
> >>> + */
> >>> +static int rcu_node_kthread(void *arg)
> >>> +{
> >>> +	int cpu;
> >>> +	unsigned long flags;
> >>> +	unsigned long mask;
> >>> +	struct rcu_node *rnp = (struct rcu_node *)arg;
> >>> +	struct sched_param sp;
> >>> +	struct task_struct *t;
> >>> +
> >>> +	for (;;) {
> >>> +		wait_event_interruptible(rnp->node_wq, rnp->wakemask != 0 ||
> >>> +						       kthread_should_stop());
> >>> +		if (kthread_should_stop())
> >>> +			break;
> >>> +		raw_spin_lock_irqsave(&rnp->lock, flags);
> >>> +		mask = rnp->wakemask;
> >>> +		rnp->wakemask = 0;
> >>> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >>> +		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> >>> +			if ((mask & 0x1) == 0)
> >>> +				continue;
> >>> +			preempt_disable();
> >>> +			per_cpu(rcu_cpu_has_work, cpu) = 1;
> >>> +			t = per_cpu(rcu_cpu_kthread_task, cpu);
> >>> +			if (t == NULL) {
> >>> +				preempt_enable();
> >>> +				continue;
> >>> +			}
> >>
> >> Obviously preempt_disable() is not for protecting remote percpu data.
> >> Is it for disabling cpu hotplug? I am afraid the @t may leave
> >> and become invalid.
> > 
> > Indeed, acquiring the rnp->lock is safer, except that I don't trust
> > calling sched_setscheduler_nocheck() in that state.  So I need to check
> > for the CPU being online after the preempt_disable().  This means that
> > I ignore requests to do work after CPU_DYING time, but that is OK because
> > force_quiescent_state() will figure out that the CPU is in fact offline.
> > 
> > Make sense?
> 
> Yes.

Good, I will take that approach.
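
A rough user-space model of that approach, for illustration only: the
struct and function names are hypothetical, the online flag stands in for
the CPU-online check made after preempt_disable(), and the placement of
that check relative to setting the has-work flag is an assumption. The
sketch also shifts the mask right so that the bit tested as bit 0 tracks
the CPU index; that is a choice made here, whereas the quoted loop shifts
left.

```c
#include <stdbool.h>

struct model_cpu {
	bool online;		/* models the CPU-online check */
	bool has_task;		/* models a non-NULL rcu_cpu_kthread_task */
	bool has_work;		/* models rcu_cpu_has_work */
};

/*
 * Walk the CPUs grplo..grphi; bit 0 of mask corresponds to grplo.
 * Returns the number of CPUs whose kthread would be woken.
 */
static int model_wake_cpus(struct model_cpu *cpus, int grplo, int grphi,
			   unsigned long mask)
{
	int cpu;
	int woken = 0;

	for (cpu = grplo; cpu <= grphi; cpu++, mask >>= 1) {
		if ((mask & 0x1) == 0)
			continue;
		/* In the kernel, preemption would be disabled from here. */
		if (!cpus[cpu].online)
			continue;	/* ignore requests for offline CPUs */
		cpus[cpu].has_work = true;
		if (!cpus[cpu].has_task)
			continue;	/* no kthread to wake yet */
		woken++;
	}
	return woken;
}
```

Requests aimed at an offline CPU are simply dropped in this model, which
matches the reasoning that force_quiescent_state() will notice the CPU is
offline.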

> Another:
> 
> #if CONFIG_HOTPLUG_CPU
> get_task_struct() when setting a bit in wakemask
> put_task_struct() when clearing a bit in wakemask
> #endif

Good point, but I will pass on the added #ifdef.  ;-)
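
One reading of the get/put suggestion, sketched as a user-space model with
no #ifdef: a reference is taken when a wakemask bit goes from clear to set
and dropped when it is cleared. All names are hypothetical, the plain
integer counter stands in for the task_struct reference count, and locking
is omitted.

```c
#define MODEL_CPUS_PER_NODE 4	/* hypothetical fan-out for the model */

struct model_task {
	int usage;		/* models the task_struct reference count */
};

struct model_node {
	unsigned long wakemask;
	struct model_task *task[MODEL_CPUS_PER_NODE];
};

/* Request a wakeup for slot idx; models get_task_struct(). */
static void model_set_wake_bit(struct model_node *rnp, int idx)
{
	unsigned long bit = 1UL << idx;

	if (rnp->wakemask & bit)
		return;		/* already requested, reference already held */
	rnp->wakemask |= bit;
	rnp->task[idx]->usage++;
}

/* Consume the wakeup request; models put_task_struct(). */
static void model_clear_wake_bit(struct model_node *rnp, int idx)
{
	unsigned long bit = 1UL << idx;

	if (!(rnp->wakemask & bit))
		return;
	rnp->wakemask &= ~bit;
	rnp->task[idx]->usage--;
}
```

Taking the reference only on the clear-to-set transition keeps gets and
puts balanced even when the same bit is requested twice before it is
consumed.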

							Thanx, Paul
