From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	eric.dumazet@gmail.com, darren@dvhart.com,
	"Paul E. McKenney" <paul.mckenney@linaro.org>
Subject: Re: [PATCH RFC tip/core/rcu 11/11] rcu: move TREE_RCU from softirq to kthread
Date: Fri, 25 Feb 2011 12:32:19 -0800
Message-ID: <20110225203219.GD2269@linux.vnet.ibm.com>
In-Reply-To: <4D6765B6.1030401@cn.fujitsu.com>

On Fri, Feb 25, 2011 at 04:17:58PM +0800, Lai Jiangshan wrote:
> On 02/23/2011 09:39 AM, Paul E. McKenney wrote:
> > From: Paul E. McKenney <paul.mckenney@linaro.org>
> > 
> > If RCU priority boosting is to be meaningful, callback invocation must
> > be boosted in addition to preempted RCU readers.  Otherwise, in presence
> > of CPU real-time threads, the grace period ends, but the callbacks don't
> > get invoked.  If the callbacks don't get invoked, the associated memory
> > doesn't get freed, so the system is still subject to OOM.
> > 
> > But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
> > moves the callback invocations to a kthread, which can be boosted easily.
> > 
> > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/linux/interrupt.h           |    1 -
> >  include/trace/events/irq.h          |    3 +-
> >  kernel/rcutree.c                    |  324 ++++++++++++++++++++++++++++++++++-
> >  kernel/rcutree.h                    |    8 +
> >  kernel/rcutree_plugin.h             |    4 +-
> >  tools/perf/util/trace-event-parse.c |    1 -
> >  6 files changed, 331 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index 79d0c4f..ed47deb 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -385,7 +385,6 @@ enum
> >  	TASKLET_SOFTIRQ,
> >  	SCHED_SOFTIRQ,
> >  	HRTIMER_SOFTIRQ,
> > -	RCU_SOFTIRQ,	/* Preferable RCU should always be the last softirq */
> >  
> >  	NR_SOFTIRQS
> >  };
> > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> > index 1c09820..ae045ca 100644
> > --- a/include/trace/events/irq.h
> > +++ b/include/trace/events/irq.h
> > @@ -20,8 +20,7 @@ struct softirq_action;
> >  			 softirq_name(BLOCK_IOPOLL),	\
> >  			 softirq_name(TASKLET),		\
> >  			 softirq_name(SCHED),		\
> > -			 softirq_name(HRTIMER),		\
> > -			 softirq_name(RCU))
> > +			 softirq_name(HRTIMER))
> >  
> >  /**
> >   * irq_handler_entry - called immediately before the irq action handler
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 0ac1cc0..2241f28 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -47,6 +47,8 @@
> >  #include <linux/mutex.h>
> >  #include <linux/time.h>
> >  #include <linux/kernel_stat.h>
> > +#include <linux/wait.h>
> > +#include <linux/kthread.h>
> >  
> >  #include "rcutree.h"
> >  
> > @@ -82,6 +84,18 @@ DEFINE_PER_CPU(struct rcu_data, rcu_bh_data);
> >  int rcu_scheduler_active __read_mostly;
> >  EXPORT_SYMBOL_GPL(rcu_scheduler_active);
> >  
> > +/* Control variables for per-CPU and per-rcu_node kthreads. */
> 
> I think "per-leaf-rcu_node" is better. It seems that only the leaf rcu_node
> of rcu_sched are used for rcu_node kthreads and they also serve for
> other rcu domains(rcu_bh, rcu_preempt)? I think we need to add some
> comments for it.

There is also a per-root_rcu_node kthread, which is added along with
priority boosting, so the kthreads are not limited to the leaf rcu_node
structures.

Good point on the scope of the kthreads.  I have changed the above
comment to read:

/*
 * Control variables for per-CPU and per-rcu_node kthreads.  These
 * handle all flavors of RCU.
 */

Seem reasonable?

> > +/*
> > + * Timer handler to initiate the waking up of per-CPU kthreads that
> > + * have yielded the CPU due to excess numbers of RCU callbacks.
> > + */
> > +static void rcu_cpu_kthread_timer(unsigned long arg)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_data *rdp = (struct rcu_data *)arg;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +	struct task_struct *t;
> > +
> > +	raw_spin_lock_irqsave(&rnp->lock, flags);
> > +	rnp->wakemask |= rdp->grpmask;
> 
> I think there is no reason that the rnp->lock also protects the
> rnp->node_kthread_task. "raw_spin_unlock_irqrestore(&rnp->lock, flags);"
> can be moved up here.

If I am not too confused, the lock needs to cover the statements below
in order to correctly handle races with concurrent CPU-hotplug operations.
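
For illustration only, here is a minimal userspace sketch of the concern.
It assumes that the hotplug path clears the kthread pointer while holding
the same lock, which is an assumption of this sketch rather than something
shown in the patch.  The struct names, the toy spinlock, and the helper
functions are all hypothetical stand-ins for the kernel's.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct task { int wakeups; };

struct node {
	atomic_flag lock;          /* toy spinlock standing in for rnp->lock */
	unsigned long wakemask;
	struct task *kthread_task; /* NULL once the kthread has been retired */
};

static void node_lock(struct node *n)
{
	while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
		; /* spin until the lock is free */
}

static void node_unlock(struct node *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/*
 * Models the timer handler: the pointer is both read and used while the
 * lock is held, so it cannot be retired between the check and the wakeup.
 * Returns 1 if a wakeup was delivered, 0 otherwise.
 */
static int timer_handler(struct node *n, unsigned long grpmask)
{
	int woke = 0;

	node_lock(n);
	n->wakemask |= grpmask;
	if (n->kthread_task != NULL) {
		n->kthread_task->wakeups++; /* stands in for wake_up_process() */
		woke = 1;
	}
	node_unlock(n);
	return woke;
}

/* Models a hotplug path retiring the kthread (assumed to take the lock). */
static void hotplug_retire(struct node *n)
{
	node_lock(n);
	n->kthread_task = NULL;
	node_unlock(n);
}
```

If the unlock were moved up to just after the wakemask update, a
concurrent hotplug_retire() could run between the NULL check and the
wakeup in this model.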

> > +	t = rnp->node_kthread_task;
> > +	if (t == NULL) {
> > +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > +		return;
> > +	}
> > +	wake_up_process(t);
> > +	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > +}
> > +
> > +/*
> > + * Drop to non-real-time priority and yield, but only after posting a
> > + * timer that will cause us to regain our real-time priority if we
> > + * remain preempted.  Either way, we restore our real-time priority
> > + * before returning.
> > + */
> > +static void rcu_yield(int cpu)
> > +{
> > +	struct rcu_data *rdp = per_cpu_ptr(rcu_sched_state.rda, cpu);
> > +	struct sched_param sp;
> > +	struct timer_list yield_timer;
> > +
> > +	setup_timer(&yield_timer, rcu_cpu_kthread_timer, (unsigned long)rdp);
> > +	mod_timer(&yield_timer, jiffies + 2);
> > +	sp.sched_priority = 0;
> > +	sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
> > +	schedule();
> > +	sp.sched_priority = RCU_KTHREAD_PRIO;
> > +	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
> > +	del_timer(&yield_timer);
> > +}
> > +
> > +/*
> > + * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
> > + * This can happen while the corresponding CPU is either coming online
> > + * or going offline.  We cannot wait until the CPU is fully online
> > + * before starting the kthread, because the various notifier functions
> > + * can wait for RCU grace periods.  So we park rcu_cpu_kthread() until
> > + * the corresponding CPU is online.
> > + *
> > + * Return 1 if the kthread needs to stop, 0 otherwise.
> > + *
> > + * Caller must disable bh.  This function can momentarily enable it.
> > + */
> > +static int rcu_cpu_kthread_should_stop(int cpu)
> > +{
> > +	while (cpu_is_offline(cpu) || smp_processor_id() != cpu) {
> > +		if (kthread_should_stop())
> > +			return 1;
> > +		local_bh_enable();
> > +		schedule_timeout_uninterruptible(1);
> > +		if (smp_processor_id() != cpu)
> > +			set_cpus_allowed_ptr(current, cpumask_of(cpu));
> 
> The current task is PF_THREAD_BOUND,
> Why do "set_cpus_allowed_ptr(current, cpumask_of(cpu));" ?

Because I have seen CPU hotplug operations unbind PF_THREAD_BOUND threads.
In addition, I end up having to spawn the kthread at CPU_UP_PREPARE time,
at which point the thread must run unbound because its CPU isn't online
yet.  I cannot invoke kthread_create() within the stop-machine handler
(right?).  I cannot wait until CPU_ONLINE time because that results in
hangs when other CPU notifiers wait for grace periods.

Yes, I did find out about the hangs the hard way.  Why do you ask?  ;-)

Please feel free to suggest improvements in the header comment above
for rcu_cpu_kthread_should_stop(), which is my apparently insufficient
attempt to explain this.

> > +		local_bh_disable();
> > +	}
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
> > + * earlier RCU softirq.
> > + */
> > +static int rcu_cpu_kthread(void *arg)
> > +{
> > +	int cpu = (int)(long)arg;
> > +	unsigned long flags;
> > +	int spincnt = 0;
> > +	wait_queue_head_t *wqp = &per_cpu(rcu_cpu_wq, cpu);
> > +	char work;
> > +	char *workp = &per_cpu(rcu_cpu_has_work, cpu);
> > +
> > +	for (;;) {
> > +		wait_event_interruptible(*wqp,
> > +					 *workp != 0 || kthread_should_stop());
> > +		local_bh_disable();
> > +		if (rcu_cpu_kthread_should_stop(cpu)) {
> > +			local_bh_enable();
> > +			break;
> > +		}
> > +		local_irq_save(flags);
> > +		work = *workp;
> > +		*workp = 0;
> > +		local_irq_restore(flags);
> > +		if (work)
> > +			rcu_process_callbacks();
> > +		local_bh_enable();
> > +		if (*workp != 0)
> > +			spincnt++;
> > +		else
> > +			spincnt = 0;
> > +		if (spincnt > 10) {
> 
> "10" is a magic number here.

It is indeed.  Suggestions for a cpp macro name to hide it behind?
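
As one possible shape, sketched in userspace C: the macro name
RCU_KTHREAD_MAX_SPIN and the helper spin_should_yield() are hypothetical
and are not taken from the patch.  The helper just packages the spincnt
bookkeeping from the loop above so that the threshold has a name.

```c
/* Hypothetical name; nothing in this thread has settled on one. */
#define RCU_KTHREAD_MAX_SPIN 10

/*
 * Update the spin counter after one pass through the kthread loop.
 * Returns 1 if the caller should yield, in which case the counter
 * has already been reset, and 0 otherwise.
 */
static int spin_should_yield(int *spincnt, int more_work)
{
	if (more_work)
		(*spincnt)++;
	else
		*spincnt = 0;
	if (*spincnt > RCU_KTHREAD_MAX_SPIN) {
		*spincnt = 0;
		return 1;
	}
	return 0;
}
```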

> > +			rcu_yield(cpu);
> > +			spincnt = 0;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> 
> 
> > +/*
> > + * Per-rcu_node kthread, which is in charge of waking up the per-CPU
> > + * kthreads when needed.
> > + */
> > +static int rcu_node_kthread(void *arg)
> > +{
> > +	int cpu;
> > +	unsigned long flags;
> > +	unsigned long mask;
> > +	struct rcu_node *rnp = (struct rcu_node *)arg;
> > +	struct sched_param sp;
> > +	struct task_struct *t;
> > +
> > +	for (;;) {
> > +		wait_event_interruptible(rnp->node_wq, rnp->wakemask != 0 ||
> > +						       kthread_should_stop());
> > +		if (kthread_should_stop())
> > +			break;
> > +		raw_spin_lock_irqsave(&rnp->lock, flags);
> > +		mask = rnp->wakemask;
> > +		rnp->wakemask = 0;
> > +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > +		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> > +			if ((mask & 0x1) == 0)
> > +				continue;
> > +			preempt_disable();
> > +			per_cpu(rcu_cpu_has_work, cpu) = 1;
> > +			t = per_cpu(rcu_cpu_kthread_task, cpu);
> > +			if (t == NULL) {
> > +				preempt_enable();
> > +				continue;
> > +			}
> 
> Obviously preempt_disable() is not for protecting remote percpu data.
> Is it for disabling cpu hotplug? I am afraid the @t may leave
> and become invalid.

Indeed, acquiring the rnp->lock is safer, except that I don't trust
calling sched_setscheduler_nocheck() with that lock held.  So I instead
need to check for the CPU being online after the preempt_disable().
This means that I ignore requests to do work after CPU_DYING time, but
that is OK because force_quiescent_state() will figure out that the CPU
is in fact offline.
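
A rough userspace model of that check, for illustration only: struct
model, NR_CPUS_MODEL, and post_work() are hypothetical names, and the
sketch assumes that bit 0 of the mask corresponds to rnp->grplo.  Note
that the sketch shifts the mask right so that the test of bit 0 tracks
the CPU being examined, whereas the loop quoted above shifts left.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 8 /* hypothetical size for this model */

struct model {
	bool online[NR_CPUS_MODEL];   /* stands in for cpu_online() */
	char has_work[NR_CPUS_MODEL]; /* stands in for rcu_cpu_has_work */
};

/*
 * Walk the CPUs grplo..grphi, with bit 0 of mask corresponding to grplo.
 * Requests aimed at offline CPUs are dropped.  Returns the number of
 * CPUs that were marked as having work.
 */
static int post_work(struct model *m, unsigned long mask, int grplo, int grphi)
{
	int cpu;
	int posted = 0;

	for (cpu = grplo; cpu <= grphi; cpu++, mask >>= 1) {
		if ((mask & 0x1) == 0)
			continue;
		/* In the kernel, preempt_disable() would precede this check. */
		if (!m->online[cpu])
			continue; /* ignore the request: the CPU is offline */
		m->has_work[cpu] = 1;
		posted++;
	}
	return posted;
}
```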

Make sense?

In any case, good catch!!!

							Thanx, Paul

