public inbox for linux-kernel@vger.kernel.org
From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	eric.dumazet@gmail.com, darren@dvhart.com,
	"Paul E. McKenney" <paul.mckenney@linaro.org>
Subject: Re: [PATCH RFC tip/core/rcu 11/11] rcu: move TREE_RCU from softirq to kthread
Date: Fri, 25 Feb 2011 16:17:58 +0800
Message-ID: <4D6765B6.1030401@cn.fujitsu.com>
In-Reply-To: <1298425183-21265-11-git-send-email-paulmck@linux.vnet.ibm.com>

On 02/23/2011 09:39 AM, Paul E. McKenney wrote:
> From: Paul E. McKenney <paul.mckenney@linaro.org>
> 
> If RCU priority boosting is to be meaningful, callback invocation must
> be boosted in addition to preempted RCU readers.  Otherwise, in presence
> of CPU real-time threads, the grace period ends, but the callbacks don't
> get invoked.  If the callbacks don't get invoked, the associated memory
> doesn't get freed, so the system is still subject to OOM.
> 
> But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
> moves the callback invocations to a kthread, which can be boosted easily.
> 
> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  include/linux/interrupt.h           |    1 -
>  include/trace/events/irq.h          |    3 +-
>  kernel/rcutree.c                    |  324 ++++++++++++++++++++++++++++++++++-
>  kernel/rcutree.h                    |    8 +
>  kernel/rcutree_plugin.h             |    4 +-
>  tools/perf/util/trace-event-parse.c |    1 -
>  6 files changed, 331 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 79d0c4f..ed47deb 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -385,7 +385,6 @@ enum
>  	TASKLET_SOFTIRQ,
>  	SCHED_SOFTIRQ,
>  	HRTIMER_SOFTIRQ,
> -	RCU_SOFTIRQ,	/* Preferable RCU should always be the last softirq */
>  
>  	NR_SOFTIRQS
>  };
> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> index 1c09820..ae045ca 100644
> --- a/include/trace/events/irq.h
> +++ b/include/trace/events/irq.h
> @@ -20,8 +20,7 @@ struct softirq_action;
>  			 softirq_name(BLOCK_IOPOLL),	\
>  			 softirq_name(TASKLET),		\
>  			 softirq_name(SCHED),		\
> -			 softirq_name(HRTIMER),		\
> -			 softirq_name(RCU))
> +			 softirq_name(HRTIMER))
>  
>  /**
>   * irq_handler_entry - called immediately before the irq action handler
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 0ac1cc0..2241f28 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -47,6 +47,8 @@
>  #include <linux/mutex.h>
>  #include <linux/time.h>
>  #include <linux/kernel_stat.h>
> +#include <linux/wait.h>
> +#include <linux/kthread.h>
>  
>  #include "rcutree.h"
>  
> @@ -82,6 +84,18 @@ DEFINE_PER_CPU(struct rcu_data, rcu_bh_data);
>  int rcu_scheduler_active __read_mostly;
>  EXPORT_SYMBOL_GPL(rcu_scheduler_active);
>  
> +/* Control variables for per-CPU and per-rcu_node kthreads. */

I think "per-leaf-rcu_node" would be more accurate. It seems that only the
leaf rcu_node structures of rcu_sched are used for the rcu_node kthreads,
and that they also serve the other RCU domains (rcu_bh, rcu_preempt)?
I think we need to add a comment explaining this.

> +/*
> + * Timer handler to initiate the waking up of per-CPU kthreads that
> + * have yielded the CPU due to excess numbers of RCU callbacks.
> + */
> +static void rcu_cpu_kthread_timer(unsigned long arg)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp = (struct rcu_data *)arg;
> +	struct rcu_node *rnp = rdp->mynode;
> +	struct task_struct *t;
> +
> +	raw_spin_lock_irqsave(&rnp->lock, flags);
> +	rnp->wakemask |= rdp->grpmask;

I see no reason for rnp->lock to also protect
rnp->node_kthread_task. "raw_spin_unlock_irqrestore(&rnp->lock, flags);"
can be moved up to here.
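
A rough userspace sketch of the ordering I mean. Everything here is a
stand-in invented for illustration, not kernel code: struct node_sketch,
sketch_lock()/sketch_unlock() and node_timer_sketch() are hypothetical names,
an atomic_flag spinlock plays the role of rnp->lock, and a counter plays the
role of wake_up_process(). It assumes the task pointer needs no protection
from the lock, which is exactly the point to be confirmed.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the rcu_node fields involved. */
struct node_sketch {
	atomic_flag lock;        /* stands in for rnp->lock */
	unsigned long wakemask;  /* stands in for rnp->wakemask */
	void *kthread_task;      /* stands in for rnp->node_kthread_task */
	int wakeups;             /* counts simulated wake_up_process() calls */
};

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;                /* spin until the flag was previously clear */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Set the wake bit under the lock, then drop the lock before the wakeup. */
static void node_timer_sketch(struct node_sketch *np, unsigned long grpmask)
{
	void *t;

	sketch_lock(&np->lock);
	np->wakemask |= grpmask;
	sketch_unlock(&np->lock);    /* unlock moved up to here */

	t = np->kthread_task;
	if (t == NULL)
		return;
	np->wakeups++;               /* simulated wake_up_process(t) */
}
```

With this ordering only the wakemask update sits inside the critical
section, and the NULL check needs no unlock on its early-return path.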

> +	t = rnp->node_kthread_task;
> +	if (t == NULL) {
> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +		return;
> +	}
> +	wake_up_process(t);
> +	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +}
> +
> +/*
> + * Drop to non-real-time priority and yield, but only after posting a
> + * timer that will cause us to regain our real-time priority if we
> + * remain preempted.  Either way, we restore our real-time priority
> + * before returning.
> + */
> +static void rcu_yield(int cpu)
> +{
> +	struct rcu_data *rdp = per_cpu_ptr(rcu_sched_state.rda, cpu);
> +	struct sched_param sp;
> +	struct timer_list yield_timer;
> +
> +	setup_timer(&yield_timer, rcu_cpu_kthread_timer, (unsigned long)rdp);
> +	mod_timer(&yield_timer, jiffies + 2);
> +	sp.sched_priority = 0;
> +	sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
> +	schedule();
> +	sp.sched_priority = RCU_KTHREAD_PRIO;
> +	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
> +	del_timer(&yield_timer);
> +}
> +
> +/*
> + * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
> + * This can happen while the corresponding CPU is either coming online
> + * or going offline.  We cannot wait until the CPU is fully online
> + * before starting the kthread, because the various notifier functions
> + * can wait for RCU grace periods.  So we park rcu_cpu_kthread() until
> + * the corresponding CPU is online.
> + *
> + * Return 1 if the kthread needs to stop, 0 otherwise.
> + *
> + * Caller must disable bh.  This function can momentarily enable it.
> + */
> +static int rcu_cpu_kthread_should_stop(int cpu)
> +{
> +	while (cpu_is_offline(cpu) || smp_processor_id() != cpu) {
> +		if (kthread_should_stop())
> +			return 1;
> +		local_bh_enable();
> +		schedule_timeout_uninterruptible(1);
> +		if (smp_processor_id() != cpu)
> +			set_cpus_allowed_ptr(current, cpumask_of(cpu));

The current task is PF_THREAD_BOUND.
Why call "set_cpus_allowed_ptr(current, cpumask_of(cpu));"?

> +		local_bh_disable();
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
> + * earlier RCU softirq.
> + */
> +static int rcu_cpu_kthread(void *arg)
> +{
> +	int cpu = (int)(long)arg;
> +	unsigned long flags;
> +	int spincnt = 0;
> +	wait_queue_head_t *wqp = &per_cpu(rcu_cpu_wq, cpu);
> +	char work;
> +	char *workp = &per_cpu(rcu_cpu_has_work, cpu);
> +
> +	for (;;) {
> +		wait_event_interruptible(*wqp,
> +					 *workp != 0 || kthread_should_stop());
> +		local_bh_disable();
> +		if (rcu_cpu_kthread_should_stop(cpu)) {
> +			local_bh_enable();
> +			break;
> +		}
> +		local_irq_save(flags);
> +		work = *workp;
> +		*workp = 0;
> +		local_irq_restore(flags);
> +		if (work)
> +			rcu_process_callbacks();
> +		local_bh_enable();
> +		if (*workp != 0)
> +			spincnt++;
> +		else
> +			spincnt = 0;
> +		if (spincnt > 10) {

"10" is a magic number here.

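One possible way to name it, as a standalone sketch only. The macro
RCU_KTHREAD_MAX_SPIN and the helper spin_should_yield() are names made up
here for illustration; neither is in the patch. The helper mirrors the
counter logic quoted above so the threshold has a single named definition.

```c
/* Hypothetical name; the patch uses the bare literal 10. */
#define RCU_KTHREAD_MAX_SPIN 10

/*
 * Update the spin counter after one pass of the kthread loop.
 * Returns 1 when the caller should yield (the counter is reset),
 * 0 otherwise.
 */
static int spin_should_yield(int *spincnt, int work_pending)
{
	if (work_pending)
		(*spincnt)++;        /* more work arrived while we ran */
	else
		*spincnt = 0;        /* caught up, start counting afresh */
	if (*spincnt > RCU_KTHREAD_MAX_SPIN) {
		*spincnt = 0;
		return 1;
	}
	return 0;
}
```

As in the quoted loop, the yield is requested on the eleventh consecutive
pass that still finds work pending.
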
> +			rcu_yield(cpu);
> +			spincnt = 0;
> +		}
> +	}
> +	return 0;
> +}
> +


> +/*
> + * Per-rcu_node kthread, which is in charge of waking up the per-CPU
> + * kthreads when needed.
> + */
> +static int rcu_node_kthread(void *arg)
> +{
> +	int cpu;
> +	unsigned long flags;
> +	unsigned long mask;
> +	struct rcu_node *rnp = (struct rcu_node *)arg;
> +	struct sched_param sp;
> +	struct task_struct *t;
> +
> +	for (;;) {
> +		wait_event_interruptible(rnp->node_wq, rnp->wakemask != 0 ||
> +						       kthread_should_stop());
> +		if (kthread_should_stop())
> +			break;
> +		raw_spin_lock_irqsave(&rnp->lock, flags);
> +		mask = rnp->wakemask;
> +		rnp->wakemask = 0;
> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> +			if ((mask & 0x1) == 0)
> +				continue;
> +			preempt_disable();
> +			per_cpu(rcu_cpu_has_work, cpu) = 1;
> +			t = per_cpu(rcu_cpu_kthread_task, cpu);
> +			if (t == NULL) {
> +				preempt_enable();
> +				continue;
> +			}

Clearly preempt_disable() is not there to protect remote per-CPU data.
Is it meant to hold off CPU hotplug? I am afraid that the task @t may
exit, leaving the pointer invalid.
