Re: [PATCH] a local-timer-free version of RCU


From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Joe Korty <joe.korty@ccur.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	fweisbec@gmail.com, mathieu.desnoyers@efficios.com,
	dhowells@redhat.com, loic.minier@linaro.org,
	dhaval.giani@gmail.com, tglx@linutronix.de, peterz@infradead.org,
	linux-kernel@vger.kernel.org, josh@joshtriplett.org,
	houston.jim@comcast.net, Lai Jiangshan <laijs@cn.fujitsu.com>
Subject: Re: [PATCH] a local-timer-free version of RCU
Date: Tue, 09 Nov 2010 17:22:49 +0800
Message-ID: <4CD912E9.1080907@cn.fujitsu.com>
In-Reply-To: <20101105210059.GA27317@tsunami.ccur.com>



On Sat, Nov 6, 2010 at 5:00 AM, Joe Korty <joe.korty@ccur.com> wrote:
> +}
> +
> +/**
> + * rcu_read_lock - mark the beginning of an RCU read-side critical section.
> + *
> + * When synchronize_rcu() is invoked on one CPU while other CPUs
> + * are within RCU read-side critical sections, then the
> + * synchronize_rcu() is guaranteed to block until after all the other
> + * CPUs exit their critical sections.  Similarly, if call_rcu() is invoked
> + * on one CPU while other CPUs are within RCU read-side critical
> + * sections, invocation of the corresponding RCU callback is deferred
> + * until after all the other CPUs exit their critical sections.
> + *
> + * Note, however, that RCU callbacks are permitted to run concurrently
> + * with RCU read-side critical sections.  One way that this can happen
> + * is via the following sequence of events: (1) CPU 0 enters an RCU
> + * read-side critical section, (2) CPU 1 invokes call_rcu() to register
> + * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
> + * (4) CPU 2 enters an RCU read-side critical section, (5) the RCU
> + * callback is invoked.  This is legal, because the RCU read-side critical
> + * section that was running concurrently with the call_rcu() (and which
> + * therefore might be referencing something that the corresponding RCU
> + * callback would free up) has completed before the corresponding
> + * RCU callback is invoked.
> + *
> + * RCU read-side critical sections may be nested.  Any deferred actions
> + * will be deferred until the outermost RCU read-side critical section
> + * completes.
> + *
> + * It is illegal to block while in an RCU read-side critical section.
> + */
> +void __rcu_read_lock(void)
> +{
> +       struct rcu_data *r;
> +
> +       r = &per_cpu(rcu_data, smp_processor_id());
> +       if (r->nest_count++ == 0)
> +               /*
> +                * Set the flags value to show that we are in
> +                * a read side critical section.  The code starting
> +                * a batch uses this to determine if a processor
> +                * needs to participate in the batch.  Including
> +                * a sequence allows the remote processor to tell
> +                * that a critical section has completed and another
> +                * has begun.
> +                */

A memory barrier is needed here, as Paul noted.

> +               r->flags = IN_RCU_READ_LOCK | (r->sequence++ << 2);
> +}
> +EXPORT_SYMBOL(__rcu_read_lock);
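One way to read the barrier note above: a full barrier must follow the store to r->flags, so that no load inside the critical section is ordered before that store, which a remote CPU starting a batch reads. The userspace C11 sketch below models this under stated assumptions: the flag value and struct rcu_data_sketch are hypothetical (the quoted patch does not show them), and atomic_thread_fence(memory_order_seq_cst) stands in for the kernel's smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag value; the quoted patch does not show it. */
#define IN_RCU_READ_LOCK 0x1u

/* Hypothetical stand-in for the patch's per-CPU struct rcu_data. */
struct rcu_data_sketch {
	int nest_count;             /* rcu_read_lock() nesting depth */
	unsigned int sequence;      /* bumped on each outermost entry */
	_Atomic unsigned int flags; /* read remotely by the batch starter */
};

static void sketch_rcu_read_lock(struct rcu_data_sketch *r)
{
	assert(r->nest_count >= 0);
	if (r->nest_count++ == 0) {
		/* Publish "in a critical section" plus a sequence number. */
		atomic_store_explicit(&r->flags,
				      IN_RCU_READ_LOCK | (r->sequence++ << 2),
				      memory_order_relaxed);
		/* Full fence (kernel: smp_mb()): orders the flags store
		 * before any load inside the critical section. */
		atomic_thread_fence(memory_order_seq_cst);
	}
}
```

Nested calls only bump nest_count; each outermost entry stores a new sequence value, which is what lets a remote CPU tell a new critical section apart from one it saw earlier.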
> +
> +/**
> + * rcu_read_unlock - marks the end of an RCU read-side critical section.
> + * Check if an RCU batch was started while we were in the critical
> + * section.  If so, call rcu_quiescent() to join the rendezvous.
> + *
> + * See rcu_read_lock() for more information.
> + */
> +void __rcu_read_unlock(void)
> +{
> +       struct rcu_data *r;
> +       int     cpu, flags;
> +
> +       cpu = smp_processor_id();
> +       r = &per_cpu(rcu_data, cpu);
> +       if (--r->nest_count == 0) {
> +               flags = xchg(&r->flags, 0);
> +               if (flags & DO_RCU_COMPLETION)
> +                       rcu_quiescent(cpu);
> +       }
> +}
> +EXPORT_SYMBOL(__rcu_read_unlock);

Memory barriers or atomic operations in the fast paths of
rcu_read_lock() and rcu_read_unlock() are hardly acceptable.

We need something to drive grace-period (GP) completion
(and callback processing). There is no free lunch: if GP
completion is driven by rcu_read_unlock(), we will very
probably need memory barriers or atomic operations in the
fast paths of rcu_read_lock() and rcu_read_unlock().
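To make that cost concrete, here is a hypothetical C11 model of the quoted __rcu_read_unlock(). It assumes the CPU starting a batch sets DO_RCU_COMPLETION in a reader's flags; if so, the flags must be read and cleared in one atomic step, or a bit set between a plain read and a plain clear would be lost. That atomic read-modify-write is exactly the kind of fast-path operation objected to above. The flag values, the struct, and sketch_rcu_quiescent() are assumptions standing in for the patch's own definitions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values; the quoted patch does not show them. */
#define IN_RCU_READ_LOCK  0x1u
#define DO_RCU_COMPLETION 0x2u

struct rcu_data_sketch {
	int nest_count;             /* rcu_read_lock() nesting depth */
	_Atomic unsigned int flags; /* assumed: batch starter ORs in DO_RCU_COMPLETION */
	int quiescent_reports;      /* counts sketch_rcu_quiescent() calls */
};

/* Hypothetical stand-in for the patch's rcu_quiescent(cpu). */
static void sketch_rcu_quiescent(struct rcu_data_sketch *r)
{
	r->quiescent_reports++;
}

static void sketch_rcu_read_unlock(struct rcu_data_sketch *r)
{
	assert(r->nest_count > 0);
	if (--r->nest_count == 0) {
		/* Read and clear flags in one atomic step so that a
		 * concurrently set DO_RCU_COMPLETION is never lost.
		 * This seq_cst RMW is the fast-path cost at issue. */
		unsigned int flags = atomic_exchange(&r->flags, 0u);

		if (flags & DO_RCU_COMPLETION)
			sketch_rcu_quiescent(r);
	}
}
```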

We need to find some periodic or continuous kernel events
to drive GP completion. I think the scheduling-clock tick and
schedule() are the most ideal event sources in the kernel.
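A tick- or schedule()-driven GP can be modeled as a mask of CPUs that still owe a quiescent state: the mask is filled when a GP starts, each CPU clears its bit from its tick (when the tick did not interrupt a read-side critical section) or from schedule(), and the GP ends when the mask is empty. This is a simplified single-threaded sketch in the spirit of classic RCU, not the mechanism of the quoted patch; gp_state, gp_start() and gp_report() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: at most 64 CPUs, one grace period at a time. */
struct gp_state {
	uint64_t pending;        /* CPUs that still owe a quiescent state */
	unsigned long completed; /* number of grace periods that ended */
};

static void gp_start(struct gp_state *gp, uint64_t online_mask)
{
	gp->pending = online_mask;
}

/*
 * Called from a CPU's scheduling-clock tick (in_reader says whether
 * the tick interrupted an RCU read-side critical section) or from
 * schedule() (pass in_reader == false: with non-preemptible RCU a
 * context switch is always a quiescent state).  Returns true when
 * this report ends the grace period.
 */
static bool gp_report(struct gp_state *gp, int cpu, bool in_reader)
{
	assert(cpu >= 0 && cpu < 64);
	if (in_reader || gp->pending == 0)
		return false;
	gp->pending &= ~(UINT64_C(1) << cpu);
	if (gp->pending != 0)
		return false;
	gp->completed++;
	return true;
}
```

If a CPU sits in NO_HZ or dyntick-hpc mode, it never ticks or calls schedule(), so its bit is never cleared and the GP never ends; that is the problem raised in the next paragraph.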

However, neither the scheduling-clock tick nor schedule()
happens in NO_HZ or dyntick-hpc mode, so we need some
approach to handle that case. I vote for Paul's approach #5.

I would also like to propose an immature idea.

	Don't tell RCU about dyntick-hpc mode. Instead, stop a
	CPU's RCU function the first time RCU disturbs
	dyntick-hpc mode or NO_HZ mode.

	rcu_read_lock():
		if this CPU's RCU function is not running, start it
		and start an RCU timer.
		handle rcu_read_lock() as usual.

	entering NO_HZ:
		if interrupts are happening very frequently, do
		nothing; otherwise, stop the RCU function and the
		RCU timer of the current CPU.

	interrupt exit:
		if this interrupt was caused by the RCU timer and it
		only disturbed dyntick-hpc or NO_HZ mode (which the
		CPU will re-enter), stop the RCU function and the
		RCU timer of the current CPU.

	scheduling-clock tick:
		requeue the RCU timer before it causes an unneeded
		interrupt.
		handle RCU processing.

	+	No big changes to RCU: the same code handles both
		dyntick-hpc and NO_HZ modes, reusing some code from
		rcu_offline_cpu().

	+	No need to inform RCU of user/kernel transitions.

	+	No need to turn scheduling-clock interrupts on
		at each user/kernel transition.

	-	Critical regions that also imply an RCU read-side
		critical section must be handled carefully.

	-	Introduces some unneeded interrupts, but they are
		very infrequent.
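The proposal above can be sketched as a small per-CPU state machine. Everything here is hypothetical: struct cpu_rcu, the on_*() hooks and RCU_TIMER_DELAY do not exist in the kernel, timer expiry is modeled as a plain tick count, and stop_rcu() only marks where rcu_offline_cpu()-style code would be reused.

```c
#include <assert.h>
#include <stdbool.h>

#define RCU_TIMER_DELAY 10UL  /* hypothetical timer period, in ticks */

/* Hypothetical per-CPU state for the proposal. */
struct cpu_rcu {
	bool rcu_active;             /* CPU takes part in grace periods */
	bool timer_armed;            /* per-CPU RCU timer is pending */
	unsigned long timer_expires; /* tick at which the timer fires */
};

static void arm_rcu_timer(struct cpu_rcu *c, unsigned long now)
{
	assert(c->rcu_active);       /* timer only runs while RCU is on */
	c->timer_armed = true;
	c->timer_expires = now + RCU_TIMER_DELAY;
}

/* Where rcu_offline_cpu()-style code would be reused. */
static void stop_rcu(struct cpu_rcu *c)
{
	c->rcu_active = false;
	c->timer_armed = false;
}

/* rcu_read_lock(): lazily restart RCU on this CPU. */
static void on_read_lock(struct cpu_rcu *c, unsigned long now)
{
	if (!c->rcu_active) {
		c->rcu_active = true;
		arm_rcu_timer(c, now);
	}
	/* ...then the normal rcu_read_lock() handling. */
}

/* Entering NO_HZ: keep RCU running only if interrupts are frequent. */
static void on_enter_nohz(struct cpu_rcu *c, bool irqs_frequent)
{
	if (!irqs_frequent)
		stop_rcu(c);
}

/* Interrupt exit: stop RCU again if only the RCU timer disturbed
 * NO_HZ/dyntick-hpc mode and the CPU is about to re-enter it. */
static void on_irq_exit(struct cpu_rcu *c, bool was_rcu_timer,
			bool reentering_nohz)
{
	if (was_rcu_timer && reentering_nohz)
		stop_rcu(c);
}

/* Scheduling-clock tick: push the timer out so it never fires while
 * ticks are running, then do the usual RCU processing. */
static void on_tick(struct cpu_rcu *c, unsigned long now)
{
	if (c->rcu_active)
		arm_rcu_timer(c, now);
	/* ...handle RCU processing. */
}
```

The timer only matters for a CPU that stopped ticking while RCU was active; as long as ticks arrive, on_tick() keeps pushing its expiry into the future, which is how the unneeded interrupts stay rare.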

Thanks,
Lai

> +
> +/**
> + * call_rcu - Queue an RCU callback for invocation after a grace period.
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual update function to be invoked after the grace period
> + *
> + * The update function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed.  RCU read-side critical
> + * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
> + * and may be nested.
> + */
> +void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
> +{
> +       struct rcu_data *r;
> +       unsigned long flags;
> +       int cpu;
> +
> +       head->func = func;
> +       head->next = NULL;
> +       local_irq_save(flags);
> +       cpu = smp_processor_id();
> +       r = &per_cpu(rcu_data, cpu);
> +       /*
> +        * Avoid mixing new entries with batches which have already
> +        * completed or have a grace period in progress.
> +        */
> +       if (r->nxt.head && rcu_move_if_done(r))
> +               rcu_wake_daemon(r);
> +
> +       rcu_list_add(&r->nxt, head);
> +       if (r->nxtcount++ == 0) {

A memory barrier is needed here (before reading rcu_batch).

> +               r->nxtbatch = (rcu_batch & RCU_BATCH_MASK) + RCU_INCREMENT;
> +               barrier();
> +               if (!rcu_timestamp)
> +                       rcu_timestamp = jiffies ?: 1;
> +       }
> +       /* If we reach the limit start a batch. */
> +       if (r->nxtcount > rcu_max_count) {
> +               if (rcu_set_state(RCU_NEXT_PENDING) == RCU_COMPLETE)
> +                       rcu_start_batch();
> +       }
> +       local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(call_rcu);
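A C11 sketch of where the requested barrier could sit in call_rcu(): a full fence between the enqueue and the load of rcu_batch. The rationale given in the code comment (ordering the caller's earlier updates, such as unlinking the element, before the batch number is sampled) is an interpretation, not something stated in the review. RCU_BATCH_MASK and RCU_INCREMENT values, struct cpu_queue and sketch_queue_init() are hypothetical, and the rcu_timestamp and batch-start logic of the quoted code are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical values; the quoted patch does not show them. */
#define RCU_BATCH_MASK 0xffffff00UL
#define RCU_INCREMENT  0x100UL

struct cb {
	struct cb *next;
};

/* Hypothetical per-CPU "nxt" queue. */
struct cpu_queue {
	struct cb *head;
	struct cb **tail;
	long nxtcount;
	unsigned long nxtbatch;
};

static _Atomic unsigned long rcu_batch; /* global batch counter */

static void sketch_queue_init(struct cpu_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
	q->nxtcount = 0;
	q->nxtbatch = 0;
}

static void sketch_call_rcu(struct cpu_queue *q, struct cb *head)
{
	assert(head != NULL);
	head->next = NULL;
	*q->tail = head;
	q->tail = &head->next;
	if (q->nxtcount++ == 0) {
		/* Full fence (kernel: smp_mb()): orders the enqueue and
		 * the caller's earlier stores before the rcu_batch load. */
		atomic_thread_fence(memory_order_seq_cst);
		q->nxtbatch = (atomic_load_explicit(&rcu_batch,
						    memory_order_relaxed) &
			       RCU_BATCH_MASK) + RCU_INCREMENT;
	}
}
```

Only the first callback on an empty queue samples rcu_batch, so later callbacks inherit the same target batch.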
> +
> +


> +/*
> + * Process the completed RCU callbacks.
> + */
> +static void rcu_process_callbacks(struct rcu_data *r)
> +{
> +       struct rcu_head *list, *next;
> +
> +       local_irq_disable();
> +       rcu_move_if_done(r);
> +       list = r->done.head;
> +       rcu_list_init(&r->done);
> +       local_irq_enable();
> +

A memory barrier is needed here (after reading rcu_batch).

> +       while (list) {
> +               next = list->next;
> +               list->func(list);
> +               list = next;
> +       }
> +}
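Correspondingly, a barrier after the grace-period check keeps callback bodies (which typically free memory) from being reordered before the check that the GP has ended. This C11 sketch uses a hypothetical global rcu_completed counter in place of the patch's rcu_batch and rcu_move_if_done() logic, ignores counter wraparound, and uses an acquire fence, which must pair with a release (or full barrier) on the CPU that ends the GP; kernel code would presumably use smp_mb(). count_callback() is only a demonstration callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *);
};

/* Hypothetical: number of the last batch whose GP has ended. */
static _Atomic unsigned long rcu_completed;

/* Runs the done list if waited_batch has completed; returns the
 * number of callbacks invoked. */
static int sketch_process_callbacks(struct rcu_head_sketch **donelist,
				    unsigned long waited_batch)
{
	struct rcu_head_sketch *list, *next;
	int n = 0;

	if (atomic_load_explicit(&rcu_completed, memory_order_relaxed) <
	    waited_batch)
		return 0;            /* grace period not over yet */
	/* Keep the callbacks' loads and stores after the check above. */
	atomic_thread_fence(memory_order_acquire);
	list = *donelist;
	*donelist = NULL;
	while (list) {
		next = list->next;   /* read before func() may free list */
		list->func(list);
		list = next;
		n++;
	}
	return n;
}

/* Demonstration callback: counts invocations. */
static int callbacks_run;
static void count_callback(struct rcu_head_sketch *h)
{
	assert(h != NULL);
	callbacks_run++;
}
```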
