netfilter-devel.vger.kernel.org archive mirror
* [PATCH -tip 0/3] expedited "big hammer" RCU grace periods
@ 2009-06-24 16:45 Paul E. McKenney
From: Paul E. McKenney @ 2009-06-24 16:45 UTC
  To: linux-kernel, netdev, netfilter-devel
  Cc: mingo, akpm, torvalds, davem, dada1, zbr, jeff.chua.linux, paulus,
	laijs, jengelh, r000n, benh, mathieu.desnoyers

This patch set implements the "big hammer" expedited RCU grace periods.
This leverages the existing per-CPU migration kthreads, as suggested
by Ingo.  These are awakened in a loop, and waited for in a second loop.
Not fully scalable, but removing the extra hop through smp_call_function
reduces latency on systems with moderate numbers of CPUs.  The
synchronize_rcu_expedited() and synchronize_rcu_bh_expedited() primitives
invoke synchronize_sched_expedited(), except under CONFIG_PREEMPT_RCU,
where they instead invoke synchronize_rcu() and synchronize_rcu_bh(),
respectively.  This will be fixed in the future, after preemptable RCU
is folded into the rcutree implementation.
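
The wake-then-wait pattern described above can be modeled in user space.
The following is a minimal sketch, not the kernel code from this patch
set.  It assumes POSIX threads standing in for the per-CPU migration
kthreads, and every name in it (NR_WORKERS, struct worker, worker_fn(),
start_workers(), expedite(), stop_workers()) is hypothetical.  It shows
only the two-loop structure: one loop that wakes every worker, and a
second loop that waits for each worker to acknowledge.

```c
/* User-space model only; NOT kernel code.  All names are hypothetical. */
#include <assert.h>
#include <pthread.h>

#define NR_WORKERS 4	/* stand-in for the number of online CPUs */

struct worker {
	pthread_t tid;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int requested;	/* latest generation the requester asked for */
	int completed;	/* latest generation this worker acknowledged */
	int stop;	/* set to ask the worker to exit */
};

struct worker workers[NR_WORKERS];
int generation;		/* touched only by the single requester thread */

/* Stand-in for a per-CPU kthread: sleep until asked, then acknowledge. */
static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	while (!w->stop) {
		if (w->completed == w->requested) {
			pthread_cond_wait(&w->cond, &w->lock);
			continue;
		}
		/* Getting to run here models the context switch. */
		w->completed = w->requested;
		pthread_cond_broadcast(&w->cond);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

void start_workers(void)
{
	for (int i = 0; i < NR_WORKERS; i++) {
		struct worker *w = &workers[i];
		int rc;

		pthread_mutex_init(&w->lock, NULL);
		pthread_cond_init(&w->cond, NULL);
		w->requested = w->completed = w->stop = 0;
		rc = pthread_create(&w->tid, NULL, worker_fn, w);
		assert(rc == 0);
		(void)rc;
	}
}

/* Returns the generation that every worker has acknowledged. */
int expedite(void)
{
	int gen = ++generation;

	/* First loop: wake every worker without waiting for any of them. */
	for (int i = 0; i < NR_WORKERS; i++) {
		struct worker *w = &workers[i];

		pthread_mutex_lock(&w->lock);
		w->requested = gen;
		pthread_cond_broadcast(&w->cond);
		pthread_mutex_unlock(&w->lock);
	}

	/* Second loop: wait for each acknowledgment in turn. */
	for (int i = 0; i < NR_WORKERS; i++) {
		struct worker *w = &workers[i];

		pthread_mutex_lock(&w->lock);
		while (w->completed != gen)
			pthread_cond_wait(&w->cond, &w->lock);
		pthread_mutex_unlock(&w->lock);
	}
	return gen;
}

void stop_workers(void)
{
	for (int i = 0; i < NR_WORKERS; i++) {
		struct worker *w = &workers[i];

		pthread_mutex_lock(&w->lock);
		w->stop = 1;
		pthread_cond_broadcast(&w->cond);
		pthread_mutex_unlock(&w->lock);
		pthread_join(w->tid, NULL);
	}
}
```

A caller would run start_workers() once, call expedite() whenever it
needs every worker to have run, and call stop_workers() before exiting
(build with -pthread).  Because all wakeups are issued before any wait
begins, the workers can acknowledge in parallel.  The requester still
visits each worker twice, which is the sense in which this structure is
not fully scalable.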

As before, this does nothing to expedite callbacks already registered
with call_rcu() or call_rcu_bh(), but there is no need to.

Passes many hours of rcutorture testing in parallel with a script
that randomly offlines and onlines CPUs in a number of configurations.
Grace periods take about 40 microseconds on an 8-CPU Power machine, which
I believe is good enough from a performance viewpoint for the near future.
This is somewhat slower than v7, but the slowdown was unfortunately
necessary to fix some bugs.

This is finally ready for inclusion.  ;-)

Shortcomings:

o	Does not address preemptable RCU (though synchronize_sched_expedited()
	is in fact expedited in this configuration).

o	Probably not helpful on systems with thousands of CPUs, but likely
	quite helpful even on systems with a few hundred CPUs.

Changes since v7:

o	Fixed several embarrassing bugs turned up by tests on multiple
	configurations.

Changes since v6:

o	Moved to using the migration threads, as suggested by Ingo.

Changes since v5:

o	Fixed several embarrassing locking bugs, including those
	noted by Ingo and Lai.

o	Added a missing set of braces.

o	Cut out the extra kthread, so that synchronize_sched_expedited()
	directly calls smp_call_function() and waits for the quiescent
	states.

o	Removed some debug code, but promoted one piece of it to production.

o	Fixed a compiler warning.

Changes since v4:

o	Use per-CPU kthreads to force the quiescent states in parallel.

Changes since v3:

o	Use a kthread that schedules itself on each CPU in turn to
	force a grace period.  The synchronize_rcu() primitive
	wakes up the kthread in order to avoid messing with affinity
	masks on user tasks.

o	Tried a number of additional variations on the v3 approach, none
	of which helped much.

Changes since v2:

o	Use reschedule IPIs rather than a softirq.

Changes since v1:

o	Added rcutorture support, and added exports required by
	rcutorture.

o	Added comment stating that smp_call_function() implies a
	memory barrier, suggested by Mathieu.

o	Added #include for delay.h.

 Documentation/RCU/torture.txt |   17 +++
 include/linux/rcuclassic.h    |   15 ++-
 include/linux/rcupdate.h      |   25 ++---
 include/linux/rcupreempt.h    |   10 ++
 include/linux/rcutree.h       |   12 ++
 kernel/rcupdate.c             |   25 +++++
 kernel/rcutorture.c           |  202 ++++++++++++++++++++++--------------------
 kernel/sched.c                |  129 ++++++++++++++++++++++++++
 8 files changed, 327 insertions(+), 108 deletions(-)

* [PATCH -tip 0/3] expedited "big hammer" RCU grace periods
@ 2009-06-25 16:07 Paul E. McKenney
From: Paul E. McKenney @ 2009-06-25 16:07 UTC
  To: linux-kernel, netdev, netfilter-devel
  Cc: mingo, akpm, torvalds, davem, dada1, zbr, jeff.chua.linux, paulus,
	laijs, jengelh, r000n, benh, mathieu.desnoyers

Respin of http://lkml.org/lkml/2009/6/24/350 to allow for the removal
of Classic RCU (in favor of Hierarchical RCU) from the -tip tree.

 Documentation/RCU/RTFP.txt       |   77 ++++++++++++++
 Documentation/RCU/UP.txt         |   34 ++++--
 Documentation/RCU/checklist.txt  |   20 ++-
 Documentation/RCU/rcubarrier.txt |    7 +
 Documentation/RCU/torture.txt    |   23 ++++
 Documentation/RCU/whatisRCU.txt  |   15 ++
 include/linux/rcupdate.h         |   25 ++--
 include/linux/rcupreempt.h       |   10 +
 include/linux/rcutree.h          |   12 ++
 kernel/rcupdate.c                |   25 ++++
 kernel/rcutorture.c              |  203 +++++++++++++++++++++------------------
 kernel/sched.c                   |  130 ++++++++++++++++++++++++
 12 files changed, 452 insertions(+), 129 deletions(-)


Thread overview: 13+ messages
2009-06-24 16:45 [PATCH -tip 0/3] expedited "big hammer" RCU grace periods Paul E. McKenney
2009-06-24 16:46 ` [PATCH -tip 1/3] synchronize_sched_expedited() primitive Paul E. McKenney
2009-06-24 16:47 ` [PATCH -tip 2/3] synchronize_sched_expedited() torture tests Paul E. McKenney
2009-06-24 16:48 ` [PATCH -tip 3/3] synchronize_sched_expedited() rcutorture doc Paul E. McKenney
2009-06-24 18:03 ` [PATCH -tip 0/3] expedited "big hammer" RCU grace periods Ingo Molnar
2009-06-24 18:44   ` Paul E. McKenney
2009-06-25  9:59     ` Ingo Molnar
2009-06-25 15:27       ` Paul E. McKenney
2009-06-25 16:47       ` Linus Torvalds
2009-06-25 17:54         ` Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2009-06-25 16:07 Paul E. McKenney
2009-06-25 16:08 ` [PATCH -tip 2/3] synchronize_sched_expedited() torture tests Paul E. McKenney
2009-06-25 16:08 ` Paul E. McKenney
2009-06-25 16:08 ` Paul E. McKenney
