RCU scaling on large systems

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jack Steiner <steiner@sgi.com>
To: linux-kernel@vger.kernel.org
Subject: RCU scaling on large systems
Date: Sat, 1 May 2004 07:08:05 -0500	[thread overview]
Message-ID: <20040501120805.GA7767@sgi.com> (raw)

Has anyone investigated scaling of the RCU algorithms on large
cpu-count systems?

We are running 2.6.5 on a 512p system. The RCU update code is causing 
a near-livelock when booting. Several times, I dropped into kdb &
found many of the cpus in the RCU code. The cpus are in
rcu_check_quiescent_state() spinning on the rcu_ctrlblk.mutex.

I also found an interesting anomaly that was traced to RCU. I have
a program that measures "cpu efficiency". Basically, the program 
creates a cpu bound thread for each cpu & measures the percentage 
of time that each cpu ACTUALLY spends executing user code.
On an idle each system, each cpu *should* spend >99% in user mode.

On a 512p idle 2.6.5 system, each cpu spends ~6% of the time in the kernel
RCU code. The time is spent contending for shared cache lines.

Even more bizarre: if I repeatedly type "ls" in a *single* window 
(probably 5 times/sec), then EVERY CPU IN THE SYSTEM spends ~50%
of the time in the RCU code.

The RCU algorithms don't scale - at least on our systems!!!!!

Attached is an experimental hack that fixes the problem. I
don't believe that this is the correct fix but it does prove
that slowing down the frequency of updates fixes the problem.

With this hack, "ls" no longer measurable disturbs other cpus. Each
cpu spends ~99.8% of its time in user code regardless of the frequency
of typing "ls".

By default, the RCU code attempts to process callbacks on each cpu
every tick. The hack adds a mask so that only a few cpus process
callbacks each tick. 

I ran a simple experiment to vary the mask. Col 1 shows the mask
value, col 2 show system time when system is idle, col 3 shows
system time when typing "ls" on a single cpu. The percentages are
"eyeballed averages" for all cpus.

	VAL	idle%	"ls" %
	0	6.00	55.00
	1	2.00	50.00
	3	 .20	20.00
	7	 .20	  .22
	15	 .20	  .24
	31	 .19	  .25
	63	 .19	  .26
	127	 .19	  .25
	511	 .19	  .19

Looks like any value >7 or 15 should be ok on our hardware. I picked 
a larger value because I don't understand how heavy loads affect the 
optimal value.  I suspect that the optimal value is architecture
dependent & probably platform dependent.

Is anyone already working on this issue?

Comments and additional insight appreciated..........

Index: linux/include/linux/rcupdate.h
===================================================================
--- linux.orig/include/linux/rcupdate.h	2004-04-30 09:59:00.000000000 -0500
+++ linux/include/linux/rcupdate.h	2004-04-30 10:01:37.000000000 -0500
@@ -41,6 +41,7 @@
 #include <linux/threads.h>
 #include <linux/percpu.h>
 #include <linux/cpumask.h>
+#include <linux/jiffies.h>

 /**
  * struct rcu_head - callback structure for use with RCU
@@ -84,6 +85,14 @@
         return (a - b) > 0;
 }

+#define RCU_JIFFIES_MASK 15
+
+/* Is it time to process a batch on this cpu */
+static inline int rcu_time(int cpu)
+{
+	return (((jiffies - cpu) & RCU_JIFFIES_MASK) == 0);
+}
+
 /*
  * Per-CPU data for Read-Copy Update.
  * nxtlist - new callbacks are added here
@@ -111,11 +120,11 @@

 static inline int rcu_pending(int cpu) 
 {
-	if ((!list_empty(&RCU_curlist(cpu)) &&
+	if (rcu_time(cpu) && ((!list_empty(&RCU_curlist(cpu)) &&
 	     rcu_batch_before(RCU_batch(cpu), rcu_ctrlblk.curbatch)) ||
 	    (list_empty(&RCU_curlist(cpu)) &&
 			 !list_empty(&RCU_nxtlist(cpu))) ||
-	    cpu_isset(cpu, rcu_ctrlblk.rcu_cpu_mask))
+	    cpu_isset(cpu, rcu_ctrlblk.rcu_cpu_mask)))
 		return 1;
 	else
 		return 0;
-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.

next             reply	other threads:[~2004-05-01 12:08 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-05-01 12:08 Jack Steiner [this message]
2004-05-01 21:17 ` RCU scaling on large systems William Lee Irwin III
2004-05-01 22:35   ` William Lee Irwin III
2004-05-02  1:38   ` Jack Steiner
2004-05-07 17:53   ` Andrea Arcangeli
2004-05-07 18:17     ` William Lee Irwin III
2004-05-07 19:59       ` Andrea Arcangeli
2004-05-07 20:49   ` Jack Steiner
2004-05-02 18:28 ` Paul E. McKenney
2004-05-03 16:39   ` Jesse Barnes
2004-05-03 20:04     ` Paul E. McKenney
2004-05-03 18:40   ` Jack Steiner
2004-05-07 20:50     ` Paul E. McKenney
2004-05-07 22:06       ` Jack Steiner
2004-05-07 23:32         ` Andrew Morton
2004-05-08  4:55           ` Jack Steiner
2004-05-17 21:18           ` Andrea Arcangeli
2004-05-17 21:42             ` Andrew Morton
2004-05-17 23:50               ` Andrea Arcangeli
2004-05-18 13:33               ` Jack Steiner
2004-05-18 23:13               ` Matt Mackall
  -- strict thread matches above, loose matches on Subject: below --
2004-05-20 11:36 Manfred Spraul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040501120805.GA7767@sgi.com \
    --to=steiner@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox