public inbox for linux-kernel@vger.kernel.org
From: Eric Dumazet <dada1@cosmosbay.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: linux-kernel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Dipankar Sarma <dipankar@in.ibm.com>,
	"Paul E. McKenney" <paulmck@us.ibm.com>,
	Manfred Spraul <manfred@colorfullife.com>,
	netdev@vger.kernel.org
Subject: [PATCH, RFC] RCU : OOM avoidance and lower latency
Date: Fri, 06 Jan 2006 11:17:26 +0100
Message-ID: <43BE43B6.3010105@cosmosbay.com>
In-Reply-To: <Pine.LNX.4.64.0601051727070.3169@g5.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]


In order to avoid some OOMs triggered by a flood of call_rcu() calls, in
Linux 2.6.14 we raised maxbatch from 10 to 10000 and made call_rcu()
conditionally call set_need_resched().

This solution doesn't solve all the problems and has drawbacks:

1) Using a big maxbatch has a bad impact on latency.
2) A flood of call_rcu_bh() calls can still cause an OOM.
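
As an illustration of point 1 (not part of the patch): the batch loop runs
up to maxbatch callbacks back to back, so a larger maxbatch means a longer
uninterrupted run. The following is a minimal userspace sketch of a
batch-limited drain. It is not kernel code; struct cb_node, drain_batch()
and count_call() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued callback; not the kernel's rcu_head. */
struct cb_node {
	struct cb_node *next;
	void (*func)(struct cb_node *node);
};

/*
 * Run at most `maxbatch` callbacks from the front of *list and leave the
 * remainder queued for a later call. Returns the number of callbacks run.
 */
static int drain_batch(struct cb_node **list, int maxbatch)
{
	int budget = maxbatch;
	int done = 0;

	assert(list != NULL);

	while (*list != NULL && budget > 0) {
		struct cb_node *node = *list;

		*list = node->next;	/* unlink first: the callback may free node */
		node->func(node);
		done++;
		budget--;
	}
	return done;
}

/* Example callback that only counts how often it was invoked. */
static int calls;

static void count_call(struct cb_node *node)
{
	(void)node;
	calls++;
}
```

With five nodes queued, drain_batch(&list, 2) runs two callbacks and leaves
three for later. A small budget bounds the time spent in one pass, at the
cost of more passes (and a longer backlog) when callbacks arrive quickly.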

I have some servers that crash once in a while when the IP route cache is
flushed. After raising /proc/sys/net/ipv4/route/secret_interval (so that *no*
flush is done), I got better uptime for these servers. But in some cases I
think the network stack can still flood call_rcu_bh(), and a fatal OOM occurs.

In this patch I suggest:

1) Lowering maxbatch to a more reasonable value (as far as latency is
concerned).

2) Guarding each per-CPU RCU queue with a maximum count (10000 for
example). If this limit is reached, free the oldest entry of the queue.

I assume that if a CPU has queued 10000 items in its RCU queue, then the
oldest entry cannot still be in use by another CPU. This might sound like a
violation of RCU rules (I'm not an RCU expert), but it seems quite reasonable.
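
As an illustration only (not the patch itself), the "cap the queue and
reclaim the oldest entry" idea from point 2 can be sketched in userspace C.
All names below (struct cb_head, struct cb_queue, cb_queue_add(),
note_reclaim()) are hypothetical, and the sketch keeps a single list, which
is a simplification of the kernel's per-CPU RCU lists. It does not model
grace periods, so it says nothing about whether reclaiming the oldest entry
early is actually safe for readers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

struct cb_queue {
	struct cb_head *first;	/* oldest queued callback */
	struct cb_head **tail;	/* where the next callback gets linked */
	long count;		/* callbacks currently queued */
	long limit;		/* cap on queued callbacks */
};

static void cb_queue_init(struct cb_queue *q, long limit)
{
	q->first = NULL;
	q->tail = &q->first;
	q->count = 0;
	q->limit = limit;
}

/*
 * Queue a callback. If the queue grows past its limit, unlink the oldest
 * entry and run its callback immediately.
 * Returns 1 if an old entry was reclaimed early, 0 otherwise.
 */
static int cb_queue_add(struct cb_queue *q, struct cb_head *head,
			void (*func)(struct cb_head *head))
{
	struct cb_head *oldest;

	head->func = func;
	head->next = NULL;
	*q->tail = head;
	q->tail = &head->next;

	if (++q->count <= q->limit)
		return 0;

	/* Over the limit: the list holds at least the entry just added. */
	oldest = q->first;
	assert(oldest != NULL);
	q->first = oldest->next;
	if (q->first == NULL)
		q->tail = &q->first;	/* list is empty again (limit of 0) */
	q->count--;

	oldest->func(oldest);
	return 1;
}

/* Example callback that records which entry was reclaimed last. */
static struct cb_head *last_reclaimed;

static void note_reclaim(struct cb_head *head)
{
	last_reclaimed = head;
}
```

With a limit of 3, the fourth cb_queue_add() call reclaims the first entry
that was queued and leaves the count at 3, so the queue's memory footprint
stays bounded no matter how fast callbacks are added.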


Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

[-- Attachment #2: RCU_OOM.patch --]
[-- Type: text/plain, Size: 2078 bytes --]

--- linux-2.6.15/kernel/rcupdate.c	2006-01-03 04:21:10.000000000 +0100
+++ linux-2.6.15-edum/kernel/rcupdate.c	2006-01-06 11:10:45.000000000 +0100
@@ -71,14 +71,14 @@
 
 /* Fake initialization required by compiler */
 static DEFINE_PER_CPU(struct tasklet_struct, rcu_tasklet) = {NULL};
-static int maxbatch = 10000;
+static int maxbatch = 100;
 
 #ifndef __HAVE_ARCH_CMPXCHG
 /*
  * We use an array of spinlocks for the rcurefs -- similar to ones in sparc
  * 32 bit atomic_t implementations, and a hash function similar to that
  * for our refcounting needs.
- * Can't help multiprocessors which donot have cmpxchg :(
+ * Can't help multiprocessors which dont have cmpxchg :(
  */
 
 spinlock_t __rcuref_hash[RCUREF_HASH_SIZE] = {
@@ -110,9 +110,17 @@
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
 
-	if (unlikely(++rdp->count > 10000))
-		set_need_resched();
-
+/*
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry
+ */
+	if (unlikely(++rdp->count > 10000)) {
+		rdp->count--;
+		head = rdp->donelist;
+		rdp->donelist = head->next;
+		local_irq_restore(flags);
+		return head->func(head);
+	}
 	local_irq_restore(flags);
 }
 
@@ -148,12 +156,17 @@
 	rdp = &__get_cpu_var(rcu_bh_data);
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
-	rdp->count++;
 /*
- *  Should we directly call rcu_do_batch() here ?
- *  if (unlikely(rdp->count > 10000))
- *      rcu_do_batch(rdp);
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry
  */
+	if (unlikely(++rdp->count > 10000)) {
+		rdp->count--;
+		head = rdp->donelist;
+		rdp->donelist = head->next;
+		local_irq_restore(flags);
+		return head->func(head);
+	}
 	local_irq_restore(flags);
 }
 
@@ -209,7 +222,7 @@
 static void rcu_do_batch(struct rcu_data *rdp)
 {
 	struct rcu_head *next, *list;
-	int count = 0;
+	int count = maxbatch;
 
 	list = rdp->donelist;
 	while (list) {
@@ -217,7 +230,7 @@
 		list->func(list);
 		list = next;
 		rdp->count--;
-		if (++count >= maxbatch)
+		if (--count <= 0)
 			break;
 	}
 	if (!rdp->donelist)
