First patch was buggy, sorry :(

This second version makes no more RCU assumptions, because only the 'donelist' queue is searched for an item to delete. Items on the donelist are ready to be freed.

This V2 also corrects a problem with CPU hotplug: we forgot to update the ->count variable when transferring a queue to another one.

-------------------------------------------------------------------------

In order to avoid some OOMs triggered by a flood of call_rcu() calls, we increased maxbatch from 10 to 10000 in linux 2.6.14, and conditionally call set_need_resched() in call_rcu().

This solution doesn't solve all the problems and has drawbacks:

1) Using a big maxbatch has a bad impact on latency.
2) A flood of call_rcu_bh() can still OOM.

I have some servers that crash once in a while when the ip route cache is flushed. After raising /proc/sys/net/ipv4/route/secret_interval (so that *no* flush is done), I got better uptime for these servers. But in some cases I think the network stack can flood call_rcu_bh(), and a fatal OOM occurs.

I suggest in this patch:

1) To lower maxbatch to a more reasonable value (as far as latency is concerned).
2) To be able to guard an RCU cpu queue against a maximal count (10000 for example). If this limit is reached, free the oldest entry (if one is available from the donelist queue).
3) Bug correction in __rcu_offline_cpu(), where we forgot to adjust the ->count field when transferring a queue to another one.

In my stress tests, I could not reproduce the OOM anymore after applying this patch.

Signed-off-by: Eric Dumazet
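
For readers who want to see the idea behind points 2) and 3) in isolation, here is a minimal userspace sketch. It is not the patch itself and not kernel code: struct cb, struct cb_queue, enqueue(), reap_oldest_done(), grace_period_elapsed(), adopt_queue() and count_and_free() are all hypothetical names made up for this illustration, and the grace period is modelled by an explicit function call. It only shows a per-queue count guarded by a limit, freeing the oldest donelist entry when the limit is exceeded, and keeping the count correct when one queue is moved onto another.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; these names are assumptions, not kernel API. */
struct cb {
	struct cb *next;
	void (*func)(struct cb *);
};

struct cb_queue {
	struct cb *pending, **pending_tail; /* still waiting for a grace period */
	struct cb *done, **done_tail;       /* "donelist": safe to free right now */
	long count;                         /* entries on pending + done */
	long limit;                         /* maximal count we try to stay under */
};

static long freed_total; /* demo counter, bumped by count_and_free() */

/* Demo callback: count the invocation and release the item. */
static void count_and_free(struct cb *item)
{
	freed_total++;
	free(item);
}

static void queue_init(struct cb_queue *q, long limit)
{
	q->pending = NULL;
	q->pending_tail = &q->pending;
	q->done = NULL;
	q->done_tail = &q->done;
	q->count = 0;
	q->limit = limit;
}

/* Invoke the oldest donelist entry, if any. Returns 1 if one was freed. */
static int reap_oldest_done(struct cb_queue *q)
{
	struct cb *head = q->done;

	if (!head)
		return 0;
	q->done = head->next;
	if (!q->done)
		q->done_tail = &q->done;
	q->count--;
	head->func(head);
	return 1;
}

/* Queue a callback; over the limit, free one entry that is already done. */
static void enqueue(struct cb_queue *q, struct cb *item,
		    void (*func)(struct cb *))
{
	item->func = func;
	item->next = NULL;
	*q->pending_tail = item;
	q->pending_tail = &item->next;
	if (++q->count > q->limit)
		reap_oldest_done(q); /* pending entries are never touched */
}

/* Model of a grace period ending: pending entries become done entries. */
static void grace_period_elapsed(struct cb_queue *q)
{
	if (!q->pending)
		return;
	*q->done_tail = q->pending;
	q->done_tail = q->pending_tail;
	q->pending = NULL;
	q->pending_tail = &q->pending;
}

/* Model of CPU offline: move src onto dst and carry the count along. */
static void adopt_queue(struct cb_queue *dst, struct cb_queue *src)
{
	if (src->done) {
		*dst->done_tail = src->done;
		dst->done_tail = src->done_tail;
	}
	if (src->pending) {
		*dst->pending_tail = src->pending;
		dst->pending_tail = src->pending_tail;
	}
	dst->count += src->count; /* the adjustment that was forgotten */
	queue_init(src, src->limit);
}
```

Note that in this sketch the guard only ever frees entries from the done list, so it needs no assumption about grace periods; if nothing is done yet, the queue is simply allowed to grow past the limit.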