From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, davem@davemloft.net,
	dada1@cosmosbay.com, zbr@ioremap.net, jeff.chua.linux@gmail.com,
	paulus@samba.org, laijs@cn.fujitsu.com, jengelh@medozas.de,
	r000n@r000n.net, benh@kernel.crashing.org
Subject: Re: [PATCH RFC] v2 expedited "big hammer" RCU grace periods
Date: Sun, 26 Apr 2009 15:13:44 -0400
Message-ID: <20090426191343.GC29238@Krystal>
In-Reply-To: <20090426112717.GE10391@elte.hu>

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Second cut of "big hammer" expedited RCU grace periods, but only 
> > for rcu_bh.  This creates another softirq vector, so that entering 
> > this softirq vector will have forced an rcu_bh quiescent state (as 
> > noted by Dave Miller).  Use smp_call_function() to invoke 
> > raise_softirq() on all CPUs in order to cause this to happen.  
> > Track the CPUs that have passed through a quiescent state (or gone 
> > offline) with a cpumask.
> 
> hm, i'm still asking whether doing this would be simpler via a 
> reschedule vector - which not only is an existing facility but also 
> forces all RCU domains through a quiescent state - not just bh-RCU 
> participants.
> 
> Triggering a new softirq is in no way simpler than doing an SMP 
> cross-call - in fact softirqs are a finite resource so using some 
> other facility would be preferred.
> 
> Am i missing something?
> 

I think the reason for this whole thread is that waiting for an rcu
quiescent state, when done many times, e.g. once per iptables
invocation, takes too long (5 seconds to load the netfilter rules at
boot). The three solutions proposed so far are:

- bh disabling + per-cpu read-write lock
- RCU FGP (fast grace periods), where the writer directly checks the
  per-cpu variables associated with netfilter to make sure the quiescent
  state for a particular resource has been reached (derived from my
  userspace RCU implementation).
- expedited "big hammer" rcu GP, where the writer waits only for a bh
  quiescent state. This is useful if we can guarantee that all readers
  either run in bh context or disable bottom halves.

It is therefore quite deliberate that Paul does not wait for global RCU
quiescent states, but only for bh quiescent states: it's faster.
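
To make this concrete, here is roughly what the expedited bh GP does, as
I understand the quoted summary: a dedicated softirq vector (entering it
is itself an rcu_bh quiescent state) is raised on every CPU through
smp_call_function(), and a cpumask tracks which CPUs have passed
through. All names below are mine, not Paul's, and the open_softirq()
registration and CPU hotplug handling are elided:

static cpumask_t rcu_bh_waiting;		/* CPUs not yet quiescent */
static DEFINE_MUTEX(rcu_bh_expedited_mutex);

/* Softirq handler: merely running here is an rcu_bh quiescent state. */
static void rcu_bh_expedited_action(struct softirq_action *unused)
{
	cpumask_clear_cpu(smp_processor_id(), &rcu_bh_waiting);
}

/* IPI handler: force each CPU into the new softirq vector. */
static void rcu_bh_expedited_ipi(void *unused)
{
	raise_softirq(RCU_BH_EXPEDITED_SOFTIRQ);	/* hypothetical vector */
}

void synchronize_rcu_bh_expedited(void)
{
	mutex_lock(&rcu_bh_expedited_mutex);
	cpumask_copy(&rcu_bh_waiting, cpu_online_mask);
	smp_call_function(rcu_bh_expedited_ipi, NULL, 0);
	raise_softirq(RCU_BH_EXPEDITED_SOFTIRQ);	/* the local CPU too */
	/* Poll until every CPU has run the softirq handler. */
	while (!cpumask_empty(&rcu_bh_waiting))
		schedule_timeout_uninterruptible(1);
	mutex_unlock(&rcu_bh_expedited_mutex);
}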

IMHO, the bh rcu GP shares the same problem as the global RCU GP
approach: it monitors global kernel state to ensure quiescence. It is
better in practice, because bh quiescent states are much more frequent
than global QS, but the maximum writer delay still depends on the
duration of every other bh handler and bh-disabled section in the
kernel. One might argue that if we keep those small, this should not
matter in practice.

The RCU FGP approach is interesting because it relies solely on
netfilter-specific per-cpu variables to detect QS. Therefore, even if
an unrelated piece of kernel software eventually decides to be a bad
citizen and disables bh for long periods on a 4096-CPU box, it won't
slow down netfilter table updates.
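
For reference, the core of the FGP scheme looks roughly like this,
transposed from my userspace RCU implementation into kernel style.
Names and the counter encoding are illustrative; the real algorithm
flips the parity twice per grace period to close a race window and must
serialize writers, both of which I elide here:

#define FGP_NEST_MASK	0x0ffffUL	/* low bits: nesting depth */
#define FGP_PARITY	0x10000UL	/* flipped at each grace period */

static unsigned long fgp_gp_ctr = 1;	/* current phase + nesting seed */
static DEFINE_PER_CPU(unsigned long, fgp_reader_ctr);

static inline void fgp_read_lock(void)
{
	unsigned long tmp;

	preempt_disable();
	tmp = __get_cpu_var(fgp_reader_ctr);
	if (likely(!(tmp & FGP_NEST_MASK)))
		/* Outermost nesting: snapshot the global phase. */
		__get_cpu_var(fgp_reader_ctr) = fgp_gp_ctr;
	else
		__get_cpu_var(fgp_reader_ctr) = tmp + 1;
	smp_mb();	/* critical-section reads stay after the snapshot */
}

static inline void fgp_read_unlock(void)
{
	smp_mb();	/* critical-section reads stay before the decrement */
	__get_cpu_var(fgp_reader_ctr)--;
	preempt_enable();
}

/* A CPU is quiescent if it is not reading, or started after the flip. */
static inline int fgp_cpu_quiescent(unsigned long ctr)
{
	return !(ctr & FGP_NEST_MASK) || !((ctr ^ fgp_gp_ctr) & FGP_PARITY);
}

void fgp_synchronize(void)
{
	int cpu;

	smp_mb();
	fgp_gp_ctr ^= FGP_PARITY;	/* begin a new phase */
	for_each_online_cpu(cpu)
		while (!fgp_cpu_quiescent(per_cpu(fgp_reader_ctr, cpu)))
			cpu_relax();
	smp_mb();
}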

RCU FGP shares this last benefit with the bh disabling + per-cpu
rwlock approach, where the rwlock is also local to netfilter. However,
taking an rwlock and disabling bh makes the read side much slower than
RCU FGP (which simply disables preemption and touches a per-cpu
GP/nesting count variable). But given that RCU FGP is relatively new,
it makes sense to use a known-good solution in the short term.
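
For comparison, the per-cpu rwlock read and write sides would look
roughly like this (names illustrative, per-cpu rwlock_init() at boot
elided). Each reader takes only its own CPU's lock, so the lock cache
line stays local, but it still pays the rwlock atomic on top of the bh
disable; the writer must take every CPU's lock in turn:

static DEFINE_PER_CPU(rwlock_t, nf_table_lock);

static inline void nf_table_read_lock_bh(void)
{
	local_bh_disable();
	read_lock(&__get_cpu_var(nf_table_lock));
}

static inline void nf_table_read_unlock_bh(void)
{
	read_unlock(&__get_cpu_var(nf_table_lock));
	local_bh_enable();
}

/* Writer: exclude readers on every CPU before swapping table pointers. */
static void nf_table_write_lock_all(void)
{
	int cpu;

	local_bh_disable();
	for_each_possible_cpu(cpu)
		write_lock(&per_cpu(nf_table_lock, cpu));
}

static void nf_table_write_unlock_all(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		write_unlock(&per_cpu(nf_table_lock, cpu));
	local_bh_enable();
}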

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
