From: Stephen Hemminger <shemminger@vyatta.com>
To: paulmck@linux.vnet.ibm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
David Miller <davem@davemloft.net>, Ingo Molnar <mingo@elte.hu>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
jeff.chua.linux@gmail.com, dada1@cosmosbay.com,
jengelh@medozas.de, kaber@trash.net, r000n@r000n.net,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49
Date: Sat, 11 Apr 2009 08:50:09 -0700
Message-ID: <20090411085009.13d5a349@nehalam>
In-Reply-To: <20090411041533.GB6822@linux.vnet.ibm.com>

On Fri, 10 Apr 2009 21:15:33 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> On Fri, Apr 10, 2009 at 06:39:18PM -0700, Linus Torvalds wrote:
> >
> >
> > On Fri, 10 Apr 2009, David Miller wrote:
> > >
> > > [ CC:'ing netfilter-devel and netdev... ]
> >
> > I wonder if we should bring in the RCU people too, for them to tell you
> > that the networking people are being silly, and should not synchronize
> > with the very heavy-handed
> >
> > synchronize_net()
> >
> > but instead of doing synchronization (which is probably why adding a few
> > hundred rules then takes several seconds - each synchronizes and that
> > takes a timer tick or so), add the rules to be free'd on some rcu-freeing
> > list for later freeing.
> >
> > Or whatever. Paul? synchronize_net() just calls synchronize_rcu(), and
> > with that knowledge and a simple
> >
> > git show 784544739a25c30637397ace5489eeb6e15d7d49
> >
> > I bet you can already tell people how to fix their performance issue.
>
> Well, I am certainly happy to demonstrate my ignorance of the networking
> code by throwing out a few suggestions.
>
> So, Dave and Steve, you might want to get out your barf bag before
> reading further. You have been warned! ;-)
>
> 1. Assuming that the synchronize_net() is intended to guarantee
> that the new rules will be in effect before returning to
> user space:
In this case it is to make sure that the old counter table is no
longer being used by other CPUs that are still receiving packets.
> a. Split this functionality, so that there is a new
> user-space primitive that installs a new rule, but
> without waiting. They provide an additional user-space
> primitive that waits for the rules to take effect.
> Then, when loading a long list of rules, load them
> using the non-waiting primitive, and wait at the end.
>
> b. As above, but provide a flag that says whether or not
> to wait. Same general effect.
>
> But I am not seeing the direct connection between this patch
> and netfilter, so...
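To make options (a) and (b) concrete, here is a minimal user-space sketch of the "install without waiting, wait once at the end" idea. Everything in it is hypothetical: struct rule_table, table_add_rule(), table_load_batch() and table_wait() are illustrative names, not the x_tables interface, and table_wait() only counts how often a grace period would have been paid for instead of calling synchronize_net().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RULES 256

/* Hypothetical table; not the real x_tables structures. */
struct rule_table {
	int rules[MAX_RULES];
	int nr_rules;
	int grace_waits;	/* number of grace periods we waited for */
};

/* Stand-in for synchronize_net(): it only records that we waited. */
static void table_wait(struct rule_table *t)
{
	t->grace_waits++;
}

/* Install one rule; wait only if the caller asks (the flag of option b). */
static bool table_add_rule(struct rule_table *t, int rule, bool wait)
{
	if (t->nr_rules >= MAX_RULES)
		return false;
	t->rules[t->nr_rules++] = rule;
	if (wait)
		table_wait(t);
	return true;
}

/* Load a whole batch with the non-waiting path, then wait once. */
static bool table_load_batch(struct rule_table *t, const int *rules, int n)
{
	for (int i = 0; i < n; i++)
		if (!table_add_rule(t, rules[i], false))
			return false;
	table_wait(t);
	return true;
}
```

With 200 rules, the per-rule path pays for 200 waits and the batch path pays for one. This only models the cost and says nothing about whether the wait is needed for the counters.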
> 2. For the xt_replace_table() case, it would be necessary to add an
> rcu_head to the xt_table_info, and replace each caller's direct
> calls to xt_free_table_info() with call_rcu().
>
> Now this has an issue in that the caller wants to return the
> final counter values. My assumption is that these values do
> not in fact need to be exact. If I am wrong about that, then
> my suggestion would lose the counts from late readers.
> I must defer to the networking guys as to whether this is
> acceptable or not. If not, more head-scratching would be
> required. (But it looks to me that the rule is being trashed,
> so who cares about the extra counts?)
The problem is that users want to account for every byte.
> In addition, a malicious user might be able to force this to
> happen extremely frequently, running the system out of memory.
> One way to fix this is to invoke synchronize_net() one out of
> 20 times or some such.
Malicious user == root, therefore don't care.
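For reference, the shape of the deferred-free suggestion in item 2 can be sketched in user space as below. This is a simulation under stated assumptions, not kernel code: struct deferred_head, defer_call() and run_deferred() are made-up stand-ins for rcu_head, call_rcu() and the end of a grace period, and struct table_info is a hypothetical reduction of xt_table_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head (hypothetical). */
struct deferred_head {
	struct deferred_head *next;
	void (*func)(struct deferred_head *head);
};

static struct deferred_head *pending;	/* callbacks awaiting a grace period */
static int tables_freed;

/* Stand-in for call_rcu(): queue the callback and return immediately. */
static void defer_call(struct deferred_head *head,
		       void (*func)(struct deferred_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for "the grace period has ended": run all queued callbacks. */
static int run_deferred(void)
{
	int n = 0;

	while (pending) {
		struct deferred_head *head = pending;

		pending = head->next;	/* unlink before the callback frees it */
		head->func(head);
		n++;
	}
	return n;
}

/* Hypothetical reduction of xt_table_info with an embedded head. */
struct table_info {
	unsigned int size;
	struct deferred_head rcu;
};

static struct table_info *table_info_alloc(unsigned int size)
{
	struct table_info *t = calloc(1, sizeof(*t));

	if (t)
		t->size = size;
	return t;
}

/* Callback: recover the enclosing table from the embedded head and free it. */
static void table_info_free_cb(struct deferred_head *head)
{
	struct table_info *t = (struct table_info *)
		((char *)head - offsetof(struct table_info, rcu));

	free(t);
	tables_freed++;
}

/* Replaces a direct free: the caller no longer blocks. */
static void table_info_retire(struct table_info *old)
{
	defer_call(&old->rcu, table_info_free_cb);
}
```

Note what the sketch does not solve: the caller returns before late readers are done, so any counts they add to the old table after that point are not in the values returned to user space.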
> 3. For the alloc_counters() case, the comments indicate that we
> really truly do want an atomic sampling of the counters.
> The counters are 64-bit entities, which is a bit inconvenient.
> Though people using this functionality are no doubt quite happy
> to never have to worry about overflow, I hasten to add!
And we need a snapshot of all counters (which are not even an array but
a skip list).
> I will nevertheless suggest the following egregious hack to
> get a consistent sample of one counter for some other CPU:
>
> a. Disable interrupts
> b. Atomically exchange the bottom 32 bits of the
> counter with the value zero.
> c. Atomically exchange the top 32 bits of the counter
> with the value zero.
> d. Concatenate the values obtained in (b) and (c), which
> is the snapshot value.
> e. Re-enable interrupts. Yes, for each counter. Do it
> for the honor of the -rt patchset. ;-)
>
> Disabling interrupts should make it impossible for
> the low-order 32 bits of the counter to overflow before
> we get around to zeroing the upper 32 bits. Yes, this
> is horribly paranoid, but please keep in mind that even
> my level of paranoia is not always sufficient to keep
> RCU working correctly. :-/
>
> Architectures with 64-bit atomics can simply do a 64-bit
> exchange (or cmpxchg(), for that matter).
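Steps (b) through (d) above can be sketched in user-space C11 roughly as follows. Assumptions: the counter is stored as two 32-bit halves in a hypothetical struct split_counter, and the interrupt disabling of steps (a) and (e) is left out because user space has no equivalent. On a machine with 64-bit atomics a single atomic_exchange() on a 64-bit value would replace the whole thing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: one 64-bit counter kept as two 32-bit halves. */
struct split_counter {
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Updater side: add to the low half and carry into the high half on wrap. */
static void split_counter_add(struct split_counter *c, uint32_t n)
{
	uint32_t old = atomic_fetch_add(&c->lo, n);

	if ((uint32_t)(old + n) < old)	/* the low half wrapped around */
		atomic_fetch_add(&c->hi, 1);
}

/*
 * Sampler side, steps (b)-(d): exchange each half with zero, then
 * concatenate the two values that were read out.  Steps (a) and (e),
 * disabling and re-enabling interrupts, are not modelled here.
 */
static uint64_t split_counter_snapshot_and_zero(struct split_counter *c)
{
	uint32_t lo = atomic_exchange(&c->lo, 0);
	uint32_t hi = atomic_exchange(&c->hi, 0);

	return ((uint64_t)hi << 32) | lo;
}
```

A carry that lands between the two exchanges is not lost, it shows up in the next sample instead, so a single sample is only approximately consistent while updaters are running.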
>
> Now we still have the possibility that the other CPU is still
> hammering away on the counter that we just zeroed from a
> long-running RCU read-side critical section.
>
> So, we also need to add an rcu_head somewhere, perhaps reuse
> the one in xt_table_info, create a second one, or squirrel one
> away somewhere else. As long as there is a way to get to the
> old counter values. And a flag to indicate that the rcu_head
> is in use. It is socially irresponsible to pass a given
> rcu_head to call_rcu() before it has been invoked after the
> previous time it was passed to call_rcu(). But you guys all
> knew that already.
>
> We replace the synchronize_net() with call_rcu(), more or less.
> The call_rcu() probably needs to be under the lock -- or at the
> very least, setting the flag saying that it is in use needs to
> be under the lock.
>
> The RCU callback function traverses the old counters one last
> time, adding their values to the new set of counters. No
> atomic exchange tricks are required this time, since all the
> RCU readers that could possibly have held a reference to the
> old set of counters must now be done. We now clear the flag,
> allowing the next counter snapshot to proceed.
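The flag-plus-callback scheme described above might look like the following single-threaded sketch. All names (struct counter_set, struct snapshot_state, snapshot_begin(), snapshot_fold_cb()) are hypothetical, the counters are a flat array rather than the real rule layout, and the callback is invoked by hand where the kernel would run it after a grace period. Locking is omitted; in the real code snapshot_begin() would run under the table lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCOUNTERS 4

/* Hypothetical flat counter set; the real counters live in the rules. */
struct counter_set {
	uint64_t bytes[NCOUNTERS];
};

/* State that would sit next to the rcu_head. */
struct snapshot_state {
	struct counter_set *old_set;	/* may still be hit by late readers */
	struct counter_set *new_set;	/* where new traffic is counted */
	bool head_in_use;		/* callback queued but not yet run */
};

/*
 * Start a snapshot.  Refuses if the previous callback has not run yet,
 * since the same head must not be queued twice.
 */
static bool snapshot_begin(struct snapshot_state *s,
			   struct counter_set *old_set,
			   struct counter_set *new_set)
{
	if (s->head_in_use)
		return false;
	s->head_in_use = true;
	s->old_set = old_set;
	s->new_set = new_set;
	return true;
}

/*
 * Callback, run once no reader can still hold the old set: fold the
 * late counts into the new set, then release the head.
 */
static void snapshot_fold_cb(struct snapshot_state *s)
{
	for (size_t i = 0; i < NCOUNTERS; i++) {
		s->new_set->bytes[i] += s->old_set->bytes[i];
		s->old_set->bytes[i] = 0;
	}
	s->head_in_use = false;
}
```

The point of the fold is that counts from late readers end up in the new set rather than being dropped, at the price of refusing a second snapshot until the callback has run.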
>
> OK, OK, Dave and Steve, I should have suggested that you get two
> barf bags. Maybe three. ;-)
>
> Additional caveat: coward that I am, I looked only at the IPv4 code.
> There might well be additional complications in the arp and IPv6 code.
>
> However, I do believe that something like this might actually work.
>
> Thoughts?
>
> Thanx, Paul
>
> > Linus
> >
> > ---
> > > > On Fri, 10 Apr 2009 17:15:52 +0800 (SGT)
> > > > Jeff Chua <jeff.chua.linux@gmail.com> wrote:
> > > >>
> > > >> Adding 200 records in iptables took 6.0sec in 2.6.30-rc1 compared to
> > > > >> 0.2sec in 2.6.29. I've bisected it down to this commit.
> > > >>
> > > > >> There are a few patches on top of the original patch. When I reverted the
> > > > >> original commit and changed rcu_read_lock() to rcu_read_lock_bh(), the
> > > > >> inserts sped back up to 0.2sec.
> > > >>
> > > >> I'm loading all the firewall rules during boot-up and this 6 secs slowness
> > > >> is really not very nice to wait for.
> > > >
> > > > The performance benefit during operation is more important. The load
> > > > time is fixable. The problem is probably generic to any set of rules,
> > > > but could you post some info about your configuration (like the rule
> > > > set), and the system configuration (# of cpu's, config etc).
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > > Please read the FAQ at http://www.tux.org/lkml/
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html