From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
David Miller <davem@davemloft.net>,
Patrick McHardy <kaber@trash.net>,
Rick Jones <rick.jones2@hp.com>,
netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rcu: increment quiescent state counter in ksoftirqd()
Date: Fri, 27 Feb 2009 08:34:08 -0800 [thread overview]
Message-ID: <20090227163408.GB6758@linux.vnet.ibm.com> (raw)
In-Reply-To: <49A80FE4.6030508@cosmosbay.com>
On Fri, Feb 27, 2009 at 05:08:04PM +0100, Eric Dumazet wrote:
> Eric Dumazet a écrit :
> > Eric Dumazet a écrit :
> >> Stephen Hemminger a écrit :
> >>> The reader/writer lock in ip_tables is acquired in the critical path of
> >>> processing packets and is one of the reasons just loading iptables can cause
> >>> a 20% performance loss. The rwlock serves two functions:
> >>>
> >>> 1) it prevents changes to table state (xt_replace) while table is in use.
> >>> This is now handled by doing rcu on the xt_table. When table is
> >>> replaced, the new table(s) are put in and the old one table(s) are freed
> >>> after RCU period.
> >>>
> >>> 2) it provides synchronization when accesing the counter values.
> >>> This is now handled by swapping in new table_info entries for each cpu
> >>> then summing the old values, and putting the result back onto one
> >>> cpu. On a busy system it may cause sampling to occur at different
> >>> times on each cpu, but no packet/byte counts are lost in the process.
> >>>
> >>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> >>
> >> Acked-by: Eric Dumazet <dada1@cosmosbay.com>
> >>
> >> Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
> >>
> >> BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)
> >>
> >> Thanks Stephen, thats very cool stuff, yet another rwlock out of kernel :)
> >>
> >
> > While testing multicast flooding stuff, I found that "iptables -nvL" can
> > have a *very* slow response time on my dual quad core machine...
> >
> >
> > # time iptables -nvL
> > Chain INPUT (policy ACCEPT 416M packets, 64G bytes)
> > pkts bytes target prot opt in out source destination
> >
> > Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
> > pkts bytes target prot opt in out source destination
> >
> > Chain OUTPUT (policy ACCEPT 401M packets, 62G bytes)
> > pkts bytes target prot opt in out source destination
> >
> > real 0m1.810s <<<< HERE >>>>
> > user 0m0.000s
> > sys 0m0.001s
> >
> >
> > CONFIG_NO_HZ=y
> > CONFIG_HZ_1000=y
> > CONFIG_HZ=1000
> >
> > One cpu is 100% handling softirqs, could it be the problem ?
> >
> > Cpu0 : 1.0%us, 14.7%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
> > Cpu1 : 3.6%us, 23.2%sy, 0.0%ni, 71.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
> > Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st
> > Cpu3 : 2.7%us, 23.9%sy, 0.0%ni, 71.1%id, 0.7%wa, 0.0%hi, 1.7%si, 0.0%st
> > Cpu4 : 1.3%us, 14.3%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
> > Cpu5 : 1.0%us, 14.2%sy, 0.0%ni, 83.4%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
> > Cpu6 : 0.3%us, 7.0%sy, 0.0%ni, 92.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> > Cpu7 : 0.7%us, 8.0%sy, 0.0%ni, 90.0%id, 0.7%wa, 0.0%hi, 0.7%si, 0.0%st
>
> Hi Paul
>
> I found following patch helps if one cpu is looping inside ksoftirqd()
>
> synchronize_rcu() now completes in 40 ms instead of 1800 ms.
>
> Thank you
>
> [PATCH] rcu: increment quiescent state counter in ksoftirqd()
>
> If a machine is flooded by network frames, a cpu can loop 100% of its time
> inside ksoftirqd() without calling schedule().
> This can delay RCU grace period to insane values.
>
> Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem.
Good catch!!! This regression was a result of the recent change
from "schedule()" to "cond_resched()", which got rid of that quiescent
state in the common case where a reschedule is not needed.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index bdbe9de..9041ea7 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -626,6 +626,7 @@ static int ksoftirqd(void * __bind_cpu)
> preempt_enable_no_resched();
> cond_resched();
> preempt_disable();
> + rcu_qsctr_inc((long)__bind_cpu);
> }
> preempt_enable();
> set_current_state(TASK_INTERRUPTIBLE);
>
next prev parent reply other threads:[~2009-02-27 16:34 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-18 5:19 [RFT 0/4] Netfilter/iptables performance improvements Stephen Hemminger
2009-02-18 5:19 ` [RFT 1/4] iptables: lock free counters Stephen Hemminger
2009-02-18 10:02 ` Patrick McHardy
2009-02-19 19:47 ` [PATCH] " Stephen Hemminger
2009-02-19 23:46 ` Eric Dumazet
2009-02-19 23:56 ` Rick Jones
2009-02-20 1:03 ` Stephen Hemminger
2009-02-20 1:18 ` Rick Jones
2009-02-20 9:42 ` Patrick McHardy
2009-02-20 22:57 ` Rick Jones
2009-02-21 0:35 ` Rick Jones
2009-02-20 9:37 ` Patrick McHardy
2009-02-20 18:10 ` [PATCH] iptables: xt_hashlimit fix Eric Dumazet
2009-02-20 18:33 ` Jan Engelhardt
2009-02-28 1:54 ` Jan Engelhardt
2009-02-28 6:56 ` Eric Dumazet
2009-02-28 8:22 ` Jan Engelhardt
2009-02-24 14:31 ` Patrick McHardy
2009-02-27 14:02 ` [PATCH] iptables: lock free counters Eric Dumazet
2009-02-27 16:08 ` [PATCH] rcu: increment quiescent state counter in ksoftirqd() Eric Dumazet
2009-02-27 16:34 ` Paul E. McKenney [this message]
2009-03-02 10:55 ` [PATCH] iptables: lock free counters Patrick McHardy
2009-03-02 17:47 ` Eric Dumazet
2009-03-02 21:56 ` Patrick McHardy
2009-03-02 22:02 ` Stephen Hemminger
2009-03-02 22:07 ` Patrick McHardy
2009-03-02 22:17 ` Paul E. McKenney
2009-03-02 22:27 ` Eric Dumazet
2009-02-18 5:19 ` [RFT 2/4] Add mod_timer_noact Stephen Hemminger
2009-02-18 9:20 ` Ingo Molnar
2009-02-18 9:30 ` David Miller
2009-02-18 11:01 ` Ingo Molnar
2009-02-18 11:39 ` Jarek Poplawski
2009-02-18 12:37 ` Ingo Molnar
2009-02-18 12:33 ` Patrick McHardy
2009-02-18 21:39 ` David Miller
2009-02-18 21:51 ` Ingo Molnar
2009-02-18 22:04 ` David Miller
2009-02-18 22:42 ` Peter Zijlstra
2009-02-18 22:47 ` David Miller
2009-02-18 22:56 ` Stephen Hemminger
2009-02-18 10:07 ` Patrick McHardy
2009-02-18 12:05 ` [patch] timers: add mod_timer_pending() Ingo Molnar
2009-02-18 12:33 ` Patrick McHardy
2009-02-18 12:50 ` Ingo Molnar
2009-02-18 12:54 ` Patrick McHardy
2009-02-18 13:47 ` Ingo Molnar
2009-02-18 17:00 ` Oleg Nesterov
2009-02-18 18:23 ` Ingo Molnar
2009-02-18 18:58 ` Oleg Nesterov
2009-02-18 19:24 ` Ingo Molnar
2009-02-18 10:29 ` [RFT 2/4] Add mod_timer_noact Patrick McHardy
2009-02-18 5:19 ` [RFT 3/4] Use mod_timer_noact to remove nf_conntrack_lock Stephen Hemminger
2009-02-18 9:54 ` Patrick McHardy
2009-02-18 11:05 ` Jarek Poplawski
2009-02-18 11:08 ` Patrick McHardy
2009-02-18 14:01 ` Eric Dumazet
2009-02-18 14:04 ` Patrick McHardy
2009-02-18 14:22 ` Eric Dumazet
2009-02-18 14:27 ` Patrick McHardy
2009-02-18 5:19 ` [RFT 4/4] netfilter: Get rid of central rwlock in tcp conntracking Stephen Hemminger
2009-02-18 9:56 ` Patrick McHardy
2009-02-18 14:17 ` Eric Dumazet
2009-02-19 22:03 ` Stephen Hemminger
2009-03-28 16:55 ` [PATCH] netfilter: finer grained nf_conn locking Eric Dumazet
2009-03-29 0:48 ` Stephen Hemminger
2009-03-30 19:57 ` Eric Dumazet
2009-03-30 20:05 ` Stephen Hemminger
2009-04-06 12:07 ` Patrick McHardy
2009-04-06 12:32 ` Jan Engelhardt
2009-04-06 17:25 ` Stephen Hemminger
2009-03-30 18:57 ` Rick Jones
2009-03-30 19:20 ` Eric Dumazet
2009-03-30 19:38 ` Jesper Dangaard Brouer
2009-03-30 19:54 ` Eric Dumazet
2009-03-30 20:34 ` Jesper Dangaard Brouer
2009-03-30 20:41 ` Eric Dumazet
2009-03-30 21:25 ` Jesper Dangaard Brouer
2009-03-30 22:44 ` Rick Jones
2009-02-18 21:55 ` [RFT 4/4] netfilter: Get rid of central rwlock in tcp conntracking David Miller
2009-02-18 23:23 ` Patrick McHardy
2009-02-18 23:35 ` Stephen Hemminger
2009-02-18 8:30 ` [RFT 0/4] Netfilter/iptables performance improvements Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090227163408.GB6758@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=kaber@trash.net \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=rick.jones2@hp.com \
--cc=shemminger@vyatta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).