From: Eric Dumazet <dada1@cosmosbay.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
David Miller <davem@davemloft.net>,
Patrick McHardy <kaber@trash.net>,
Rick Jones <rick.jones2@hp.com>,
netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
linux kernel <linux-kernel@vger.kernel.org>
Subject: [PATCH] rcu: increment quiescent state counter in ksoftirqd()
Date: Fri, 27 Feb 2009 17:08:04 +0100 [thread overview]
Message-ID: <49A80FE4.6030508@cosmosbay.com> (raw)
In-Reply-To: <49A7F262.8040805@cosmosbay.com>
Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> Stephen Hemminger a écrit :
>>> The reader/writer lock in ip_tables is acquired in the critical path of
>>> processing packets and is one of the reasons just loading iptables can cause
>>> a 20% performance loss. The rwlock serves two functions:
>>>
>>> 1) it prevents changes to table state (xt_replace) while table is in use.
>>> This is now handled by doing rcu on the xt_table. When table is
>>> replaced, the new table(s) are put in and the old one table(s) are freed
>>> after RCU period.
>>>
>>> 2) it provides synchronization when accesing the counter values.
>>> This is now handled by swapping in new table_info entries for each cpu
>>> then summing the old values, and putting the result back onto one
>>> cpu. On a busy system it may cause sampling to occur at different
>>> times on each cpu, but no packet/byte counts are lost in the process.
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>
>> Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>>
>> Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
>>
>> BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)
>>
>> Thanks Stephen, thats very cool stuff, yet another rwlock out of kernel :)
>>
>
> While testing multicast flooding stuff, I found that "iptables -nvL" can
> have a *very* slow response time on my dual quad core machine...
>
>
> # time iptables -nvL
> Chain INPUT (policy ACCEPT 416M packets, 64G bytes)
> pkts bytes target prot opt in out source destination
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
> pkts bytes target prot opt in out source destination
>
> Chain OUTPUT (policy ACCEPT 401M packets, 62G bytes)
> pkts bytes target prot opt in out source destination
>
> real 0m1.810s <<<< HERE >>>>
> user 0m0.000s
> sys 0m0.001s
>
>
> CONFIG_NO_HZ=y
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
>
> One cpu is 100% handling softirqs, could it be the problem ?
>
> Cpu0 : 1.0%us, 14.7%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
> Cpu1 : 3.6%us, 23.2%sy, 0.0%ni, 71.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
> Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st
> Cpu3 : 2.7%us, 23.9%sy, 0.0%ni, 71.1%id, 0.7%wa, 0.0%hi, 1.7%si, 0.0%st
> Cpu4 : 1.3%us, 14.3%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
> Cpu5 : 1.0%us, 14.2%sy, 0.0%ni, 83.4%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
> Cpu6 : 0.3%us, 7.0%sy, 0.0%ni, 92.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu7 : 0.7%us, 8.0%sy, 0.0%ni, 90.0%id, 0.7%wa, 0.0%hi, 0.7%si, 0.0%st
Hi Paul
I found following patch helps if one cpu is looping inside ksoftirqd()
synchronize_rcu() now completes in 40 ms instead of 1800 ms.
Thank you
[PATCH] rcu: increment quiescent state counter in ksoftirqd()
If a machine is flooded by network frames, a cpu can loop 100% of its time
inside ksoftirqd() without calling schedule().
This can delay RCU grace period to insane values.
Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
diff --git a/kernel/softirq.c b/kernel/softirq.c
index bdbe9de..9041ea7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -626,6 +626,7 @@ static int ksoftirqd(void * __bind_cpu)
preempt_enable_no_resched();
cond_resched();
preempt_disable();
+ rcu_qsctr_inc((long)__bind_cpu);
}
preempt_enable();
set_current_state(TASK_INTERRUPTIBLE);
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
WARNING: multiple messages have this Message-ID (diff)
From: Eric Dumazet <dada1@cosmosbay.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
David Miller <davem@davemloft.net>,
Patrick McHardy <kaber@trash.net>,
Rick Jones <rick.jones2@hp.com>,
netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
linux kernel <linux-kernel@vger.kernel.org>
Subject: [PATCH] rcu: increment quiescent state counter in ksoftirqd()
Date: Fri, 27 Feb 2009 17:08:04 +0100 [thread overview]
Message-ID: <49A80FE4.6030508@cosmosbay.com> (raw)
In-Reply-To: <49A7F262.8040805@cosmosbay.com>
Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> Stephen Hemminger a écrit :
>>> The reader/writer lock in ip_tables is acquired in the critical path of
>>> processing packets and is one of the reasons just loading iptables can cause
>>> a 20% performance loss. The rwlock serves two functions:
>>>
>>> 1) it prevents changes to table state (xt_replace) while table is in use.
>>> This is now handled by doing rcu on the xt_table. When table is
>>> replaced, the new table(s) are put in and the old one table(s) are freed
>>> after RCU period.
>>>
>>> 2) it provides synchronization when accesing the counter values.
>>> This is now handled by swapping in new table_info entries for each cpu
>>> then summing the old values, and putting the result back onto one
>>> cpu. On a busy system it may cause sampling to occur at different
>>> times on each cpu, but no packet/byte counts are lost in the process.
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>
>> Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>>
>> Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
>>
>> BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)
>>
>> Thanks Stephen, thats very cool stuff, yet another rwlock out of kernel :)
>>
>
> While testing multicast flooding stuff, I found that "iptables -nvL" can
> have a *very* slow response time on my dual quad core machine...
>
>
> # time iptables -nvL
> Chain INPUT (policy ACCEPT 416M packets, 64G bytes)
> pkts bytes target prot opt in out source destination
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
> pkts bytes target prot opt in out source destination
>
> Chain OUTPUT (policy ACCEPT 401M packets, 62G bytes)
> pkts bytes target prot opt in out source destination
>
> real 0m1.810s <<<< HERE >>>>
> user 0m0.000s
> sys 0m0.001s
>
>
> CONFIG_NO_HZ=y
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
>
> One cpu is 100% handling softirqs, could it be the problem ?
>
> Cpu0 : 1.0%us, 14.7%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
> Cpu1 : 3.6%us, 23.2%sy, 0.0%ni, 71.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
> Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st
> Cpu3 : 2.7%us, 23.9%sy, 0.0%ni, 71.1%id, 0.7%wa, 0.0%hi, 1.7%si, 0.0%st
> Cpu4 : 1.3%us, 14.3%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
> Cpu5 : 1.0%us, 14.2%sy, 0.0%ni, 83.4%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
> Cpu6 : 0.3%us, 7.0%sy, 0.0%ni, 92.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu7 : 0.7%us, 8.0%sy, 0.0%ni, 90.0%id, 0.7%wa, 0.0%hi, 0.7%si, 0.0%st
Hi Paul
I found following patch helps if one cpu is looping inside ksoftirqd()
synchronize_rcu() now completes in 40 ms instead of 1800 ms.
Thank you
[PATCH] rcu: increment quiescent state counter in ksoftirqd()
If a machine is flooded by network frames, a cpu can loop 100% of its time
inside ksoftirqd() without calling schedule().
This can delay RCU grace period to insane values.
Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
diff --git a/kernel/softirq.c b/kernel/softirq.c
index bdbe9de..9041ea7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -626,6 +626,7 @@ static int ksoftirqd(void * __bind_cpu)
preempt_enable_no_resched();
cond_resched();
preempt_disable();
+ rcu_qsctr_inc((long)__bind_cpu);
}
preempt_enable();
set_current_state(TASK_INTERRUPTIBLE);
next prev parent reply other threads:[~2009-02-27 16:08 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-18 5:19 [RFT 0/4] Netfilter/iptables performance improvements Stephen Hemminger
2009-02-18 5:19 ` [RFT 1/4] iptables: lock free counters Stephen Hemminger
2009-02-18 10:02 ` Patrick McHardy
2009-02-19 19:47 ` [PATCH] " Stephen Hemminger
2009-02-19 23:46 ` Eric Dumazet
2009-02-19 23:56 ` Rick Jones
2009-02-20 1:03 ` Stephen Hemminger
2009-02-20 1:18 ` Rick Jones
2009-02-20 9:42 ` Patrick McHardy
2009-02-20 22:57 ` Rick Jones
2009-02-21 0:35 ` Rick Jones
2009-02-20 9:37 ` Patrick McHardy
2009-02-20 18:10 ` [PATCH] iptables: xt_hashlimit fix Eric Dumazet
2009-02-20 18:33 ` Jan Engelhardt
2009-02-28 1:54 ` Jan Engelhardt
2009-02-28 6:56 ` Eric Dumazet
2009-02-28 8:22 ` Jan Engelhardt
2009-02-24 14:31 ` Patrick McHardy
2009-02-27 14:02 ` [PATCH] iptables: lock free counters Eric Dumazet
2009-02-27 16:08 ` Eric Dumazet [this message]
2009-02-27 16:08 ` [PATCH] rcu: increment quiescent state counter in ksoftirqd() Eric Dumazet
2009-02-27 16:34 ` Paul E. McKenney
2009-03-02 10:55 ` [PATCH] iptables: lock free counters Patrick McHardy
2009-03-02 17:47 ` Eric Dumazet
2009-03-02 21:56 ` Patrick McHardy
2009-03-02 22:02 ` Stephen Hemminger
2009-03-02 22:07 ` Patrick McHardy
2009-03-02 22:17 ` Paul E. McKenney
2009-03-02 22:27 ` Eric Dumazet
2009-02-18 5:19 ` [RFT 2/4] Add mod_timer_noact Stephen Hemminger
2009-02-18 9:20 ` Ingo Molnar
2009-02-18 9:30 ` David Miller
2009-02-18 11:01 ` Ingo Molnar
2009-02-18 11:39 ` Jarek Poplawski
2009-02-18 12:37 ` Ingo Molnar
2009-02-18 12:33 ` Patrick McHardy
2009-02-18 21:39 ` David Miller
2009-02-18 21:51 ` Ingo Molnar
2009-02-18 22:04 ` David Miller
2009-02-18 22:42 ` Peter Zijlstra
2009-02-18 22:47 ` David Miller
2009-02-18 22:56 ` Stephen Hemminger
2009-02-18 10:07 ` Patrick McHardy
2009-02-18 12:05 ` [patch] timers: add mod_timer_pending() Ingo Molnar
2009-02-18 12:33 ` Patrick McHardy
2009-02-18 12:50 ` Ingo Molnar
2009-02-18 12:54 ` Patrick McHardy
2009-02-18 13:47 ` Ingo Molnar
2009-02-18 17:00 ` Oleg Nesterov
2009-02-18 18:23 ` Ingo Molnar
2009-02-18 18:58 ` Oleg Nesterov
2009-02-18 19:24 ` Ingo Molnar
2009-02-18 10:29 ` [RFT 2/4] Add mod_timer_noact Patrick McHardy
2009-02-18 5:19 ` [RFT 3/4] Use mod_timer_noact to remove nf_conntrack_lock Stephen Hemminger
2009-02-18 9:54 ` Patrick McHardy
2009-02-18 11:05 ` Jarek Poplawski
2009-02-18 11:08 ` Patrick McHardy
2009-02-18 14:01 ` Eric Dumazet
2009-02-18 14:04 ` Patrick McHardy
2009-02-18 14:22 ` Eric Dumazet
2009-02-18 14:27 ` Patrick McHardy
2009-02-18 5:19 ` [RFT 4/4] netfilter: Get rid of central rwlock in tcp conntracking Stephen Hemminger
2009-02-18 9:56 ` Patrick McHardy
2009-02-18 14:17 ` Eric Dumazet
2009-02-19 22:03 ` Stephen Hemminger
2009-03-28 16:55 ` [PATCH] netfilter: finer grained nf_conn locking Eric Dumazet
2009-03-29 0:48 ` Stephen Hemminger
2009-03-30 19:57 ` Eric Dumazet
2009-03-30 20:05 ` Stephen Hemminger
2009-04-06 12:07 ` Patrick McHardy
2009-04-06 12:32 ` Jan Engelhardt
2009-04-06 17:25 ` Stephen Hemminger
2009-03-30 18:57 ` Rick Jones
2009-03-30 19:20 ` Eric Dumazet
2009-03-30 19:38 ` Jesper Dangaard Brouer
2009-03-30 19:54 ` Eric Dumazet
2009-03-30 20:34 ` Jesper Dangaard Brouer
2009-03-30 20:41 ` Eric Dumazet
2009-03-30 21:25 ` Jesper Dangaard Brouer
2009-03-30 22:44 ` Rick Jones
2009-02-18 21:55 ` [RFT 4/4] netfilter: Get rid of central rwlock in tcp conntracking David Miller
2009-02-18 23:23 ` Patrick McHardy
2009-02-18 23:35 ` Stephen Hemminger
2009-02-18 8:30 ` [RFT 0/4] Netfilter/iptables performance improvements Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49A80FE4.6030508@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=kaber@trash.net \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=rick.jones2@hp.com \
--cc=shemminger@vyatta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.