* Re: [PATCH net-next] softirq: reduce latencies
  From: Sedat Dilek @ 2013-01-03 13:12 UTC
  To: Eric Dumazet; +Cc: netdev, LKML

Hi Eric,

your patch from [2] applies cleanly on top of Linux v3.8-rc2, and I would like to test it.
In [1] you mentioned benchmarks you ran. Can you describe them or provide a testcase (script etc.)?
Did you only do network testing?

Thanks in advance.

Regards,
- Sedat -

[1] http://marc.info/?l=linux-kernel&m=135721614718434&w=2
[2] https://patchwork.kernel.org/patch/1927531/

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 13:31 UTC
  To: sedat.dilek; +Cc: netdev, LKML

On Thu, 2013-01-03 at 14:12 +0100, Sedat Dilek wrote:
> Hi Eric,
>
> your patch from [2] applies cleanly on top of Linux v3.8-rc2, and I would like to test it.
> In [1] you mentioned benchmarks you ran. Can you describe them or provide a testcase (script etc.)?
> Did you only do network testing?

Yes, I did network testing:

- net_rx_action() is the typical softirq handler that can consume 2 ms per call.

I ran some netperf sessions with multiqueue 10G NICs, tuned so that the IRQs would be handled by a few cpus
(check /proc/irq/*/eth0-$QUEUE/../smp_affinity).

Another way to make the softirq processing use more cpu cycles is to add a fake iptables setup like:

for n in `seq 1 100`
do
	iptables -I INPUT
done

A common network load is to launch ~200 concurrent TCP_RR netperf sessions like the following:

netperf -H remote_host -t TCP_RR -l 1000

And then you can launch some netperf runs asking for P99_LATENCY results:

netperf -H remote_host -t TCP_RR -- -k P99_LATENCY

You can play with taskset or the netperf -T option to force netperf/netserver to run on given cpus.

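Putting the commands above together, one possible way to script such a run looks like the sketch below. This is a rough, untested sketch only: the IRQ name, CPU mask and remote_host are placeholders, and netserver is assumed to be already running on the remote host.

#!/bin/sh
# Sketch only -- adjust NIC/IRQ names, the CPU mask and the remote host for your setup.
REMOTE=remote_host

# Pin the NIC queue IRQs to a few cpus (find the IRQ numbers in /proc/interrupts first).
# echo 4 > /proc/irq/IRQ_NUMBER/smp_affinity     # mask 4 = cpu 2, placeholder

# Inflate per-packet softirq cost with empty iptables rules.
for n in `seq 1 100`
do
	iptables -I INPUT
done

# ~200 long-running TCP_RR sessions as background load.
for n in `seq 1 200`
do
	netperf -H $REMOTE -t TCP_RR -l 1000 >/dev/null 2>&1 &
done

# Measure latency percentiles on top of that load, pinned to cpu 2.
netperf -H $REMOTE -t TCP_RR -T2,2 -- -k P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY
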
* Re: [PATCH net-next] softirq: reduce latencies
  From: Rick Jones @ 2013-01-03 19:41 UTC
  To: Eric Dumazet; +Cc: sedat.dilek, netdev, LKML

On 01/03/2013 05:31 AM, Eric Dumazet wrote:
> A common network load is to launch ~200 concurrent TCP_RR netperf sessions like the following:
>
> netperf -H remote_host -t TCP_RR -l 1000
>
> And then you can launch some netperf runs asking for P99_LATENCY results:
>
> netperf -H remote_host -t TCP_RR -- -k P99_LATENCY

In terms of netperf overhead, once you specify P99_LATENCY, you are already in for the pound of cost but only getting the penny of output (so to speak). While it would clutter the output, one could go ahead and ask for the other latency stats and it won't "cost" anything more:

... -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY

Additional information about how the omni output selectors work can be found at
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection

happy benchmarking,

rick jones

BTW - you will likely see some differences between RT_LATENCY, which is calculated from the average transactions per second, and MEAN_LATENCY, which is calculated from the histogram of individual latencies maintained when any of the _LATENCY outputs other than RT_LATENCY is requested. Kudos to the folks at Google who did the extensions to the then-existing histogram code to enable it to be used for more reasonably accurate statistics.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 4:41 UTC
  To: Rick Jones; +Cc: sedat.dilek, netdev, LKML

On Thu, 2013-01-03 at 11:41 -0800, Rick Jones wrote:
> In terms of netperf overhead, once you specify P99_LATENCY, you are already in for the pound of cost but only getting the penny of output (so to speak). While it would clutter the output, one could go ahead and ask for the other latency stats and it won't "cost" anything more:
>
> ... -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> [...]
> Kudos to the folks at Google who did the extensions to the then-existing histogram code to enable it to be used for more reasonably accurate statistics.

Yeah ;)

Here are the before/after-patch results, with cpu 2 handling the NIC irqs:

Before patch:

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch:

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

* Re: [PATCH net-next] softirq: reduce latencies
  From: Sedat Dilek @ 2013-01-04 5:31 UTC
  To: Eric Dumazet; +Cc: Rick Jones, netdev, LKML, Ben Hutchings

On Fri, Jan 4, 2013 at 5:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [...]
> Here are the before/after-patch results, with cpu 2 handling the NIC irqs:
> [...]

Will you send a v2 with this change...?

-#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
+#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000))

- Sedat -

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 6:54 UTC
  To: sedat.dilek; +Cc: Rick Jones, netdev, LKML, Ben Hutchings

On Fri, 2013-01-04 at 06:31 +0100, Sedat Dilek wrote:
>
> Will you send a v2 with this change...?
>
> -#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
> +#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000))

I will; I was planning to do this after waiting for other comments/reviews.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Sedat Dilek @ 2013-01-04 11:57 UTC
  To: Eric Dumazet; +Cc: Rick Jones, netdev, LKML

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]

On Fri, Jan 4, 2013 at 5:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [...]
> Here are the before/after-patch results, with cpu 2 handling the NIC irqs:
> [...]

I also wanted to give some numbers, but with localhost as the default target (no netserver running on a remote host) I am not sure these numbers give any helpful feedback. (I have not yet tested without your patch.)

- Sedat -

[-- Attachment #2: NETPERF_softirq-experimental.txt --]
[-- Type: text/plain, Size: 2150 bytes --]

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.407
MIN_LATENCY=10
MAX_LATENCY=152
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.31
STDDEV_LATENCY=0.95

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.364
MIN_LATENCY=10
MAX_LATENCY=156
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.27
STDDEV_LATENCY=1.06

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.408
MIN_LATENCY=10
MAX_LATENCY=156
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.31
STDDEV_LATENCY=1.07

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.312
MIN_LATENCY=10
MAX_LATENCY=151
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.22
STDDEV_LATENCY=0.94

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.345
MIN_LATENCY=10
MAX_LATENCY=159
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.25
STDDEV_LATENCY=1.03

-dileks // 04-Jan-2013

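To get numbers over a real link rather than loopback, the netserver daemon has to be running on the remote machine before netperf is pointed at it. A minimal sketch, assuming the stock netperf/netserver pair listening on its default control port (12865) and a placeholder remote_host:

# On the remote host (starts the netperf daemon on its default port):
netserver

# On the machine under test, point netperf at that host instead of localhost:
netperf -H remote_host -t TCP_RR -T2,2 -- -k RT_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY
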
* [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 12:28 UTC
  To: David Miller, Andrew Morton
  Cc: netdev, linux-kernel@vger.kernel.org, Tom Herbert

From: Eric Dumazet <edumazet@google.com>

In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.

This patch changes the fallback-to-ksoftirqd condition to:

- A time limit of 2 ms.
- need_resched() being set on the current task.

When one of these conditions is met, we wake up ksoftirqd for further softirq processing if we still have pending softirqs.

I ran several benchmarks and got no significant difference in throughput, but a very significant reduction of maximum latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
---
 kernel/softirq.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..64d61ea 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -195,21 +195,21 @@ void local_bh_enable_ip(unsigned long ip)
 EXPORT_SYMBOL(local_bh_enable_ip);
 
 /*
- * We restart softirq processing MAX_SOFTIRQ_RESTART times,
- * and we fall back to softirqd after that.
+ * We restart softirq processing for at most 2 ms,
+ * and if need_resched() is not set.
  *
- * This number has been established via experimentation.
+ * These limits have been established via experimentation.
  * The two things to balance is latency against fairness -
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_RESTART 10
+#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
 
 asmlinkage void __do_softirq(void)
 {
 	struct softirq_action *h;
 	__u32 pending;
-	int max_restart = MAX_SOFTIRQ_RESTART;
+	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	int cpu;
 	unsigned long old_flags = current->flags;
 
@@ -264,11 +264,12 @@ restart:
 	local_irq_disable();
 
 	pending = local_softirq_pending();
-	if (pending && --max_restart)
-		goto restart;
+	if (pending) {
+		if (time_before(jiffies, end) && !need_resched())
+			goto restart;
 
-	if (pending)
 		wakeup_softirqd();
+	}
 
 	lockdep_softirq_exit();

* Re: [PATCH net-next] softirq: reduce latencies
  From: Andrew Morton @ 2013-01-03 20:46 UTC
  To: Eric Dumazet
  Cc: David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 03 Jan 2013 04:28:52 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
>
> In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
>
> This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.

hm, where did that "20 ms" come from? What caused it? Is it simply the case that you happened to have actions which consume 2ms if HZ=1000 and 20ms if HZ=100?

> This patch changes the fallback-to-ksoftirqd condition to:
>
> - A time limit of 2 ms.
> - need_resched() being set on the current task.
>
> When one of these conditions is met, we wake up ksoftirqd for further softirq processing if we still have pending softirqs.

Do we need both tests? The need_resched() test alone might be sufficient?

With this change, there is a possibility that a rapidly-rescheduling task will cause softirq starvation?

Can this change cause worsened latencies in some situations? Say there are a large number of short-running actions queued. Presently we'll dispatch ten of them and return. With this change we'll dispatch many more of them - however many consume 2ms. So worst-case latency increases from "10 * not-much" to "2 ms".

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 22:41 UTC
  To: Andrew Morton
  Cc: David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 12:46 -0800, Andrew Morton wrote:
> On Thu, 03 Jan 2013 04:28:52 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
> >
> > This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.
>
> hm, where did that "20 ms" come from? What caused it? Is it simply the case that you happened to have actions which consume 2ms if HZ=1000 and 20ms if HZ=100?

net_rx_action() has such behavior, yes. In the worst/busy case, we spend 2 ticks per call, and even more in some cases you don't want to know about (like triggering an IPv6 route garbage collect).

> > This patch changes the fallback-to-ksoftirqd condition to:
> >
> > - A time limit of 2 ms.
> > - need_resched() being set on the current task.
> >
> > When one of these conditions is met, we wake up ksoftirqd for further softirq processing if we still have pending softirqs.
>
> Do we need both tests? The need_resched() test alone might be sufficient?

I tried need_resched() only, but could trigger watchdog faults and reboots, in the case where a cpu was dedicated to softirq and all other tasks ran on other cpus.

In other cases, the following RCU splat was triggered (need_resched() doesn't know the current cpu is blocking RCU):

Jan  2 21:33:40 lpq83 kernel: [  311.678050] INFO: rcu_sched self-detected stall on CPU { 2}  (t=21000 jiffies g=11416 c=11415 q=2665)
Jan  2 21:33:40 lpq83 kernel: [  311.687314] Pid: 1460, comm: simple-watchdog Not tainted 3.8.0-smp-DEV #63
Jan  2 21:33:40 lpq83 kernel: [  311.687316] Call Trace:
Jan  2 21:33:40 lpq83 kernel: [  311.687317]  <IRQ>  [<ffffffff81100e92>] rcu_check_callbacks+0x212/0x7a0
Jan  2 21:33:40 lpq83 kernel: [  311.687326]  [<ffffffff81097018>] update_process_times+0x48/0x90
Jan  2 21:33:40 lpq83 kernel: [  311.687329]  [<ffffffff810cfe31>] tick_sched_timer+0x81/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687332]  [<ffffffff810ad69d>] __run_hrtimer+0x7d/0x220
Jan  2 21:33:40 lpq83 kernel: [  311.687333]  [<ffffffff810cfdb0>] ? tick_nohz_handler+0x100/0x100
Jan  2 21:33:40 lpq83 kernel: [  311.687337]  [<ffffffff810ca02c>] ? ktime_get_update_offsets+0x4c/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687339]  [<ffffffff810adfa7>] hrtimer_interrupt+0xf7/0x230
Jan  2 21:33:40 lpq83 kernel: [  311.687343]  [<ffffffff815b7089>] smp_apic_timer_interrupt+0x69/0x99
Jan  2 21:33:40 lpq83 kernel: [  311.687345]  [<ffffffff815b630a>] apic_timer_interrupt+0x6a/0x70
Jan  2 21:33:40 lpq83 kernel: [  311.687348]  [<ffffffffa01a1b66>] ? ipt_do_table+0x106/0x5b0 [ip_tables]
Jan  2 21:33:40 lpq83 kernel: [  311.687352]  [<ffffffff810fb357>] ? handle_edge_irq+0x77/0x130
Jan  2 21:33:40 lpq83 kernel: [  311.687354]  [<ffffffff8108e9c9>] ? irq_exit+0x79/0xb0
Jan  2 21:33:40 lpq83 kernel: [  311.687356]  [<ffffffff815b6fa3>] ? do_IRQ+0x63/0xe0
Jan  2 21:33:40 lpq83 kernel: [  311.687359]  [<ffffffffa004a0d3>] iptable_filter_hook+0x33/0x64 [iptable_filter]
Jan  2 21:33:40 lpq83 kernel: [  311.687362]  [<ffffffff81523dff>] nf_iterate+0x8f/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687364]  [<ffffffff81529fc0>] ? ip_rcv_finish+0x360/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687366]  [<ffffffff81523ebd>] nf_hook_slow+0x7d/0x150
Jan  2 21:33:40 lpq83 kernel: [  311.687368]  [<ffffffff81529fc0>] ? ip_rcv_finish+0x360/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687370]  [<ffffffff8152a39e>] ip_local_deliver+0x5e/0xa0
Jan  2 21:33:40 lpq83 kernel: [  311.687372]  [<ffffffff81529d79>] ip_rcv_finish+0x119/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687374]  [<ffffffff8152a631>] ip_rcv+0x251/0x300
Jan  2 21:33:40 lpq83 kernel: [  311.687377]  [<ffffffff814f4c72>] __netif_receive_skb+0x582/0x820
Jan  2 21:33:40 lpq83 kernel: [  311.687379]  [<ffffffff81560697>] ? inet_gro_receive+0x197/0x200
Jan  2 21:33:40 lpq83 kernel: [  311.687381]  [<ffffffff814f50ad>] netif_receive_skb+0x2d/0x90
Jan  2 21:33:40 lpq83 kernel: [  311.687383]  [<ffffffff814f5943>] napi_gro_frags+0xf3/0x2a0
Jan  2 21:33:40 lpq83 kernel: [  311.687387]  [<ffffffffa01aa87c>] mlx4_en_process_rx_cq+0x6cc/0x7b0 [mlx4_en]
Jan  2 21:33:40 lpq83 kernel: [  311.687390]  [<ffffffffa01aa9ff>] mlx4_en_poll_rx_cq+0x3f/0x80 [mlx4_en]
Jan  2 21:33:40 lpq83 kernel: [  311.687392]  [<ffffffff814f53c1>] net_rx_action+0x111/0x210
Jan  2 21:33:40 lpq83 kernel: [  311.687393]  [<ffffffff814f3a51>] ? net_tx_action+0x81/0x1d0

> With this change, there is a possibility that a rapidly-rescheduling task will cause softirq starvation?

Only if this task has higher priority than ksoftirqd, but then, people wanting/playing with high-priority tasks know what they do ;)

> Can this change cause worsened latencies in some situations? Say there are a large number of short-running actions queued. Presently we'll dispatch ten of them and return. With this change we'll dispatch many more of them - however many consume 2ms. So worst-case latency increases from "10 * not-much" to "2 ms".

I tried to reproduce such a workload but couldn't. 2 ms (or more exactly 1 to 2 ms, given the jiffies/HZ granularity) is about the time needed to process 1000 frames on current hardware.

Certainly, this patch will increase the number of scheduler calls in some situations. But with the increase of cores, it seems a bit odd to allow softirq to be a bad guy. The current logic was more suited for the !SMP age.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Namhyung Kim @ 2013-01-04 5:16 UTC
  To: Eric Dumazet
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

Hi,

On Thu, 03 Jan 2013 14:41:15 -0800, Eric Dumazet wrote:
> On Thu, 2013-01-03 at 12:46 -0800, Andrew Morton wrote:
>> Can this change cause worsened latencies in some situations? Say there are a large number of short-running actions queued. Presently we'll dispatch ten of them and return. With this change we'll dispatch many more of them - however many consume 2ms. So worst-case latency increases from "10 * not-much" to "2 ms".
>
> I tried to reproduce such a workload but couldn't. 2 ms (or more exactly 1 to 2 ms, given the jiffies/HZ granularity) is about the time needed to process 1000 frames on current hardware.

Probably a silly question:

Why not use ktime rather than jiffies for this?

Thanks,
Namhyung

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 6:53 UTC
  To: Namhyung Kim
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Fri, 2013-01-04 at 14:16 +0900, Namhyung Kim wrote:
> Probably a silly question:
>
> Why not use ktime rather than jiffies for this?

ktime is too expensive on some hardware. Here we only want a safety belt, no need for high time resolution.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 7:46 UTC
  To: Oleg A.Arkhangelsky
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Fri, 2013-01-04 at 11:14 +0400, Oleg A.Arkhangelsky wrote:
> It leads to many context switches when softirq processing is deferred to ksoftirqd kthreads, which can be very expensive. Here is some evidence of ksoftirqd activation effects:
>
> http://marc.info/?l=linux-netdev&m=124116262916969&w=2
>
> Look for "magic threshold". Yes, I know there was another bug in the scheduler discovered at that time, but this bug was only about tick accounting.

This thread is 3 years old:

- It was a router workload. Forwarded packets should not wake up a task.
- The measure of how cpus spent their cycles was completely wrong.
- A lot of things have changed, both in the network stack and the scheduler.

In fact, under moderate load, my patch is able to loop more than 10 times before deferring to ksoftirqd.

Under stress, ksoftirqd will be started anyway, and it's a good thing, because it enables process migration. 500 "context switches" [1] per second instead of 50 on behalf of ksoftirqd is absolutely not measurable. It also permits smoother RCU cleanups.

I did a lot of benchmarks, and didn't see any regression yet, only the usual noise.

[1] Under load, __do_softirq() would be called 500 times per second, instead of ~50 times per second.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Ben Hutchings @ 2013-01-03 22:08 UTC
  To: Eric Dumazet
  Cc: David Miller, Andrew Morton, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 04:28 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
>
> This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.
>
> This patch changes the fallback-to-ksoftirqd condition to:
>
> - A time limit of 2 ms.
> - need_resched() being set on the current task.
[...]
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
[...]
> -#define MAX_SOFTIRQ_RESTART 10
> +#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
[...]

Really? Never iterate if HZ < 500?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 22:40 UTC
  To: Ben Hutchings
  Cc: David Miller, Andrew Morton, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 22:08 +0000, Ben Hutchings wrote:
> On Thu, 2013-01-03 at 04:28 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
> >
> > This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.
> >
> > This patch changes the fallback-to-ksoftirqd condition to:
> >
> > - A time limit of 2 ms.
> > - need_resched() being set on the current task.
> [...]
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> [...]
> > -#define MAX_SOFTIRQ_RESTART 10
> > +#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
> [...]
>
> Really? Never iterate if HZ < 500?

good catch, it should be max()

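For reference, the arithmetic behind the catch: with C integer division, 2*HZ/1000 is 0 for any HZ below 500, so min(1, 2*HZ/1000) leaves a time budget of 0 jiffies and the restart loop never runs, while max() guarantees at least one jiffy. A quick shell check (shell $((...)) arithmetic truncates the same way as C):

for hz in 100 250 300 1000
do
	echo "HZ=$hz -> 2*HZ/1000 = $((2 * hz / 1000))"
done
# Prints 0 for HZ=100, 250 and 300, and 2 for HZ=1000, so with min() the
# budget is already exhausted on those configurations; max() keeps 1 jiffy.
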