* Re: [PATCH net-next] softirq: reduce latencies
  From: Sedat Dilek @ 2013-01-03 13:12 UTC
  To: Eric Dumazet; +Cc: netdev, LKML

Hi Eric,

your patch from [2] applies cleanly on top of Linux v3.8-rc2, and I would like to test it.
In [1] you mentioned benchmarks you ran. Can you describe them or provide a testcase (script etc.)?
Did you only do network testing?

Thanks in advance.

Regards,
- Sedat -

[1] http://marc.info/?l=linux-kernel&m=135721614718434&w=2
[2] https://patchwork.kernel.org/patch/1927531/

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 13:31 UTC
  To: sedat.dilek; +Cc: netdev, LKML

On Thu, 2013-01-03 at 14:12 +0100, Sedat Dilek wrote:
> Hi Eric,
>
> your patch from [2] applies cleanly on top of Linux v3.8-rc2, and I would like to test it.
> In [1] you mentioned benchmarks you ran. Can you describe them or provide a testcase (script etc.)?
> Did you only do network testing?

Yes, I did network testing:

- net_rx_action() is the typical softirq handler that can consume 2 ms per call.

I ran some netperf sessions with multiqueue 10G NICs, tuned so that the IRQs would be handled by a few cpus
(check /proc/irq/*/eth0-$QUEUE/../smp_affinity).

Another way to make the softirq processing use more cpu cycles is to add a fake iptables setup like:

for n in `seq 1 100`
do
	iptables -I INPUT
done

A common network load is to launch ~200 concurrent TCP_RR netperf sessions like the following:

netperf -H remote_host -t TCP_RR -l 1000

And then you can launch some netperf runs asking for P99_LATENCY results:

netperf -H remote_host -t TCP_RR -- -k P99_LATENCY

You can play with taskset or the netperf -T option to force netperf/netserver to run on given cpus.

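Putting the commands above together, one possible way to script such a run looks like the sketch below. This is a rough, untested sketch only: the IRQ name, CPU mask and remote_host are placeholders, and netserver is assumed to be already running on the remote host.

#!/bin/sh
# Sketch only -- adjust NIC/IRQ names, the CPU mask and the remote host for your setup.
REMOTE=remote_host

# Pin the NIC queue IRQs to a few cpus (find the IRQ numbers in /proc/interrupts first).
# echo 4 > /proc/irq/IRQ_NUMBER/smp_affinity     # mask 4 = cpu 2, placeholder

# Inflate per-packet softirq cost with empty iptables rules.
for n in `seq 1 100`
do
	iptables -I INPUT
done

# ~200 long-running TCP_RR sessions as background load.
for n in `seq 1 200`
do
	netperf -H $REMOTE -t TCP_RR -l 1000 >/dev/null 2>&1 &
done

# Measure latency percentiles on top of that load, pinned to cpu 2.
netperf -H $REMOTE -t TCP_RR -T2,2 -- -k P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY
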
* Re: [PATCH net-next] softirq: reduce latencies
  From: Rick Jones @ 2013-01-03 19:41 UTC
  To: Eric Dumazet; +Cc: sedat.dilek, netdev, LKML

On 01/03/2013 05:31 AM, Eric Dumazet wrote:
> A common network load is to launch ~200 concurrent TCP_RR netperf sessions like the following:
>
> netperf -H remote_host -t TCP_RR -l 1000
>
> And then you can launch some netperf runs asking for P99_LATENCY results:
>
> netperf -H remote_host -t TCP_RR -- -k P99_LATENCY

In terms of netperf overhead, once you specify P99_LATENCY, you are already in for the pound of cost but only getting the penny of output (so to speak). While it would clutter the output, one could go ahead and ask for the other latency stats and it won't "cost" anything more:

... -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY

Additional information about how the omni output selectors work can be found at
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection

happy benchmarking,

rick jones

BTW - you will likely see some differences between RT_LATENCY, which is calculated from the average transactions per second, and MEAN_LATENCY, which is calculated from the histogram of individual latencies maintained when any of the _LATENCY outputs other than RT_LATENCY is requested. Kudos to the folks at Google who did the extensions to the then-existing histogram code to enable it to be used for more reasonably accurate statistics.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 4:41 UTC
  To: Rick Jones; +Cc: sedat.dilek, netdev, LKML

On Thu, 2013-01-03 at 11:41 -0800, Rick Jones wrote:
> In terms of netperf overhead, once you specify P99_LATENCY, you are already in for the pound of cost but only getting the penny of output (so to speak). While it would clutter the output, one could go ahead and ask for the other latency stats and it won't "cost" anything more:
>
> ... -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> [...]
> Kudos to the folks at Google who did the extensions to the then-existing histogram code to enable it to be used for more reasonably accurate statistics.

Yeah ;)

Here are the before/after-patch results, with cpu 2 handling the NIC irqs:

Before patch:

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch:

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

* Re: [PATCH net-next] softirq: reduce latencies
  From: Sedat Dilek @ 2013-01-04 5:31 UTC
  To: Eric Dumazet; +Cc: Rick Jones, netdev, LKML, Ben Hutchings

On Fri, Jan 4, 2013 at 5:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [...]
> Here are the before/after-patch results, with cpu 2 handling the NIC irqs:
> [...]

Will you send a v2 with this change...?

-#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
+#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000))

- Sedat -

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 6:54 UTC
  To: sedat.dilek; +Cc: Rick Jones, netdev, LKML, Ben Hutchings

On Fri, 2013-01-04 at 06:31 +0100, Sedat Dilek wrote:
>
> Will you send a v2 with this change...?
>
> -#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
> +#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000))

I will; I was planning to do this after waiting for other comments/reviews.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Sedat Dilek @ 2013-01-04 11:57 UTC
  To: Eric Dumazet; +Cc: Rick Jones, netdev, LKML

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]

On Fri, Jan 4, 2013 at 5:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [...]
> Here are the before/after-patch results, with cpu 2 handling the NIC irqs:
> [...]

I also wanted to give some numbers, but with localhost as the default target (no netserver running on a remote host) I am not sure these numbers give any helpful feedback. (I have not yet tested without your patch.)

- Sedat -

[-- Attachment #2: NETPERF_softirq-experimental.txt --]
[-- Type: text/plain, Size: 2150 bytes --]

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.407
MIN_LATENCY=10
MAX_LATENCY=152
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.31
STDDEV_LATENCY=0.95

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.364
MIN_LATENCY=10
MAX_LATENCY=156
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.27
STDDEV_LATENCY=1.06

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.408
MIN_LATENCY=10
MAX_LATENCY=156
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.31
STDDEV_LATENCY=1.07

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.312
MIN_LATENCY=10
MAX_LATENCY=151
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.22
STDDEV_LATENCY=0.94

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.345
MIN_LATENCY=10
MAX_LATENCY=159
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.25
STDDEV_LATENCY=1.03

-dileks // 04-Jan-2013

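To get numbers over a real link rather than loopback, the netserver daemon has to be running on the remote machine before netperf is pointed at it. A minimal sketch, assuming the stock netperf/netserver pair listening on its default control port (12865) and a placeholder remote_host:

# On the remote host (starts the netperf daemon on its default port):
netserver

# On the machine under test, point netperf at that host instead of localhost:
netperf -H remote_host -t TCP_RR -T2,2 -- -k RT_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY
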
* [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 12:28 UTC
  To: David Miller, Andrew Morton
  Cc: netdev, linux-kernel@vger.kernel.org, Tom Herbert

From: Eric Dumazet <edumazet@google.com>

In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.

This patch changes the fallback-to-ksoftirqd condition to:

- A time limit of 2 ms.
- need_resched() being set on the current task.

When one of these conditions is met, we wake up ksoftirqd for further softirq processing if we still have pending softirqs.

I ran several benchmarks and got no significant difference in throughput, but a very significant reduction of maximum latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
---
 kernel/softirq.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..64d61ea 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -195,21 +195,21 @@ void local_bh_enable_ip(unsigned long ip)
 EXPORT_SYMBOL(local_bh_enable_ip);
 
 /*
- * We restart softirq processing MAX_SOFTIRQ_RESTART times,
- * and we fall back to softirqd after that.
+ * We restart softirq processing for at most 2 ms,
+ * and if need_resched() is not set.
  *
- * This number has been established via experimentation.
+ * These limits have been established via experimentation.
  * The two things to balance is latency against fairness -
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_RESTART 10
+#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
 
 asmlinkage void __do_softirq(void)
 {
 	struct softirq_action *h;
 	__u32 pending;
-	int max_restart = MAX_SOFTIRQ_RESTART;
+	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	int cpu;
 	unsigned long old_flags = current->flags;
 
@@ -264,11 +264,12 @@ restart:
 	local_irq_disable();
 
 	pending = local_softirq_pending();
-	if (pending && --max_restart)
-		goto restart;
+	if (pending) {
+		if (time_before(jiffies, end) && !need_resched())
+			goto restart;
 
-	if (pending)
 		wakeup_softirqd();
+	}
 
 	lockdep_softirq_exit();

* Re: [PATCH net-next] softirq: reduce latencies
  From: Andrew Morton @ 2013-01-03 20:46 UTC
  To: Eric Dumazet
  Cc: David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 03 Jan 2013 04:28:52 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
>
> In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
>
> This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.

hm, where did that "20 ms" come from? What caused it? Is it simply the case that you happened to have actions which consume 2ms if HZ=1000 and 20ms if HZ=100?

> This patch changes the fallback-to-ksoftirqd condition to:
>
> - A time limit of 2 ms.
> - need_resched() being set on the current task.
>
> When one of these conditions is met, we wake up ksoftirqd for further softirq processing if we still have pending softirqs.

Do we need both tests? The need_resched() test alone might be sufficient?

With this change, there is a possibility that a rapidly-rescheduling task will cause softirq starvation?

Can this change cause worsened latencies in some situations? Say there are a large number of short-running actions queued. Presently we'll dispatch ten of them and return. With this change we'll dispatch many more of them - however many consume 2ms. So worst-case latency increases from "10 * not-much" to "2 ms".

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 22:41 UTC
  To: Andrew Morton
  Cc: David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 12:46 -0800, Andrew Morton wrote:
> On Thu, 03 Jan 2013 04:28:52 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
> >
> > This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.
>
> hm, where did that "20 ms" come from? What caused it? Is it simply the case that you happened to have actions which consume 2ms if HZ=1000 and 20ms if HZ=100?

net_rx_action() has such behavior, yes. In the worst/busy case, we spend 2 ticks per call, and even more in some cases you don't want to know about (like triggering an IPv6 route garbage collect).

> > This patch changes the fallback-to-ksoftirqd condition to:
> >
> > - A time limit of 2 ms.
> > - need_resched() being set on the current task.
> >
> > When one of these conditions is met, we wake up ksoftirqd for further softirq processing if we still have pending softirqs.
>
> Do we need both tests? The need_resched() test alone might be sufficient?

I tried need_resched() only, but could trigger watchdog faults and reboots, in the case where a cpu was dedicated to softirq and all other tasks ran on other cpus.

In other cases, the following RCU splat was triggered (need_resched() doesn't know the current cpu is blocking RCU):

Jan  2 21:33:40 lpq83 kernel: [  311.678050] INFO: rcu_sched self-detected stall on CPU { 2}  (t=21000 jiffies g=11416 c=11415 q=2665)
Jan  2 21:33:40 lpq83 kernel: [  311.687314] Pid: 1460, comm: simple-watchdog Not tainted 3.8.0-smp-DEV #63
Jan  2 21:33:40 lpq83 kernel: [  311.687316] Call Trace:
Jan  2 21:33:40 lpq83 kernel: [  311.687317]  <IRQ>  [<ffffffff81100e92>] rcu_check_callbacks+0x212/0x7a0
Jan  2 21:33:40 lpq83 kernel: [  311.687326]  [<ffffffff81097018>] update_process_times+0x48/0x90
Jan  2 21:33:40 lpq83 kernel: [  311.687329]  [<ffffffff810cfe31>] tick_sched_timer+0x81/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687332]  [<ffffffff810ad69d>] __run_hrtimer+0x7d/0x220
Jan  2 21:33:40 lpq83 kernel: [  311.687333]  [<ffffffff810cfdb0>] ? tick_nohz_handler+0x100/0x100
Jan  2 21:33:40 lpq83 kernel: [  311.687337]  [<ffffffff810ca02c>] ? ktime_get_update_offsets+0x4c/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687339]  [<ffffffff810adfa7>] hrtimer_interrupt+0xf7/0x230
Jan  2 21:33:40 lpq83 kernel: [  311.687343]  [<ffffffff815b7089>] smp_apic_timer_interrupt+0x69/0x99
Jan  2 21:33:40 lpq83 kernel: [  311.687345]  [<ffffffff815b630a>] apic_timer_interrupt+0x6a/0x70
Jan  2 21:33:40 lpq83 kernel: [  311.687348]  [<ffffffffa01a1b66>] ? ipt_do_table+0x106/0x5b0 [ip_tables]
Jan  2 21:33:40 lpq83 kernel: [  311.687352]  [<ffffffff810fb357>] ? handle_edge_irq+0x77/0x130
Jan  2 21:33:40 lpq83 kernel: [  311.687354]  [<ffffffff8108e9c9>] ? irq_exit+0x79/0xb0
Jan  2 21:33:40 lpq83 kernel: [  311.687356]  [<ffffffff815b6fa3>] ? do_IRQ+0x63/0xe0
Jan  2 21:33:40 lpq83 kernel: [  311.687359]  [<ffffffffa004a0d3>] iptable_filter_hook+0x33/0x64 [iptable_filter]
Jan  2 21:33:40 lpq83 kernel: [  311.687362]  [<ffffffff81523dff>] nf_iterate+0x8f/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687364]  [<ffffffff81529fc0>] ? ip_rcv_finish+0x360/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687366]  [<ffffffff81523ebd>] nf_hook_slow+0x7d/0x150
Jan  2 21:33:40 lpq83 kernel: [  311.687368]  [<ffffffff81529fc0>] ? ip_rcv_finish+0x360/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687370]  [<ffffffff8152a39e>] ip_local_deliver+0x5e/0xa0
Jan  2 21:33:40 lpq83 kernel: [  311.687372]  [<ffffffff81529d79>] ip_rcv_finish+0x119/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687374]  [<ffffffff8152a631>] ip_rcv+0x251/0x300
Jan  2 21:33:40 lpq83 kernel: [  311.687377]  [<ffffffff814f4c72>] __netif_receive_skb+0x582/0x820
Jan  2 21:33:40 lpq83 kernel: [  311.687379]  [<ffffffff81560697>] ? inet_gro_receive+0x197/0x200
Jan  2 21:33:40 lpq83 kernel: [  311.687381]  [<ffffffff814f50ad>] netif_receive_skb+0x2d/0x90
Jan  2 21:33:40 lpq83 kernel: [  311.687383]  [<ffffffff814f5943>] napi_gro_frags+0xf3/0x2a0
Jan  2 21:33:40 lpq83 kernel: [  311.687387]  [<ffffffffa01aa87c>] mlx4_en_process_rx_cq+0x6cc/0x7b0 [mlx4_en]
Jan  2 21:33:40 lpq83 kernel: [  311.687390]  [<ffffffffa01aa9ff>] mlx4_en_poll_rx_cq+0x3f/0x80 [mlx4_en]
Jan  2 21:33:40 lpq83 kernel: [  311.687392]  [<ffffffff814f53c1>] net_rx_action+0x111/0x210
Jan  2 21:33:40 lpq83 kernel: [  311.687393]  [<ffffffff814f3a51>] ? net_tx_action+0x81/0x1d0

> With this change, there is a possibility that a rapidly-rescheduling task will cause softirq starvation?

Only if this task has higher priority than ksoftirqd, but then, people wanting/playing with high-priority tasks know what they do ;)

> Can this change cause worsened latencies in some situations? Say there are a large number of short-running actions queued. Presently we'll dispatch ten of them and return. With this change we'll dispatch many more of them - however many consume 2ms. So worst-case latency increases from "10 * not-much" to "2 ms".

I tried to reproduce such a workload but couldn't. 2 ms (or more exactly 1 to 2 ms, given the jiffies/HZ granularity) is about the time needed to process 1000 frames on current hardware.

Certainly, this patch will increase the number of scheduler calls in some situations. But with the increase of cores, it seems a bit odd to allow softirq to be a bad guy. The current logic was more suited for the !SMP age.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Namhyung Kim @ 2013-01-04 5:16 UTC
  To: Eric Dumazet
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

Hi,

On Thu, 03 Jan 2013 14:41:15 -0800, Eric Dumazet wrote:
> On Thu, 2013-01-03 at 12:46 -0800, Andrew Morton wrote:
>> Can this change cause worsened latencies in some situations? Say there are a large number of short-running actions queued. Presently we'll dispatch ten of them and return. With this change we'll dispatch many more of them - however many consume 2ms. So worst-case latency increases from "10 * not-much" to "2 ms".
>
> I tried to reproduce such a workload but couldn't. 2 ms (or more exactly 1 to 2 ms, given the jiffies/HZ granularity) is about the time needed to process 1000 frames on current hardware.

Probably a silly question:

Why not use ktime rather than jiffies for this?

Thanks,
Namhyung

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 6:53 UTC
  To: Namhyung Kim
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Fri, 2013-01-04 at 14:16 +0900, Namhyung Kim wrote:
> Probably a silly question:
>
> Why not use ktime rather than jiffies for this?

ktime is too expensive on some hardware. Here we only want a safety belt, no need for high time resolution.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-04 7:46 UTC
  To: Oleg A.Arkhangelsky
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Fri, 2013-01-04 at 11:14 +0400, Oleg A.Arkhangelsky wrote:
> It leads to many context switches when softirq processing is deferred to ksoftirqd kthreads, which can be very expensive. Here is some evidence of ksoftirqd activation effects:
>
> http://marc.info/?l=linux-netdev&m=124116262916969&w=2
>
> Look for "magic threshold". Yes, I know there was another bug in the scheduler discovered at that time, but this bug was only about tick accounting.

This thread is 3 years old:

- It was a router workload. Forwarded packets should not wake up a task.
- The measure of how cpus spent their cycles was completely wrong.
- A lot of things have changed, both in the network stack and the scheduler.

In fact, under moderate load, my patch is able to loop more than 10 times before deferring to ksoftirqd.

Under stress, ksoftirqd will be started anyway, and it's a good thing, because it enables process migration. 500 "context switches" [1] per second instead of 50 on behalf of ksoftirqd is absolutely not measurable. It also permits smoother RCU cleanups.

I did a lot of benchmarks, and didn't see any regression yet, only the usual noise.

[1] Under load, __do_softirq() would be called 500 times per second, instead of ~50 times per second.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Ben Hutchings @ 2013-01-03 22:08 UTC
  To: Eric Dumazet
  Cc: David Miller, Andrew Morton, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 04:28 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
>
> This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.
>
> This patch changes the fallback-to-ksoftirqd condition to:
>
> - A time limit of 2 ms.
> - need_resched() being set on the current task.
[...]
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
[...]
> -#define MAX_SOFTIRQ_RESTART 10
> +#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
[...]

Really? Never iterate if HZ < 500?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH net-next] softirq: reduce latencies
  From: Eric Dumazet @ 2013-01-03 22:40 UTC
  To: Ben Hutchings
  Cc: David Miller, Andrew Morton, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 22:08 +0000, Ben Hutchings wrote:
> On Thu, 2013-01-03 at 04:28 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100.
> >
> > This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles.
> >
> > This patch changes the fallback-to-ksoftirqd condition to:
> >
> > - A time limit of 2 ms.
> > - need_resched() being set on the current task.
> [...]
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> [...]
> > -#define MAX_SOFTIRQ_RESTART 10
> > +#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000))
> [...]
>
> Really? Never iterate if HZ < 500?

good catch, it should be max()

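For reference, the arithmetic behind the catch: with C integer division, 2*HZ/1000 is 0 for any HZ below 500, so min(1, 2*HZ/1000) leaves a time budget of 0 jiffies and the restart loop never runs, while max() guarantees at least one jiffy. A quick shell check (shell $((...)) arithmetic truncates the same way as C):

for hz in 100 250 300 1000
do
	echo "HZ=$hz -> 2*HZ/1000 = $((2 * hz / 1000))"
done
# Prints 0 for HZ=100, 250 and 300, and 2 for HZ=1000, so with min() the
# budget is already exhausted on those configurations; max() keeps 1 jiffy.
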