[PATCH net-next] softirq: reduce latencies

* [PATCH net-next] softirq: reduce latencies
@ 2013-01-03 12:28 Eric Dumazet
  2013-01-03 20:46 ` Andrew Morton
  2013-01-03 22:08 ` Ben Hutchings
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-03 12:28 UTC (permalink / raw)
  To: David Miller, Andrew Morton
  Cc: netdev, linux-kernel@vger.kernel.org, Tom Herbert

From: Eric Dumazet <edumazet@google.com>

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of these conditions is met, we wake up ksoftirqd for further
softirq processing if we still have pending softirqs.

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of maximum latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
---
 kernel/softirq.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..64d61ea 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -195,21 +195,21 @@ void local_bh_enable_ip(unsigned long ip)
 EXPORT_SYMBOL(local_bh_enable_ip);
 
 /*
- * We restart softirq processing MAX_SOFTIRQ_RESTART times,
- * and we fall back to softirqd after that.
+ * We restart softirq processing for at most 2 ms,
+ * and if need_resched() is not set.
  *
- * This number has been established via experimentation.
+ * These limits have been established via experimentation.
  * The two things to balance is latency against fairness -
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_RESTART 10
+#define MAX_SOFTIRQ_TIME  min(1, (2*HZ/1000))
 
 asmlinkage void __do_softirq(void)
 {
 	struct softirq_action *h;
 	__u32 pending;
-	int max_restart = MAX_SOFTIRQ_RESTART;
+	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	int cpu;
 	unsigned long old_flags = current->flags;
 
@@ -264,11 +264,12 @@ restart:
 	local_irq_disable();
 
 	pending = local_softirq_pending();
-	if (pending && --max_restart)
-		goto restart;
+	if (pending) {
+		if (time_before(jiffies, end) && !need_resched())
+			goto restart;
 
-	if (pending)
 		wakeup_softirqd();
+	}
 
 	lockdep_softirq_exit();
 

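The new fallback condition can be sketched in userspace C as follows. This is a minimal illustration, not the kernel code: tick_before() mirrors the signed-difference idiom behind the kernel's time_before(), while tick_deadline(), after_pass() and the enum are hypothetical names introduced here, and the pending mask, the tick counter and the need_resched() state are passed in as plain arguments.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a is before b" for a free-running tick counter. */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical helper: compute the deadline for one dispatcher run. */
static unsigned long tick_deadline(unsigned long now, unsigned long budget)
{
	/* The signed-difference trick only works for spans below half the range. */
	assert(budget <= (unsigned long)LONG_MAX);
	return now + budget;	/* unsigned overflow wraps, which is intended */
}

/* Hypothetical names for what the dispatcher does after one pass. */
enum next_step { DONE, RESTART, WAKE_KSOFTIRQD };

static enum next_step after_pass(unsigned int pending, unsigned long now,
				 unsigned long end, bool resched_needed)
{
	if (!pending)
		return DONE;
	/* Loop again only while inside the budget and nobody needs the cpu. */
	if (tick_before(now, end) && !resched_needed)
		return RESTART;
	/* Otherwise hand the remaining work to the per-cpu thread. */
	return WAKE_KSOFTIRQD;
}
```

Comparing through a signed difference rather than with a plain "<" keeps the check correct when the tick counter wraps around between the start of the run and the deadline.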

* Re: [PATCH net-next] softirq: reduce latencies
@ 2013-01-03 13:12 Sedat Dilek
  2013-01-03 13:31 ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Sedat Dilek @ 2013-01-03 13:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, LKML

Hi Eric,

your patch from [2] applies cleanly on top of Linux v3.8-rc2.
I would like to test it.
In [1] you were talking about benchmarks you did.
Can you describe them or provide a testcase (script etc.)?
Did you only do network testing?
Thanks in advance.

Regards,
- Sedat -

[1] http://marc.info/?l=linux-kernel&m=135721614718434&w=2
[2] https://patchwork.kernel.org/patch/1927531/


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 13:12 [PATCH " Sedat Dilek
@ 2013-01-03 13:31 ` Eric Dumazet
  2013-01-03 19:41   ` Rick Jones
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2013-01-03 13:31 UTC (permalink / raw)
  To: sedat.dilek; +Cc: netdev, LKML

On Thu, 2013-01-03 at 14:12 +0100, Sedat Dilek wrote:
> Hi Eric,
> 
> your patch from [2] applies cleanly on top of Linux v3.8-rc2.
> I would like to test it.
> In [1] you were talking about benchmarks you did.
> Can you describe them or provide a testcase (script etc.)?
> Did you only do network testing?

Yes I did network testing : 

- net_rx_action() softirq handler is the typical function that can
consume 2 ms per call.

I did some netperf sessions, with multiqueue 10G nics, tuned so that IRQs
would be handled by a few cpus.
 (check /proc/irq/*/eth0-$QUEUE/../smp_affinity )

Another way to make the softirq processing use more cpu cycles is by
adding a fake iptables setup like :

for n in `seq 1 100`
do
 iptables -I INPUT
done

A common network load is to launch ~200 concurrent TCP_RR netperf
sessions like the following

netperf -H remote_host -t TCP_RR -l 1000

And then you can launch some netperf asking P99_LATENCY results :

netperf -H remote_host -t TCP_RR -- -k P99_LATENCY

You can play with taskset or netperf option -T to force
netperf/netserver running on given cpus.


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 13:31 ` Eric Dumazet
@ 2013-01-03 19:41   ` Rick Jones
  2013-01-04  4:41     ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Rick Jones @ 2013-01-03 19:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: sedat.dilek, netdev, LKML

On 01/03/2013 05:31 AM, Eric Dumazet wrote:
> A common network load is to launch ~200 concurrent TCP_RR netperf
> sessions like the following
>
> netperf -H remote_host -t TCP_RR -l 1000

> And then you can launch some netperf asking P99_LATENCY results :
>
> netperf -H remote_host -t TCP_RR -- -k P99_LATENCY

In terms of netperf overhead, once you specify P99_LATENCY, you are 
already in for the pound of cost but only getting the penny of output 
(so to speak).  While it would clutter the output, one could go ahead 
and ask for the other latency stats and it won't "cost" anything more:

... -- -k 
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY

Additional information about how the omni output selectors work can be 
found at 
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection

happy benchmarking,

rick jones

BTW - you will likely see some differences between RT_LATENCY, which is 
calculated from the average transactions per second, and MEAN_LATENCY, 
which is calculated from the histogram of individual latencies 
maintained when any of the _LATENCY outputs other than RT_LATENCY is 
requested.  Kudos to the folks at Google who did the extensions to the 
then-existing histogram code to enable it to be used for more reasonably 
accurate statistics.


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 12:28 [PATCH net-next] softirq: reduce latencies Eric Dumazet
@ 2013-01-03 20:46 ` Andrew Morton
  2013-01-03 22:41   ` Eric Dumazet
  2013-01-03 22:08 ` Ben Hutchings
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2013-01-03 20:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 03 Jan 2013 04:28:52 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> In various network workloads, __do_softirq() latencies can be up
> to 20 ms if HZ=1000, and 200 ms if HZ=100.
> 
> This is because we iterate 10 times in the softirq dispatcher,
> and some actions can consume a lot of cycles.

hm, where did that "20 ms" come from?  What caused it?  Is it simply
the case that you happened to have actions which consume 2ms if HZ=1000
and 20ms if HZ=100?

> This patch changes the fallback to ksoftirqd condition to :
> 
> - A time limit of 2 ms.
> - need_resched() being set on current task
>
> When one of these conditions is met, we wake up ksoftirqd for further
> softirq processing if we still have pending softirqs.

Do we need both tests?  The need_resched() test alone might be
sufficient?

With this change, there is a possibility that a rapidly-rescheduling
task will cause softirq starvation?

Can this change cause worsened latencies in some situations?  Say there
are a large number of short-running actions queued.  Presently we'll
dispatch ten of them and return.  With this change we'll dispatch many
more of them - however many consume 2ms.  So worst-case latency
increases from "10 * not-much" to "2 ms".


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 12:28 [PATCH net-next] softirq: reduce latencies Eric Dumazet
  2013-01-03 20:46 ` Andrew Morton
@ 2013-01-03 22:08 ` Ben Hutchings
  2013-01-03 22:40   ` Eric Dumazet
  1 sibling, 1 reply; 22+ messages in thread
From: Ben Hutchings @ 2013-01-03 22:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Andrew Morton, netdev, linux-kernel@vger.kernel.org,
	Tom Herbert

On Thu, 2013-01-03 at 04:28 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> In various network workloads, __do_softirq() latencies can be up
> to 20 ms if HZ=1000, and 200 ms if HZ=100.
> 
> This is because we iterate 10 times in the softirq dispatcher,
> and some actions can consume a lot of cycles.
> 
> This patch changes the fallback to ksoftirqd condition to :
> 
> - A time limit of 2 ms.
> - need_resched() being set on current task
[...]
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
[...]
> -#define MAX_SOFTIRQ_RESTART 10
> +#define MAX_SOFTIRQ_TIME  min(1, (2*HZ/1000))
[...]

Really?  Never iterate if HZ < 500?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 22:08 ` Ben Hutchings
@ 2013-01-03 22:40   ` Eric Dumazet
  2013-01-04  7:49     ` [PATCH v2 " Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2013-01-03 22:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Andrew Morton, netdev, linux-kernel@vger.kernel.org,
	Tom Herbert

On Thu, 2013-01-03 at 22:08 +0000, Ben Hutchings wrote:
> On Thu, 2013-01-03 at 04:28 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > In various network workloads, __do_softirq() latencies can be up
> > to 20 ms if HZ=1000, and 200 ms if HZ=100.
> > 
> > This is because we iterate 10 times in the softirq dispatcher,
> > and some actions can consume a lot of cycles.
> > 
> > This patch changes the fallback to ksoftirqd condition to :
> > 
> > - A time limit of 2 ms.
> > - need_resched() being set on current task
> [...]
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> [...]
> > -#define MAX_SOFTIRQ_RESTART 10
> > +#define MAX_SOFTIRQ_TIME  min(1, (2*HZ/1000))
> [...]
> 
> Really?  Never iterate if HZ < 500?
> 

good catch, it should be max()
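
The effect of min() versus max() can be checked with a small userspace sketch. The function names below are hypothetical and HZ is passed as an argument purely for illustration; the arithmetic is the same integer expression as in the macro.

```c
#include <assert.h>

/* 2 ms expressed in ticks, using the same integer arithmetic as 2*HZ/1000. */
static unsigned long two_ms_in_ticks(unsigned long hz)
{
	assert(hz > 0);
	return 2 * hz / 1000;
}

/* v1 as posted: min(1, 2*HZ/1000). */
static unsigned long budget_with_min(unsigned long hz)
{
	unsigned long t = two_ms_in_ticks(hz);

	return t < 1 ? t : 1;
}

/* Intended form: max(1, 2*HZ/1000). */
static unsigned long budget_with_max(unsigned long hz)
{
	unsigned long t = two_ms_in_ticks(hz);

	return t > 1 ? t : 1;
}
```

For any HZ below 500 the min() form yields a budget of 0 ticks, so end equals jiffies and time_before(jiffies, end) is already false on the first check; for HZ of 500 and above it caps the budget at a single tick. The max() form gives at least one tick everywhere and 2 ticks at HZ=1000.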


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 20:46 ` Andrew Morton
@ 2013-01-03 22:41   ` Eric Dumazet
  2013-01-04  5:16     ` Namhyung Kim
       [not found]     ` <787701357283699@web24e.yandex.ru>
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-03 22:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, netdev, linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 12:46 -0800, Andrew Morton wrote:
> On Thu, 03 Jan 2013 04:28:52 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > In various network workloads, __do_softirq() latencies can be up
> > to 20 ms if HZ=1000, and 200 ms if HZ=100.
> > 
> > This is because we iterate 10 times in the softirq dispatcher,
> > and some actions can consume a lot of cycles.
> 
> hm, where did that "20 ms" come from?  What caused it?  Is it simply
> the case that you happened to have actions which consume 2ms if HZ=1000
> and 20ms if HZ=100?

net_rx_action() has such behavior yes.

In the worst/busy case, we spend 2 ticks per call, and even more
in some cases you don't want to know about (like triggering an IPv6 route
garbage collection)

> 
> > This patch changes the fallback to ksoftirqd condition to :
> > 
> > - A time limit of 2 ms.
> > - need_resched() being set on current task
> >
> > When one of these conditions is met, we wake up ksoftirqd for further
> > softirq processing if we still have pending softirqs.
> 
> Do we need both tests?  The need_resched() test alone might be
> sufficient?
> 

I tried a need_resched() only, but could trigger watchdog faults and
reboots, in case a cpu was dedicated to softirq and all other tasks run
on other cpus.

In other cases, the following RCU splat was triggered :

(need_resched() doesn't know the current cpu is blocking RCU )

Jan  2 21:33:40 lpq83 kernel: [  311.678050] INFO: rcu_sched self-detected stall on CPU { 2}  (t=21000 jiffies g=11416 c=11415 q=2665)
Jan  2 21:33:40 lpq83 kernel: [  311.687314] Pid: 1460, comm: simple-watchdog Not tainted 3.8.0-smp-DEV #63
Jan  2 21:33:40 lpq83 kernel: [  311.687316] Call Trace:
Jan  2 21:33:40 lpq83 kernel: [  311.687317]  <IRQ>  [<ffffffff81100e92>] rcu_check_callbacks+0x212/0x7a0
Jan  2 21:33:40 lpq83 kernel: [  311.687326]  [<ffffffff81097018>] update_process_times+0x48/0x90
Jan  2 21:33:40 lpq83 kernel: [  311.687329]  [<ffffffff810cfe31>] tick_sched_timer+0x81/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687332]  [<ffffffff810ad69d>] __run_hrtimer+0x7d/0x220
Jan  2 21:33:40 lpq83 kernel: [  311.687333]  [<ffffffff810cfdb0>] ? tick_nohz_handler+0x100/0x100
Jan  2 21:33:40 lpq83 kernel: [  311.687337]  [<ffffffff810ca02c>] ? ktime_get_update_offsets+0x4c/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687339]  [<ffffffff810adfa7>] hrtimer_interrupt+0xf7/0x230
Jan  2 21:33:40 lpq83 kernel: [  311.687343]  [<ffffffff815b7089>] smp_apic_timer_interrupt+0x69/0x99
Jan  2 21:33:40 lpq83 kernel: [  311.687345]  [<ffffffff815b630a>] apic_timer_interrupt+0x6a/0x70
Jan  2 21:33:40 lpq83 kernel: [  311.687348]  [<ffffffffa01a1b66>] ? ipt_do_table+0x106/0x5b0 [ip_tables]
Jan  2 21:33:40 lpq83 kernel: [  311.687352]  [<ffffffff810fb357>] ? handle_edge_irq+0x77/0x130
Jan  2 21:33:40 lpq83 kernel: [  311.687354]  [<ffffffff8108e9c9>] ? irq_exit+0x79/0xb0
Jan  2 21:33:40 lpq83 kernel: [  311.687356]  [<ffffffff815b6fa3>] ? do_IRQ+0x63/0xe0
Jan  2 21:33:40 lpq83 kernel: [  311.687359]  [<ffffffffa004a0d3>] iptable_filter_hook+0x33/0x64 [iptable_filter]
Jan  2 21:33:40 lpq83 kernel: [  311.687362]  [<ffffffff81523dff>] nf_iterate+0x8f/0xd0
Jan  2 21:33:40 lpq83 kernel: [  311.687364]  [<ffffffff81529fc0>] ? ip_rcv_finish+0x360/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687366]  [<ffffffff81523ebd>] nf_hook_slow+0x7d/0x150
Jan  2 21:33:40 lpq83 kernel: [  311.687368]  [<ffffffff81529fc0>] ? ip_rcv_finish+0x360/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687370]  [<ffffffff8152a39e>] ip_local_deliver+0x5e/0xa0
Jan  2 21:33:40 lpq83 kernel: [  311.687372]  [<ffffffff81529d79>] ip_rcv_finish+0x119/0x360
Jan  2 21:33:40 lpq83 kernel: [  311.687374]  [<ffffffff8152a631>] ip_rcv+0x251/0x300
Jan  2 21:33:40 lpq83 kernel: [  311.687377]  [<ffffffff814f4c72>] __netif_receive_skb+0x582/0x820
Jan  2 21:33:40 lpq83 kernel: [  311.687379]  [<ffffffff81560697>] ? inet_gro_receive+0x197/0x200
Jan  2 21:33:40 lpq83 kernel: [  311.687381]  [<ffffffff814f50ad>] netif_receive_skb+0x2d/0x90
Jan  2 21:33:40 lpq83 kernel: [  311.687383]  [<ffffffff814f5943>] napi_gro_frags+0xf3/0x2a0
Jan  2 21:33:40 lpq83 kernel: [  311.687387]  [<ffffffffa01aa87c>] mlx4_en_process_rx_cq+0x6cc/0x7b0 [mlx4_en]
Jan  2 21:33:40 lpq83 kernel: [  311.687390]  [<ffffffffa01aa9ff>] mlx4_en_poll_rx_cq+0x3f/0x80 [mlx4_en]
Jan  2 21:33:40 lpq83 kernel: [  311.687392]  [<ffffffff814f53c1>] net_rx_action+0x111/0x210
Jan  2 21:33:40 lpq83 kernel: [  311.687393]  [<ffffffff814f3a51>] ? net_tx_action+0x81/0x1d0


> 
> With this change, there is a possibility that a rapidly-rescheduling
> task will cause softirq starvation?
> 

Only if this task has higher priority than ksoftirqd, but then, people
wanting/playing with high priority tasks know what they do ;)

> 
> Can this change cause worsened latencies in some situations?  Say there
> are a large number of short-running actions queued.  Presently we'll
> dispatch ten of them and return.  With this change we'll dispatch many
> more of them - however many consume 2ms.  So worst-case latency
> increases from "10 * not-much" to "2 ms".

I tried to reproduce such a workload but couldn't. 2 ms (or more exactly 1
to 2 ms given the jiffies/HZ granularity) is about the time needed to
process 1000 frames on current hardware.

Certainly, this patch will increase number of scheduler calls in some
situations. But with the increase of cores, it seems a bit odd to allow
softirq to be a bad guy. The current logic was more suited for the !SMP
age.


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 19:41   ` Rick Jones
@ 2013-01-04  4:41     ` Eric Dumazet
  2013-01-04  5:31       ` Sedat Dilek
  2013-01-04 11:57       ` Sedat Dilek
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-04  4:41 UTC (permalink / raw)
  To: Rick Jones; +Cc: sedat.dilek, netdev, LKML

On Thu, 2013-01-03 at 11:41 -0800, Rick Jones wrote:

> In terms of netperf overhead, once you specify P99_LATENCY, you are 
> already in for the pound of cost but only getting the penny of output 
> (so to speak).  While it would clutter the output, one could go ahead 
> and ask for the other latency stats and it won't "cost" anything more:
> 
> ... -- -k 
> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> 
> Additional information about how the omni output selectors work can be 
> found at 
> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection
> 
> happy benchmarking,
> 
> rick jones
> 
> BTW - you will likely see some differences between RT_LATENCY, which is 
> calculated from the average transactions per second, and MEAN_LATENCY, 
> which is calculated from the histogram of individual latencies 
> maintained when any of the _LATENCY outputs other than RT_LATENCY is 
> requested.  Kudos to the folks at Google who did the extensions to the 
> then-existing histogram code to enable it to be used for more reasonably 
> accurate statistics.
> 

Yeah ;)

Here are the before/after_patch results, cpu 2 handling the NIC irqs :


Before patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26
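
The size of the improvement can be read off as before/after ratios. A minimal sketch using only the figures reported above; the struct, array and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* One netperf output selector with its value before and after the patch. */
struct latency_pair {
	const char *name;
	double before;
	double after;
};

/* Values copied from the two netperf runs above. */
static const struct latency_pair results[] = {
	{ "RT_LATENCY",     550110.424, 40545.492 },
	{ "MIN_LATENCY",    146858.0,   9834.0    },
	{ "MAX_LATENCY",    997109.0,   78366.0   },
	{ "P50_LATENCY",    305000.0,   33583.0   },
	{ "P90_LATENCY",    550000.0,   59000.0   },
	{ "P99_LATENCY",    710000.0,   69000.0   },
	{ "MEAN_LATENCY",   376989.12,  38364.67  },
	{ "STDDEV_LATENCY", 184046.92,  12865.26  },
};

static const size_t n_results = sizeof(results) / sizeof(results[0]);

/* How many times smaller the value is after the patch. */
static double improvement(const struct latency_pair *p)
{
	assert(p->after > 0.0);
	return p->before / p->after;
}
```

Every selector improves by a factor between roughly 9 and 15; P99_LATENCY, for instance, drops by a factor of about 10.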


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-03 22:41   ` Eric Dumazet
@ 2013-01-04  5:16     ` Namhyung Kim
  2013-01-04  6:53       ` Eric Dumazet
       [not found]     ` <787701357283699@web24e.yandex.ru>
  1 sibling, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2013-01-04  5:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org,
	Tom Herbert

Hi,

On Thu, 03 Jan 2013 14:41:15 -0800, Eric Dumazet wrote:
> On Thu, 2013-01-03 at 12:46 -0800, Andrew Morton wrote:
>> Can this change cause worsened latencies in some situations?  Say there
>> are a large number of short-running actions queued.  Presently we'll
>> dispatch ten of them and return.  With this change we'll dispatch many
>> more of them - however many consume 2ms.  So worst-case latency
>> increases from "10 * not-much" to "2 ms".
>
> I tried to reproduce such a workload but couldn't. 2 ms (or more exactly 1
> to 2 ms given the jiffies/HZ granularity) is about the time needed to
> process 1000 frames on current hardware.

Probably a silly question:

Why not use ktime rather than jiffies for this?

Thanks,
Namhyung


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-04  4:41     ` Eric Dumazet
@ 2013-01-04  5:31       ` Sedat Dilek
  2013-01-04  6:54         ` Eric Dumazet
  2013-01-04 11:57       ` Sedat Dilek
  1 sibling, 1 reply; 22+ messages in thread
From: Sedat Dilek @ 2013-01-04  5:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, netdev, LKML, Ben Hutchings

On Fri, Jan 4, 2013 at 5:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-01-03 at 11:41 -0800, Rick Jones wrote:
>
>> In terms of netperf overhead, once you specify P99_LATENCY, you are
>> already in for the pound of cost but only getting the penny of output
>> (so to speak).  While it would clutter the output, one could go ahead
>> and ask for the other latency stats and it won't "cost" anything more:
>>
>> ... -- -k
>> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
>>
>> Additional information about how the omni output selectors work can be
>> found at
>> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection
>>
>> happy benchmarking,
>>
>> rick jones
>>
>> BTW - you will likely see some differences between RT_LATENCY, which is
>> calculated from the average transactions per second, and MEAN_LATENCY,
>> which is calculated from the histogram of individual latencies
>> maintained when any of the _LATENCY outputs other than RT_LATENCY is
>> requested.  Kudos to the folks at Google who did the extensions to the
>> then-existing histogram code to enable it to be used for more reasonably
>> accurate statistics.
>>
>
> Yeah ;)
>
> Here are the before/after_patch results, cpu 2 handling the NIC irqs :
>
>
> Before patch :
>
> # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
> RT_LATENCY=550110.424
> MIN_LATENCY=146858
> MAX_LATENCY=997109
> P50_LATENCY=305000
> P90_LATENCY=550000
> P99_LATENCY=710000
> MEAN_LATENCY=376989.12
> STDDEV_LATENCY=184046.92
>
> After patch :
>
> # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
> RT_LATENCY=40545.492
> MIN_LATENCY=9834
> MAX_LATENCY=78366
> P50_LATENCY=33583
> P90_LATENCY=59000
> P99_LATENCY=69000
> MEAN_LATENCY=38364.67
> STDDEV_LATENCY=12865.26
>

Will you send a v2 with this change...?

-#define MAX_SOFTIRQ_TIME  min(1, (2*HZ/1000))
+#define MAX_SOFTIRQ_TIME  max(1, (2*HZ/1000))

- Sedat -


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-04  5:16     ` Namhyung Kim
@ 2013-01-04  6:53       ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-04  6:53 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org,
	Tom Herbert

On Fri, 2013-01-04 at 14:16 +0900, Namhyung Kim wrote:

> Probably a silly question:
> 
> Why not use ktime rather than jiffies for this?

ktime is too expensive on some hardware.

Here we only want a safety belt, no need for high time resolution.


* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-04  5:31       ` Sedat Dilek
@ 2013-01-04  6:54         ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-04  6:54 UTC (permalink / raw)
  To: sedat.dilek; +Cc: Rick Jones, netdev, LKML, Ben Hutchings

On Fri, 2013-01-04 at 06:31 +0100, Sedat Dilek wrote:

> 
> Will you send a v2 with this change...?
> 
> -#define MAX_SOFTIRQ_TIME  min(1, (2*HZ/1000))
> +#define MAX_SOFTIRQ_TIME  max(1, (2*HZ/1000))

I will, I was planning to do this after waiting for other
comments/reviews.


* Re: [PATCH net-next] softirq: reduce latencies
       [not found]     ` <787701357283699@web24e.yandex.ru>
@ 2013-01-04  7:46       ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-04  7:46 UTC (permalink / raw)
  To: Oleg A.Arkhangelsky
  Cc: Andrew Morton, David Miller, netdev, linux-kernel@vger.kernel.org,
	Tom Herbert

On Fri, 2013-01-04 at 11:14 +0400, Oleg A.Arkhangelsky wrote:

> It leads to many context switches when softirq processing is deferred to
> ksoftirqd kthreads, which can be very expensive. Here is some evidence
> of ksoftirqd activation effects:
> 
> http://marc.info/?l=linux-netdev&m=124116262916969&w=2
> 
> Look for "magic threshold". Yes, I know there was another bug in scheduler
> discovered that time, but this bug was only about tick accounting.
> 

This thread is 3 years old : 

- It was a router workload. Forwarded packets should not wakeup a task.
- The measure of how cpus spent their cycles was completely wrong.
- A lot of things have changed, both in network stack and scheduler.

In fact, under moderate load, my patch is able to loop more than 10
times before deferring to ksoftirqd.

Under stress, ksoftirqd will be started anyway, and it's a good thing,
because it enables process migration.

500 "context switches" [1] per second instead of 50 on behalf of
ksoftirqd is absolutely not measurable. It also permits smoother RCU
cleanups.

I did a lot of benchmarks, and didn't see any regression yet, only the usual
noise.

[1] Under load, __do_softirq() would be called 500 times per second,
instead of ~50 times per second.
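
One way to read the 500 versus 50 figures, assuming HZ=1000 and that under load every __do_softirq() invocation runs until its budget is exhausted (about 10 passes of ~2 ms before the patch, a single 2 ms limit after it). The function name is hypothetical and the model is a simplification, not a measurement:

```c
#include <assert.h>

/* Invocations that fit in one second if each one uses its whole budget. */
static unsigned int invocations_per_second(unsigned int budget_ms)
{
	assert(budget_ms > 0);
	return 1000 / budget_ms;
}
```

With a 20 ms run per invocation this gives 50 invocations per second, and with the 2 ms limit it gives 500.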


* [PATCH v2 net-next] softirq: reduce latencies
  2013-01-03 22:40   ` Eric Dumazet
@ 2013-01-04  7:49     ` Eric Dumazet
  2013-01-04  8:15       ` Joe Perches
  2013-01-04 21:49       ` [PATCH v2 net-next] softirq: reduce latencies David Miller
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-04  7:49 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Andrew Morton, netdev, linux-kernel@vger.kernel.org,
	Tom Herbert

From: Eric Dumazet <edumazet@google.com>

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of these conditions is met, we wake up ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In following bench, 200 antagonist "netperf -t TCP_RR" are started in
background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)

Before patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
---
v2: min(1, (2*HZ/1000)) -> max(1, (2*HZ/1000)), as spotted by Ben

 kernel/softirq.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..8d5e4be 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -195,21 +195,21 @@ void local_bh_enable_ip(unsigned long ip)
 EXPORT_SYMBOL(local_bh_enable_ip);
 
 /*
- * We restart softirq processing MAX_SOFTIRQ_RESTART times,
- * and we fall back to softirqd after that.
+ * We restart softirq processing for at most 2 ms,
+ * and if need_resched() is not set.
  *
- * This number has been established via experimentation.
+ * These limits have been established via experimentation.
  * The two things to balance is latency against fairness -
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_RESTART 10
+#define MAX_SOFTIRQ_TIME  max(1, (2*HZ/1000))
 
 asmlinkage void __do_softirq(void)
 {
 	struct softirq_action *h;
 	__u32 pending;
-	int max_restart = MAX_SOFTIRQ_RESTART;
+	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	int cpu;
 	unsigned long old_flags = current->flags;
 
@@ -264,11 +264,12 @@ restart:
 	local_irq_disable();
 
 	pending = local_softirq_pending();
-	if (pending && --max_restart)
-		goto restart;
+	if (pending) {
+		if (time_before(jiffies, end) && !need_resched())
+			goto restart;
 
-	if (pending)
 		wakeup_softirqd();
+	}
 
 	lockdep_softirq_exit();
 


* Re: [PATCH v2 net-next] softirq: reduce latencies
  2013-01-04  7:49     ` [PATCH v2 " Eric Dumazet
@ 2013-01-04  8:15       ` Joe Perches
  2013-01-04  8:23         ` Eric Dumazet
  2013-01-04 21:49       ` [PATCH v2 net-next] softirq: reduce latencies David Miller
  1 sibling, 1 reply; 22+ messages in thread
From: Joe Perches @ 2013-01-04  8:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ben Hutchings, David Miller, Andrew Morton, netdev,
	linux-kernel@vger.kernel.org, Tom Herbert

On Thu, 2013-01-03 at 23:49 -0800, Eric Dumazet wrote:
> In various network workloads, __do_softirq() latencies can be up
> to 20 ms if HZ=1000, and 200 ms if HZ=100.
> This patch changes the fallback to ksoftirqd condition to :
> - A time limit of 2 ms.

[]
> diff --git a/kernel/softirq.c b/kernel/softirq.c
[]
> +#define MAX_SOFTIRQ_TIME  max(1, (2*HZ/1000))

And if HZ is 10000?
 
>  asmlinkage void __do_softirq(void)
>  {
[]
> +	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;

Perhaps MAX_SOFTIRQ_TIME should be

#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)

though it would be nicer if it were a compile time constant.


* Re: [PATCH v2 net-next] softirq: reduce latencies
  2013-01-04  8:15       ` Joe Perches
@ 2013-01-04  8:23         ` Eric Dumazet
  2013-01-04  9:12           ` Joe Perches
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2013-01-04  8:23 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ben Hutchings, David Miller, Andrew Morton, netdev,
	linux-kernel@vger.kernel.org, Tom Herbert

On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote:
> On Thu, 2013-01-03 at 23:49 -0800, Eric Dumazet wrote:
> > In various network workloads, __do_softirq() latencies can be up
> > to 20 ms if HZ=1000, and 200 ms if HZ=100.
> > This patch changes the fallback to ksoftirqd condition to :
> > - A time limit of 2 ms.
> 
> []
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> []
> > +#define MAX_SOFTIRQ_TIME  max(1, (2*HZ/1000))
> 
> And if HZ is 10000?
>  

Then it's OK.  2*10000/1000 -> 20 ticks -> 2 ms


> >  asmlinkage void __do_softirq(void)
> >  {
> []
> > +	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> 
> Perhaps MAX_SOFTIRQ_TIME should be
> 
> #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
> 
> though it would be nicer if it were a compile time constant.

If you send a patch to convert msecs_to_jiffies() to an inline function
when HZ = 1000, I will gladly use it instead of (2*HZ/1000)

Right now, max(1, msecs_to_jiffies(2)) uses way too many instructions,
while it should be the constant 2, known at compile time.

^ permalink raw reply	[flat|nested] 22+ messages in thread
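
The HZ arithmetic discussed in the message above can be checked with a small userspace sketch. It models the quoted definition max(1, (2*HZ/1000)) with hz passed as a parameter, which is an assumption made only so several tick rates can be compared; in the kernel HZ is a build-time constant. The function names are hypothetical.

```c
/*
 * Userspace model of the v2 definition quoted in the thread:
 *   #define MAX_SOFTIRQ_TIME  max(1, (2*HZ/1000))
 */
static unsigned long model_max_softirq_ticks(unsigned long hz)
{
	unsigned long ticks = 2 * hz / 1000;	/* integer division truncates */

	return ticks > 0 ? ticks : 1;		/* never less than one tick */
}

/*
 * Length of that budget in microseconds, for comparison with the
 * nominal 2 ms limit. Assumes hz divides 1000000 evenly.
 */
static unsigned long model_budget_usecs(unsigned long hz)
{
	return model_max_softirq_ticks(hz) * (1000000UL / hz);
}
```

For hz = 10000 this gives 20 ticks and 2000 us, matching the reply above. For hz = 100 the truncated value is 0, so the clamp yields one tick, which is 10000 us rather than 2000 us.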

* Re: [PATCH v2 net-next] softirq: reduce latencies
  2013-01-04  8:23         ` Eric Dumazet
@ 2013-01-04  9:12           ` Joe Perches
  2013-01-04 17:00             ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Joe Perches @ 2013-01-04  9:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ben Hutchings, David Miller, Andrew Morton, netdev,
	linux-kernel@vger.kernel.org, Tom Herbert

On Fri, 2013-01-04 at 00:23 -0800, Eric Dumazet wrote:
> On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote:
> > Perhaps MAX_SOFTIRQ_TIME should be
> > #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
> > though it would be nicer if it were a compile time constant.
> 
> If you send a patch to convert msecs_to_jiffies() to an inline function
> when HZ = 1000, I will gladly use it instead of (2*HZ/1000)
> 
> Right now, max(1, msecs_to_jiffies(2)) uses way too many instructions,
> while it should be the constant 2, known at compile time.

Something like this might work.

This is incomplete, it just does msecs_to_jiffies,
and it should convert usecs_to_jiffies and the
jiffies_to_<foo> types too.

Maybe it's worthwhile.

It does reduce object size by 16 bytes per call site
(x86-32) when the argument is a constant. There are
about 800 of these jiffies conversions in kernel sources.

What do you think?

 include/linux/jiffies.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/time.c           | 42 +++---------------------------------------
 2 files changed, 51 insertions(+), 40 deletions(-)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 82ed068..c67ddcf 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -291,7 +291,54 @@ extern unsigned long preset_lpj;
  */
 extern unsigned int jiffies_to_msecs(const unsigned long j);
 extern unsigned int jiffies_to_usecs(const unsigned long j);
-extern unsigned long msecs_to_jiffies(const unsigned int m);
+extern unsigned long __msecs_to_jiffies(const unsigned int m);
+
+static inline unsigned long __inline_msecs_to_jiffies(const unsigned int m)
+{
+	/*
+	 * Negative value, means infinite timeout:
+	 */
+	if ((int)m < 0)
+		return MAX_JIFFY_OFFSET;
+
+#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
+	/*
+	 * HZ is equal to or smaller than 1000, and 1000 is a nice
+	 * round multiple of HZ, divide with the factor between them,
+	 * but round upwards:
+	 */
+	return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
+#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
+	/*
+	 * HZ is larger than 1000, and HZ is a nice round multiple of
+	 * 1000 - simply multiply with the factor between them.
+	 *
+	 * But first make sure the multiplication result cannot
+	 * overflow:
+	 */
+	if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+		return MAX_JIFFY_OFFSET;
+
+	return m * (HZ / MSEC_PER_SEC);
+#else
+	/*
+	 * Generic case - multiply, round and divide. But first
+	 * check that if we are doing a net multiplication, that
+	 * we wouldn't overflow:
+	 */
+	if (HZ > MSEC_PER_SEC && m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+		return MAX_JIFFY_OFFSET;
+
+	return (MSEC_TO_HZ_MUL32 * m + MSEC_TO_HZ_ADJ32)
+		>> MSEC_TO_HZ_SHR32;
+#endif
+}
+
+#define msecs_to_jiffies(x)			\
+	(__builtin_constant_p(x) ?		\
+	 __inline_msecs_to_jiffies(x) :		\
+	 __msecs_to_jiffies(x))
+
 extern unsigned long usecs_to_jiffies(const unsigned int u);
 extern unsigned long timespec_to_jiffies(const struct timespec *value);
 extern void jiffies_to_timespec(const unsigned long jiffies,
diff --git a/kernel/time.c b/kernel/time.c
index d226c6a..231f2ac 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -425,47 +425,11 @@ EXPORT_SYMBOL(ns_to_timeval);
  *
  * We must also be careful about 32-bit overflows.
  */
-unsigned long msecs_to_jiffies(const unsigned int m)
+unsigned long __msecs_to_jiffies(const unsigned int m)
 {
-	/*
-	 * Negative value, means infinite timeout:
-	 */
-	if ((int)m < 0)
-		return MAX_JIFFY_OFFSET;
-
-#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
-	/*
-	 * HZ is equal to or smaller than 1000, and 1000 is a nice
-	 * round multiple of HZ, divide with the factor between them,
-	 * but round upwards:
-	 */
-	return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
-#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
-	/*
-	 * HZ is larger than 1000, and HZ is a nice round multiple of
-	 * 1000 - simply multiply with the factor between them.
-	 *
-	 * But first make sure the multiplication result cannot
-	 * overflow:
-	 */
-	if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
-		return MAX_JIFFY_OFFSET;
-
-	return m * (HZ / MSEC_PER_SEC);
-#else
-	/*
-	 * Generic case - multiply, round and divide. But first
-	 * check that if we are doing a net multiplication, that
-	 * we wouldn't overflow:
-	 */
-	if (HZ > MSEC_PER_SEC && m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
-		return MAX_JIFFY_OFFSET;
-
-	return (MSEC_TO_HZ_MUL32 * m + MSEC_TO_HZ_ADJ32)
-		>> MSEC_TO_HZ_SHR32;
-#endif
+	return __inline_msecs_to_jiffies(m);
 }
-EXPORT_SYMBOL(msecs_to_jiffies);
+EXPORT_SYMBOL(__msecs_to_jiffies);
 
 unsigned long usecs_to_jiffies(const unsigned int u)
 {

^ permalink raw reply related	[flat|nested] 22+ messages in thread
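
The dispatch idea in the patch above can be tried outside the kernel. The following is a reduced sketch, assuming GCC or Clang (for __builtin_constant_p and the noinline attribute) and a tick rate of 1000. MODEL_HZ and every model_* name are hypothetical stand-ins, and the negative-value and overflow checks from the real function are left out.

```c
/* Hypothetical stand-ins for the kernel's HZ and MSEC_PER_SEC. */
#define MODEL_HZ		1000U
#define MODEL_MSEC_PER_SEC	1000U

/* Inline path: round up, as in the HZ <= 1000 branch of the patch. */
static inline unsigned long model_inline_msecs_to_jiffies(unsigned int m)
{
	const unsigned int ms_per_tick = MODEL_MSEC_PER_SEC / MODEL_HZ;

	return (m + ms_per_tick - 1) / ms_per_tick;
}

/* Out-of-line path: in the kernel this body would live in kernel/time.c. */
static __attribute__((noinline)) unsigned long
model_outofline_msecs_to_jiffies(unsigned int m)
{
	return model_inline_msecs_to_jiffies(m);
}

/* Use the inline path only when x is a compile-time constant. */
#define model_msecs_to_jiffies(x)			\
	(__builtin_constant_p(x) ?			\
	 model_inline_msecs_to_jiffies(x) :		\
	 model_outofline_msecs_to_jiffies(x))
```

With a literal argument such as model_msecs_to_jiffies(2), an optimizing compiler can typically fold the whole expression to a constant, while a runtime argument goes through the single shared out-of-line function.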

* Re: [PATCH net-next] softirq: reduce latencies
  2013-01-04  4:41     ` Eric Dumazet
  2013-01-04  5:31       ` Sedat Dilek
@ 2013-01-04 11:57       ` Sedat Dilek
  1 sibling, 0 replies; 22+ messages in thread
From: Sedat Dilek @ 2013-01-04 11:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, netdev, LKML

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]

On Fri, Jan 4, 2013 at 5:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-01-03 at 11:41 -0800, Rick Jones wrote:
>
>> In terms of netperf overhead, once you specify P99_LATENCY, you are
>> already in for the pound of cost but only getting the penny of output
>> (so to speak).  While it would clutter the output, one could go ahead
>> and ask for the other latency stats and it won't "cost" anything more:
>>
>> ... -- -k
>> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
>>
>> Additional information about how the omni output selectors work can be
>> found at
>> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection
>>
>> happy benchmarking,
>>
>> rick jones
>>
>> BTW - you will likely see some differences between RT_LATENCY, which is
>> calculated from the average transactions per second, and MEAN_LATENCY,
>> which is calculated from the histogram of individual latencies
>> maintained when any of the _LATENCY outputs other than RT_LATENCY is
>> requested.  Kudos to the folks at Google who did the extensions to the
>> then-existing histogram code to enable it to be used for more reasonably
>> accurate statistics.
>>
>
> Yeah ;)
>
> Here are the before/after_patch results, cpu 2 handling the NIC irqs :
>
>
> Before patch :
>
> # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
> RT_LATENCY=550110.424
> MIN_LATENCY=146858
> MAX_LATENCY=997109
> P50_LATENCY=305000
> P90_LATENCY=550000
> P99_LATENCY=710000
> MEAN_LATENCY=376989.12
> STDDEV_LATENCY=184046.92
>
> After patch :
>
> # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
> RT_LATENCY=40545.492
> MIN_LATENCY=9834
> MAX_LATENCY=78366
> P50_LATENCY=33583
> P90_LATENCY=59000
> P99_LATENCY=69000
> MEAN_LATENCY=38364.67
> STDDEV_LATENCY=12865.26
>

I also wanted to give some numbers.
But with localhost as default (no netserver running on a remote-host)
I am not sure if these numbers give any helpful feedback.
( I have not tested yet w/o your patch. )

- Sedat -

[-- Attachment #2: NETPERF_softirq-experimental.txt --]
[-- Type: text/plain, Size: 2150 bytes --]

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.407
MIN_LATENCY=10
MAX_LATENCY=152
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.31
STDDEV_LATENCY=0.95

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.364
MIN_LATENCY=10
MAX_LATENCY=156
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.27
STDDEV_LATENCY=1.06

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.408
MIN_LATENCY=10
MAX_LATENCY=156
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.31
STDDEV_LATENCY=1.07

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.312
MIN_LATENCY=10
MAX_LATENCY=151
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.22
STDDEV_LATENCY=0.94

# netperf -H localhost -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET : demo : first burst 0 : cpu bind
RT_LATENCY=11.345
MIN_LATENCY=10
MAX_LATENCY=159
P50_LATENCY=11
P90_LATENCY=12
P99_LATENCY=13
MEAN_LATENCY=11.25
STDDEV_LATENCY=1.03


-dileks // 04-Jan-2013


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next] softirq: reduce latencies
  2013-01-04  9:12           ` Joe Perches
@ 2013-01-04 17:00             ` Eric Dumazet
  2013-01-04 21:15               ` [PATCH] jiffies conversions: Use compile time constants when possible Joe Perches
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2013-01-04 17:00 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ben Hutchings, David Miller, Andrew Morton, netdev,
	linux-kernel@vger.kernel.org, Tom Herbert

On Fri, 2013-01-04 at 01:12 -0800, Joe Perches wrote:
> On Fri, 2013-01-04 at 00:23 -0800, Eric Dumazet wrote:
> > On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote:
> > > Perhaps MAX_SOFTIRQ_TIME should be
> > > #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
> > > though it would be nicer if it were a compile time constant.
> > 
> > If you send a patch to convert msecs_to_jiffies() to an inline function
> > when HZ = 1000, I will gladly use it instead of (2*HZ/1000)
> > 
> > Right now, max(1, msecs_to_jiffies(2)) uses way too many instructions,
> > while it should be the constant 2, known at compile time.
> 
> Something like this might work.
> 
> This is incomplete, it just does msecs_to_jiffies,
> and it should convert usecs_to_jiffies and the
> jiffies_to_<foo> types too.
> 
> Maybe it's worthwhile.
> 
> It does reduce object size by 16 bytes per call site
> (x86-32) when the argument is a constant. There are
> about 800 of these jiffies conversions in kernel sources.
> 
> What do you think?
> 

I think this is something to discuss in another thread, and definitely
worth doing, at least for msecs_to_jiffies()

We have many HZ references everywhere that could be cleaned up using
this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] jiffies conversions: Use compile time constants when possible
  2013-01-04 17:00             ` Eric Dumazet
@ 2013-01-04 21:15               ` Joe Perches
  0 siblings, 0 replies; 22+ messages in thread
From: Joe Perches @ 2013-01-04 21:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ben Hutchings, David Miller, Andrew Morton, netdev,
	linux-kernel@vger.kernel.org, Tom Herbert

Do the multiplications and divisions at compile time
instead of runtime when the converted value is a constant.

Move the calculation functions into jiffies.h as static __always_inline.

Add #defines with __builtin_constant_p to test and use the
static inline or the runtime functions as appropriate.

Prefix the old exported symbols/functions with __

Signed-off-by: Joe Perches <joe@perches.com>
---
 include/linux/jiffies.h | 136 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/time.c           | 106 +++++--------------------------------
 2 files changed, 144 insertions(+), 98 deletions(-)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 82ed068..88578e4 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -289,10 +289,138 @@ extern unsigned long preset_lpj;
 /*
  * Convert various time units to each other:
  */
-extern unsigned int jiffies_to_msecs(const unsigned long j);
-extern unsigned int jiffies_to_usecs(const unsigned long j);
-extern unsigned long msecs_to_jiffies(const unsigned int m);
-extern unsigned long usecs_to_jiffies(const unsigned int u);
+
+/*
+ * Avoid unnecessary multiplications/divisions in the
+ * two most common HZ cases:
+ */
+static __always_inline unsigned int
+__inline_jiffies_to_msecs(const unsigned long j)
+{
+#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
+	return (MSEC_PER_SEC / HZ) * j;
+#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
+	return (j + (HZ / MSEC_PER_SEC) - 1)/(HZ / MSEC_PER_SEC);
+#else
+# if BITS_PER_LONG == 32
+	return (HZ_TO_MSEC_MUL32 * j) >> HZ_TO_MSEC_SHR32;
+# else
+	return (j * HZ_TO_MSEC_NUM) / HZ_TO_MSEC_DEN;
+# endif
+#endif
+}
+extern unsigned int __jiffies_to_msecs(const unsigned long j);
+
+#define jiffies_to_msecs(x)			\
+	(__builtin_constant_p(x) ?		\
+	 __inline_jiffies_to_msecs(x) :		\
+	 __jiffies_to_msecs(x))
+
+static __always_inline unsigned int
+__inline_jiffies_to_usecs(const unsigned long j)
+{
+#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
+	return (USEC_PER_SEC / HZ) * j;
+#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
+	return (j + (HZ / USEC_PER_SEC) - 1)/(HZ / USEC_PER_SEC);
+#else
+# if BITS_PER_LONG == 32
+	return (HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32;
+# else
+	return (j * HZ_TO_USEC_NUM) / HZ_TO_USEC_DEN;
+# endif
+#endif
+}
+extern unsigned int __jiffies_to_usecs(const unsigned long j);
+
+#define jiffies_to_usecs(x)			\
+	(__builtin_constant_p(x) ?		\
+	 __inline_jiffies_to_usecs(x) :		\
+	 __jiffies_to_usecs(x))
+
+/*
+ * When we convert to jiffies then we interpret incoming values
+ * the following way:
+ *
+ * - negative values mean 'infinite timeout' (MAX_JIFFY_OFFSET)
+ *
+ * - 'too large' values [that would result in larger than
+ *   MAX_JIFFY_OFFSET values] mean 'infinite timeout' too.
+ *
+ * - all other values are converted to jiffies by either multiplying
+ *   the input value by a factor or dividing it with a factor
+ *
+ * We must also be careful about 32-bit overflows.
+ */
+static __always_inline unsigned long
+__inline_msecs_to_jiffies(const unsigned int m)
+{
+	/*
+	 * Negative value, means infinite timeout:
+	 */
+	if ((int)m < 0)
+		return MAX_JIFFY_OFFSET;
+
+#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
+	/*
+	 * HZ is equal to or smaller than 1000, and 1000 is a nice
+	 * round multiple of HZ, divide with the factor between them,
+	 * but round upwards:
+	 */
+	return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
+#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
+	/*
+	 * HZ is larger than 1000, and HZ is a nice round multiple of
+	 * 1000 - simply multiply with the factor between them.
+	 *
+	 * But first make sure the multiplication result cannot
+	 * overflow:
+	 */
+	if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+		return MAX_JIFFY_OFFSET;
+
+	return m * (HZ / MSEC_PER_SEC);
+#else
+	/*
+	 * Generic case - multiply, round and divide. But first
+	 * check that if we are doing a net multiplication, that
+	 * we wouldn't overflow:
+	 */
+	if (HZ > MSEC_PER_SEC && m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+		return MAX_JIFFY_OFFSET;
+
+	return (MSEC_TO_HZ_MUL32 * m + MSEC_TO_HZ_ADJ32)
+		>> MSEC_TO_HZ_SHR32;
+#endif
+}
+extern unsigned long __msecs_to_jiffies(const unsigned int m);
+
+#define msecs_to_jiffies(x)			\
+	(__builtin_constant_p(x) ?		\
+	 __inline_msecs_to_jiffies(x) :		\
+	 __msecs_to_jiffies(x))
+
+static __always_inline unsigned long
+__inline_usecs_to_jiffies(const unsigned int u)
+{
+	if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
+		return MAX_JIFFY_OFFSET;
+#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
+	return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
+#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
+	return u * (HZ / USEC_PER_SEC);
+#else
+	return (USEC_TO_HZ_MUL32 * u + USEC_TO_HZ_ADJ32)
+		>> USEC_TO_HZ_SHR32;
+#endif
+}
+extern unsigned long __usecs_to_jiffies(const unsigned int u);
+
+#define usecs_to_jiffies(x)			\
+	(__builtin_constant_p(x) ?		\
+	 __inline_usecs_to_jiffies(x) :		\
+	 __usecs_to_jiffies(x))
+
 extern unsigned long timespec_to_jiffies(const struct timespec *value);
 extern void jiffies_to_timespec(const unsigned long jiffies,
 				struct timespec *value);
diff --git a/kernel/time.c b/kernel/time.c
index d226c6a..b9d1024 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -228,41 +228,18 @@ EXPORT_SYMBOL(current_fs_time);
 
 /*
  * Convert jiffies to milliseconds and back.
- *
- * Avoid unnecessary multiplications/divisions in the
- * two most common HZ cases:
  */
-inline unsigned int jiffies_to_msecs(const unsigned long j)
+unsigned int __jiffies_to_msecs(const unsigned long j)
 {
-#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
-	return (MSEC_PER_SEC / HZ) * j;
-#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
-	return (j + (HZ / MSEC_PER_SEC) - 1)/(HZ / MSEC_PER_SEC);
-#else
-# if BITS_PER_LONG == 32
-	return (HZ_TO_MSEC_MUL32 * j) >> HZ_TO_MSEC_SHR32;
-# else
-	return (j * HZ_TO_MSEC_NUM) / HZ_TO_MSEC_DEN;
-# endif
-#endif
+	return __inline_jiffies_to_msecs(j);
 }
-EXPORT_SYMBOL(jiffies_to_msecs);
+EXPORT_SYMBOL(__jiffies_to_msecs);
 
-inline unsigned int jiffies_to_usecs(const unsigned long j)
+unsigned int __jiffies_to_usecs(const unsigned long j)
 {
-#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
-	return (USEC_PER_SEC / HZ) * j;
-#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
-	return (j + (HZ / USEC_PER_SEC) - 1)/(HZ / USEC_PER_SEC);
-#else
-# if BITS_PER_LONG == 32
-	return (HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32;
-# else
-	return (j * HZ_TO_USEC_NUM) / HZ_TO_USEC_DEN;
-# endif
-#endif
+	return __inline_jiffies_to_usecs(j);
 }
-EXPORT_SYMBOL(jiffies_to_usecs);
+EXPORT_SYMBOL(__jiffies_to_usecs);
 
 /**
  * timespec_trunc - Truncate timespec to a granularity
@@ -411,76 +388,17 @@ struct timeval ns_to_timeval(const s64 nsec)
 }
 EXPORT_SYMBOL(ns_to_timeval);
 
-/*
- * When we convert to jiffies then we interpret incoming values
- * the following way:
- *
- * - negative values mean 'infinite timeout' (MAX_JIFFY_OFFSET)
- *
- * - 'too large' values [that would result in larger than
- *   MAX_JIFFY_OFFSET values] mean 'infinite timeout' too.
- *
- * - all other values are converted to jiffies by either multiplying
- *   the input value by a factor or dividing it with a factor
- *
- * We must also be careful about 32-bit overflows.
- */
-unsigned long msecs_to_jiffies(const unsigned int m)
+unsigned long __msecs_to_jiffies(const unsigned int m)
 {
-	/*
-	 * Negative value, means infinite timeout:
-	 */
-	if ((int)m < 0)
-		return MAX_JIFFY_OFFSET;
-
-#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
-	/*
-	 * HZ is equal to or smaller than 1000, and 1000 is a nice
-	 * round multiple of HZ, divide with the factor between them,
-	 * but round upwards:
-	 */
-	return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
-#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
-	/*
-	 * HZ is larger than 1000, and HZ is a nice round multiple of
-	 * 1000 - simply multiply with the factor between them.
-	 *
-	 * But first make sure the multiplication result cannot
-	 * overflow:
-	 */
-	if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
-		return MAX_JIFFY_OFFSET;
-
-	return m * (HZ / MSEC_PER_SEC);
-#else
-	/*
-	 * Generic case - multiply, round and divide. But first
-	 * check that if we are doing a net multiplication, that
-	 * we wouldn't overflow:
-	 */
-	if (HZ > MSEC_PER_SEC && m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
-		return MAX_JIFFY_OFFSET;
-
-	return (MSEC_TO_HZ_MUL32 * m + MSEC_TO_HZ_ADJ32)
-		>> MSEC_TO_HZ_SHR32;
-#endif
+	return __inline_msecs_to_jiffies(m);
 }
-EXPORT_SYMBOL(msecs_to_jiffies);
+EXPORT_SYMBOL(__msecs_to_jiffies);
 
-unsigned long usecs_to_jiffies(const unsigned int u)
+unsigned long __usecs_to_jiffies(const unsigned int u)
 {
-	if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
-		return MAX_JIFFY_OFFSET;
-#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
-	return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
-#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
-	return u * (HZ / USEC_PER_SEC);
-#else
-	return (USEC_TO_HZ_MUL32 * u + USEC_TO_HZ_ADJ32)
-		>> USEC_TO_HZ_SHR32;
-#endif
+	return __inline_usecs_to_jiffies(u);
 }
-EXPORT_SYMBOL(usecs_to_jiffies);
+EXPORT_SYMBOL(__usecs_to_jiffies);
 
 /*
  * The TICK_NSEC - 1 rounds up the value to the next resolution.  Note

^ permalink raw reply related	[flat|nested] 22+ messages in thread
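
The rounding in the two "HZ divides 1000" branches of the patch above is asymmetric: msecs_to_jiffies rounds up, while jiffies_to_msecs is an exact multiplication. A short userspace sketch makes the round trip visible. It takes hz as a runtime parameter, which is an assumption for illustration only, and the model_* names are hypothetical.

```c
/*
 * Models of the HZ <= 1000 branches, valid only when hz divides
 * 1000 evenly (for example 100, 250 or 1000).
 */
static unsigned int model_jiffies_to_msecs_hz(unsigned long j, unsigned int hz)
{
	/* Exact: each tick is a whole number of milliseconds. */
	return (unsigned int)((1000U / hz) * j);
}

static unsigned long model_msecs_to_jiffies_hz(unsigned int m, unsigned int hz)
{
	unsigned int ms_per_tick = 1000U / hz;

	/* Round up so a nonzero timeout never becomes zero ticks. */
	return (m + ms_per_tick - 1) / ms_per_tick;
}
```

At hz = 250, converting 2 ms gives 1 tick, and converting that tick back gives 4 ms, so a round trip can lengthen a timeout up to the next tick boundary.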

* Re: [PATCH v2 net-next] softirq: reduce latencies
  2013-01-04  7:49     ` [PATCH v2 " Eric Dumazet
  2013-01-04  8:15       ` Joe Perches
@ 2013-01-04 21:49       ` David Miller
  1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2013-01-04 21:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhutchings, akpm, netdev, linux-kernel, therbert

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jan 2013 23:49:40 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> In various network workloads, __do_softirq() latencies can be up
> to 20 ms if HZ=1000, and 200 ms if HZ=100.
> 
> This is because we iterate 10 times in the softirq dispatcher,
> and some actions can consume a lot of cycles.
> 
> This patch changes the fallback to ksoftirqd condition to :
> 
> - A time limit of 2 ms.
> - need_resched() being set on current task
> 
> When one of this condition is met, we wakeup ksoftirqd for further
> softirq processing if we still have pending softirqs.
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2013-01-04 21:49 UTC | newest]

Thread overview: 22+ messages
2013-01-03 12:28 [PATCH net-next] softirq: reduce latencies Eric Dumazet
2013-01-03 20:46 ` Andrew Morton
2013-01-03 22:41   ` Eric Dumazet
2013-01-04  5:16     ` Namhyung Kim
2013-01-04  6:53       ` Eric Dumazet
     [not found]     ` <787701357283699@web24e.yandex.ru>
2013-01-04  7:46       ` Eric Dumazet
2013-01-03 22:08 ` Ben Hutchings
2013-01-03 22:40   ` Eric Dumazet
2013-01-04  7:49     ` [PATCH v2 " Eric Dumazet
2013-01-04  8:15       ` Joe Perches
2013-01-04  8:23         ` Eric Dumazet
2013-01-04  9:12           ` Joe Perches
2013-01-04 17:00             ` Eric Dumazet
2013-01-04 21:15               ` [PATCH] jiffies conversions: Use compile time constants when possible Joe Perches
2013-01-04 21:49       ` [PATCH v2 net-next] softirq: reduce latencies David Miller
  -- strict thread matches above, loose matches on Subject: below --
2013-01-03 13:12 [PATCH " Sedat Dilek
2013-01-03 13:31 ` Eric Dumazet
2013-01-03 19:41   ` Rick Jones
2013-01-04  4:41     ` Eric Dumazet
2013-01-04  5:31       ` Sedat Dilek
2013-01-04  6:54         ` Eric Dumazet
2013-01-04 11:57       ` Sedat Dilek
