* [PATCH v2 net-next 2/2] mlx4: use napi_complete_done()
@ 2014-11-07  5:10 Eric Dumazet
  2014-11-07 22:00 ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2014-11-07  5:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Amir Vadai, ogerlitz, willemb
From: Eric Dumazet <edumazet@google.com>
To enable gro_flush_timeout, a driver has to use napi_complete_done()
instead of napi_complete().
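As a quick illustration, here is a minimal sketch of the NAPI poll pattern this
enables, using made-up foo_* placeholder names rather than mlx4 symbols:

    /* Sketch only: foo_* names are placeholders, not real driver symbols. */
    static int foo_poll(struct napi_struct *napi, int budget)
    {
    	struct foo_ring *ring = container_of(napi, struct foo_ring, napi);
    	int work_done = foo_process_rx(ring, budget);

    	if (work_done < budget) {
    		/* Ring drained: report how much work was done (instead of a
    		 * bare napi_complete()) so gro_flush_timeout can be honored,
    		 * then re-enable the device interrupt.
    		 */
    		napi_complete_done(napi, work_done);
    		foo_enable_irq(ring);
    	}
    	return work_done;
    }

The only difference from the old pattern is passing work_done to the core;
roughly speaking, the core can then hold back flushing the GRO packets it is
still aggregating for up to gro_flush_timeout nanoseconds.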
Tested:
 Ran 200 netperf TCP_STREAM flows from A to B (10GbE mlx4 link, 8 RX queues)
Without this feature, we send back about 305,000 ACKs per second.
The GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet).
Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce the number of ACK packets (811/19.2 = 42 segments per GRO packet).
The receiver performs fewer calls into the upper stacks and fewer wakeups.
This also reduces cpu usage on the sender, as it receives fewer ACK
packets.
Note that reducing the number of wakeups increases cpu efficiency, but can
decrease QPS, as applications won't have the chance to warm up cpu caches
by doing a partial read of RPC requests/answers if they fit in one skb.
B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b173a0cf44e0..46ee78326f1f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -915,15 +915,12 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 			 * probably affinity changed. need to stop this NAPI
 			 * poll, and restart it on the right CPU
 			 */
-			napi_complete(napi);
-			mlx4_en_arm_cq(priv, cq);
-			return 0;
+			done = 0;
 		}
-	} else {
-		/* Done for now */
-		napi_complete(napi);
-		mlx4_en_arm_cq(priv, cq);
 	}
+	/* Done for now */
+	napi_complete_done(napi, done);
+	mlx4_en_arm_cq(priv, cq);
 	return done;
 }
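For readability, here is roughly what the tail of mlx4_en_poll_rx_cq() looks
like with this hunk applied; this is a sketch reconstructed from the diff
context above, not a verbatim copy of the file:

    done = mlx4_en_process_rx_cq(dev, cq, budget);
    if (done == budget) {
    	/* ... busy-poll / IRQ-affinity bookkeeping elided ... */
    	if (unlikely(!cpumask_test_cpu(cpu_curr, aff))) {
    		/* Running on the wrong CPU: report no work so this NAPI
    		 * poll stops here and gets restarted on the right CPU.
    		 */
    		done = 0;
    	}
    }
    /* Done for now */
    napi_complete_done(napi, done);
    mlx4_en_arm_cq(priv, cq);
    return done;

Note that, written this way, napi_complete_done() and the CQ re-arm also run
when the full budget was consumed; the follow-up patch later in this thread
restores that conditional.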
 
* Re: [PATCH v2 net-next 2/2] mlx4: use napi_complete_done()
  2014-11-07  5:10 [PATCH v2 net-next 2/2] mlx4: use napi_complete_done() Eric Dumazet
@ 2014-11-07 22:00 ` David Miller
  2014-11-10 18:01   ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2014-11-07 22:00 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, amirv, ogerlitz, willemb
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 Nov 2014 21:10:11 -0800
> From: Eric Dumazet <edumazet@google.com>
> 
> To enable gro_flush_timeout, a driver has to use napi_complete_done()
> instead of napi_complete().
> 
> Tested:
>  Ran 200 netperf TCP_STREAM flows from A to B (10GbE mlx4 link, 8 RX queues)
> 
> Without this feature, we send back about 305,000 ACKs per second.
> 
> The GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet).
> 
> Setting a timer of 2000 nsec is enough to increase GRO packet sizes
> and reduce the number of ACK packets (811/19.2 = 42 segments per GRO packet).
> 
> The receiver performs fewer calls into the upper stacks and fewer wakeups.
> This also reduces cpu usage on the sender, as it receives fewer ACK
> packets.
> 
> Note that reducing the number of wakeups increases cpu efficiency, but can
> decrease QPS, as applications won't have the chance to warm up cpu caches
> by doing a partial read of RPC requests/answers if they fit in one skb.
> 
> B:~# sar -n DEV 1 10 | grep eth0 | tail -1
> Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
> 
> B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
> 
> B:~# sar -n DEV 1 10 | grep eth0 | tail -1
> Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied.
* Re: [PATCH v2 net-next 2/2] mlx4: use napi_complete_done()
  2014-11-07 22:00 ` David Miller
@ 2014-11-10 18:01   ` Eric Dumazet
  2014-11-10 22:07     ` [PATCH net-next] mlx4: restore conditional call to napi_complete_done() Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2014-11-10 18:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, amirv, ogerlitz, willemb
On Fri, 2014-11-07 at 17:00 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 06 Nov 2014 21:10:11 -0800
...
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Applied.
I managed to add a bug in this trivial patch.
I'll send a fix asap.
* [PATCH net-next] mlx4: restore conditional call to napi_complete_done()
  2014-11-10 18:01   ` Eric Dumazet
@ 2014-11-10 22:07     ` Eric Dumazet
  2014-11-11  2:13       ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2014-11-10 22:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, amirv, ogerlitz, willemb
From: Eric Dumazet <edumazet@google.com>
After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up
calling napi_complete_done() even in the case where the NAPI poll consumed
all of its budget.
This added extra interrupt pressure; this patch restores the proper
behavior.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1a28817282 ("mlx4: use napi_complete_done()")
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 46ee78326f1fecb14f42d1afcc113fa18087cfa6..5a193f40a14c2a89fecfbac5dce0771e15e69861 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -910,13 +910,14 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 		cpu_curr = smp_processor_id();
 		aff = irq_desc_get_irq_data(cq->irq_desc)->affinity;
 
-		if (unlikely(!cpumask_test_cpu(cpu_curr, aff))) {
-			/* Current cpu is not according to smp_irq_affinity -
-			 * probably affinity changed. need to stop this NAPI
-			 * poll, and restart it on the right CPU
-			 */
-			done = 0;
-		}
+		if (likely(cpumask_test_cpu(cpu_curr, aff)))
+			return budget;
+
+		/* Current cpu is not according to smp_irq_affinity -
+		 * probably affinity changed. need to stop this NAPI
+		 * poll, and restart it on the right CPU
+		 */
+		done = 0;
 	}
 	/* Done for now */
 	napi_complete_done(napi, done);
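In NAPI terms, the distinction this fix restores is the usual polling
contract: when the poll consumed its whole budget on the correct CPU it
returns budget without completing, so the core keeps polling and the CQ
interrupt stays off; only when the ring is drained (or the poll must migrate
to another CPU) does it complete and re-arm. A generic sketch of that
contract, with placeholder foo_* names rather than the exact mlx4 code:

    if (work_done == budget) {
    	/* More packets may be pending: do not complete; returning the
    	 * full budget makes the core poll again without a new interrupt.
    	 */
    	return budget;
    }
    /* Ring drained: report the work count and re-enable the interrupt. */
    napi_complete_done(napi, work_done);
    foo_arm_irq(ring);
    return work_done;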
* Re: [PATCH net-next] mlx4: restore conditional call to napi_complete_done()
  2014-11-10 22:07     ` [PATCH net-next] mlx4: restore conditional call to napi_complete_done() Eric Dumazet
@ 2014-11-11  2:13       ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2014-11-11  2:13 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, amirv, ogerlitz, willemb
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Nov 2014 14:07:20 -0800
> From: Eric Dumazet <edumazet@google.com>
> 
> After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up
> calling napi_complete_done() even in the case where the NAPI poll consumed
> all of its budget.
> 
> This added extra interrupt pressure; this patch restores the proper
> behavior.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: 1a28817282 ("mlx4: use napi_complete_done()")
Applied, thanks Eric.