All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/9] net: extend busy polling support
@ 2015-11-17 13:56 Eric Dumazet
  2015-11-17 13:56 ` [PATCH net-next 1/9] net: better skb->sender_cpu and skb->napi_id cohabitation Eric Dumazet
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

This patch series extends busy polling range to tunnels devices,
and allows NAPI drivers to support busy polling with trivial changes.

No need to provide ndo_busy_poll() method and extra synchronization
between ndo_busy_poll() and normal napi->poll() method. This was proven
very difficult and bug prone.

mlx5 driver is changed to support busy polling using this new method,
and a second mlx5 patch adds napi_complete_done() support and proper
SNMP accounting.

bnx2x and mlx4 drivers are converted to new infrastructure,
reducing kernel bloat and improving performance.

Eric Dumazet (9):
  net: better skb->sender_cpu and skb->napi_id cohabitation
  mlx4: mlx4_en_low_latency_recv() called with BH disabled
  net: un-inline sk_busy_loop()
  net: allow BH servicing in sk_busy_loop()
  net: network drivers no longer need to implement ndo_busy_poll()
  mlx5: add busy polling support
  mlx5: support napi_complete_done()
  bnx2x: remove bnx2x_low_latency_recv() support
  mlx4: remove mlx4_en_low_latency_recv()

 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h       | 113 -------------------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |  46 +-------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |   7 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |   3 -
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c   |  17 ---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c    |  40 -------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c        |  15 +--
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      | 126 ----------------------
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   6 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |  16 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |  11 +-
 include/linux/netdevice.h                         |   9 --
 include/linux/skbuff.h                            |   3 -
 include/net/busy_poll.h                           |  45 +-------
 net/core/dev.c                                    |  91 ++++++++++++----
 16 files changed, 98 insertions(+), 452 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next 1/9] net: better skb->sender_cpu and skb->napi_id cohabitation
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
@ 2015-11-17 13:56 ` Eric Dumazet
  2015-11-17 17:42   ` Alexei Starovoitov
  2015-11-17 13:56 ` [PATCH net-next 2/9] mlx4: mlx4_en_low_latency_recv() called with BH disabled Eric Dumazet
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.

We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()

As suggested by Alexei, we could split the space so that
these errors can not happen.

0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.

This will allow proper busy polling support over tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/skbuff.h |  3 ---
 net/core/dev.c         | 33 ++++++++++++++++-----------------
 2 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4355129fff91..c9c394bf0771 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1082,9 +1082,6 @@ static inline void skb_copy_hash(struct sk_buff *to, const struct sk_buff *from)
 
 static inline void skb_sender_cpu_clear(struct sk_buff *skb)
 {
-#ifdef CONFIG_XPS
-	skb->sender_cpu = 0;
-#endif
 }
 
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
diff --git a/net/core/dev.c b/net/core/dev.c
index ab9b8d0d115e..0014f0e11db6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -182,7 +182,7 @@ EXPORT_SYMBOL(dev_base_lock);
 /* protects napi_hash addition/deletion and napi_gen_id */
 static DEFINE_SPINLOCK(napi_hash_lock);
 
-static unsigned int napi_gen_id;
+static unsigned int napi_gen_id = NR_CPUS;
 static DEFINE_HASHTABLE(napi_hash, 8);
 
 static seqcount_t devnet_rename_seq;
@@ -3018,7 +3018,9 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 	int queue_index = 0;
 
 #ifdef CONFIG_XPS
-	if (skb->sender_cpu == 0)
+	u32 sender_cpu = skb->sender_cpu - 1;
+
+	if (sender_cpu >= (u32)NR_CPUS)
 		skb->sender_cpu = raw_smp_processor_id() + 1;
 #endif
 
@@ -4673,25 +4675,22 @@ EXPORT_SYMBOL_GPL(napi_by_id);
 
 void napi_hash_add(struct napi_struct *napi)
 {
-	if (!test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) {
+	if (test_and_set_bit(NAPI_STATE_HASHED, &napi->state))
+		return;
 
-		spin_lock(&napi_hash_lock);
+	spin_lock(&napi_hash_lock);
 
-		/* 0 is not a valid id, we also skip an id that is taken
-		 * we expect both events to be extremely rare
-		 */
-		napi->napi_id = 0;
-		while (!napi->napi_id) {
-			napi->napi_id = ++napi_gen_id;
-			if (napi_by_id(napi->napi_id))
-				napi->napi_id = 0;
-		}
+	/* 0..NR_CPUS+1 range is reserved for sender_cpu use */
+	do {
+		if (unlikely(++napi_gen_id < NR_CPUS + 1))
+			napi_gen_id = NR_CPUS + 1;
+	} while (napi_by_id(napi_gen_id));
+	napi->napi_id = napi_gen_id;
 
-		hlist_add_head_rcu(&napi->napi_hash_node,
-			&napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
+	hlist_add_head_rcu(&napi->napi_hash_node,
+			   &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
 
-		spin_unlock(&napi_hash_lock);
-	}
+	spin_unlock(&napi_hash_lock);
 }
 EXPORT_SYMBOL_GPL(napi_hash_add);
 
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 2/9] mlx4: mlx4_en_low_latency_recv() called with BH disabled
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
  2015-11-17 13:56 ` [PATCH net-next 1/9] net: better skb->sender_cpu and skb->napi_id cohabitation Eric Dumazet
@ 2015-11-17 13:56 ` Eric Dumazet
  2015-11-17 13:56 ` [PATCH net-next 3/9] net: un-inline sk_busy_loop() Eric Dumazet
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

mlx4_en_low_latency_recv() is called with BH disabled,
as other ndo_busy_poll() methods.

No need for spin_lock_bh()/spin_unlock_bh()

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index c41f15102ae0..965c8f016ac4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -660,11 +660,12 @@ static inline bool mlx4_en_cq_unlock_napi(struct mlx4_en_cq *cq)
 	return rc;
 }
 
-/* called from mlx4_en_low_latency_poll() */
+/* called from mlx4_en_low_latency_recv(), BH are disabled */
 static inline bool mlx4_en_cq_lock_poll(struct mlx4_en_cq *cq)
 {
 	int rc = true;
-	spin_lock_bh(&cq->poll_lock);
+
+	spin_lock(&cq->poll_lock);
 	if ((cq->state & MLX4_CQ_LOCKED)) {
 		struct net_device *dev = cq->dev;
 		struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -676,7 +677,7 @@ static inline bool mlx4_en_cq_lock_poll(struct mlx4_en_cq *cq)
 	} else
 		/* preserve yield marks */
 		cq->state |= MLX4_EN_CQ_STATE_POLL;
-	spin_unlock_bh(&cq->poll_lock);
+	spin_unlock(&cq->poll_lock);
 	return rc;
 }
 
@@ -684,13 +685,14 @@ static inline bool mlx4_en_cq_lock_poll(struct mlx4_en_cq *cq)
 static inline bool mlx4_en_cq_unlock_poll(struct mlx4_en_cq *cq)
 {
 	int rc = false;
-	spin_lock_bh(&cq->poll_lock);
+
+	spin_lock(&cq->poll_lock);
 	WARN_ON(cq->state & (MLX4_EN_CQ_STATE_NAPI));
 
 	if (cq->state & MLX4_EN_CQ_STATE_POLL_YIELD)
 		rc = true;
 	cq->state = MLX4_EN_CQ_STATE_IDLE;
-	spin_unlock_bh(&cq->poll_lock);
+	spin_unlock(&cq->poll_lock);
 	return rc;
 }
 
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 3/9] net: un-inline sk_busy_loop()
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
  2015-11-17 13:56 ` [PATCH net-next 1/9] net: better skb->sender_cpu and skb->napi_id cohabitation Eric Dumazet
  2015-11-17 13:56 ` [PATCH net-next 2/9] mlx4: mlx4_en_low_latency_recv() called with BH disabled Eric Dumazet
@ 2015-11-17 13:56 ` Eric Dumazet
  2015-11-17 14:15   ` Eric Dumazet
  2015-11-17 13:56 ` [PATCH net-next 4/9] net: allow BH servicing in sk_busy_loop() Eric Dumazet
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

There is really little gain from inlining this big function.
We'll soon make it even bigger in following patches.

This means we no longer need to export napi_by_id()

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h |  9 ---------
 include/net/busy_poll.h   | 45 +--------------------------------------------
 net/core/dev.c            | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 55 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d20891465247..5695c5efa188 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -461,15 +461,6 @@ static inline void napi_complete(struct napi_struct *n)
 }
 
 /**
- *	napi_by_id - lookup a NAPI by napi_id
- *	@napi_id: hashed napi_id
- *
- * lookup @napi_id in napi_hash table
- * must be called under rcu_read_lock()
- */
-struct napi_struct *napi_by_id(unsigned int napi_id);
-
-/**
  *	napi_hash_add - add a NAPI to global hashtable
  *	@napi: napi context
  *
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 1d67fb6b23a0..2fbeb1313c0f 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -72,50 +72,7 @@ static inline bool busy_loop_timeout(unsigned long end_time)
 	return time_after(now, end_time);
 }
 
-/* when used in sock_poll() nonblock is known at compile time to be true
- * so the loop and end_time will be optimized out
- */
-static inline bool sk_busy_loop(struct sock *sk, int nonblock)
-{
-	unsigned long end_time = !nonblock ? sk_busy_loop_end_time(sk) : 0;
-	const struct net_device_ops *ops;
-	struct napi_struct *napi;
-	int rc = false;
-
-	/*
-	 * rcu read lock for napi hash
-	 * bh so we don't race with net_rx_action
-	 */
-	rcu_read_lock_bh();
-
-	napi = napi_by_id(sk->sk_napi_id);
-	if (!napi)
-		goto out;
-
-	ops = napi->dev->netdev_ops;
-	if (!ops->ndo_busy_poll)
-		goto out;
-
-	do {
-		rc = ops->ndo_busy_poll(napi);
-
-		if (rc == LL_FLUSH_FAILED)
-			break; /* permanent failure */
-
-		if (rc > 0)
-			/* local bh are disabled so it is ok to use _BH */
-			NET_ADD_STATS_BH(sock_net(sk),
-					 LINUX_MIB_BUSYPOLLRXPACKETS, rc);
-		cpu_relax();
-
-	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
-		 !need_resched() && !busy_loop_timeout(end_time));
-
-	rc = !skb_queue_empty(&sk->sk_receive_queue);
-out:
-	rcu_read_unlock_bh();
-	return rc;
-}
+bool sk_busy_loop(struct sock *sk, int nonblock);
 
 /* used in the NIC receive handler to mark the skb */
 static inline void skb_mark_napi_id(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 0014f0e11db6..8902b01a827b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -96,6 +96,7 @@
 #include <linux/skbuff.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+#include <net/busy_poll.h>
 #include <linux/rtnetlink.h>
 #include <linux/stat.h>
 #include <net/dst.h>
@@ -4660,7 +4661,7 @@ void napi_complete_done(struct napi_struct *n, int work_done)
 EXPORT_SYMBOL(napi_complete_done);
 
 /* must be called under rcu_read_lock(), as we dont take a reference */
-struct napi_struct *napi_by_id(unsigned int napi_id)
+static struct napi_struct *napi_by_id(unsigned int napi_id)
 {
 	unsigned int hash = napi_id % HASH_SIZE(napi_hash);
 	struct napi_struct *napi;
@@ -4671,7 +4672,49 @@ struct napi_struct *napi_by_id(unsigned int napi_id)
 
 	return NULL;
 }
-EXPORT_SYMBOL_GPL(napi_by_id);
+
+bool sk_busy_loop(struct sock *sk, int nonblock)
+{
+	unsigned long end_time = !nonblock ? sk_busy_loop_end_time(sk) : 0;
+	const struct net_device_ops *ops;
+	struct napi_struct *napi;
+	int rc = false;
+
+	/*
+	 * rcu read lock for napi hash
+	 * bh so we don't race with net_rx_action
+	 */
+	rcu_read_lock_bh();
+
+	napi = napi_by_id(sk->sk_napi_id);
+	if (!napi)
+		goto out;
+
+	ops = napi->dev->netdev_ops;
+	if (!ops->ndo_busy_poll)
+		goto out;
+
+	do {
+		rc = ops->ndo_busy_poll(napi);
+
+		if (rc == LL_FLUSH_FAILED)
+			break; /* permanent failure */
+
+		if (rc > 0)
+			/* local bh are disabled so it is ok to use _BH */
+			NET_ADD_STATS_BH(sock_net(sk),
+					 LINUX_MIB_BUSYPOLLRXPACKETS, rc);
+		cpu_relax();
+
+	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
+		 !need_resched() && !busy_loop_timeout(end_time));
+
+	rc = !skb_queue_empty(&sk->sk_receive_queue);
+out:
+	rcu_read_unlock_bh();
+	return rc;
+}
+EXPORT_SYMBOL(sk_busy_loop);
 
 void napi_hash_add(struct napi_struct *napi)
 {
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 4/9] net: allow BH servicing in sk_busy_loop()
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
                   ` (2 preceding siblings ...)
  2015-11-17 13:56 ` [PATCH net-next 3/9] net: un-inline sk_busy_loop() Eric Dumazet
@ 2015-11-17 13:56 ` Eric Dumazet
  2015-11-17 13:57 ` [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll() Eric Dumazet
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

Instead of blocking BH in whole sk_busy_loop(), block them
only around ->ndo_busy_poll() calls.

This has many benefits.

1) allow tunneled traffic to use busy poll as well as native traffic.
   Tunnels handlers usually call netif_rx() and depend on net_rx_action()
   being run (from sofirq handler)

2) allow RFS/RPS being used (sending IPI to other cpus if needed)

3) use the 'lets burn cpu cycles' budget to do useful work
   (like TX completions, timers, RCU callbacks...)

4) reduce BH latencies, making busy poll a better citizen.

Tested:

Tested with SIT tunnel

lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    37373.93
16384  87380

Now enable busy poll on both hosts

lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
lpaa6:~# echo 70 >/proc/sys/net/core/busy_read

lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    58314.77
16384  87380

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8902b01a827b..4d0fbf5987ff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4680,11 +4680,7 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 	struct napi_struct *napi;
 	int rc = false;
 
-	/*
-	 * rcu read lock for napi hash
-	 * bh so we don't race with net_rx_action
-	 */
-	rcu_read_lock_bh();
+	rcu_read_lock();
 
 	napi = napi_by_id(sk->sk_napi_id);
 	if (!napi)
@@ -4695,23 +4691,23 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 		goto out;
 
 	do {
+		local_bh_disable();
 		rc = ops->ndo_busy_poll(napi);
+		if (rc > 0)
+			NET_ADD_STATS_BH(sock_net(sk),
+					 LINUX_MIB_BUSYPOLLRXPACKETS, rc);
+		local_bh_enable();
 
 		if (rc == LL_FLUSH_FAILED)
 			break; /* permanent failure */
 
-		if (rc > 0)
-			/* local bh are disabled so it is ok to use _BH */
-			NET_ADD_STATS_BH(sock_net(sk),
-					 LINUX_MIB_BUSYPOLLRXPACKETS, rc);
 		cpu_relax();
-
 	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
 		 !need_resched() && !busy_loop_timeout(end_time));
 
 	rc = !skb_queue_empty(&sk->sk_receive_queue);
 out:
-	rcu_read_unlock_bh();
+	rcu_read_unlock();
 	return rc;
 }
 EXPORT_SYMBOL(sk_busy_loop);
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll()
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
                   ` (3 preceding siblings ...)
  2015-11-17 13:56 ` [PATCH net-next 4/9] net: allow BH servicing in sk_busy_loop() Eric Dumazet
@ 2015-11-17 13:57 ` Eric Dumazet
  2015-11-17 15:14   ` Eliezer Tamir
  2015-11-17 13:57 ` [PATCH net-next 6/9] mlx5: add busy polling support Eric Dumazet
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:57 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

Instead of having to implement complex ndo_busy_poll() method,
drivers can simply rely on NAPI poll logic.

Busy polling gains are mainly coming from polling itself,
not on exact details on how we poll the device.

ndo_busy_poll() if implemented can avoid touching
napi state, but it adds extra synchronization between
normal napi->poll() and busy poll handler, slowing down
the common path (non busy polling) with extra atomic operations.
In practice few drivers ever got busy poll because of the complexity.

We could go one step further, and make busy polling
available for all NAPI drivers, but this would require
that all netif_napi_del() calls are done in process context
so that we can call synchronize_rcu().
Full audit would be required.

Before this is done, a driver still needs to call :

- skb_mark_napi_id() for each skb provided to the stack, although we can
  easily do this directly in core networking stack in a future patch.

- napi_hash_add() and napi_hash_del() to allocate/free a napi_id per napi.

- Make sure RCU grace period is respected after napi_hash_del() before
  memory containing napi structure is freed.

Followup patch implements busy poll for mlx5 driver as an example.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4d0fbf5987ff..21a7d4093fde 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4673,10 +4673,11 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 	return NULL;
 }
 
+#define BUSY_POLL_BUDGET 8
 bool sk_busy_loop(struct sock *sk, int nonblock)
 {
 	unsigned long end_time = !nonblock ? sk_busy_loop_end_time(sk) : 0;
-	const struct net_device_ops *ops;
+	int (*busy_poll)(struct napi_struct *dev);
 	struct napi_struct *napi;
 	int rc = false;
 
@@ -4686,13 +4687,27 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 	if (!napi)
 		goto out;
 
-	ops = napi->dev->netdev_ops;
-	if (!ops->ndo_busy_poll)
-		goto out;
+	/* Note: ndo_busy_poll method is optional in linux-4.5 */
+	busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
 
 	do {
+		rc = 0;
 		local_bh_disable();
-		rc = ops->ndo_busy_poll(napi);
+		if (busy_poll) {
+			rc = busy_poll(napi);
+		} else if (napi_schedule_prep(napi)) {
+			void *have = netpoll_poll_lock(napi);
+
+			if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+				rc = napi->poll(napi, BUSY_POLL_BUDGET);
+				trace_napi_poll(napi);
+				if (rc == BUSY_POLL_BUDGET) {
+					napi_complete_done(napi, rc);
+					napi_schedule(napi);
+				}
+			}
+			netpoll_poll_unlock(have);
+		}
 		if (rc > 0)
 			NET_ADD_STATS_BH(sock_net(sk),
 					 LINUX_MIB_BUSYPOLLRXPACKETS, rc);
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 6/9] mlx5: add busy polling support
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
                   ` (4 preceding siblings ...)
  2015-11-17 13:57 ` [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll() Eric Dumazet
@ 2015-11-17 13:57 ` Eric Dumazet
  2015-11-17 13:57 ` [PATCH net-next 7/9] mlx5: support napi_complete_done() Eric Dumazet
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:57 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

It is now easy to add busy polling support to a NAPI driver,
with very little impact on normal input path.

This patch serves as a reference implementation.

Note:

A followup patch will add proper napi_complete_done() in mlx5,
so that LINUX_MIB_BUSYPOLLRXPACKETS snmp counter is properly handled.

Tested:

Normal TCP_RR results without busy polling :

lpk51:~# echo 0 >/proc/sys/net/core/busy_read
lpk52:~# echo 0 >/proc/sys/net/core/busy_read

lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    53509.49
16384  87380

Now enable busy polling :

lpk51:~# echo 70 >/proc/sys/net/core/busy_read
lpk52:~# echo 70 >/proc/sys/net/core/busy_read

lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97530.92
16384  87380

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5fc4d2d78cdf..174b7e19290a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -982,6 +982,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	mlx5e_build_channeltc_to_txq_map(priv, ix);
 
 	netif_napi_add(netdev, &c->napi, mlx5e_napi_poll, 64);
+	napi_hash_add(&c->napi);
 
 	err = mlx5e_open_tx_cqs(c, cparam);
 	if (err)
@@ -1020,6 +1021,7 @@ err_close_tx_cqs:
 
 err_napi_del:
 	netif_napi_del(&c->napi);
+	napi_hash_del(&c->napi);
 	kfree(c);
 
 	return err;
@@ -1033,6 +1035,10 @@ static void mlx5e_close_channel(struct mlx5e_channel *c)
 	mlx5e_close_cq(&c->rq.cq);
 	mlx5e_close_tx_cqs(c);
 	netif_napi_del(&c->napi);
+
+	napi_hash_del(&c->napi);
+	synchronize_rcu();
+
 	kfree(c);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index cf0098596e85..54800c61a563 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -33,6 +33,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/tcp.h>
+#include <net/busy_poll.h>
 #include "en.h"
 
 static inline int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq,
@@ -242,6 +243,7 @@ bool mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 		wqe            = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
 		skb            = rq->skb[wqe_counter];
 		prefetch(skb->data);
+		skb_mark_napi_id(skb, cq->napi);
 		rq->skb[wqe_counter] = NULL;
 
 		dma_unmap_single(rq->pdev,
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 7/9] mlx5: support napi_complete_done()
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
                   ` (5 preceding siblings ...)
  2015-11-17 13:57 ` [PATCH net-next 6/9] mlx5: add busy polling support Eric Dumazet
@ 2015-11-17 13:57 ` Eric Dumazet
  2015-11-17 13:57 ` [PATCH net-next 8/9] bnx2x: remove bnx2x_low_latency_recv() support Eric Dumazet
  2015-11-17 13:57 ` [PATCH net-next 9/9] mlx4: remove mlx4_en_low_latency_recv() Eric Dumazet
  8 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:57 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

A NAPI poll handler should return number of RX packets processed,
instead of 0 / budget.

This allows proper busy poll accounting through LINUX_MIB_BUSYPOLLRXPACKETS
SNMP counter.

napi_complete_done() allows /sys/class/net/ethX/gro_flush_timeout
to be used for finer GRO aggregation control.

Tested:

Enabled busy polling, and checked TcpExtBusyPollRxPackets counter is increasing.

echo 70 >/proc/sys/net/core/busy_read
nstat >/dev/null
netperf -H target -t TCP_RR >/dev/null
nstat | grep TcpExtBusyPollRxPackets
TcpExtBusyPollRxPackets         490958             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eli Cohen <eli@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 14 ++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 11 ++++++-----
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f2ae62dd8c09..633b22a17b77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -558,7 +558,7 @@ void mlx5e_completion_event(struct mlx5_core_cq *mcq);
 void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event);
 int mlx5e_napi_poll(struct napi_struct *napi, int budget);
 bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq);
-bool mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
+int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
 bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq);
 struct mlx5_cqe64 *mlx5e_get_cqe(struct mlx5e_cq *cq);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 54800c61a563..fe752f8e24b9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -216,16 +216,16 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 				       be16_to_cpu(cqe->vlan_info));
 }
 
-bool mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
+int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 {
 	struct mlx5e_rq *rq = container_of(cq, struct mlx5e_rq, cq);
-	int i;
+	int work_done;
 
 	/* avoid accessing cq (dma coherent memory) if not needed */
 	if (!test_and_clear_bit(MLX5E_CQ_HAS_CQES, &cq->flags))
-		return false;
+		return 0;
 
-	for (i = 0; i < budget; i++) {
+	for (work_done = 0; work_done < budget; work_done++) {
 		struct mlx5e_rx_wqe *wqe;
 		struct mlx5_cqe64 *cqe;
 		struct sk_buff *skb;
@@ -271,10 +271,8 @@ wq_ll_pop:
 	/* ensure cq space is freed before enabling more cqes */
 	wmb();
 
-	if (i == budget) {
+	if (work_done == budget)
 		set_bit(MLX5E_CQ_HAS_CQES, &cq->flags);
-		return true;
-	}
 
-	return false;
+	return work_done;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 2c7cb6755d1d..4ac8d716dbdd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -54,6 +54,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 	struct mlx5e_channel *c = container_of(napi, struct mlx5e_channel,
 					       napi);
 	bool busy = false;
+	int work_done;
 	int i;
 
 	clear_bit(MLX5E_CHANNEL_NAPI_SCHED, &c->flags);
@@ -61,26 +62,26 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 	for (i = 0; i < c->num_tc; i++)
 		busy |= mlx5e_poll_tx_cq(&c->sq[i].cq);
 
-	busy |= mlx5e_poll_rx_cq(&c->rq.cq, budget);
-
+	work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
+	busy |= work_done == budget;
 	busy |= mlx5e_post_rx_wqes(&c->rq);
 
 	if (busy)
 		return budget;
 
-	napi_complete(napi);
+	napi_complete_done(napi, work_done);
 
 	/* avoid losing completion event during/after polling cqs */
 	if (test_bit(MLX5E_CHANNEL_NAPI_SCHED, &c->flags)) {
 		napi_schedule(napi);
-		return 0;
+		return work_done;
 	}
 
 	for (i = 0; i < c->num_tc; i++)
 		mlx5e_cq_arm(&c->sq[i].cq);
 	mlx5e_cq_arm(&c->rq.cq);
 
-	return 0;
+	return work_done;
 }
 
 void mlx5e_completion_event(struct mlx5_core_cq *mcq)
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 8/9] bnx2x: remove bnx2x_low_latency_recv() support
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
                   ` (6 preceding siblings ...)
  2015-11-17 13:57 ` [PATCH net-next 7/9] mlx5: support napi_complete_done() Eric Dumazet
@ 2015-11-17 13:57 ` Eric Dumazet
  2015-11-17 13:57 ` [PATCH net-next 9/9] mlx4: remove mlx4_en_low_latency_recv() Eric Dumazet
  8 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:57 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

Switch to native NAPI polling, as this reduces overhead and complexity.

Normal path is faster, since one cmpxchg() is not anymore requested,
and busy polling with the NAPI polling has same performance.

Tested:
lpk50:~# cat /proc/sys/net/core/busy_read
70
lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    40095.07
16384  87380
IpInReceives                    401062             0.0
IpInDelivers                    401062             0.0
IpOutRequests                   401079             0.0
TcpActiveOpens                  7                  0.0
TcpPassiveOpens                 3                  0.0
TcpAttemptFails                 3                  0.0
TcpEstabResets                  5                  0.0
TcpInSegs                       401036             0.0
TcpOutSegs                      401052             0.0
TcpOutRsts                      38                 0.0
UdpInDatagrams                  26                 0.0
UdpOutDatagrams                 27                 0.0
Ip6OutNoRoutes                  1                  0.0
TcpExtDelayedACKs               1                  0.0
TcpExtTCPPrequeued              98                 0.0
TcpExtTCPDirectCopyFromPrequeue 98                 0.0
TcpExtTCPHPHits                 4                  0.0
TcpExtTCPHPHitsToUser           98                 0.0
TcpExtTCPPureAcks               5                  0.0
TcpExtTCPHPAcks                 101                0.0
TcpExtTCPAbortOnData            6                  0.0
TcpExtBusyPollRxPackets         400832             0.0
TcpExtTCPOrigDataSent           400983             0.0
IpExtInOctets                   21273867           0.0
IpExtOutOctets                  21261254           0.0
IpExtInNoECTPkts                401064             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h      | 113 -----------------------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |  46 +--------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |   7 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   3 -
 4 files changed, 2 insertions(+), 167 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index b5e64b02200c..0b214b5d944a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -540,10 +540,6 @@ struct bnx2x_fastpath {
 
 	struct napi_struct	napi;
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-	unsigned long		busy_poll_state;
-#endif
-
 	union host_hc_status_block	status_blk;
 	/* chip independent shortcuts into sb structure */
 	__le16			*sb_index_values;
@@ -617,115 +613,6 @@ struct bnx2x_fastpath {
 #define bnx2x_fp_stats(bp, fp)	(&((bp)->fp_stats[(fp)->index]))
 #define bnx2x_fp_qstats(bp, fp)	(&((bp)->fp_stats[(fp)->index].eth_q_stats))
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-
-enum bnx2x_fp_state {
-	BNX2X_STATE_FP_NAPI	= BIT(0), /* NAPI handler owns the queue */
-
-	BNX2X_STATE_FP_NAPI_REQ_BIT = 1, /* NAPI would like to own the queue */
-	BNX2X_STATE_FP_NAPI_REQ = BIT(1),
-
-	BNX2X_STATE_FP_POLL_BIT = 2,
-	BNX2X_STATE_FP_POLL     = BIT(2), /* busy_poll owns the queue */
-
-	BNX2X_STATE_FP_DISABLE_BIT = 3, /* queue is dismantled */
-};
-
-static inline void bnx2x_fp_busy_poll_init(struct bnx2x_fastpath *fp)
-{
-	WRITE_ONCE(fp->busy_poll_state, 0);
-}
-
-/* called from the device poll routine to get ownership of a FP */
-static inline bool bnx2x_fp_lock_napi(struct bnx2x_fastpath *fp)
-{
-	unsigned long prev, old = READ_ONCE(fp->busy_poll_state);
-
-	while (1) {
-		switch (old) {
-		case BNX2X_STATE_FP_POLL:
-			/* make sure bnx2x_fp_lock_poll() wont starve us */
-			set_bit(BNX2X_STATE_FP_NAPI_REQ_BIT,
-				&fp->busy_poll_state);
-			/* fallthrough */
-		case BNX2X_STATE_FP_POLL | BNX2X_STATE_FP_NAPI_REQ:
-			return false;
-		default:
-			break;
-		}
-		prev = cmpxchg(&fp->busy_poll_state, old, BNX2X_STATE_FP_NAPI);
-		if (unlikely(prev != old)) {
-			old = prev;
-			continue;
-		}
-		return true;
-	}
-}
-
-static inline void bnx2x_fp_unlock_napi(struct bnx2x_fastpath *fp)
-{
-	smp_wmb();
-	fp->busy_poll_state = 0;
-}
-
-/* called from bnx2x_low_latency_poll() */
-static inline bool bnx2x_fp_lock_poll(struct bnx2x_fastpath *fp)
-{
-	return cmpxchg(&fp->busy_poll_state, 0, BNX2X_STATE_FP_POLL) == 0;
-}
-
-static inline void bnx2x_fp_unlock_poll(struct bnx2x_fastpath *fp)
-{
-	smp_mb__before_atomic();
-	clear_bit(BNX2X_STATE_FP_POLL_BIT, &fp->busy_poll_state);
-}
-
-/* true if a socket is polling */
-static inline bool bnx2x_fp_ll_polling(struct bnx2x_fastpath *fp)
-{
-	return READ_ONCE(fp->busy_poll_state) & BNX2X_STATE_FP_POLL;
-}
-
-/* false if fp is currently owned */
-static inline bool bnx2x_fp_ll_disable(struct bnx2x_fastpath *fp)
-{
-	set_bit(BNX2X_STATE_FP_DISABLE_BIT, &fp->busy_poll_state);
-	return !bnx2x_fp_ll_polling(fp);
-
-}
-#else
-static inline void bnx2x_fp_busy_poll_init(struct bnx2x_fastpath *fp)
-{
-}
-
-static inline bool bnx2x_fp_lock_napi(struct bnx2x_fastpath *fp)
-{
-	return true;
-}
-
-static inline void bnx2x_fp_unlock_napi(struct bnx2x_fastpath *fp)
-{
-}
-
-static inline bool bnx2x_fp_lock_poll(struct bnx2x_fastpath *fp)
-{
-	return false;
-}
-
-static inline void bnx2x_fp_unlock_poll(struct bnx2x_fastpath *fp)
-{
-}
-
-static inline bool bnx2x_fp_ll_polling(struct bnx2x_fastpath *fp)
-{
-	return false;
-}
-static inline bool bnx2x_fp_ll_disable(struct bnx2x_fastpath *fp)
-{
-	return true;
-}
-#endif /* CONFIG_NET_RX_BUSY_POLL */
-
 /* Use 2500 as a mini-jumbo MTU for FCoE */
 #define BNX2X_FCOE_MINI_JUMBO_MTU	2500
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index f8d7a2f06950..ca208a7eecd5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1096,10 +1096,7 @@ reuse_rx:
 
 		skb_mark_napi_id(skb, &fp->napi);
 
-		if (bnx2x_fp_ll_polling(fp))
-			netif_receive_skb(skb);
-		else
-			napi_gro_receive(&fp->napi, skb);
+		napi_gro_receive(&fp->napi, skb);
 next_rx:
 		rx_buf->data = NULL;
 
@@ -1869,7 +1866,6 @@ static void bnx2x_napi_enable_cnic(struct bnx2x *bp)
 	int i;
 
 	for_each_rx_queue_cnic(bp, i) {
-		bnx2x_fp_busy_poll_init(&bp->fp[i]);
 		napi_enable(&bnx2x_fp(bp, i, napi));
 	}
 }
@@ -1879,7 +1875,6 @@ static void bnx2x_napi_enable(struct bnx2x *bp)
 	int i;
 
 	for_each_eth_queue(bp, i) {
-		bnx2x_fp_busy_poll_init(&bp->fp[i]);
 		napi_enable(&bnx2x_fp(bp, i, napi));
 	}
 }
@@ -1890,8 +1885,6 @@ static void bnx2x_napi_disable_cnic(struct bnx2x *bp)
 
 	for_each_rx_queue_cnic(bp, i) {
 		napi_disable(&bnx2x_fp(bp, i, napi));
-		while (!bnx2x_fp_ll_disable(&bp->fp[i]))
-			usleep_range(1000, 2000);
 	}
 }
 
@@ -1901,8 +1894,6 @@ static void bnx2x_napi_disable(struct bnx2x *bp)
 
 	for_each_eth_queue(bp, i) {
 		napi_disable(&bnx2x_fp(bp, i, napi));
-		while (!bnx2x_fp_ll_disable(&bp->fp[i]))
-			usleep_range(1000, 2000);
 	}
 }
 
@@ -3232,9 +3223,6 @@ static int bnx2x_poll(struct napi_struct *napi, int budget)
 			return 0;
 		}
 #endif
-		if (!bnx2x_fp_lock_napi(fp))
-			return budget;
-
 		for_each_cos_in_tx_queue(fp, cos)
 			if (bnx2x_tx_queue_has_work(fp->txdata_ptr[cos]))
 				bnx2x_tx_int(bp, fp->txdata_ptr[cos]);
@@ -3243,14 +3231,10 @@ static int bnx2x_poll(struct napi_struct *napi, int budget)
 			work_done += bnx2x_rx_int(fp, budget - work_done);
 
 			/* must not complete if we consumed full budget */
-			if (work_done >= budget) {
-				bnx2x_fp_unlock_napi(fp);
+			if (work_done >= budget)
 				break;
-			}
 		}
 
-		bnx2x_fp_unlock_napi(fp);
-
 		/* Fall out from the NAPI loop if needed */
 		if (!(bnx2x_has_rx_work(fp) || bnx2x_has_tx_work(fp))) {
 
@@ -3294,32 +3278,6 @@ static int bnx2x_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-/* must be called with local_bh_disable()d */
-int bnx2x_low_latency_recv(struct napi_struct *napi)
-{
-	struct bnx2x_fastpath *fp = container_of(napi, struct bnx2x_fastpath,
-						 napi);
-	struct bnx2x *bp = fp->bp;
-	int found = 0;
-
-	if ((bp->state == BNX2X_STATE_CLOSED) ||
-	    (bp->state == BNX2X_STATE_ERROR) ||
-	    (bp->dev->features & (NETIF_F_LRO | NETIF_F_GRO)))
-		return LL_FLUSH_FAILED;
-
-	if (!bnx2x_fp_lock_poll(fp))
-		return LL_FLUSH_BUSY;
-
-	if (bnx2x_has_rx_work(fp))
-		found = bnx2x_rx_int(fp, 4);
-
-	bnx2x_fp_unlock_poll(fp);
-
-	return found;
-}
-#endif
-
 /* we split the first BD into headers and data BDs
  * to ease the pain of our fellow microcode engineers
  * we use one mapping for both BDs
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index b7d32e8412f1..4cbb03f87b5a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -570,13 +570,6 @@ int bnx2x_enable_msix(struct bnx2x *bp);
 int bnx2x_enable_msi(struct bnx2x *bp);
 
 /**
- * bnx2x_low_latency_recv - LL callback
- *
- * @napi:	napi structure
- */
-int bnx2x_low_latency_recv(struct napi_struct *napi);
-
-/**
  * bnx2x_alloc_mem_bp - allocate memories outsize main driver structure
  *
  * @bp:		driver handle
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index f1d62d5dbaff..4c26aff9c0e5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13004,9 +13004,6 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 	.ndo_fcoe_get_wwn	= bnx2x_fcoe_get_wwn,
 #endif
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-	.ndo_busy_poll		= bnx2x_low_latency_recv,
-#endif
 	.ndo_get_phys_port_id	= bnx2x_get_phys_port_id,
 	.ndo_set_vf_link_state	= bnx2x_set_vf_link_state,
 	.ndo_features_check	= bnx2x_features_check,
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 9/9] mlx4: remove mlx4_en_low_latency_recv()
  2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
                   ` (7 preceding siblings ...)
  2015-11-17 13:57 ` [PATCH net-next 8/9] bnx2x: remove bnx2x_low_latency_recv() support Eric Dumazet
@ 2015-11-17 13:57 ` Eric Dumazet
  8 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 13:57 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet, Eric Dumazet

Busy polling can now be handled in generic NAPI poll infrastructure.
This removes complexity and fast path overhead :

mlx4 used two spin_lock()/spin_unlock() pair per napi->poll() call
in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi()

Tested:

Without busy polling :

lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    47330.78

With busy polling :

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97643.55

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  17 ----
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  40 --------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c      |  15 +--
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    | 128 ------------------------
 4 files changed, 2 insertions(+), 198 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index ddb5541882f5..dd84cabb2a51 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -337,11 +337,7 @@ static int mlx4_en_get_sset_count(struct net_device *dev, int sset)
 	case ETH_SS_STATS:
 		return bitmap_iterator_count(&it) +
 			(priv->tx_ring_num * 2) +
-#ifdef CONFIG_NET_RX_BUSY_POLL
-			(priv->rx_ring_num * 5);
-#else
 			(priv->rx_ring_num * 2);
-#endif
 	case ETH_SS_TEST:
 		return MLX4_EN_NUM_SELF_TEST - !(priv->mdev->dev->caps.flags
 					& MLX4_DEV_CAP_FLAG_UC_LOOPBACK) * 2;
@@ -408,11 +404,6 @@ static void mlx4_en_get_ethtool_stats(struct net_device *dev,
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		data[index++] = priv->rx_ring[i]->packets;
 		data[index++] = priv->rx_ring[i]->bytes;
-#ifdef CONFIG_NET_RX_BUSY_POLL
-		data[index++] = priv->rx_ring[i]->yields;
-		data[index++] = priv->rx_ring[i]->misses;
-		data[index++] = priv->rx_ring[i]->cleaned;
-#endif
 	}
 	spin_unlock_bh(&priv->stats_lock);
 
@@ -486,14 +477,6 @@ static void mlx4_en_get_strings(struct net_device *dev,
 				"rx%d_packets", i);
 			sprintf(data + (index++) * ETH_GSTRING_LEN,
 				"rx%d_bytes", i);
-#ifdef CONFIG_NET_RX_BUSY_POLL
-			sprintf(data + (index++) * ETH_GSTRING_LEN,
-				"rx%d_napi_yield", i);
-			sprintf(data + (index++) * ETH_GSTRING_LEN,
-				"rx%d_misses", i);
-			sprintf(data + (index++) * ETH_GSTRING_LEN,
-				"rx%d_cleaned", i);
-#endif
 		}
 		break;
 	case ETH_SS_PRIV_FLAGS:
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 886e1bc86374..659209ff7af6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -69,34 +69,6 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-/* must be called with local_bh_disable()d */
-static int mlx4_en_low_latency_recv(struct napi_struct *napi)
-{
-	struct mlx4_en_cq *cq = container_of(napi, struct mlx4_en_cq, napi);
-	struct net_device *dev = cq->dev;
-	struct mlx4_en_priv *priv = netdev_priv(dev);
-	struct mlx4_en_rx_ring *rx_ring = priv->rx_ring[cq->ring];
-	int done;
-
-	if (!priv->port_up)
-		return LL_FLUSH_FAILED;
-
-	if (!mlx4_en_cq_lock_poll(cq))
-		return LL_FLUSH_BUSY;
-
-	done = mlx4_en_process_rx_cq(dev, cq, 4);
-	if (likely(done))
-		rx_ring->cleaned += done;
-	else
-		rx_ring->misses++;
-
-	mlx4_en_cq_unlock_poll(cq);
-
-	return done;
-}
-#endif	/* CONFIG_NET_RX_BUSY_POLL */
-
 #ifdef CONFIG_RFS_ACCEL
 
 struct mlx4_en_filter {
@@ -1561,8 +1533,6 @@ int mlx4_en_start_port(struct net_device *dev)
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		cq = priv->rx_cq[i];
 
-		mlx4_en_cq_init_lock(cq);
-
 		err = mlx4_en_init_affinity_hint(priv, i);
 		if (err) {
 			en_err(priv, "Failed preparing IRQ affinity hint\n");
@@ -1859,13 +1829,6 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		struct mlx4_en_cq *cq = priv->rx_cq[i];
 
-		local_bh_disable();
-		while (!mlx4_en_cq_lock_napi(cq)) {
-			pr_info("CQ %d locked\n", i);
-			mdelay(1);
-		}
-		local_bh_enable();
-
 		napi_synchronize(&cq->napi);
 		mlx4_en_deactivate_rx_ring(priv, priv->rx_ring[i]);
 		mlx4_en_deactivate_cq(priv, cq);
@@ -2504,9 +2467,6 @@ static const struct net_device_ops mlx4_netdev_ops = {
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
 #endif
-#ifdef CONFIG_NET_RX_BUSY_POLL
-	.ndo_busy_poll		= mlx4_en_low_latency_recv,
-#endif
 	.ndo_get_phys_port_id	= mlx4_en_get_phys_port_id,
 #ifdef CONFIG_MLX4_EN_VXLAN
 	.ndo_add_vxlan_port	= mlx4_en_add_vxlan_port,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index e7a5000aa12c..1feead34093b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -873,10 +873,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 		 * - TCP/IP (v4)
 		 * - without IP options
 		 * - not an IP fragment
-		 * - no LLS polling in progress
 		 */
-		if (!mlx4_en_cq_busy_polling(cq) &&
-		    (dev->features & NETIF_F_GRO)) {
+		if (dev->features & NETIF_F_GRO) {
 			struct sk_buff *gro_skb = napi_get_frags(&cq->napi);
 			if (!gro_skb)
 				goto next;
@@ -992,11 +990,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 
 		skb_mark_napi_id(skb, &cq->napi);
 
-		if (!mlx4_en_cq_busy_polling(cq))
-			napi_gro_receive(&cq->napi, skb);
-		else
-			netif_receive_skb(skb);
-
+		napi_gro_receive(&cq->napi, skb);
 next:
 		for (nr = 0; nr < priv->num_frags; nr++)
 			mlx4_en_free_frag(priv, frags, nr);
@@ -1038,13 +1032,8 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int done;
 
-	if (!mlx4_en_cq_lock_napi(cq))
-		return budget;
-
 	done = mlx4_en_process_rx_cq(dev, cq, budget);
 
-	mlx4_en_cq_unlock_napi(cq);
-
 	/* If we used up all the quota - we're probably not done yet... */
 	if (done == budget) {
 		const struct cpumask *aff;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 965c8f016ac4..35de7d2e6b34 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -320,11 +320,6 @@ struct mlx4_en_rx_ring {
 	void *rx_info;
 	unsigned long bytes;
 	unsigned long packets;
-#ifdef CONFIG_NET_RX_BUSY_POLL
-	unsigned long yields;
-	unsigned long misses;
-	unsigned long cleaned;
-#endif
 	unsigned long csum_ok;
 	unsigned long csum_none;
 	unsigned long csum_complete;
@@ -347,18 +342,6 @@ struct mlx4_en_cq {
 	struct mlx4_cqe *buf;
 #define MLX4_EN_OPCODE_ERROR	0x1e
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-	unsigned int state;
-#define MLX4_EN_CQ_STATE_IDLE        0
-#define MLX4_EN_CQ_STATE_NAPI     1    /* NAPI owns this CQ */
-#define MLX4_EN_CQ_STATE_POLL     2    /* poll owns this CQ */
-#define MLX4_CQ_LOCKED (MLX4_EN_CQ_STATE_NAPI | MLX4_EN_CQ_STATE_POLL)
-#define MLX4_EN_CQ_STATE_NAPI_YIELD  4    /* NAPI yielded this CQ */
-#define MLX4_EN_CQ_STATE_POLL_YIELD  8    /* poll yielded this CQ */
-#define CQ_YIELD (MLX4_EN_CQ_STATE_NAPI_YIELD | MLX4_EN_CQ_STATE_POLL_YIELD)
-#define CQ_USER_PEND (MLX4_EN_CQ_STATE_POLL | MLX4_EN_CQ_STATE_POLL_YIELD)
-	spinlock_t poll_lock; /* protects from LLS/napi conflicts */
-#endif  /* CONFIG_NET_RX_BUSY_POLL */
 	struct irq_desc *irq_desc;
 };
 
@@ -622,117 +605,6 @@ static inline struct mlx4_cqe *mlx4_en_get_cqe(void *buf, int idx, int cqe_sz)
 	return buf + idx * cqe_sz;
 }
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-static inline void mlx4_en_cq_init_lock(struct mlx4_en_cq *cq)
-{
-	spin_lock_init(&cq->poll_lock);
-	cq->state = MLX4_EN_CQ_STATE_IDLE;
-}
-
-/* called from the device poll rutine to get ownership of a cq */
-static inline bool mlx4_en_cq_lock_napi(struct mlx4_en_cq *cq)
-{
-	int rc = true;
-	spin_lock(&cq->poll_lock);
-	if (cq->state & MLX4_CQ_LOCKED) {
-		WARN_ON(cq->state & MLX4_EN_CQ_STATE_NAPI);
-		cq->state |= MLX4_EN_CQ_STATE_NAPI_YIELD;
-		rc = false;
-	} else
-		/* we don't care if someone yielded */
-		cq->state = MLX4_EN_CQ_STATE_NAPI;
-	spin_unlock(&cq->poll_lock);
-	return rc;
-}
-
-/* returns true is someone tried to get the cq while napi had it */
-static inline bool mlx4_en_cq_unlock_napi(struct mlx4_en_cq *cq)
-{
-	int rc = false;
-	spin_lock(&cq->poll_lock);
-	WARN_ON(cq->state & (MLX4_EN_CQ_STATE_POLL |
-			       MLX4_EN_CQ_STATE_NAPI_YIELD));
-
-	if (cq->state & MLX4_EN_CQ_STATE_POLL_YIELD)
-		rc = true;
-	cq->state = MLX4_EN_CQ_STATE_IDLE;
-	spin_unlock(&cq->poll_lock);
-	return rc;
-}
-
-/* called from mlx4_en_low_latency_recv(), BH are disabled */
-static inline bool mlx4_en_cq_lock_poll(struct mlx4_en_cq *cq)
-{
-	int rc = true;
-
-	spin_lock(&cq->poll_lock);
-	if ((cq->state & MLX4_CQ_LOCKED)) {
-		struct net_device *dev = cq->dev;
-		struct mlx4_en_priv *priv = netdev_priv(dev);
-		struct mlx4_en_rx_ring *rx_ring = priv->rx_ring[cq->ring];
-
-		cq->state |= MLX4_EN_CQ_STATE_POLL_YIELD;
-		rc = false;
-		rx_ring->yields++;
-	} else
-		/* preserve yield marks */
-		cq->state |= MLX4_EN_CQ_STATE_POLL;
-	spin_unlock(&cq->poll_lock);
-	return rc;
-}
-
-/* returns true if someone tried to get the cq while it was locked */
-static inline bool mlx4_en_cq_unlock_poll(struct mlx4_en_cq *cq)
-{
-	int rc = false;
-
-	spin_lock(&cq->poll_lock);
-	WARN_ON(cq->state & (MLX4_EN_CQ_STATE_NAPI));
-
-	if (cq->state & MLX4_EN_CQ_STATE_POLL_YIELD)
-		rc = true;
-	cq->state = MLX4_EN_CQ_STATE_IDLE;
-	spin_unlock(&cq->poll_lock);
-	return rc;
-}
-
-/* true if a socket is polling, even if it did not get the lock */
-static inline bool mlx4_en_cq_busy_polling(struct mlx4_en_cq *cq)
-{
-	WARN_ON(!(cq->state & MLX4_CQ_LOCKED));
-	return cq->state & CQ_USER_PEND;
-}
-#else
-static inline void mlx4_en_cq_init_lock(struct mlx4_en_cq *cq)
-{
-}
-
-static inline bool mlx4_en_cq_lock_napi(struct mlx4_en_cq *cq)
-{
-	return true;
-}
-
-static inline bool mlx4_en_cq_unlock_napi(struct mlx4_en_cq *cq)
-{
-	return false;
-}
-
-static inline bool mlx4_en_cq_lock_poll(struct mlx4_en_cq *cq)
-{
-	return false;
-}
-
-static inline bool mlx4_en_cq_unlock_poll(struct mlx4_en_cq *cq)
-{
-	return false;
-}
-
-static inline bool mlx4_en_cq_busy_polling(struct mlx4_en_cq *cq)
-{
-	return false;
-}
-#endif /* CONFIG_NET_RX_BUSY_POLL */
-
 #define MLX4_EN_WOL_DO_MODIFY (1ULL << 63)
 
 void mlx4_en_update_loopback_state(struct net_device *dev,
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 3/9] net: un-inline sk_busy_loop()
  2015-11-17 13:56 ` [PATCH net-next 3/9] net: un-inline sk_busy_loop() Eric Dumazet
@ 2015-11-17 14:15   ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 14:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eli Cohen, Amir Vadai, Ariel Elior,
	Willem de Bruijn, Rida Assaf

On Tue, 2015-11-17 at 05:56 -0800, Eric Dumazet wrote:
> There is really little gain from inlining this big function.
> We'll soon make it even bigger in following patches.

I forgot to guard sk_busy_poll() with appropriate #ifdef :

This will be added in v2

diff --git a/net/core/dev.c b/net/core/dev.c
index 21a7d4093fde..c87c8486ea21 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4673,6 +4673,7 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 	return NULL;
 }
 
+#if defined(CONFIG_NET_RX_BUSY_POLL)
 #define BUSY_POLL_BUDGET 8
 bool sk_busy_loop(struct sock *sk, int nonblock)
 {
@@ -4726,6 +4727,7 @@ out:
 	return rc;
 }
 EXPORT_SYMBOL(sk_busy_loop);
+#endif
 
 void napi_hash_add(struct napi_struct *napi)
 {

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll()
  2015-11-17 13:57 ` [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll() Eric Dumazet
@ 2015-11-17 15:14   ` Eliezer Tamir
  2015-11-17 15:33     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eliezer Tamir @ 2015-11-17 15:14 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eli Cohen, Amir Vadai, Ariel Elior, Willem de Bruijn,
	Rida Assaf, Eric Dumazet

On 17/11/2015 15:57, Eric Dumazet wrote:
> Instead of having to implement complex ndo_busy_poll() method,
> drivers can simply rely on NAPI poll logic.

I really like where you are going with this series.

...

> We could go one step further, and make busy polling
> available for all NAPI drivers, but this would require
> that all netif_napi_del() calls are done in process context
> so that we can call synchronize_rcu().
> Full audit would be required.
> 
> Before this is done, a driver still needs to call :
> 
> - skb_mark_napi_id() for each skb provided to the stack, although we can
>   easily do this directly in core networking stack in a future patch.
> 
> - napi_hash_add() and napi_hash_del() to allocate/free a napi_id per napi.
> 
> - Make sure RCU grace period is respected after napi_hash_del() before
>   memory containing napi structure is freed.

Can we move all of these into the NAPI infrastructure?
Maybe hash add/del can be inside netif_napi_add/del.
Some way to force the right RCU behavior, and busy polling is
completely driver-agnostic, which IMHO outweighs the small gains
you can have by micro-optimizing ndo_busy_poll.

On another note, any thoughts about unifying poll_controller with
regular poll?

cheers,
Eliezer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll()
  2015-11-17 15:14   ` Eliezer Tamir
@ 2015-11-17 15:33     ` Eric Dumazet
  2015-11-17 16:03       ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 15:33 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Eric Dumazet, David S . Miller, netdev, Eli Cohen, Amir Vadai,
	Ariel Elior, Willem de Bruijn, Rida Assaf

On Tue, 2015-11-17 at 17:14 +0200, Eliezer Tamir wrote:
> On 17/11/2015 15:57, Eric Dumazet wrote:
> > Instead of having to implement complex ndo_busy_poll() method,
> > drivers can simply rely on NAPI poll logic.
> 
> I really like where you are going with this series.
> 
> ...
> 
> > We could go one step further, and make busy polling
> > available for all NAPI drivers, but this would require
> > that all netif_napi_del() calls are done in process context
> > so that we can call synchronize_rcu().
> > Full audit would be required.
> > 
> > Before this is done, a driver still needs to call :
> > 
> > - skb_mark_napi_id() for each skb provided to the stack, although we can
> >   easily do this directly in core networking stack in a future patch.
> > 
> > - napi_hash_add() and napi_hash_del() to allocate/free a napi_id per napi.
> > 
> > - Make sure RCU grace period is respected after napi_hash_del() before
> >   memory containing napi structure is freed.
> 
> Can we move all of these into the NAPI infrastructure?
> Maybe hash add/del can be inside netif_napi_add/del.
> Some way to force the right RCU behavior, and busy polling is
> completely driver-agnostic, which IMHO outweighs the small gains
> you can have by micro-optimizing ndo_busy_poll.

This is exactly what I suggested in the changelog ;)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll()
  2015-11-17 15:33     ` Eric Dumazet
@ 2015-11-17 16:03       ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-11-17 16:03 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Eric Dumazet, David S . Miller, netdev, Eli Cohen, Amir Vadai,
	Ariel Elior, Willem de Bruijn, Rida Assaf

On Tue, 2015-11-17 at 07:33 -0800, Eric Dumazet wrote:

> This is exactly what I suggested in the changelog ;)

In v2 I am adding a patch to move skb_mark_napi_id() out of drivers into
net core stack (napi_gro_receive() & napi_gro_frags())

But adding the rcu stuff in napi_napi_del() wont be a trivial patch.

Also, mlx4 uses a NAPI for TX completions only, I suspect other drivers
might do the same, so we would fill napi_hash[] with napi contexts that
would not be used in RX at all.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 1/9] net: better skb->sender_cpu and skb->napi_id cohabitation
  2015-11-17 13:56 ` [PATCH net-next 1/9] net: better skb->sender_cpu and skb->napi_id cohabitation Eric Dumazet
@ 2015-11-17 17:42   ` Alexei Starovoitov
  0 siblings, 0 replies; 15+ messages in thread
From: Alexei Starovoitov @ 2015-11-17 17:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eli Cohen, Amir Vadai, Ariel Elior,
	Willem de Bruijn, Rida Assaf, Eric Dumazet

On Tue, Nov 17, 2015 at 05:56:56AM -0800, Eric Dumazet wrote:
> skb->sender_cpu and skb->napi_id share a common storage,
> and we had various bugs about this.
> 
> We had to call skb_sender_cpu_clear() in some places to
> not leave a prior skb->napi_id and fool netdev_pick_tx()
> 
> As suggested by Alexei, we could split the space so that
> these errors can not happen.
> 
> 0 value being reserved as the common (not initialized) value,
> let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
> and [NR_CPUS+1 .. ~0U] for valid napi_id.

nice. that's a cleaner and simpler range split.

> This will allow proper busy polling support over tunnels.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Alexei Starovoitov <ast@kernel.org>

Great series of patches! Looking forward to try them out on rpc workloads.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-17 17:42 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-17 13:56 [PATCH net-next 0/9] net: extend busy polling support Eric Dumazet
2015-11-17 13:56 ` [PATCH net-next 1/9] net: better skb->sender_cpu and skb->napi_id cohabitation Eric Dumazet
2015-11-17 17:42   ` Alexei Starovoitov
2015-11-17 13:56 ` [PATCH net-next 2/9] mlx4: mlx4_en_low_latency_recv() called with BH disabled Eric Dumazet
2015-11-17 13:56 ` [PATCH net-next 3/9] net: un-inline sk_busy_loop() Eric Dumazet
2015-11-17 14:15   ` Eric Dumazet
2015-11-17 13:56 ` [PATCH net-next 4/9] net: allow BH servicing in sk_busy_loop() Eric Dumazet
2015-11-17 13:57 ` [PATCH net-next 5/9] net: network drivers no longer need to implement ndo_busy_poll() Eric Dumazet
2015-11-17 15:14   ` Eliezer Tamir
2015-11-17 15:33     ` Eric Dumazet
2015-11-17 16:03       ` Eric Dumazet
2015-11-17 13:57 ` [PATCH net-next 6/9] mlx5: add busy polling support Eric Dumazet
2015-11-17 13:57 ` [PATCH net-next 7/9] mlx5: support napi_complete_done() Eric Dumazet
2015-11-17 13:57 ` [PATCH net-next 8/9] bnx2x: remove bnx2x_low_latency_recv() support Eric Dumazet
2015-11-17 13:57 ` [PATCH net-next 9/9] mlx4: remove mlx4_en_low_latency_recv() Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.