Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 03/15] net: bcm63xx_enet: Utilize phy_ethtool_nway_reset
From: Heinrich Schuchardt @ 2016-11-15 19:59 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: davem, andrew, tremyfr,
	maintainer:BROADCOM BCM63XX ARM ARCHITECTURE, Arnd Bergmann,
	xypron.glpk@gmx.de, Jarod Wilson, Thierry Reding,
	moderated list:BROADCOM BCM63XX ARM ARCHITECTURE, open list
In-Reply-To: <20161115180644.3941-4-f.fainelli@gmail.com>

On 11/15/2016 07:06 PM, Florian Fainelli wrote:
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Each patch of your patch series should have a commit message.

You can add the missing commit messages with
git rebase -i HEAD~15
choosing reword.

Please, add a cover letter message describing the patch series using

git format-patch -ns --cover-letter HEAD~15..HEAD

and provide a summary in the 0/15 message.

The cover letter and all patches should be in one mails thhread.
If you put all messages into one directory you can send the series as
one mail thread with

git send-email <directory>

> ---
>  drivers/net/ethernet/broadcom/bcm63xx_enet.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
> index 5c7acef1de2e..a43ab90c051e 100644
> --- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
> +++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
> @@ -1434,11 +1434,8 @@ static int bcm_enet_nway_reset(struct net_device *dev)
>  	struct bcm_enet_priv *priv;
>  
>  	priv = netdev_priv(dev);
> -	if (priv->has_phy) {
> -		if (!dev->phydev)
> -			return -ENODEV;
> -		return genphy_restart_aneg(dev->phydev);
> -	}
> +	if (priv->has_phy)
> +		return phy_ethtool_nway_reset(dev),
>  
>  	return -EOPNOTSUPP;
>  }
> 

^ permalink raw reply

* Re: [net PATCH 1/2] ipv4: Restore fib_trie_flush_external function and fix call ordering
From: Jiri Pirko @ 2016-11-15 19:51 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Eric Dumazet, Jiri Pirko, davem
In-Reply-To: <20161115104606.13711.37506.stgit@ahduyck-blue-test.jf.intel.com>

Tue, Nov 15, 2016 at 11:46:06AM CET, alexander.h.duyck@intel.com wrote:
>The patch that removed the FIB offload infrastructure was a bit too
>aggressive and also removed code needed to clean up us splitting the table
>if additional rules were added.  Specifically the function
>fib_trie_flush_external was called at the end of a new rule being added to
>flush the foreign trie entries from the main trie.
>
>I updated the code so that we only call fib_trie_flush_external on the main
>table so that we flush the entries for local from main.  This way we don't
>call it for every rule change which is what was happening previously.

Well, the function was introduced by:

commit 104616e74e0b464d449fdd2ee2f547d2fad71610
Author: Scott Feldman <sfeldma@gmail.com>
Date:   Thu Mar 5 21:21:16 2015 -0800

    switchdev: don't support custom ip rules, for now
    
    Keep switchdev FIB offload model simple for now and don't allow custom ip
    rules.

Why this was not needed before? What changed in between:
104616e74e0b464d449fdd2ee2f547d2fad71610 ("switchdev: don't support custom ip rules, for now")
and
347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure")


?


>
>Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure")
>Reported-by: Eric Dumazet <edumazet@google.com>
>Cc: Jiri Pirko <jiri@mellanox.com>
>Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>---
> include/net/ip_fib.h    |    1 +
> net/ipv4/fib_frontend.c |   20 +++++++++++---
> net/ipv4/fib_trie.c     |   65 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 81 insertions(+), 5 deletions(-)
>
>diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
>index b9314b4..f390c3b 100644
>--- a/include/net/ip_fib.h
>+++ b/include/net/ip_fib.h
>@@ -243,6 +243,7 @@ int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
> 		   struct netlink_callback *cb);
> int fib_table_flush(struct net *net, struct fib_table *table);
> struct fib_table *fib_trie_unmerge(struct fib_table *main_tb);
>+void fib_table_flush_external(struct fib_table *table);
> void fib_free_table(struct fib_table *tb);
> 
> #ifndef CONFIG_IP_MULTIPLE_TABLES
>diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>index c3b8047..161fc0f 100644
>--- a/net/ipv4/fib_frontend.c
>+++ b/net/ipv4/fib_frontend.c
>@@ -151,7 +151,7 @@ static void fib_replace_table(struct net *net, struct fib_table *old,
> 
> int fib_unmerge(struct net *net)
> {
>-	struct fib_table *old, *new;
>+	struct fib_table *old, *new, *main_table;
> 
> 	/* attempt to fetch local table if it has been allocated */
> 	old = fib_get_table(net, RT_TABLE_LOCAL);
>@@ -162,11 +162,21 @@ int fib_unmerge(struct net *net)
> 	if (!new)
> 		return -ENOMEM;
> 
>+	/* table is already unmerged */
>+	if (new == old)
>+		return 0;
>+
> 	/* replace merged table with clean table */
>-	if (new != old) {
>-		fib_replace_table(net, old, new);
>-		fib_free_table(old);
>-	}
>+	fib_replace_table(net, old, new);
>+	fib_free_table(old);
>+
>+	/* attempt to fetch main table if it has been allocated */
>+	main_table = fib_get_table(net, RT_TABLE_MAIN);
>+	if (!main_table)
>+		return 0;
>+
>+	/* flush local entries from main table */
>+	fib_table_flush_external(main_table);
> 
> 	return 0;
> }
>diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>index 4cff74d..735edc9 100644
>--- a/net/ipv4/fib_trie.c
>+++ b/net/ipv4/fib_trie.c
>@@ -1760,6 +1760,71 @@ struct fib_table *fib_trie_unmerge(struct fib_table *oldtb)
> 	return NULL;
> }
> 
>+/* Caller must hold RTNL */
>+void fib_table_flush_external(struct fib_table *tb)
>+{
>+	struct trie *t = (struct trie *)tb->tb_data;
>+	struct key_vector *pn = t->kv;
>+	unsigned long cindex = 1;
>+	struct hlist_node *tmp;
>+	struct fib_alias *fa;
>+
>+	/* walk trie in reverse order */
>+	for (;;) {
>+		unsigned char slen = 0;
>+		struct key_vector *n;
>+
>+		if (!(cindex--)) {
>+			t_key pkey = pn->key;
>+
>+			/* cannot resize the trie vector */
>+			if (IS_TRIE(pn))
>+				break;
>+
>+			/* resize completed node */
>+			pn = resize(t, pn);
>+			cindex = get_index(pkey, pn);
>+
>+			continue;
>+		}
>+
>+		/* grab the next available node */
>+		n = get_child(pn, cindex);
>+		if (!n)
>+			continue;
>+
>+		if (IS_TNODE(n)) {
>+			/* record pn and cindex for leaf walking */
>+			pn = n;
>+			cindex = 1ul << n->bits;
>+
>+			continue;
>+		}
>+
>+		hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
>+			/* if alias was cloned to local then we just
>+			 * need to remove the local copy from main
>+			 */
>+			if (tb->tb_id != fa->tb_id) {
>+				hlist_del_rcu(&fa->fa_list);
>+				alias_free_mem_rcu(fa);
>+				continue;
>+			}
>+
>+			/* record local slen */
>+			slen = fa->fa_slen;
>+		}
>+
>+		/* update leaf slen */
>+		n->slen = slen;
>+
>+		if (hlist_empty(&n->leaf)) {
>+			put_child_root(pn, n->key, NULL);
>+			node_free(n);
>+		}
>+	}
>+}
>+
> /* Caller must hold RTNL. */
> int fib_table_flush(struct net *net, struct fib_table *tb)
> {
>

^ permalink raw reply

* Re: [PATCH] icmp: Restore resistence to abnormal messages
From: Vicente Jiménez @ 2016-11-15 19:32 UTC (permalink / raw)
  To: Florian Westphal
  Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, linux-kernel
In-Reply-To: <20161115173024.GD30581@breakpoint.cc>

I agree that both patches try to solve the same problem in a very similar way.
Florian Westphal's patch do two more things:
1- add warning with pr_warn_ratelimited. I like this idea. I also
though about adding some message but I have no kernel experience and I
preferred to have just a working solution.
2- Check if the packet size is lower than (536 + 8). I think this is
not necessary because low values (even the zero case) is already
handled by the protocol. Also I don't understand why you choose this
value, it seems to be related to TCP MSS and the compared value is IP
packet size.

Finally, both patches decrement current packet by a value: Mine by 2
and Florian's by 8 bytes. Both arbitrary values. Personally I prefer
to go by small steps. If the small step fails, it just iterate again
and with 4 iterations, my patch also decrement the original value by 8
bytes (4x2).
Basically they are the same but my patch take smaller steps and miss
the warning message.

If David Miller thinks this could be a good addition, I'll add the
warning message to my patch.

We can also discuss the amount to subtract.

On Tue, Nov 15, 2016 at 6:30 PM, Florian Westphal <fw@strlen.de> wrote:
> David Miller <davem@redhat.com> wrote:
>> From: Vicente Jiménez <googuy@gmail.com>
>> Date: Tue, 15 Nov 2016 17:49:43 +0100
>>
>> > On Mon, Nov 14, 2016 at 7:36 PM, David Miller <davem@davemloft.net> wrote:
>> >> From: Vicente Jimenez Aguilar <googuy@gmail.com>
>> >> Date: Fri, 11 Nov 2016 21:20:18 +0100
>> >>
>> >>> @@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb)
>> >>>                               /* fall through */
>> >>>                       case 0:
>> >>>                               info = ntohs(icmph->un.frag.mtu);
>> >>> +                             /* Handle weird case where next hop MTU is
>> >>> +                              * equal to or exceeding dropped packet size
>> >>> +                              */
>> >>> +                             old_mtu = ntohs(iph->tot_len);
>> >>> +                             if (info >= old_mtu)
>> >>> +                                     info = old_mtu - 2;
>> >>
>> >> This isn't something the old code did.
>> >>
>> >> The old code behaved much differently.
>> >>
>> > I don't wanted to restore old behavior just fix a strange case that
>> > was handle by this code where the next hop MTU reported by the router
>> > is equal or greater than the actual path MTU. Because router
>> > information is wrong, we need a way to guess a good packet size
>> > ignoring router data. The simplest strategy that avoid odd numbers is
>> > reducing dropped packet size by 2.
>>
>> This whole approach seems arbitrary.
>>
>> You haven't discussed in any way, what causes this in the first place.
>> And what about that cause makes simply subtracting by 2 work well or
>> not.
>>
>> You have a very locallized, specific, situation on your end you want
>> to fix.  But we must accept changes that handle things generically and
>> in a way that would help more than just your specific case.
>
> FWIW this is similar to the patch I sent a while ago:
>
> https://patchwork.ozlabs.org/patch/493997/
>
> I think in interest of robustness principle ("eat shit and don't die")
> one of these changes should go in :-|



-- 
saludos
vicente

^ permalink raw reply

* Re: [PATCH 04/15] net: mv643xx_eth: Utilize phy_ethtool_nway_reset
From: Andrew Lunn @ 2016-11-15 19:28 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem, tremyfr, Sebastian Hesselbarth, open list
In-Reply-To: <20161115180644.3941-5-f.fainelli@gmail.com>

On Tue, Nov 15, 2016 at 10:06:33AM -0800, Florian Fainelli wrote:
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH net-next] bpf: Fix compilation warning in __bpf_lru_list_rotate_inactive
From: Martin KaFai Lau @ 2016-11-15 19:00 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Alexei Starovoitov, Daniel Borkmann, Kernel Team

gcc-6.2.1 gives the following warning:
kernel/bpf/bpf_lru_list.c: In function ‘__bpf_lru_list_rotate_inactive.isra.3’:
kernel/bpf/bpf_lru_list.c:201:28: warning: ‘next’ may be used uninitialized in this function [-Wmaybe-uninitialized]

The "next" is currently initialized in the while() loop which must have >=1
iterations.

This patch initializes next to get rid of the compiler warning.

Fixes: 3a08c2fd7634 ("bpf: LRU List")
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 kernel/bpf/bpf_lru_list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
index bfebff0..89b7ef4 100644
--- a/kernel/bpf/bpf_lru_list.c
+++ b/kernel/bpf/bpf_lru_list.c
@@ -170,7 +170,7 @@ static void __bpf_lru_list_rotate_inactive(struct bpf_lru *lru,
 					   struct bpf_lru_list *l)
 {
 	struct list_head *inactive = &l->lists[BPF_LRU_LIST_T_INACTIVE];
-	struct list_head *cur, *next, *last;
+	struct list_head *cur, *last, *next = inactive;
 	struct bpf_lru_node *node;
 	unsigned int i = 0;
 
-- 
2.5.1

^ permalink raw reply related

* [PATCH net-next 0/5] net: Implenent ethtool::nway_reset for a few drivers
From: Florian Fainelli @ 2016-11-15 19:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, andrew, tremyfr, Florian Fainelli

Hi all,

This patch series depends on "net: phy: Centralize auto-negotation restart"
since it provides phy_ethtool_nway_reset as a helper function.

The drivers here already support PHYLIB, so there really is no reason why
restarting auto-negotiation would not be possible with these.

Florian Fainelli (5):
  net: stmmac: Implement ethtool::nway_reset
  net: ethoc: Implement ethtool::nway_reset
  net: ethernet: mvneta: Implement ethtool::nway_reset
  net: ethernet: mvpp2: Implement ethtool::nway_reset
  net: ethernet: marvell: pxa168_eth: Implement ethtool::nway_reset

 drivers/net/ethernet/ethoc.c                         | 1 +
 drivers/net/ethernet/marvell/mvneta.c                | 1 +
 drivers/net/ethernet/marvell/mvpp2.c                 | 1 +
 drivers/net/ethernet/marvell/pxa168_eth.c            | 1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 1 +
 5 files changed, 5 insertions(+)

-- 
2.9.3

^ permalink raw reply

* [PATCH net-next 5/5] net: ethernet: marvell: pxa168_eth: Implement ethtool::nway_reset
From: Florian Fainelli @ 2016-11-15 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, tremyfr, Florian Fainelli, Sergei Shtylyov,
	Jisheng Zhang, Peter Chen, Jarod Wilson, Florian Westphal,
	open list
In-Reply-To: <20161115191949.15361-1-f.fainelli@gmail.com>

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/marvell/pxa168_eth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c b/drivers/net/ethernet/marvell/pxa168_eth.c
index b78a838f306c..3af2814ada23 100644
--- a/drivers/net/ethernet/marvell/pxa168_eth.c
+++ b/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -1393,6 +1393,7 @@ static void pxa168_get_drvinfo(struct net_device *dev,
 
 static const struct ethtool_ops pxa168_ethtool_ops = {
 	.get_drvinfo	= pxa168_get_drvinfo,
+	.nway_reset	= phy_ethtool_nway_reset,
 	.get_link	= ethtool_op_get_link,
 	.get_ts_info	= ethtool_op_get_ts_info,
 	.get_link_ksettings = pxa168_get_link_ksettings,
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 4/5] net: ethernet: mvpp2: Implement ethtool::nway_reset
From: Florian Fainelli @ 2016-11-15 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, tremyfr, Florian Fainelli, Jisheng Zhang,
	Marcin Wojtas, Jarod Wilson, Peter Chen, open list
In-Reply-To: <20161115191949.15361-1-f.fainelli@gmail.com>

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index c8bf155cad94..ee857868bea5 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5923,6 +5923,7 @@ static const struct net_device_ops mvpp2_netdev_ops = {
 };
 
 static const struct ethtool_ops mvpp2_eth_tool_ops = {
+	.nway_reset	= phy_ethtool_nway_reset,
 	.get_link	= ethtool_op_get_link,
 	.set_coalesce	= mvpp2_ethtool_set_coalesce,
 	.get_coalesce	= mvpp2_ethtool_get_coalesce,
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 3/5] net: ethernet: mvneta: Implement ethtool::nway_reset
From: Florian Fainelli @ 2016-11-15 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, tremyfr, Florian Fainelli, Thomas Petazzoni,
	open list
In-Reply-To: <20161115191949.15361-1-f.fainelli@gmail.com>

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/marvell/mvneta.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index b85819ea8eea..87274d4ab102 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3811,6 +3811,7 @@ static const struct net_device_ops mvneta_netdev_ops = {
 };
 
 const struct ethtool_ops mvneta_eth_tool_ops = {
+	.nway_reset	= phy_ethtool_nway_reset,
 	.get_link       = ethtool_op_get_link,
 	.set_coalesce   = mvneta_ethtool_set_coalesce,
 	.get_coalesce   = mvneta_ethtool_get_coalesce,
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 2/5] net: ethoc: Implement ethtool::nway_reset
From: Florian Fainelli @ 2016-11-15 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, tremyfr, Florian Fainelli, Tobias Klauser,
	Colin Ian King, open list
In-Reply-To: <20161115191949.15361-1-f.fainelli@gmail.com>

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/ethoc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index c044667a0a25..6456c180114b 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -966,6 +966,7 @@ static int ethoc_set_ringparam(struct net_device *dev,
 const struct ethtool_ops ethoc_ethtool_ops = {
 	.get_regs_len = ethoc_get_regs_len,
 	.get_regs = ethoc_get_regs,
+	.nway_reset = phy_ethtool_nway_reset,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = ethoc_get_ringparam,
 	.set_ringparam = ethoc_set_ringparam,
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 1/5] net: stmmac: Implement ethtool::nway_reset
From: Florian Fainelli @ 2016-11-15 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, tremyfr, Florian Fainelli, Giuseppe Cavallaro,
	Alexandre Torgue, open list
In-Reply-To: <20161115191949.15361-1-f.fainelli@gmail.com>

Utilize the generic phy_ethtool_nway_reset() helper function to
implement an autonegotiation restart.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 3fe9340b748f..0290d52330da 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -870,6 +870,7 @@ static const struct ethtool_ops stmmac_ethtool_ops = {
 	.get_regs = stmmac_ethtool_gregs,
 	.get_regs_len = stmmac_ethtool_get_regs_len,
 	.get_link = ethtool_op_get_link,
+	.nway_reset = phy_ethtool_nway_reset,
 	.get_pauseparam = stmmac_get_pauseparam,
 	.set_pauseparam = stmmac_set_pauseparam,
 	.get_ethtool_stats = stmmac_get_ethtool_stats,
-- 
2.9.3

^ permalink raw reply related

* Re: [ovs-dev] [PATCH net-next v13 0/8] openvswitch: support for layer 3 encapsulated packets
From: Thadeu Lima de Souza Cascardo @ 2016-11-15 19:01 UTC (permalink / raw)
  To: Yang, Yi Y, Jiri Benc, netdev@vger.kernel.org
  Cc: dev@openvswitch.org, Simon Horman, egarver
In-Reply-To: <79BBBFE6CB6C9B488C1A45ACD284F5195F0EBFEA@SHSMSX103.ccr.corp.intel.com>

On November 15, 2016 11:57:21 AM GMT-02:00, "Yang, Yi Y" <yi.y.yang@intel.com> wrote:
>Hi, Jiri
>
>I'm very glad to see you're continuing this work :-), I asked Simon
>about this twice, but nobody replies. I also remember Cascardo has a
>patch set to collaborate with this patch set, I asked Cascardo, but
>nobody responds, will you continue to do Cascardo's " create tunnel
>devices using rtnetlink interface" patch set? I test the old one v3,
>that can work with vxlan module in kernel, but if I build ovs with
>option " --with-linux=/lib/modules/`uname -r`/build", ovs vxlan module
>is built in vport_vxlan module, when I create vxlan-gpe port, kernel
>will automatically load vxlan module in the kernel instead of using the
>APIs in vport_vxlan module. 
>
>Cascardo, are you still working on this?
>
>-----Original Message-----
>From: netdev-owner@vger.kernel.org
>[mailto:netdev-owner@vger.kernel.org] On Behalf Of Jiri Benc
>Sent: Thursday, November 10, 2016 11:28 PM
>To: netdev@vger.kernel.org
>Cc: dev@openvswitch.org; Pravin Shelar <pshelar@ovn.org>; Lorand Jakab
><lojakab@cisco.com>; Simon Horman <simon.horman@netronome.com>
>Subject: [PATCH net-next v13 0/8] openvswitch: support for layer 3
>encapsulated packets
>
>At the core of this patch set is removing the assumption in Open
>vSwitch datapath that all packets have Ethernet header.
>
>The implementation relies on the presence of pop_eth and push_eth
>actions in datapath flows to facilitate adding and removing Ethernet
>headers as appropriate. The construction of such flows is left up to
>user-space.
>
>This series is based on work by Simon Horman, Lorand Jakab, Thomas
>Morin and others. I kept Lorand's and Simon's s-o-b in the patches that
>are derived from v11 to record their authorship of parts of the code.
>
>Changes from v12 to v13:
>
>* Addressed Pravin's feedback.
>* Removed the GRE vport conversion patch; L3 GRE ports should be
>created by
>  rtnetlink instead.
>
>Main changes from v11 to v12:
>
>* The patches were restructured and split differently for easier
>review.
>* They were rebased and adjusted to the current net-next. Especially
>MPLS
>handling is different (and easier) thanks to the recent MPLS GSO
>rework.
>* Several bugs were discovered and fixed. The most notable is fragment
>handling: header adjustment for ARPHRD_NONE devices on tx needs to be
>done
>after refragmentation, not before it. This required significant changes
>in
>the patchset. Another one is stricter checking of attributes (match on
>L2
>  vs. L3 packet) at the kernel level.
>* Instead of is_layer3 bool, a mac_proto field is used.
>
>Jiri Benc (8):
>  openvswitch: use hard_header_len instead of hardcoded ETH_HLEN
>  openvswitch: add mac_proto field to the flow key
>  openvswitch: pass mac_proto to ovs_vport_send
>  openvswitch: support MPLS push and pop for L3 packets
>  openvswitch: add processing of L3 packets
>  openvswitch: netlink: support L3 packets
>  openvswitch: add Ethernet push and pop actions
>  openvswitch: allow L3 netdev ports
>
> include/uapi/linux/openvswitch.h |  15 ++++
> net/openvswitch/actions.c        | 111 +++++++++++++++++-------
> net/openvswitch/datapath.c       |  13 +--
> net/openvswitch/flow.c           | 105 +++++++++++++++++------
> net/openvswitch/flow.h           |  22 +++++
>net/openvswitch/flow_netlink.c   | 179
>++++++++++++++++++++++++++-------------
> net/openvswitch/vport-netdev.c   |   9 +-
> net/openvswitch/vport.c          |  31 +++++--
> net/openvswitch/vport.h          |   2 +-
> 9 files changed, 353 insertions(+), 134 deletions(-)
>
>--
>1.8.3.1
>
>_______________________________________________
>dev mailing list
>dev@openvswitch.org
>https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Hi.

I am still working on this. Just see my recent discussion with Pravin about the way to support out of tree drivers. If you have any opinion on that, please share on that thread, preferably with a patch. It's likely that Eric Garver will take this task as he was already working with me.

Cascardo.

^ permalink raw reply

* Re: [PATCH net] rtnetlink: fix rtnl message size computation for XDP
From: Brenden Blanco @ 2016-11-15 19:05 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: netdev
In-Reply-To: <0169d2419991a4ba533087a4fda70420ccdaca30.1479204870.git.sd@queasysnail.net>

On Tue, Nov 15, 2016 at 11:16:35AM +0100, Sabrina Dubroca wrote:
> rtnl_xdp_size() only considers the size of the actual payload attribute,
> and misses the space taken by the attribute used for nesting (IFLA_XDP).
> 
> Fixes: d1fdd9138682 ("rtnl: add option for setting link xdp prog")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Brenden Blanco <bblanco@plumgrid.com>
> ---
>  net/core/rtnetlink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 5aeb61aadd32..5b4c3ba507d0 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -901,7 +901,8 @@ static size_t rtnl_port_size(const struct net_device *dev,
>  
>  static size_t rtnl_xdp_size(const struct net_device *dev)
>  {
> -	size_t xdp_size = nla_total_size(1);	/* XDP_ATTACHED */
> +	size_t xdp_size = nla_total_size(0) +	/* nest IFLA_XDP */
> +			  nla_total_size(1);	/* XDP_ATTACHED */
>  
>  	if (!dev->netdev_ops->ndo_xdp)
>  		return 0;
> -- 
> 2.10.2
> 

^ permalink raw reply

* Re: linux-next: net->netns_ids is used after calling idr_destroy for it
From: Cong Wang @ 2016-11-15 18:50 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: Nicolas Dichtel, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpUQOCsVkaRPU3OkmNvyOLDtmRRK1G3uzvi9URSVEbapJQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 637 bytes --]

On Tue, Nov 15, 2016 at 10:04 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Nov 14, 2016 at 10:23 PM, Andrei Vagin <avagin@gmail.com> wrote:
>> Hi Nicolas,
>>
>> cleanup_net() calls idr_destroy(net->netns_ids) for network namespaces
>> and then it calls unregister_netdevice_many() which calls
>> idr_alloc(net0>netns_ids). It looks wrong, doesn't it?
>>
>
> netns id is designed to allocate lazily, but yeah it makes no sense
> to allocate id for the netns being destroyed, not to mention idr is freed.
>
> I will send a patch.

Could you try the attached patch? I just did some quick netns creation/destroy
tests.

Thanks!

[-- Attachment #2: unregister-netdevice.diff --]
[-- Type: text/plain, Size: 2296 bytes --]

diff --git a/net/core/dev.c b/net/core/dev.c
index 6deba68..f9a2969 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6685,7 +6685,7 @@ static void net_set_todo(struct net_device *dev)
 	dev_net(dev)->dev_unreg_count++;
 }
 
-static void rollback_registered_many(struct list_head *head)
+static void rollback_registered_many(struct list_head *head, bool send_rtmsg)
 {
 	struct net_device *dev, *tmp;
 	LIST_HEAD(close_head);
@@ -6737,8 +6737,8 @@ static void rollback_registered_many(struct list_head *head)
 		*/
 		call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
 
-		if (!dev->rtnl_link_ops ||
-		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
+		if ((!dev->rtnl_link_ops ||
+		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED) && send_rtmsg)
 			skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U,
 						     GFP_KERNEL);
 
@@ -6777,7 +6777,7 @@ static void rollback_registered(struct net_device *dev)
 	LIST_HEAD(single);
 
 	list_add(&dev->unreg_list, &single);
-	rollback_registered_many(&single);
+	rollback_registered_many(&single, true);
 	list_del(&single);
 }
 
@@ -7769,6 +7769,18 @@ void unregister_netdevice_queue(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL(unregister_netdevice_queue);
 
+static void __unregister_netdevice_many(struct list_head *head, bool send_rtmsg)
+{
+	struct net_device *dev;
+
+	if (!list_empty(head)) {
+		rollback_registered_many(head, send_rtmsg);
+		list_for_each_entry(dev, head, unreg_list)
+			net_set_todo(dev);
+		list_del(head);
+	}
+}
+
 /**
  *	unregister_netdevice_many - unregister many devices
  *	@head: list of devices
@@ -7778,14 +7790,7 @@ EXPORT_SYMBOL(unregister_netdevice_queue);
  */
 void unregister_netdevice_many(struct list_head *head)
 {
-	struct net_device *dev;
-
-	if (!list_empty(head)) {
-		rollback_registered_many(head);
-		list_for_each_entry(dev, head, unreg_list)
-			net_set_todo(dev);
-		list_del(head);
-	}
+	__unregister_netdevice_many(head, true);
 }
 EXPORT_SYMBOL(unregister_netdevice_many);
 
@@ -8239,7 +8244,7 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 				unregister_netdevice_queue(dev, &dev_kill_list);
 		}
 	}
-	unregister_netdevice_many(&dev_kill_list);
+	__unregister_netdevice_many(&dev_kill_list, false);
 	rtnl_unlock();
 }
 

^ permalink raw reply related

* Re: [PATCH v2 iproute2 1/1] tc: print raw qdisc handle.
From: Stephen Hemminger @ 2016-11-15 18:30 UTC (permalink / raw)
  To: Roman Mashak; +Cc: netdev, jhs
In-Reply-To: <1479164360-24188-1-git-send-email-mrv@mojatatu.com>

On Mon, 14 Nov 2016 17:59:20 -0500
Roman Mashak <mrv@mojatatu.com> wrote:

> This is v2 patch with fixed code indentation.
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Applied, but inverted the if() since it reads better if code
is:
   if (show_raw)
	...
   else
	...

^ permalink raw reply

* Re: [LKP] [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!
From: Alexander Duyck @ 2016-11-15 18:30 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Ye Xiaolong, Eric Dumazet, Alexander Duyck, Willem de Bruijn,
	netdev, Alexei Starovoitov, Jojy Varghese, Tom Herbert,
	Yibin Yang, LKP, David Miller
In-Reply-To: <20161114031130.yjb4anou24ede4ue@wfg-t540p.sh.intel.com>

On Sun, Nov 13, 2016 at 7:11 PM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>>> Hi guys.
>>>
>>> I took a look at the commit again and I do not see how this can happen.
>>>
>>> Are you sure patch was properly applied ?
>>>
>>> In particular, the following extract is obscure for me :
>>>
>>>
>>>> https://github.com/0day-ci/linux
>>>> Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
>>>> commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net:
>>>> __skb_flow_dissect() must cap its return value")
>>>>
>>
>> Hi,
>>
>> The above two lines means 0day repo setup a new branch
>>
>> "Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839"
>> which is based on net/master, then applied you patch on top of it,
>> commit id is 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb.
>
>
> Xiaolong, it may be more helpful to show the base tree where we apply
> the patch to. And the final url:
>
> https://github.com/0day-ci/linux/tree/Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
>
> Thanks,
> Fengguang

Is there any chance you guys could send me the net/ethernet/eth.o and
drivers/net/ethernet/intel/igb/igb.o files for this build?  I'm still
not able to reproduce but I have a thought that this might be some
sort of compiler issue.

Specifically looking over the call trace it looks like we only pulled
8 bytes of the header instead of 14.  This is based on the values of
RAX and R15 differing by 6 which I believe represent skb->len and
skb->data_len  However eth_get_headlen is supposed to be using
sizeof(*eth) which should be 14.  I'm wondering if there is an issue
due to it being a packed structure that is resulting in 8 being
reported instead of 14.

Thanks.

- Alex

^ permalink raw reply

* Re: [net PATCH 1/2] ipv4: Restore fib_trie_flush_external function and fix call ordering
From: Eric Dumazet @ 2016-11-15 18:30 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Eric Dumazet, Jiri Pirko, davem
In-Reply-To: <20161115104606.13711.37506.stgit@ahduyck-blue-test.jf.intel.com>

On Tue, 2016-11-15 at 05:46 -0500, Alexander Duyck wrote:
> The patch that removed the FIB offload infrastructure was a bit too
> aggressive and also removed code needed to clean up us splitting the table
> if additional rules were added.  Specifically the function
> fib_trie_flush_external was called at the end of a new rule being added to
> flush the foreign trie entries from the main trie.
> 
> I updated the code so that we only call fib_trie_flush_external on the main
> table so that we flush the entries for local from main.  This way we don't
> call it for every rule change which is what was happening previously.
> 
> Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure")
> Reported-by: Eric Dumazet <edumazet@google.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---


Acked-by: Eric Dumazet <edumazet@google.com>

Thanks for sorting this out Alex !

^ permalink raw reply

* Re: [net PATCH 2/2] ipv4: Fix memory leak in exception case for splitting tries
From: Eric Dumazet @ 2016-11-15 18:24 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Eric Dumazet, davem
In-Reply-To: <20161115104612.13711.44592.stgit@ahduyck-blue-test.jf.intel.com>

On Tue, 2016-11-15 at 05:46 -0500, Alexander Duyck wrote:
> Fix a small memory leak that can occur where we leak a fib_alias in the
> event of us not being able to insert it into the local table.
> 
> Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse")
> Reported-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  net/ipv4/fib_trie.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* [PATCH net-next 2/5] net: busy-poll: remove need_resched() from sk_can_busy_loop()
From: Eric Dumazet @ 2016-11-15 18:15 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
	Yuval Mintz, Ariel Elior, Eric Dumazet, Eric Dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>

Now sk_busy_loop() can schedule by itself, we can remove
need_resched() check from sk_can_busy_loop()

Also add a const to its struct sock parameter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
---
 include/net/busy_poll.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 2fbeb1313c0f..965e52b9b5a3 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -58,10 +58,9 @@ static inline unsigned long busy_loop_end_time(void)
 	return busy_loop_us_clock() + ACCESS_ONCE(sysctl_net_busy_poll);
 }
 
-static inline bool sk_can_busy_loop(struct sock *sk)
+static inline bool sk_can_busy_loop(const struct sock *sk)
 {
-	return sk->sk_ll_usec && sk->sk_napi_id &&
-	       !need_resched() && !signal_pending(current);
+	return sk->sk_ll_usec && sk->sk_napi_id && !signal_pending(current);
 }
 
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next 1/5] net: busy-poll: allow preemption in sk_busy_loop()
From: Eric Dumazet @ 2016-11-15 18:15 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
	Yuval Mintz, Ariel Elior, Eric Dumazet, Eric Dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>

After commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
sk_busy_loop() needs a bit of care :
softirqs might be delayed since we do not allow preemption yet.

This patch adds preemptiom points in sk_busy_loop(),
and makes sure no unnecessary cache line dirtying
or atomic operations are done while looping.

A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL

This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
so that sk_busy_loop() does not have to grab it again.

Similarly, netpoll_poll_lock() is done one time.

This gives about 10 to 20 % improvement in various busy polling
tests, especially when many threads are busy polling in
configurations with large number of NIC queues.

This should allow experimenting with bigger delays without
hurting overall latencies.

Tested:
 On a 40Gb mlx4 NIC, 32 RX/TX queues.

 echo 70 >/proc/sys/net/core/busy_read
 for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done

    Before:      After:
 1:   90072   92819
 2:  157289  184007
 3:  235772  213504
 4:  344074  357513
 5:  394755  458267
 6:  461151  487819
 7:  549116  625963
 8:  544423  716219
 9:  720460  738446
10:  794686  837612
11:  915998  923960
12:  937507  925107
13: 1019677  971506
14: 1046831 1113650
15: 1114154 1148902
16: 1105221 1179263
17: 1266552 1299585
18: 1258454 1383817
19: 1341453 1312194
20: 1363557 1488487
21: 1387979 1501004
22: 1417552 1601683
23: 1550049 1642002
24: 1568876 1601915
25: 1560239 1683607
26: 1640207 1745211
27: 1706540 1723574
28: 1638518 1722036
29: 1734309 1757447
30: 1782007 1855436
31: 1724806 1888539
32: 1717716 1944297
33: 1778716 1869118
34: 1805738 1983466
35: 1815694 2020758
36: 1893059 2035632
37: 1843406 2034653
38: 1888830 2086580
39: 1972827 2143567
40: 1877729 2181851

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
---
 include/linux/netdevice.h |  10 +++++
 net/core/dev.c            | 102 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 92 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 86bacf6a64f0..e71de66e3792 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -334,6 +334,16 @@ enum {
 	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
 	NAPI_STATE_HASHED,	/* In NAPI hash (busy polling possible) */
 	NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
+	NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+};
+
+enum {
+	NAPIF_STATE_SCHED	 = (1UL << NAPI_STATE_SCHED),
+	NAPIF_STATE_DISABLE	 = (1UL << NAPI_STATE_DISABLE),
+	NAPIF_STATE_NPSVC	 = (1UL << NAPI_STATE_NPSVC),
+	NAPIF_STATE_HASHED	 = (1UL << NAPI_STATE_HASHED),
+	NAPIF_STATE_NO_BUSY_POLL = (1UL << NAPI_STATE_NO_BUSY_POLL),
+	NAPIF_STATE_IN_BUSY_POLL = (1UL << NAPI_STATE_IN_BUSY_POLL),
 };
 
 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6deba68ad9e4..369dcc8efc01 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4902,6 +4902,12 @@ void __napi_complete(struct napi_struct *n)
 {
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
 
+	/* Some drivers call us directly, instead of calling
+	 * napi_complete_done().
+	 */
+	if (unlikely(test_bit(NAPI_STATE_IN_BUSY_POLL, &n->state)))
+		return;
+
 	list_del_init(&n->poll_list);
 	smp_mb__before_atomic();
 	clear_bit(NAPI_STATE_SCHED, &n->state);
@@ -4913,10 +4919,13 @@ void napi_complete_done(struct napi_struct *n, int work_done)
 	unsigned long flags;
 
 	/*
-	 * don't let napi dequeue from the cpu poll list
-	 * just in case its running on a different cpu
+	 * 1) Don't let napi dequeue from the cpu poll list
+	 *    just in case its running on a different cpu.
+	 * 2) If we are busy polling, do nothing here, we have
+	 *    the guarantee we will be called later.
 	 */
-	if (unlikely(test_bit(NAPI_STATE_NPSVC, &n->state)))
+	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
+				 NAPIF_STATE_IN_BUSY_POLL)))
 		return;
 
 	if (n->gro_list) {
@@ -4956,13 +4965,41 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 }
 
 #if defined(CONFIG_NET_RX_BUSY_POLL)
+
 #define BUSY_POLL_BUDGET 8
+
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
+{
+	int rc;
+
+	clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
+
+	local_bh_disable();
+
+	/* All we really want here is to re-enable device interrupts.
+	 * Ideally, a new ndo_busy_poll_stop() could avoid another round.
+	 */
+	rc = napi->poll(napi, BUSY_POLL_BUDGET);
+	netpoll_poll_unlock(have_poll_lock);
+	if (rc == BUSY_POLL_BUDGET)
+		__napi_schedule(napi);
+	local_bh_enable();
+	if (local_softirq_pending())
+		do_softirq();
+}
+
 bool sk_busy_loop(struct sock *sk, int nonblock)
 {
 	unsigned long end_time = !nonblock ? sk_busy_loop_end_time(sk) : 0;
+	int (*napi_poll)(struct napi_struct *napi, int budget);
 	int (*busy_poll)(struct napi_struct *dev);
+	void *have_poll_lock = NULL;
 	struct napi_struct *napi;
-	int rc = false;
+	int rc;
+
+restart:
+	rc = false;
+	napi_poll = NULL;
 
 	rcu_read_lock();
 
@@ -4973,24 +5010,33 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 	/* Note: ndo_busy_poll method is optional in linux-4.5 */
 	busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
 
-	do {
+	preempt_disable();
+	for (;;) {
 		rc = 0;
 		local_bh_disable();
 		if (busy_poll) {
 			rc = busy_poll(napi);
-		} else if (napi_schedule_prep(napi)) {
-			void *have = netpoll_poll_lock(napi);
-
-			if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
-				rc = napi->poll(napi, BUSY_POLL_BUDGET);
-				trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
-				if (rc == BUSY_POLL_BUDGET) {
-					napi_complete_done(napi, rc);
-					napi_schedule(napi);
-				}
-			}
-			netpoll_poll_unlock(have);
+			goto count;
 		}
+		if (!napi_poll) {
+			unsigned long val = READ_ONCE(napi->state);
+
+			/* If multiple threads are competing for this napi,
+			 * we avoid dirtying napi->state as much as we can.
+			 */
+			if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
+				   NAPIF_STATE_IN_BUSY_POLL))
+				goto count;
+			if (cmpxchg(&napi->state, val,
+				    val | NAPIF_STATE_IN_BUSY_POLL |
+					  NAPIF_STATE_SCHED) != val)
+				goto count;
+			have_poll_lock = netpoll_poll_lock(napi);
+			napi_poll = napi->poll;
+		}
+		rc = napi_poll(napi, BUSY_POLL_BUDGET);
+		trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
+count:
 		if (rc > 0)
 			__NET_ADD_STATS(sock_net(sk),
 					LINUX_MIB_BUSYPOLLRXPACKETS, rc);
@@ -4999,10 +5045,26 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 		if (rc == LL_FLUSH_FAILED)
 			break; /* permanent failure */
 
-		cpu_relax();
-	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
-		 !need_resched() && !busy_loop_timeout(end_time));
+		if (nonblock || !skb_queue_empty(&sk->sk_receive_queue) ||
+		    busy_loop_timeout(end_time))
+			break;
 
+		if (unlikely(need_resched())) {
+			if (napi_poll)
+				busy_poll_stop(napi, have_poll_lock);
+			preempt_enable();
+			rcu_read_unlock();
+			cond_resched();
+			rc = !skb_queue_empty(&sk->sk_receive_queue);
+			if (rc || busy_loop_timeout(end_time))
+				return rc;
+			goto restart;
+		}
+		cpu_relax_lowlatency();
+	}
+	if (napi_poll)
+		busy_poll_stop(napi, have_poll_lock);
+	preempt_enable();
 	rc = !skb_queue_empty(&sk->sk_receive_queue);
 out:
 	rcu_read_unlock();
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next 4/5] net/mlx4_en: use napi_complete_done() return value
From: Eric Dumazet @ 2016-11-15 18:15 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
	Yuval Mintz, Ariel Elior, Eric Dumazet, Eric Dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>

Do not rearm interrupts if we are busy polling.

mlx4 uses separate CQ for TX and RX, so number of TX interrupts
does not change, unfortunately.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2cc91002064f..22f08f9ef464 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1137,8 +1137,8 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 		done = 0;
 	}
 	/* Done for now */
-	napi_complete_done(napi, done);
-	mlx4_en_arm_cq(priv, cq);
+	if (napi_complete_done(napi, done))
+		mlx4_en_arm_cq(priv, cq);
 	return done;
 }
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next 5/5] bnx2x: switch to napi_complete_done()
From: Eric Dumazet @ 2016-11-15 18:15 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
	Yuval Mintz, Ariel Elior, Eric Dumazet, Eric Dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>

Switch from napi_complete() to napi_complete_done()
for better GRO support (gro_flush_timeout) and core NAPI
features.

Do not rearm interrupts if we are busy polling,
to reduce bus and interrupts overhead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index ed42c1009685..3fd36b421d51 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3248,13 +3248,14 @@ static int bnx2x_poll(struct napi_struct *napi, int budget)
 			rmb();
 
 			if (!(bnx2x_has_rx_work(fp) || bnx2x_has_tx_work(fp))) {
-				napi_complete(napi);
-				/* Re-enable interrupts */
-				DP(NETIF_MSG_RX_STATUS,
-				   "Update index to %d\n", fp->fp_hc_idx);
-				bnx2x_ack_sb(bp, fp->igu_sb_id, USTORM_ID,
-					     le16_to_cpu(fp->fp_hc_idx),
-					     IGU_INT_ENABLE, 1);
+				if (napi_complete_done(napi, rx_work_done)) {
+					/* Re-enable interrupts */
+					DP(NETIF_MSG_RX_STATUS,
+					   "Update index to %d\n", fp->fp_hc_idx);
+					bnx2x_ack_sb(bp, fp->igu_sb_id, USTORM_ID,
+						     le16_to_cpu(fp->fp_hc_idx),
+						     IGU_INT_ENABLE, 1);
+				}
 			} else {
 				rx_work_done = budget;
 			}
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next 3/5] net: busy-poll: return busypolling status to drivers
From: Eric Dumazet @ 2016-11-15 18:15 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
	Yuval Mintz, Ariel Elior, Eric Dumazet, Eric Dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>

NAPI drivers use napi_complete_done() or napi_complete() when
they drained RX ring and right before re-enabling device interrupts.

In busy polling, we can avoid interrupts being delivered since
we are polling RX ring in a controlled loop.

Drivers can chose to use napi_complete_done() return value
to reduce interrupts overhead while busy polling is active.

This is optional, legacy drivers should work fine even
if not updated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
---
 include/linux/netdevice.h |  7 ++++---
 net/core/dev.c            | 10 ++++++----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e71de66e3792..bcddf951ccee 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -463,16 +463,17 @@ static inline bool napi_reschedule(struct napi_struct *napi)
 	return false;
 }
 
-void __napi_complete(struct napi_struct *n);
-void napi_complete_done(struct napi_struct *n, int work_done);
+bool __napi_complete(struct napi_struct *n);
+bool napi_complete_done(struct napi_struct *n, int work_done);
 /**
  *	napi_complete - NAPI processing complete
  *	@n: NAPI context
  *
  * Mark NAPI processing as complete.
  * Consider using napi_complete_done() instead.
+ * Return false if device should avoid rearming interrupts.
  */
-static inline void napi_complete(struct napi_struct *n)
+static inline bool napi_complete(struct napi_struct *n)
 {
 	return napi_complete_done(n, 0);
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 369dcc8efc01..edba9efeb2e9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4898,7 +4898,7 @@ void __napi_schedule_irqoff(struct napi_struct *n)
 }
 EXPORT_SYMBOL(__napi_schedule_irqoff);
 
-void __napi_complete(struct napi_struct *n)
+bool __napi_complete(struct napi_struct *n)
 {
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
 
@@ -4906,15 +4906,16 @@ void __napi_complete(struct napi_struct *n)
 	 * napi_complete_done().
 	 */
 	if (unlikely(test_bit(NAPI_STATE_IN_BUSY_POLL, &n->state)))
-		return;
+		return false;
 
 	list_del_init(&n->poll_list);
 	smp_mb__before_atomic();
 	clear_bit(NAPI_STATE_SCHED, &n->state);
+	return true;
 }
 EXPORT_SYMBOL(__napi_complete);
 
-void napi_complete_done(struct napi_struct *n, int work_done)
+bool napi_complete_done(struct napi_struct *n, int work_done)
 {
 	unsigned long flags;
 
@@ -4926,7 +4927,7 @@ void napi_complete_done(struct napi_struct *n, int work_done)
 	 */
 	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
 				 NAPIF_STATE_IN_BUSY_POLL)))
-		return;
+		return false;
 
 	if (n->gro_list) {
 		unsigned long timeout = 0;
@@ -4948,6 +4949,7 @@ void napi_complete_done(struct napi_struct *n, int work_done)
 		__napi_complete(n);
 		local_irq_restore(flags);
 	}
+	return true;
 }
 EXPORT_SYMBOL(napi_complete_done);
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next 0/5] net: busy-poll: allow preemption and other optimizations
From: Eric Dumazet @ 2016-11-15 18:15 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
	Yuval Mintz, Ariel Elior, Eric Dumazet, Eric Dumazet

It is time to have preemption points in sk_busy_loop() and improve
its scalability.

Also napi_complete() and friends can tell drivers when it is safe to
not re-enable device interrupts, saving some overhead under
high busy polling.

mlx4 and bnx2x are changed accordingly, to show how this busy polling
status can be exploited by drivers.

Next steps will implement Zach Brown suggestion, where NAPI polling
would be enabled all the time for some chosen queues.
This is needed for efficient epoll() support anyway.

Eric Dumazet (5):
  net: busy-poll: allow preemption in sk_busy_loop()
  net: busy-poll: remove need_resched() from sk_can_busy_loop()
  net: busy-poll: return busypolling status to drivers
  net/mlx4_en: use napi_complete_done() return value
  bnx2x: switch to napi_complete_done()

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |  15 ++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c      |   4 +-
 include/linux/netdevice.h                       |  17 +++-
 include/net/busy_poll.h                         |   5 +-
 net/core/dev.c                                  | 110 +++++++++++++++++++-----
 5 files changed, 113 insertions(+), 38 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply

* RE: [PATCH 15/15] net: usb: lan78xx: Utilize phy_ethtool_nway_reset
From: Woojung.Huh @ 2016-11-15 18:14 UTC (permalink / raw)
  To: f.fainelli, netdev
  Cc: davem, andrew, tremyfr, UNGLinuxDriver, linux-usb, linux-kernel
In-Reply-To: <20161115180644.3941-16-f.fainelli@gmail.com>

> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  drivers/net/usb/lan78xx.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
> index bcd9010c1f27..cf2857fa938f 100644
> --- a/drivers/net/usb/lan78xx.c
> +++ b/drivers/net/usb/lan78xx.c
> @@ -1447,11 +1447,6 @@ static u32 lan78xx_get_link(struct net_device *net)
>  	return net->phydev->link;
>  }
> 
> -static int lan78xx_nway_reset(struct net_device *net)
> -{
> -	return phy_start_aneg(net->phydev);
> -}
> -
>  static void lan78xx_get_drvinfo(struct net_device *net,
>  				struct ethtool_drvinfo *info)
>  {
> @@ -1655,7 +1650,7 @@ static int lan78xx_set_pause(struct net_device
> *net,
> 
>  static const struct ethtool_ops lan78xx_ethtool_ops = {
>  	.get_link	= lan78xx_get_link,
> -	.nway_reset	= lan78xx_nway_reset,
> +	.nway_reset	= phy_ethtool_nway_reset,
>  	.get_drvinfo	= lan78xx_get_drvinfo,
>  	.get_msglevel	= lan78xx_get_msglevel,
>  	.set_msglevel	= lan78xx_set_msglevel,
> --
> 2.9.3
Acked-by: Woojung Huh <woojung.huh@microchip.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox