Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net 2/3] of_mdio: fix device reference leak in of_phy_find_device
From: Johan Hovold @ 2016-11-16 14:20 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Rob Herring, Frank Rowand, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306038-27211-1-git-send-email-johan@kernel.org>

Make sure to drop the reference taken by bus_find_device() before
returning NULL from of_phy_find_device() when the found device is not a
PHY.

Fixes: 6ed742363b9c ("of: of_mdio: Ensure mdio device is a PHY")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/of/of_mdio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 8f4648383fb2..5a3145a02547 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -292,6 +292,7 @@ struct phy_device *of_phy_find_device(struct device_node *phy_np)
 		mdiodev = to_mdio_device(d);
 		if (mdiodev->flags & MDIO_DEVICE_FLAG_PHY)
 			return to_phy_device(d);
+		put_device(d);
 	}
 
 	return NULL;
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 1/3] of_mdio: fix node leak in of_phy_register_fixed_link error path
From: Johan Hovold @ 2016-11-16 14:20 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Rob Herring, Frank Rowand, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306038-27211-1-git-send-email-johan@kernel.org>

Make sure to drop the of_node reference also on failure to parse the
speed property in of_phy_register_fixed_link().

Fixes: 3be2a49e5c08 ("of: provide a binding for fixed link PHYs")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/of/of_mdio.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index b470f7e3521d..8f4648383fb2 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -456,8 +456,11 @@ int of_phy_register_fixed_link(struct device_node *np)
 		status.link = 1;
 		status.duplex = of_property_read_bool(fixed_link_node,
 						      "full-duplex");
-		if (of_property_read_u32(fixed_link_node, "speed", &status.speed))
+		if (of_property_read_u32(fixed_link_node, "speed",
+					 &status.speed)) {
+			of_node_put(fixed_link_node);
 			return -EINVAL;
+		}
 		status.pause = of_property_read_bool(fixed_link_node, "pause");
 		status.asym_pause = of_property_read_bool(fixed_link_node,
 							  "asym-pause");
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 0/3] net: phy: fix of_node and device leaks
From: Johan Hovold @ 2016-11-16 14:20 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Rob Herring, Frank Rowand, netdev, linux-kernel, Johan Hovold

These patches fix a couple of of_node leaks in the fixed-link code and a
device reference leak in a phy helper.

Johan


Johan Hovold (3):
  of_mdio: fix node leak in of_phy_register_fixed_link error path
  of_mdio: fix device reference leak in of_phy_find_device
  net: phy: fixed_phy: fix of_node leak in fixed_phy_unregister

 drivers/net/phy/fixed_phy.c | 2 +-
 drivers/of/of_mdio.c        | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.7.3

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5e: remove napi_hash_del() calls
From: Eric Dumazet @ 2016-11-16 14:20 UTC (permalink / raw)
  To: David Miller; +Cc: Saeed Mahameed, netdev
In-Reply-To: <1479304537.8455.167.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 2016-11-16 at 05:55 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() after netif_napi_del() is pointless.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ----
>  1 file changed, 4 deletions(-)


Hmmm... missing body. Will send a v2.

^ permalink raw reply

* [PATCH net] cxgb4: do not call napi_hash_del()
From: Eric Dumazet @ 2016-11-16 14:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Hariprasad S

From: Eric Dumazet <edumazet@google.com>

Calling napi_hash_del() before netif_napi_del() is dangerous
if a synchronize_rcu() is not enforced before NAPI struct freeing.

Lets leave this detail to core networking stack and feel
more comfortable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 1e74fd6085df..e19a0ca8e5dd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2951,7 +2951,6 @@ void free_rspq_fl(struct adapter *adap, struct sge_rspq *rq,
 		   rq->cntxt_id, fl_id, 0xffff);
 	dma_free_coherent(adap->pdev_dev, (rq->size + 1) * rq->iqe_len,
 			  rq->desc, rq->phys_addr);
-	napi_hash_del(&rq->napi);
 	netif_napi_del(&rq->napi);
 	rq->netdev = NULL;
 	rq->cntxt_id = rq->abs_id = 0;

^ permalink raw reply related

* [PATCH net] be2net: do not call napi_hash_del()
From: Eric Dumazet @ 2016-11-16 14:12 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna,
	Somnath Kotur

From: Eric Dumazet <edumazet@google.com>

Calling napi_hash_del() before netif_napi_del() is dangerous
if a synchronize_rcu() is not enforced before NAPI struct freeing.

Lets leave this detail to core networking stack and feel
more comfortable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index cece8a08edca..93aa2939142a 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2813,7 +2813,6 @@ static void be_evt_queues_destroy(struct be_adapter *adapter)
 		if (eqo->q.created) {
 			be_eq_clean(eqo);
 			be_cmd_q_destroy(adapter, &eqo->q, QTYPE_EQ);
-			napi_hash_del(&eqo->napi);
 			netif_napi_del(&eqo->napi);
 			free_cpumask_var(eqo->affinity_mask);
 		}

^ permalink raw reply related

* [PATCH] netronome: don't access real_num_rx_queues directly
From: Arnd Bergmann @ 2016-11-16 14:10 UTC (permalink / raw)
  To: David S. Miller
  Cc: Arnd Bergmann, Jakub Kicinski, Rolf Neugebauer, oss-drivers,
	netdev, linux-kernel

The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
is enabled, so we now get a build failure when that is turned off:

netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?

As far as I can tell, the check here is only used as an optimization that
we can skip in order to fix the compilation. If sysfs is disabled,
the following netif_set_real_num_rx_queues() has no effect.

Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 99edb9fd84bf..eb3715700c95 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2486,15 +2486,13 @@ nfp_net_ring_swap_enable(struct nfp_net *nn, unsigned int *num_vecs,
 	for (r = 0; r <	nn->max_r_vecs; r++)
 		nfp_net_vector_assign_rings(nn, &nn->r_vecs[r], r);
 
-	if (nn->netdev->real_num_rx_queues != nn->num_rx_rings) {
-		if (!netif_is_rxfh_configured(nn->netdev))
-			nfp_net_rss_init_itbl(nn);
+	if (!netif_is_rxfh_configured(nn->netdev))
+		nfp_net_rss_init_itbl(nn);
 
-		err = netif_set_real_num_rx_queues(nn->netdev,
-						   nn->num_rx_rings);
-		if (err)
-			return err;
-	}
+	err = netif_set_real_num_rx_queues(nn->netdev,
+					   nn->num_rx_rings);
+	if (err)
+		return err;
 
 	if (nn->netdev->real_num_tx_queues != nn->num_stack_tx_rings) {
 		err = netif_set_real_num_tx_queues(nn->netdev,
-- 
2.9.0

^ permalink raw reply related

* [patch net-next 8/8] rocker: Request a dump of FIB tables during init
From: Jiri Pirko @ 2016-11-16 14:09 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

Make sure the device has a complete view of the FIB tables by invoking
their dump during module init.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index b80ff12..6968473 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2806,6 +2806,7 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	rocker->fib_nb.notifier_call = rocker_router_fib_event;
 	register_fib_notifier(&rocker->fib_nb);
+	fib_notifier_dump(&rocker->fib_nb);
 
 	dev_info(&pdev->dev, "Rocker switch with id %*phN\n",
 		 (int)sizeof(rocker->hw.id), &rocker->hw.id);
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 7/8] mlxsw: spectrum_router: Request a dump of FIB tables during init
From: Jiri Pirko @ 2016-11-16 14:09 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

Make sure the device has a complete view of the FIB tables by invoking
their dump during module init.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index a8011a5..46126d9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2048,6 +2048,7 @@ int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
 
 	mlxsw_sp->fib_nb.notifier_call = mlxsw_sp_router_fib_event;
 	register_fib_notifier(&mlxsw_sp->fib_nb);
+	fib_notifier_dump(&mlxsw_sp->fib_nb);
 	return 0;
 
 err_neigh_init:
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Jiri Pirko @ 2016-11-16 14:09 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
introduced a new notification chain to notify listeners (f.e., switchdev
drivers) about addition and deletion of routes.

However, upon registration to the chain the FIB tables can already be
populated, which means potential listeners will have an incomplete view
of the tables.

Solve that by adding an API to request a FIB dump. The dump itself it
done using RCU in order not to starve consumers that need RTNL to make
progress.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h |   1 +
 net/ipv4/fib_trie.c  | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index b9314b4..095a374 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -221,6 +221,7 @@ enum fib_event_type {
 	FIB_EVENT_RULE_DEL,
 };
 
+void fib_notifier_dump(struct notifier_block *nb);
 int register_fib_notifier(struct notifier_block *nb);
 int unregister_fib_notifier(struct notifier_block *nb);
 int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 2195601..9670f3f 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -86,6 +86,58 @@
 
 static ATOMIC_NOTIFIER_HEAD(fib_chain);
 
+static int call_fib_notifier(struct notifier_block *nb, struct net *net,
+			     enum fib_event_type event_type,
+			     struct fib_notifier_info *info)
+{
+	info->net = net;
+	return nb->notifier_call(nb, event_type, info);
+}
+
+static void fib_rules_notify(struct net *net, struct notifier_block *nb,
+			     enum fib_event_type event_type)
+{
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	struct fib_notifier_info info;
+
+	if (net->ipv4.fib_has_custom_rules)
+		call_fib_notifier(nb, net, event_type, &info);
+#endif
+}
+
+static void fib_notify(struct net *net, struct notifier_block *nb,
+		       enum fib_event_type event_type);
+
+static int call_fib_entry_notifier(struct notifier_block *nb, struct net *net,
+				   enum fib_event_type event_type, u32 dst,
+				   int dst_len, struct fib_info *fi,
+				   u8 tos, u8 type, u32 tb_id, u32 nlflags)
+{
+	struct fib_entry_notifier_info info = {
+		.dst = dst,
+		.dst_len = dst_len,
+		.fi = fi,
+		.tos = tos,
+		.type = type,
+		.tb_id = tb_id,
+		.nlflags = nlflags,
+	};
+	return call_fib_notifier(nb, net, event_type, &info.info);
+}
+
+void fib_notifier_dump(struct notifier_block *nb)
+{
+	struct net *net;
+
+	rcu_read_lock();
+	for_each_net_rcu(net) {
+		fib_rules_notify(net, nb, FIB_EVENT_RULE_ADD);
+		fib_notify(net, nb, FIB_EVENT_ENTRY_ADD);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(fib_notifier_dump);
+
 int register_fib_notifier(struct notifier_block *nb)
 {
 	return atomic_notifier_chain_register(&fib_chain, nb);
@@ -1834,6 +1886,62 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 	return found;
 }
 
+static void fib_leaf_notify(struct net *net, struct key_vector *l,
+			    struct fib_table *tb, struct notifier_block *nb,
+			    enum fib_event_type event_type)
+{
+	struct fib_alias *fa;
+
+	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
+		struct fib_info *fi = fa->fa_info;
+
+		if (!fi)
+			continue;
+
+		/* local and main table can share the same trie,
+		 * so don't notify twice for the same entry.
+		 */
+		if (tb->tb_id != fa->tb_id)
+			continue;
+
+		call_fib_entry_notifier(nb, net, event_type, l->key,
+					KEYLENGTH - fa->fa_slen, fi, fa->fa_tos,
+					fa->fa_type, fa->tb_id, 0);
+	}
+}
+
+static void fib_table_notify(struct net *net, struct fib_table *tb,
+			     struct notifier_block *nb,
+			     enum fib_event_type event_type)
+{
+	struct trie *t = (struct trie *)tb->tb_data;
+	struct key_vector *l, *tp = t->kv;
+	t_key key = 0;
+
+	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
+		fib_leaf_notify(net, l, tb, nb, event_type);
+
+		key = l->key + 1;
+		/* stop in case of wrap around */
+		if (key < l->key)
+			break;
+	}
+}
+
+static void fib_notify(struct net *net, struct notifier_block *nb,
+		       enum fib_event_type event_type)
+{
+	unsigned int h;
+
+	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
+		struct hlist_head *head = &net->ipv4.fib_table_hash[h];
+		struct fib_table *tb;
+
+		hlist_for_each_entry_rcu(tb, head, tb_hlist)
+			fib_table_notify(net, tb, nb, event_type);
+	}
+}
+
 static void __trie_free_rcu(struct rcu_head *head)
 {
 	struct fib_table *tb = container_of(head, struct fib_table, rcu);
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 5/8] net: ipv4: Send notifications only after removing FIB alias
From: Jiri Pirko @ 2016-11-16 14:09 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

When removing a FIB alias we should send a notification (both to user
space and in kernel) only after the fact, or otherwise we could end up
in problematic situations.

For example, assume we have two tasks:
a) Task A - Removing a FIB alias (fa1) following RTM_DELROUTE
b) Task B - Replaying FIB tables under RCU for listener (l) following
   fib_notifier_dump() (to be introduced in the next patch).

Timeline:
t0 - Task A calls FIB_EVENT_ENTRY_DEL for fa1. Will be ignored by
     listener l since it's not aware of fa1.
t1 - Task B reaches fa1 in the trie and replays it to listener l using
     FIB_EVENT_ENTRY_ADD.
t2 - Task A removes fa1 from the trie.

The above will result in listener l (f.e., some capable device) having
fa1 in its tables whereas fa1 isn't present in the kernel's table
anymore.

If we always send notifications after the fact, then we can avoid such
races. During insertion, if we traversed the trie when fa1 wasn't
present, then we'll eventually get the notification and process it.
During deletion, if we traversed the trie when fa1 was still present,
then the notification will let us know that it actually needs to be
removed.

The above is consistent with insertion of a FIB alias, where
notifications are sent only after the fact.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/ipv4/fib_trie.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 64a51ec..2195601 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1569,17 +1569,17 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 	if (!fa_to_delete)
 		return -ESRCH;

+	if (!plen)
+		tb->tb_num_default--;
+
+	fib_remove_alias(t, tp, l, fa_to_delete);
+
 	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
 				 fa_to_delete->fa_info, tos, cfg->fc_type,
 				 tb->tb_id, 0);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);

-	if (!plen)
-		tb->tb_num_default--;
-
-	fib_remove_alias(t, tp, l, fa_to_delete);
-
 	if (fa_to_delete->fa_state & FA_S_ACCESSED)
 		rt_cache_flush(cfg->fc_nlinfo.nl_net);

@@ -1810,12 +1810,12 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 				continue;
 			}

+			hlist_del_rcu(&fa->fa_list);
 			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
 						 n->key,
 						 KEYLENGTH - fa->fa_slen,
 						 fi, fa->fa_tos, fa->fa_type,
 						 tb->tb_id, 0);
-			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
 			found++;
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 4/8] ipv4: fib: Convert FIB notification chain to be atomic
From: Jiri Pirko @ 2016-11-16 14:08 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

In order not to hold RTNL for long periods of time we're going to dump
the FIB tables using RCU.

Convert the FIB notification chain to be atomic, as we can't block in
RCU critical sections.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/ipv4/fib_trie.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 4cff74d..64a51ec 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -84,17 +84,17 @@
 #include <trace/events/fib.h>
 #include "fib_lookup.h"
 
-static BLOCKING_NOTIFIER_HEAD(fib_chain);
+static ATOMIC_NOTIFIER_HEAD(fib_chain);
 
 int register_fib_notifier(struct notifier_block *nb)
 {
-	return blocking_notifier_chain_register(&fib_chain, nb);
+	return atomic_notifier_chain_register(&fib_chain, nb);
 }
 EXPORT_SYMBOL(register_fib_notifier);
 
 int unregister_fib_notifier(struct notifier_block *nb)
 {
-	return blocking_notifier_chain_unregister(&fib_chain, nb);
+	return atomic_notifier_chain_unregister(&fib_chain, nb);
 }
 EXPORT_SYMBOL(unregister_fib_notifier);
 
@@ -102,7 +102,7 @@ int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
 		       struct fib_notifier_info *info)
 {
 	info->net = net;
-	return blocking_notifier_call_chain(&fib_chain, event_type, info);
+	return atomic_notifier_call_chain(&fib_chain, event_type, info);
 }
 
 static int call_fib_entry_notifiers(struct net *net,
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 3/8] rocker: Implement FIB offload in delayed work
From: Jiri Pirko @ 2016-11-16 14:08 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

Convert rocker to offload FIBs in delayed work in a similar fashion to
mlxsw, which was converted in the previous patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker_main.c  | 58 +++++++++++++++++++++++++-----
 drivers/net/ethernet/rocker/rocker_ofdpa.c |  1 +
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 67df4cf..b80ff12 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2165,28 +2165,70 @@ static const struct switchdev_ops rocker_port_switchdev_ops = {
 	.switchdev_port_obj_dump	= rocker_port_obj_dump,
 };
 
-static int rocker_router_fib_event(struct notifier_block *nb,
-				   unsigned long event, void *ptr)
+struct rocker_fib_event_work {
+	struct fib_entry_notifier_info fen_info;
+	struct rocker *rocker;
+	struct delayed_work dw;
+	unsigned long event;
+};
+
+static void rocker_router_fib_event_work(struct work_struct *work)
 {
-	struct rocker *rocker = container_of(nb, struct rocker, fib_nb);
-	struct fib_entry_notifier_info *fen_info = ptr;
+	struct rocker_fib_event_work *fib_work =
+		container_of(work, struct rocker_fib_event_work, dw.work);
+	struct rocker *rocker = fib_work->rocker;
 	int err;
 
-	switch (event) {
+	/* Protect internal structures from changes */
+	rtnl_lock();
+	switch (fib_work->event) {
 	case FIB_EVENT_ENTRY_ADD:
-		err = rocker_world_fib4_add(rocker, fen_info);
+		err = rocker_world_fib4_add(rocker, &fib_work->fen_info);
 		if (err)
 			rocker_world_fib4_abort(rocker);
-		else
+		fib_info_put(fib_work->fen_info.fi);
 		break;
 	case FIB_EVENT_ENTRY_DEL:
-		rocker_world_fib4_del(rocker, fen_info);
+		rocker_world_fib4_del(rocker, &fib_work->fen_info);
+		fib_info_put(fib_work->fen_info.fi);
 		break;
 	case FIB_EVENT_RULE_ADD: /* fall through */
 	case FIB_EVENT_RULE_DEL:
 		rocker_world_fib4_abort(rocker);
 		break;
 	}
+	rtnl_unlock();
+	kfree(fib_work);
+}
+
+/* Called with rcu_read_lock() */
+static int rocker_router_fib_event(struct notifier_block *nb,
+				   unsigned long event, void *ptr)
+{
+	struct rocker *rocker = container_of(nb, struct rocker, fib_nb);
+	struct rocker_fib_event_work *fib_work;
+
+	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
+	if (WARN_ON(!fib_work))
+		return NOTIFY_BAD;
+
+	INIT_DELAYED_WORK(&fib_work->dw, rocker_router_fib_event_work);
+	fib_work->rocker = rocker;
+	fib_work->event = event;
+
+	switch (event) {
+	case FIB_EVENT_ENTRY_ADD: /* fall through */
+	case FIB_EVENT_ENTRY_DEL:
+		memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
+		/* Take referece on fib_info to prevent it from being
+		 * freed while work is queued. Release it afterwards.
+		 */
+		atomic_inc(&fib_work->fen_info.fi->fib_clntref);
+		break;
+	}
+
+	schedule_work(&fib_work->dw.work);
+
 	return NOTIFY_DONE;
 }
 
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 4ca4613..5d7c738 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2516,6 +2516,7 @@ static void ofdpa_fini(struct rocker *rocker)
 	int bkt;
 
 	del_timer_sync(&ofdpa->fdb_cleanup_timer);
+	flush_scheduled_work();
 
 	spin_lock_irqsave(&ofdpa->flow_tbl_lock, flags);
 	hash_for_each_safe(ofdpa->flow_tbl, bkt, tmp, flow_entry, entry)
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 2/8] mlxsw: spectrum_router: Implement FIB offload in delayed work
From: Jiri Pirko @ 2016-11-16 14:08 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

FIB offload is currently done in process context with RTNL held, but
we're about to dump the FIB tables in RCU critical section, so we can no
longer sleep.

Instead, defer the operation to process context using delayed work. Make
sure fib info isn't freed while the work is queued by taking a reference
on it and releasing it after the operation is done.

Deferring the operation is valid because the upper layers always assume
the operation was successful. If it's not, then the driver-specific
abort mechanism is called and all routed traffic is directed to slow
path.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c         |  6 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h         |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 72 +++++++++++++++++++---
 3 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 6004817..7874e30 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1839,6 +1839,12 @@ int mlxsw_core_schedule_dw(struct delayed_work *dwork, unsigned long delay)
 }
 EXPORT_SYMBOL(mlxsw_core_schedule_dw);
 
+void mlxsw_core_flush_wq(void)
+{
+	flush_workqueue(mlxsw_wq);
+}
+EXPORT_SYMBOL(mlxsw_core_flush_wq);
+
 static int __init mlxsw_core_module_init(void)
 {
 	int err;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index c0acc1b..e382ed0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -156,6 +156,7 @@ enum devlink_port_type mlxsw_core_port_type_get(struct mlxsw_core *mlxsw_core,
 						u8 local_port);
 
 int mlxsw_core_schedule_dw(struct delayed_work *dwork, unsigned long delay);
+void mlxsw_core_flush_wq(void);
 
 #define MLXSW_CONFIG_PROFILE_SWID_COUNT 8
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 683f045..a8011a5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -593,6 +593,14 @@ static void mlxsw_sp_router_fib_flush(struct mlxsw_sp *mlxsw_sp);
 
 static void mlxsw_sp_vrs_fini(struct mlxsw_sp *mlxsw_sp)
 {
+	/* At this stage we're guaranteed not to have new incoming
+	 * FIB notifications and the work queue is free from FIBs
+	 * sitting on top of mlxsw netdevs. However, we can still
+	 * have other FIBs queued. Flush the queue before flushing
+	 * the device's tables. No need for locks, as we're the only
+	 * writer.
+	 */
+	mlxsw_core_flush_wq();
 	mlxsw_sp_router_fib_flush(mlxsw_sp);
 	kfree(mlxsw_sp->router.vrs);
 }
@@ -1948,30 +1956,74 @@ static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
 	kfree(mlxsw_sp->rifs);
 }
 
-static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
-				     unsigned long event, void *ptr)
+struct mlxsw_sp_fib_event_work {
+	struct fib_entry_notifier_info fen_info;
+	struct mlxsw_sp *mlxsw_sp;
+	struct delayed_work dw;
+	unsigned long event;
+};
+
+static void mlxsw_sp_router_fib_event_work(struct work_struct *work)
 {
-	struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
-	struct fib_entry_notifier_info *fen_info = ptr;
+	struct mlxsw_sp_fib_event_work *fib_work =
+		container_of(work, struct mlxsw_sp_fib_event_work, dw.work);
+	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
 	int err;
 
-	if (!net_eq(fen_info->info.net, &init_net))
-		return NOTIFY_DONE;
-
-	switch (event) {
+	/* Protect internal structures from changes */
+	rtnl_lock();
+	switch (fib_work->event) {
 	case FIB_EVENT_ENTRY_ADD:
-		err = mlxsw_sp_router_fib4_add(mlxsw_sp, fen_info);
+		err = mlxsw_sp_router_fib4_add(mlxsw_sp, &fib_work->fen_info);
 		if (err)
 			mlxsw_sp_router_fib4_abort(mlxsw_sp);
+		fib_info_put(fib_work->fen_info.fi);
 		break;
 	case FIB_EVENT_ENTRY_DEL:
-		mlxsw_sp_router_fib4_del(mlxsw_sp, fen_info);
+		mlxsw_sp_router_fib4_del(mlxsw_sp, &fib_work->fen_info);
+		fib_info_put(fib_work->fen_info.fi);
 		break;
 	case FIB_EVENT_RULE_ADD: /* fall through */
 	case FIB_EVENT_RULE_DEL:
 		mlxsw_sp_router_fib4_abort(mlxsw_sp);
 		break;
 	}
+	rtnl_unlock();
+	kfree(fib_work);
+}
+
+/* Called with rcu_read_lock() */
+static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
+				     unsigned long event, void *ptr)
+{
+	struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
+	struct mlxsw_sp_fib_event_work *fib_work;
+	struct fib_notifier_info *info = ptr;
+
+	if (!net_eq(info->net, &init_net))
+		return NOTIFY_DONE;
+
+	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
+	if (WARN_ON(!fib_work))
+		return NOTIFY_BAD;
+
+	INIT_DELAYED_WORK(&fib_work->dw, mlxsw_sp_router_fib_event_work);
+	fib_work->mlxsw_sp = mlxsw_sp;
+	fib_work->event = event;
+
+	switch (event) {
+	case FIB_EVENT_ENTRY_ADD: /* fall through */
+	case FIB_EVENT_ENTRY_DEL:
+		memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
+		/* Take referece on fib_info to prevent it from being
+		 * freed while work is queued. Release it afterwards.
+		 */
+		atomic_inc(&fib_work->fen_info.fi->fib_clntref);
+		break;
+	}
+
+	mlxsw_core_schedule_dw(&fib_work->dw, 0);
+
 	return NOTIFY_DONE;
 }
 
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 1/8] ipv4: fib: Export free_fib_info()
From: Jiri Pirko @ 2016-11-16 14:08 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

The FIB notification chain is going to be converted to an atomic chain,
which means switchdev drivers will have to offload FIB entries in
delayed work, as hardware operations entail sleeping.

However, while the work is queued fib info might be freed, so a
reference must be taken. To release the reference (and potentially free
the fib info) fib_info_put() will be called, which in turn calls
free_fib_info().

Export free_fib_info() so that modules will be able to invoke
fib_info_put().

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/ipv4/fib_semantics.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 388d3e2..c1bc1e9 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -234,6 +234,7 @@ void free_fib_info(struct fib_info *fi)
 #endif
 	call_rcu(&fi->rcu, free_fib_info_rcu);
 }
+EXPORT_SYMBOL_GPL(free_fib_info);

 void fib_release_info(struct fib_info *fi)
 {
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 0/8] ipv4: fib: Allow modules to dump FIB tables
From: Jiri Pirko @ 2016-11-16 14:08 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
	dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber

From: Jiri Pirko <jiri@mellanox.com>

Ido says:

In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced
by a new FIB notification chain to which modules could register in order
to be notified about the addition and deletion of FIB entries. The
motivation for this change was that switchdev drivers need to be able to
reflect the entire FIB table and not only FIBs configured on top of the
port netdevs themselves. This is useful in case of in-band management.

The fundamental problem with this approach is that upon registration
listeners lose all the information previously sent in the chain and
thus have an incomplete view of the FIB tables, which can result in
packet loss. This patchset fixes that by introducing a new API to dump
the FIB tables.

The entire dump process is done under RCU and thus the FIB notification
chain is converted to be atomic. The listeners are modified accordingly.
This is done in the first five patches.

The sixth patch adds the dump callback itself and the next patches call
it from current listeners of the FIB notification chain.

Ido Schimmel (8):
  ipv4: fib: Export free_fib_info()
  mlxsw: spectrum_router: Implement FIB offload in delayed work
  rocker: Implement FIB offload in delayed work
  ipv4: fib: Convert FIB notification chain to be atomic
  net: ipv4: Send notifications only after removing FIB alias
  ipv4: fib: Add an API to request a FIB dump
  mlxsw: spectrum_router: Request a dump of FIB tables during init
  rocker: Request a dump of FIB tables during init

 drivers/net/ethernet/mellanox/mlxsw/core.c         |   6 +
 drivers/net/ethernet/mellanox/mlxsw/core.h         |   1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |  73 ++++++++++--
 drivers/net/ethernet/rocker/rocker_main.c          |  59 ++++++++--
 drivers/net/ethernet/rocker/rocker_ofdpa.c         |   1 +
 include/net/ip_fib.h                               |   1 +
 net/ipv4/fib_semantics.c                           |   1 +
 net/ipv4/fib_trie.c                                | 128 +++++++++++++++++++--
 8 files changed, 242 insertions(+), 28 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Aw: [PATCH 1/1] ixgbe: write flush vfta registers
From: Lino Sanfilippo @ 2016-11-16 14:05 UTC (permalink / raw)
  To: zyjzyj2000
  Cc: e1000-devel, netdev, jeffrey.t.kirsher, intel-wired-lan,
	Zhu Yanjun
In-Reply-To: <1479300041-3436-1-git-send-email-zyjzyj2000@gmail.com>



Hi,

> 
> Sometimes vfta registers can not be written successfully in dcb mode.
> This is very occassional. When the ixgbe nic runs for a very long time,
> sometimes this bug occurs. But after IXGBE_WRITE_FLUSH is executed,
> this bug never occurs.
> 
> Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index bd93d82..1221cfb 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -4138,8 +4138,10 @@ static void ixgbe_vlan_promisc_enable(struct ixgbe_adapter *adapter)
>  	}
>  
>  	/* Set all bits in the VLAN filter table array */
> -	for (i = hw->mac.vft_size; i--;)
> +	for (i = hw->mac.vft_size; i--;) {
>  		IXGBE_WRITE_REG(hw, IXGBE_VFTA(i), ~0U);
> +		IXGBE_WRITE_FLUSH(hw);
> +	}

Should it not be sufficient to do the flush only once, at the end of the function?

Regards,
Lino

^ permalink raw reply

* Re: [PATCH 1/2] ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC
From: Peter Robinson @ 2016-11-16 14:01 UTC (permalink / raw)
  To: David Miller; +Cc: mcoquelin.stm32, alexandre.torgue, peppe.cavallaro, netdev
In-Reply-To: <20161106.214128.668519501329683090.davem@davemloft.net>

On Mon, Nov 7, 2016 at 2:41 AM, David Miller <davem@davemloft.net> wrote:
> From: Peter Robinson <pbrobinson@gmail.com>
> Date: Sun,  6 Nov 2016 20:04:37 +0000
>
>> There's not much point, except compile test, enabling the stmmac
>> platform drivers unless the STM32 SoC is enabled. It's not
>> useful without it.
>>
>> Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
>
> Please don't post just some of the patches in a patch series.
>
> Always post the complete series, with a proper "[PATCH 0/2] ..."
> posting at the beginning.

Sorry, it was a stm branch, I should have broke it our differently. I
can recompose and send this one again on it's own.

Peter

^ permalink raw reply

* [PATCH net-next] sfc: remove napi_hash_del() call
From: Eric Dumazet @ 2016-11-16 14:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Edward Cree, Bert Kenward

From: Eric Dumazet <edumazet@google.com>

Calling napi_hash_del() after netif_napi_del() is pointless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 649bc508fa4b9ea42aa08331c50335e1a51f9259..52ca0404c49121f1c86126ab7d443b1144f20720 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2113,10 +2113,9 @@ static void efx_init_napi(struct efx_nic *efx)
 
 static void efx_fini_napi_channel(struct efx_channel *channel)
 {
-	if (channel->napi_dev) {
+	if (channel->napi_dev)
 		netif_napi_del(&channel->napi_str);
-		napi_hash_del(&channel->napi_str);
-	}
+
 	channel->napi_dev = NULL;
 }
 

^ permalink raw reply related

* Re: [PATCH net] virtio-net: add a missing synchronize_net()
From: Michael S. Tsirkin @ 2016-11-16 13:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <1479277452.8455.156.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Nov 15, 2016 at 10:24:12PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> It seems many drivers do not respect napi_hash_del() contract.
> 
> When napi_hash_del() is used before netif_napi_del(), an RCU grace
> period is needed before freeing NAPI object.
> 
> Fixes: 91815639d880 ("virtio-net: rx busy polling support")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  drivers/net/virtio_net.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index fd8b1e62301f..7276d5a95bd0 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1497,6 +1497,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  		netif_napi_del(&vi->rq[i].napi);
>  	}
>  
> +	/* We called napi_hash_del() before netif_napi_del(),
> +	 * we need to respect an RCU grace period before freeing vi->rq
> +	 */
> +	synchronize_net();
> +
>  	kfree(vi->rq);
>  	kfree(vi->sq);
>  }
> 

^ permalink raw reply

* [PATCH net-next] net/mlx5e: remove napi_hash_del() calls
From: Eric Dumazet @ 2016-11-16 13:55 UTC (permalink / raw)
  To: David Miller; +Cc: Saeed Mahameed, netdev

From: Eric Dumazet <edumazet@google.com>

Calling napi_hash_del() after netif_napi_del() is pointless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ----
 1 file changed, 4 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
From: Daniel Borkmann @ 2016-11-16 13:54 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
	Rana Shahout, Saeed Mahameed, Linux Netdev List
In-Reply-To: <CALzJLG_r9tujvSVysMXLSO8cRR6zHaHfmacmGq-hCqg5Ou7w_w@mail.gmail.com>

On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> There are multiple issues in mlx5e_xdp_set():
>>
>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>     doing so, it should be done at an earlier point in time to makes sure
>>     that we cannot fail anymore at the time we want to set the program for
>>     each channel. This only means that we have to undo the bpf_prog_add()
>>     in case we return due to required reset.
>>
>> 2) When swapping the priv->xdp_prog, then no extra reference count must be
>>     taken since we got that from call path via dev_change_xdp_fd() already.
>>     Otherwise, we'd never be able to free the program. Also, bpf_prog_add()
>>     without checking the return code could fail.
>>
>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++++++++++++++++++-----
>>   1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index ab0c336..cf26672 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
>>                  goto unlock;
>>          }
>>
>> +       if (prog) {
>> +               /* num_channels is invariant here, so we can take the
>> +                * batched reference right upfront.
>> +                */
>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>> +               if (IS_ERR(prog)) {
>> +                       err = PTR_ERR(prog);
>> +                       goto unlock;
>> +               }
>> +       }
>> +
>
> With this approach you will end up taking a ref count twice per RQ! on
> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
> One ref will be taken per RQ/Channel from the above code, and since
> reset will be TRUE mlx5e_open_locked will be called and another ref
> count will be taken on mlx5e_create_rq.
>
> The issue here is that we have two places for ref count accounting,
> xdp_set and  mlx5e_create_rq. Having ref-count updates in
> mlx5e_create_rq is essential for num_channels configuration changes
> (mlx5e_set_ringparam).
>
> Our previous approach made sure that only one path will do the ref
> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
> reset == false).

That is correct, for a short time bpf_prog_add() was charged also when
we reset. When we reset, we will then jump to unlock_put and safely undo
this since we took ref from mlx5e_create_rq() already in that case, and
return successfully. That was intentional, so that eventually we end up
just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
below ...

[...]
> 2. Keep the current approach and make sure to not update the ref count
> twice, you can batch update only if (!reset && was_open) otherwise you
> can rely on mlx5e_open_locked to take the num_channels ref count for
> you.
>
> Personally I prefer option 2, since we will keep the current logic
> which will allow configuration updates such as (mlx5e_set_ringparam)
> to not worry about ref counting since it will be done in the reset
> flow.

... agree on keeping current approach. I actually like the idea, so we'd
end up with this simpler version for the batched ref then.

Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
since this we already got that one through dev_change_xdp_fd() as mentioned.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ab0c336..09ac27c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3148,11 +3148,21 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)

  	if (was_opened && reset)
  		mlx5e_close_locked(netdev);
+	if (was_opened && !reset) {
+		/* num_channels is invariant here, so we can take the
+		 * batched reference right upfront.
+		 */
+		prog = bpf_prog_add(prog, priv->params.num_channels);
+		if (IS_ERR(prog)) {
+			err = PTR_ERR(prog);
+			goto unlock;
+		}
+	}

-	/* exchange programs */
+	/* exchange programs, extra prog reference we got from caller
+	 * as long as we don't fail from this point onwards.
+	 */
  	old_prog = xchg(&priv->xdp_prog, prog);
-	if (prog)
-		bpf_prog_add(prog, 1);
  	if (old_prog)
  		bpf_prog_put(old_prog);

@@ -3168,7 +3178,6 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
  	/* exchanging programs w/o reset, we update ref counts on behalf
  	 * of the channels RQs here.
  	 */
-	bpf_prog_add(prog, priv->params.num_channels);
  	for (i = 0; i < priv->params.num_channels; i++) {
  		struct mlx5e_channel *c = priv->channel[i];

-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH v2] net: ethernet: faraday: To support device tree usage.
From: Andrew Lunn @ 2016-11-16 13:53 UTC (permalink / raw)
  To: Greentime Hu; +Cc: netdev, linux-kernel, jiri
In-Reply-To: <1479297826-4335-1-git-send-email-green.hu@gmail.com>

On Wed, Nov 16, 2016 at 08:03:46PM +0800, Greentime Hu wrote:
> Signed-off-by: Greentime Hu <green.hu@gmail.com>
> ---
>  drivers/net/ethernet/faraday/ftmac100.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftmac100.c b/drivers/net/ethernet/faraday/ftmac100.c
> index dce5f7b..5d70ee9 100644
> --- a/drivers/net/ethernet/faraday/ftmac100.c
> +++ b/drivers/net/ethernet/faraday/ftmac100.c
> @@ -1172,11 +1172,17 @@ static int __exit ftmac100_remove(struct platform_device *pdev)
>  	return 0;
>  }
>  
> +static const struct of_device_id ftmac100_of_ids[] = {
> +	{ .compatible = "andestech,atmac100" },

Please see my comment to v1.

You are also supposed to add a binding document.

    Andrew

^ permalink raw reply

* Re: [PATCH v2] net/phy/vitesse: Configure RGMII skew on VSC8601, if needed
From: Andrew Lunn @ 2016-11-16 13:50 UTC (permalink / raw)
  To: Alexandru Gagniuc; +Cc: f.fainelli, netdev, linux-kernel, davem
In-Reply-To: <1479286953-11481-1-git-send-email-alex.g@adaptrum.com>

On Wed, Nov 16, 2016 at 01:02:33AM -0800, Alexandru Gagniuc wrote:
> With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The
> VSC8601 can handle this internally. While the VSC8601 can set more
> fine-grained delays, the standard skew settings work out of the box.
> The same heuristic is used to determine when this skew should be enabled
> as in vsc824x_config_init().
> 
> Tested on custom board with AM3352 SOC and VSC801 PHY.
> 
> Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com>
> ---
> Changes since v1:
>  * Added comment detailing applicability to different RGMII interfaces.
> 
>  drivers/net/phy/vitesse.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/vitesse.c b/drivers/net/phy/vitesse.c
> index 2e37eb3..24b4a09 100644
> --- a/drivers/net/phy/vitesse.c
> +++ b/drivers/net/phy/vitesse.c
> @@ -62,6 +62,10 @@
>  /* Vitesse Extended Page Access Register */
>  #define MII_VSC82X4_EXT_PAGE_ACCESS	0x1f
>  
> +/* Vitesse VSC8601 Extended PHY Control Register 1 */
> +#define MII_VSC8601_EPHY_CTL		0x17
> +#define MII_VSC8601_EPHY_CTL_RGMII_SKEW	(1 << 8)
> +
>  #define PHY_ID_VSC8234			0x000fc620
>  #define PHY_ID_VSC8244			0x000fc6c0
>  #define PHY_ID_VSC8514			0x00070670
> @@ -111,6 +115,34 @@ static int vsc824x_config_init(struct phy_device *phydev)
>  	return err;
>  }
>  
> +/* This adds a skew for both TX and RX clocks, so the skew should only be
> + * applied to "rgmii-id" interfaces. It may not work as expected
> + * on "rgmii-txid", "rgmii-rxid" or "rgmii" interfaces. */

Hi Alexandru

You should be able to make "rgmii" work as expected. If that is the
phy mode, disable the skew.

    Andrew

^ permalink raw reply

* [PATCH net-next] net/mlx4_en: remove napi_hash_del() call
From: Eric Dumazet @ 2016-11-16 13:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tariq Toukan

From: Eric Dumazet <edumazet@google.com>

There is no need calling napi_hash_del()+synchronize_rcu() before
calling netif_napi_del()

netif_napi_del() does this already.

Using napi_hash_del() in a driver is useful only when dealing with
a batch of NAPI structures, so that a single synchronize_rcu() can
be used. mlx4_en_deactivate_cq() is deactivating a single NAPI.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 03f05c4d1f9897d851c4f36d1f0e37ad3398e76c..09dd3776db7632ae8818c9638507705176233ea4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -185,10 +185,6 @@ void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq **pcq)
 void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 {
 	napi_disable(&cq->napi);
-	if (cq->type == RX) {
-		napi_hash_del(&cq->napi);
-		synchronize_rcu();
-	}
 	netif_napi_del(&cq->napi);
 
 	mlx4_cq_free(priv->mdev->dev, &cq->mcq);

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox