Netdev List
 help / color / mirror / Atom feed
* RE: [patch net-next] net/sched: Fix the logic error to decide the ingress qdisc
From: Chopra, Manish @ 2017-08-22 20:47 UTC (permalink / raw)
  To: Chris Mi, netdev@vger.kernel.org; +Cc: davem@davemloft.net, jiri@resnulli.us
In-Reply-To: <1503055460-36795-1-git-send-email-chrism@mellanox.com>

-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Chris Mi
Sent: Friday, August 18, 2017 4:54 PM
To: netdev@vger.kernel.org
Cc: davem@davemloft.net; jiri@resnulli.us
Subject: [patch net-next] net/sched: Fix the logic error to decide the ingress qdisc

The offending commit used a newly added helper function.
But the logic is wrong. Without this fix, the affected NICs can't do HW offload. Error -EOPNOTSUPP will be returned directly.

Fixes: a2e8da9378cc ("net/sched: use newly added classid identity helpers")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c     | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c       | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    | 2 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.c       | 2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 77538cd..e55a929 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2892,7 +2892,7 @@ static int cxgb_set_tx_maxrate(struct net_device *dev, int index, u32 rate)  static int cxgb_setup_tc_cls_u32(struct net_device *dev,
 				 struct tc_cls_u32_offload *cls_u32)  {
-	if (is_classid_clsact_ingress(cls_u32->common.classid) ||
+	if (!is_classid_clsact_ingress(cls_u32->common.classid) ||
 	    cls_u32->common.chain_index)
 		return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f9fd8d8..56d7ef0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9230,7 +9230,7 @@ static int ixgbe_setup_tc_cls_u32(struct net_device *dev,  {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 
-	if (is_classid_clsact_ingress(cls_u32->common.classid) ||
+	if (!is_classid_clsact_ingress(cls_u32->common.classid) ||
 	    cls_u32->common.chain_index)
 		return -EOPNOTSUPP;

Hi Jiri/Chris,

I was looking at ixgbe and observed that it uses "is_tcf_mirred_egress_redirect()" API which works for below filter command -

     # add u32 filter with action to redirect to macvlan netdev
     tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
        handle 800:0:2 u32 ht 800: \
        match ip src 192.168.2.3/32 \
        action mirred egress redirect dev mvlan_1

Assuming that hardware actually redirect the packets to "ingress" direction of mac vlan interface [that is it delivers the packet to the driver in receive flow]
Isn't the use of above API incorrect or misleading ?  shouldn't it be something like "is_tcf_mirred_ingress_redirect()" and part of the command should be replaced with below ?

action mirrred ingress redirect mvlan_1 

Thanks,
Manish

^ permalink raw reply related

* [PATCH net-next 4/5] xdp: remove net_device names from xdp_redirect tracepoint
From: Jesper Dangaard Brouer @ 2017-08-22 20:47 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Jesper Dangaard Brouer
In-Reply-To: <150343479290.31091.8019008896152616977.stgit@firesoul>

There is too much overhead in the current trace_xdp_redirect
tracepoint as it does strcpy and strlen on the net_device names.

Besides, exposing the ifindex/index is actually the information that
is needed in the tracepoint to diagnose issues.  When a lookup fails
(either ifindex or devmap index) then there is a need for saying which
to_index that have issues.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/trace/events/xdp.h |   17 ++++++++---------
 net/core/filter.c          |    6 +++---
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 0e42e69f773b..7511bed80558 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -51,15 +51,14 @@ TRACE_EVENT(xdp_exception,
 
 TRACE_EVENT(xdp_redirect,
 
-	TP_PROTO(const struct net_device *from,
-		 const struct net_device *to,
+	TP_PROTO(int from_index, int to_index,
 		 const struct bpf_prog *xdp, u32 act, int err),
 
-	TP_ARGS(from, to, xdp, act, err),
+	TP_ARGS(from_index, to_index, xdp, act, err),
 
 	TP_STRUCT__entry(
-		__string(name_from, from->name)
-		__string(name_to, to->name)
+		__field(int, from_index)
+		__field(int, to_index)
 		__array(u8, prog_tag, 8)
 		__field(u32, act)
 		__field(int, err)
@@ -68,15 +67,15 @@ TRACE_EVENT(xdp_redirect,
 	TP_fast_assign(
 		BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(xdp->tag));
 		memcpy(__entry->prog_tag, xdp->tag, sizeof(xdp->tag));
-		__assign_str(name_from, from->name);
-		__assign_str(name_to, to->name);
+		__entry->from_index	= from_index;
+		__entry->to_index	= to_index;
 		__entry->act = act;
 		__entry->err = err;
 	),
 
-	TP_printk("prog=%s from=%s to=%s action=%s err=%d",
+	TP_printk("prog=%s from=%d to=%d action=%s err=%d",
 		  __print_hex_str(__entry->prog_tag, 8),
-		  __get_str(name_from), __get_str(name_to),
+		  __entry->from_index, __entry->to_index,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->err)
 );
diff --git a/net/core/filter.c b/net/core/filter.c
index 2d7cdb2c5c66..35f0383fa2b9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2553,7 +2553,7 @@ int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
 		ri->map_to_flush = map;
 
 out:
-	trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT, err);
+	trace_xdp_redirect(dev->ifindex, index, xdp_prog, XDP_REDIRECT, err);
 	return err;
 }
 
@@ -2577,7 +2577,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 
 	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
 out:
-	trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT, err);
+	trace_xdp_redirect(dev->ifindex, index, xdp_prog, XDP_REDIRECT, err);
 	return err;
 }
 EXPORT_SYMBOL_GPL(xdp_do_redirect);
@@ -2611,7 +2611,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 
 	skb->dev = fwd;
 out:
-	trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT, err);
+	trace_xdp_redirect(dev->ifindex, index, xdp_prog, XDP_REDIRECT, err);
 	return err;
 }
 EXPORT_SYMBOL_GPL(xdp_do_generic_redirect);

^ permalink raw reply related

* [PATCH net-next 3/5] ixgbe: use return codes from ndo_xdp_xmit that are distinguishable
From: Jesper Dangaard Brouer @ 2017-08-22 20:47 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Jesper Dangaard Brouer
In-Reply-To: <150343479290.31091.8019008896152616977.stgit@firesoul>

For XDP_REDIRECT the use of return code -EINVAL is confusing, as it is
used in three different cases.  (1) When the index or ifindex lookup
fails, and in the ixgbe driver (2) when link is down and (3) when XDP
have not been enabled.

The return code can be picked up by the tracepoint xdp:xdp_redirect
for diagnosing why XDP_REDIRECT isn't working.  Thus, there is a need
different return codes to tell the issues apart.

I'm considering using a specific err-code scheme for XDP_REDIRECT
instead of using these errno codes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 8d3224ad6434..3afb8c4b9d48 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9849,14 +9849,14 @@ static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
 	int err;
 
 	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
-		return -EINVAL;
+		return -ENOLINK;
 
 	/* During program transitions its possible adapter->xdp_prog is assigned
 	 * but ring has not been configured yet. In this case simply abort xmit.
 	 */
 	ring = adapter->xdp_prog ? adapter->xdp_ring[smp_processor_id()] : NULL;
 	if (unlikely(!ring))
-		return -EINVAL;
+		return -ENXIO;
 
 	err = ixgbe_xmit_xdp_ring(adapter, xdp);
 	if (err != IXGBE_XDP_TX)

^ permalink raw reply related

* [PATCH net-next 2/5] xdp: make generic xdp redirect use tracepoint trace_xdp_redirect
From: Jesper Dangaard Brouer @ 2017-08-22 20:47 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Jesper Dangaard Brouer
In-Reply-To: <150343479290.31091.8019008896152616977.stgit@firesoul>

If the xdp_do_generic_redirect() call fails, it trigger the
trace_xdp_exception tracepoint.  It seems better to use the same
tracepoint trace_xdp_redirect, as the native xdp_do_redirect{,_map} does.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/filter.h |    3 ++-
 net/core/dev.c         |    4 ++--
 net/core/filter.c      |   36 ++++++++++++++++++++++--------------
 3 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7015116331af..d29e58fde364 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -718,7 +718,8 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
  * because we only track one map and force a flush when the map changes.
  * This does not appear to be a real limitation for existing software.
  */
-int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb);
+int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
+			    struct bpf_prog *prog);
 int xdp_do_redirect(struct net_device *dev,
 		    struct xdp_buff *xdp,
 		    struct bpf_prog *prog);
diff --git a/net/core/dev.c b/net/core/dev.c
index 40b28e417072..270b54754821 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3953,7 +3953,8 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 		if (act != XDP_PASS) {
 			switch (act) {
 			case XDP_REDIRECT:
-				err = xdp_do_generic_redirect(skb->dev, skb);
+				err = xdp_do_generic_redirect(skb->dev, skb,
+							      xdp_prog);
 				if (err)
 					goto out_redir;
 			/* fallthru to submit skb */
@@ -3966,7 +3967,6 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 	}
 	return XDP_PASS;
 out_redir:
-	trace_xdp_exception(skb->dev, xdp_prog, XDP_REDIRECT);
 	kfree_skb(skb);
 	return XDP_DROP;
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index 31c579749679..2d7cdb2c5c66 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2582,29 +2582,37 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 }
 EXPORT_SYMBOL_GPL(xdp_do_redirect);
 
-int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb)
+int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
+			    struct bpf_prog *xdp_prog)
 {
 	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
-	unsigned int len;
 	u32 index = ri->ifindex;
+	struct net_device *fwd;
+	unsigned int len;
+	int err = 0;
 
-	dev = dev_get_by_index_rcu(dev_net(dev), index);
+	fwd = dev_get_by_index_rcu(dev_net(dev), index);
 	ri->ifindex = 0;
-	if (unlikely(!dev)) {
-		goto err;
+	if (unlikely(!fwd)) {
+		err = -EINVAL;
+		goto out;
 	}
 
-	if (unlikely(!(dev->flags & IFF_UP)))
-		goto err;
+	if (unlikely(!(fwd->flags & IFF_UP))) {
+		err = -ENOLINK;
+		goto out;
+	}
 
-	len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
-	if (skb->len > len)
-		goto err;
+	len = fwd->mtu + fwd->hard_header_len + VLAN_HLEN;
+	if (skb->len > len) {
+		err = -EMSGSIZE;
+		goto out;
+	}
 
-	skb->dev = dev;
-	return 0;
-err:
-	return -EINVAL;
+	skb->dev = fwd;
+out:
+	trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT, err);
+	return err;
 }
 EXPORT_SYMBOL_GPL(xdp_do_generic_redirect);
 

^ permalink raw reply related

* [PATCH net-next 1/5] xdp: remove bpf_warn_invalid_xdp_redirect
From: Jesper Dangaard Brouer @ 2017-08-22 20:47 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Jesper Dangaard Brouer
In-Reply-To: <150343479290.31091.8019008896152616977.stgit@firesoul>

Given there is a tracepoint that can track the error code
of xdp_do_redirect calls, the WARN_ONCE in bpf_warn_invalid_xdp_redirect
doesn't seem relevant any longer.  Simply remove the function.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    8 --------
 1 file changed, 8 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index fa2115695037..31c579749679 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2499,7 +2499,6 @@ static int __bpf_tx_xdp(struct net_device *dev,
 	int err;
 
 	if (!dev->netdev_ops->ndo_xdp_xmit) {
-		bpf_warn_invalid_xdp_redirect(dev->ifindex);
 		return -EOPNOTSUPP;
 	}
 
@@ -2572,7 +2571,6 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 	fwd = dev_get_by_index_rcu(dev_net(dev), index);
 	ri->ifindex = 0;
 	if (unlikely(!fwd)) {
-		bpf_warn_invalid_xdp_redirect(index);
 		err = -EINVAL;
 		goto out;
 	}
@@ -2593,7 +2591,6 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb)
 	dev = dev_get_by_index_rcu(dev_net(dev), index);
 	ri->ifindex = 0;
 	if (unlikely(!dev)) {
-		bpf_warn_invalid_xdp_redirect(index);
 		goto err;
 	}
 
@@ -3573,11 +3570,6 @@ void bpf_warn_invalid_xdp_action(u32 act)
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
-void bpf_warn_invalid_xdp_redirect(u32 ifindex)
-{
-	WARN_ONCE(1, "Illegal XDP redirect to unsupported device ifindex(%i)\n", ifindex);
-}
-
 static bool __is_valid_sock_ops_access(int off, int size)
 {
 	if (off < 0 || off >= sizeof(struct bpf_sock_ops))

^ permalink raw reply related

* [PATCH net-next 0/5] xdp: more work on xdp tracepoints
From: Jesper Dangaard Brouer @ 2017-08-22 20:47 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Jesper Dangaard Brouer

More work on streamlining the tracepoints for XDP.

I've created a simple xdp_monitor application that uses this
tracepoint, and prints statistics. Available at github:

https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_kern.c
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_user.c

---

Jesper Dangaard Brouer (5):
      xdp: remove bpf_warn_invalid_xdp_redirect
      xdp: make generic xdp redirect use tracepoint trace_xdp_redirect
      ixgbe: use return codes from ndo_xdp_xmit that are distinguishable
      xdp: remove net_device names from xdp_redirect tracepoint
      xdp: get tracepoints xdp_exception and xdp_redirect in sync


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    4 +-
 include/linux/filter.h                        |    3 +-
 include/trace/events/xdp.h                    |   33 ++++++++---------
 net/core/dev.c                                |    4 +-
 net/core/filter.c                             |   48 +++++++++++++------------
 5 files changed, 46 insertions(+), 46 deletions(-)

^ permalink raw reply

* [patch net 2/2] net: sched: don't do tcf_chain_flush from tcf_chain_destroy
From: Jiri Pirko @ 2017-08-22 20:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, mlxsw
In-Reply-To: <20170822204650.7016-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

tcf_chain_flush needs to be called with RTNL. However, on
free_tcf->
 tcf_action_goto_chain_fini->
  tcf_chain_put->
   tcf_chain_destroy->
    tcf_chain_flush
callpath, it is called without RTNL.
This issue was notified by following warning:

[  155.599052] WARNING: suspicious RCU usage
[  155.603165] 4.13.0-rc5jiri+ #54 Not tainted
[  155.607456] -----------------------------
[  155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage!

Since on this callpath, the chain is guaranteed to be already empty
by check in tcf_chain_put, move the tcf_chain_flush call out and call it
only where it is needed - into tcf_block_put.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 45cd34e..6c5ea84 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -219,8 +219,6 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 	if (!list_empty(&chain->list))
 		list_del_init(&chain->list);
 
-	tcf_chain_flush(chain);
-
 	/* There might still be a reference held when we got here from
 	 * tcf_block_put. Wait for the user to drop reference before free.
 	 */
@@ -296,8 +294,10 @@ void tcf_block_put(struct tcf_block *block)
 	if (!block)
 		return;
 
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+		tcf_chain_flush(chain);
 		tcf_chain_destroy(chain);
+	}
 	kfree(block);
 }
 EXPORT_SYMBOL(tcf_block_put);
-- 
2.9.3

^ permalink raw reply related

* [patch net 1/2] net: sched: fix use after free when tcf_chain_destroy is called multiple times
From: Jiri Pirko @ 2017-08-22 20:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, mlxsw
In-Reply-To: <20170822204650.7016-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

The goto_chain termination action takes a reference of a chain. In that
case, there is an issue when block_put is called tcf_chain_destroy
directly. The follo-up call of tcf_chain_put by goto_chain action free
works with memory that is already freed. This was caught by kasan:

[  220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50
[  220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261
[  220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54
[  220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
[  220.371784] Call Trace:
[  220.374290]  <IRQ>
[  220.376355]  dump_stack+0xd5/0x150
[  220.391485]  print_address_description+0x86/0x410
[  220.396308]  kasan_report+0x181/0x4c0
[  220.415211]  tcf_chain_put+0x1b/0x50
[  220.418949]  free_tcf+0x95/0xc0

So allow tcf_chain_destroy to be called multiple times, free only in
case the reference count drops to 0.

Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9fd44c2..45cd34e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -215,9 +215,17 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 
 static void tcf_chain_destroy(struct tcf_chain *chain)
 {
-	list_del(&chain->list);
+	/* May be already removed from the list by the previous call. */
+	if (!list_empty(&chain->list))
+		list_del_init(&chain->list);
+
 	tcf_chain_flush(chain);
-	kfree(chain);
+
+	/* There might still be a reference held when we got here from
+	 * tcf_block_put. Wait for the user to drop reference before free.
+	 */
+	if (!chain->refcnt)
+		kfree(chain);
 }
 
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
-- 
2.9.3

^ permalink raw reply related

* [patch net 0/2] net: sched: couple of chain fixes
From: Jiri Pirko @ 2017-08-22 20:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

Jiri Pirko (2):
  net: sched: fix use after free when tcf_chain_destroy is called
    multiple times
  net: sched: don't do tcf_chain_flush from tcf_chain_destroy

 net/sched/cls_api.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.9.3

^ permalink raw reply

* Re: [PATCH 2/2] net: smsc911x: Quiten netif during suspend
From: Andrew Lunn @ 2017-08-22 20:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: David S . Miller, Steve Glendinning, Florian Fainelli,
	Lukas Wunner, Rafael J . Wysocki, netdev, linux-pm,
	linux-renesas-soc, linux-kernel
In-Reply-To: <1503427046-17618-3-git-send-email-geert+renesas@glider.be>

On Tue, Aug 22, 2017 at 08:37:26PM +0200, Geert Uytterhoeven wrote:

Hi Geert

quieten. Has an E in the middle.

Otherwise this patch looks reasonable.

	  Andrew

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: Michael Chan @ 2017-08-22 20:04 UTC (permalink / raw)
  To: Duyck, Alexander H
  Cc: john.fastabend@gmail.com, brouer@redhat.com,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, andy@greyhouse.net,
	borkmann@iogearbox.net
In-Reply-To: <1503426617.2434.5.camel@intel.com>

On Tue, Aug 22, 2017 at 11:30 AM, Duyck, Alexander H
<alexander.h.duyck@intel.com> wrote:
> On Tue, 2017-08-22 at 11:17 -0700, John Fastabend wrote:
>> On 08/22/2017 11:02 AM, Michael Chan wrote:
>> > On Mon, Aug 21, 2017 at 12:25 PM, Jesper Dangaard Brouer
>> > <brouer@redhat.com> wrote:
>> > >
>> > > I'be been playing with the latest XDP_REDIRECT feature, that was
>> > > accepted in net-next (for ixgbe), see merge commit[1].
>> > >  [1] https://git.kernel.org/davem/net-next/c/6093ec2dc31
>> > >
>> >
>> > Just catching on XDP_REDIRECT and I have a very basic question.  The
>> > ingress device passes the XDP buffer to the egress device for XDP
>> > redirect transmission.  When the egress device has transmitted the
>> > packet, is it supposed to just free the buffer?  Or is it supposed to
>> > be recycled?
>> >
>> > In XDP_TX, the buffer is recycled back to the rx ring.
>> >
>>
>> With XDP_REDIRECT we must "just free the buffer" in ixgbe this means
>> page_frag_free() on the data. There is no way to know where the xdp
>> buffer came from it could be a different NIC for example.
>>
>> However with how ixgbe is coded up recycling will work as long as
>> the memory is free'd before the driver ring tries to use it again. In
>> normal usage this should be the case. And if we are over-running a device
>> it doesn't really hurt to slow down the sender a bit.
>>
>> I think this is a pretty good model, we could probably provide a set
>> of APIs for drivers to use so that we get some consistency across
>> vendors here, ala Jesper's page pool ideas.
>>
>> (+Alex, for ixgbe details)
>>
>> Thanks,
>> John
>
> I think you pretty much covered the inner workings for the ixgbe bits.
>
> The only piece I would add is that the recycling trick normally only
> works if the same interface/driver is doing both the Tx and the Rx. The
> redirect code cannot assume that is the case and that is the reason why
> it must always be freeing the traffic on clean-up.
>

Right, but it's conceivable to add an API to "return" the buffer to
the input device, right?

^ permalink raw reply

* [PATCH net-next] liquidio: change manner of detecting whether or not NIC firmware is loaded
From: Felix Manlunas @ 2017-08-22 19:46 UTC (permalink / raw)
  To: davem; +Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla

In the NIC firmware, the 1-bit flag indicating "firmware is loaded" moved
from SLI_SCRATCH_1 to SLI_SCRATCH_2 (these are Octeon general-purpose
scratch registers).  Make the PF driver conform to this change.

Remove code that sets the "firmware is loaded" flag because it's now the
firmware's job to do that.

In the code that detects whether or not the firmware is loaded, don't just
rely on checking the "firmware is loaded" flag because that may cause a
rare false negative.  Add code that deduces whether or not the firmware is
loaded; that will never give a false negative.

Also bump up driver version to match newer NIC firmware.

Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
---
 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c | 15 +++++++++++++--
 drivers/net/ethernet/cavium/liquidio/lio_main.c         |  6 ------
 drivers/net/ethernet/cavium/liquidio/liquidio_common.h  |  3 ++-
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 4b0ca9f..96171cb 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -1405,8 +1405,19 @@ int cn23xx_fw_loaded(struct octeon_device *oct)
 {
 	u64 val;
 
-	val = octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1);
-	return (val >> 1) & 1ULL;
+	/* If there's more than one active PF on this NIC, then that
+	 * implies that the NIC firmware is loaded and running.  This check
+	 * prevents a rare false negative that might occur if we only relied
+	 * on checking the SCR2_BIT_FW_LOADED flag.  The false negative would
+	 * happen if the PF driver sees SCR2_BIT_FW_LOADED as cleared even
+	 * though the firmware was already loaded but still booting and has yet
+	 * to set SCR2_BIT_FW_LOADED.
+	 */
+	if (atomic_read(oct->adapter_refcount) > 1)
+		return 1;
+
+	val = octeon_read_csr64(oct, CN23XX_SLI_SCRATCH2);
+	return (val >> SCR2_BIT_FW_LOADED) & 1ULL;
 }
 
 void cn23xx_tell_vf_its_macaddr_changed(struct octeon_device *oct, int vfidx,
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 0eea6a2..e786ac8 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -4098,12 +4098,6 @@ static int octeon_device_init(struct octeon_device *octeon_dev)
 			dev_err(&octeon_dev->pci_dev->dev, "Could not load firmware to board\n");
 			return 1;
 		}
-		/* set bit 1 of SLI_SCRATCH_1 to indicate that firmware is
-		 * loaded
-		 */
-		if (OCTEON_CN23XX_PF(octeon_dev))
-			octeon_write_csr64(octeon_dev, CN23XX_SLI_SCRATCH1,
-					   2ULL);
 	}
 
 	handshake[octeon_dev->octeon_id].init_ok = 1;
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 18d2955..1e5655c 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -28,7 +28,7 @@
 #define LIQUIDIO_PACKAGE ""
 #define LIQUIDIO_BASE_MAJOR_VERSION 1
 #define LIQUIDIO_BASE_MINOR_VERSION 6
-#define LIQUIDIO_BASE_MICRO_VERSION 0
+#define LIQUIDIO_BASE_MICRO_VERSION 1
 #define LIQUIDIO_BASE_VERSION   __stringify(LIQUIDIO_BASE_MAJOR_VERSION) "." \
 				__stringify(LIQUIDIO_BASE_MINOR_VERSION)
 #define LIQUIDIO_MICRO_VERSION  "." __stringify(LIQUIDIO_BASE_MICRO_VERSION)
@@ -106,6 +106,7 @@ enum octeon_tag_type {
 #define MAX_IOQ_INTERRUPTS_PER_PF   (64 * 2)
 #define MAX_IOQ_INTERRUPTS_PER_VF   (8 * 2)
 
+#define SCR2_BIT_FW_LOADED	    63
 
 static inline u32 incr_index(u32 index, u32 count, u32 max)
 {

^ permalink raw reply related

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Corentin Labbe @ 2017-08-22 19:37 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Chen-Yu Tsai, Andrew Lunn, Maxime Ripard, Rob Herring,
	Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <352ae66b-78a4-e88b-4544-a8edd9390b0c@gmail.com>

On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
> On 08/22/2017 11:11 AM, Corentin Labbe wrote:
> > On Tue, Aug 22, 2017 at 09:40:24AM -0700, Florian Fainelli wrote:
> >> On 08/22/2017 08:39 AM, Chen-Yu Tsai wrote:
> >>> On Mon, Aug 21, 2017 at 10:23 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> >>>>> All muxes are mostly always represented the same way afaik, or do you
> >>>>> want to simply introduce a new compatible / property?
> >>>>
> >>>> +         mdio-mux {
> >>>> +               compatible = "allwinner,sun8i-h3-mdio-switch";
> >>>> +               mdio-parent-bus = <&mdio_parent>;
> >>>> +               #address-cells = <1>;
> >>>> +               #size-cells = <0>;
> >>>> +
> >>>> +               internal_mdio: mdio@1 {
> >>>>                         reg = <1>;
> >>>> -                       clocks = <&ccu CLK_BUS_EPHY>;
> >>>> -                       resets = <&ccu RST_BUS_EPHY>;
> >>>> +                       #address-cells = <1>;
> >>>> +                       #size-cells = <0>;
> >>>> +                       int_mii_phy: ethernet-phy@1 {
> >>>> +                               compatible = "ethernet-phy-ieee802.3-c22";
> >>>> +                               reg = <1>;
> >>>> +                               clocks = <&ccu CLK_BUS_EPHY>;
> >>>> +                               resets = <&ccu RST_BUS_EPHY>;
> >>>> +                               phy-is-integrated;
> >>>> +                       };
> >>>> +               };
> >>>> +               mdio: mdio@0 {
> >>>> +                       reg = <0>;
> >>>> +                       #address-cells = <1>;
> >>>> +                       #size-cells = <0>;
> >>>>                 };
> >>>>
> >>>> Hi Maxim
> >>>>
> >>>> Anybody who knows the MDIO-mux code/binding, knows that it is a run
> >>>> time mux. You swap the mux per MDIO transaction. You can access all
> >>>> the PHY and switches on the mux'ed MDIO bus.
> >>>>
> >>>> However here, it is effectively a boot-time MUX. You cannot change it
> >>>> on the fly. What happens when somebody has a phandle to a PHY on the
> >>>> internal and a phandle to a phy on the external? Does the driver at
> >>>> least return -EINVAL, or -EBUSY? Is there a representation which
> >>>> eliminates this possibility?
> >>>
> >>> There is only one controller. Either you use the internal PHY, which
> >>> is then directly coupled (no magnetics needed) to the RJ45 port, or
> >>> you use an external PHY over MII/RMII/RGMII. You could supposedly
> >>> have both on a board, and let the user choose one. But why bother
> >>> with the extra complexity and cost? Either you use the internal PHY
> >>> at 100M, or an external RGMII PHY for gigabit speeds.
> >>
> >> I agree, there is no point in over-engineering any of this. I don't
> >> think there is actually any MDIO mux per-se in that the MDIO clock and
> >> data lines are muxed, however there has to be some kind of built-in port
> >> multiplexer that lets you chose between connecting to the internal PHY
> >> and any external PHY/MAC, but that is not what a "mdio-mux" node represents.
> >>
> >>>
> >>> So I think what you are saying is either impossible or engineering-wise
> >>> a very stupid design, like using an external MAC with a discrete PHY
> >>> connected to the internal MAC's MDIO bus, while using the internal MAC
> >>> with the internal PHY.
> >>>
> >>> Now can we please decide on something? We're a week and a half from
> >>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
> >>> nodes (internal-mdio & external-mdio).
> >>
> >> I really don't see a need for a mdio-mux in the first place, just have
> >> one MDIO controller (current state) sub-node which describes the
> >> built-in STMMAC MDIO controller and declare the internal PHY as a child
> >> node (along with 'phy-is-integrated'). If a different configuration is
> >> used, then just put the external PHY as a child node there.
> >>
> >> If fixed-link is required, the mdio node becomes unused anyway.
> >>
> >> Works for everyone?
> > 
> > If we put an external PHY with reg=1 as a child of internal MDIO, il will be merged with internal PHY node and get phy-is-integrated.
> 
> Then have the .dtsi file contain just the mdio node, but no internal or
> external PHY and push all the internal and external PHY node definition
> (in its entirety) to the per-board DTS file, does not that work?
> 

It is near what I sent in v2 of this serie. (only one MDIO with internal PHY, but phy-is-integrated is only set per-board DTS)
But at that time Rob said to use a mdio-mux.

> > 
> > Does two MDIO node "internal-mdio" and "mdio" works for you ?
> > (We keep "mdio" for external MDIO for reducing the number of patchs)
> 
> How does that solve the problem and not create a new one where both MDIO
> nodes end-up being registered? Does that mean you force the writer of
> the board-level DTS to systematically disable the internal MDIO node
> when using an external PHY and vice versa? How is that better than not
> being explicit like I suggested earlier?
> 

Only one node is registered.
stmmac register only MDIO called "mdio".
So it is why I have added a patch "register parent MDIO node for sun8i-h3-emac" which for H3 only register the PHY's parent MDIO

But yes back to your solution "only one mdio with internal PHY," and phy-is-integrated only set on board DT which use internal PHY will work.

Regards

^ permalink raw reply

* [PATCH net-next] net: sched: use kvmalloc() for class hash tables
From: Eric Dumazet @ 2017-08-22 19:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko

From: Eric Dumazet <edumazet@google.com>

High order GFP_KERNEL allocations can stress the host badly.

Use modern kvmalloc_array()/kvfree() instead of custom
allocations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_api.c |   22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 0fea0c50b7636b75b57ddf408c45ecce344813f4..aaf552b8e1205f29e0d95e943898bea35b647299 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -621,14 +621,10 @@ EXPORT_SYMBOL(qdisc_watchdog_cancel);
 
 static struct hlist_head *qdisc_class_hash_alloc(unsigned int n)
 {
-	unsigned int size = n * sizeof(struct hlist_head), i;
 	struct hlist_head *h;
+	unsigned int i;
 
-	if (size <= PAGE_SIZE)
-		h = kmalloc(size, GFP_KERNEL);
-	else
-		h = (struct hlist_head *)
-			__get_free_pages(GFP_KERNEL, get_order(size));
+	h = kvmalloc_array(n, sizeof(struct hlist_head), GFP_KERNEL);
 
 	if (h != NULL) {
 		for (i = 0; i < n; i++)
@@ -637,16 +633,6 @@ static struct hlist_head *qdisc_class_hash_alloc(unsigned int n)
 	return h;
 }
 
-static void qdisc_class_hash_free(struct hlist_head *h, unsigned int n)
-{
-	unsigned int size = n * sizeof(struct hlist_head);
-
-	if (size <= PAGE_SIZE)
-		kfree(h);
-	else
-		free_pages((unsigned long)h, get_order(size));
-}
-
 void qdisc_class_hash_grow(struct Qdisc *sch, struct Qdisc_class_hash *clhash)
 {
 	struct Qdisc_class_common *cl;
@@ -679,7 +665,7 @@ void qdisc_class_hash_grow(struct Qdisc *sch, struct Qdisc_class_hash *clhash)
 	clhash->hashmask = nmask;
 	sch_tree_unlock(sch);
 
-	qdisc_class_hash_free(ohash, osize);
+	kvfree(ohash);
 }
 EXPORT_SYMBOL(qdisc_class_hash_grow);
 
@@ -699,7 +685,7 @@ EXPORT_SYMBOL(qdisc_class_hash_init);
 
 void qdisc_class_hash_destroy(struct Qdisc_class_hash *clhash)
 {
-	qdisc_class_hash_free(clhash->hash, clhash->hashsize);
+	kvfree(clhash->hash);
 }
 EXPORT_SYMBOL(qdisc_class_hash_destroy);
 

^ permalink raw reply related

* Re: Crash due to fbca164776e438 - "net: stmmac: Use the right logging function in stmmac_mdio_register"
From: Priit Laes @ 2017-08-22 19:20 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Romain Perier, Andrew Lunn, David S. Miller, netdev, linux-kernel,
	Chen-Yu Tsai, Maxime Ripard
In-Reply-To: <6d3c8253-1e31-761f-4b50-0bf087047753@gmail.com>

On Tue, Aug 22, 2017 at 12:01:32PM -0700, Florian Fainelli wrote:
> On 08/22/2017 11:47 AM, Priit Laes wrote:
> > Hi!
> > 
> > I'm running into following crash during boot on Cubietruck A20, with the patch
> > fbca164776e438
> > (net: stmmac: Use the right logging function in stmmac_mdio_register) applied:
> > 
> > [snip]
> > sun7i-dwmac 1c50000.ethernet: PTP uses main clock
> > sun7i-dwmac 1c50000.ethernet: no reset control found
> > sun7i-dwmac 1c50000.ethernet: no regulator found
> > sun7i-dwmac 1c50000.ethernet: Ring mode enabled
> > sun7i-dwmac 1c50000.ethernet: DMA HW capability register supported
> > sun7i-dwmac 1c50000.ethernet: Normal descriptors
> > libphy: stmmac: probed
> > Unable to handle kernel NULL pointer dereference at virtual address 00000048
> 
> Yes indeed:
> 
> (gdb) p/x (int)&((struct phy_driver *)0)->name
> $2 = 0x48
> 
> The following should fix it:
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 9493fb369682..810f6fd2f639 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -877,15 +877,17 @@ EXPORT_SYMBOL(phy_attached_info);
>  #define ATTACHED_FMT "attached PHY driver [%s] (mii_bus:phy_addr=%s,
> irq=%d)"
>  void phy_attached_print(struct phy_device *phydev, const char *fmt, ...)
>  {
> +       const char *drv_name = phydev->drv ? phydev->drv->name : "unbound";
> +
>         if (!fmt) {
>                 dev_info(&phydev->mdio.dev, ATTACHED_FMT "\n",
> -                        phydev->drv->name, phydev_name(phydev),
> +                        drv_name, phydev_name(phydev),
>                          phydev->irq);
>         } else {
>                 va_list ap;
> 
>                 dev_info(&phydev->mdio.dev, ATTACHED_FMT,
> -                        phydev->drv->name, phydev_name(phydev),
> +                        drv_name, phydev_name(phydev),
>                          phydev->irq);
> 
>                 va_start(ap, fmt);
> 

Thanks, patch works.

Tested-By: Priit Laes <plaes@plaes.org>

With reverted patch:

sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): PHY ID 001cc915 at 0 IRQ POLL (stmmac-0:00) active
sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): PHY ID 001cc915 at 1 IRQ POLL (stmmac-0:01)

With changes by Florian:

mdio_bus stmmac-0:00: attached PHY driver [unbound] (mii_bus:phy_addr=stmmac-0:00, irq=-1)
mdio_bus stmmac-0:01: attached PHY driver [unbound] (mii_bus:phy_addr=stmmac-0:01, irq=-1)


Päikest,
Priit :)

^ permalink raw reply

* Re: [PATCH net-next 2/2] selftests/net: Add a test to validate behavior of rx timestamps
From: Willem de Bruijn @ 2017-08-22 19:15 UTC (permalink / raw)
  To: Mike Maloney; +Cc: Network Development, David Miller, Mike Maloney
In-Reply-To: <20170822172703.31703-3-maloneykernel@gmail.com>

> +static struct socket_type socket_types[] = {
> +       { "ip",         SOCK_DGRAM,     IPPROTO_IP },
> +       { "udp",        SOCK_DGRAM,     IPPROTO_UDP },

I think the intent is to have a SOCK_RAW case.

^ permalink raw reply

* Re: Crash due to fbca164776e438 - "net: stmmac: Use the right logging function in stmmac_mdio_register"
From: Florian Fainelli @ 2017-08-22 19:01 UTC (permalink / raw)
  To: Priit Laes, Romain Perier, Andrew Lunn, David S. Miller
  Cc: netdev, linux-kernel, Chen-Yu Tsai, Maxime Ripard
In-Reply-To: <20170822184757.GA6429@plaes.org>

On 08/22/2017 11:47 AM, Priit Laes wrote:
> Hi!
> 
> I'm running into following crash during boot on Cubietruck A20, with the patch
> fbca164776e438
> (net: stmmac: Use the right logging function in stmmac_mdio_register) applied:
> 
> [snip]
> sun7i-dwmac 1c50000.ethernet: PTP uses main clock
> sun7i-dwmac 1c50000.ethernet: no reset control found
> sun7i-dwmac 1c50000.ethernet: no regulator found
> sun7i-dwmac 1c50000.ethernet: Ring mode enabled
> sun7i-dwmac 1c50000.ethernet: DMA HW capability register supported
> sun7i-dwmac 1c50000.ethernet: Normal descriptors
> libphy: stmmac: probed
> Unable to handle kernel NULL pointer dereference at virtual address 00000048

Yes indeed:

(gdb) p/x (int)&((struct phy_driver *)0)->name
$2 = 0x48

The following should fix it:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 9493fb369682..810f6fd2f639 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -877,15 +877,17 @@ EXPORT_SYMBOL(phy_attached_info);
 #define ATTACHED_FMT "attached PHY driver [%s] (mii_bus:phy_addr=%s,
irq=%d)"
 void phy_attached_print(struct phy_device *phydev, const char *fmt, ...)
 {
+       const char *drv_name = phydev->drv ? phydev->drv->name : "unbound";
+
        if (!fmt) {
                dev_info(&phydev->mdio.dev, ATTACHED_FMT "\n",
-                        phydev->drv->name, phydev_name(phydev),
+                        drv_name, phydev_name(phydev),
                         phydev->irq);
        } else {
                va_list ap;

                dev_info(&phydev->mdio.dev, ATTACHED_FMT,
-                        phydev->drv->name, phydev_name(phydev),
+                        drv_name, phydev_name(phydev),
                         phydev->irq);

                va_start(ap, fmt);



> pgd = c0004000
> [00000048] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00318-g0065bd7fa384 #1
> Hardware name: Allwinner sun7i (A20) Family
> task: ee868000 task.stack: ee85c000
> PC is at phy_attached_print+0x1c/0x8c
> LR is at stmmac_mdio_register+0x12c/0x200
> pc : [<c04510ac>]    lr : [<c045e6b4>]    psr: 60000013
> sp : ee85ddc8  ip : 00000000  fp : c07dfb5c
> r10: ee981210  r9 : 00000001  r8 : eea73000
> r7 : eeaa6dd0  r6 : eeb49800  r5 : 00000000  r4 : 00000000
> r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : eeb49800
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 4000406a  DAC: 00000051
> Process swapper/0 (pid: 1, stack limit = 0xee85c210)
> Stack: (0xee85ddc8 to 0xee85e000)
> ddc0:                   00000000 00000002 eeb49400 eea72000 00000000 eeb49400
> dde0: c045e6b4 00000000 ffffffff eeab0810 00000000 c08051f8 ee9292c0 c016d480
> de00: eea725c0 eea73000 eea72000 00000001 eea726c0 c0457d0c 00000040 00000020
> de20: 00000000 c045b850 00000001 00000000 ee981200 eeab0810 eeaa6ed0 ee981210
> de40: 00000000 c094a4a0 00000000 c0465180 eeaa7550 f08d0000 c9ffb90c 00000032
> de60: fffffffa 00000032 ee981210 ffffffed c0a46620 fffffdfb c0a46620 c03f7be8
> de80: ee981210 c0a9a388 00000000 00000000 c0a46620 c03f63e0 ee981210 c0a46620
> dea0: ee981244 00000000 00000007 000000c6 c094a4a0 c03f6534 00000000 c0a46620
> dec0: c03f6490 c03f49ec ee828a58 ee9217b4 c0a46620 eeaa4b00 c0a43230 c03f59fc
> dee0: c08051f8 c094a49c c0a46620 c0a46620 00000000 c091c668 c093783c c03f6dfc
> df00: ffffe000 00000000 c091c668 c010177c eefe0938 eefe0935 c085e200 000000c6
> df20: 00000005 c0136bc8 60000013 c080b3a4 00000006 00000006 c07ce7b4 00000000
> df40: c07d7ddc c07cef28 eefe0938 eefe093e c0a0b2f0 c0a641c0 c0a641c0 c0a641c0
> df60: c0937834 00000007 000000c6 c094a4a0 00000000 c0900d88 00000006 00000006
> df80: 00000000 c09005a8 00000000 c060ecf4 00000000 00000000 00000000 00000000
> dfa0: 00000000 c060ecfc 00000000 c0107738 00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffdeffff ffffffff
> [<c04510ac>] (phy_attached_print) from [<c045e6b4>] (stmmac_mdio_register+0x12c/0x200)
> [<c045e6b4>] (stmmac_mdio_register) from [<c045b850>] (stmmac_dvr_probe+0x850/0x96c)
> [<c045b850>] (stmmac_dvr_probe) from [<c0465180>] (sun7i_gmac_probe+0x120/0x180)
> [<c0465180>] (sun7i_gmac_probe) from [<c03f7be8>] (platform_drv_probe+0x50/0xac)
> [<c03f7be8>] (platform_drv_probe) from [<c03f63e0>] (driver_probe_device+0x234/0x2e4)
> [<c03f63e0>] (driver_probe_device) from [<c03f6534>] (__driver_attach+0xa4/0xa8)
> [<c03f6534>] (__driver_attach) from [<c03f49ec>] (bus_for_each_dev+0x4c/0x9c)
> [<c03f49ec>] (bus_for_each_dev) from [<c03f59fc>] (bus_add_driver+0x190/0x214)
> [<c03f59fc>] (bus_add_driver) from [<c03f6dfc>] (driver_register+0x78/0xf4)
> [<c03f6dfc>] (driver_register) from [<c010177c>] (do_one_initcall+0x44/0x168)
> [<c010177c>] (do_one_initcall) from [<c0900d88>] (kernel_init_freeable+0x144/0x1d0)
> [<c0900d88>] (kernel_init_freeable) from [<c060ecfc>] (kernel_init+0x8/0x110)
> [<c060ecfc>] (kernel_init) from [<c0107738>] (ret_from_fork+0x14/0x3c)
> Code: e59021c8 e59d401c e590302c e3540000 (e5922048) 
> ---[ end trace 39ae87c7923562d0 ]---
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [/snip]
> 
> After reverting said patch, there are no issues.
> 
> Päikest,
> Priit Laes
> 


-- 
Florian

^ permalink raw reply related

* Re: [PATCH net-next 1/2] tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg
From: Willem de Bruijn @ 2017-08-22 18:57 UTC (permalink / raw)
  To: Mike Maloney; +Cc: Network Development, David Miller, Mike Maloney
In-Reply-To: <20170822172703.31703-2-maloneykernel@gmail.com>

On Tue, Aug 22, 2017 at 1:27 PM, Mike Maloney <maloneykernel@gmail.com> wrote:
> From: Mike Maloney <maloney@google.com>
>
> When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
> timestamp corresponding to the highest sequence number data returned.
>
> Previously the skb->tstamp is overwritten when a TCP packet is placed
> in the out of order queue.  While the packet is in the ooo queue, save the
> timestamp in the TCB_SKB_CB.  This space is shared with the gso_*
> options which are only used on the tx path, and a previously unused 4
> byte hole.
>
> When skbs are coalesced either in the sk_receive_queue or the
> out_of_order_queue always choose the timestamp of the appended skb to
> maintain the invariant of returning the timestamp of the last byte in
> the recvmsg buffer.
>

> +static void tcp_update_recv_tstamps(struct sk_buff *skb,
> +                                   struct scm_timestamping *tss)
> +{
> +       if (skb->tstamp)
> +               tss->ts[0] = ktime_to_timespec(skb->tstamp);
> +       else
> +               tss->ts[0] = (struct timespec) {0};
> +
> +       if (skb_hwtstamps(skb)->hwtstamp)
> +               tss->ts[2] = ktime_to_timespec(skb_hwtstamps(skb)->hwtstamp);
> +       else
> +               tss->ts[2] = (struct timespec) {0};

tss->ts[1] may remain uninitialized.

^ permalink raw reply

* Crash due to fbca164776e438 - "net: stmmac: Use the right logging function in stmmac_mdio_register"
From: Priit Laes @ 2017-08-22 18:47 UTC (permalink / raw)
  To: Romain Perier, Andrew Lunn, David S. Miller
  Cc: netdev, linux-kernel, Chen-Yu Tsai, Maxime Ripard

Hi!

I'm running into following crash during boot on Cubietruck A20, with the patch
fbca164776e438
(net: stmmac: Use the right logging function in stmmac_mdio_register) applied:

[snip]
sun7i-dwmac 1c50000.ethernet: PTP uses main clock
sun7i-dwmac 1c50000.ethernet: no reset control found
sun7i-dwmac 1c50000.ethernet: no regulator found
sun7i-dwmac 1c50000.ethernet: Ring mode enabled
sun7i-dwmac 1c50000.ethernet: DMA HW capability register supported
sun7i-dwmac 1c50000.ethernet: Normal descriptors
libphy: stmmac: probed
Unable to handle kernel NULL pointer dereference at virtual address 00000048
pgd = c0004000
[00000048] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00318-g0065bd7fa384 #1
Hardware name: Allwinner sun7i (A20) Family
task: ee868000 task.stack: ee85c000
PC is at phy_attached_print+0x1c/0x8c
LR is at stmmac_mdio_register+0x12c/0x200
pc : [<c04510ac>]    lr : [<c045e6b4>]    psr: 60000013
sp : ee85ddc8  ip : 00000000  fp : c07dfb5c
r10: ee981210  r9 : 00000001  r8 : eea73000
r7 : eeaa6dd0  r6 : eeb49800  r5 : 00000000  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : eeb49800
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4000406a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xee85c210)
Stack: (0xee85ddc8 to 0xee85e000)
ddc0:                   00000000 00000002 eeb49400 eea72000 00000000 eeb49400
dde0: c045e6b4 00000000 ffffffff eeab0810 00000000 c08051f8 ee9292c0 c016d480
de00: eea725c0 eea73000 eea72000 00000001 eea726c0 c0457d0c 00000040 00000020
de20: 00000000 c045b850 00000001 00000000 ee981200 eeab0810 eeaa6ed0 ee981210
de40: 00000000 c094a4a0 00000000 c0465180 eeaa7550 f08d0000 c9ffb90c 00000032
de60: fffffffa 00000032 ee981210 ffffffed c0a46620 fffffdfb c0a46620 c03f7be8
de80: ee981210 c0a9a388 00000000 00000000 c0a46620 c03f63e0 ee981210 c0a46620
dea0: ee981244 00000000 00000007 000000c6 c094a4a0 c03f6534 00000000 c0a46620
dec0: c03f6490 c03f49ec ee828a58 ee9217b4 c0a46620 eeaa4b00 c0a43230 c03f59fc
dee0: c08051f8 c094a49c c0a46620 c0a46620 00000000 c091c668 c093783c c03f6dfc
df00: ffffe000 00000000 c091c668 c010177c eefe0938 eefe0935 c085e200 000000c6
df20: 00000005 c0136bc8 60000013 c080b3a4 00000006 00000006 c07ce7b4 00000000
df40: c07d7ddc c07cef28 eefe0938 eefe093e c0a0b2f0 c0a641c0 c0a641c0 c0a641c0
df60: c0937834 00000007 000000c6 c094a4a0 00000000 c0900d88 00000006 00000006
df80: 00000000 c09005a8 00000000 c060ecf4 00000000 00000000 00000000 00000000
dfa0: 00000000 c060ecfc 00000000 c0107738 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffdeffff ffffffff
[<c04510ac>] (phy_attached_print) from [<c045e6b4>] (stmmac_mdio_register+0x12c/0x200)
[<c045e6b4>] (stmmac_mdio_register) from [<c045b850>] (stmmac_dvr_probe+0x850/0x96c)
[<c045b850>] (stmmac_dvr_probe) from [<c0465180>] (sun7i_gmac_probe+0x120/0x180)
[<c0465180>] (sun7i_gmac_probe) from [<c03f7be8>] (platform_drv_probe+0x50/0xac)
[<c03f7be8>] (platform_drv_probe) from [<c03f63e0>] (driver_probe_device+0x234/0x2e4)
[<c03f63e0>] (driver_probe_device) from [<c03f6534>] (__driver_attach+0xa4/0xa8)
[<c03f6534>] (__driver_attach) from [<c03f49ec>] (bus_for_each_dev+0x4c/0x9c)
[<c03f49ec>] (bus_for_each_dev) from [<c03f59fc>] (bus_add_driver+0x190/0x214)
[<c03f59fc>] (bus_add_driver) from [<c03f6dfc>] (driver_register+0x78/0xf4)
[<c03f6dfc>] (driver_register) from [<c010177c>] (do_one_initcall+0x44/0x168)
[<c010177c>] (do_one_initcall) from [<c0900d88>] (kernel_init_freeable+0x144/0x1d0)
[<c0900d88>] (kernel_init_freeable) from [<c060ecfc>] (kernel_init+0x8/0x110)
[<c060ecfc>] (kernel_init) from [<c0107738>] (ret_from_fork+0x14/0x3c)
Code: e59021c8 e59d401c e590302c e3540000 (e5922048) 
---[ end trace 39ae87c7923562d0 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[/snip]

After reverting said patch, there are no issues.

Päikest,
Priit Laes

^ permalink raw reply

* Re: [PATCH 0/2] net: Fix crashes due to activity during suspend
From: Florian Fainelli @ 2017-08-22 18:49 UTC (permalink / raw)
  To: Geert Uytterhoeven, David S . Miller, Steve Glendinning,
	Andrew Lunn
  Cc: Lukas Wunner, Rafael J . Wysocki, netdev, linux-pm,
	linux-renesas-soc, linux-kernel
In-Reply-To: <1503427046-17618-1-git-send-email-geert+renesas@glider.be>

On 08/22/2017 11:37 AM, Geert Uytterhoeven wrote:
> 	Hi all,
> 
> If an Ethernet device is used while the device is suspended, the system may
> crash.
> 
> E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
> driven by a PM controlled clock.  If the Ethernet registers are accessed
> while the clock is not running, the system will crash with an imprecise
> external abort.
> 
> This patch series fixes two of such crashes:
>   1. The first patch prevents the PHY polling state machine from accessing
>      PHY registers while a device is suspended,
>   2. The second patch prevents the net core from trying to transmit packets
>      when an smsc911x device is suspended.
> 
> Both crashes can be reproduced on sh73a0/kzm9g and r8a73a4/ape6evm during
> s2ram (rarely), or by using pm_test (more likely to trigger):
> 
>     # echo 0 > /sys/module/printk/parameters/console_suspend
>     # echo platform > /sys/power/pm_test
>     # echo mem > /sys/power/state
> 
> With this series applied, my test systems survive a loop of 100 test
> suspends.

It seems to me like part, if not the entire problem is that smsc91xx's
suspend and resume functions are way too simplistic and absolutely do
not manage the PHY during suspend/resume, the PHY state machine is not
even stopped, so of course, this will cause bus errors if you access
those registers.

You are addressing this as part of patch 2, but this seems to me like
this is still a bit incomplete and you'd need at least phy_stop() and/or
phy_suspend() (does a power down of the PHY) and phy_start() and/or
phy_resume() calls to complete the PHY state machine shutdown during
suspend.

Have you tried that?
-- 
Florian

^ permalink raw reply

* Re: [PATCH] once: switch to new jump label API
From: Hannes Frederic Sowa @ 2017-08-22 18:44 UTC (permalink / raw)
  To: Eric Biggers
  Cc: netdev, linux-kernel, Jason Baron, Peter Zijlstra, Eric Biggers
In-Reply-To: <20170821234241.88438-1-ebiggers3@gmail.com>

Eric Biggers <ebiggers3@gmail.com> writes:

> From: Eric Biggers <ebiggers@google.com>
>
> Switch the DO_ONCE() macro from the deprecated jump label API to the new
> one.  The new one is more readable, and for DO_ONCE() it also makes the
> generated code more icache-friendly: now the one-time initialization
> code is placed out-of-line at the jump target, rather than at the inline
> fallthrough case.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org.

Thanks!

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Michael S. Tsirkin @ 2017-08-22 18:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, jasowang, willemdebruijn.kernel, den,
	virtualization, netdev
In-Reply-To: <1503426508.2499.47.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Aug 22, 2017 at 11:28:28AM -0700, Eric Dumazet wrote:
> On Tue, 2017-08-22 at 11:01 -0700, David Miller wrote:
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > Date: Tue, 22 Aug 2017 20:55:56 +0300
> > 
> > > Which reminds me that skb_linearize in net core seems to be
> > > fundamentally racy - I suspect that if skb is cloned, and someone is
> > > trying to use the shared frags while another thread calls skb_linearize,
> > > we get some use after free bugs which likely mostly go undetected
> > > because the corrupted packets mostly go on wire and get dropped
> > > by checksum code.
> > 
> > Indeed, it does assume that the skb from which the clone was made
> > never has it's geometry changed.
> > 
> > I don't think even the TCP retransmit queue has this guarantee.
> 
> TCP retransmit makes sure to avoid that.
> 
> if (skb_unclone(skb, GFP_ATOMIC))
>      return -ENOMEM;
> 
> ( Before cloning again skb )
> 
> 

I'm pretty sure not all users of skb_clone or generally
__pskb_pull_tail are careful like this.

E.g. skb_cow_data actually intentionally pulls pages when
skb is cloned.


-- 
MST

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: mvpp2: MAC/GoP configuration
From: Antoine Tenart @ 2017-08-22 18:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Antoine Tenart, davem, jason, gregory.clement,
	sebastian.hesselbarth, thomas.petazzoni, nadavh, linux, mw,
	stefanc, netdev, linux-arm-kernel
In-Reply-To: <20170822180757.GD18076@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 673 bytes --]

Hi Andrew,

On Tue, Aug 22, 2017 at 08:07:57PM +0200, Andrew Lunn wrote:
> On Tue, Aug 22, 2017 at 07:08:20PM +0200, Antoine Tenart wrote:
> > 
> > This is based on net-next (e2a7c34fb285).
> > 
> > I removed the GoP interrupt and PHY optional parts in this v2 to ease
> > the review process as the MAC/GoP initialization seemed to start less
> > discussions :)
> 
> By that, do you mean setting the SMI PHY address?

Yes, the one set in the GoP. You asked more precisions but I need get
more info to answer, so I removed it in the meantime.

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 2/2] net: smsc911x: Quiten netif during suspend
From: Geert Uytterhoeven @ 2017-08-22 18:37 UTC (permalink / raw)
  To: David S . Miller, Steve Glendinning, Andrew Lunn,
	Florian Fainelli
  Cc: Lukas Wunner, Rafael J . Wysocki, netdev, linux-pm,
	linux-renesas-soc, linux-kernel, Geert Uytterhoeven
In-Reply-To: <1503427046-17618-1-git-send-email-geert+renesas@glider.be>

If the network interface is kept running during suspend, the net core
may call net_device_ops.ndo_start_xmit() while the Ethernet device is
still suspended, which may lead to a system crash.

E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock.  If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.

As this is a race condition with a small time window, it is not so easy
to trigger at will.  Using pm_test may increase your chances:

    # echo 0 > /sys/module/printk/parameters/console_suspend
    # echo platform > /sys/power/pm_test
    # echo mem > /sys/power/state

To fix this, make sure the network interface is quitened during suspend.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
No stacktrace is provided, as the imprecise external abort is usually
reported from an innocent looking and unrelated function like
__loop_delay(), cpu_idle_poll(), or arch_timer_read_counter_long().
---
 drivers/net/ethernet/smsc/smsc911x.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 0b6a39b003a4e188..012fb66eed8dd618 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2595,6 +2595,11 @@ static int smsc911x_suspend(struct device *dev)
 	struct net_device *ndev = dev_get_drvdata(dev);
 	struct smsc911x_data *pdata = netdev_priv(ndev);
 
+	if (netif_running(ndev)) {
+		netif_stop_queue(ndev);
+		netif_device_detach(ndev);
+	}
+
 	/* enable wake on LAN, energy detection and the external PME
 	 * signal. */
 	smsc911x_reg_write(pdata, PMT_CTRL,
@@ -2628,7 +2633,15 @@ static int smsc911x_resume(struct device *dev)
 	while (!(smsc911x_reg_read(pdata, PMT_CTRL) & PMT_CTRL_READY_) && --to)
 		udelay(1000);
 
-	return (to == 0) ? -EIO : 0;
+	if (to == 0)
+		return -EIO;
+
+	if (netif_running(ndev)) {
+		netif_device_attach(ndev);
+		netif_start_queue(ndev);
+	}
+
+	return 0;
 }
 
 static const struct dev_pm_ops smsc911x_pm_ops = {
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/2] net: Fix crashes due to activity during suspend
From: Geert Uytterhoeven @ 2017-08-22 18:37 UTC (permalink / raw)
  To: David S . Miller, Steve Glendinning, Andrew Lunn,
	Florian Fainelli
  Cc: Lukas Wunner, Rafael J . Wysocki, netdev, linux-pm,
	linux-renesas-soc, linux-kernel, Geert Uytterhoeven

	Hi all,

If an Ethernet device is used while the device is suspended, the system may
crash.

E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock.  If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.

This patch series fixes two of such crashes:
  1. The first patch prevents the PHY polling state machine from accessing
     PHY registers while a device is suspended,
  2. The second patch prevents the net core from trying to transmit packets
     when an smsc911x device is suspended.

Both crashes can be reproduced on sh73a0/kzm9g and r8a73a4/ape6evm during
s2ram (rarely), or by using pm_test (more likely to trigger):

    # echo 0 > /sys/module/printk/parameters/console_suspend
    # echo platform > /sys/power/pm_test
    # echo mem > /sys/power/state

With this series applied, my test systems survive a loop of 100 test
suspends.

Thanks for your comments!

Geert Uytterhoeven (2):
  net: phy: Freeze PHY polling before suspending devices
  net: smsc911x: Quiten netif during suspend

 drivers/net/ethernet/smsc/smsc911x.c | 15 ++++++++++++++-
 drivers/net/phy/phy.c                | 12 +++++++-----
 2 files changed, 21 insertions(+), 6 deletions(-)

-- 
2.7.4

Gr{oetje,eeting}s,

						Geert

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox