Netdev List
 help / color / mirror / Atom feed
* Re: DSA vs. SWTICHDEV ?
From: Joakim Tjernlund @ 2016-11-30 14:30 UTC (permalink / raw)
  To: andrew@lunn.ch; +Cc: netdev@vger.kernel.org
In-Reply-To: <20161130135257.GC18716@lunn.ch>

On Wed, 2016-11-30 at 14:52 +0100, Andrew Lunn wrote:
> On Wed, Nov 30, 2016 at 08:50:34AM +0000, Joakim Tjernlund wrote:
> > I am trying to wrap my head around these two "devices" and have a hard time telling them apart.
> > We are looking att adding a faily large switch(over PCIe) to our board and from what I can tell
> > switchdev is the new way to do it but DSA is still there. Is it possible to just list
> > how they differ?
> 
> Hi Joakim

Hi Andrew, thanks for answering

> 
> If the interface you use to send frames from the host to the switch is
> PCIe, you probably want to use switchdev directly.

OK, we will have a few ethernet I/F's connected too but I these should be used
as normal interfaces just interfacing a switch.

> 
> DSA devices all use a host Ethernet interface to send frames to the
> switch. DSA sits under switchdev, and effectively provides a lot of
> the common stuff needed for implementing switch drivers of this
> sort. It creates the slave interfaces, links the MAC to the PHY, has
> one uniform device tree binding which all DSA switches have, deals
> with encapsulation/decapsulating frames sent over the master device,
> etc.

And switchdev can do all this over PCIe instead? Can you have a switch tree in switchdev too?

We probably need to push a custom tag for final step (after the switch) and I think that
should be doable with a custom VLAN driver over one/several of switchdev's virtual
interfaces? 

 Joakim

^ permalink raw reply

* Re: [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action
From: Jiri Benc @ 2016-11-30 14:44 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Stephen Hemminger, netdev, David S. Miller, Or Gerlitz,
	Hadar Har-Zion, Jiri Pirko
In-Reply-To: <20161130073840.5269-4-amir@vadai.me>

On Wed, 30 Nov 2016 09:38:40 +0200, Amir Vadai wrote:
> +The
> +.I UNSET
> +mode is optional - even without using it, the metadata information will be
> +released automatically when packet processing will be finished.
> +.IR UNSET
> +function could be used in cases when traffic is forwarded between two tunnels,
> +where the metadata from the first tunnel will be used for encapsulation done by
> +the second tunnel.

This looks good.

> +It must be used for offloaded filters, such that hardware drivers can
> +realize they need to program the HW to do decapsulation.

However, this is wrong. The hardware offloading must be transparent.
The same configuration that works when processed in software must work
in hardware if the hardware has the necessary capabilities. Requiring
the user to alter the configuration to accommodate hardware
peculiarities is not acceptable.

Or maybe I'm misunderstanding what you mean here. In which case it's
not documented properly :-)

> +.IR SET
> +mode requires the source and destination ip
> +.I ADDRESS
> +and the tunnel key id
> +.I KEY_ID
> +which will be used by the ip tunnel shared device to create the tunnel header. The
> +.B tunnel_key
> +action is useful only in combination with a
> +.B mirred redirect
> +action to a shared IP tunnel device which will use the metadata (for
> +.I SET
> +) and unset the metadata created by it (for
> +.I UNSET
> +).
> +
> +.SH OPTIONS
> +.TP
> +.B unset
> +Decapsulation mode, no further arguments allowed. This function is not
> +mandatory and might be used only in some specific use cases.

This is NOT decapsulation. The packet is decapsulated at this point in
any case, whether or not set/unset or whatever is used. These actions
are only and solely about metadata associated with the packet. The
actual encapsulation and decapsulation happens at the target netdevice.

Calling this "decapsulation" is wrong. And if it's implemented as such
in your hardware offloading, then it's doubly wrong as it doesn't match
software processing and hence you must not do that and you must change
that.

> +.TP
> +.B set
> +Encapsulation mode. Requires

Likewise, this is not encapsulation. It just sets metadata.

 Jiri

^ permalink raw reply

* Re: [PATCH net 7/7] net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
From: Maxime Ripard @ 2016-11-30 14:39 UTC (permalink / raw)
  To: Johan Hovold
  Cc: David S. Miller, Giuseppe Cavallaro, Alexandre Torgue,
	Joachim Eastwood, Carlo Caione, Kevin Hilman, Maxime Coquelin,
	Chen-Yu Tsai, netdev, linux-kernel
In-Reply-To: <1480516195-27696-8-git-send-email-johan@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

On Wed, Nov 30, 2016 at 03:29:55PM +0100, Johan Hovold wrote:
> Make sure to deregister and free any fixed-link phy registered during
> probe on probe errors and on driver unbind by adding a new glue helper
> function.
> 
> Drop the of-node reference taken in the same path also on late probe
> errors (and not just on driver unbind) by moving the put from
> stmmac_dvr_remove() to the new helper.
> 
> Fixes: 277323814e49 ("stmmac: add fixed-link device-tree support")
> Fixes: 4613b279bee7 ("ethernet: stmicro: stmmac: add missing of_node_put
> after calling of_parse_phandle")
> Signed-off-by: Johan Hovold <johan@kernel.org>

For the sunxi part,
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* [PATCH net 6/7] net: ethernet: stmmac: platform: fix outdated function header
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>

Fix the OF-helper function header to reflect that the function no longer
has a platform-data parameter.

Fixes: b0003ead75f3 ("stmmac: make stmmac_probe_config_dt return the
platform data struct")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 0a0d6a86f397..bcbf123d5ba2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -200,7 +200,6 @@ static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
 /**
  * stmmac_probe_config_dt - parse device-tree driver parameters
  * @pdev: platform_device structure
- * @plat: driver data platform structure
  * @mac: MAC address to use
  * Description:
  * this function is to read the driver parameters from device-tree and
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 3/7] net: ethernet: stmmac: dwmac-rk: fix probe error path
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>

Make sure to disable runtime PM, power down the PHY, and disable clocks
before returning on late probe errors.

Fixes: 27ffefd2d109 ("stmmac: dwmac-rk: create a new probe function")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 3740a4417fa0..e7aabe56c15a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -988,7 +988,16 @@ static int rk_gmac_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto err_gmac_exit;
+
+	return 0;
+
+err_gmac_exit:
+	rk_gmac_exit(pdev, plat_dat->bsp_priv);
+
+	return ret;
 }
 
 static const struct of_device_id rk_gmac_dwmac_match[] = {
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 2/7] net: ethernet: stmmac: dwmac-sti: fix probe error path
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>

Make sure to disable clocks before returning on late probe errors.

Fixes: 8387ee21f972 ("stmmac: dwmac-sti: turn setup callback into a
probe function")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 58c05acc2aab..a1ce018bf844 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -365,7 +365,16 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto err_dwmac_exit;
+
+	return 0;
+
+err_dwmac_exit:
+	sti_dwmac_exit(pdev, plat_dat->bsp_priv);
+
+	return ret;
 }
 
 static const struct sti_dwmac_of_data stih4xx_dwmac_data = {
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 0/7] net: stmmac: fix probe error handling and phydev leaks
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold

This series fixes a number of issues with the stmmac-driver probe error
handling, which for example left clocks enabled after probe failures.

The final patch fixes a failure to deregister and free any fixed-link
PHYs that were registered during probe on probe errors and on driver
unbind. It also fixes a related of-node leak on late probe errors.

This series depends on the of_phy_deregister_fixed_link() helper that
was just merged to net.

As mentioned earlier, one staging driver also suffers from a similar
leak and can be fixed up once the above mentioned helper hits mainline.

Note that these patches have only been compile tested.

Johan


Johan Hovold (7):
  net: ethernet: stmmac: dwmac-socfpga: fix use-after-free on probe
    errors
  net: ethernet: stmmac: dwmac-sti: fix probe error path
  net: ethernet: stmmac: dwmac-rk: fix probe error path
  net: ethernet: stmmac: dwmac-generic: fix probe error path
  net: ethernet: stmmac: dwmac-meson8b: fix probe error path
  net: ethernet: stmmac: platform: fix outdated function header
  net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks

 .../net/ethernet/stmicro/stmmac/dwmac-generic.c    | 17 ++++++++--
 .../net/ethernet/stmicro/stmmac/dwmac-ipq806x.c    | 25 ++++++++++----
 .../net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c    | 17 ++++++++--
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c  | 23 ++++++++++---
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c    | 32 +++++++++++++-----
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c     | 21 +++++++++---
 .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c    | 39 ++++++++++++++--------
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c    | 23 ++++++++++---
 drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c  | 19 ++++++++---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c  | 26 +++++++++++----
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  1 -
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 33 +++++++++++++++---
 .../net/ethernet/stmicro/stmmac/stmmac_platform.h  |  2 ++
 13 files changed, 215 insertions(+), 63 deletions(-)

-- 
2.7.3

^ permalink raw reply

* Re: [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
From: Jiri Benc @ 2016-11-30 14:30 UTC (permalink / raw)
  To: Jarno Rajahalme; +Cc: netdev, pshelar, e
In-Reply-To: <1480462253-114713-3-git-send-email-jarno@ovn.org>

On Tue, 29 Nov 2016 15:30:53 -0800, Jarno Rajahalme wrote:
> Do not always set skb->protocol to be the ethertype of the L3 header.
> For a packet with non-accelerated VLAN tags skb->protocol needs to be
> the ethertype of the outermost non-accelerated VLAN ethertype.

Well, the current handling of skb->protocol matches what used to be the
handling of the kernel net stack before Jiri Pirko cleaned up the vlan
code.

I'm not opposed to changing this but I'm afraid it needs much deeper
review. Because with this in place, no core kernel functions that
depend on skb->protocol may be called from within openvswitch.

> @@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>  	if (res <= 0)
>  		return res;
>  
> +	/* If the outer vlan tag was accelerated, skb->protocol should
> +	 * refelect the inner vlan type. */
> +	if (!eth_type_vlan(skb->protocol))
> +		skb->protocol = key->eth.cvlan.tpid;

This should not depend on the current value in skb->protocol which
could be arbitrary at this point (from the point of view of how this
patch understands the skb->protocol values). It's easy to fix, though -
just add a local bool variable tracking whether the skb->protocol has
been set.

> @@ -531,15 +544,16 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>  		if (unlikely(parse_vlan(skb, key)))
>  			return -ENOMEM;
>  
> -		skb->protocol = parse_ethertype(skb);
> -		if (unlikely(skb->protocol == htons(0)))
> +		key->eth.type = parse_ethertype(skb);
> +		if (unlikely(key->eth.type == htons(0)))
>  			return -ENOMEM;
>  
>  		skb_reset_network_header(skb);
>  		__skb_push(skb, skb->data - skb_mac_header(skb));
>  	}
>  	skb_reset_mac_len(skb);
> -	key->eth.type = skb->protocol;
> +	if (!eth_type_vlan(skb->protocol))
> +		skb->protocol = key->eth.type;

This leaves key->eth.type undefined for key->mac_proto ==
MAC_PROTO_NONE.

Plus the same problem as above with unknown value of skb->protocol. But
this is more complicated here, as skb->protocol may be either
uninitialized at this point or already initialized by parse_vlan.

>  
>  	/* Network layer. */
>  	if (key->eth.type == htons(ETH_P_IP)) {
> @@ -800,29 +814,15 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
>  	if (err)
>  		return err;
>  
> -	if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) {
> -		/* key_extract assumes that skb->protocol is set-up for
> -		 * layer 3 packets which is the case for other callers,
> -		 * in particular packets recieved from the network stack.
> -		 * Here the correct value can be set from the metadata
> -		 * extracted above.
> -		 */
> -		skb->protocol = key->eth.type;
> -	} else {
> -		struct ethhdr *eth;
> -
> -		skb_reset_mac_header(skb);
> -		eth = eth_hdr(skb);
> -
> -		/* Normally, setting the skb 'protocol' field would be
> -		 * handled by a call to eth_type_trans(), but it assumes
> -		 * there's a sending device, which we may not have.
> -		 */
> -		if (eth_proto_is_802_3(eth->h_proto))
> -			skb->protocol = eth->h_proto;
> -		else
> -			skb->protocol = htons(ETH_P_802_2);
> -	}
> +	/* key_extract assumes that skb->protocol is set-up for
> +	 * layer 3 packets which is the case for other callers,
> +	 * in particular packets recieved from the network stack.
> +	 * Here the correct value can be set from the metadata
> +	 * extracted above.  For layer 2 packets we initialize
> +         * skb->protocol to zero and set it in key_extract() while
> +         * parsing the L2 headers.
> +	 */
> +	skb->protocol = key->eth.type;
>  
>  	return key_extract(skb, key);
>  }

Interesting. This hunk looks safe even without the rest of the patch.
You should fix the comment indentation, though.

 Jiri

^ permalink raw reply

* Re: [net-next PATCH v3 6/6] virtio_net: xdp, add slowpath case for non contiguous buffers
From: Jakub Kicinski @ 2016-11-30 14:30 UTC (permalink / raw)
  To: John Fastabend, Michael S. Tsirkin
  Cc: eric.dumazet, daniel, shm, davem, tgraf, alexei.starovoitov,
	john.r.fastabend, netdev, bblanco, brouer
In-Reply-To: <20161129201133.26851.31803.stgit@john-Precision-Tower-5810>

[add MST]

On Tue, 29 Nov 2016 12:11:33 -0800, John Fastabend wrote:
> virtio_net XDP support expects receive buffers to be contiguous.
> If this is not the case we enable a slowpath to allow connectivity
> to continue but at a significan performance overhead associated with
> linearizing data. To make it painfully aware to users that XDP is
> running in a degraded mode we throw an xdp buffer error.
> 
> To linearize packets we allocate a page and copy the segments of
> the data, including the header, into it. After this the page can be
> handled by XDP code flow as normal.
> 
> Then depending on the return code the page is either freed or sent
> to the XDP xmit path. There is no attempt to optimize this path.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   70 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9604e55..b0ce4ef 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -449,6 +449,56 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  	return NULL;
>  }
>  
> +/* The conditions to enable XDP should preclude the underlying device from
> + * sending packets across multiple buffers (num_buf > 1). However per spec
> + * it does not appear to be illegal to do so but rather just against convention.
> + * So in order to avoid making a system unresponsive the packets are pushed
> + * into a page and the XDP program is run. This will be extremely slow and we
> + * push a warning to the user to fix this as soon as possible. Fixing this may
> + * require resolving the underlying hardware to determine why multiple buffers
> + * are being received or simply loading the XDP program in the ingress stack
> + * after the skb is built because there is no advantage to running it here
> + * anymore.
> + */
> +static struct page *xdp_linearize_page(struct receive_queue *rq,
> +				       u16 num_buf,
> +				       struct page *p,
> +				       int offset,
> +				       unsigned int *len)
> +{
> +	struct page *page = alloc_page(GFP_ATOMIC);
> +	unsigned int page_off = 0;
> +
> +	if (!page)
> +		return NULL;
> +
> +	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
> +	while (--num_buf) {
> +		unsigned int buflen;
> +		unsigned long ctx;
> +		void *buf;
> +		int off;
> +
> +		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &buflen);
> +		if (unlikely(!ctx))
> +			goto err_buf;
> +
> +		buf = mergeable_ctx_to_buf_address(ctx);
> +		p = virt_to_head_page(buf);
> +		off = buf - page_address(p);
> +
> +		memcpy(page_address(page) + page_off,
> +		       page_address(p) + off, buflen);
> +		page_off += buflen;

Could malicious user potentially submit a frame bigger than MTU?

> +	}
> +
> +	*len = page_off;
> +	return page;
> +err_buf:
> +	__free_pages(page, 0);
> +	return NULL;
> +}
> +
>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>  					 struct virtnet_info *vi,
>  					 struct receive_queue *rq,
> @@ -469,21 +519,37 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
> +		struct page *xdp_page;
>  		u32 act;
>  
>  		if (num_buf > 1) {
>  			bpf_warn_invalid_xdp_buffer();
> -			goto err_xdp;
> +
> +			/* linearize data for XDP */
> +			xdp_page = xdp_linearize_page(rq, num_buf,
> +						      page, offset, &len);
> +			if (!xdp_page)
> +				goto err_xdp;
> +			offset = len;
> +		} else {
> +			xdp_page = page;
>  		}
>  
> -		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
> +		act = do_xdp_prog(vi, xdp_prog, xdp_page, offset, len);
>  		switch (act) {
>  		case XDP_PASS:
> +			if (unlikely(xdp_page != page))
> +				__free_pages(xdp_page, 0);
>  			break;
>  		case XDP_TX:
> +			if (unlikely(xdp_page != page))
> +				goto err_xdp;
> +			rcu_read_unlock();

Only if there is a reason for v4 - this unlock could go to the previous
patch.

^ permalink raw reply

* [PATCH net 7/7] net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>

Make sure to deregister and free any fixed-link phy registered during
probe on probe errors and on driver unbind by adding a new glue helper
function.

Drop the of-node reference taken in the same path also on late probe
errors (and not just on driver unbind) by moving the put from
stmmac_dvr_remove() to the new helper.

Fixes: 277323814e49 ("stmmac: add fixed-link device-tree support")
Fixes: 4613b279bee7 ("ethernet: stmicro: stmmac: add missing of_node_put
after calling of_parse_phandle")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 .../net/ethernet/stmicro/stmmac/dwmac-generic.c    |  5 +++-
 .../net/ethernet/stmicro/stmmac/dwmac-ipq806x.c    | 25 +++++++++++++----
 .../net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c    | 17 ++++++++++--
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c  | 23 ++++++++++++----
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c    | 21 +++++++++-----
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c     | 10 +++++--
 .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c    | 12 +++++---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c    | 12 +++++---
 drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c  | 19 +++++++++----
 drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c  | 26 +++++++++++++-----
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  1 -
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 32 ++++++++++++++++++++--
 .../net/ethernet/stmicro/stmmac/stmmac_platform.h  |  2 ++
 13 files changed, 156 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
index 05e46a82cdb1..e6e6c2fcc4b7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
@@ -50,7 +50,7 @@ static int dwmac_generic_probe(struct platform_device *pdev)
 	if (plat_dat->init) {
 		ret = plat_dat->init(pdev, plat_dat->bsp_priv);
 		if (ret)
-			return ret;
+			goto err_remove_config_dt;
 	}
 
 	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
@@ -62,6 +62,9 @@ static int dwmac_generic_probe(struct platform_device *pdev)
 err_exit:
 	if (plat_dat->exit)
 		plat_dat->exit(pdev, plat_dat->bsp_priv);
+err_remove_config_dt:
+	if (pdev->dev.of_node)
+		stmmac_remove_config_dt(pdev, plat_dat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c
index 36d3355f2fb0..866444b6c82f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c
@@ -271,15 +271,17 @@ static int ipq806x_gmac_probe(struct platform_device *pdev)
 		return PTR_ERR(plat_dat);
 
 	gmac = devm_kzalloc(dev, sizeof(*gmac), GFP_KERNEL);
-	if (!gmac)
-		return -ENOMEM;
+	if (!gmac) {
+		err = -ENOMEM;
+		goto err_remove_config_dt;
+	}
 
 	gmac->pdev = pdev;
 
 	err = ipq806x_gmac_of_parse(gmac);
 	if (err) {
 		dev_err(dev, "device tree parsing error\n");
-		return err;
+		goto err_remove_config_dt;
 	}
 
 	regmap_write(gmac->qsgmii_csr, QSGMII_PCS_CAL_LCKDT_CTL,
@@ -300,7 +302,8 @@ static int ipq806x_gmac_probe(struct platform_device *pdev)
 	default:
 		dev_err(&pdev->dev, "Unsupported PHY mode: \"%s\"\n",
 			phy_modes(gmac->phy_mode));
-		return -EINVAL;
+		err = -EINVAL;
+		goto err_remove_config_dt;
 	}
 	regmap_write(gmac->nss_common, NSS_COMMON_GMAC_CTL(gmac->id), val);
 
@@ -319,7 +322,8 @@ static int ipq806x_gmac_probe(struct platform_device *pdev)
 	default:
 		dev_err(&pdev->dev, "Unsupported PHY mode: \"%s\"\n",
 			phy_modes(gmac->phy_mode));
-		return -EINVAL;
+		err = -EINVAL;
+		goto err_remove_config_dt;
 	}
 	regmap_write(gmac->nss_common, NSS_COMMON_CLK_SRC_CTRL, val);
 
@@ -346,7 +350,16 @@ static int ipq806x_gmac_probe(struct platform_device *pdev)
 	plat_dat->bsp_priv = gmac;
 	plat_dat->fix_mac_speed = ipq806x_gmac_fix_mac_speed;
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	err = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (err)
+		goto err_remove_config_dt;
+
+	return 0;
+
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
+
+	return err;
 }
 
 static const struct of_device_id ipq806x_gmac_dwmac_match[] = {
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c
index 78e9d1861896..3d3f43d91b98 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c
@@ -46,7 +46,8 @@ static int lpc18xx_dwmac_probe(struct platform_device *pdev)
 	reg = syscon_regmap_lookup_by_compatible("nxp,lpc1850-creg");
 	if (IS_ERR(reg)) {
 		dev_err(&pdev->dev, "syscon lookup failed\n");
-		return PTR_ERR(reg);
+		ret = PTR_ERR(reg);
+		goto err_remove_config_dt;
 	}
 
 	if (plat_dat->interface == PHY_INTERFACE_MODE_MII) {
@@ -55,13 +56,23 @@ static int lpc18xx_dwmac_probe(struct platform_device *pdev)
 		ethmode = LPC18XX_CREG_CREG6_ETHMODE_RMII;
 	} else {
 		dev_err(&pdev->dev, "Only MII and RMII mode supported\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err_remove_config_dt;
 	}
 
 	regmap_update_bits(reg, LPC18XX_CREG_CREG6,
 			   LPC18XX_CREG_CREG6_ETHMODE_MASK, ethmode);
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto err_remove_config_dt;
+
+	return 0;
+
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
+
+	return ret;
 }
 
 static const struct of_device_id lpc18xx_dwmac_match[] = {
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c
index 309d99536a2c..7fdd1760a74c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c
@@ -64,18 +64,31 @@ static int meson6_dwmac_probe(struct platform_device *pdev)
 		return PTR_ERR(plat_dat);
 
 	dwmac = devm_kzalloc(&pdev->dev, sizeof(*dwmac), GFP_KERNEL);
-	if (!dwmac)
-		return -ENOMEM;
+	if (!dwmac) {
+		ret = -ENOMEM;
+		goto err_remove_config_dt;
+	}
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
 	dwmac->reg = devm_ioremap_resource(&pdev->dev, res);
-	if (IS_ERR(dwmac->reg))
-		return PTR_ERR(dwmac->reg);
+	if (IS_ERR(dwmac->reg)) {
+		ret = PTR_ERR(dwmac->reg);
+		goto err_remove_config_dt;
+	}
 
 	plat_dat->bsp_priv = dwmac;
 	plat_dat->fix_mac_speed = meson6_dwmac_fix_mac_speed;
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto err_remove_config_dt;
+
+	return 0;
+
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
+
+	return ret;
 }
 
 static const struct of_device_id meson6_dwmac_match[] = {
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 45e7aaf0170d..ffaed1f35efe 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -264,28 +264,33 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 		return PTR_ERR(plat_dat);
 
 	dwmac = devm_kzalloc(&pdev->dev, sizeof(*dwmac), GFP_KERNEL);
-	if (!dwmac)
-		return -ENOMEM;
+	if (!dwmac) {
+		ret = -ENOMEM;
+		goto err_remove_config_dt;
+	}
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
 	dwmac->regs = devm_ioremap_resource(&pdev->dev, res);
-	if (IS_ERR(dwmac->regs))
-		return PTR_ERR(dwmac->regs);
+	if (IS_ERR(dwmac->regs)) {
+		ret = PTR_ERR(dwmac->regs);
+		goto err_remove_config_dt;
+	}
 
 	dwmac->pdev = pdev;
 	dwmac->phy_mode = of_get_phy_mode(pdev->dev.of_node);
 	if (dwmac->phy_mode < 0) {
 		dev_err(&pdev->dev, "missing phy-mode property\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err_remove_config_dt;
 	}
 
 	ret = meson8b_init_clk(dwmac);
 	if (ret)
-		return ret;
+		goto err_remove_config_dt;
 
 	ret = meson8b_init_prg_eth(dwmac);
 	if (ret)
-		return ret;
+		goto err_remove_config_dt;
 
 	plat_dat->bsp_priv = dwmac;
 
@@ -297,6 +302,8 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 
 err_clk_disable:
 	clk_disable_unprepare(dwmac->m25_div_clk);
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index e7aabe56c15a..d80c88bd2bba 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -981,12 +981,14 @@ static int rk_gmac_probe(struct platform_device *pdev)
 	plat_dat->resume = rk_gmac_resume;
 
 	plat_dat->bsp_priv = rk_gmac_setup(pdev, data);
-	if (IS_ERR(plat_dat->bsp_priv))
-		return PTR_ERR(plat_dat->bsp_priv);
+	if (IS_ERR(plat_dat->bsp_priv)) {
+		ret = PTR_ERR(plat_dat->bsp_priv);
+		goto err_remove_config_dt;
+	}
 
 	ret = rk_gmac_init(pdev, plat_dat->bsp_priv);
 	if (ret)
-		return ret;
+		goto err_remove_config_dt;
 
 	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 	if (ret)
@@ -996,6 +998,8 @@ static int rk_gmac_probe(struct platform_device *pdev)
 
 err_gmac_exit:
 	rk_gmac_exit(pdev, plat_dat->bsp_priv);
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
index 47db157da3e8..0c420e97de1e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
@@ -316,13 +316,15 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
 		return PTR_ERR(plat_dat);
 
 	dwmac = devm_kzalloc(dev, sizeof(*dwmac), GFP_KERNEL);
-	if (!dwmac)
-		return -ENOMEM;
+	if (!dwmac) {
+		ret = -ENOMEM;
+		goto err_remove_config_dt;
+	}
 
 	ret = socfpga_dwmac_parse_data(dwmac, dev);
 	if (ret) {
 		dev_err(dev, "Unable to parse OF data\n");
-		return ret;
+		goto err_remove_config_dt;
 	}
 
 	plat_dat->bsp_priv = dwmac;
@@ -330,7 +332,7 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
 
 	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 	if (ret)
-		return ret;
+		goto err_remove_config_dt;
 
 	ndev = platform_get_drvdata(pdev);
 	stpriv = netdev_priv(ndev);
@@ -349,6 +351,8 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
 
 err_dvr_remove:
 	stmmac_dvr_remove(&pdev->dev);
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index a1ce018bf844..060b98c37a85 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -345,13 +345,15 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 		return PTR_ERR(plat_dat);
 
 	dwmac = devm_kzalloc(&pdev->dev, sizeof(*dwmac), GFP_KERNEL);
-	if (!dwmac)
-		return -ENOMEM;
+	if (!dwmac) {
+		ret = -ENOMEM;
+		goto err_remove_config_dt;
+	}
 
 	ret = sti_dwmac_parse_data(dwmac, pdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Unable to parse OF data\n");
-		return ret;
+		goto err_remove_config_dt;
 	}
 
 	dwmac->fix_retime_src = data->fix_retime_src;
@@ -363,7 +365,7 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 
 	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
 	if (ret)
-		return ret;
+		goto err_remove_config_dt;
 
 	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 	if (ret)
@@ -373,6 +375,8 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 
 err_dwmac_exit:
 	sti_dwmac_exit(pdev, plat_dat->bsp_priv);
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c
index e5a926b8bee7..61cb24810d10 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c
@@ -107,24 +107,33 @@ static int stm32_dwmac_probe(struct platform_device *pdev)
 		return PTR_ERR(plat_dat);
 
 	dwmac = devm_kzalloc(&pdev->dev, sizeof(*dwmac), GFP_KERNEL);
-	if (!dwmac)
-		return -ENOMEM;
+	if (!dwmac) {
+		ret = -ENOMEM;
+		goto err_remove_config_dt;
+	}
 
 	ret = stm32_dwmac_parse_data(dwmac, &pdev->dev);
 	if (ret) {
 		dev_err(&pdev->dev, "Unable to parse OF data\n");
-		return ret;
+		goto err_remove_config_dt;
 	}
 
 	plat_dat->bsp_priv = dwmac;
 
 	ret = stm32_dwmac_init(plat_dat);
 	if (ret)
-		return ret;
+		goto err_remove_config_dt;
 
 	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 	if (ret)
-		stm32_dwmac_clk_disable(dwmac);
+		goto err_clk_disable;
+
+	return 0;
+
+err_clk_disable:
+	stm32_dwmac_clk_disable(dwmac);
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c
index adff46375a32..d07520fb969e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c
@@ -120,22 +120,27 @@ static int sun7i_gmac_probe(struct platform_device *pdev)
 		return PTR_ERR(plat_dat);
 
 	gmac = devm_kzalloc(dev, sizeof(*gmac), GFP_KERNEL);
-	if (!gmac)
-		return -ENOMEM;
+	if (!gmac) {
+		ret = -ENOMEM;
+		goto err_remove_config_dt;
+	}
 
 	gmac->interface = of_get_phy_mode(dev->of_node);
 
 	gmac->tx_clk = devm_clk_get(dev, "allwinner_gmac_tx");
 	if (IS_ERR(gmac->tx_clk)) {
 		dev_err(dev, "could not get tx clock\n");
-		return PTR_ERR(gmac->tx_clk);
+		ret = PTR_ERR(gmac->tx_clk);
+		goto err_remove_config_dt;
 	}
 
 	/* Optional regulator for PHY */
 	gmac->regulator = devm_regulator_get_optional(dev, "phy");
 	if (IS_ERR(gmac->regulator)) {
-		if (PTR_ERR(gmac->regulator) == -EPROBE_DEFER)
-			return -EPROBE_DEFER;
+		if (PTR_ERR(gmac->regulator) == -EPROBE_DEFER) {
+			ret = -EPROBE_DEFER;
+			goto err_remove_config_dt;
+		}
 		dev_info(dev, "no regulator found\n");
 		gmac->regulator = NULL;
 	}
@@ -151,11 +156,18 @@ static int sun7i_gmac_probe(struct platform_device *pdev)
 
 	ret = sun7i_gmac_init(pdev, plat_dat->bsp_priv);
 	if (ret)
-		return ret;
+		goto err_remove_config_dt;
 
 	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 	if (ret)
-		sun7i_gmac_exit(pdev, plat_dat->bsp_priv);
+		goto err_gmac_exit;
+
+	return 0;
+
+err_gmac_exit:
+	sun7i_gmac_exit(pdev, plat_dat->bsp_priv);
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1f9ec02fa7f8..caf069a465f2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3416,7 +3416,6 @@ int stmmac_dvr_remove(struct device *dev)
 	stmmac_set_mac(priv->ioaddr, false);
 	netif_carrier_off(ndev);
 	unregister_netdev(ndev);
-	of_node_put(priv->plat->phy_node);
 	if (priv->stmmac_rst)
 		reset_control_assert(priv->stmmac_rst);
 	clk_disable_unprepare(priv->pclk);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index bcbf123d5ba2..a840818bf4df 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -305,7 +305,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 		dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*dma_cfg),
 				       GFP_KERNEL);
 		if (!dma_cfg) {
-			of_node_put(plat->phy_node);
+			stmmac_remove_config_dt(pdev, plat);
 			return ERR_PTR(-ENOMEM);
 		}
 		plat->dma_cfg = dma_cfg;
@@ -328,14 +328,37 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 
 	return plat;
 }
+
+/**
+ * stmmac_remove_config_dt - undo the effects of stmmac_probe_config_dt()
+ * @pdev: platform_device structure
+ * @plat: driver data platform structure
+ *
+ * Release resources claimed by stmmac_probe_config_dt().
+ */
+void stmmac_remove_config_dt(struct platform_device *pdev,
+			     struct plat_stmmacenet_data *plat)
+{
+	struct device_node *np = pdev->dev.of_node;
+
+	if (of_phy_is_fixed_link(np))
+		of_phy_deregister_fixed_link(np);
+	of_node_put(plat->phy_node);
+}
 #else
 struct plat_stmmacenet_data *
 stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 {
 	return ERR_PTR(-ENOSYS);
 }
+
+void stmmac_remove_config_dt(struct platform_device *pdev,
+			     struct plat_stmmacenet_data *plat)
+{
+}
 #endif /* CONFIG_OF */
 EXPORT_SYMBOL_GPL(stmmac_probe_config_dt);
+EXPORT_SYMBOL_GPL(stmmac_remove_config_dt);
 
 int stmmac_get_platform_resources(struct platform_device *pdev,
 				  struct stmmac_resources *stmmac_res)
@@ -391,10 +414,13 @@ int stmmac_pltfr_remove(struct platform_device *pdev)
 {
 	struct net_device *ndev = platform_get_drvdata(pdev);
 	struct stmmac_priv *priv = netdev_priv(ndev);
+	struct plat_stmmacenet_data *plat = priv->plat;
 	int ret = stmmac_dvr_remove(&pdev->dev);
 
-	if (priv->plat->exit)
-		priv->plat->exit(pdev, priv->plat->bsp_priv);
+	if (plat->exit)
+		plat->exit(pdev, plat->bsp_priv);
+
+	stmmac_remove_config_dt(pdev, plat);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
index 64e147f53a9c..b72eb0de57b7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
@@ -23,6 +23,8 @@
 
 struct plat_stmmacenet_data *
 stmmac_probe_config_dt(struct platform_device *pdev, const char **mac);
+void stmmac_remove_config_dt(struct platform_device *pdev,
+			     struct plat_stmmacenet_data *plat);
 
 int stmmac_get_platform_resources(struct platform_device *pdev,
 				  struct stmmac_resources *stmmac_res);
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 5/7] net: ethernet: stmmac: dwmac-meson8b: fix probe error path
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>

Make sure to disable clocks before returning on late probe errors.

Fixes: 566e82516253 ("net: stmmac: add a glue driver for the Amlogic
Meson 8b / GXBB DWMAC")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 250e4ceafc8d..45e7aaf0170d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -289,7 +289,16 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 
 	plat_dat->bsp_priv = dwmac;
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto err_clk_disable;
+
+	return 0;
+
+err_clk_disable:
+	clk_disable_unprepare(dwmac->m25_div_clk);
+
+	return ret;
 }
 
 static int meson8b_dwmac_remove(struct platform_device *pdev)
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 4/7] net: ethernet: stmmac: dwmac-generic: fix probe error path
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>

Make sure to call any exit() callback to undo the effect of init()
before returning on late probe errors.

Fixes: cf3f047b9af4 ("stmmac: move hw init in the probe (v2)")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
index b1e5f24708c9..05e46a82cdb1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
@@ -53,7 +53,17 @@ static int dwmac_generic_probe(struct platform_device *pdev)
 			return ret;
 	}
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto err_exit;
+
+	return 0;
+
+err_exit:
+	if (plat_dat->exit)
+		plat_dat->exit(pdev, plat_dat->bsp_priv);
+
+	return ret;
 }
 
 static const struct of_device_id dwmac_generic_match[] = {
-- 
2.7.3

^ permalink raw reply related

* [PATCH net 1/7] net: ethernet: stmmac: dwmac-socfpga: fix use-after-free on probe errors
From: Johan Hovold @ 2016-11-30 14:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Joachim Eastwood,
	Carlo Caione, Kevin Hilman, Maxime Coquelin, Maxime Ripard,
	Chen-Yu Tsai, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>

Make sure to call stmmac_dvr_remove() before returning on late probe
errors so that memory is freed, clocks are disabled, and the netdev is
deregistered before its resources go away.

Fixes: 3c201b5a84ed ("net: stmmac: socfpga: Remove re-registration of
reset controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c    | 29 ++++++++++++++--------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
index bec6963ac71e..47db157da3e8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
@@ -304,6 +304,8 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
 	struct device		*dev = &pdev->dev;
 	int			ret;
 	struct socfpga_dwmac	*dwmac;
+	struct net_device	*ndev;
+	struct stmmac_priv	*stpriv;
 
 	ret = stmmac_get_platform_resources(pdev, &stmmac_res);
 	if (ret)
@@ -327,19 +329,26 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
 	plat_dat->fix_mac_speed = socfpga_dwmac_fix_mac_speed;
 
 	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		return ret;
 
-	if (!ret) {
-		struct net_device *ndev = platform_get_drvdata(pdev);
-		struct stmmac_priv *stpriv = netdev_priv(ndev);
+	ndev = platform_get_drvdata(pdev);
+	stpriv = netdev_priv(ndev);
 
-		/* The socfpga driver needs to control the stmmac reset to
-		 * set the phy mode. Create a copy of the core reset handel
-		 * so it can be used by the driver later.
-		 */
-		dwmac->stmmac_rst = stpriv->stmmac_rst;
+	/* The socfpga driver needs to control the stmmac reset to set the phy
+	 * mode. Create a copy of the core reset handle so it can be used by
+	 * the driver later.
+	 */
+	dwmac->stmmac_rst = stpriv->stmmac_rst;
 
-		ret = socfpga_dwmac_set_phy_mode(dwmac);
-	}
+	ret = socfpga_dwmac_set_phy_mode(dwmac);
+	if (ret)
+		goto err_dvr_remove;
+
+	return 0;
+
+err_dvr_remove:
+	stmmac_dvr_remove(&pdev->dev);
 
 	return ret;
 }
-- 
2.7.3

^ permalink raw reply related

* [PATCH 2/2] net: ethernet: altera_tse: add support for SGMII PCS
From: Neill Whillans @ 2016-11-30 13:41 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-kernel, nios2-dev, vbridger,
	f.fainelli
In-Reply-To: <cover.1480497966.git.neill.whillans@codethink.co.uk>

Add support for the (optional) SGMII PCS functionality of the Altera
TSE MAC. If the phy-mode is set to 'sgmii' then we attempt to discover
and initialise the PCS so that the MAC can communicate to the PHY.

The PCS IP block provides a scratch register for testing presence of
the PCS, which is mapped into one of the two MDIO spaces present in
the MAC's register space.  Once we have determined that the scratch
register is functioning, we attempt to initialise the PCS to
auto-negotiate an SGMII link with the PHY. There is no need to monitor
or manage the SGMII link beyond this, since the normal PHY MDIO will
then be used to monitor the media layer.

The Altera TSE MAC has only one way in which it can be configured with an
SGMII PCS, and as such, this patch only looks to the phy-mode to select
whether or not to attempt to initialise the PCS registers.  During
initialisation, we report the PCS's equivalent of a PHY ID register.
This can be parameterised during the IP instantiation and is often left
as '0x00000000' which is not an error.

Signed-off-by: Neill Whillans <neill.whillans@codethink.co.uk>
Reviewed-by: Daniel Silverstone <daniel.silverstone@codethink.co.uk>
---
 drivers/net/ethernet/altera/altera_tse.h      | 11 ++++
 drivers/net/ethernet/altera/altera_tse_main.c | 91 +++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/drivers/net/ethernet/altera/altera_tse.h b/drivers/net/ethernet/altera/altera_tse.h
index e005200..446e778 100644
--- a/drivers/net/ethernet/altera/altera_tse.h
+++ b/drivers/net/ethernet/altera/altera_tse.h
@@ -120,6 +120,17 @@
 #define MAC_CMDCFG_DISABLE_READ_TIMEOUT_GET(v)	GET_BIT_VALUE(v, 27)
 #define MAC_CMDCFG_CNT_RESET_GET(v)		GET_BIT_VALUE(v, 31)
 
+/* SGMII PCS register addresses
+ */
+#define SGMII_PCS_SCRATCH	0x10
+#define SGMII_PCS_REV		0x11
+#define SGMII_PCS_LINK_TIMER_0	0x12
+#define SGMII_PCS_LINK_TIMER_1	0x13
+#define SGMII_PCS_IF_MODE	0x14
+#define SGMII_PCS_DIS_READ_TO	0x15
+#define SGMII_PCS_READ_TO	0x16
+#define SGMII_PCS_SW_RESET_TIMEOUT 100 /* usecs */
+
 /* MDIO registers within MAC register Space
  */
 struct altera_tse_mdio {
diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
index bda31f3..afecdd1 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -37,6 +37,7 @@
 #include <linux/io.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/mii.h>
 #include <linux/netdevice.h>
 #include <linux/of_device.h>
 #include <linux/of_mdio.h>
@@ -96,6 +97,27 @@ static inline u32 tse_tx_avail(struct altera_tse_private *priv)
 	return priv->tx_cons + priv->tx_ring_size - priv->tx_prod - 1;
 }
 
+/* PCS Register read/write functions
+ */
+static u16 sgmii_pcs_read(struct altera_tse_private *priv, int regnum)
+{
+	return csrrd32(priv->mac_dev,
+		       tse_csroffs(mdio_phy0) + regnum * 4) & 0xffff;
+}
+
+static void sgmii_pcs_write(struct altera_tse_private *priv, int regnum,
+				u16 value)
+{
+	csrwr32(value, priv->mac_dev, tse_csroffs(mdio_phy0) + regnum * 4);
+}
+
+/* Check PCS scratch memory */
+static int sgmii_pcs_scratch_test(struct altera_tse_private *priv, u16 value)
+{
+	sgmii_pcs_write(priv, SGMII_PCS_SCRATCH, value);
+	return (sgmii_pcs_read(priv, SGMII_PCS_SCRATCH) == value);
+}
+
 /* MDIO specific functions
  */
 static int altera_tse_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
@@ -1092,6 +1114,66 @@ static void tse_set_rx_mode(struct net_device *dev)
 	spin_unlock(&priv->mac_cfg_lock);
 }
 
+/* Initialise (if necessary) the SGMII PCS component
+ */
+static int init_sgmii_pcs(struct net_device *dev)
+{
+	struct altera_tse_private *priv = netdev_priv(dev);
+	int n;
+	unsigned int tmp_reg = 0;
+
+	if (priv->phy_iface != PHY_INTERFACE_MODE_SGMII)
+		return 0; /* Nothing to do, not in SGMII mode */
+
+	/* The TSE SGMII PCS block looks a little like a PHY, it is
+	 * mapped into the zeroth MDIO space of the MAC and it has
+	 * ID registers like a PHY would.  Sadly this is often
+	 * configured to zeroes, so don't be surprised if it does
+	 * show 0x00000000.
+	 */
+
+	if (sgmii_pcs_scratch_test(priv, 0x0000) &&
+		sgmii_pcs_scratch_test(priv, 0xffff) &&
+		sgmii_pcs_scratch_test(priv, 0xa5a5) &&
+		sgmii_pcs_scratch_test(priv, 0x5a5a)) {
+		netdev_info(dev, "PCS PHY ID: 0x%04x%04x\n",
+				sgmii_pcs_read(priv, MII_PHYSID1),
+				sgmii_pcs_read(priv, MII_PHYSID2));
+	} else {
+		netdev_err(dev, "SGMII PCS Scratch memory test failed.\n");
+		return -ENOMEM;
+	}
+
+	/* Starting on page 5-29 of the MegaCore Function User Guide
+	 * Set SGMII Link timer to 1.6ms
+	 */
+	sgmii_pcs_write(priv, SGMII_PCS_LINK_TIMER_0, 0x0D40);
+	sgmii_pcs_write(priv, SGMII_PCS_LINK_TIMER_1, 0x03);
+
+	/* Enable SGMII Interface and Enable SGMII Auto Negotiation */
+	sgmii_pcs_write(priv, SGMII_PCS_IF_MODE, 0x3);
+
+	/* Enable Autonegotiation */
+	tmp_reg = sgmii_pcs_read(priv, MII_BMCR);
+	tmp_reg |= (BMCR_SPEED1000 | BMCR_FULLDPLX | BMCR_ANENABLE);
+	sgmii_pcs_write(priv, MII_BMCR, tmp_reg);
+
+	/* Reset PCS block */
+	tmp_reg |= BMCR_RESET;
+	sgmii_pcs_write(priv, MII_BMCR, tmp_reg);
+	for (n = 0; n < SGMII_PCS_SW_RESET_TIMEOUT; n++) {
+		if (!(sgmii_pcs_read(priv, MII_BMCR) & BMCR_RESET)) {
+			netdev_info(dev, "SGMII PCS block initialised OK\n");
+			return 0;
+		}
+		udelay(1);
+	}
+
+	/* We failed to reset the block, return a timeout */
+	netdev_err(dev, "SGMII PCS block reset failed.\n");
+	return -ETIMEDOUT;
+}
+
 /* Open and initialize the interface
  */
 static int tse_open(struct net_device *dev)
@@ -1116,6 +1198,15 @@ static int tse_open(struct net_device *dev)
 		netdev_warn(dev, "TSE revision %x\n", priv->revision);
 
 	spin_lock(&priv->mac_cfg_lock);
+	/* no-op if MAC not operating in SGMII mode*/
+	ret = init_sgmii_pcs(dev);
+	if (ret) {
+		netdev_err(dev,
+			   "Cannot init the SGMII PCS (error: %d)\n", ret);
+		spin_unlock(&priv->mac_cfg_lock);
+		goto phy_error;
+	}
+
 	ret = reset_mac(priv);
 	/* Note that reset_mac will fail if the clocks are gated by the PHY
 	 * due to the PHY being put into isolation or power down mode.
-- 
2.1.4

^ permalink raw reply related

* [PATCH 1/2] net: phy: vitesse: add support for VSC8572
From: Neill Whillans @ 2016-11-30 13:41 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-kernel, nios2-dev, vbridger,
	f.fainelli
In-Reply-To: <cover.1480497966.git.neill.whillans@codethink.co.uk>

From: Stephen Agate <stephen.agate@uk.thalesgroup.com>

Add support for the Vitesse VSC8572 which is functionally equivalent to
the already supported VSC8574. As such, all the same handling functions
are used since the VSC8572 merely has half the number of phy blocks
internally.

Signed-off-by: Stephen Agate <stephen.agate@uk.thalesgroup.com>
Signed-off-by: Neill Whillans <neill.whillans@codethink.co.uk>
Reviewed-by: Daniel Silverstone <daniel.silverstone@codethink.co.uk>
---
 drivers/net/phy/vitesse.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/phy/vitesse.c b/drivers/net/phy/vitesse.c
index 24b4a09..f78ff02 100644
--- a/drivers/net/phy/vitesse.c
+++ b/drivers/net/phy/vitesse.c
@@ -69,6 +69,7 @@
 #define PHY_ID_VSC8234			0x000fc620
 #define PHY_ID_VSC8244			0x000fc6c0
 #define PHY_ID_VSC8514			0x00070670
+#define PHY_ID_VSC8572			0x000704d0
 #define PHY_ID_VSC8574			0x000704a0
 #define PHY_ID_VSC8601			0x00070420
 #define PHY_ID_VSC8662			0x00070660
@@ -166,6 +167,7 @@ static int vsc82xx_config_intr(struct phy_device *phydev)
 			(phydev->drv->phy_id == PHY_ID_VSC8234 ||
 			 phydev->drv->phy_id == PHY_ID_VSC8244 ||
 			 phydev->drv->phy_id == PHY_ID_VSC8514 ||
+			 phydev->drv->phy_id == PHY_ID_VSC8572 ||
 			 phydev->drv->phy_id == PHY_ID_VSC8574 ||
 			 phydev->drv->phy_id == PHY_ID_VSC8601) ?
 				MII_VSC8244_IMASK_MASK :
@@ -291,6 +293,17 @@ static struct phy_driver vsc82xx_driver[] = {
 	.ack_interrupt	= &vsc824x_ack_interrupt,
 	.config_intr	= &vsc82xx_config_intr,
 }, {
+	.phy_id         = PHY_ID_VSC8572,
+	.name           = "Vitesse VSC8572",
+	.phy_id_mask    = 0x000ffff0,
+	.features       = PHY_GBIT_FEATURES,
+	.flags          = PHY_HAS_INTERRUPT,
+	.config_init    = &vsc824x_config_init,
+	.config_aneg    = &vsc82x4_config_aneg,
+	.read_status    = &genphy_read_status,
+	.ack_interrupt  = &vsc824x_ack_interrupt,
+	.config_intr    = &vsc82xx_config_intr,
+}, {
 	.phy_id         = PHY_ID_VSC8574,
 	.name           = "Vitesse VSC8574",
 	.phy_id_mask    = 0x000ffff0,
@@ -355,6 +368,7 @@ static struct mdio_device_id __maybe_unused vitesse_tbl[] = {
 	{ PHY_ID_VSC8234, 0x000ffff0 },
 	{ PHY_ID_VSC8244, 0x000fffc0 },
 	{ PHY_ID_VSC8514, 0x000ffff0 },
+	{ PHY_ID_VSC8572, 0x000ffff0 },
 	{ PHY_ID_VSC8574, 0x000ffff0 },
 	{ PHY_ID_VSC8662, 0x000ffff0 },
 	{ PHY_ID_VSC8221, 0x000ffff0 },
-- 
2.1.4

^ permalink raw reply related

* Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
From: Jesper Dangaard Brouer @ 2016-11-30 14:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: brouer, David Miller, netdev, Tariq Toukan
In-Reply-To: <1480088780.8455.543.camel@edumazet-glaptop3.roam.corp.google.com>


On Fri, 25 Nov 2016 07:46:20 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>

Ended up-in net-next as:

 commit 40931b85113dad7881d49e8759e5ad41d30a5e6c
 Author: Eric Dumazet <edumazet@google.com>
 Date:   Fri Nov 25 07:46:20 2016 -0800

    mlx4: give precise rx/tx bytes/packets counters
    
    mlx4 stats are chaotic because a deferred work queue is responsible
    to update them every 250 ms.

Likely after this patch I get this crash (below), when rebooting my machine.
Looks like a device removal order thing.
Tested with net-next at commit 93ba22225504.   

[...]
[ 1967.248453] mlx5_core 0000:02:00.1: Shutdown was called
[ 1967.854556] mlx5_core 0000:02:00.0: Shutdown was called
[ 1968.443015] e1000e: EEE TX LPI TIMER: 00000011
[ 1968.484676] sd 3:0:0:0: [sda] Synchronizing SCSI cache
[ 1968.528354] mlx4_core 0000:01:00.0: mlx4_shutdown was called
[ 1968.534054] mlx4_en: mlx4p1: Close port called
[ 1968.571156] mlx4_en 0000:01:00.0: removed PHC
[ 1968.575677] mlx4_en: mlx4p2: Close port called
[ 1969.506602] BUG: unable to handle kernel NULL pointer dereference at 0000000000000d08
[ 1969.514530] IP: [<ffffffffa0127ca4>] mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
[ 1969.522963] PGD 0 [ 1969.524803] 
[ 1969.526332] Oops: 0000 [#1] PREEMPT SMP
[ 1969.530201] Modules linked in: coretemp kvm_intel kvm irqbypass intel_cstate mxm_wmi i2c_i801 intel_rapl_perf i2c_smbus sg pcspkr i2c_core shpchp nfsd wmi video acpi_pad auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables mlx4_en e1000e mlx5_core ptp serio_raw sd_mod mlx4_core pps_core devlink hid_generic
[ 1969.559616] CPU: 3 PID: 3104 Comm: kworker/3:1 Not tainted 4.9.0-rc6-net-next3-01390-g93ba22225504 #12
[ 1969.568984] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
[ 1969.578877] Workqueue: events linkwatch_event
[ 1969.583285] task: ffff8803f42a0000 task.stack: ffff88040b2d0000
[ 1969.589238] RIP: 0010:[<ffffffffa0127ca4>]  [<ffffffffa0127ca4>] mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
[ 1969.600102] RSP: 0018:ffff88040b2d3bd8  EFLAGS: 00010282
[ 1969.605442] RAX: ffff8803f432efc8 RBX: ffff8803f4320000 RCX: 0000000000000000
[ 1969.612604] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8803f4320000
[ 1969.619772] RBP: ffff88040b2d3bd8 R08: 000000000000000c R09: ffff8803f432f000
[ 1969.626938] R10: 0000000000000000 R11: ffff88040d64ac00 R12: ffff8803e5aff8dc
[ 1969.634104] R13: ffff8803f4320a28 R14: ffff8803e5aff800 R15: 0000000000000000
[ 1969.641273] FS:  0000000000000000(0000) GS:ffff88041fac0000(0000) knlGS:0000000000000000
[ 1969.649422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1969.655197] CR2: 0000000000000d08 CR3: 0000000001c07000 CR4: 00000000001406e0
[ 1969.662366] Stack:
[ 1969.664412]  ffff88040b2d3be8 ffffffffa0127f8e ffff88040b2d3c10 ffffffffa012a23b
[ 1969.671948]  ffff8803e5aff8dc ffff8803f4320000 ffff8803f4320000 ffff88040b2d3c30
[ 1969.679478]  ffffffff8160ae29 ffff8803e5aff8d8 ffff8804088ff300 ffff88040b2d3c58
[ 1969.687001] Call Trace:
[ 1969.689484]  [<ffffffffa0127f8e>] mlx4_en_fold_software_stats+0x1e/0x20 [mlx4_en]
[ 1969.697026]  [<ffffffffa012a23b>] mlx4_en_get_stats64+0x2b/0x50 [mlx4_en]
[ 1969.703844]  [<ffffffff8160ae29>] dev_get_stats+0x39/0xa0
[ 1969.709274]  [<ffffffff81622470>] rtnl_fill_stats+0x40/0x130
[ 1969.714968]  [<ffffffff8162631b>] rtnl_fill_ifinfo+0x55b/0x1010
[ 1969.720921]  [<ffffffff816285d3>] rtmsg_ifinfo_build_skb+0x73/0xd0
[ 1969.727136]  [<ffffffff81628646>] rtmsg_ifinfo.part.25+0x16/0x50
[ 1969.733176]  [<ffffffff81628698>] rtmsg_ifinfo+0x18/0x20
[ 1969.738522]  [<ffffffff8160e947>] netdev_state_change+0x47/0x50
[ 1969.744478]  [<ffffffff81629018>] linkwatch_do_dev+0x38/0x50
[ 1969.750170]  [<ffffffff81629257>] __linkwatch_run_queue+0xe7/0x160
[ 1969.756385]  [<ffffffff816292f5>] linkwatch_event+0x25/0x30
[ 1969.761991]  [<ffffffff8107b3cb>] process_one_work+0x15b/0x460
[ 1969.767857]  [<ffffffff8107b71e>] worker_thread+0x4e/0x480
[ 1969.773378]  [<ffffffff8107b6d0>] ? process_one_work+0x460/0x460
[ 1969.779420]  [<ffffffff8107b6d0>] ? process_one_work+0x460/0x460
[ 1969.785460]  [<ffffffff810811ea>] kthread+0xca/0xe0
[ 1969.790372]  [<ffffffff81081120>] ? kthread_worker_fn+0x120/0x120
[ 1969.796495]  [<ffffffff817302d2>] ret_from_fork+0x22/0x30
[ 1969.801924] Code: 00 00 55 48 89 e5 85 d2 0f 84 90 00 00 00 83 ea 01 31 c9 31 f6 48 8d 87 c0 ef 00 00 4c 8d 8c d7 c8 ef 00 00 48 8b 10 48 83 c0 08 <4c> 8b 82 08 0d 00 00 48 8b 92 00 0d 00 00 4c 01 c6 48 01 d1 4c 
[ 1969.821969] RIP  [<ffffffffa0127ca4>] mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
[ 1969.830486]  RSP <ffff88040b2d3bd8>
[ 1969.834002] CR2: 0000000000000d08
[ 1969.837440] ---[ end trace 80b9fbc1e7baed9b ]---
[ 1969.842102] Kernel panic - not syncing: Fatal exception in interrupt
[ 1969.848520] Kernel Offset: disabled
[ 1969.852050] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
(END)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net 2/2] macvtap: handle ubuf refcount correctly when meet erros
From: Michael S. Tsirkin @ 2016-11-30 13:58 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, wangyunjian
In-Reply-To: <1480483072-14201-2-git-send-email-jasowang@redhat.com>

On Wed, Nov 30, 2016 at 01:17:52PM +0800, Jason Wang wrote:
> We trigger uarg->callback() immediately after we decide do datacopy
> even if caller want to do zerocopy. This will cause the callback
> (vhost_net_zerocopy_callback) decrease the refcount. But when we meet
> an error afterwards, the error handling in vhost handle_tx() will try
> to decrease it again. This is wrong and fix this by delay the
> uarg->callback() until we're sure there's no errors.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> The patch is needed for -stable.
> ---
>  drivers/net/macvtap.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index bceca28..7869b06 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -742,13 +742,8 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  
>  	if (zerocopy)
>  		err = zerocopy_sg_from_iter(skb, from);
> -	else {
> +	else
>  		err = skb_copy_datagram_from_iter(skb, 0, from, len);
> -		if (!err && m && m->msg_control) {
> -			struct ubuf_info *uarg = m->msg_control;
> -			uarg->callback(uarg, false);
> -		}
> -	}
>  
>  	if (err)
>  		goto err_kfree;
> @@ -779,7 +774,11 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  		skb_shinfo(skb)->destructor_arg = m->msg_control;
>  		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>  		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> +	} else if (m && m->msg_control) {
> +		struct ubuf_info *uarg = m->msg_control;
> +		uarg->callback(uarg, false);
>  	}
> +
>  	if (vlan) {
>  		skb->dev = vlan->dev;
>  		dev_queue_xmit(skb);
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH net 1/2] tun: handle ubuf refcount correctly when meet erros
From: Michael S. Tsirkin @ 2016-11-30 13:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, wangyunjian
In-Reply-To: <1480483072-14201-1-git-send-email-jasowang@redhat.com>

On Wed, Nov 30, 2016 at 01:17:51PM +0800, Jason Wang wrote:
> We trigger uarg->callback() immediately after we decide do datacopy
> even if caller want to do zerocopy. This will cause the callback
> (vhost_net_zerocopy_callback) decrease the refcount. But when we meet
> an error afterwards, the error handling in vhost handle_tx() will try
> to decrease it again. This is wrong and fix this by delay the
> uarg->callback() until we're sure there's no errors.
> 
> Reported-by: wangyunjian <wangyunjian@huawei.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> The patch is needed for -stable.
> ---
>  drivers/net/tun.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 8093e39..db6acec 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1246,13 +1246,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  
>  	if (zerocopy)
>  		err = zerocopy_sg_from_iter(skb, from);
> -	else {
> +	else
>  		err = skb_copy_datagram_from_iter(skb, 0, from, len);
> -		if (!err && msg_control) {
> -			struct ubuf_info *uarg = msg_control;
> -			uarg->callback(uarg, false);
> -		}
> -	}
>  
>  	if (err) {
>  		this_cpu_inc(tun->pcpu_stats->rx_dropped);
> @@ -1298,6 +1293,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  		skb_shinfo(skb)->destructor_arg = msg_control;
>  		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>  		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> +	} else if (msg_control) {
> +		struct ubuf_info *uarg = msg_control;
> +		uarg->callback(uarg, false);
>  	}
>  
>  	skb_reset_network_header(skb);
> -- 
> 2.7.4

^ permalink raw reply

* Re: DSA vs. SWTICHDEV ?
From: Andrew Lunn @ 2016-11-30 13:52 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev@vger.kernel.org
In-Reply-To: <1480495831.3563.135.camel@infinera.com>

On Wed, Nov 30, 2016 at 08:50:34AM +0000, Joakim Tjernlund wrote:
> I am trying to wrap my head around these two "devices" and have a hard time telling them apart.
> We are looking att adding a faily large switch(over PCIe) to our board and from what I can tell
> switchdev is the new way to do it but DSA is still there. Is it possible to just list
> how they differ?

Hi Joakim

If the interface you use to send frames from the host to the switch is
PCIe, you probably want to use switchdev directly.

DSA devices all use a host Ethernet interface to send frames to the
switch. DSA sits under switchdev, and effectively provides a lot of
the common stuff needed for implementing switch drivers of this
sort. It creates the slave interfaces, links the MAC to the PHY, has
one uniform device tree binding which all DSA switches have, deals
with encapsulation/decapsulating frames sent over the master device,
etc.

	    Andrew

^ permalink raw reply

* Re: [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.
From: Jiri Benc @ 2016-11-30 13:51 UTC (permalink / raw)
  To: Jarno Rajahalme; +Cc: netdev, pshelar, e
In-Reply-To: <1480462253-114713-2-git-send-email-jarno@ovn.org>

On Tue, 29 Nov 2016 15:30:52 -0800, Jarno Rajahalme wrote:
> @@ -504,11 +485,20 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
>  		goto drop;
>  	}
>  
> -	if (unlikely(packet_length(skb, vport->dev) > mtu &&
> -		     !skb_is_gso(skb))) {
> -		net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
> -				     vport->dev->name,
> -				     packet_length(skb, vport->dev), mtu);
> +	if (unlikely(!is_skb_forwardable(vport->dev, skb))) {

How does this work when the vlan tag is accelerated? Then we can be
over MTU, yet the check will pass.

 Jiri

^ permalink raw reply

* Re: [WIP] net+mlx4: auto doorbell
From: Saeed Mahameed @ 2016-11-30 13:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, Rick Jones, Linux Netdev List,
	Saeed Mahameed, Tariq Toukan
In-Reply-To: <1480402716.18162.124.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Nov 29, 2016 at 8:58 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-11-21 at 10:10 -0800, Eric Dumazet wrote:
>
>
>> Not sure it this has been tried before, but the doorbell avoidance could
>> be done by the driver itself, because it knows a TX completion will come
>> shortly (well... if softirqs are not delayed too much !)
>>
>> Doorbell would be forced only if :
>>
>> (    "skb->xmit_more is not set" AND "TX engine is not 'started yet'" )
>> OR
>> ( too many [1] packets were put in TX ring buffer, no point deferring
>> more)
>>
>> Start the pump, but once it is started, let the doorbells being done by
>> TX completion.
>>
>> ndo_start_xmit and TX completion handler would have to maintain a shared
>> state describing if packets were ready but doorbell deferred.
>>
>>
>> Note that TX completion means "if at least one packet was drained",
>> otherwise busy polling, constantly calling napi->poll() would force a
>> doorbell too soon for devices sharing a NAPI for both RX and TX.
>>
>> But then, maybe busy poll would like to force a doorbell...
>>
>> I could try these ideas on mlx4 shortly.
>>
>>
>> [1] limit could be derived from active "ethtool -c" params, eg tx-frames
>
> I have a WIP, that increases pktgen rate by 75 % on mlx4 when bulking is
> not used.

Hi Eric, Nice Idea indeed and we need something like this,
today we almost don't exploit the TX bulking at all.

But please see below, i am not sure different contexts should share
the doorbell ringing, it is really risky.

>  drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    2
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   90 +++++++++++------
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    4
>  include/linux/netdevice.h                    |    1
>  net/core/net-sysfs.c                         |   18 +++
>  5 files changed, 83 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 6562f78b07f4..fbea83218fc0 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -1089,7 +1089,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>
>         if (polled) {
>                 if (doorbell_pending)
> -                       mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
> +                       mlx4_en_xmit_doorbell(dev, priv->tx_ring[TX_XDP][cq->ring]);
>
>                 mlx4_cq_set_ci(&cq->mcq);
>                 wmb(); /* ensure HW sees CQ consumer before we post new buffers */
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 4b597dca5c52..affebb435679 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -67,7 +67,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
>         ring->size = size;
>         ring->size_mask = size - 1;
>         ring->sp_stride = stride;
> -       ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
> +       ring->full_size = ring->size - HEADROOM - 2*MAX_DESC_TXBBS;
>
>         tmp = size * sizeof(struct mlx4_en_tx_info);
>         ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
> @@ -193,6 +193,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
>         ring->sp_cqn = cq;
>         ring->prod = 0;
>         ring->cons = 0xffffffff;
> +       ring->ncons = 0;
>         ring->last_nr_txbb = 1;
>         memset(ring->tx_info, 0, ring->size * sizeof(struct mlx4_en_tx_info));
>         memset(ring->buf, 0, ring->buf_size);
> @@ -227,9 +228,9 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
>                        MLX4_QP_STATE_RST, NULL, 0, 0, &ring->sp_qp);
>  }
>
> -static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
> +static inline bool mlx4_en_is_tx_ring_full(const struct mlx4_en_tx_ring *ring)
>  {
> -       return ring->prod - ring->cons > ring->full_size;
> +       return READ_ONCE(ring->prod) - READ_ONCE(ring->cons) > ring->full_size;
>  }
>
>  static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
> @@ -374,6 +375,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
>
>         /* Skip last polled descriptor */
>         ring->cons += ring->last_nr_txbb;
> +       ring->ncons += ring->last_nr_txbb;
>         en_dbg(DRV, priv, "Freeing Tx buf - cons:0x%x prod:0x%x\n",
>                  ring->cons, ring->prod);
>
> @@ -389,6 +391,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
>                                                 !!(ring->cons & ring->size), 0,
>                                                 0 /* Non-NAPI caller */);
>                 ring->cons += ring->last_nr_txbb;
> +               ring->ncons += ring->last_nr_txbb;
>                 cnt++;
>         }
>
> @@ -401,6 +404,38 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
>         return cnt;
>  }
>
> +void mlx4_en_xmit_doorbell(const struct net_device *dev,
> +                          struct mlx4_en_tx_ring *ring)
> +{
> +
> +       if (dev->doorbell_opt & 1) {
> +               u32 oval = READ_ONCE(ring->prod_bell);
> +               u32 nval = READ_ONCE(ring->prod);
> +
> +               if (oval == nval)
> +                       return;
> +
> +               /* I can not tell yet if a cmpxchg() is needed or not */
> +               if (dev->doorbell_opt & 2)
> +                       WRITE_ONCE(ring->prod_bell, nval);
> +               else
> +                       if (cmpxchg(&ring->prod_bell, oval, nval) != oval)
> +                               return;
> +       }
> +       /* Since there is no iowrite*_native() that writes the
> +        * value as is, without byteswapping - using the one
> +        * the doesn't do byteswapping in the relevant arch
> +        * endianness.
> +        */
> +#if defined(__LITTLE_ENDIAN)
> +       iowrite32(
> +#else
> +       iowrite32be(
> +#endif
> +                 ring->doorbell_qpn,
> +                 ring->bf.uar->map + MLX4_SEND_DOORBELL);
> +}
> +
>  static bool mlx4_en_process_tx_cq(struct net_device *dev,
>                                   struct mlx4_en_cq *cq, int napi_budget)
>  {
> @@ -496,8 +531,13 @@ static bool mlx4_en_process_tx_cq(struct net_device *dev,
>         wmb();
>
>         /* we want to dirty this cache line once */
> -       ACCESS_ONCE(ring->last_nr_txbb) = last_nr_txbb;
> -       ACCESS_ONCE(ring->cons) = ring_cons + txbbs_skipped;
> +       WRITE_ONCE(ring->last_nr_txbb, last_nr_txbb);
> +       ring_cons += txbbs_skipped;
> +       WRITE_ONCE(ring->cons, ring_cons);
> +       WRITE_ONCE(ring->ncons, ring_cons + last_nr_txbb);
> +
> +       if (dev->doorbell_opt)
> +               mlx4_en_xmit_doorbell(dev, ring);
>
>         if (ring->free_tx_desc == mlx4_en_recycle_tx_desc)
>                 return done < budget;
> @@ -725,29 +765,14 @@ static void mlx4_bf_copy(void __iomem *dst, const void *src,
>         __iowrite64_copy(dst, src, bytecnt / 8);
>  }
>
> -void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring)
> -{
> -       wmb();

you missed/removed this "wmb()" in the new function, why ? where did
you compensate for it ?

> -       /* Since there is no iowrite*_native() that writes the
> -        * value as is, without byteswapping - using the one
> -        * the doesn't do byteswapping in the relevant arch
> -        * endianness.
> -        */
> -#if defined(__LITTLE_ENDIAN)
> -       iowrite32(
> -#else
> -       iowrite32be(
> -#endif
> -                 ring->doorbell_qpn,
> -                 ring->bf.uar->map + MLX4_SEND_DOORBELL);
> -}
>
>  static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
>                                   struct mlx4_en_tx_desc *tx_desc,
>                                   union mlx4_wqe_qpn_vlan qpn_vlan,
>                                   int desc_size, int bf_index,
>                                   __be32 op_own, bool bf_ok,
> -                                 bool send_doorbell)
> +                                 bool send_doorbell,
> +                                 const struct net_device *dev, int nr_txbb)
>  {
>         tx_desc->ctrl.qpn_vlan = qpn_vlan;
>
> @@ -761,6 +786,7 @@ static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
>
>                 wmb();
>
> +               ring->prod += nr_txbb;
>                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, &tx_desc->ctrl,
>                              desc_size);
>
> @@ -773,8 +799,9 @@ static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
>                  */
>                 dma_wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;
> +               ring->prod += nr_txbb;
>                 if (send_doorbell)
> -                       mlx4_en_xmit_doorbell(ring);
> +                       mlx4_en_xmit_doorbell(dev, ring);
>                 else
>                         ring->xmit_more++;
>         }
> @@ -1017,8 +1044,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>                         op_own |= cpu_to_be32(MLX4_WQE_CTRL_IIP);
>         }
>
> -       ring->prod += nr_txbb;
> -
>         /* If we used a bounce buffer then copy descriptor back into place */
>         if (unlikely(bounce))
>                 tx_desc = mlx4_en_bounce_to_desc(priv, ring, index, desc_size);
> @@ -1033,6 +1058,14 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>         }
>         send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
>
> +       /* Doorbell avoidance : We can omit doorbell if we know a TX completion
> +        * will happen shortly.
> +        */
> +       if (send_doorbell &&
> +           dev->doorbell_opt &&
> +           (s32)(READ_ONCE(ring->prod_bell) - READ_ONCE(ring->ncons)) > 0)

Aelexi already expressed his worries about synchronization, and i
think here (in this exact line) sits the problem,
what about if exactly at this point the TX completion handler just
finished and rang the last doorbell,
you didn't write the new TX descriptor yet (mlx4_en_tx_write_desc), so
the last doorbell from the CQ handler missed it.
even if you wrote the TX descriptor before the doorbell decision here,
you will need a memory barrier to make sure the HW sees
the new packet, which was typically done before ringing the doorbell.

All in all, this is risky business :),  the right way to go is to
force the upper layer to use xmit-more and delay doorbells/use bulking
but from the same context
(xmit routine).  For example see Achiad's suggestion (attached in
Jesper's response), he used stop queue to force the stack to queue up
packets (TX bulking)
which would set xmit-more and will use the next completion to release
the "stopped" ring TXQ rather than hit the doorbell on behalf of it.

> +               send_doorbell = false;
> +
>         real_size = (real_size / 16) & 0x3f;
>
>         bf_ok &= desc_size <= MAX_BF && send_doorbell;
> @@ -1043,7 +1076,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>                 qpn_vlan.fence_size = real_size;
>
>         mlx4_en_tx_write_desc(ring, tx_desc, qpn_vlan, desc_size, bf_index,
> -                             op_own, bf_ok, send_doorbell);
> +                             op_own, bf_ok, send_doorbell, dev, nr_txbb);
>
>         if (unlikely(stop_queue)) {
>                 /* If queue was emptied after the if (stop_queue) , and before
> @@ -1054,7 +1087,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>                  */
>                 smp_rmb();
>
> -               ring_cons = ACCESS_ONCE(ring->cons);
>                 if (unlikely(!mlx4_en_is_tx_ring_full(ring))) {
>                         netif_tx_wake_queue(ring->tx_queue);
>                         ring->wake_queue++;
> @@ -1158,8 +1190,6 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>         rx_ring->xdp_tx++;
>         AVG_PERF_COUNTER(priv->pstats.tx_pktsz_avg, length);
>
> -       ring->prod += nr_txbb;
> -
>         stop_queue = mlx4_en_is_tx_ring_full(ring);
>         send_doorbell = stop_queue ||
>                                 *doorbell_pending > MLX4_EN_DOORBELL_BUDGET;
> @@ -1173,7 +1203,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>                 qpn_vlan.fence_size = real_size;
>
>         mlx4_en_tx_write_desc(ring, tx_desc, qpn_vlan, TXBB_SIZE, bf_index,
> -                             op_own, bf_ok, send_doorbell);
> +                             op_own, bf_ok, send_doorbell, dev, nr_txbb);
>         *doorbell_pending = send_doorbell ? 0 : *doorbell_pending + 1;
>
>         return NETDEV_TX_OK;
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 574bcbb1b38f..c3fd0deda198 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -280,6 +280,7 @@ struct mlx4_en_tx_ring {
>          */
>         u32                     last_nr_txbb;
>         u32                     cons;
> +       u32                     ncons;
>         unsigned long           wake_queue;
>         struct netdev_queue     *tx_queue;
>         u32                     (*free_tx_desc)(struct mlx4_en_priv *priv,
> @@ -290,6 +291,7 @@ struct mlx4_en_tx_ring {
>
>         /* cache line used and dirtied in mlx4_en_xmit() */
>         u32                     prod ____cacheline_aligned_in_smp;
> +       u32                     prod_bell;
>         unsigned int            tx_dropped;
>         unsigned long           bytes;
>         unsigned long           packets;
> @@ -699,7 +701,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>                                struct mlx4_en_rx_alloc *frame,
>                                struct net_device *dev, unsigned int length,
>                                int tx_ind, int *doorbell_pending);
> -void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring);
> +void mlx4_en_xmit_doorbell(const struct net_device *dev, struct mlx4_en_tx_ring *ring);
>  bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
>                         struct mlx4_en_rx_alloc *frame);
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 4ffcd874cc20..39565b5425a6 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1816,6 +1816,7 @@ struct net_device {
>         DECLARE_HASHTABLE       (qdisc_hash, 4);
>  #endif
>         unsigned long           tx_queue_len;
> +       unsigned long           doorbell_opt;
>         spinlock_t              tx_global_lock;
>         int                     watchdog_timeo;
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index b0c04cf4851d..df05f81f5150 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -367,6 +367,23 @@ static ssize_t gro_flush_timeout_store(struct device *dev,
>  }
>  NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong);
>
> +static int change_doorbell_opt(struct net_device *dev, unsigned long val)
> +{
> +       dev->doorbell_opt = val;
> +       return 0;
> +}
> +
> +static ssize_t doorbell_opt_store(struct device *dev,
> +                                 struct device_attribute *attr,
> +                                 const char *buf, size_t len)
> +{
> +       if (!capable(CAP_NET_ADMIN))
> +               return -EPERM;
> +
> +       return netdev_store(dev, attr, buf, len, change_doorbell_opt);
> +}
> +NETDEVICE_SHOW_RW(doorbell_opt, fmt_ulong);
> +
>  static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
>                              const char *buf, size_t len)
>  {
> @@ -531,6 +548,7 @@ static struct attribute *net_class_attrs[] = {
>         &dev_attr_phys_port_name.attr,
>         &dev_attr_phys_switch_id.attr,
>         &dev_attr_proto_down.attr,
> +       &dev_attr_doorbell_opt.attr,
>         NULL,
>  };
>  ATTRIBUTE_GROUPS(net_class);
>
>

^ permalink raw reply

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Michael S. Tsirkin @ 2016-11-30 13:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: Yunjian Wang, netdev, linux-kernel, caihe
In-Reply-To: <e4ac063d-db9f-ec93-30cd-163375ececfc@redhat.com>

On Wed, Nov 30, 2016 at 09:07:16PM +0800, Jason Wang wrote:
> 
> 
> On 2016年11月30日 20:10, Yunjian Wang wrote:
> > When we meet an error(err=-EBADFD) recvmsg, the error handling in vhost
> > handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
> > 
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > ---
> >   drivers/vhost/net.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 5dc128a..edc470b 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
> >   			pr_debug("Discarded rx packet: "
> >   				 " len %d, expected %zd\n", err, sock_len);
> >   			vhost_discard_vq_desc(vq, headcount);
> > +			/* Don't continue to do, when meet errors. */
> > +			if (err < 0)
> > +				goto out;
> >   			continue;
> >   		}
> >   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
> 
> Acked-by: Jason Wang <jasowang@redhat.com>
> 
> We may want to rename vhost_discard_vq_desc() in the future, since it does
> not discard the desc in fact.

To me this looks a bit too aggressive. I would also like commit log
to explain better what is going on, to make sure we are
not just treating the symptoms.
-- 
MST

^ permalink raw reply

* Re: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII
From: Måns Rullgård @ 2016-11-30 13:43 UTC (permalink / raw)
  To: David Laight
  Cc: 'Florian Fainelli', Timur Tabi, netdev@vger.kernel.org,
	davem@davemloft.net, andrew@lunn.ch, sf84@laposte.net,
	martin.blumenstingl@googlemail.com, alexandre.torgue@st.com,
	peppe.cavallaro@st.com, jbrunet@baylibre.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB023015E@AcuExch.aculab.com>

David Laight <David.Laight@ACULAB.COM> writes:

> From: Florian Fainelli
>> Sent: 27 November 2016 23:03
>> Le 27/11/2016  14:24, Timur Tabi a crit :
>> >> + * PHY device drivers in PHYLIB being reusable by nature, being able to
>> >> +   configure correctly a specified delay enables more designs with
>> >> similar delay
>> >> +   requirements to be operate correctly
>> >
>> > Ok, this one I don't know how to fix.  I'm not really sure what you're
>> > trying to say.
>> 
>> What I am trying to say is that once a PHY driver properly configures a
>> delay that you have specified, there is no reason why this is not
>> applicable to other platforms using this same PHY driver.
>
> As has been stated earlier it can depend on the track lengths on the
> board itself.
> (Although 1ns is about 1 foot - so track delays of that length are unlikely.)

There could be a delay element.

-- 
Måns Rullgård

^ permalink raw reply

* [PATCH 0/2] net: Add support for SGMII PCS on Altera TSE MAC
From: Neill Whillans @ 2016-11-30 13:41 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-kernel, nios2-dev, vbridger,
	f.fainelli

These patches were created as part of work to add support for SGMII
PCS functionality to the Altera TSE MAC. Patches are based on 4.9-rc6
git tree.

The first patch in the series adds support for the VSC8572 dual-port
Gigabit Ethernet transceiver, used in integration testing.

The second patch adds support for the SGMII PCS functionality to the
Altera TSE driver.

 
Neill Whillans (1):
  net: ethernet: altera_tse: add support for SGMII PCS

Stephen Agate (1):
  net: phy: vitesse: add support for VSC8572

 drivers/net/ethernet/altera/altera_tse.h      | 11 ++++
 drivers/net/ethernet/altera/altera_tse_main.c | 91 +++++++++++++++++++++++++++
 drivers/net/phy/vitesse.c                     | 14 +++++
 3 files changed, 116 insertions(+)

-- 
2.1.4

^ permalink raw reply

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Michael S. Tsirkin @ 2016-11-30 13:40 UTC (permalink / raw)
  To: Yunjian Wang; +Cc: jasowang, netdev, linux-kernel, caihe
In-Reply-To: <1480507857-22976-1-git-send-email-wangyunjian@huawei.com>

On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
> When we meet an error(err=-EBADFD) recvmsg,

How do you get EBADFD? Won't vhost_net_rx_peek_head_len
return 0 in this case, breaking the loop?

> the error handling in vhost
> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  drivers/vhost/net.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc128a..edc470b 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>  			pr_debug("Discarded rx packet: "
>  				 " len %d, expected %zd\n", err, sock_len);
>  			vhost_discard_vq_desc(vq, headcount);
> +			/* Don't continue to do, when meet errors. */
> +			if (err < 0)
> +				goto out;

You might get e.g. EAGAIN and I think you need to retry
in this case.

>  			continue;
>  		}
>  		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
> -- 
> 1.9.5.msysgit.1
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox