Netdev List
 help / color / mirror / Atom feed
* [net-next PATCH 3/8] net: Fix netdev_fix_features so that TSO_MANGLEID is only available with TSO
From: Alexander Duyck @ 2016-04-25 18:31 UTC (permalink / raw)
  To: talal, netdev, davem, galp, ogerlitz, eranbe
In-Reply-To: <20160425182442.11331.88349.stgit@ahduyck-xeon-server>

This change makes it so that we will strip the TSO_MANGLEID bit if TSO is
not present.  This way we will also handle ECN correctly of TSO is not
present.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 net/core/dev.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index d6d9f286c4e1..6a5ef49ed1ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6720,6 +6720,10 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_TSO6;
 	}
 
+	/* TSO with IPv4 ID mangling requires IPv4 TSO be enabled */
+	if ((features & NETIF_F_TSO_MANGLEID) && !(features & NETIF_F_TSO))
+		features &= ~NETIF_F_TSO_MANGLEID;
+
 	/* TSO ECN requires that TSO is present as well. */
 	if ((features & NETIF_F_ALL_TSO) == NETIF_F_TSO_ECN)
 		features &= ~NETIF_F_TSO_ECN;

^ permalink raw reply related

* [net-next PATCH 4/8] vxlan: Add checksum check to the features check function
From: Alexander Duyck @ 2016-04-25 18:31 UTC (permalink / raw)
  To: talal, netdev, davem, galp, ogerlitz, eranbe
In-Reply-To: <20160425182442.11331.88349.stgit@ahduyck-xeon-server>

We need to perform an additional check on the inner headers to determine if
we can offload the checksum for them.  Previously this check didn't occur
so we would generate an invalid frame in the case of an IPv6 header
encapsulated inside of an IPv4 tunnel.  To fix this I added a secondary
check to vxlan_features_check so that we can verify that we can offload the
inner checksum.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 include/linux/if_ether.h |    5 +++++
 include/net/vxlan.h      |    4 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index d5569734f672..548fd535fd02 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -28,6 +28,11 @@ static inline struct ethhdr *eth_hdr(const struct sk_buff *skb)
 	return (struct ethhdr *)skb_mac_header(skb);
 }
 
+static inline struct ethhdr *inner_eth_hdr(const struct sk_buff *skb)
+{
+	return (struct ethhdr *)skb_inner_mac_header(skb);
+}
+
 int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr);
 
 extern ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 673e9f9e6da7..b8803165df91 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -317,7 +317,9 @@ static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 	    (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
 	     skb->inner_protocol != htons(ETH_P_TEB) ||
 	     (skb_inner_mac_header(skb) - skb_transport_header(skb) !=
-	      sizeof(struct udphdr) + sizeof(struct vxlanhdr))))
+	      sizeof(struct udphdr) + sizeof(struct vxlanhdr)) ||
+	     (skb->ip_summed != CHECKSUM_NONE &&
+	      !can_checksum_protocol(features, inner_eth_hdr(skb)->h_proto))))
 		return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
 
 	return features;

^ permalink raw reply related

* [net-next PATCH 5/8] mlx4: Add support for UDP tunnel segmentation with outer checksum offload
From: Alexander Duyck @ 2016-04-25 18:31 UTC (permalink / raw)
  To: talal, netdev, davem, galp, ogerlitz, eranbe
In-Reply-To: <20160425182442.11331.88349.stgit@ahduyck-xeon-server>

This patch assumes that the mlx4 hardware will ignore existing IPv4/v6
header fields for length and checksum as well as the length and checksum
fields for outer UDP headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 8bd143dda95d..bce37cbfde24 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2358,7 +2358,9 @@ out:
 
 	/* set offloads */
 	priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
-				      NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
+				      NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL |
+				      NETIF_F_GSO_UDP_TUNNEL_CSUM |
+				      NETIF_F_GSO_PARTIAL;
 }
 
 static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
@@ -2368,7 +2370,9 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
 						 vxlan_del_task);
 	/* unset offloads */
 	priv->dev->hw_enc_features &= ~(NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
-				      NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL);
+					NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL |
+					NETIF_F_GSO_UDP_TUNNEL_CSUM |
+					NETIF_F_GSO_PARTIAL);
 
 	ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
 				  VXLAN_STEER_BY_OUTER_MAC, 0);
@@ -2992,8 +2996,13 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	}
 
 	if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
-		dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
-		dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
+		dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL |
+				    NETIF_F_GSO_UDP_TUNNEL_CSUM |
+				    NETIF_F_GSO_PARTIAL;
+		dev->features    |= NETIF_F_GSO_UDP_TUNNEL |
+				    NETIF_F_GSO_UDP_TUNNEL_CSUM |
+				    NETIF_F_GSO_PARTIAL;
+		dev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM;
 	}
 
 	mdev->pndev[port] = dev;

^ permalink raw reply related

* [net-next PATCH 6/8] mlx4: Add support for inner IPv6 checksum offloads and TSO
From: Alexander Duyck @ 2016-04-25 18:31 UTC (permalink / raw)
  To: talal, netdev, davem, galp, ogerlitz, eranbe
In-Reply-To: <20160425182442.11331.88349.stgit@ahduyck-xeon-server>

>From what I can tell the ConnectX-3 will support an inner IPv6 checksum and
segmentation offload, however it cannot support outer IPv6 headers.  For
this reason I am adding the feature to the hw_enc_features and adding an
extra check to the features_check call that will disable GSO and checksum
offload in the case that the encapsulated frame has an outer IP version of
that is not 4.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   25 +++++++++++++++++++-----
 drivers/net/ethernet/mellanox/mlx4/en_tx.c     |   15 ++++++++++++--
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index bce37cbfde24..6f28ac58251c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2357,8 +2357,10 @@ out:
 	}
 
 	/* set offloads */
-	priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
-				      NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL |
+	priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+				      NETIF_F_RXCSUM |
+				      NETIF_F_TSO | NETIF_F_TSO6 |
+				      NETIF_F_GSO_UDP_TUNNEL |
 				      NETIF_F_GSO_UDP_TUNNEL_CSUM |
 				      NETIF_F_GSO_PARTIAL;
 }
@@ -2369,8 +2371,10 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
 	struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
 						 vxlan_del_task);
 	/* unset offloads */
-	priv->dev->hw_enc_features &= ~(NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
-					NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL |
+	priv->dev->hw_enc_features &= ~(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+					NETIF_F_RXCSUM |
+					NETIF_F_TSO | NETIF_F_TSO6 |
+					NETIF_F_GSO_UDP_TUNNEL |
 					NETIF_F_GSO_UDP_TUNNEL_CSUM |
 					NETIF_F_GSO_PARTIAL);
 
@@ -2431,7 +2435,18 @@ static netdev_features_t mlx4_en_features_check(struct sk_buff *skb,
 						netdev_features_t features)
 {
 	features = vlan_features_check(skb, features);
-	return vxlan_features_check(skb, features);
+	features = vxlan_features_check(skb, features);
+
+	/* The ConnectX-3 doesn't support outer IPv6 checksums but it does
+	 * support inner IPv6 checksums and segmentation so  we need to
+	 * strip that feature if this is an IPv6 encapsulated frame.
+	 */
+	if (skb->encapsulation &&
+	    (skb->ip_summed == CHECKSUM_PARTIAL) &&
+	    (ip_hdr(skb)->version != 4))
+		features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
+
+	return features;
 }
 #endif
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index c0d7b7296236..c9f5388ea22a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -41,6 +41,7 @@
 #include <linux/vmalloc.h>
 #include <linux/tcp.h>
 #include <linux/ip.h>
+#include <linux/ipv6.h>
 #include <linux/moduleparam.h>
 
 #include "mlx4_en.h"
@@ -918,8 +919,18 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 				 tx_ind, fragptr);
 
 	if (skb->encapsulation) {
-		struct iphdr *ipv4 = (struct iphdr *)skb_inner_network_header(skb);
-		if (ipv4->protocol == IPPROTO_TCP || ipv4->protocol == IPPROTO_UDP)
+		union {
+			struct iphdr *v4;
+			struct ipv6hdr *v6;
+			unsigned char *hdr;
+		} ip;
+		u8 proto;
+
+		ip.hdr = skb_inner_network_header(skb);
+		proto = (ip.v4->version == 4) ? ip.v4->protocol :
+						ip.v6->nexthdr;
+
+		if (proto == IPPROTO_TCP || proto == IPPROTO_UDP)
 			op_own |= cpu_to_be32(MLX4_WQE_CTRL_IIP | MLX4_WQE_CTRL_ILP);
 		else
 			op_own |= cpu_to_be32(MLX4_WQE_CTRL_IIP);

^ permalink raw reply related

* [net-next PATCH 7/8] mlx5e: Add support for UDP tunnel segmentation with outer checksum offload
From: Alexander Duyck @ 2016-04-25 18:31 UTC (permalink / raw)
  To: talal, netdev, davem, galp, ogerlitz, eranbe
In-Reply-To: <20160425182442.11331.88349.stgit@ahduyck-xeon-server>

This patch assumes that the mlx5 hardware will ignore existing IPv4/v6
header fields for length and checksum as well as the length and checksum
fields for outer UDP headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d485d1e4e100..a687d6624c8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2580,13 +2580,18 @@ static void mlx5e_build_netdev(struct net_device *netdev)
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	if (mlx5e_vxlan_allowed(mdev)) {
-		netdev->hw_features     |= NETIF_F_GSO_UDP_TUNNEL;
+		netdev->hw_features     |= NETIF_F_GSO_UDP_TUNNEL |
+					   NETIF_F_GSO_UDP_TUNNEL_CSUM |
+					   NETIF_F_GSO_PARTIAL;
 		netdev->hw_enc_features |= NETIF_F_IP_CSUM;
 		netdev->hw_enc_features |= NETIF_F_RXCSUM;
 		netdev->hw_enc_features |= NETIF_F_TSO;
 		netdev->hw_enc_features |= NETIF_F_TSO6;
 		netdev->hw_enc_features |= NETIF_F_RXHASH;
 		netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL;
+		netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM |
+					   NETIF_F_GSO_PARTIAL;
+		netdev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM;
 	}
 
 	netdev->features          = netdev->hw_features;

^ permalink raw reply related

* [net-next PATCH 8/8] mlx5e: Fix IPv6 tunnel checksum offload
From: Alexander Duyck @ 2016-04-25 18:31 UTC (permalink / raw)
  To: talal, netdev, davem, galp, ogerlitz, eranbe
In-Reply-To: <20160425182442.11331.88349.stgit@ahduyck-xeon-server>

The mlx5 driver exposes support for TSO6 but not IPv6 csum for hardware
encapsulated tunnels.  This leads to issues as it triggers warnings in
skb_checksum_help as it ends up being called as we report supporting the
segmentation but not the checksumming for IPv6 frames.

This patch corrects that and drops 2 features that don't actually need to
be supported in hw_enc_features since they are Rx features and don't
actually impact anything by being present in hw_enc_features.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a687d6624c8e..4728abd46cb2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2584,10 +2584,9 @@ static void mlx5e_build_netdev(struct net_device *netdev)
 					   NETIF_F_GSO_UDP_TUNNEL_CSUM |
 					   NETIF_F_GSO_PARTIAL;
 		netdev->hw_enc_features |= NETIF_F_IP_CSUM;
-		netdev->hw_enc_features |= NETIF_F_RXCSUM;
+		netdev->hw_enc_features |= NETIF_F_IPV6_CSUM;
 		netdev->hw_enc_features |= NETIF_F_TSO;
 		netdev->hw_enc_features |= NETIF_F_TSO6;
-		netdev->hw_enc_features |= NETIF_F_RXHASH;
 		netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL;
 		netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM |
 					   NETIF_F_GSO_PARTIAL;

^ permalink raw reply related

* Re: [PATCH RFC] b43: stop hardcoding LED behavior
From: Michael Büsch @ 2016-04-25 18:32 UTC (permalink / raw)
  To: Lucas Stach; +Cc: Kalle Valo, netdev, linux-wireless, b43-dev
In-Reply-To: <1461608496.2364.36.camel@lynxeye.de>

[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]

On Mon, 25 Apr 2016 20:21:36 +0200
Lucas Stach <dev@lynxeye.de> wrote:

> > Numbers please. Did you measure that is actually causes more
> > _wakeups_?
> > How many?
> > The led work is placed in the mac80211 workqueue and LED updates only
> > happen on behalf of mac80211 activities (by default). It only causes
> > additional wakeups, if there's nothing else scheduled on the
> > workqueue
> > anyways (which might well be the case. So we need numbers. :)
> >   
> The blinking LEDs use a timer to enforce a constant blink rate at a
> 50ms on/off interval. While they are only triggered if there is some
> RX/TX activity in the system, they cause up to 20 wakeups/second/led.
> As the timers used for LED activity aren't deferrable, this hardcode is
> causing 40 unnecessary CPU wakeups/s in my system.


Ok this is 40 to 40k for the interrupt requests?
We need real measured numbers and a percentage on how much the b43 LEDs
increase the system wakeups in relation to all other wakeups.


> There are some people that find those kinds of blinking LEDs
> distracting,


Those can already disable them via the standard LED framework.


> so a module parameter to disable them altogether might be
> a good thing to have.


No. We have a standard API for this.


> Causing CPU wakeups in a system where those LEDs
> aren't even physically populated is clearly undesired behavior.


Yes, but this is not going to be fixed via regressions.


> If checking that the SPROM doesn't define any LED behavior is enough to
> not regress your use case, I would be glad to rework the patch
> accordingly.


As it turns out I don't have that card here and I don't have a dump of
its SPROM as I expected. So I cannot really verify this. But I'm pretty
sure that this card did not define any LEDs in its SPROM at all.
I'm not aware of any card that only partially defines LEDs in the
SPROM. So that fix would be OK.

-- 
Michael

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] [RFC] net: dsa: mv88e6xxx: Pre-initialize err in mv88e6xxx_port_bridge_join()
From: Vivien Didelot @ 2016-04-25 18:35 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: David S. Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAMuHMdVfYTx3CCa0GnqmcWHBQPYZZUYZQz1hOOaUjgXFmhVP3w@mail.gmail.com>

Hi Geert,

Geert Uytterhoeven <geert@linux-m68k.org> writes:

> Hi Vivien,
>
> On Mon, Apr 25, 2016 at 7:31 PM, Vivien Didelot
> <vivien.didelot@savoirfairelinux.com> wrote:
>> Geert Uytterhoeven <geert@linux-m68k.org> writes:
>>> On Mon, Apr 25, 2016 at 5:03 PM, Vivien Didelot
>>> <vivien.didelot@savoirfairelinux.com> wrote:
>>>> Geert Uytterhoeven <geert@linux-m68k.org> writes:
>>>>> drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_port_bridge_join’:
>>>>> drivers/net/dsa/mv88e6xxx.c:2184: warning: ‘err’ may be used uninitialized in this function
>>>>
>>>> Interesting, I don't have those warnings on 207afda1b5036009...
>>>
>>> It depends on the compiler version (still using 4.1.2) and options.
>>>
>>>>> If netdev_notifier_changeupper_info.upper_dev is ever NULL, the bridge
>>>>> parameter will be NULL too, and the function will return an
>>>>> uninitialized value.
>>>>>
>>>>> Pre-initialize err to zero to fix this.
>>>>>
>>>>> Fixes: 207afda1b5036009 ("net: dsa: mv88e6xxx: share the same default FDB")
>>>>> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
>>>>> ---
>>>>> Can this actually happen?
>>>>
>>>> bridge cannot be NULL here. Also ps->ports[port].bridge_dev is assigned
>>>> to it before entering the for loop, so _mv88e6xxx_port_based_vlan_map
>>>> will be called at least for this port.
>>>
>>> But there's no way the compiler can know that...
>>
>> Or maybe it can in new configurations. Anyway, this fix doesn't hurt,
>> with a relevant commit message, I'd ack it.
>
> What would you consider a relevant commit message?

bridge being NULL is not the reason why err can eventually get returned
uninitialized. The GCC version would be great too, I have no such
warning here with 5.3.0.

Thanks,
-v

^ permalink raw reply

* Re: [PATCH v2 net-next] net: ethernet: enc28j60: add device tree support
From: Michael Heimpold @ 2016-04-25 18:41 UTC (permalink / raw)
  To: Rob Herring
  Cc: Jonathan Cameron, Andrew F . Davis, Mark Brown, netdev,
	devicetree, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala
In-Reply-To: <20160425132326.GA17445@rob-hp-laptop>

Hi,

Am Monday 25 April 2016, 08:23:26 schrieb Rob Herring:
> On Sun, Apr 24, 2016 at 11:28:03PM +0200, Michael Heimpold wrote:
> > The following patch adds the required match table for device tree support
> > (and while at, fix the indent). It's also possible to specify the
> > MAC address in the DT blob.
> > 
> > Also add the corresponding binding documentation file.
> > 
> > Signed-off-by: Michael Heimpold <mhei@heimpold.de>
> > ---
> > 
> > v2: * took care of Arnd Bergmann's review comments
> > 
> >       - allow to specify MAC address via DT
> >       - unconditionally define DT id table
> >     
> >     * increased the driver version minor number
> >     * driver author's email address bounces, removed from address list
> >  
> >  .../devicetree/bindings/net/microchip-enc28j60.txt | 50
> >  ++++++++++++++++++++++
> Matching the compatible string is preferred here: microchip,enc28j60.txt
> 

Ok, will change in next round.

> >  drivers/net/ethernet/microchip/enc28j60.c          | 20 +++++++--
> >  2 files changed, 67 insertions(+), 3 deletions(-)
> >  create mode 100644
> >  Documentation/devicetree/bindings/net/microchip-enc28j60.txt> 
> > diff --git a/Documentation/devicetree/bindings/net/microchip-enc28j60.txt
> > b/Documentation/devicetree/bindings/net/microchip-enc28j60.txt new file
> > mode 100644
> > index 0000000..847a97b
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/microchip-enc28j60.txt
> > @@ -0,0 +1,50 @@
> > +* Microchip ENC28J60
> > +
> > +This is a standalone 10 MBit ethernet controller with SPI interface.
> > +
> > +For each device connected to a SPI bus, define a child node within
> > +the SPI master node.
> > +
> > +Required properties:
> > +- compatible: Should be "microchip,enc28j60"
> > +- reg: Specify the SPI chip select the ENC28J60 is wired to
> > +- interrupts: Specify the interrupt and interrupt type (usually falling
> > edge) +
> > +Optional properties:
> > +- interrupt-parent: Specify the pHandle of the source interrupt
> 
> This is required in the sense that either the node or a parent node must
> define it. In this case, since the SPI controller likely has a different
> parent, you will pretty much always need it defined in this node.

Understand. I think I should then move it to the required properties?
BTW: I orientated on the qca-qca7000-spi.txt as this is a SPI ethernet
device, too. Maybe we should also clarify this topic for this binding...

> > +- spi-max-frequency: Maximum frequency of the SPI bus when accessing the
> > ENC28J60. +  According to the ENC28J80 datasheet, the chip allows a
> > maximum of 20 MHz, however, +  board designs may need to limit this
> > value.
> > +- local-mac-address: See ethernet.txt in the same directory.
> > +
> > +
> > +Example (for NXP i.MX28 with pin control stuff for GPIO irq):
> > +
> > +        ssp2: ssp@80014000 {
> > +                compatible = "fsl,imx28-spi";
> > +                pinctrl-names = "default";
> > +                pinctrl-0 = <&spi2_pins_b &spi2_sck_cfg>;
> > +                status = "okay";
> > +
> > +                enc28j60: ethernet@0 {
> > +                        compatible = "microchip,enc28j60";
> > +                        pinctrl-names = "default";
> > +                        pinctrl-0 = <&enc28j60_pins>;
> 
> Need to document using the pinctrl binding.

Would it be enough, when I list both properties above and refer to
the pinctrl binding?
I wondered whether there are other possibilities to wire the irq,
but I guess that using GPIO line is the usual way to do, so I'd add
this to required properties, too.

Thanks for your feedback, regards

Michael
> 
> > +                        reg = <0>;
> > +                        interrupt-parent = <&gpio3>;
> > +                        interrupts = <3 IRQ_TYPE_EDGE_FALLING>;
> > +                        spi-max-frequency = <12000000>;
> > +                };
> > +        };
> > +

^ permalink raw reply

* Re: [PATCH net-next 0/9] netlink: align attributes when needed (patchset #2)
From: David Miller @ 2016-04-25 19:09 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: netdev, jhs, stephen, pshelar, aar, linux-wpan, wensong, horms,
	ja, pablo, kaber, kadlec, lvs-devel, netfilter-devel, johannes,
	linux-wireless
In-Reply-To: <1461572722-6029-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Mon, 25 Apr 2016 10:25:13 +0200

> This is the continuation (series #2) of the work done to align netlink
> attributes when these attributes contain some 64-bit fields.
> 
> In patch #3, I didn't modify the function ila_encap_nlsize(). I was waiting
> feedback for this patch: http://patchwork.ozlabs.org/patch/613766/
> If it's approved, there will be an update to switch nla_total_size() to
> nla_total_size_64bit() after the merge of net in net-next.

Looks good, series applied, thanks!

^ permalink raw reply

* Re: [PATCH net-next] macvlan: fix failure during registration v3
From: Eric W. Biederman @ 2016-04-25 19:00 UTC (permalink / raw)
  To: Francesco Ruggeri; +Cc: netdev, David S. Miller, Mahesh Bandewar
In-Reply-To: <1461449012-7401-1-git-send-email-fruggeri@arista.com>

Francesco Ruggeri <fruggeri@arista.com> writes:

> If macvlan_common_newlink fails in register_netdevice after macvlan_init
> then it decrements port->count twice, first in macvlan_uninit (from
> register_netdevice or rollback_registered) and then again in
> macvlan_common_newlink.
> A similar problem may exist in the ipvlan driver.
> This patch consolidates modifications to port->count into macvlan_init
> and macvlan_uninit (thanks to Eric Biederman for suggesting this approach).
>
> v3: remove macvtap specific bits.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
> ---
>  drivers/net/macvlan.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index 2bcf1f3..cb01023 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -795,6 +795,7 @@ static int macvlan_init(struct net_device *dev)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
>  	const struct net_device *lowerdev = vlan->lowerdev;
> +	struct macvlan_port *port = vlan->port;
>  
>  	dev->state		= (dev->state & ~MACVLAN_STATE_MASK) |
>  				  (lowerdev->state & MACVLAN_STATE_MASK);
> @@ -812,6 +813,8 @@ static int macvlan_init(struct net_device *dev)
>  	if (!vlan->pcpu_stats)
>  		return -ENOMEM;
>  
> +	port->count += 1;
> +
>  	return 0;
>  }
>  
> @@ -1312,10 +1315,9 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
>  			return err;
>  	}
>  
> -	port->count += 1;
>  	err = register_netdevice(dev);
>  	if (err < 0)
> -		goto destroy_port;
> +		return err;
>  
>  	dev->priv_flags |= IFF_MACVLAN;
>  	err = netdev_upper_dev_link(lowerdev, dev);
> @@ -1330,10 +1332,6 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
>  
>  unregister_netdev:
>  	unregister_netdevice(dev);
> -destroy_port:
> -	port->count -= 1;
> -	if (!port->count)
> -		macvlan_port_destroy(lowerdev);
>  
>  	return err;
>  }

^ permalink raw reply

* Re: [PATCH net-next] macvtap: check minor when unregistering
From: Eric W. Biederman @ 2016-04-25 19:01 UTC (permalink / raw)
  To: Francesco Ruggeri; +Cc: netdev, David S. Miller
In-Reply-To: <1461449071-7499-1-git-send-email-fruggeri@arista.com>

Francesco Ruggeri <fruggeri@arista.com> writes:

> macvtap_device_event(NETDEV_UNREGISTER) should check vlan->minor to
> determine if it is being invoked in the context of a macvtap_newlink
> that failed, for example in this code sequence:
>
> macvtap_newlink
>   macvlan_common_newlink
>     register_netdevice
>       call_netdevice_notifiers(NETDEV_REGISTER, dev)
>         macvtap_device_event(NETDEV_REGISTER)
>           <fail here, vlan->minor = 0>
>       rollback_registered(dev);
>         rollback_registered_many
>           call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>             macvtap_device_event(NETDEV_UNREGISTER)
>               <nothing to clean up here>
>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
> ---
>  drivers/net/macvtap.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 95394ed..74cb15a 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1303,6 +1303,9 @@ static int macvtap_device_event(struct notifier_block *unused,
>  		}
>  		break;
>  	case NETDEV_UNREGISTER:
> +		/* vlan->minor == 0 if NETDEV_REGISTER above failed */
> +		if (vlan->minor == 0)
> +			break;
>  		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
>  		device_destroy(macvtap_class, devt);
>  		macvtap_free_minor(vlan);

^ permalink raw reply

* Re: [PATCH net v2 2/3] drivers: net: cpsw: fix error messages when using phy-handle DT property
From: Grygorii Strashko @ 2016-04-25 19:12 UTC (permalink / raw)
  To: David Rivshin (Allworx), Rob Herring
  Cc: netdev, linux-omap, linux-arm-kernel, devicetree, linux-kernel,
	David Miller, Mugunthan V N, Andrew Goodbody, Markus Brunner,
	Nicolas Chauvet
In-Reply-To: <20160422114545.57b477ad.drivshin.allworx@gmail.com>

On 04/22/2016 06:45 PM, David Rivshin (Allworx) wrote:
> On Fri, 22 Apr 2016 16:03:34 +0300
> Grygorii Strashko <grygorii.strashko@ti.com> wrote:
> 
>> On 04/21/2016 09:26 PM, David Rivshin (Allworx) wrote:
>>> From: David Rivshin <drivshin@allworx.com>
>>>
>>> The phy-handle, phy_id, and fixed-link properties are mutually exclusive,
>>> and only one need be specified. However if phy-handle was specified, an
>>> error message would complain about the lack of phy_id or fixed-link.

I think, commit message need to be updated.
You not only fix log messages - you also fix the issue with 
of_get_phy_mode(slave_node); which will not be called if phy-handle is used.
 

slave_data->phy_if = of_get_phy_mode(slave_node); 
^ see below
>>>
>>> Also, if phy-handle was specified and the subsequent of_phy_connect()
>>> failed, the error message still referenced slaved->data->phy_id, which
>>> would be empty. Instead, use the name of the device_node as a useful
>>> identifier.
>>>
>>> Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
>>> Signed-off-by: David Rivshin <drivshin@allworx.com>
>>> Acked-by: Rob Herring <robh@kernel.org>
>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com>
>>> ---
>>> If would like this for -stable it should apply cleanly as far back
>>> as 4.5. It failes on 4.4 due to some context differences, but can be
>>> applied with 'git am -C2'. Or, I can produce a separate patch against
>>> linux-4.4.y if preferred.
>>>
>>> Changes since v1 [1]:
>>> - Rebased (no conflicts)
>>> - Added Tested-by from Nicolas Chauvet
>>> - Added Acked-by from Rob Herring for the binding change
>>>
>>> [1] https://patchwork.ozlabs.org/patch/560324/
>>>
>>>    Documentation/devicetree/bindings/net/cpsw.txt |  4 ++--
>>>    drivers/net/ethernet/ti/cpsw.c                 | 17 +++++++++++++----
>>>    2 files changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
>>> index 28a4781..3033c0f 100644
>>> --- a/Documentation/devicetree/bindings/net/cpsw.txt
>>> +++ b/Documentation/devicetree/bindings/net/cpsw.txt
>>> @@ -46,16 +46,16 @@ Optional properties:
>>>    - dual_emac_res_vlan	: Specifies VID to be used to segregate the ports
>>>    - mac-address		: See ethernet.txt file in the same directory
>>>    - phy_id		: Specifies slave phy id
>>
>> May be the "phy_id" can be marked as deprecated? (while here)
>> The recommended property now is "phy-handle".
> 
> I can certainly do that. Perhaps something like this?
>   - phy_id		: Specifies slave phy id (deprecated, use phy-handle)
> 
> Rob, would you have any issues with bundling that?
> 
>>
>>>    - phy-handle		: See ethernet.txt file in the same directory
>>>    
>>>    Slave sub-nodes:
>>>    - fixed-link		: See fixed-link.txt file in the same directory
>>> -			  Either the property phy_id, or the sub-node
>>> -			  fixed-link can be specified
>>> +
>>> +Note: Exactly one of phy_id, phy-handle, or fixed-link must be specified.
>>>    
>>>    Note: "ti,hwmods" field is used to fetch the base address and irq
>>>    resources from TI, omap hwmod data base during device registration.
>>>    Future plan is to migrate hwmod data base contents into device tree
>>>    blob so that, all the required data will be used from device tree dts
>>>    file.
>>>    
>>> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
>>> index d69cb3f..3c81413 100644
>>> --- a/drivers/net/ethernet/ti/cpsw.c
>>> +++ b/drivers/net/ethernet/ti/cpsw.c
>>> @@ -1150,16 +1150,19 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
>>>    	if (slave->data->phy_node)
>>>    		slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
>>>    				 &cpsw_adjust_link, 0, slave->data->phy_if);
>>>    	else
>>>    		slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
>>>    				 &cpsw_adjust_link, slave->data->phy_if);
>>>    	if (IS_ERR(slave->phy)) {
>>> -		dev_err(priv->dev, "phy %s not found on slave %d\n",
>>> -			slave->data->phy_id, slave->slave_num);
>>> +		dev_err(priv->dev, "phy \"%s\" not found on slave %d\n",
>>> +			slave->data->phy_node ?
>>> +				slave->data->phy_node->full_name :
>>> +				slave->data->phy_id,
>>> +			slave->slave_num);
>>
>> Unfortunately,  there are some inconsistency between legacy and FDT API :(
>> of_phy_connect() will return valid phy_device or NULL, but phy_connect()
>> can return valid phy_device or ERR_PTR().
> 
> Good catch, I hadn't noticed that. It looks like that's actually a more
> serious (pre-existing) bug: if of_phy_connect() returns NULL, we'd end
> up dereferencing it and pagefaulting.
> 
> How about moving the IS_ERR() check into the phy_connect() case like this:
> 	if (slave->data->phy_node) {
> 		slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
> 				 &cpsw_adjust_link, 0, slave->data->phy_if);

[1]

> 	} else {
> 		slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
> 				 &cpsw_adjust_link, slave->data->phy_if);
> 		if (IS_ERR(slave->phy))
> 			slave->phy = NULL;
[2]
> 	}
> 	if (!slave->phy) {
> 		dev_err(priv->dev, "phy \"%s\" not found on slave %d\n",
> 			slave->data->phy_node ?
> 				slave->data->phy_node->full_name :
> 				slave->data->phy_id,
> 			slave->slave_num);
> 	} else {
> 
> Since you say the phy_id case is deprecated anyways, I'm not too concerned
> about not printing the error code returned by phy_connect() in that case
> (especially since it never did so in the past). That lets us still avoid
> duplicating the dev_err() itself.

I'm not worry too much about duplicating dev_err() - it's always good to know
the reason of failure.

So, may be for of_phy_connect() [1]:
 dev_err(priv->dev, "phy \"%s\" not found on slave %d\n",
	slave->data->phy_node->full_name,
 	slave->slave_num);

and for phy_connect() [2]:
  dev_err(priv->dev, "phy %s not found on slave %d, err %d\n",
  	slave->data->phy_id, slave->slave_num, PTR_ERR(slave->phy));

Mugunthan, any comments?

> 
> 
>>
>>
>>
>>>    		slave->phy = NULL;
>>>    	} else {
>>>    		phy_attached_info(slave->phy);
>>>    
>>>    		phy_start(slave->phy);
>>>    
>>>    		/* Configure GMII_SEL register */
>>> @@ -2030,15 +2033,19 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
>>>    		/* This is no slave child node, continue */
>>>    		if (strcmp(slave_node->name, "slave"))
>>>    			continue;
>>>    
>>>    		slave_data->phy_node = of_parse_phandle(slave_node,
>>>    							"phy-handle", 0);
>>>    		parp = of_get_property(slave_node, "phy_id", &lenp);
>>> -		if (of_phy_is_fixed_link(slave_node)) {
>>> +		if (slave_data->phy_node) {
>>> +			dev_dbg(&pdev->dev,
>>> +				"slave[%d] using phy-handle=\"%s\"\n",
>>> +				i, slave_data->phy_node->full_name);
>>> +		} else if (of_phy_is_fixed_link(slave_node)) {
>>>    			struct device_node *phy_node;
>>>    			struct phy_device *phy_dev;
>>>    
>>>    			/* In the case of a fixed PHY, the DT node associated
>>>    			 * to the PHY is the Ethernet MAC DT node.
>>>    			 */
>>>    			ret = of_phy_register_fixed_link(slave_node);
>>> @@ -2067,15 +2074,17 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
>>>    			if (!mdio) {
>>>    				dev_err(&pdev->dev, "Missing mdio platform device\n");
>>>    				return -EINVAL;
>>>    			}
>>>    			snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
>>>    				 PHY_ID_FMT, mdio->name, phyid);
>>>    		} else {
>>> -			dev_err(&pdev->dev, "No slave[%d] phy_id or fixed-link property\n", i);
>>> +			dev_err(&pdev->dev,
>>> +				"No slave[%d] phy_id, phy-handle, or fixed-link property\n",
>>> +				i);
>>>    			goto no_phy_slave;
>>>    		}
>>>    		slave_data->phy_if = of_get_phy_mode(slave_node);

Your change will allow the code to reach this point in case of phy-handle.

>>>    		if (slave_data->phy_if < 0) {
>>>    			dev_err(&pdev->dev, "Missing or malformed slave[%d] phy-mode property\n",
>>>    				i);
>>>    			return slave_data->phy_if;
>>>    
>>
>>


-- 
regards,
-grygorii

^ permalink raw reply

* Re: [net-next 00/15][pull request] 10GbE Intel Wired LAN Driver Updates 2016-04-25
From: David Miller @ 2016-04-25 19:12 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene, john.ronciak
In-Reply-To: <1461588269-128728-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 25 Apr 2016 05:44:14 -0700

> This series contains updates to ixgbe and ixgbevf.

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH net v2 1/3] drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
From: Grygorii Strashko @ 2016-04-25 19:14 UTC (permalink / raw)
  To: David Rivshin (Allworx), netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, David Miller, Mugunthan V N,
	Andrew Goodbody, Markus Brunner, Nicolas Chauvet
In-Reply-To: <571A210E.2020607-l0cyMroinI0@public.gmane.org>

On 04/22/2016 04:03 PM, Grygorii Strashko wrote:
> On 04/21/2016 09:19 PM, David Rivshin (Allworx) wrote:
>> From: David Rivshin <drivshin-5fOYsn7Fw8lBDgjK7y7TUQ@public.gmane.org>
>>
>> Commit 9e42f715264ff158478fa30eaed847f6e131366b ("drivers: net: cpsw: add
>> phy-handle parsing") saved the "phy-handle" phandle into a new cpsw_priv
>> field. However, phy connections are per-slave, so the phy_node field 
>> should
>> be in cpsw_slave_data rather than cpsw_priv.
>>
>> This would go unnoticed in a single emac configuration. But in dual_emac
>> mode, the last "phy-handle" property parsed for either slave would be 
>> used
>> by both of them, causing them both to refer to the same phy_device.
>>
>> Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
>> Signed-off-by: David Rivshin <drivshin-5fOYsn7Fw8lBDgjK7y7TUQ@public.gmane.org>
>> Tested-by: Nicolas Chauvet <kwizart-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> ---
>> I would suggest this for -stable. It should apply cleanly as far back
>> as 4.4.
>>
>> Changes since v1 [1]:
>> - Rebased (no conflicts)
>> - Added Tested-by from Nicolas Chauvet
>>
>> [1] https://patchwork.ozlabs.org/patch/560326/
> 
> Reviewed-by: Grygorii Strashko <grygorii.strashko-l0cyMroinI0@public.gmane.org>

In my opinion, it will be good to have this patch merged as part of -rc cycle, since
it will fix "NULL pointer dereference" issue with current LKML as reported by Andrew Goodbody.


> 
>>
>>   drivers/net/ethernet/ti/cpsw.c | 13 ++++++-------
>>   drivers/net/ethernet/ti/cpsw.h |  1 +
>>   2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/cpsw.c 
>> b/drivers/net/ethernet/ti/cpsw.c
>> index 42fdfd4..d69cb3f 100644
>> --- a/drivers/net/ethernet/ti/cpsw.c
>> +++ b/drivers/net/ethernet/ti/cpsw.c
>> @@ -363,15 +363,14 @@ static inline void slave_write(struct cpsw_slave 
>> *slave, u32 val, u32 offset)
>>       __raw_writel(val, slave->regs + offset);
>>   }
>>
>>   struct cpsw_priv {
>>       spinlock_t            lock;
>>       struct platform_device        *pdev;
>>       struct net_device        *ndev;
>> -    struct device_node        *phy_node;
>>       struct napi_struct        napi_rx;
>>       struct napi_struct        napi_tx;
>>       struct device            *dev;
>>       struct cpsw_platform_data    data;
>>       struct cpsw_ss_regs __iomem    *regs;
>>       struct cpsw_wr_regs __iomem    *wr_regs;
>>       u8 __iomem            *hw_stats;
>> @@ -1144,16 +1143,16 @@ static void cpsw_slave_open(struct cpsw_slave 
>> *slave, struct cpsw_priv *priv)
>>
>>       if (priv->data.dual_emac)
>>           cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
>>       else
>>           cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
>>                      1 << slave_port, 0, 0, ALE_MCAST_FWD_2);
>>
>> -    if (priv->phy_node)
>> -        slave->phy = of_phy_connect(priv->ndev, priv->phy_node,
>> +    if (slave->data->phy_node)
>> +        slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
>>                    &cpsw_adjust_link, 0, slave->data->phy_if);
>>       else
>>           slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
>>                    &cpsw_adjust_link, slave->data->phy_if);
>>       if (IS_ERR(slave->phy)) {
>>           dev_err(priv->dev, "phy %s not found on slave %d\n",
>>               slave->data->phy_id, slave->slave_num);
>> @@ -1936,20 +1935,19 @@ static void cpsw_slave_init(struct cpsw_slave 
>> *slave, struct cpsw_priv *priv,
>>
>>       slave->data    = data;
>>       slave->regs    = regs + slave_reg_ofs;
>>       slave->sliver    = regs + sliver_reg_ofs;
>>       slave->port_vlan = data->dual_emac_res_vlan;
>>   }
>>

[..]

-- 
regards,
-grygorii
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] macvtap: add namespace support to the sysfs device class
From: Eric W. Biederman @ 2016-04-25 19:12 UTC (permalink / raw)
  To: Marc Angel; +Cc: netdev
In-Reply-To: <1461161491-22867-1-git-send-email-marc@arista.com>

Marc Angel <marc@arista.com> writes:

> When creating macvtaps that are expected to have the same ifindex
> in different network namespaces, only the first one will succeed.
> The others will fail with a sysfs_warn_dup warning due to them trying
> to create the following sysfs link (with 'NN' the ifindex of macvtapX):
>
> /sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN
>
> This is reproducible by running the following commands:
>
> ip netns add ns1
> ip netns add ns2
> ip link add veth0 type veth peer name veth1
> ip link set veth0 netns ns1
> ip link set veth1 netns ns2
> ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
> ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap
>
> The last command will fail with "RTNETLINK answers: File exists" (along
> with the kernel warning) but retrying it will work because the ifindex
> was incremented.

Yes.  That is totally broken, and you would not even care excpect that
the ifindex maintained separately per network namespace.  Useful for
migration but it totally breaks things in this case.

> The 'net' device class is isolated between network namespaces so each
> one has its own hierarchy of net devices.
> This isn't the case for the 'macvtap' device class.
> The problem occurs half-way through the netdev registration, when
> `macvtap_device_event` is called-back to create the 'tapNN' macvtap
> class device under the 'macvtapX' net class device.
>
> This patch adds namespace support the the 'macvtap' device class so
> that /sys/class/macvtap is no longer shared between net namespaces.
>
> However, doing this has the side effect of changing
> /sys/devices/virtual/net/macvtapX/tapNN  into
> /sys/devices/virtual/net/macvtapX/macvtap/tapNN

I forget the details of how this interface works, but
/sys/devices/virtual/net is definitely allows different overlapping
content per network namespace, so we should not need to add an extra
directory to make this work.

> This is due to Commit 24b1442 ("Driver-core: Always create class
> directories for classses that support namespaces.")
>
> Signed-off-by: Marc Angel <marc@arista.com>
> ---
> I'm not sure that the problems described in that commit message
> apply to macvtaps so maybe it is possible to keep the 'tapNN'
> device directly under 'macvtapX' and not disrupt userland.
>
> Should it even be possible to add a device of a class that doesn't
> support namespaces under one that does?
> This could lead to dead symlinks in the new device class directory or
> duplicate warnings because a device of the same name already exists in
> another namespace.

This definitely looks like something that bears digging into, and fixing
properly.

Eric

^ permalink raw reply

* Re: [PATCH] net: ipv6: Delete host routes on an ifdown
From: David Miller @ 2016-04-25 19:26 UTC (permalink / raw)
  To: dsa; +Cc: netdev, mmanning
In-Reply-To: <1461297372-18796-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu, 21 Apr 2016 20:56:12 -0700

> Dave: I realize this goes against your preference to keep routes cached
>       on a down but this patch really emphasizes my point of all the
>       dark corners to be handled. 

I've learned my lesson, I never should have put the original patch in
to begin with.

And you're so sloppy here, you didn't even bother adding a Fixes: tag.

The patch at the root of all of this has terrible semantics, and it's
added tons of regressions.  All of which goes against our most basic
principles.

I've been patient enough, but I've now reverting everything, sorry.

It's unfortunate if you have things which depend upon this, but too
bad.  This change has been extremely disruptive for a lot of people,
including me.

You're going to have to work really hard to ever get me to consider
applying patches which do this again.

^ permalink raw reply

* Re: [PATCH] net: ipv6: Delete host routes on an ifdown
From: David Ahern @ 2016-04-25 19:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, mmanning
In-Reply-To: <20160425.152611.583503438886577374.davem@davemloft.net>

On 4/25/16 1:26 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Thu, 21 Apr 2016 20:56:12 -0700
>
>> Dave: I realize this goes against your preference to keep routes cached
>>        on a down but this patch really emphasizes my point of all the
>>        dark corners to be handled.
>
> I've learned my lesson, I never should have put the original patch in
> to begin with.
>
> And you're so sloppy here, you didn't even bother adding a Fixes: tag.
>
> The patch at the root of all of this has terrible semantics, and it's
> added tons of regressions.  All of which goes against our most basic
> principles.
>
> I've been patient enough, but I've now reverting everything, sorry.
>
> It's unfortunate if you have things which depend upon this, but too
> bad.  This change has been extremely disruptive for a lot of people,
> including me.
>
> You're going to have to work really hard to ever get me to consider
> applying patches which do this again.
>

It's unfortunate you want to take that action. Last week I came across a 
prior attempt by Stephen to do this same thing -- keep IPv6 addresses. 
That prior attempt was reverted by commit 73a8bd74e261. Cumulus, 
Brocade, and others clearly want this capability.

You have made many comments about the pains due to differences between 
IPv4 and IPv6. I deal with those pains daily. It takes time to close 
those differences and unfortunately there have been and will be bumps in 
the road. If you look back at the commits on top of the original I 
responded quickly to the 1 bug report from Andrey and the rest of the 
followups have come from me (or the delta from Mike at Brocade) which 
are a result of our range of testing with this sysctl enabled.

^ permalink raw reply

* [PATCH] ipv6: Revert optional address flusing on ifdown.
From: David Miller @ 2016-04-25 19:47 UTC (permalink / raw)
  To: dsa; +Cc: netdev


This reverts the following three commits:

70af921db6f8835f4b11c65731116560adb00c14
799977d9aafbf0ca0b9c39b04cbfb16db71302c9
f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac

The feature was ill conceived, has terrible semantics, and has added
nothing but regressions to the already fragile ipv6 stack.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt |   9 --
 include/linux/ipv6.h                   |   1 -
 include/uapi/linux/ipv6.h              |   1 -
 net/ipv6/addrconf.c                    | 162 +++------------------------------
 4 files changed, 12 insertions(+), 161 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b183e2b..e0ac252 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1568,15 +1568,6 @@ temp_prefered_lft - INTEGER
 	Preferred lifetime (in seconds) for temporary addresses.
 	Default: 86400 (1 day)
 
-keep_addr_on_down - INTEGER
-	Keep all IPv6 addresses on an interface down event. If set static
-	global addresses with no expiration time are not flushed.
-	  >0 : enabled
-	   0 : system default
-	  <0 : disabled
-
-	Default: 0 (addresses are removed)
-
 max_desync_factor - INTEGER
 	Maximum value for DESYNC_FACTOR, which is a random value
 	that ensures that clients don't synchronize with each
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 7edc14f..4b2267e 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -62,7 +62,6 @@ struct ipv6_devconf {
 		struct in6_addr secret;
 	} stable_secret;
 	__s32		use_oif_addrs_only;
-	__s32		keep_addr_on_down;
 	void		*sysctl;
 };
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 3958760..ec117b6 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -176,7 +176,6 @@ enum {
 	DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
 	DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
 	DEVCONF_DROP_UNSOLICITED_NA,
-	DEVCONF_KEEP_ADDR_ON_DOWN,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 23cec53..d77ba39 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -216,7 +216,6 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	},
 	.use_oif_addrs_only	= 0,
 	.ignore_routes_with_linkdown = 0,
-	.keep_addr_on_down	= 0,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -261,7 +260,6 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	},
 	.use_oif_addrs_only	= 0,
 	.ignore_routes_with_linkdown = 0,
-	.keep_addr_on_down	= 0,
 };
 
 /* Check if a valid qdisc is available */
@@ -3176,81 +3174,6 @@ static void addrconf_gre_config(struct net_device *dev)
 }
 #endif
 
-#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
-/* If the host route is cached on the addr struct make sure it is associated
- * with the proper table. e.g., enslavement can change and if so the cached
- * host route needs to move to the new table.
- */
-static void l3mdev_check_host_rt(struct inet6_dev *idev,
-				  struct inet6_ifaddr *ifp)
-{
-	if (ifp->rt) {
-		u32 tb_id = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL;
-
-		if (tb_id != ifp->rt->rt6i_table->tb6_id) {
-			ip6_del_rt(ifp->rt);
-			ifp->rt = NULL;
-		}
-	}
-}
-#else
-static void l3mdev_check_host_rt(struct inet6_dev *idev,
-				  struct inet6_ifaddr *ifp)
-{
-}
-#endif
-
-static int fixup_permanent_addr(struct inet6_dev *idev,
-				struct inet6_ifaddr *ifp)
-{
-	l3mdev_check_host_rt(idev, ifp);
-
-	if (!ifp->rt) {
-		struct rt6_info *rt;
-
-		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
-		if (unlikely(IS_ERR(rt)))
-			return PTR_ERR(rt);
-
-		ifp->rt = rt;
-	}
-
-	if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
-		addrconf_prefix_route(&ifp->addr, ifp->prefix_len,
-				      idev->dev, 0, 0);
-	}
-
-	addrconf_dad_start(ifp);
-
-	return 0;
-}
-
-static void addrconf_permanent_addr(struct net_device *dev)
-{
-	struct inet6_ifaddr *ifp, *tmp;
-	struct inet6_dev *idev;
-
-	idev = __in6_dev_get(dev);
-	if (!idev)
-		return;
-
-	write_lock_bh(&idev->lock);
-
-	list_for_each_entry_safe(ifp, tmp, &idev->addr_list, if_list) {
-		if ((ifp->flags & IFA_F_PERMANENT) &&
-		    fixup_permanent_addr(idev, ifp) < 0) {
-			write_unlock_bh(&idev->lock);
-			ipv6_del_addr(ifp);
-			write_lock_bh(&idev->lock);
-
-			net_info_ratelimited("%s: Failed to add prefix route for address %pI6c; dropping\n",
-					     idev->dev->name, &ifp->addr);
-		}
-	}
-
-	write_unlock_bh(&idev->lock);
-}
-
 static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
@@ -3337,9 +3260,6 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			run_pending = 1;
 		}
 
-		/* restore routes for permanent addresses */
-		addrconf_permanent_addr(dev);
-
 		switch (dev->type) {
 #if IS_ENABLED(CONFIG_IPV6_SIT)
 		case ARPHRD_SIT:
@@ -3448,20 +3368,11 @@ static void addrconf_type_change(struct net_device *dev, unsigned long event)
 		ipv6_mc_unmap(idev);
 }
 
-static bool addr_is_local(const struct in6_addr *addr)
-{
-	return ipv6_addr_type(addr) &
-		(IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
-}
-
 static int addrconf_ifdown(struct net_device *dev, int how)
 {
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
-	struct inet6_ifaddr *ifa, *tmp;
-	struct list_head del_list;
-	int _keep_addr;
-	bool keep_addr;
+	struct inet6_ifaddr *ifa;
 	int state, i;
 
 	ASSERT_RTNL();
@@ -3488,16 +3399,6 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	}
 
-	/* aggregate the system setting and interface setting */
-	_keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
-	if (!_keep_addr)
-		_keep_addr = idev->cnf.keep_addr_on_down;
-
-	/* combine the user config with event to determine if permanent
-	 * addresses are to be removed from address hash table
-	 */
-	keep_addr = !(how || _keep_addr <= 0);
-
 	/* Step 2: clear hash table */
 	for (i = 0; i < IN6_ADDR_HSIZE; i++) {
 		struct hlist_head *h = &inet6_addr_lst[i];
@@ -3506,16 +3407,9 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 restart:
 		hlist_for_each_entry_rcu(ifa, h, addr_lst) {
 			if (ifa->idev == idev) {
+				hlist_del_init_rcu(&ifa->addr_lst);
 				addrconf_del_dad_work(ifa);
-				/* combined flag + permanent flag decide if
-				 * address is retained on a down event
-				 */
-				if (!keep_addr ||
-				    !(ifa->flags & IFA_F_PERMANENT) ||
-				    addr_is_local(&ifa->addr)) {
-					hlist_del_init_rcu(&ifa->addr_lst);
-					goto restart;
-				}
+				goto restart;
 			}
 		}
 		spin_unlock_bh(&addrconf_hash_lock);
@@ -3549,54 +3443,31 @@ restart:
 		write_lock_bh(&idev->lock);
 	}
 
-	/* re-combine the user config with event to determine if permanent
-	 * addresses are to be removed from the interface list
-	 */
-	keep_addr = (!how && _keep_addr > 0);
-
-	INIT_LIST_HEAD(&del_list);
-	list_for_each_entry_safe(ifa, tmp, &idev->addr_list, if_list) {
+	while (!list_empty(&idev->addr_list)) {
+		ifa = list_first_entry(&idev->addr_list,
+				       struct inet6_ifaddr, if_list);
 		addrconf_del_dad_work(ifa);
 
-		write_unlock_bh(&idev->lock);
-		spin_lock_bh(&ifa->lock);
-
-		if (keep_addr && (ifa->flags & IFA_F_PERMANENT) &&
-		    !addr_is_local(&ifa->addr)) {
-			/* set state to skip the notifier below */
-			state = INET6_IFADDR_STATE_DEAD;
-			ifa->state = 0;
-			if (!(ifa->flags & IFA_F_NODAD))
-				ifa->flags |= IFA_F_TENTATIVE;
-		} else {
-			state = ifa->state;
-			ifa->state = INET6_IFADDR_STATE_DEAD;
+		list_del(&ifa->if_list);
 
-			list_del(&ifa->if_list);
-			list_add(&ifa->if_list, &del_list);
-		}
+		write_unlock_bh(&idev->lock);
 
+		spin_lock_bh(&ifa->lock);
+		state = ifa->state;
+		ifa->state = INET6_IFADDR_STATE_DEAD;
 		spin_unlock_bh(&ifa->lock);
 
 		if (state != INET6_IFADDR_STATE_DEAD) {
 			__ipv6_ifa_notify(RTM_DELADDR, ifa);
 			inet6addr_notifier_call_chain(NETDEV_DOWN, ifa);
 		}
+		in6_ifa_put(ifa);
 
 		write_lock_bh(&idev->lock);
 	}
 
 	write_unlock_bh(&idev->lock);
 
-	/* now clean up addresses to be removed */
-	while (!list_empty(&del_list)) {
-		ifa = list_first_entry(&del_list,
-				       struct inet6_ifaddr, if_list);
-		list_del(&ifa->if_list);
-
-		in6_ifa_put(ifa);
-	}
-
 	/* Step 5: Discard anycast and multicast list */
 	if (how) {
 		ipv6_ac_destroy_dev(idev);
@@ -4861,7 +4732,6 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_USE_OIF_ADDRS_ONLY] = cnf->use_oif_addrs_only;
 	array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = cnf->drop_unicast_in_l2_multicast;
 	array[DEVCONF_DROP_UNSOLICITED_NA] = cnf->drop_unsolicited_na;
-	array[DEVCONF_KEEP_ADDR_ON_DOWN] = cnf->keep_addr_on_down;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5950,14 +5820,6 @@ static struct addrconf_sysctl_table
 			.proc_handler	= proc_dointvec,
 		},
 		{
-			.procname       = "keep_addr_on_down",
-			.data           = &ipv6_devconf.keep_addr_on_down,
-			.maxlen         = sizeof(int),
-			.mode           = 0644,
-			.proc_handler   = proc_dointvec,
-
-		},
-		{
 			/* sentinel */
 		}
 	},
-- 
2.4.1

^ permalink raw reply related

* Re: [PATCH 1/5] phylib: don't return NULL from get_phy_device()
From: Florian Fainelli @ 2016-04-25 19:45 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: arnd
In-Reply-To: <16806435.OmqP10ba8M@wasted.cogentembedded.com>

On 24/04/16 10:25, Sergei Shtylyov wrote:
> Arnd Bergmann asked that get_phy_device() returns either NULL or the error
> value,  not both on error.  Do as he said, return ERR_PTR(-ENODEV) instead
> of NULL when the PHY ID registers read as  all ones.
> 
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH 2/5] xgene: get_phy_device() doesn't return NULL anymore
From: Florian Fainelli @ 2016-04-25 19:46 UTC (permalink / raw)
  To: Sergei Shtylyov, isubramanian, kchudgar; +Cc: netdev, arnd
In-Reply-To: <73083313.ix2iJdrut5@wasted.cogentembedded.com>

On 24/04/16 10:27, Sergei Shtylyov wrote:
> Now that get_phy_device() no longer returns NULL on error, we don't need
> to check for it...
> 
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH 3/5] fixed_phy: get_phy_device() doesn't return NULL anymore
From: Florian Fainelli @ 2016-04-25 19:46 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: arnd
In-Reply-To: <7242612.WszImdz4eS@wasted.cogentembedded.com>

On 24/04/16 10:29, Sergei Shtylyov wrote:
> Now that get_phy_device() no longer returns NULL on error, we don't need
> to check for it...
> 
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH 4/5] mdio_bus: get_phy_device() doesn't return NULL anymore
From: Florian Fainelli @ 2016-04-25 19:46 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: arnd
In-Reply-To: <2703899.TobPpSfS5b@wasted.cogentembedded.com>

On 24/04/16 10:30, Sergei Shtylyov wrote:
> Now that get_phy_device() no longer returns NULL on error, we don't need
> to check for it...
> 
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH 5/5] of_mdio: get_phy_device() doesn't return NULL anymore
From: Florian Fainelli @ 2016-04-25 19:47 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: arnd
In-Reply-To: <2718908.IUaySUkXrd@wasted.cogentembedded.com>

On 24/04/16 10:31, Sergei Shtylyov wrote:
> Now that get_phy_device() no longer returns NULL on error, we don't need
> to check for it...
> 
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next] tcp: SYN packets are now simply consumed
From: David Miller @ 2016-04-25 19:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1461301981.7627.24.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 21 Apr 2016 22:13:01 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> We now have proper per-listener but also per network namespace counters
> for SYN packets that might be dropped.
> 
> We replace the kfree_skb() by consume_skb() to be drop monitor [1]
> friendly, and remove an obsolete comment.
> FastOpen SYN packets can carry payload in them just fine.
> 
> [1] perf record -a -g -e skb:kfree_skb sleep 1; perf report
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox