Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] net: macb: do not scan PHYs manually
From: Nathan Sullivan @ 2016-04-28 14:19 UTC (permalink / raw)
  To: nicolas.ferre; +Cc: netdev, linux-kernel, Nathan Sullivan

Since of_mdiobus_register and mdiobus_register will scan automatically,
do not manually scan for PHY devices in the macb ethernet driver. Doing
so will result in many nonexistent PHYs on the MDIO bus if the MDIO
lines are floating or grounded, such as when they are not used.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
---
 drivers/net/ethernet/cadence/macb.c |   17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 48a7d7d..e80e487 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -450,23 +450,6 @@ static int macb_mii_init(struct macb *bp)
 	if (np) {
 		/* try dt phy registration */
 		err = of_mdiobus_register(bp->mii_bus, np);
-
-		/* fallback to standard phy registration if no phy were
-		   found during dt phy registration */
-		if (!err && !phy_find_first(bp->mii_bus)) {
-			for (i = 0; i < PHY_MAX_ADDR; i++) {
-				struct phy_device *phydev;
-
-				phydev = mdiobus_scan(bp->mii_bus, i);
-				if (IS_ERR(phydev)) {
-					err = PTR_ERR(phydev);
-					break;
-				}
-			}
-
-			if (err)
-				goto err_out_unregister_bus;
-		}
 	} else {
 		if (pdata)
 			bp->mii_bus->phy_mask = pdata->phy_mask;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net v3 0/5] drivers: net: cpsw: phy-handle fixes
From: Tony Lindgren @ 2016-04-28 14:28 UTC (permalink / raw)
  To: David Rivshin (Allworx)
  Cc: netdev, linux-omap, Markus Brunner, devicetree, Grygorii Strashko,
	Mugunthan V N, Nicolas Chauvet, linux-kernel, Andrew Goodbody,
	David Miller, linux-arm-kernel
In-Reply-To: <1461805808-4102-1-git-send-email-drivshin.allworx@gmail.com>

* David Rivshin (Allworx) <drivshin.allworx@gmail.com> [160427 18:13]:
> From: David Rivshin <drivshin@allworx.com>
> 
> This series fixes a number of related issues around using phy-handle
> properties in cpsw emac nodes.
> 
> Patch 1 fixes a bug if more than one slave is used, and either
> slave uses the phy-handle property in the devicetree.
> 
> Patch 2 fixes a NULL pointer dereference which can occur if a
> phy-handle property is used and of_phy_connect() return NULL,
> such as with a bad devicetree.
> 
> Patch 3 fixes an issue where the phy-mode property would be ignored
> if a phy-handle property was used. This also fixes a bogus error
> message that would be emitted.
> 
> Patch 4 fixes makes the binding documentation more explicit that
> exactly one PHY property should be used, and also marks phy_id as
> deprecated.
> 
> Patch 5 cleans up the fixed-link case to work like the now-fixed
> phy-handle case.
> 
> I have tested on the following hardware configurations:
>  - (EVMSK) dual emac, phy_id property in both slaves
>  - (EVMSK) dual emac, phy-handle property in both slaves
>  - (EVMSK) a bad phy-handle property pointing to &mmc1
>  - (EVMSK) phy_id property with incorrect PHY address
>  - (BeagleBoneBlack) single emac, phy_id property
>  - (custom) single emac, fixed-link subnode
> 
> Andrew Goodbody reported testing v2 on a board that doesn't use
> dual_emac mode, but with 2 PHYs using phy-handle properties [1].
> 
> Nicolas Chauvet reported testing v2 on an HP t410 (dm8148).
> 
> Markus Brunner reported testing v1 on the following [2]:
>  - emac0 with phy_id and emac1 with fixed phy
>  - emac0 with phy-handle and emac1 with fixed phy
>  - emac0 with fixed phy and emac1 with fixed phy

Quickly boot tested these against next on dra62x-j5eco EVM:

Tested-by: Tony Lindgren <tony@atomide.com>

^ permalink raw reply

* [PATCH net-next] vxlan: fix initialization with custom link parameters
From: Jiri Benc @ 2016-04-28 14:36 UTC (permalink / raw)
  To: netdev; +Cc: Nicolas Dichtel

Commit 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate
function") changed initialization order and as an unintended result, when the
user specifies additional link parameters (such as IFLA_ADDRESS) while
creating vxlan interface, those are overwritten by vxlan_ether_setup later.

It's necessary to call ether_setup from withing the ->setup callback. That
way, the correct parameters are set by rtnl_create_link later. This is done
also for VXLAN-GPE, as we don't know the interface type yet at that point,
and changed to the correct interface type later.

Fixes: 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate function")
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/vxlan.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6fb93b57a724..2668e528dee4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2557,6 +2557,9 @@ static void vxlan_setup(struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	unsigned int h;
 
+	eth_hw_addr_random(dev);
+	ether_setup(dev);
+
 	dev->destructor = free_netdev;
 	SET_NETDEV_DEVTYPE(dev, &vxlan_type);
 
@@ -2592,8 +2595,6 @@ static void vxlan_setup(struct net_device *dev)
 
 static void vxlan_ether_setup(struct net_device *dev)
 {
-	eth_hw_addr_random(dev);
-	ether_setup(dev);
 	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 	dev->netdev_ops = &vxlan_netdev_ether_ops;
@@ -2601,11 +2602,10 @@ static void vxlan_ether_setup(struct net_device *dev)
 
 static void vxlan_raw_setup(struct net_device *dev)
 {
+	dev->header_ops = NULL;
 	dev->type = ARPHRD_NONE;
 	dev->hard_header_len = 0;
 	dev->addr_len = 0;
-	dev->mtu = ETH_DATA_LEN;
-	dev->tx_queue_len = 1000;
 	dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
 	dev->netdev_ops = &vxlan_netdev_raw_ops;
 }
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] net: macb: do not scan PHYs manually
From: kbuild test robot @ 2016-04-28 14:37 UTC (permalink / raw)
  To: Nathan Sullivan
  Cc: kbuild-all, nicolas.ferre, netdev, linux-kernel, Nathan Sullivan
In-Reply-To: <1461853187-31715-1-git-send-email-nathan.sullivan@ni.com>

[-- Attachment #1: Type: text/plain, Size: 3784 bytes --]

Hi,

[auto build test WARNING on v4.6-rc5]
[cannot apply to net-next/master next-20160428]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Nathan-Sullivan/net-macb-do-not-scan-PHYs-manually/20160428-222826
config: powerpc-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/cadence/macb.c: In function 'macb_mii_init':
>> drivers/net/ethernet/cadence/macb.c:427:20: warning: unused variable 'i' [-Wunused-variable]
     int err = -ENXIO, i;
                       ^

vim +/i +427 drivers/net/ethernet/cadence/macb.c

222ca8e0 drivers/net/ethernet/cadence/macb.c Nathan Sullivan   2015-05-22  411  		phydev->supported &= ~SUPPORTED_1000baseT_Half;
222ca8e0 drivers/net/ethernet/cadence/macb.c Nathan Sullivan   2015-05-22  412  
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  413  	phydev->advertising = phydev->supported;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  414  
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  415  	bp->link = 0;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  416  	bp->speed = 0;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  417  	bp->duplex = -1;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  418  	bp->phy_dev = phydev;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  419  
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  420  	return 0;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  421  }
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  422  
421d9df0 drivers/net/ethernet/cadence/macb.c Cyrille Pitchen   2015-03-07  423  static int macb_mii_init(struct macb *bp)
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  424  {
84e0cdb0 drivers/net/ethernet/cadence/macb.c Jamie Iles        2011-03-08  425  	struct macb_platform_data *pdata;
148cbb53 drivers/net/ethernet/cadence/macb.c Boris BREZILLON   2013-08-22  426  	struct device_node *np;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12 @427  	int err = -ENXIO, i;
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  428  
3dbda77e drivers/net/macb.c                  Uwe Kleine-Koenig 2009-07-23  429  	/* Enable management port */
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  430  	macb_writel(bp, NCR, MACB_BIT(MPE));
6c36a707 drivers/net/macb.c                  frederic RODO     2007-07-12  431  
298cf9be drivers/net/macb.c                  Lennert Buytenhek 2008-10-08  432  	bp->mii_bus = mdiobus_alloc();
298cf9be drivers/net/macb.c                  Lennert Buytenhek 2008-10-08  433  	if (bp->mii_bus == NULL) {
298cf9be drivers/net/macb.c                  Lennert Buytenhek 2008-10-08  434  		err = -ENOMEM;
298cf9be drivers/net/macb.c                  Lennert Buytenhek 2008-10-08  435  		goto err_out;

:::::: The code at line 427 was first introduced by commit
:::::: 6c36a7074436e181fb3df41f66bbdaf53980951e macb: Use generic PHY layer

:::::: TO: frederic RODO <f.rodo@til-technologies.fr>
:::::: CC: Jeff Garzik <jeff@garzik.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 48660 bytes --]

^ permalink raw reply

* Re: [net-next PATCH V4 1/5] samples/bpf: add back functionality to redefine LLC command
From: Jesper Dangaard Brouer @ 2016-04-28 14:40 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: netdev, linux-kbuild, bblanco, borkmann, alexei.starovoitov,
	brouer
In-Reply-To: <20160428132132.GB7880@naverao1-tp.in.ibm.com>

On Thu, 28 Apr 2016 18:51:33 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> > Add this features back. Note that it is possible to redefine the LLC
> > on the make command like:
> > 
> >  make samples/bpf/ LLC=~/git/llvm/build/bin/llc  
> 
> I don't have an objection to this patch, but you didn't explain why/how 
> this approach is better than just doing:
>   PATH=~/git/llvm/build/bin make samples/bpf/

It is almost the same. There is always another way to do the same.

I explicitly use this to test different combinations of LLC and CLANG,
in-order to validate Alexei's claim that older versions of CLANG could
still work with a newer version of LLC.  Thus, one use-case you
approach cannot cover ;-)

And clang seems to install a clang-3.9, which my solution also covers
by explicitly specifying CLANG=clang-3.9, if several avail clang's are
in the PATH.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [RFC PATCH V2 2/2] vhost: device IOTLB API
From: Michael S. Tsirkin @ 2016-04-28 14:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, qemu-devel, netdev, linux-kernel, peterx, virtualization,
	pbonzini
In-Reply-To: <5721AF9C.9030209@redhat.com>

On Thu, Apr 28, 2016 at 02:37:16PM +0800, Jason Wang wrote:
> 
> 
> On 04/27/2016 07:45 PM, Michael S. Tsirkin wrote:
> > On Fri, Mar 25, 2016 at 10:34:34AM +0800, Jason Wang wrote:
> >> This patch tries to implement an device IOTLB for vhost. This could be
> >> used with for co-operation with userspace(qemu) implementation of DMA
> >> remapping.
> >>
> >> The idea is simple. When vhost meets an IOTLB miss, it will request
> >> the assistance of userspace to do the translation, this is done
> >> through:
> >>
> >> - Fill the translation request in a preset userspace address (This
> >>   address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY).
> >> - Notify userspace through eventfd (This eventfd was set through ioctl
> >>   VHOST_SET_IOTLB_FD).
> > Why use an eventfd for this?
> 
> The aim is to implement the API all through ioctls.
> 
> >  We use them for interrupts because
> > that happens to be what kvm wants, but here - why don't we
> > just add a generic support for reading out events
> > on the vhost fd itself?
> 
> I've considered this approach, but what's the advantages of this? I mean
> looks like all other ioctls could be done through vhost fd
> reading/writing too.

read/write have a non-blocking flag.

It's not useful for other ioctls but it's useful here.


> >
> >> - device IOTLB were started and stopped through VHOST_RUN_IOTLB ioctl
> >>
> >> When userspace finishes the translation, it will update the vhost
> >> IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of
> >> snooping the IOTLB invalidation of IOMMU IOTLB and use
> >> VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost.
> > There's one problem here, and that is that VQs still do not undergo
> > translation.  In theory VQ could be mapped in such a way
> > that it's not contigious in userspace memory.
> 
> I'm not sure I get the issue, current vhost API support setting
> desc_user_addr, used_user_addr and avail_user_addr independently. So
> looks ok? If not, looks not a problem to device IOTLB API itself.

The problem is that addresses are all HVA.

Without an iommu, we ask for them to be contigious and
since bus address == GPA, this means contigious GPA =>
contigious HVA. With an IOMMU you can map contigious
bus address but non contigious GPA and non contigious HVA.

Another concern: what if guest changes the GPA while keeping bus address
constant? Normal devices will work because they only use
bus addresses, but virtio will break.



> >
> >
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > What limits amount of entries that kernel keeps around?
> 
> It depends on guest working set I think. Looking at
> http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html:
> 
> - For 2MB page size in guest, it suggests hugepages=1024
> - For 1GB page size, it suggests a hugepages=4
> 
> So I choose 2048 to make sure it can cover this.

4K page size is rather common, too.

> > Do we want at least a mod parameter for this?
> 
> Maybe.
> 
> >
> >> ---
> >>  drivers/vhost/net.c        |   6 +-
> >>  drivers/vhost/vhost.c      | 301 +++++++++++++++++++++++++++++++++++++++------
> >>  drivers/vhost/vhost.h      |  17 ++-
> >>  fs/eventfd.c               |   3 +-
> >>  include/uapi/linux/vhost.h |  35 ++++++
> >>  5 files changed, 320 insertions(+), 42 deletions(-)
> >>
> 
> [...]
> 
> >> +struct vhost_iotlb_entry {
> >> +	__u64 iova;
> >> +	__u64 size;
> >> +	__u64 userspace_addr;
> > Alignment requirements?
> 
> The API does not require any alignment. Will add a comment for this.
> 
> >
> >> +	struct {
> >> +#define VHOST_ACCESS_RO      0x1
> >> +#define VHOST_ACCESS_WO      0x2
> >> +#define VHOST_ACCESS_RW      0x3
> >> +		__u8  perm;
> >> +#define VHOST_IOTLB_MISS           1
> >> +#define VHOST_IOTLB_UPDATE         2
> >> +#define VHOST_IOTLB_INVALIDATE     3
> >> +		__u8  type;
> >> +#define VHOST_IOTLB_INVALID        0x1
> >> +#define VHOST_IOTLB_VALID          0x2
> >> +		__u8  valid;
> > why do we need this flag?
> 
> Useless, will remove.
> 
> >
> >> +		__u8  u8_padding;
> >> +		__u32 padding;
> >> +	} flags;
> >> +};
> >> +
> >> +struct vhost_vring_iotlb_entry {
> >> +	unsigned int index;
> >> +	__u64 userspace_addr;
> >> +};
> >> +
> >>  struct vhost_memory_region {
> >>  	__u64 guest_phys_addr;
> >>  	__u64 memory_size; /* bytes */
> >> @@ -127,6 +153,15 @@ struct vhost_memory {
> >>  /* Set eventfd to signal an error */
> >>  #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
> >>  
> >> +/* IOTLB */
> >> +/* Specify an eventfd file descriptor to signle on IOTLB miss */
> > typo
> 
> Will fix it.
> 
> >
> >> +#define VHOST_SET_VRING_IOTLB_CALL _IOW(VHOST_VIRTIO, 0x23, struct      \
> >> +                                        vhost_vring_file)
> >> +#define VHOST_SET_VRING_IOTLB_REQUEST _IOW(VHOST_VIRTIO, 0x25, struct   \
> >> +                                           vhost_vring_iotlb_entry)
> >> +#define VHOST_UPDATE_IOTLB _IOW(VHOST_VIRTIO, 0x24, struct vhost_iotlb_entry)
> >> +#define VHOST_RUN_IOTLB _IOW(VHOST_VIRTIO, 0x26, int)
> >> +
> > Is the assumption that userspace must dedicate a thread to running the iotlb? 
> > I dislike this one.
> > Please support asynchronous APIs at least optionally to make
> > userspace make its own threading decisions.
> 
> Nope, my qemu patches does not use a dedicated thread. This API is used
> to start or top DMAR according to e.g whether guest enable DMAR for
> intel IOMMU.

I see. Seems rather confusing - do we need to start/stop it
while device is running?


> >
> >>  /* VHOST_NET specific defines */
> >>  
> >>  /* Attach virtio net ring to a raw socket, or tap device.
> > Don't we need a feature bit for this?
> 
> Yes we need it. The feature bit is not considered in this patch and
> looks like it was still under discussion. After we finalize it, I will add.
> 
> > Are we short on feature bits? If yes maybe it's time to add
> > something like PROTOCOL_FEATURES that we have in vhost-user.
> >
> 
> I believe it can just work like VERSION_1, or is there anything I missed?

VERSION_1 is a virtio feature though. This one would be backend specific
...


-- 
MST

^ permalink raw reply

* [PATCH v2] net: macb: do not scan PHYs manually
From: Nathan Sullivan @ 2016-04-28 14:46 UTC (permalink / raw)
  To: nicolas.ferre; +Cc: netdev, linux-kernel, Nathan Sullivan

Since of_mdiobus_register and mdiobus_register will scan automatically,
do not manually scan for PHY devices in the macb ethernet driver. Doing
so will result in many nonexistent PHYs on the MDIO bus if the MDIO
lines are floating or grounded, such as when they are not used.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
---
 drivers/net/ethernet/cadence/macb.c |   19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 48a7d7d..6506b4e 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -424,7 +424,7 @@ static int macb_mii_init(struct macb *bp)
 {
 	struct macb_platform_data *pdata;
 	struct device_node *np;
-	int err = -ENXIO, i;
+	int err = -ENXIO;
 
 	/* Enable management port */
 	macb_writel(bp, NCR, MACB_BIT(MPE));
@@ -450,23 +450,6 @@ static int macb_mii_init(struct macb *bp)
 	if (np) {
 		/* try dt phy registration */
 		err = of_mdiobus_register(bp->mii_bus, np);
-
-		/* fallback to standard phy registration if no phy were
-		   found during dt phy registration */
-		if (!err && !phy_find_first(bp->mii_bus)) {
-			for (i = 0; i < PHY_MAX_ADDR; i++) {
-				struct phy_device *phydev;
-
-				phydev = mdiobus_scan(bp->mii_bus, i);
-				if (IS_ERR(phydev)) {
-					err = PTR_ERR(phydev);
-					break;
-				}
-			}
-
-			if (err)
-				goto err_out_unregister_bus;
-		}
 	} else {
 		if (pdata)
 			bp->mii_bus->phy_mask = pdata->phy_mask;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [net-next PATCH V4 1/5] samples/bpf: add back functionality to redefine LLC command
From: Naveen N. Rao @ 2016-04-28 14:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, linux-kbuild, bblanco, borkmann, alexei.starovoitov
In-Reply-To: <20160428164046.0998f03d@redhat.com>

On 2016/04/28 04:40PM, Jesper Dangaard Brouer wrote:
> On Thu, 28 Apr 2016 18:51:33 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
> > > Add this features back. Note that it is possible to redefine the LLC
> > > on the make command like:
> > > 
> > >  make samples/bpf/ LLC=~/git/llvm/build/bin/llc  
> > 
> > I don't have an objection to this patch, but you didn't explain why/how 
> > this approach is better than just doing:
> >   PATH=~/git/llvm/build/bin make samples/bpf/
> 
> It is almost the same. There is always another way to do the same.
> 
> I explicitly use this to test different combinations of LLC and CLANG,
> in-order to validate Alexei's claim that older versions of CLANG could
> still work with a newer version of LLC.  Thus, one use-case you
> approach cannot cover ;-)
> 
> And clang seems to install a clang-3.9, which my solution also covers
> by explicitly specifying CLANG=clang-3.9, if several avail clang's are
> in the PATH.

Ah, so a very niche use-case.

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

- Naveen


^ permalink raw reply

* Re: [PATCH] ip: add udp_csum, udp6_csum_tx, udp6_csum_rx control flags to ip l2tp add tunnel
From: James Chapman @ 2016-04-28 14:50 UTC (permalink / raw)
  To: Wang Shanker; +Cc: netdev
In-Reply-To: <AFE6B2C1-93B5-4782-870D-2C6C81BF6F52@gmail.com>

Yes, that looks like the problem.

The comments in l2tp.h which indicate that the csum attributes are u8
values are wrong. Code in net/l2tp/l2tp_netlink.c accesses these
attributes using nla_get_flag().

Please submit a patch to fix l2tp_tunnel_sock_create(). Include good
change notes and your signed-off-by tag so that it gets reviewed. See
Documentation/SubmittingPatches if you haven't submitted a kernel
patch here before.

On 27 April 2016 at 18:06, Wang Shanker <shankerwangmiao@gmail.com> wrote:
>
>
>> 在 2016年4月27日，23:33，Wang Shanker <shankerwangmiao@gmail.com> 写道：
>>
>>
>>
>>> 在 2016年4月27日，20:21，James Chapman <jchapman@katalix.com> 写道：
>>>
>>> On 26 April 2016 at 15:15, Wang Shanker <shankerwangmiao@gmail.com> wrote:
>>>> Hi, all
>>>>
>>>> It’s my first time to contribute to such an important open source project. Things began when I upgraded my server, called "Server A", form ubuntu 14.04 to 16.04, which is shipped with new kernel version, 4.4. After upgrade, I soon found a l2tp tunnel between this server and another linux server, called "Server B", via ipv6 broke down. Here is the network topology:
>>>>
>>>> +----------+                    +----------+
>>>> | Server A | -- IPV6 Network -- | Server B |
>>>> +----------+                    +----------+
>>>>
>>>> The l2tp tunnel was encapsulated in udp datagrams. All the configuration was normal and could work after I reverted the kernel on Server A to original version.
>>>>
>>>> Here is what i did to create the tunnel:
>>>>
>>>> ```
>>>> on Server A:
>>>>
>>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 remote 2001:db8::aaaa local 2001:db8::bbbb udp_sport 1086 udp_dport 1086
>>>> ip l2tp add session name l2tpeth0 tunnel_id 86 session_id 86 peer_session_id 86
>>>> ip l s l2tpeth0 up
>>>>
>>>> on Server B:
>>>>
>>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 local 2001:db8::aaaa remote 2001:db8::bbbb udp_sport 1086 udp_dport 1086
>>>> ip l2tp add session name l2tpeth0 tunnel_id 86 session_id 86 peer_session_id 86
>>>> ip l s l2tpeth0 up
>>>>
>>>> ```
>>>>
>>>> When I used tcpdump to diagnose the problem, I got such result:
>>>>
>>>> ```
>>>> on Server A:
>>>>
>>>> arping -i l2tpeth0 -0 1.2.3.4
>>>>
>>>> on Server B:
>>>>
>>>> tcpdump -i eth0 -n port 1086  -v
>>>>
>>>> 21:35:57.818810 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>>> 21:35:58.820572 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>>> 21:35:59.822216 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>>>
>>>> ```
>>>>
>>>> After looking into kernel source, I found out that in this commit a new feature to set udp6 checksum to zero in commit 6b649fe, which added `L2TP_ATTR_UDP_ZERO_CSUM6_TX` and `L2TP_ATTR_UDP_ZERO_CSUM6_TX`.
>>>>
>>>> As a result, I added `udp_csum`, `udp6_csum_tx`, `udp6_csum_rx` control flags to `ip l2tp add tunnel` to control those attributes about checksum.
>>>>
>>>> Using this to create the tunnel instead on Server A:
>>>>
>>>> ```
>>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 remote 2001:db8::aaaa local 2001:db8::bbbb udp_sport 1086 udp_dport 1086 udp6_csum_tx on udp6_csum_rx on
>>>> ```
>>>>
>>>> I finally got:
>>>>
>>>> ```
>>>> on Server A:
>>>>
>>>> arping -i l2tpeth0 -0 1.2.3.4
>>>>
>>>> on Server B:
>>>>
>>>> tcpdump -i eth0 -n port 1086  -v
>>>>
>>>> 22:07:03.844297 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>>> 22:07:04.845717 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>>> 22:07:05.847965 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>>>
>>>> tcpdump -i l2tpeth0 -v
>>>>
>>>> 22:10:35.691326 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>>> 22:10:36.693627 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>>> 22:10:37.695010 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>>> 22:10:38.697121 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>>>
>>>> ```
>>>>
>>>> It seems to work. However, is it the real point that should be fixed and does my patch fix it well? I need your suggestion.
>>>
>>> This seems reasonable to me. It is good to provide user control of
>>> L2TP checksum options.
>>>
>>> However, there is a problem with your patch. The netlink attributes
>>> you are accessing to control checksums are flags, not u8 values.
>>
>> I’m not so familiar with kernel code. However, in `<linux/l2tp.h>` :
>>
>> ```
>>
>> /*
>> * ATTR types defined for L2TP
>> */
>> enum {
>>       L2TP_ATTR_NONE,                 /* no data */
>> // ...
>>       L2TP_ATTR_IP6_DADDR,            /* struct in6_addr */
>>       L2TP_ATTR_UDP_ZERO_CSUM6_TX,    /* u8 */
>>       L2TP_ATTR_UDP_ZERO_CSUM6_RX,    /* u8 */
>> // ...
>>
>> }
>> ```
>>
>> isn’t L2TP_ATTR_UDP_ZERO_CSUM6_TX a u8 value? Or should I use `addattr` instead of `addattr8`?
>>
>>>
>>> Maybe the default checksum setting for such l2tp tunnels should be
>>> changed in the l2tp kernel code to match the previous behaviour where
>>> IPv6 checksums were disabled?
>>
>> I think so. However, I’m confused with those code.
>>
>> From the name of the attrs `L2TP_ATTR_UDP_ZERO_CSUM6_TX` and `L2TP_ATTR_UDP_ZERO_CSUM6_RX`, I
>> can tell that when those flags are set, the checksum will be zero. Also, according to the
>> comment of commit 6b649fe in kernel source, “Added new L2TP configuration options to allow
>> TX and RX of zero checksums in IPv6. Default is not to use them.”, checksums shouldn't have
>> been zero by default. However, in fact, they are. I think there may be some bugs in kernel
>> source.
>
> I think I’ve got the bug. Here is the patch for kernel
>
> ---
>  net/l2tp/l2tp_core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index afca2eb..6edfa99 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1376,9 +1376,9 @@ static int l2tp_tunnel_sock_create(struct net *net,
>                         memcpy(&udp_conf.peer_ip6, cfg->peer_ip6,
>                                sizeof(udp_conf.peer_ip6));
>                         udp_conf.use_udp6_tx_checksums =
> -                           cfg->udp6_zero_tx_checksums;
> +                         ! cfg->udp6_zero_tx_checksums;
>                         udp_conf.use_udp6_rx_checksums =
> -                           cfg->udp6_zero_rx_checksums;
> +                         ! cfg->udp6_zero_rx_checksums;
>                 } else
>  #endif
>                 {
> --
>
>
>>>
>>>>
>>>>
>>>> ---
>>>> ip/ipl2tp.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 45 insertions(+)
>>>>
>>>> diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
>>>> index 3c8ee93..67a6482 100644
>>>> --- a/ip/ipl2tp.c
>>>> +++ b/ip/ipl2tp.c
>>>> @@ -56,6 +56,8 @@ struct l2tp_parm {
>>>>
>>>>       uint16_t pw_type;
>>>>       uint16_t mtu;
>>>> +       int udp6_csum_tx:1;
>>>> +       int udp6_csum_rx:1;
>>>>       int udp_csum:1;
>>>>       int recv_seq:1;
>>>>       int send_seq:1;
>>>> @@ -117,6 +119,9 @@ static int create_tunnel(struct l2tp_parm *p)
>>>>       if (p->encap == L2TP_ENCAPTYPE_UDP) {
>>>>               addattr16(&req.n, 1024, L2TP_ATTR_UDP_SPORT, p->local_udp_port);
>>>>               addattr16(&req.n, 1024, L2TP_ATTR_UDP_DPORT, p->peer_udp_port);
>>>> +               addattr8 (&req.n, 1024, L2TP_ATTR_UDP_CSUM, p->udp_csum);
>>>> +               addattr8 (&req.n, 1024, L2TP_ATTR_UDP_ZERO_CSUM6_TX, p->udp6_csum_tx);
>>>> +               addattr8 (&req.n, 1024, L2TP_ATTR_UDP_ZERO_CSUM6_RX, p->udp6_csum_rx);
>>>>       }
>>>>
>>>>       if (rtnl_talk(&genl_rth, &req.n, NULL, 0) < 0)
>>>> @@ -282,6 +287,14 @@ static int get_response(struct nlmsghdr *n, void *arg)
>>>>               p->l2spec_len = rta_getattr_u8(attrs[L2TP_ATTR_L2SPEC_LEN]);
>>>>
>>>>       p->udp_csum = !!attrs[L2TP_ATTR_UDP_CSUM];
>>>> +       if (attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX])
>>>> +         p->udp6_csum_tx = rta_getattr_u8(attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]);
>>>> +       else
>>>> +         p->udp6_csum_tx = 1;
>>>> +       if (attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX])
>>>> +         p->udp6_csum_rx = rta_getattr_u8(attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]);
>>>> +       else
>>>> +         p->udp6_csum_rx = 1;
>>>>       if (attrs[L2TP_ATTR_COOKIE])
>>>>               memcpy(p->cookie, RTA_DATA(attrs[L2TP_ATTR_COOKIE]),
>>>>                      p->cookie_len = RTA_PAYLOAD(attrs[L2TP_ATTR_COOKIE]));
>>>> @@ -470,6 +483,9 @@ static void usage(void)
>>>>       fprintf(stderr, "          tunnel_id ID peer_tunnel_id ID\n");
>>>>       fprintf(stderr, "          [ encap { ip | udp } ]\n");
>>>>       fprintf(stderr, "          [ udp_sport PORT ] [ udp_dport PORT ]\n");
>>>> +       fprintf(stderr, "          [ udp_csum { on | off } ]\n");
>>>> +       fprintf(stderr, "          [ udp6_csum_tx { on | off } ]\n");
>>>> +       fprintf(stderr, "          [ udp6_csum_rx { on | off } ]\n");
>>>>       fprintf(stderr, "Usage: ip l2tp add session [ name NAME ]\n");
>>>>       fprintf(stderr, "          tunnel_id ID\n");
>>>>       fprintf(stderr, "          session_id ID peer_session_id ID\n");
>>>> @@ -500,6 +516,8 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
>>>>       /* Defaults */
>>>>       p->l2spec_type = L2TP_L2SPECTYPE_DEFAULT;
>>>>       p->l2spec_len = 4;
>>>> +       p->udp6_csum_rx = 1;
>>>> +       p->udp6_csum_tx = 1;
>>>>
>>>>       while (argc > 0) {
>>>>               if (strcmp(*argv, "encap") == 0) {
>>>> @@ -569,6 +587,33 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
>>>>                       if (get_u16(&uval, *argv, 0))
>>>>                               invarg("invalid port\n", *argv);
>>>>                       p->peer_udp_port = uval;
>>>> +               } else if (strcmp(*argv, "udp_csum") == 0) {
>>>> +                       NEXT_ARG();
>>>> +                       if (strcmp(*argv, "on") == 0) {
>>>> +                               p->udp_csum = 1;
>>>> +                       } else if (strcmp(*argv, "off") == 0) {
>>>> +                               p->udp_csum = 0;
>>>> +                       } else {
>>>> +                               invarg("invalid option for udp_csum\n", *argv);
>>>> +                       }
>>>> +               } else if (strcmp(*argv, "udp6_csum_rx") == 0) {
>>>> +                       NEXT_ARG();
>>>> +                       if (strcmp(*argv, "on") == 0) {
>>>> +                               p->udp6_csum_rx = 1;
>>>> +                       } else if (strcmp(*argv, "off") == 0) {
>>>> +                               p->udp6_csum_rx = 0;
>>>> +                       } else {
>>>> +                               invarg("invalid option for udp6_csum_rx\n", *argv);
>>>> +                       }
>>>> +               } else if (strcmp(*argv, "udp6_csum_tx") == 0) {
>>>> +                       NEXT_ARG();
>>>> +                       if (strcmp(*argv, "on") == 0) {
>>>> +                               p->udp6_csum_tx = 1;
>>>> +                       } else if (strcmp(*argv, "off") == 0) {
>>>> +                               p->udp6_csum_tx = 0;
>>>> +                       } else {
>>>> +                               invarg("invalid option for udp6_csum_tx\n", *argv);
>>>> +                       }
>>>>               } else if (strcmp(*argv, "offset") == 0) {
>>>>                       __u8 uval;
>>>>
>>>> --
>>>> 2.5.2
>

^ permalink raw reply

* Re: [PATCH net-next] vxlan: fix initialization with custom link parameters
From: Nicolas Dichtel @ 2016-04-28 15:06 UTC (permalink / raw)
  To: Jiri Benc, netdev
In-Reply-To: <7c19200582d6f4272cee914934f428775e405e89.1461854177.git.jbenc@redhat.com>

Le 28/04/2016 16:36, Jiri Benc a écrit :
> Commit 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate
> function") changed initialization order and as an unintended result, when the
> user specifies additional link parameters (such as IFLA_ADDRESS) while
> creating vxlan interface, those are overwritten by vxlan_ether_setup later.
> 
> It's necessary to call ether_setup from withing the ->setup callback. That
> way, the correct parameters are set by rtnl_create_link later. This is done
> also for VXLAN-GPE, as we don't know the interface type yet at that point,
> and changed to the correct interface type later.
> 
> Fixes: 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate function")
> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

^ permalink raw reply

* Re: [PATCH] net: macb: do not scan PHYs manually
From: Josh Cartwright @ 2016-04-28 15:07 UTC (permalink / raw)
  To: Nathan Sullivan; +Cc: nicolas.ferre, netdev, linux-kernel
In-Reply-To: <1461853187-31715-1-git-send-email-nathan.sullivan@ni.com>

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

On Thu, Apr 28, 2016 at 09:19:47AM -0500, Nathan Sullivan wrote:
> Since of_mdiobus_register and mdiobus_register will scan automatically,

This is only partially true.  of_mdiobus_register() only scans for PHYs
with device tree presence (starting with nodes which specify an address,
then continuing for nodes without an address specifier).

So, this patch will regress any platform currently relying on a PHY
being found/probed even though it has no representation as a device tree
node.  I don't have enough history with this driver to make a judgement
as to whether or not there are any users of this functionality.

> do not manually scan for PHY devices in the macb ethernet driver. Doing
> so will result in many nonexistent PHYs on the MDIO bus if the MDIO
> lines are floating or grounded, such as when they are not used.

Just a restatement/elaboration of the problem in an attempt to provide
more context:

The assumption being made is that the phy associated with a given macb
instance exists on that instance's controlled mdio bus.  This assumption
doesn't hold true on some dual-ethernet Zynq 7000-series based designs
where both phys exist on the same MDIO bus (for pinning reasons, or
whatever).

             MDIO
    macb0 ----+----- phy0, addrX
              |
              +----- phy1, addrY

    macb1 ----- {MDIO lines NC, not pinmuxed}

(In this case, the phy associated with macb1 is phy1, which exists on
the MDIO bus controlled by the macb0 instance).

HTH,

  Josh

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] vxlan: fix ethernet address initialization
From: Nicolas Dichtel @ 2016-04-28 15:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Jiri Benc
In-Reply-To: <1461837856-4316-1-git-send-email-nicolas.dichtel@6wind.com>

Le 28/04/2016 12:04, Nicolas Dichtel a écrit :
> Since commit 0c867c9bf84c, when the user specifies an ethernet address with
> IFLA_ADDRESS, it's overridden by vxlan_ether_setup() (rtnl_link_ops->setup
> is called in rtnetlink.c before handling IFLA_ADDRESS).
> 
> To test it:
> ip link add name vxlan1 address de:ad:de:4c:0f:c2 type vxlan id 1 group 239.0.0.10 dev eth0
> 
> CC: Jiri Benc <jbenc@redhat.com>
> Fixes: 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate function")
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
David, please drop this patch. Jiri has sent a better fix.


Thank you,
Nicolas

^ permalink raw reply

* Re: [PATCH v2] net: macb: do not scan PHYs manually
From: Nicolas Ferre @ 2016-04-28 15:44 UTC (permalink / raw)
  To: Nathan Sullivan, netdev; +Cc: linux-kernel, Florian Fainelli, Alexandre Belloni
In-Reply-To: <1461854802-8142-1-git-send-email-nathan.sullivan@ni.com>

Le 28/04/2016 16:46, Nathan Sullivan a écrit :
> Since of_mdiobus_register and mdiobus_register will scan automatically,
> do not manually scan for PHY devices in the macb ethernet driver. Doing
> so will result in many nonexistent PHYs on the MDIO bus if the MDIO
> lines are floating or grounded, such as when they are not used.
> 
> Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>

Well, as explained in the commit message that added this feature and in
the comment, if no phy is specified in the DT we end up without phy...

There are AT91 platforms which lack specification for the phy node in
the DT. So, I don't know if there is a better way to deal with this case
but I see this removal as risky.

Bye,


> ---
>  drivers/net/ethernet/cadence/macb.c |   19 +------------------
>  1 file changed, 1 insertion(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 48a7d7d..6506b4e 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -424,7 +424,7 @@ static int macb_mii_init(struct macb *bp)
>  {
>  	struct macb_platform_data *pdata;
>  	struct device_node *np;
> -	int err = -ENXIO, i;
> +	int err = -ENXIO;
>  
>  	/* Enable management port */
>  	macb_writel(bp, NCR, MACB_BIT(MPE));
> @@ -450,23 +450,6 @@ static int macb_mii_init(struct macb *bp)
>  	if (np) {
>  		/* try dt phy registration */
>  		err = of_mdiobus_register(bp->mii_bus, np);
> -
> -		/* fallback to standard phy registration if no phy were
> -		   found during dt phy registration */
> -		if (!err && !phy_find_first(bp->mii_bus)) {
> -			for (i = 0; i < PHY_MAX_ADDR; i++) {
> -				struct phy_device *phydev;
> -
> -				phydev = mdiobus_scan(bp->mii_bus, i);
> -				if (IS_ERR(phydev)) {
> -					err = PTR_ERR(phydev);
> -					break;
> -				}
> -			}
> -
> -			if (err)
> -				goto err_out_unregister_bus;
> -		}
>  	} else {
>  		if (pdata)
>  			bp->mii_bus->phy_mask = pdata->phy_mask;
> 


-- 
Nicolas Ferre

^ permalink raw reply

* [PATCH net-next v2 0/5] bridge: per-vlan stats
From: Nikolay Aleksandrov @ 2016-04-28 15:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov

Hi,
This set adds support for bridge per-vlan statistics.
In order to be able to dump statistics for many vlans we need a way to
continue dumping after reaching maximum size, thus patches 01 and 02 extend
the new stats API with a per-device extended link stats attribute and
callback which can save its local state and continue where it left off
afterwards. I considered using the already existing "fill_xstats" callback
but it gets confusing since we need to separate the linkinfo dump from the
new stats api dump and adding a flag/argument to do that just looks messy.
I don't think the rtnl_link_ops size is an issue, so adding these seemed
like the cleaner approach.

Patch 03 converts the pvid to a pointer so we can consolidate the vlan
stats accounting paths later, also allows to simplify the pvid code.
Patches 04 and 05 add the stats support and netlink dump support
respectively.
I've tested this set with both old and modified iproute2, kmemleak on and
some traffic stress tests while adding/removing vlans and ports.

v2:
 - Improve the error checking, rename lidx to prividx and save the current
   idx user instead of restricting it to one in patch 01
 - squash patch 02 into 01 and remove the restriction
 - add callback descriptions, improve the size calculation and change the
   xstats message structure to have an embedding level per rtnl link type
   so we can avoid one call to get the link type (and thus filter on it)
   and also each link type can now have any number of private attributes
   inside
 - fix a problem where the vlan stats are not dumped if the bridge has 0
   vlans on it but has vlans on the ports, add bridge link type private
   attributes and also add paddings for future extensions to avoid at least
   a few netlink attributes and improve struct alignment
 - drop the is_skb_forwardable argument constifying patch as it's not
   needed anymore, but it's a nice cleanup which I'll send separately

Thank you,
 Nik

Note: Jamal I haven't forgotten about the per-port per-vlan stats, I've got
a follow-up patch that adds it. You can easily see that the infrastructure
for private port/vlan stats is in place after this set. Though the stats
api will need some more changes to support that.

Nikolay Aleksandrov (5):
  net: rtnetlink: allow rtnl_fill_statsinfo to save private state
    counter
  net: rtnetlink: add linkxstats callbacks and attribute
  bridge: vlan: RCUify pvid
  bridge: vlan: learn to count
  bridge: netlink: export per-vlan stats

 include/net/rtnetlink.h        |   7 +++
 include/uapi/linux/if_bridge.h |  18 ++++++
 include/uapi/linux/if_link.h   |  13 ++++
 net/bridge/br_netlink.c        |  88 +++++++++++++++++++++++----
 net/bridge/br_private.h        |  32 +++++-----
 net/bridge/br_vlan.c           | 134 +++++++++++++++++++++++++++++------------
 net/core/rtnetlink.c           |  74 +++++++++++++++++++----
 7 files changed, 287 insertions(+), 79 deletions(-)

-- 
2.4.11

^ permalink raw reply

* [PATCH net-next v2 1/5] net: rtnetlink: allow rtnl_fill_statsinfo to save private state counter
From: Nikolay Aleksandrov @ 2016-04-28 15:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461858771-4732-1-git-send-email-nikolay@cumulusnetworks.com>

The new prividx argument allows the current dumping device to save a
private state counter which would enable it to continue dumping from
where it left off. And the idxattr is used to save the current idx user
so multiple prividx using attributes can be requested at the same time
as suggested by Roopa Prabhu.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: improve the error check in rtnl_fill_statsinfo, rename lidx to
prividx, squash patch 2 into this one and save the current idx user
instead of restricting only one

 net/core/rtnetlink.c | 44 +++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5503dfe6a050..de529a20cd18 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3444,13 +3444,21 @@ out:
 	return err;
 }
 
+static bool stats_attr_valid(unsigned int mask, int attrid, int idxattr)
+{
+	return (mask & IFLA_STATS_FILTER_BIT(attrid)) &&
+	       (!idxattr || idxattr == attrid);
+}
+
 static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
 			       int type, u32 pid, u32 seq, u32 change,
-			       unsigned int flags, unsigned int filter_mask)
+			       unsigned int flags, unsigned int filter_mask,
+			       int *idxattr, int *prividx)
 {
 	struct if_stats_msg *ifsm;
 	struct nlmsghdr *nlh;
 	struct nlattr *attr;
+	int s_prividx = *prividx;
 
 	ASSERT_RTNL();
 
@@ -3462,7 +3470,7 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
 	ifsm->ifindex = dev->ifindex;
 	ifsm->filter_mask = filter_mask;
 
-	if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) {
+	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_64, *idxattr)) {
 		struct rtnl_link_stats64 *sp;
 
 		attr = nla_reserve_64bit(skb, IFLA_STATS_LINK_64,
@@ -3480,7 +3488,11 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
 	return 0;
 
 nla_put_failure:
-	nlmsg_cancel(skb, nlh);
+	/* not a multi message or no progress mean a real error */
+	if (!(flags & NLM_F_MULTI) || s_prividx == *prividx)
+		nlmsg_cancel(skb, nlh);
+	else
+		nlmsg_end(skb, nlh);
 
 	return -EMSGSIZE;
 }
@@ -3494,7 +3506,7 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
 {
 	size_t size = 0;
 
-	if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64))
+	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_64, 0))
 		size += nla_total_size_64bit(sizeof(struct rtnl_link_stats64));
 
 	return size;
@@ -3503,8 +3515,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
 static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
-	struct if_stats_msg *ifsm;
 	struct net_device *dev = NULL;
+	int idxattr = 0, prividx = 0;
+	struct if_stats_msg *ifsm;
 	struct sk_buff *nskb;
 	u32 filter_mask;
 	int err;
@@ -3528,7 +3541,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 	err = rtnl_fill_statsinfo(nskb, dev, RTM_NEWSTATS,
 				  NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-				  0, filter_mask);
+				  0, filter_mask, &idxattr, &prividx);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_stats_size */
 		WARN_ON(err == -EMSGSIZE);
@@ -3542,18 +3555,19 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	int h, s_h, err, s_idx, s_idxattr, s_prividx;
 	struct net *net = sock_net(skb->sk);
+	unsigned int flags = NLM_F_MULTI;
 	struct if_stats_msg *ifsm;
-	int h, s_h;
-	int idx = 0, s_idx;
-	struct net_device *dev;
 	struct hlist_head *head;
-	unsigned int flags = NLM_F_MULTI;
+	struct net_device *dev;
 	u32 filter_mask = 0;
-	int err;
+	int idx = 0;
 
 	s_h = cb->args[0];
 	s_idx = cb->args[1];
+	s_idxattr = cb->args[2];
+	s_prividx = cb->args[3];
 
 	cb->seq = net->dev_base_seq;
 
@@ -3571,7 +3585,8 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 			err = rtnl_fill_statsinfo(skb, dev, RTM_NEWSTATS,
 						  NETLINK_CB(cb->skb).portid,
 						  cb->nlh->nlmsg_seq, 0,
-						  flags, filter_mask);
+						  flags, filter_mask,
+						  &s_idxattr, &s_prividx);
 			/* If we ran out of room on the first message,
 			 * we're in trouble
 			 */
@@ -3579,13 +3594,16 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 
 			if (err < 0)
 				goto out;
-
+			s_prividx = 0;
+			s_idxattr = 0;
 			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
 cont:
 			idx++;
 		}
 	}
 out:
+	cb->args[3] = s_prividx;
+	cb->args[2] = s_idxattr;
 	cb->args[1] = idx;
 	cb->args[0] = h;
 
-- 
2.4.11

^ permalink raw reply related

* [PATCH net-next v2 2/5] net: rtnetlink: add linkxstats callbacks and attribute
From: Nikolay Aleksandrov @ 2016-04-28 15:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461858771-4732-1-git-send-email-nikolay@cumulusnetworks.com>

Add callbacks to calculate the size and fill link extended statistics
which can be split into multiple messages and are dumped via the new
rtnl stats API (RTM_GETSTATS) with the IFLA_STATS_LINK_XSTATS attribute.
Also add that attribute to the idx mask check since it is expected to
be able to save state and resume dumping (e.g. future bridge per-vlan
stats will be dumped via this attribute and callbacks).
Each link type should nest its private attributes under the per-link type
attribute. This allows to have any number of separated private attributes
and to avoid one call to get the dev link type.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: add callback descriptions and make the size calculation more
accurate, change the netlink xstats message structure with one more
level for each rtnl link type which allows for private link type attributes
and also allows us to avoid 1 call to get the dev link type.

 include/net/rtnetlink.h      |  7 +++++++
 include/uapi/linux/if_link.h | 12 ++++++++++++
 net/core/rtnetlink.c         | 30 ++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 2f87c1ba13de..006a7b81d758 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -47,6 +47,9 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  *	@get_num_rx_queues: Function to determine number of receive queues
  *			    to create when creating a new device.
  *	@get_link_net: Function to get the i/o netns of the device
+ *	@get_linkxstats_size: Function to calculate the required room for
+ *			      dumping device-specific extended link stats
+ *	@fill_linkxstats: Function to dump device-specific extended link stats
  */
 struct rtnl_link_ops {
 	struct list_head	list;
@@ -95,6 +98,10 @@ struct rtnl_link_ops {
 						   const struct net_device *dev,
 						   const struct net_device *slave_dev);
 	struct net		*(*get_link_net)(const struct net_device *dev);
+	size_t			(*get_linkxstats_size)(const struct net_device *dev);
+	int			(*fill_linkxstats)(struct sk_buff *skb,
+						   const struct net_device *dev,
+						   int *prividx);
 };
 
 int __rtnl_link_register(struct rtnl_link_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index d82de331bb6b..8577c0e4116f 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -802,6 +802,7 @@ struct if_stats_msg {
 enum {
 	IFLA_STATS_UNSPEC, /* also used as 64bit pad attribute */
 	IFLA_STATS_LINK_64,
+	IFLA_STATS_LINK_XSTATS,
 	__IFLA_STATS_MAX,
 };
 
@@ -809,4 +810,15 @@ enum {
 
 #define IFLA_STATS_FILTER_BIT(ATTR)	(1 << (ATTR - 1))
 
+/* These are embedded into IFLA_STATS_LINK_XSTATS:
+ * [IFLA_STATS_LINK_XSTATS]
+ * -> [LINK_XSTATS_TYPE_xxx]
+ *    -> [rtnl link type specific attributes]
+ */
+enum {
+	LINK_XSTATS_TYPE_UNSPEC,
+	__LINK_XSTATS_TYPE_MAX
+};
+#define LINK_XSTATS_TYPE_MAX (__LINK_XSTATS_TYPE_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index de529a20cd18..d471f097c739 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3483,6 +3483,26 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
 		dev_get_stats(dev, sp);
 	}
 
+	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_XSTATS, *idxattr)) {
+		const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
+
+		if (ops && ops->fill_linkxstats) {
+			int err;
+
+			*idxattr = IFLA_STATS_LINK_XSTATS;
+			attr = nla_nest_start(skb,
+					      IFLA_STATS_LINK_XSTATS);
+			if (!attr)
+				goto nla_put_failure;
+
+			err = ops->fill_linkxstats(skb, dev, prividx);
+			nla_nest_end(skb, attr);
+			if (err)
+				goto nla_put_failure;
+			*idxattr = 0;
+		}
+	}
+
 	nlmsg_end(skb, nlh);
 
 	return 0;
@@ -3509,6 +3529,16 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
 	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_64, 0))
 		size += nla_total_size_64bit(sizeof(struct rtnl_link_stats64));
 
+	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_XSTATS, 0)) {
+		const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
+
+		if (ops && ops->get_linkxstats_size) {
+			size += nla_total_size(ops->get_linkxstats_size(dev));
+			/* for IFLA_STATS_LINK_XSTATS */
+			size += nla_total_size(0);
+		}
+	}
+
 	return size;
 }
 
-- 
2.4.11

^ permalink raw reply related

* [PATCH net-next v2 3/5] bridge: vlan: RCUify pvid
From: Nikolay Aleksandrov @ 2016-04-28 15:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461858771-4732-1-git-send-email-nikolay@cumulusnetworks.com>

Make pvid a pointer to a vlan struct and RCUify the access to it. Vlans
are already RCU-protected so the pvid vlan entry cannot disappear
without being initialized to NULL and going through a grace period first.
This change is necessary for the upcoming vlan counters and also would
serve to later move to vlan passing via a pointer instead of id.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: no change

 net/bridge/br_netlink.c | 23 ++++++++++-----------
 net/bridge/br_private.h | 16 +--------------
 net/bridge/br_vlan.c    | 54 +++++++++++++++++++++++--------------------------
 3 files changed, 37 insertions(+), 56 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 6bae1125e36d..9c74fa438a8c 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -24,22 +24,22 @@
 static int __get_num_vlan_infos(struct net_bridge_vlan_group *vg,
 				u32 filter_mask)
 {
-	struct net_bridge_vlan *v;
+	struct net_bridge_vlan *v, *pvid;
 	u16 vid_range_start = 0, vid_range_end = 0, vid_range_flags = 0;
-	u16 flags, pvid;
 	int num_vlans = 0;
+	u16 flags;
 
 	if (!(filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED))
 		return 0;
 
-	pvid = br_get_pvid(vg);
+	pvid = rcu_dereference(vg->pvid);
 	/* Count number of vlan infos */
 	list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
 		flags = 0;
 		/* only a context, bridge vlan not activated */
 		if (!br_vlan_should_use(v))
 			continue;
-		if (v->vid == pvid)
+		if (v == pvid)
 			flags |= BRIDGE_VLAN_INFO_PVID;
 
 		if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
@@ -246,21 +246,21 @@ nla_put_failure:
 static int br_fill_ifvlaninfo_compressed(struct sk_buff *skb,
 					 struct net_bridge_vlan_group *vg)
 {
-	struct net_bridge_vlan *v;
+	struct net_bridge_vlan *v, *pvid;
 	u16 vid_range_start = 0, vid_range_end = 0, vid_range_flags = 0;
-	u16 flags, pvid;
 	int err = 0;
+	u16 flags;
 
 	/* Pack IFLA_BRIDGE_VLAN_INFO's for every vlan
 	 * and mark vlan info with begin and end flags
 	 * if vlaninfo represents a range
 	 */
-	pvid = br_get_pvid(vg);
+	pvid = rcu_dereference(vg->pvid);
 	list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
 		flags = 0;
 		if (!br_vlan_should_use(v))
 			continue;
-		if (v->vid == pvid)
+		if (v == pvid)
 			flags |= BRIDGE_VLAN_INFO_PVID;
 
 		if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
@@ -301,18 +301,17 @@ initvars:
 static int br_fill_ifvlaninfo(struct sk_buff *skb,
 			      struct net_bridge_vlan_group *vg)
 {
+	struct net_bridge_vlan *v, *pvid;
 	struct bridge_vlan_info vinfo;
-	struct net_bridge_vlan *v;
-	u16 pvid;
 
-	pvid = br_get_pvid(vg);
+	pvid = rcu_dereference(vg->pvid);
 	list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
 		if (!br_vlan_should_use(v))
 			continue;
 
 		vinfo.vid = v->vid;
 		vinfo.flags = 0;
-		if (v->vid == pvid)
+		if (v == pvid)
 			vinfo.flags |= BRIDGE_VLAN_INFO_PVID;
 
 		if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1b5d145dfcbf..50d70b5eb307 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -130,8 +130,8 @@ struct net_bridge_vlan {
 struct net_bridge_vlan_group {
 	struct rhashtable		vlan_hash;
 	struct list_head		vlan_list;
+	struct net_bridge_vlan __rcu	*pvid;
 	u16				num_vlans;
-	u16				pvid;
 };
 
 struct net_bridge_fdb_entry
@@ -741,15 +741,6 @@ static inline int br_vlan_get_tag(const struct sk_buff *skb, u16 *vid)
 	return err;
 }
 
-static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
-{
-	if (!vg)
-		return 0;
-
-	smp_rmb();
-	return vg->pvid;
-}
-
 static inline int br_vlan_enabled(struct net_bridge *br)
 {
 	return br->vlan_enabled;
@@ -835,11 +826,6 @@ static inline u16 br_vlan_get_tag(const struct sk_buff *skb, u16 *tag)
 	return 0;
 }
 
-static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
-{
-	return 0;
-}
-
 static inline int br_vlan_enabled(struct net_bridge *br)
 {
 	return 0;
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index e001152d6ad1..4fab7665df8c 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -31,22 +31,11 @@ static struct net_bridge_vlan *br_vlan_lookup(struct rhashtable *tbl, u16 vid)
 	return rhashtable_lookup_fast(tbl, &vid, br_vlan_rht_params);
 }
 
-static void __vlan_add_pvid(struct net_bridge_vlan_group *vg, u16 vid)
+/* __vlan_delete_pvid is just __vlan_set_pvid(vg, NULL) */
+static void __vlan_set_pvid(struct net_bridge_vlan_group *vg,
+			    struct net_bridge_vlan *v)
 {
-	if (vg->pvid == vid)
-		return;
-
-	smp_wmb();
-	vg->pvid = vid;
-}
-
-static void __vlan_delete_pvid(struct net_bridge_vlan_group *vg, u16 vid)
-{
-	if (vg->pvid != vid)
-		return;
-
-	smp_wmb();
-	vg->pvid = 0;
+	rcu_assign_pointer(vg->pvid, v);
 }
 
 static void __vlan_add_flags(struct net_bridge_vlan *v, u16 flags)
@@ -59,9 +48,9 @@ static void __vlan_add_flags(struct net_bridge_vlan *v, u16 flags)
 		vg = nbp_vlan_group(v->port);
 
 	if (flags & BRIDGE_VLAN_INFO_PVID)
-		__vlan_add_pvid(vg, v->vid);
-	else
-		__vlan_delete_pvid(vg, v->vid);
+		__vlan_set_pvid(vg, v);
+	else if (rtnl_dereference(vg->pvid) == v)
+		__vlan_set_pvid(vg, NULL);
 
 	if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
 		v->flags |= BRIDGE_VLAN_INFO_UNTAGGED;
@@ -285,7 +274,9 @@ static int __vlan_del(struct net_bridge_vlan *v)
 		masterv = v->brvlan;
 	}
 
-	__vlan_delete_pvid(vg, v->vid);
+	if (rtnl_dereference(vg->pvid) == v)
+		__vlan_set_pvid(vg, NULL);
+
 	if (p) {
 		err = __vlan_vid_del(p->dev, p->br, v->vid);
 		if (err)
@@ -320,7 +311,7 @@ static void __vlan_flush(struct net_bridge_vlan_group *vg)
 {
 	struct net_bridge_vlan *vlan, *tmp;
 
-	__vlan_delete_pvid(vg, vg->pvid);
+	__vlan_set_pvid(vg, NULL);
 	list_for_each_entry_safe(vlan, tmp, &vg->vlan_list, vlist)
 		__vlan_del(vlan);
 }
@@ -404,29 +395,29 @@ static bool __allowed_ingress(struct net_bridge_vlan_group *vg, __be16 proto,
 	}
 
 	if (!*vid) {
-		u16 pvid = br_get_pvid(vg);
+		v = rcu_dereference(vg->pvid);
 
 		/* Frame had a tag with VID 0 or did not have a tag.
 		 * See if pvid is set on this port.  That tells us which
 		 * vlan untagged or priority-tagged traffic belongs to.
 		 */
-		if (!pvid)
+		if (!v)
 			goto drop;
 
 		/* PVID is set on this port.  Any untagged or priority-tagged
 		 * ingress frame is considered to belong to this vlan.
 		 */
-		*vid = pvid;
+		*vid = v->vid;
 		if (likely(!tagged))
 			/* Untagged Frame. */
-			__vlan_hwaccel_put_tag(skb, proto, pvid);
+			__vlan_hwaccel_put_tag(skb, proto, v->vid);
 		else
 			/* Priority-tagged Frame.
 			 * At this point, We know that skb->vlan_tci had
 			 * VLAN_TAG_PRESENT bit and its VID field was 0x000.
 			 * We update only VID field and preserve PCP field.
 			 */
-			skb->vlan_tci |= pvid;
+			skb->vlan_tci |= v->vid;
 
 		return true;
 	}
@@ -451,6 +442,9 @@ bool br_allowed_ingress(const struct net_bridge *br,
 		BR_INPUT_SKB_CB(skb)->vlan_filtered = false;
 		return true;
 	}
+	/* if there's no vlan_group, there's nothing to match against */
+	if (!vg)
+		return false;
 
 	return __allowed_ingress(vg, br->vlan_proto, skb, vid);
 }
@@ -492,9 +486,11 @@ bool br_should_learn(struct net_bridge_port *p, struct sk_buff *skb, u16 *vid)
 		*vid = 0;
 
 	if (!*vid) {
-		*vid = br_get_pvid(vg);
-		if (!*vid)
+		struct net_bridge_vlan *v = rcu_dereference(vg->pvid);
+
+		if (!v)
 			return false;
+		*vid = v->vid;
 
 		return true;
 	}
@@ -713,9 +709,9 @@ int br_vlan_set_proto(struct net_bridge *br, unsigned long val)
 
 static bool vlan_default_pvid(struct net_bridge_vlan_group *vg, u16 vid)
 {
-	struct net_bridge_vlan *v;
+	struct net_bridge_vlan *v = rtnl_dereference(vg->pvid);
 
-	if (vid != vg->pvid)
+	if (v && vid != v->vid)
 		return false;
 
 	v = br_vlan_lookup(&vg->vlan_hash, vid);
-- 
2.4.11

^ permalink raw reply related

* [PATCH net-next v2 4/5] bridge: vlan: learn to count
From: Nikolay Aleksandrov @ 2016-04-28 15:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461858771-4732-1-git-send-email-nikolay@cumulusnetworks.com>

Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
allocated a per-cpu stats which is then set in each per-port vlan context
for quick access. The br_allowed_ingress() common function is used to
account for Rx packets and the br_handle_vlan() common function is used
to account for Tx packets.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: no change

 net/bridge/br_private.h | 11 +++++++++-
 net/bridge/br_vlan.c    | 53 +++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 50d70b5eb307..f6876ed718a5 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -77,12 +77,21 @@ struct bridge_mcast_querier {
 };
 #endif
 
+struct br_vlan_stats {
+	u64 rx_bytes;
+	u64 rx_packets;
+	u64 tx_bytes;
+	u64 tx_packets;
+	struct u64_stats_sync syncp;
+};
+
 /**
  * struct net_bridge_vlan - per-vlan entry
  *
  * @vnode: rhashtable member
  * @vid: VLAN id
  * @flags: bridge vlan flags
+ * @stats: per-cpu VLAN statistics
  * @br: if MASTER flag set, this points to a bridge struct
  * @port: if MASTER flag unset, this points to a port struct
  * @refcnt: if MASTER flag set, this is bumped for each port referencing it
@@ -100,6 +109,7 @@ struct net_bridge_vlan {
 	struct rhash_head		vnode;
 	u16				vid;
 	u16				flags;
+	struct br_vlan_stats __percpu	*stats;
 	union {
 		struct net_bridge	*br;
 		struct net_bridge_port	*port;
@@ -866,7 +876,6 @@ static inline struct net_bridge_vlan_group *nbp_vlan_group_rcu(
 {
 	return NULL;
 }
-
 #endif
 
 struct nf_br_ops {
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 4fab7665df8c..d7a70c2ea3ec 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -151,6 +151,17 @@ static struct net_bridge_vlan *br_vlan_get_master(struct net_bridge *br, u16 vid
 	return masterv;
 }
 
+static void br_master_vlan_rcu_free(struct rcu_head *rcu)
+{
+	struct net_bridge_vlan *v;
+
+	v = container_of(rcu, struct net_bridge_vlan, rcu);
+	WARN_ON(!br_vlan_is_master(v));
+	free_percpu(v->stats);
+	v->stats = NULL;
+	kfree(v);
+}
+
 static void br_vlan_put_master(struct net_bridge_vlan *masterv)
 {
 	struct net_bridge_vlan_group *vg;
@@ -163,7 +174,7 @@ static void br_vlan_put_master(struct net_bridge_vlan *masterv)
 		rhashtable_remove_fast(&vg->vlan_hash,
 				       &masterv->vnode, br_vlan_rht_params);
 		__vlan_del_list(masterv);
-		kfree_rcu(masterv, rcu);
+		call_rcu(&masterv->rcu, br_master_vlan_rcu_free);
 	}
 }
 
@@ -219,6 +230,7 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
 		if (!masterv)
 			goto out_filt;
 		v->brvlan = masterv;
+		v->stats = masterv->stats;
 	}
 
 	/* Add the dev mac and count the vlan only if it's usable */
@@ -320,6 +332,7 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
 			       struct net_bridge_vlan_group *vg,
 			       struct sk_buff *skb)
 {
+	struct br_vlan_stats *stats;
 	struct net_bridge_vlan *v;
 	u16 vid;
 
@@ -346,9 +359,14 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
 			return NULL;
 		}
 	}
+	stats = this_cpu_ptr(v->stats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	u64_stats_update_end(&stats->syncp);
+
 	if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
 		skb->vlan_tci = 0;
-
 out:
 	return skb;
 }
@@ -357,7 +375,8 @@ out:
 static bool __allowed_ingress(struct net_bridge_vlan_group *vg, __be16 proto,
 			      struct sk_buff *skb, u16 *vid)
 {
-	const struct net_bridge_vlan *v;
+	struct br_vlan_stats *stats;
+	struct net_bridge_vlan *v;
 	bool tagged;
 
 	BR_INPUT_SKB_CB(skb)->vlan_filtered = true;
@@ -418,14 +437,21 @@ static bool __allowed_ingress(struct net_bridge_vlan_group *vg, __be16 proto,
 			 * We update only VID field and preserve PCP field.
 			 */
 			skb->vlan_tci |= v->vid;
-
-		return true;
+	} else {
+		/* Frame had a valid vlan tag.  See if vlan is allowed */
+		v = br_vlan_find(vg, *vid);
 	}
+	if (!v || !br_vlan_should_use(v))
+		goto drop;
+
+	stats = this_cpu_ptr(v->stats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_bytes += skb->len;
+	stats->rx_packets++;
+	u64_stats_update_end(&stats->syncp);
+
+	return true;
 
-	/* Frame had a valid vlan tag.  See if vlan is allowed */
-	v = br_vlan_find(vg, *vid);
-	if (v && br_vlan_should_use(v))
-		return true;
 drop:
 	kfree_skb(skb);
 	return false;
@@ -538,6 +564,11 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags)
 	if (!vlan)
 		return -ENOMEM;
 
+	vlan->stats = netdev_alloc_pcpu_stats(struct br_vlan_stats);
+	if (!vlan->stats) {
+		kfree(vlan);
+		return -ENOMEM;
+	}
 	vlan->vid = vid;
 	vlan->flags = flags | BRIDGE_VLAN_INFO_MASTER;
 	vlan->flags &= ~BRIDGE_VLAN_INFO_PVID;
@@ -545,8 +576,10 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags)
 	if (flags & BRIDGE_VLAN_INFO_BRENTRY)
 		atomic_set(&vlan->refcnt, 1);
 	ret = __vlan_add(vlan, flags);
-	if (ret)
+	if (ret) {
+		free_percpu(vlan->stats);
 		kfree(vlan);
+	}
 
 	return ret;
 }
-- 
2.4.11

^ permalink raw reply related

* [PATCH net-next v2 5/5] bridge: netlink: export per-vlan stats
From: Nikolay Aleksandrov @ 2016-04-28 15:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461858771-4732-1-git-send-email-nikolay@cumulusnetworks.com>

Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the
RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
get_linkxstats_size) in order to export the per-vlan stats.
The paddings were added because soon these fields will be needed for
per-port per-vlan stats (or something else if someone beats me to it) so
avoiding at least a few more netlink attributes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: remove unused pvid pointer, fix the case where bridge has 0 vlans
but there're global contexts and move to rtnl link type private
attributes nested into a LINK_XSTATS_TYPE_ attribute. The paddings were
added because soon these fields will be needed for per-port per-vlan
stats (or something else if someone beats me to it) so avoiding at least
a few more netlink attributes.


 include/uapi/linux/if_bridge.h | 18 ++++++++++++
 include/uapi/linux/if_link.h   |  1 +
 net/bridge/br_netlink.c        | 65 ++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h        |  7 +++++
 net/bridge/br_vlan.c           | 27 ++++++++++++++++++
 5 files changed, 118 insertions(+)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 0536eefff9bf..397d503fdedb 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -134,6 +134,16 @@ struct bridge_vlan_info {
 	__u16 vid;
 };
 
+struct bridge_vlan_xstats {
+	__u64 rx_bytes;
+	__u64 rx_packets;
+	__u64 tx_bytes;
+	__u64 tx_packets;
+	__u16 vid;
+	__u16 pad1;
+	__u32 pad2;
+};
+
 /* Bridge multicast database attributes
  * [MDBA_MDB] = {
  *     [MDBA_MDB_ENTRY] = {
@@ -233,4 +243,12 @@ enum {
 };
 #define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)
 
+/* Embedded inside LINK_XSTATS_TYPE_BRIDGE */
+enum {
+	BRIDGE_XSTATS_UNSPEC,
+	BRIDGE_XSTATS_VLAN,
+	__BRIDGE_XSTATS_MAX
+};
+#define BRIDGE_XSTATS_MAX (__BRIDGE_XSTATS_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_BRIDGE_H */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8577c0e4116f..fd0621ec9222 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -817,6 +817,7 @@ enum {
  */
 enum {
 	LINK_XSTATS_TYPE_UNSPEC,
+	LINK_XSTATS_TYPE_BRIDGE,
 	__LINK_XSTATS_TYPE_MAX
 };
 #define LINK_XSTATS_TYPE_MAX (__LINK_XSTATS_TYPE_MAX - 1)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 9c74fa438a8c..c36bafcad569 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1222,6 +1222,69 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
 	return 0;
 }
 
+static size_t br_get_linkxstats_size(const struct net_device *dev)
+{
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+	int numvls = 0;
+
+	vg = br_vlan_group(br);
+	if (!vg)
+		return 0;
+
+	/* we need to count all, even placeholder entries */
+	list_for_each_entry(v, &vg->vlan_list, vlist)
+		numvls++;
+
+	/* account for the vlans and the link xstats type nest attribute */
+	return numvls * nla_total_size(sizeof(struct bridge_vlan_xstats)) +
+	       nla_total_size(0);
+}
+
+static int br_fill_linkxstats(struct sk_buff *skb, const struct net_device *dev,
+			      int *prividx)
+{
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+	struct nlattr *nest;
+	int vl_idx = 0;
+
+	vg = br_vlan_group(br);
+	if (!vg)
+		goto out;
+	nest = nla_nest_start(skb, LINK_XSTATS_TYPE_BRIDGE);
+	if (!nest)
+		return -EMSGSIZE;
+	list_for_each_entry(v, &vg->vlan_list, vlist) {
+		struct bridge_vlan_xstats vxi;
+		struct br_vlan_stats stats;
+
+		if (vl_idx++ < *prividx)
+			continue;
+		memset(&vxi, 0, sizeof(vxi));
+		vxi.vid = v->vid;
+		br_vlan_get_stats(v, &stats);
+		vxi.rx_bytes = stats.rx_bytes;
+		vxi.rx_packets = stats.rx_packets;
+		vxi.tx_bytes = stats.tx_bytes;
+		vxi.tx_packets = stats.tx_packets;
+
+		if (nla_put(skb, BRIDGE_XSTATS_VLAN, sizeof(vxi), &vxi))
+			goto nla_put_failure;
+	}
+	nla_nest_end(skb, nest);
+	*prividx = 0;
+out:
+	return 0;
+
+nla_put_failure:
+	nla_nest_end(skb, nest);
+	*prividx = vl_idx;
+
+	return -EMSGSIZE;
+}
 
 static struct rtnl_af_ops br_af_ops __read_mostly = {
 	.family			= AF_BRIDGE,
@@ -1240,6 +1303,8 @@ struct rtnl_link_ops br_link_ops __read_mostly = {
 	.dellink		= br_dev_delete,
 	.get_size		= br_get_size,
 	.fill_info		= br_fill_info,
+	.fill_linkxstats	= br_fill_linkxstats,
+	.get_linkxstats_size	= br_get_linkxstats_size,
 
 	.slave_maxtype		= IFLA_BRPORT_MAX,
 	.slave_policy		= br_port_policy,
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f6876ed718a5..a10f7ed26f3b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -709,6 +709,8 @@ int nbp_vlan_delete(struct net_bridge_port *port, u16 vid);
 void nbp_vlan_flush(struct net_bridge_port *port);
 int nbp_vlan_init(struct net_bridge_port *port);
 int nbp_get_num_vlan_infos(struct net_bridge_port *p, u32 filter_mask);
+void br_vlan_get_stats(const struct net_bridge_vlan *v,
+		       struct br_vlan_stats *stats);
 
 static inline struct net_bridge_vlan_group *br_vlan_group(
 					const struct net_bridge *br)
@@ -876,6 +878,11 @@ static inline struct net_bridge_vlan_group *nbp_vlan_group_rcu(
 {
 	return NULL;
 }
+
+static inline void br_vlan_get_stats(const struct net_bridge_vlan *v,
+				     struct br_vlan_stats *stats)
+{
+}
 #endif
 
 struct nf_br_ops {
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index d7a70c2ea3ec..b39d9f5761d9 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1029,3 +1029,30 @@ void nbp_vlan_flush(struct net_bridge_port *port)
 	synchronize_rcu();
 	__vlan_group_free(vg);
 }
+
+void br_vlan_get_stats(const struct net_bridge_vlan *v,
+		       struct br_vlan_stats *stats)
+{
+	int i;
+
+	memset(stats, 0, sizeof(*stats));
+	for_each_possible_cpu(i) {
+		u64 rxpackets, rxbytes, txpackets, txbytes;
+		struct br_vlan_stats *cpu_stats;
+		unsigned int start;
+
+		cpu_stats = per_cpu_ptr(v->stats, i);
+		do {
+			start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+			rxpackets = cpu_stats->rx_packets;
+			rxbytes = cpu_stats->rx_bytes;
+			txbytes = cpu_stats->tx_bytes;
+			txpackets = cpu_stats->tx_packets;
+		} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));
+
+		stats->rx_packets += rxpackets;
+		stats->rx_bytes += rxbytes;
+		stats->tx_bytes += txbytes;
+		stats->tx_packets += txpackets;
+	}
+}
-- 
2.4.11

^ permalink raw reply related

* Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support
From: David Miller @ 2016-04-28 15:54 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: jiri, fw, netdev
In-Reply-To: <5721FB26.6070305@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 28 Apr 2016 13:59:34 +0200

> Le 27/04/2016 18:55, David Miller a écrit :
>> From: Jiri Pirko <jiri@resnulli.us>
> [snip]
>>> The difference is that it this tries to allow userspace crap to mirror
>>> setting user does for bridge/ovs. Basically this looks to me like an
>>> attempt to enable userspace SDKs and such.
>> 
>> +1
> I don't think so because a userspace can receive all the bridge/ovs settings by
> mean of netlink mirror, without the need for this driver at all.

You can say whatever you want, but the facilities you are adding to
this driver enables proprietary userland SDK components.

And this is precisely what we are trying to avoid by having a clean,
fully featured switch device model in the kernel.

It is against your interestes of upstreaming your driver to continue
denying what your changes facilitate.

THanks.

^ permalink raw reply

* [PATCH v4 net-next 0/2] ppp: add rtnetlink support
From: Guillaume Nault @ 2016-04-28 15:55 UTC (permalink / raw)
  To: netdev
  Cc: linux-ppp, Paul Mackerras, David Miller, Stephen Hemminger,
	walter harms

PPP devices lack the ability to be customised at creation time. In
particular they can't be created in a given netns or with a particular
name. Moving or renaming the device after creation is possible, but
creates undesirable transient effects on servers where PPP devices are
constantly created and removed, as users connect and disconnect.
Implementing rtnetlink support solves this problem.

The rtnetlink handlers implemented in this series are minimal, and can
only replace the PPPIOCNEWUNIT ioctl. The rest of PPP ioctls remains
necessary for any other operation on channels and units.
It is perfectly possible to mix PPP devices created by rtnl
and by ioctl(PPPIOCNEWUNIT). Devices will behave in the same way.

mutex_trylock() is used to resolve the locking issue wrt. locking
dependency between rtnl_lock() and ppp_mutex (see ppp_nl_newlink() in
patch #2).

A user visible difference brought by this series is that old PPP
interfaces (those created with ioctl(PPPIOCNEWUNIT)), can now be
removed by "ip link del", just like new rtnl based PPP devices.

Changes since v3:
  - Rebase on net-next.
  - Not an RFC anymore.

Changes since v2:
  - Define ->rtnl_link_ops for ioctl based PPP devices, so they can
    handle rtnl messages just like rtnl based ones (suggested by
    Stephen Hemminger).
  - Move back to original lock ordering between ppp_mutex and rtnl_lock
    to simplify patch series. Handle lock inversion issue using
    mutex_trylock() (suggested by Stephen Hemminger).
  - Do file descriptor lookup directly in ppp_nl_newlink(), to simplify
    ppp_dev_configure().

Changes since v1:
  - Rebase on net-next.
  - Invert locking order wrt. ppp_mutex and rtnl_lock and protect
    file->private_data with ppp_mutex.

Guillaume Nault (2):
  ppp: define reusable device creation functions
  ppp: add rtnetlink device creation support

 drivers/net/ppp/ppp_generic.c | 315 ++++++++++++++++++++++++++++++------------
 include/uapi/linux/if_link.h  |   8 ++
 2 files changed, 235 insertions(+), 88 deletions(-)

-- 
2.8.1

^ permalink raw reply

* [PATCH v4 net-next 1/2] ppp: define reusable device creation functions
From: Guillaume Nault @ 2016-04-28 15:55 UTC (permalink / raw)
  To: netdev
  Cc: linux-ppp, Paul Mackerras, David Miller, Stephen Hemminger,
	walter harms
In-Reply-To: <cover.1461854990.git.g.nault@alphalink.fr>

Move PPP device initialisation and registration out of
ppp_create_interface().
This prepares code for device registration with rtnetlink.

While there, simplify the prototype of ppp_create_interface():

  * Since ppp_dev_configure() takes care of setting file->private_data,
    there's no need to return a ppp structure to ppp_unattached_ioctl()
    anymore.

  * The unit parameter is made read/write so that ppp_create_interface()
    can tell which unit number has been assigned.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 drivers/net/ppp/ppp_generic.c | 206 ++++++++++++++++++++++++------------------
 1 file changed, 118 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index f572b31..59077c8 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -183,6 +183,11 @@ struct channel {
 #endif /* CONFIG_PPP_MULTILINK */
 };
 
+struct ppp_config {
+	struct file *file;
+	s32 unit;
+};
+
 /*
  * SMP locking issues:
  * Both the ppp.rlock and ppp.wlock locks protect the ppp.channels
@@ -269,8 +274,7 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
 static void ppp_ccp_closed(struct ppp *ppp);
 static struct compressor *find_compressor(int type);
 static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
-static struct ppp *ppp_create_interface(struct net *net, int unit,
-					struct file *file, int *retp);
+static int ppp_create_interface(struct net *net, struct file *file, int *unit);
 static void init_ppp_file(struct ppp_file *pf, int kind);
 static void ppp_destroy_interface(struct ppp *ppp);
 static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
@@ -853,12 +857,12 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
 		/* Create a new ppp unit */
 		if (get_user(unit, p))
 			break;
-		ppp = ppp_create_interface(net, unit, file, &err);
-		if (!ppp)
+		err = ppp_create_interface(net, file, &unit);
+		if (err < 0)
 			break;
-		file->private_data = &ppp->file;
+
 		err = -EFAULT;
-		if (put_user(ppp->file.index, p))
+		if (put_user(unit, p))
 			break;
 		err = 0;
 		break;
@@ -960,6 +964,94 @@ static struct pernet_operations ppp_net_ops = {
 	.size = sizeof(struct ppp_net),
 };
 
+static int ppp_unit_register(struct ppp *ppp, int unit)
+{
+	struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
+	int ret;
+
+	mutex_lock(&pn->all_ppp_mutex);
+
+	if (unit < 0) {
+		ret = unit_get(&pn->units_idr, ppp);
+		if (ret < 0)
+			goto err;
+	} else {
+		/* Caller asked for a specific unit number. Fail with -EEXIST
+		 * if unavailable. For backward compatibility, return -EEXIST
+		 * too if idr allocation fails; this makes pppd retry without
+		 * requesting a specific unit number.
+		 */
+		if (unit_find(&pn->units_idr, unit)) {
+			ret = -EEXIST;
+			goto err;
+		}
+		ret = unit_set(&pn->units_idr, ppp, unit);
+		if (ret < 0) {
+			/* Rewrite error for backward compatibility */
+			ret = -EEXIST;
+			goto err;
+		}
+	}
+	ppp->file.index = ret;
+
+	snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+
+	ret = register_netdevice(ppp->dev);
+	if (ret < 0)
+		goto err_unit;
+
+	atomic_inc(&ppp_unit_count);
+
+	mutex_unlock(&pn->all_ppp_mutex);
+
+	return 0;
+
+err_unit:
+	unit_put(&pn->units_idr, ppp->file.index);
+err:
+	mutex_unlock(&pn->all_ppp_mutex);
+
+	return ret;
+}
+
+static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
+			     const struct ppp_config *conf)
+{
+	struct ppp *ppp = netdev_priv(dev);
+	int indx;
+	int err;
+
+	ppp->dev = dev;
+	ppp->ppp_net = src_net;
+	ppp->mru = PPP_MRU;
+	ppp->owner = conf->file;
+
+	init_ppp_file(&ppp->file, INTERFACE);
+	ppp->file.hdrlen = PPP_HDRLEN - 2; /* don't count proto bytes */
+
+	for (indx = 0; indx < NUM_NP; ++indx)
+		ppp->npmode[indx] = NPMODE_PASS;
+	INIT_LIST_HEAD(&ppp->channels);
+	spin_lock_init(&ppp->rlock);
+	spin_lock_init(&ppp->wlock);
+#ifdef CONFIG_PPP_MULTILINK
+	ppp->minseq = -1;
+	skb_queue_head_init(&ppp->mrq);
+#endif /* CONFIG_PPP_MULTILINK */
+#ifdef CONFIG_PPP_FILTER
+	ppp->pass_filter = NULL;
+	ppp->active_filter = NULL;
+#endif /* CONFIG_PPP_FILTER */
+
+	err = ppp_unit_register(ppp, conf->unit);
+	if (err < 0)
+		return err;
+
+	conf->file->private_data = &ppp->file;
+
+	return 0;
+}
+
 #define PPP_MAJOR	108
 
 /* Called at boot time if ppp is compiled into the kernel,
@@ -2732,102 +2824,40 @@ ppp_get_stats(struct ppp *ppp, struct ppp_stats *st)
  * or if there is already a unit with the requested number.
  * unit == -1 means allocate a new number.
  */
-static struct ppp *ppp_create_interface(struct net *net, int unit,
-					struct file *file, int *retp)
+static int ppp_create_interface(struct net *net, struct file *file, int *unit)
 {
+	struct ppp_config conf = {
+		.file = file,
+		.unit = *unit,
+	};
+	struct net_device *dev;
 	struct ppp *ppp;
-	struct ppp_net *pn;
-	struct net_device *dev = NULL;
-	int ret = -ENOMEM;
-	int i;
+	int err;
 
 	dev = alloc_netdev(sizeof(struct ppp), "", NET_NAME_ENUM, ppp_setup);
-	if (!dev)
-		goto out1;
-
-	pn = ppp_pernet(net);
-
-	ppp = netdev_priv(dev);
-	ppp->dev = dev;
-	ppp->mru = PPP_MRU;
-	init_ppp_file(&ppp->file, INTERFACE);
-	ppp->file.hdrlen = PPP_HDRLEN - 2;	/* don't count proto bytes */
-	ppp->owner = file;
-	for (i = 0; i < NUM_NP; ++i)
-		ppp->npmode[i] = NPMODE_PASS;
-	INIT_LIST_HEAD(&ppp->channels);
-	spin_lock_init(&ppp->rlock);
-	spin_lock_init(&ppp->wlock);
-#ifdef CONFIG_PPP_MULTILINK
-	ppp->minseq = -1;
-	skb_queue_head_init(&ppp->mrq);
-#endif /* CONFIG_PPP_MULTILINK */
-#ifdef CONFIG_PPP_FILTER
-	ppp->pass_filter = NULL;
-	ppp->active_filter = NULL;
-#endif /* CONFIG_PPP_FILTER */
-
-	/*
-	 * drum roll: don't forget to set
-	 * the net device is belong to
-	 */
+	if (!dev) {
+		err = -ENOMEM;
+		goto err;
+	}
 	dev_net_set(dev, net);
 
 	rtnl_lock();
-	mutex_lock(&pn->all_ppp_mutex);
-
-	if (unit < 0) {
-		unit = unit_get(&pn->units_idr, ppp);
-		if (unit < 0) {
-			ret = unit;
-			goto out2;
-		}
-	} else {
-		ret = -EEXIST;
-		if (unit_find(&pn->units_idr, unit))
-			goto out2; /* unit already exists */
-		/*
-		 * if caller need a specified unit number
-		 * lets try to satisfy him, otherwise --
-		 * he should better ask us for new unit number
-		 *
-		 * NOTE: yes I know that returning EEXIST it's not
-		 * fair but at least pppd will ask us to allocate
-		 * new unit in this case so user is happy :)
-		 */
-		unit = unit_set(&pn->units_idr, ppp, unit);
-		if (unit < 0)
-			goto out2;
-	}
-
-	/* Initialize the new ppp unit */
-	ppp->file.index = unit;
-	sprintf(dev->name, "ppp%d", unit);
 
-	ret = register_netdevice(dev);
-	if (ret != 0) {
-		unit_put(&pn->units_idr, unit);
-		netdev_err(ppp->dev, "PPP: couldn't register device %s (%d)\n",
-			   dev->name, ret);
-		goto out2;
-	}
-
-	ppp->ppp_net = net;
+	err = ppp_dev_configure(net, dev, &conf);
+	if (err < 0)
+		goto err_dev;
+	ppp = netdev_priv(dev);
+	*unit = ppp->file.index;
 
-	atomic_inc(&ppp_unit_count);
-	mutex_unlock(&pn->all_ppp_mutex);
 	rtnl_unlock();
 
-	*retp = 0;
-	return ppp;
+	return 0;
 
-out2:
-	mutex_unlock(&pn->all_ppp_mutex);
+err_dev:
 	rtnl_unlock();
 	free_netdev(dev);
-out1:
-	*retp = ret;
-	return NULL;
+err:
+	return err;
 }
 
 /*
-- 
2.8.1


^ permalink raw reply related

* [PATCH v4 net-next 2/2] ppp: add rtnetlink device creation support
From: Guillaume Nault @ 2016-04-28 15:55 UTC (permalink / raw)
  To: netdev
  Cc: linux-ppp, Paul Mackerras, David Miller, Stephen Hemminger,
	walter harms
In-Reply-To: <cover.1461854990.git.g.nault@alphalink.fr>

Define PPP device handler for use with rtnetlink.
The only PPP specific attribute is IFLA_PPP_DEV_FD. It is mandatory and
contains the file descriptor of the associated /dev/ppp instance (the
file descriptor which would have been used for ioctl(PPPIOCNEWUNIT) in
the ioctl-based API). The PPP device is removed when this file
descriptor is released (same behaviour as with ioctl based PPP
devices).

PPP devices created with the rtnetlink API behave like the ones created
with ioctl(PPPIOCNEWUNIT). In particular existing ioctls work the same
way, no matter how the PPP device was created.
The rtnl callbacks are also assigned to ioctl based PPP devices. This
way, rtnl messages have the same effect on any PPP devices.
The immediate effect is that all PPP devices, even ioctl-based
ones, can now be removed with "ip link del".

A minor difference still exists between ioctl and rtnl based PPP
interfaces: in the device name, the number following the "ppp" prefix
corresponds to the PPP unit number for ioctl based devices, while it is
just an unrelated incrementing index for rtnl ones.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 drivers/net/ppp/ppp_generic.c | 115 ++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/if_link.h  |   8 +++
 2 files changed, 120 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 59077c8..8dedafa 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -46,6 +46,7 @@
 #include <linux/device.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/file.h>
 #include <asm/unaligned.h>
 #include <net/slhc_vj.h>
 #include <linux/atomic.h>
@@ -186,6 +187,7 @@ struct channel {
 struct ppp_config {
 	struct file *file;
 	s32 unit;
+	bool ifname_is_set;
 };
 
 /*
@@ -286,6 +288,7 @@ static int unit_get(struct idr *p, void *ptr);
 static int unit_set(struct idr *p, void *ptr, int n);
 static void unit_put(struct idr *p, int n);
 static void *unit_find(struct idr *p, int n);
+static void ppp_setup(struct net_device *dev);
 
 static const struct net_device_ops ppp_netdev_ops;
 
@@ -964,7 +967,7 @@ static struct pernet_operations ppp_net_ops = {
 	.size = sizeof(struct ppp_net),
 };
 
-static int ppp_unit_register(struct ppp *ppp, int unit)
+static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
 {
 	struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
 	int ret;
@@ -994,7 +997,8 @@ static int ppp_unit_register(struct ppp *ppp, int unit)
 	}
 	ppp->file.index = ret;
 
-	snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+	if (!ifname_is_set)
+		snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
 
 	ret = register_netdevice(ppp->dev);
 	if (ret < 0)
@@ -1043,7 +1047,7 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
 	ppp->active_filter = NULL;
 #endif /* CONFIG_PPP_FILTER */
 
-	err = ppp_unit_register(ppp, conf->unit);
+	err = ppp_unit_register(ppp, conf->unit, conf->ifname_is_set);
 	if (err < 0)
 		return err;
 
@@ -1052,6 +1056,99 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
 	return 0;
 }
 
+static const struct nla_policy ppp_nl_policy[IFLA_PPP_MAX + 1] = {
+	[IFLA_PPP_DEV_FD]	= { .type = NLA_S32 },
+};
+
+static int ppp_nl_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+	if (!data)
+		return -EINVAL;
+
+	if (!data[IFLA_PPP_DEV_FD])
+		return -EINVAL;
+	if (nla_get_s32(data[IFLA_PPP_DEV_FD]) < 0)
+		return -EBADF;
+
+	return 0;
+}
+
+static int ppp_nl_newlink(struct net *src_net, struct net_device *dev,
+			  struct nlattr *tb[], struct nlattr *data[])
+{
+	struct ppp_config conf = {
+		.unit = -1,
+		.ifname_is_set = true,
+	};
+	struct file *file;
+	int err;
+
+	file = fget(nla_get_s32(data[IFLA_PPP_DEV_FD]));
+	if (!file)
+		return -EBADF;
+
+	/* rtnl_lock is already held here, but ppp_create_interface() locks
+	 * ppp_mutex before holding rtnl_lock. Using mutex_trylock() avoids
+	 * possible deadlock due to lock order inversion, at the cost of
+	 * pushing the problem back to userspace.
+	 */
+	if (!mutex_trylock(&ppp_mutex)) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	if (file->f_op != &ppp_device_fops || file->private_data) {
+		err = -EBADF;
+		goto out_unlock;
+	}
+
+	conf.file = file;
+	err = ppp_dev_configure(src_net, dev, &conf);
+
+out_unlock:
+	mutex_unlock(&ppp_mutex);
+out:
+	fput(file);
+
+	return err;
+}
+
+static void ppp_nl_dellink(struct net_device *dev, struct list_head *head)
+{
+	unregister_netdevice_queue(dev, head);
+}
+
+static size_t ppp_nl_get_size(const struct net_device *dev)
+{
+	return 0;
+}
+
+static int ppp_nl_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	return 0;
+}
+
+static struct net *ppp_nl_get_link_net(const struct net_device *dev)
+{
+	struct ppp *ppp = netdev_priv(dev);
+
+	return ppp->ppp_net;
+}
+
+static struct rtnl_link_ops ppp_link_ops __read_mostly = {
+	.kind		= "ppp",
+	.maxtype	= IFLA_PPP_MAX,
+	.policy		= ppp_nl_policy,
+	.priv_size	= sizeof(struct ppp),
+	.setup		= ppp_setup,
+	.validate	= ppp_nl_validate,
+	.newlink	= ppp_nl_newlink,
+	.dellink	= ppp_nl_dellink,
+	.get_size	= ppp_nl_get_size,
+	.fill_info	= ppp_nl_fill_info,
+	.get_link_net	= ppp_nl_get_link_net,
+};
+
 #define PPP_MAJOR	108
 
 /* Called at boot time if ppp is compiled into the kernel,
@@ -1080,11 +1177,19 @@ static int __init ppp_init(void)
 		goto out_chrdev;
 	}
 
+	err = rtnl_link_register(&ppp_link_ops);
+	if (err) {
+		pr_err("failed to register rtnetlink PPP handler\n");
+		goto out_class;
+	}
+
 	/* not a big deal if we fail here :-) */
 	device_create(ppp_class, NULL, MKDEV(PPP_MAJOR, 0), NULL, "ppp");
 
 	return 0;
 
+out_class:
+	class_destroy(ppp_class);
 out_chrdev:
 	unregister_chrdev(PPP_MAJOR, "ppp");
 out_net:
@@ -2829,6 +2934,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
 	struct ppp_config conf = {
 		.file = file,
 		.unit = *unit,
+		.ifname_is_set = false,
 	};
 	struct net_device *dev;
 	struct ppp *ppp;
@@ -2840,6 +2946,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
 		goto err;
 	}
 	dev_net_set(dev, net);
+	dev->rtnl_link_ops = &ppp_link_ops;
 
 	rtnl_lock();
 
@@ -3046,6 +3153,7 @@ static void __exit ppp_cleanup(void)
 	/* should never happen */
 	if (atomic_read(&ppp_unit_count) || atomic_read(&channel_count))
 		pr_err("PPP: removing module but units remain!\n");
+	rtnl_link_unregister(&ppp_link_ops);
 	unregister_chrdev(PPP_MAJOR, "ppp");
 	device_destroy(ppp_class, MKDEV(PPP_MAJOR, 0));
 	class_destroy(ppp_class);
@@ -3104,4 +3212,5 @@ EXPORT_SYMBOL(ppp_register_compressor);
 EXPORT_SYMBOL(ppp_unregister_compressor);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CHARDEV(PPP_MAJOR, 0);
+MODULE_ALIAS_RTNL_LINK("ppp");
 MODULE_ALIAS("devname:ppp");
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index d82de33..3e80974 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -520,6 +520,14 @@ enum {
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
 
+/* PPP section */
+enum {
+	IFLA_PPP_UNSPEC,
+	IFLA_PPP_DEV_FD,
+	__IFLA_PPP_MAX
+};
+#define IFLA_PPP_MAX (__IFLA_PPP_MAX - 1)
+
 /* Bonding section */
 
 enum {
-- 
2.8.1

^ permalink raw reply related

* Re: [PATCH net-next] net: snmp: fix 64bit stats on 32bit arches
From: David Miller @ 2016-04-28 15:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: nicolas.dichtel, edumazet, netdev
In-Reply-To: <1461850404.5535.103.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Apr 2016 06:33:24 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> I accidentally replaced BH disabling by preemption disabling
> in SNMP_ADD_STATS64() and SNMP_UPD_PO_STATS64() on 32bit builds.
> 
> For 64bit stats on 32bit arch, we really need to disable BH,
> since the "struct u64_stats_sync syncp" might be manipulated
> both from process and BH contexts.
> 
> Fixes: 6aef70a851ac ("net: snmp: kill various STATS_USER() helpers")
> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] net: macb: do not scan PHYs manually
From: Nathan Sullivan @ 2016-04-28 15:55 UTC (permalink / raw)
  To: Nicolas Ferre; +Cc: netdev, linux-kernel, Florian Fainelli, Alexandre Belloni
In-Reply-To: <57222FCE.8050407@atmel.com>

On Thu, Apr 28, 2016 at 05:44:14PM +0200, Nicolas Ferre wrote:
> Le 28/04/2016 16:46, Nathan Sullivan a écrit :
> > Since of_mdiobus_register and mdiobus_register will scan automatically,
> > do not manually scan for PHY devices in the macb ethernet driver. Doing
> > so will result in many nonexistent PHYs on the MDIO bus if the MDIO
> > lines are floating or grounded, such as when they are not used.
> > 
> > Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
> 
> Well, as explained in the commit message that added this feature and in
> the comment, if no phy is specified in the DT we end up without phy...
> 
> There are AT91 platforms which lack specification for the phy node in
> the DT. So, I don't know if there is a better way to deal with this case
> but I see this removal as risky.
> 
> Bye,
> 
> Nicolas Ferre

Hmm, are AT91 platforms special in this regard? As far as I can tell, this
driver (macb) and Marvell PXA are the only ethernet drivers that call
mdiobus_scan directly, and PXA does it on a known address. I do see that there
are trees that use macb and don't have a phy listed, which is unfortunate.

Another way to fix our issue would be to consider all 0x0s a bad ID in
mdiobus_scan, so grounded MDIO lines do not get PHYs scanned.  Or we could add a
DT property to disable the manual scan.  I'm not sure what the correct solution
is, do you have a preference?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox