Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 net-next 1/1] tc-testing: Clarify the use of tdc's -d option
From: David Miller @ 2019-07-31 22:54 UTC (permalink / raw)
  To: lucasb
  Cc: netdev, nicolas.dichtel, jhs, xiyou.wangcong, jiri, mleitner,
	vladbu, dcaratti, kernel
In-Reply-To: <1564442292-4731-1-git-send-email-lucasb@mojatatu.com>

From: Lucas Bates <lucasb@mojatatu.com>
Date: Mon, 29 Jul 2019 19:18:12 -0400

> The -d command line argument to tdc requires the name of a physical device
> on the system where the tests will be run. If -d has not been used, tdc
> will skip tests that require a physical device.
> 
> This patch is intended to better document what the -d option does and how
> it is used.
> 
> Signed-off-by: Lucas Bates <lucasb@mojatatu.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] selftests/tls: fix TLS tests with CONFIG_TLS=n
From: David Miller @ 2019-07-31 22:54 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers, rong.a.chen
In-Reply-To: <20190729230803.10781-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Mon, 29 Jul 2019 16:08:03 -0700

> Build bot reports some recent TLS tests are failing
> with CONFIG_TLS=n. Correct the expected return code
> and skip TLS installation if not supported.
> 
> Tested with CONFIG_TLS=n and CONFIG_TLS=m.
> 
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Fixes: cf32526c8842 ("selftests/tls: add a test for ULP but no keys")
> Fixes: 65d41fb317c6 ("selftests/tls: add a bidirectional test")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Applied, thanks Jakub.

^ permalink raw reply

* Re: [PATCH net v3] net: bridge: move vlan init/deinit to NETDEV_REGISTER/UNREGISTER
From: Stephen Hemminger @ 2019-07-31 22:53 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, davem, bridge, michael-dev
In-Reply-To: <20190731224955.10908-1-nikolay@cumulusnetworks.com>

  
> -int br_vlan_init(struct net_bridge *br)
> +static int br_vlan_init(struct net_bridge *br)
>  {
>  	struct net_bridge_vlan_group *vg;
>  	int ret = -ENOMEM;
> @@ -1083,6 +1085,8 @@ int br_vlan_init(struct net_bridge *br)
>  	return ret;
>  
>  err_vlan_add:
> +	RCU_INIT_POINTER(br->vlgrp, NULL);
> +	synchronize_rcu();

Calling sychronize_rcu is expensive. And the callback for
notifier is always called with rtnl_head. 

Why not just keep the pointer initialization back in the
code where bridge is created, it was safe there.

^ permalink raw reply

* Re: [PATCH 00/14] ARM: move lpc32xx and dove to multiplatform
From: Russell King - ARM Linux admin @ 2019-07-31 22:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: soc, linux-arm-kernel, Vladimir Zapolskiy, Sylvain Lemieux,
	Gregory Clement, Linus Walleij, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, David S. Miller, Greg Kroah-Hartman,
	Alan Stern, Guenter Roeck, linux-gpio, netdev, linux-serial,
	linux-usb, linux-watchdog
In-Reply-To: <20190731195713.3150463-1-arnd@arndb.de>

On Wed, Jul 31, 2019 at 09:56:42PM +0200, Arnd Bergmann wrote:
> For dove, the patches are basically what I had proposed back in
> 2015 when all other ARMv6/ARMv7 machines became part of a single
> kernel build. I don't know what the state is mach-dove support is,
> compared to the DT based support in mach-mvebu for the same
> hardware. If they are functionally the same, we could also just
> remove mach-dove rather than applying my patches.

Well, the good news is that I'm down to a small board support file
for the Dove Cubox now - but the bad news is, that there's still a
board support file necessary to support everything the Dove SoC has
to offer.

Even for a DT based Dove Cubox, I'm still using mach-dove, but it
may be possible to drop most of mach-dove now.  Without spending a
lot of time digging through it, it's impossible to really know.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH] tipc: compat: allow tipc commands without arguments
From: David Miller @ 2019-07-31 22:52 UTC (permalink / raw)
  To: takondra
  Cc: jon.maloy, ying.xue, netdev, tipc-discussion, linux-kernel,
	xe-linux-external, stable
In-Reply-To: <20190729221507.48893-1-takondra@cisco.com>


Jon et al., please review.

Thank you.

^ permalink raw reply

* [PATCH net v3] net: bridge: move vlan init/deinit to NETDEV_REGISTER/UNREGISTER
From: Nikolay Aleksandrov @ 2019-07-31 22:49 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, bridge, Nikolay Aleksandrov, michael-dev
In-Reply-To: <319fda43-195d-2b92-7f62-7e273c084a29@cumulusnetworks.com>

Most of the bridge device's vlan init bugs come from the fact that it's
done in the wrong place, way too early in ndo_init() before the device is
even assigned an ifindex. That makes error handling harder, especially for
older kernels which don't have bridge ndo_uninit callback. It also
introduces another bug when the bridge's dev_addr is added as fdb in the
the initial default pvid on vlan initialization, the fdb notification has
ifindex/NDA_MASTER both equal to 0 (see example below) which really
makes no sense for user-space[0]. Usually user-space software would ignore
such entries, but they are actually valid and will eventually have all
necessary attributes. I chose to change the order because this can be
backported to all kernels even pre-ndo_uninit ones without many changes
and it keeps init/deinit symmetric. As a bonus this allows us to keep
the vlan init/deinit entirely in br_vlan.c and remove those exports.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail.

For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent

After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent

v3: send the correct v2 patch with all changes (stub should return 0)
v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
    the br_vlan_bridge_event stub when bridge vlans are disabled

[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389

Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
I tried a few different approaches to resolve this but they were all
unsuitable for some kernels, this approach can go to stables easily
and IMO is the way this had to be done from the start. Alternatively
we could move only the br_vlan_add and pair it with a br_vlan_del of
default_pvid on the same events, but I don't think it hurts to move
the whole init/deinit there as it'd help older stable releases as well.

I also tested the br_vlan_init error handling after the move by always
returning errors from all over it. Since errors at NETDEV_REGISTER cause
NETDEV_UNREGISTER we can deinit vlans properly for all cases regardless
why it happened (e.g. device destruction or init error).

 net/bridge/br.c         |  5 ++++-
 net/bridge/br_device.c  | 10 ----------
 net/bridge/br_private.h | 20 +++++---------------
 net/bridge/br_vlan.c    | 25 ++++++++++++++++++-------
 4 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index d164f63a4345..8a8f9e5f264f 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -37,12 +37,15 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 	int err;
 
 	if (dev->priv_flags & IFF_EBRIDGE) {
+		err = br_vlan_bridge_event(dev, event, ptr);
+		if (err)
+			return notifier_from_errno(err);
+
 		if (event == NETDEV_REGISTER) {
 			/* register of bridge completed, add sysfs entries */
 			br_sysfs_addbr(dev);
 			return NOTIFY_DONE;
 		}
-		br_vlan_bridge_event(dev, event, ptr);
 	}
 
 	/* not a port of a bridge */
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 681b72862c16..b3e3de2ecf95 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -135,18 +135,9 @@ static int br_dev_init(struct net_device *dev)
 		return err;
 	}
 
-	err = br_vlan_init(br);
-	if (err) {
-		free_percpu(br->stats);
-		br_mdb_hash_fini(br);
-		br_fdb_hash_fini(br);
-		return err;
-	}
-
 	err = br_multicast_init_stats(br);
 	if (err) {
 		free_percpu(br->stats);
-		br_vlan_flush(br);
 		br_mdb_hash_fini(br);
 		br_fdb_hash_fini(br);
 	}
@@ -161,7 +152,6 @@ static void br_dev_uninit(struct net_device *dev)
 
 	br_multicast_dev_del(br);
 	br_multicast_uninit_stats(br);
-	br_vlan_flush(br);
 	br_mdb_hash_fini(br);
 	br_fdb_hash_fini(br);
 	free_percpu(br->stats);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e8cf03b43b7d..782f18c6b2a9 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -872,7 +872,6 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
 int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags,
 		bool *changed, struct netlink_ext_ack *extack);
 int br_vlan_delete(struct net_bridge *br, u16 vid);
-void br_vlan_flush(struct net_bridge *br);
 struct net_bridge_vlan *br_vlan_find(struct net_bridge_vlan_group *vg, u16 vid);
 void br_recalculate_fwd_mask(struct net_bridge *br);
 int __br_vlan_filter_toggle(struct net_bridge *br, unsigned long val);
@@ -881,7 +880,6 @@ int __br_vlan_set_proto(struct net_bridge *br, __be16 proto);
 int br_vlan_set_proto(struct net_bridge *br, unsigned long val);
 int br_vlan_set_stats(struct net_bridge *br, unsigned long val);
 int br_vlan_set_stats_per_port(struct net_bridge *br, unsigned long val);
-int br_vlan_init(struct net_bridge *br);
 int br_vlan_set_default_pvid(struct net_bridge *br, unsigned long val);
 int __br_vlan_set_default_pvid(struct net_bridge *br, u16 pvid,
 			       struct netlink_ext_ack *extack);
@@ -894,8 +892,8 @@ int nbp_get_num_vlan_infos(struct net_bridge_port *p, u32 filter_mask);
 void br_vlan_get_stats(const struct net_bridge_vlan *v,
 		       struct br_vlan_stats *stats);
 void br_vlan_port_event(struct net_bridge_port *p, unsigned long event);
-void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
-			  void *ptr);
+int br_vlan_bridge_event(struct net_device *dev, unsigned long event,
+			 void *ptr);
 
 static inline struct net_bridge_vlan_group *br_vlan_group(
 					const struct net_bridge *br)
@@ -988,19 +986,10 @@ static inline int br_vlan_delete(struct net_bridge *br, u16 vid)
 	return -EOPNOTSUPP;
 }
 
-static inline void br_vlan_flush(struct net_bridge *br)
-{
-}
-
 static inline void br_recalculate_fwd_mask(struct net_bridge *br)
 {
 }
 
-static inline int br_vlan_init(struct net_bridge *br)
-{
-	return 0;
-}
-
 static inline int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags,
 			       bool *changed, struct netlink_ext_ack *extack)
 {
@@ -1085,9 +1074,10 @@ static inline void br_vlan_port_event(struct net_bridge_port *p,
 {
 }
 
-static inline void br_vlan_bridge_event(struct net_device *dev,
-					unsigned long event, void *ptr)
+static inline int br_vlan_bridge_event(struct net_device *dev,
+				       unsigned long event, void *ptr)
 {
+	return 0;
 }
 #endif
 
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index a544e161c7fa..6a17408b4eb8 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -709,7 +709,7 @@ int br_vlan_delete(struct net_bridge *br, u16 vid)
 	return __vlan_del(v);
 }
 
-void br_vlan_flush(struct net_bridge *br)
+static void br_vlan_flush(struct net_bridge *br)
 {
 	struct net_bridge_vlan_group *vg;
 
@@ -721,6 +721,8 @@ void br_vlan_flush(struct net_bridge *br)
 	br_fdb_delete_by_port(br, NULL, 0, 1);
 
 	vg = br_vlan_group(br);
+	if (!vg)
+		return;
 	__vlan_flush(vg);
 	RCU_INIT_POINTER(br->vlgrp, NULL);
 	synchronize_rcu();
@@ -1054,7 +1056,7 @@ int br_vlan_set_default_pvid(struct net_bridge *br, unsigned long val)
 	return err;
 }
 
-int br_vlan_init(struct net_bridge *br)
+static int br_vlan_init(struct net_bridge *br)
 {
 	struct net_bridge_vlan_group *vg;
 	int ret = -ENOMEM;
@@ -1083,6 +1085,8 @@ int br_vlan_init(struct net_bridge *br)
 	return ret;
 
 err_vlan_add:
+	RCU_INIT_POINTER(br->vlgrp, NULL);
+	synchronize_rcu();
 	vlan_tunnel_deinit(vg);
 err_tunnel_init:
 	rhashtable_destroy(&vg->vlan_hash);
@@ -1469,13 +1473,19 @@ static void nbp_vlan_set_vlan_dev_state(struct net_bridge_port *p, u16 vid)
 }
 
 /* Must be protected by RTNL. */
-void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
-			  void *ptr)
+int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void *ptr)
 {
 	struct netdev_notifier_changeupper_info *info;
-	struct net_bridge *br;
+	struct net_bridge *br = netdev_priv(dev);
+	int ret = 0;
 
 	switch (event) {
+	case NETDEV_REGISTER:
+		ret = br_vlan_init(br);
+		break;
+	case NETDEV_UNREGISTER:
+		br_vlan_flush(br);
+		break;
 	case NETDEV_CHANGEUPPER:
 		info = ptr;
 		br_vlan_upper_change(dev, info->upper_dev, info->linking);
@@ -1483,12 +1493,13 @@ void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
 
 	case NETDEV_CHANGE:
 	case NETDEV_UP:
-		br = netdev_priv(dev);
 		if (!br_opt_get(br, BROPT_VLAN_BRIDGE_BINDING))
-			return;
+			break;
 		br_vlan_link_state_change(dev, br);
 		break;
 	}
+
+	return ret;
 }
 
 /* Must be protected by RTNL. */
-- 
2.21.0


^ permalink raw reply related

* Re: [pull request][net-next 00/13] Mellanox, mlx5 tc flow handling for concurrent execution (Part 1)
From: David Miller @ 2019-07-31 22:48 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Mon, 29 Jul 2019 23:50:14 +0000

> This series, mostly from Vlad, is the first part of ongoing work to
> improve mlx5 tc flow handling by removing dependency on rtnl_lock and
> providing a more fine-grained locking and rcu safe data structures.
> 
> For more information please see tag log below.
> 
> Please pull and let me know if there is any problem.

Pulled, thanks Saeed.

I will push this back out after a build test (which will take a while
since I am on my laptop).  So please be patient.

Thanks.

^ permalink raw reply

* Re: KASAN: invalid-free in tls_sk_proto_cleanup
From: Jakub Kicinski @ 2019-07-31 22:46 UTC (permalink / raw)
  To: syzbot
  Cc: aviadye, borisp, daniel, davejwatson, davem, john.fastabend,
	linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <00000000000010fb45058ef5eb52@google.com>

On Wed, 31 Jul 2019 01:29:05 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    fde50b96 Add linux-next specific files for 20190726
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15ea7f3fa00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4b58274564b354c1
> dashboard link: https://syzkaller.appspot.com/bug?extid=f5731e2256eb5130dbd6
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+f5731e2256eb5130dbd6@syzkaller.appspotmail.com
> 
> ==================================================================
> BUG: KASAN: double-free or invalid-free in tls_sk_proto_cleanup+0x216/0x3e0  
> net/tls/tls_main.c:300

FWIW there is a fix in the works for this and all new TLS issues.
I think John and Eric didn't have time yet to look at my theories,
so I'd like to keep it in QA for one extra day before posting.

^ permalink raw reply

* Re: [PATCH net v2] net: bridge: move vlan init/deinit to NETDEV_REGISTER/UNREGISTER
From: Nikolay Aleksandrov @ 2019-07-31 22:40 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, bridge, michael-dev
In-Reply-To: <20190731223736.18856-1-nikolay@cumulusnetworks.com>

On 01/08/2019 01:37, Nikolay Aleksandrov wrote:
> Most of the bridge device's vlan init bugs come from the fact that it's
> done in the wrong place, way too early in ndo_init() before the device is
> even assigned an ifindex. That makes error handling harder, especially for
> older kernels which don't have bridge ndo_uninit callback. It also
> introduces another bug when the bridge's dev_addr is added as fdb in the
> the initial default pvid on vlan initialization, the fdb notification has
> ifindex/NDA_MASTER both equal to 0 (see example below) which really
> makes no sense for user-space[0]. Usually user-space software would ignore
> such entries, but they are actually valid and will eventually have all
> necessary attributes. I chose to change the order because this can be
> backported to all kernels even pre-ndo_uninit ones without many changes
> and it keeps init/deinit symmetric. As a bonus this allows us to keep
> the vlan init/deinit entirely in br_vlan.c and remove those exports.
> It makes much more sense to send a notification *after* the device has
> registered and has a proper ifindex allocated rather than before when
> there's a chance that the registration might still fail.
> 
> For the demonstration below a small change to iproute2 for printing all fdb
> notifications is added, because it contained a workaround not to show
> entries with ifindex == 0.
> Command executed while monitoring: $ ip l add br0 type bridge
> Before (both ifindex and master == 0):
> $ bridge monitor fdb
> 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent
> 
> After (proper br0 ifindex):
> $ bridge monitor fdb
> e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent
> 
> v2: on error in br_vlan_init set br->vlgrp to NULL
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389
> 
> Reported-by: michael-dev <michael-dev@fami-braun.de>
> Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> I tried a few different approaches to resolve this but they were all
> unsuitable for some kernels, this approach can go to stables easily
> and IMO is the way this had to be done from the start. Alternatively
> we could move only the br_vlan_add and pair it with a br_vlan_del of
> default_pvid on the same events, but I don't think it hurts to move
> the whole init/deinit there as it'd help older stable releases as well.
> 
> I also tested the br_vlan_init error handling after the move by always
> returning errors from all over it. Since errors at NETDEV_REGISTER cause
> NETDEV_UNREGISTER we can deinit vlans properly for all cases regardless
> why it happened (e.g. device destruction or init error).
> 
>  net/bridge/br.c         |  5 ++++-
>  net/bridge/br_device.c  | 10 ----------
>  net/bridge/br_private.h | 19 ++++---------------
>  net/bridge/br_vlan.c    | 25 ++++++++++++++++++-------
>  4 files changed, 26 insertions(+), 33 deletions(-)
> 

Aargh, I apologize for the noise, this is the wrong v2 patch...
Will send the correct one as v3 in a moment.


^ permalink raw reply

* [PATCH net v2] net: bridge: move vlan init/deinit to NETDEV_REGISTER/UNREGISTER
From: Nikolay Aleksandrov @ 2019-07-31 22:37 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, bridge, Nikolay Aleksandrov, michael-dev
In-Reply-To: <20190731183623.20127-1-nikolay@cumulusnetworks.com>

Most of the bridge device's vlan init bugs come from the fact that it's
done in the wrong place, way too early in ndo_init() before the device is
even assigned an ifindex. That makes error handling harder, especially for
older kernels which don't have bridge ndo_uninit callback. It also
introduces another bug when the bridge's dev_addr is added as fdb in the
the initial default pvid on vlan initialization, the fdb notification has
ifindex/NDA_MASTER both equal to 0 (see example below) which really
makes no sense for user-space[0]. Usually user-space software would ignore
such entries, but they are actually valid and will eventually have all
necessary attributes. I chose to change the order because this can be
backported to all kernels even pre-ndo_uninit ones without many changes
and it keeps init/deinit symmetric. As a bonus this allows us to keep
the vlan init/deinit entirely in br_vlan.c and remove those exports.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail.

For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent

After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent

v2: on error in br_vlan_init set br->vlgrp to NULL

[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389

Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
I tried a few different approaches to resolve this but they were all
unsuitable for some kernels, this approach can go to stables easily
and IMO is the way this had to be done from the start. Alternatively
we could move only the br_vlan_add and pair it with a br_vlan_del of
default_pvid on the same events, but I don't think it hurts to move
the whole init/deinit there as it'd help older stable releases as well.

I also tested the br_vlan_init error handling after the move by always
returning errors from all over it. Since errors at NETDEV_REGISTER cause
NETDEV_UNREGISTER we can deinit vlans properly for all cases regardless
why it happened (e.g. device destruction or init error).

 net/bridge/br.c         |  5 ++++-
 net/bridge/br_device.c  | 10 ----------
 net/bridge/br_private.h | 19 ++++---------------
 net/bridge/br_vlan.c    | 25 ++++++++++++++++++-------
 4 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index d164f63a4345..8a8f9e5f264f 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -37,12 +37,15 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 	int err;
 
 	if (dev->priv_flags & IFF_EBRIDGE) {
+		err = br_vlan_bridge_event(dev, event, ptr);
+		if (err)
+			return notifier_from_errno(err);
+
 		if (event == NETDEV_REGISTER) {
 			/* register of bridge completed, add sysfs entries */
 			br_sysfs_addbr(dev);
 			return NOTIFY_DONE;
 		}
-		br_vlan_bridge_event(dev, event, ptr);
 	}
 
 	/* not a port of a bridge */
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 681b72862c16..b3e3de2ecf95 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -135,18 +135,9 @@ static int br_dev_init(struct net_device *dev)
 		return err;
 	}
 
-	err = br_vlan_init(br);
-	if (err) {
-		free_percpu(br->stats);
-		br_mdb_hash_fini(br);
-		br_fdb_hash_fini(br);
-		return err;
-	}
-
 	err = br_multicast_init_stats(br);
 	if (err) {
 		free_percpu(br->stats);
-		br_vlan_flush(br);
 		br_mdb_hash_fini(br);
 		br_fdb_hash_fini(br);
 	}
@@ -161,7 +152,6 @@ static void br_dev_uninit(struct net_device *dev)
 
 	br_multicast_dev_del(br);
 	br_multicast_uninit_stats(br);
-	br_vlan_flush(br);
 	br_mdb_hash_fini(br);
 	br_fdb_hash_fini(br);
 	free_percpu(br->stats);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e8cf03b43b7d..96dd1c68d73f 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -872,7 +872,6 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
 int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags,
 		bool *changed, struct netlink_ext_ack *extack);
 int br_vlan_delete(struct net_bridge *br, u16 vid);
-void br_vlan_flush(struct net_bridge *br);
 struct net_bridge_vlan *br_vlan_find(struct net_bridge_vlan_group *vg, u16 vid);
 void br_recalculate_fwd_mask(struct net_bridge *br);
 int __br_vlan_filter_toggle(struct net_bridge *br, unsigned long val);
@@ -881,7 +880,6 @@ int __br_vlan_set_proto(struct net_bridge *br, __be16 proto);
 int br_vlan_set_proto(struct net_bridge *br, unsigned long val);
 int br_vlan_set_stats(struct net_bridge *br, unsigned long val);
 int br_vlan_set_stats_per_port(struct net_bridge *br, unsigned long val);
-int br_vlan_init(struct net_bridge *br);
 int br_vlan_set_default_pvid(struct net_bridge *br, unsigned long val);
 int __br_vlan_set_default_pvid(struct net_bridge *br, u16 pvid,
 			       struct netlink_ext_ack *extack);
@@ -894,8 +892,8 @@ int nbp_get_num_vlan_infos(struct net_bridge_port *p, u32 filter_mask);
 void br_vlan_get_stats(const struct net_bridge_vlan *v,
 		       struct br_vlan_stats *stats);
 void br_vlan_port_event(struct net_bridge_port *p, unsigned long event);
-void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
-			  void *ptr);
+int br_vlan_bridge_event(struct net_device *dev, unsigned long event,
+			 void *ptr);
 
 static inline struct net_bridge_vlan_group *br_vlan_group(
 					const struct net_bridge *br)
@@ -988,19 +986,10 @@ static inline int br_vlan_delete(struct net_bridge *br, u16 vid)
 	return -EOPNOTSUPP;
 }
 
-static inline void br_vlan_flush(struct net_bridge *br)
-{
-}
-
 static inline void br_recalculate_fwd_mask(struct net_bridge *br)
 {
 }
 
-static inline int br_vlan_init(struct net_bridge *br)
-{
-	return 0;
-}
-
 static inline int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags,
 			       bool *changed, struct netlink_ext_ack *extack)
 {
@@ -1085,8 +1074,8 @@ static inline void br_vlan_port_event(struct net_bridge_port *p,
 {
 }
 
-static inline void br_vlan_bridge_event(struct net_device *dev,
-					unsigned long event, void *ptr)
+static inline int br_vlan_bridge_event(struct net_device *dev,
+				       unsigned long event, void *ptr)
 {
 }
 #endif
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index a544e161c7fa..6a17408b4eb8 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -709,7 +709,7 @@ int br_vlan_delete(struct net_bridge *br, u16 vid)
 	return __vlan_del(v);
 }
 
-void br_vlan_flush(struct net_bridge *br)
+static void br_vlan_flush(struct net_bridge *br)
 {
 	struct net_bridge_vlan_group *vg;
 
@@ -721,6 +721,8 @@ void br_vlan_flush(struct net_bridge *br)
 	br_fdb_delete_by_port(br, NULL, 0, 1);
 
 	vg = br_vlan_group(br);
+	if (!vg)
+		return;
 	__vlan_flush(vg);
 	RCU_INIT_POINTER(br->vlgrp, NULL);
 	synchronize_rcu();
@@ -1054,7 +1056,7 @@ int br_vlan_set_default_pvid(struct net_bridge *br, unsigned long val)
 	return err;
 }
 
-int br_vlan_init(struct net_bridge *br)
+static int br_vlan_init(struct net_bridge *br)
 {
 	struct net_bridge_vlan_group *vg;
 	int ret = -ENOMEM;
@@ -1083,6 +1085,8 @@ int br_vlan_init(struct net_bridge *br)
 	return ret;
 
 err_vlan_add:
+	RCU_INIT_POINTER(br->vlgrp, NULL);
+	synchronize_rcu();
 	vlan_tunnel_deinit(vg);
 err_tunnel_init:
 	rhashtable_destroy(&vg->vlan_hash);
@@ -1469,13 +1473,19 @@ static void nbp_vlan_set_vlan_dev_state(struct net_bridge_port *p, u16 vid)
 }
 
 /* Must be protected by RTNL. */
-void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
-			  void *ptr)
+int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void *ptr)
 {
 	struct netdev_notifier_changeupper_info *info;
-	struct net_bridge *br;
+	struct net_bridge *br = netdev_priv(dev);
+	int ret = 0;
 
 	switch (event) {
+	case NETDEV_REGISTER:
+		ret = br_vlan_init(br);
+		break;
+	case NETDEV_UNREGISTER:
+		br_vlan_flush(br);
+		break;
 	case NETDEV_CHANGEUPPER:
 		info = ptr;
 		br_vlan_upper_change(dev, info->upper_dev, info->linking);
@@ -1483,12 +1493,13 @@ void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
 
 	case NETDEV_CHANGE:
 	case NETDEV_UP:
-		br = netdev_priv(dev);
 		if (!br_opt_get(br, BROPT_VLAN_BRIDGE_BINDING))
-			return;
+			break;
 		br_vlan_link_state_change(dev, br);
 		break;
 	}
+
+	return ret;
 }
 
 /* Must be protected by RTNL. */
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf v2] libbpf : make libbpf_num_possible_cpus function thread safe
From: Jakub Kicinski @ 2019-07-31 22:31 UTC (permalink / raw)
  To: Takshak Chahande; +Cc: netdev, ast, daniel, rdna, kernel-team, hechaol
In-Reply-To: <20190731221055.1478201-1-ctakshak@fb.com>

On Wed, 31 Jul 2019 15:10:55 -0700, Takshak Chahande wrote:
> Having static variable `cpus` in libbpf_num_possible_cpus function
> without guarding it with mutex makes this function thread-unsafe.
> 
> If multiple threads accessing this function, in the current form; it
> leads to incrementing the static variable value `cpus` in the multiple
> of total available CPUs.
> 
> Used local stack variable to calculate the number of possible CPUs and
> then updated the static variable using WRITE_ONCE().
> 
> Changes since v1:
>  * added stack variable to calculate cpus
>  * serialized static variable update using WRITE_ONCE()
>  * fixed Fixes tag
> 
> Fixes: 6446b3155521 ("bpf: add a new API libbpf_num_possible_cpus()")
> Signed-off-by: Takshak Chahande <ctakshak@fb.com>

Perhaps we would have a little less code churn if the static variable
was renamed (e.g. to saved_cpus), but functionally looks good, so:

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

(FWIW I think Andrey's comment does not apply to the networking and BPF
trees so if you respin please keep the changelog in the commit message.)

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: David Ahern @ 2019-07-31 22:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev, davem, sthemmin, mlxsw
In-Reply-To: <20190731152805.4110ec41@cakuba.netronome.com>

On 7/31/19 4:28 PM, Jakub Kicinski wrote:
> On Wed, 31 Jul 2019 16:07:31 -0600, David Ahern wrote:
>> On 7/31/19 4:02 PM, Jakub Kicinski wrote:
>>> Can you elaborate further? Ports for most purposes are represented by
>>> netdevices. Devlink port instances expose global topological view of
>>> the ports which is primarily relevant if you can see the entire ASIC.
>>> I think the global configuration and global view of resources is still
>>> the most relevant need, so in your diagram you must account for some
>>> "all-seeing" instance, e.g.:
>>>
>>>    namespace 1 |  namespace 2  | ... | namespace N
>>>                |               |     |
>>>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>>>                |               |     |
>>> subdevlink 1   | subdevlink 2  | ... |  subdevlink N
>>>          \______      |              _______/
>>>                  master ASIC devlink
>>>   =================================================
>>>                    driver
>>>
>>> No?
>>
>> sure, there could be a master devlink visible to the user if that makes
>> sense or the driver can account for it behind the scenes as the sum of
>> the devlink instances.
>>
>> The goal is to allow ports within an asic [1] to be divided across
>> network namespace where each namespace sees a subset of the ports. This
>> allows creating multiple logical switches from a single physical asic.
>>
>> [1] within constraints imposed by the driver/hardware - for example to
>> account for resources shared by a set of ports. e.g., front panel ports
>> 1 - 4 have shared resources and must always be in the same devlink instance.
> 
> So the ASIC would start out all partitioned? Presumably some would
> still like to use it non-partitioned? What follows there must be a top
> level instance to decide on partitioning, and moving resources between
> sub-instances.
> 
> Right now I don't think there is much info in devlink ports which would
> be relevant without full view of the ASIC..
> 

not sure how it would play out. really just 'thinking out loud' about
the above use case to make sure devlink with proper namespace support
allows it - or does not prevent it.

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: Jakub Kicinski @ 2019-07-31 22:28 UTC (permalink / raw)
  To: David Ahern; +Cc: Jiri Pirko, netdev, davem, sthemmin, mlxsw
In-Reply-To: <45803ed3-0328-9409-4351-6c26ba8af3cd@gmail.com>

On Wed, 31 Jul 2019 16:07:31 -0600, David Ahern wrote:
> On 7/31/19 4:02 PM, Jakub Kicinski wrote:
> > Can you elaborate further? Ports for most purposes are represented by
> > netdevices. Devlink port instances expose global topological view of
> > the ports which is primarily relevant if you can see the entire ASIC.
> > I think the global configuration and global view of resources is still
> > the most relevant need, so in your diagram you must account for some
> > "all-seeing" instance, e.g.:
> > 
> >    namespace 1 |  namespace 2  | ... | namespace N
> >                |               |     |
> >   { ports 1 }  |   { ports 2 } | ... | { ports N }
> >                |               |     |
> > subdevlink 1   | subdevlink 2  | ... |  subdevlink N
> >          \______      |              _______/
> >                  master ASIC devlink
> >   =================================================
> >                    driver
> > 
> > No?
> 
> sure, there could be a master devlink visible to the user if that makes
> sense or the driver can account for it behind the scenes as the sum of
> the devlink instances.
> 
> The goal is to allow ports within an asic [1] to be divided across
> network namespace where each namespace sees a subset of the ports. This
> allows creating multiple logical switches from a single physical asic.
> 
> [1] within constraints imposed by the driver/hardware - for example to
> account for resources shared by a set of ports. e.g., front panel ports
> 1 - 4 have shared resources and must always be in the same devlink instance.

So the ASIC would start out all partitioned? Presumably some would
still like to use it non-partitioned? What follows there must be a top
level instance to decide on partitioning, and moving resources between
sub-instances.

Right now I don't think there is much info in devlink ports which would
be relevant without full view of the ASIC..

^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Joe Perches @ 2019-07-31 22:23 UTC (permalink / raw)
  To: Neil Horman
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <20190731205804.GE9823@hmswarspite.think-freely.org>

On Wed, 2019-07-31 at 16:58 -0400, Neil Horman wrote:
> On Wed, Jul 31, 2019 at 09:35:31AM -0700, Joe Perches wrote:
> > On Wed, 2019-07-31 at 08:16 -0400, Neil Horman wrote:
> > > On Wed, Jul 31, 2019 at 04:32:43AM -0700, Joe Perches wrote:
> > > > On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> > > > > On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > > > > > fallthrough may become a pseudo reserved keyword so this only use of
> > > > > > fallthrough is better renamed to allow it.
> > > > > > 
> > > > > > Signed-off-by: Joe Perches <joe@perches.com>
> > > > > Are you referring to the __attribute__((fallthrough)) statement that gcc
> > > > > supports?  If so the compiler should by all rights be able to differentiate
> > > > > between a null statement attribute and a explicit goto and label without the
> > > > > need for renaming here.  Or are you referring to something else?
> > > > 
> > > > Hi.
> > > > 
> > > > I sent after this a patch that adds
> > > > 
> > > > # define fallthrough                    __attribute__((__fallthrough__))
> > > > 
> > > > https://lore.kernel.org/patchwork/patch/1108577/
> > > > 
> > > > So this rename is a prerequisite to adding this #define.
> > > > 
> > > why not just define __fallthrough instead, like we do for all the other
> > > attributes we alias (i.e. __read_mostly, __protected_by, __unused, __exception,
> > > etc)
> > 
> > Because it's not as intelligible when used as a statement.
> I think thats somewhat debatable.  __fallthrough to me looks like an internal
> macro, whereas fallthrough looks like a comment someone forgot to /* */


I'd rather see:

	switch (foo) {
	case FOO:
		bar |= baz;
		fallthrough;
	case BAR:
		bar |= qux;
		break;
	default:
		error();
	}

than

	switch (foo) {
	case FOO:
		bar |= baz;
		__fallthrough;
	case BAR:
		bar |= qux;
		break;
	default:
		error();
	}

or esoecially

	switch (foo) {
	case FOO:
		bar |= baz;
		/* fallthrough
*/;
	case BAR:
		bar |= qux;
		break;
	default:
		error();
	}

but <shrug>, bikeshed ahoy!...



^ permalink raw reply

* Re: [Potential Spoof] [PATCH bpf v2] libbpf : make libbpf_num_possible_cpus function thread safe
From: Andrey Ignatov @ 2019-07-31 22:17 UTC (permalink / raw)
  To: Takshak Chahande
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Kernel Team, Hechao Li, jakub.kicinski@netronome.com
In-Reply-To: <20190731221055.1478201-1-ctakshak@fb.com>

Takshak Chahande <ctakshak@fb.com> [Wed, 2019-07-31 15:11 -0700]:
> Having static variable `cpus` in libbpf_num_possible_cpus function
> without guarding it with mutex makes this function thread-unsafe.
> 
> If multiple threads accessing this function, in the current form; it
> leads to incrementing the static variable value `cpus` in the multiple
> of total available CPUs.
> 
> Used local stack variable to calculate the number of possible CPUs and
> then updated the static variable using WRITE_ONCE().
> 
> Changes since v1:
>  * added stack variable to calculate cpus
>  * serialized static variable update using WRITE_ONCE()
>  * fixed Fixes tag

This "Changes" section should be after "---" line not to be included in
the final commit message.

Not sure if resubmit is needed because of it, but other than this looks
good to me.

Acked-by: Andrey Ignatov <rdna@fb.com>


> Fixes: 6446b3155521 ("bpf: add a new API libbpf_num_possible_cpus()")
> Signed-off-by: Takshak Chahande <ctakshak@fb.com>
> ---
>  tools/lib/bpf/libbpf.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 6718d0b90130..2e84fa5b8479 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4995,13 +4995,15 @@ int libbpf_num_possible_cpus(void)
>  	static const char *fcpu = "/sys/devices/system/cpu/possible";
>  	int len = 0, n = 0, il = 0, ir = 0;
>  	unsigned int start = 0, end = 0;
> +	int tmp_cpus = 0;
>  	static int cpus;
>  	char buf[128];
>  	int error = 0;
>  	int fd = -1;
>  
> -	if (cpus > 0)
> -		return cpus;
> +	tmp_cpus = READ_ONCE(cpus);
> +	if (tmp_cpus > 0)
> +		return tmp_cpus;
>  
>  	fd = open(fcpu, O_RDONLY);
>  	if (fd < 0) {
> @@ -5024,7 +5026,7 @@ int libbpf_num_possible_cpus(void)
>  	}
>  	buf[len] = '\0';
>  
> -	for (ir = 0, cpus = 0; ir <= len; ir++) {
> +	for (ir = 0, tmp_cpus = 0; ir <= len; ir++) {
>  		/* Each sub string separated by ',' has format \d+-\d+ or \d+ */
>  		if (buf[ir] == ',' || buf[ir] == '\0') {
>  			buf[ir] = '\0';
> @@ -5036,13 +5038,15 @@ int libbpf_num_possible_cpus(void)
>  			} else if (n == 1) {
>  				end = start;
>  			}
> -			cpus += end - start + 1;
> +			tmp_cpus += end - start + 1;
>  			il = ir + 1;
>  		}
>  	}
> -	if (cpus <= 0) {
> -		pr_warning("Invalid #CPUs %d from %s\n", cpus, fcpu);
> +	if (tmp_cpus <= 0) {
> +		pr_warning("Invalid #CPUs %d from %s\n", tmp_cpus, fcpu);
>  		return -EINVAL;
>  	}
> -	return cpus;
> +
> +	WRITE_ONCE(cpus, tmp_cpus);
> +	return tmp_cpus;
>  }
> -- 
> 2.17.1
> 

-- 
Andrey Ignatov

^ permalink raw reply

* [PATCH V37 23/29] bpf: Restrict bpf when kernel lockdown is in confidentiality mode
From: Matthew Garrett @ 2019-07-31 22:16 UTC (permalink / raw)
  To: jmorris
  Cc: linux-security-module, linux-kernel, linux-api, David Howells,
	Alexei Starovoitov, Matthew Garrett, Kees Cook, netdev,
	Chun-Yi Lee, Daniel Borkmann
In-Reply-To: <20190731221617.234725-1-matthewgarrett@google.com>

From: David Howells <dhowells@redhat.com>

bpf_read() and bpf_read_str() could potentially be abused to (eg) allow
private keys in kernel memory to be leaked. Disable them if the kernel
has been locked down in confidentiality mode.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee <jlee@suse.com>
cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/security.h     |  1 +
 kernel/trace/bpf_trace.c     | 10 ++++++++++
 security/lockdown/lockdown.c |  1 +
 3 files changed, 12 insertions(+)

diff --git a/include/linux/security.h b/include/linux/security.h
index 987d8427f091..8dd1741a52cd 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -118,6 +118,7 @@ enum lockdown_reason {
 	LOCKDOWN_INTEGRITY_MAX,
 	LOCKDOWN_KCORE,
 	LOCKDOWN_KPROBES,
+	LOCKDOWN_BPF_READ,
 	LOCKDOWN_CONFIDENTIALITY_MAX,
 };
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..492a8bfaae98 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -142,8 +142,13 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
 {
 	int ret;
 
+	ret = security_locked_down(LOCKDOWN_BPF_READ);
+	if (ret < 0)
+		goto out;
+
 	ret = probe_kernel_read(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
+out:
 		memset(dst, 0, size);
 
 	return ret;
@@ -569,6 +574,10 @@ BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
 {
 	int ret;
 
+	ret = security_locked_down(LOCKDOWN_BPF_READ);
+	if (ret < 0)
+		goto out;
+
 	/*
 	 * The strncpy_from_unsafe() call will likely not fill the entire
 	 * buffer, but that's okay in this circumstance as we're probing
@@ -580,6 +589,7 @@ BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
 	 */
 	ret = strncpy_from_unsafe(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
+out:
 		memset(dst, 0, size);
 
 	return ret;
diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
index 6b123cbf3748..1b89d3e8e54d 100644
--- a/security/lockdown/lockdown.c
+++ b/security/lockdown/lockdown.c
@@ -33,6 +33,7 @@ static char *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
 	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
 	[LOCKDOWN_KCORE] = "/proc/kcore access",
 	[LOCKDOWN_KPROBES] = "use of kprobes",
+	[LOCKDOWN_BPF_READ] = "use of bpf to read kernel RAM",
 	[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
 };
 
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* [PATCH bpf v2] libbpf : make libbpf_num_possible_cpus function thread safe
From: Takshak Chahande @ 2019-07-31 22:10 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, rdna, ctakshak, kernel-team, hechaol, jakub.kicinski

Having static variable `cpus` in libbpf_num_possible_cpus function
without guarding it with mutex makes this function thread-unsafe.

If multiple threads accessing this function, in the current form; it
leads to incrementing the static variable value `cpus` in the multiple
of total available CPUs.

Used local stack variable to calculate the number of possible CPUs and
then updated the static variable using WRITE_ONCE().

Changes since v1:
 * added stack variable to calculate cpus
 * serialized static variable update using WRITE_ONCE()
 * fixed Fixes tag

Fixes: 6446b3155521 ("bpf: add a new API libbpf_num_possible_cpus()")
Signed-off-by: Takshak Chahande <ctakshak@fb.com>
---
 tools/lib/bpf/libbpf.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6718d0b90130..2e84fa5b8479 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4995,13 +4995,15 @@ int libbpf_num_possible_cpus(void)
 	static const char *fcpu = "/sys/devices/system/cpu/possible";
 	int len = 0, n = 0, il = 0, ir = 0;
 	unsigned int start = 0, end = 0;
+	int tmp_cpus = 0;
 	static int cpus;
 	char buf[128];
 	int error = 0;
 	int fd = -1;
 
-	if (cpus > 0)
-		return cpus;
+	tmp_cpus = READ_ONCE(cpus);
+	if (tmp_cpus > 0)
+		return tmp_cpus;
 
 	fd = open(fcpu, O_RDONLY);
 	if (fd < 0) {
@@ -5024,7 +5026,7 @@ int libbpf_num_possible_cpus(void)
 	}
 	buf[len] = '\0';
 
-	for (ir = 0, cpus = 0; ir <= len; ir++) {
+	for (ir = 0, tmp_cpus = 0; ir <= len; ir++) {
 		/* Each sub string separated by ',' has format \d+-\d+ or \d+ */
 		if (buf[ir] == ',' || buf[ir] == '\0') {
 			buf[ir] = '\0';
@@ -5036,13 +5038,15 @@ int libbpf_num_possible_cpus(void)
 			} else if (n == 1) {
 				end = start;
 			}
-			cpus += end - start + 1;
+			tmp_cpus += end - start + 1;
 			il = ir + 1;
 		}
 	}
-	if (cpus <= 0) {
-		pr_warning("Invalid #CPUs %d from %s\n", cpus, fcpu);
+	if (tmp_cpus <= 0) {
+		pr_warning("Invalid #CPUs %d from %s\n", tmp_cpus, fcpu);
 		return -EINVAL;
 	}
-	return cpus;
+
+	WRITE_ONCE(cpus, tmp_cpus);
+	return tmp_cpus;
 }
-- 
2.17.1


^ permalink raw reply related

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: David Ahern @ 2019-07-31 22:07 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev, davem, sthemmin, mlxsw
In-Reply-To: <20190731150233.432d3c86@cakuba.netronome.com>

On 7/31/19 4:02 PM, Jakub Kicinski wrote:
> On Wed, 31 Jul 2019 15:50:26 -0600, David Ahern wrote:
>> On 7/30/19 12:08 AM, Jiri Pirko wrote:
>>> Mon, Jul 29, 2019 at 10:17:25PM CEST, dsahern@gmail.com wrote:  
>>>> On 7/27/19 3:44 AM, Jiri Pirko wrote:  
>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>> Devlink from the beginning counts with network namespaces, but the
>>>>> instances has been fixed to init_net. The first patch allows user
>>>>> to move existing devlink instances into namespaces:
>>>>>  
>>>>
>>>> so you intend for an asic, for example, to have multiple devlink
>>>> instances where each instance governs a set of related ports (e.g.,
>>>> ports that share a set of hardware resources) and those instances can be
>>>> managed from distinct network namespaces?  
>>>
>>> No, no multiple devlink instances for asic intended.  
>>
>> So it should be allowed for an asic to have resources split across
>> network namespaces. e.g., something like this:
>>
>>    namespace 1 |  namespace 2  | ... | namespace N
>>                |               |     |
>>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>>                |               |     |
>>    devlink 1   |    devlink 2  | ... |  devlink N
>>   =================================================
>>                    driver
> 
> Can you elaborate further? Ports for most purposes are represented by
> netdevices. Devlink port instances expose global topological view of
> the ports which is primarily relevant if you can see the entire ASIC.
> I think the global configuration and global view of resources is still
> the most relevant need, so in your diagram you must account for some
> "all-seeing" instance, e.g.:
> 
>    namespace 1 |  namespace 2  | ... | namespace N
>                |               |     |
>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>                |               |     |
> subdevlink 1   | subdevlink 2  | ... |  subdevlink N
>          \______      |              _______/
>                  master ASIC devlink
>   =================================================
>                    driver
> 
> No?
> 

sure, there could be a master devlink visible to the user if that makes
sense or the driver can account for it behind the scenes as the sum of
the devlink instances.

The goal is to allow ports within an asic [1] to be divided across
network namespace where each namespace sees a subset of the ports. This
allows creating multiple logical switches from a single physical asic.

[1] within constraints imposed by the driver/hardware - for example to
account for resources shared by a set of ports. e.g., front panel ports
1 - 4 have shared resources and must always be in the same devlink instance.

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: Jakub Kicinski @ 2019-07-31 22:02 UTC (permalink / raw)
  To: David Ahern; +Cc: Jiri Pirko, netdev, davem, sthemmin, mlxsw
In-Reply-To: <8f5afc58-1cbc-9e9a-aa15-94d1bafcda22@gmail.com>

On Wed, 31 Jul 2019 15:50:26 -0600, David Ahern wrote:
> On 7/30/19 12:08 AM, Jiri Pirko wrote:
> > Mon, Jul 29, 2019 at 10:17:25PM CEST, dsahern@gmail.com wrote:  
> >> On 7/27/19 3:44 AM, Jiri Pirko wrote:  
> >>> From: Jiri Pirko <jiri@mellanox.com>
> >>>
> >>> Devlink from the beginning counts with network namespaces, but the
> >>> instances has been fixed to init_net. The first patch allows user
> >>> to move existing devlink instances into namespaces:
> >>>  
> >>
> >> so you intend for an asic, for example, to have multiple devlink
> >> instances where each instance governs a set of related ports (e.g.,
> >> ports that share a set of hardware resources) and those instances can be
> >> managed from distinct network namespaces?  
> > 
> > No, no multiple devlink instances for asic intended.  
> 
> So it should be allowed for an asic to have resources split across
> network namespaces. e.g., something like this:
> 
>    namespace 1 |  namespace 2  | ... | namespace N
>                |               |     |
>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>                |               |     |
>    devlink 1   |    devlink 2  | ... |  devlink N
>   =================================================
>                    driver

Can you elaborate further? Ports for most purposes are represented by
netdevices. Devlink port instances expose global topological view of
the ports which is primarily relevant if you can see the entire ASIC.
I think the global configuration and global view of resources is still
the most relevant need, so in your diagram you must account for some
"all-seeing" instance, e.g.:

   namespace 1 |  namespace 2  | ... | namespace N
               |               |     |
  { ports 1 }  |   { ports 2 } | ... | { ports N }
               |               |     |
subdevlink 1   | subdevlink 2  | ... |  subdevlink N
         \______      |              _______/
                 master ASIC devlink
  =================================================
                   driver

No?

^ permalink raw reply

* Re: [PATCH 0/2] tools: bpftool: add net (un)load command to load XDP
From: David Ahern @ 2019-07-31 21:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel T. Lee, Stephen Hemminger,
	Jesper Dangaard Brouer, John Fastabend, Daniel Borkmann,
	Alexei Starovoitov, David Miller, netdev
In-Reply-To: <20190730182144.1355bf50@cakuba.netronome.com>

On 7/30/19 7:21 PM, Jakub Kicinski wrote:
> 
>>>> If bpftool was taught to do equivalent of 'ip link' that would be
>>>> very different story and I would be opposed to that.  
>>> Yes, that'd be pretty clear cut, only the XDP stuff is a bit more 
>>> of a judgement call.  
>> bpftool must be able to introspect every aspect of bpf programming.
>> That includes detaching and attaching anywhere.
>> Anyone doing 'bpftool p s' should be able to switch off particular
>> prog id without learning ten different other tools.
> I think the fact that we already have an implementation in iproute2,
> which is at the risk of bit rot is more important to me that the
> hypothetical scenario where everyone knows to just use bpftool (for
> XDP, for TC it's still iproute2 unless there's someone crazy enough 
> to reimplement the TC functionality :))

apparently the iproute2 version has bit rot which is a shame.

> 
> I'm not sure we can settle our differences over email :)
> I have tremendous respect for all the maintainers I CCed here, 
> if nobody steps up to agree with me I'll concede the bpftool net
> battle entirely :)

bpftool started as an introspection tool and has turned into a one stop
shop for all things ebpf. I am mixed on whether that is a good thing or
a bad thing.

^ permalink raw reply

* Re: [PATCH] net: mdio-octeon: Fix build error and Kconfig warning
From: Randy Dunlap @ 2019-07-31 21:55 UTC (permalink / raw)
  To: Nathan Chancellor, davem
  Cc: andrew, broonie, devel, f.fainelli, gregkh, hkallweit1,
	kernel-build-reports, linux-arm-kernel, linux-next, netdev, willy,
	kbuild test robot
In-Reply-To: <20190731185023.20954-1-natechancellor@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3444 bytes --]

On 7/31/19 11:50 AM, Nathan Chancellor wrote:
> arm allyesconfig warns:
> 
> WARNING: unmet direct dependencies detected for MDIO_OCTEON
>   Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=y] && MDIO_BUS [=y]
> && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=y]
>   Selected by [y]:
>   - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC &&
> NETDEVICES [=y] || COMPILE_TEST [=y])
> 
> and errors:
> 
> In file included from ../drivers/net/phy/mdio-octeon.c:14:
> ../drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_probe':
> ../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of
> function 'writeq'; did you mean 'writeb'?
> [-Werror=implicit-function-declaration]
>   111 | #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
>       |                                    ^~~~~~
> ../drivers/net/phy/mdio-octeon.c:56:2: note: in expansion of macro
> 'oct_mdio_writeq'
>    56 |  oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
>       |  ^~~~~~~~~~~~~~~
> cc1: some warnings being treated as errors
> 
> This allows MDIO_OCTEON to be built with COMPILE_TEST as well and
> includes the proper header for readq/writeq. This does not address
> the several -Wint-to-pointer-cast and -Wpointer-to-int-cast warnings
> that appeared as a result of commit 171a9bae68c7 ("staging/octeon:
> Allow test build on !MIPS") in these files.
> 
> Fixes: 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS")
> Reported-by: kbuild test robot <lkp@intel.com>
> Reported-by: Mark Brown <broonie@kernel.org>
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>


With today's linux-next (20190731), I am still seeing a Kconfig warning and
build errors (building for i386):

and applying Greg's "depends on NETDEVICES" patch and this patch:

WARNING: unmet direct dependencies detected for MDIO_OCTEON
  Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=m] && MDIO_BUS [=m] && (64BIT [=n] || COMPILE_TEST [=y]) && HAS_IOMEM [=y] && OF_MDIO [=n]
  Selected by [m]:
  - OCTEON_ETHERNET [=m] && STAGING [=y] && (CAVIUM_OCTEON_SOC || COMPILE_TEST [=y]) && NETDEVICES [=y]

ERROR: "cavium_mdiobus_write" [drivers/net/phy/mdio-octeon.ko] undefined!
ERROR: "cavium_mdiobus_read" [drivers/net/phy/mdio-octeon.ko] undefined!


kernel .config file is attached.

Am I missing another patch?

thanks.

> ---
>  drivers/net/phy/Kconfig       | 2 +-
>  drivers/net/phy/mdio-cavium.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index 20f14c5fbb7e..ed2edf4b5b0e 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -159,7 +159,7 @@ config MDIO_MSCC_MIIM
>  
>  config MDIO_OCTEON
>  	tristate "Octeon and some ThunderX SOCs MDIO buses"
> -	depends on 64BIT
> +	depends on 64BIT || COMPILE_TEST
>  	depends on HAS_IOMEM && OF_MDIO
>  	select MDIO_CAVIUM
>  	help
> diff --git a/drivers/net/phy/mdio-cavium.h b/drivers/net/phy/mdio-cavium.h
> index ed5f9bb5448d..b7f89ad27465 100644
> --- a/drivers/net/phy/mdio-cavium.h
> +++ b/drivers/net/phy/mdio-cavium.h
> @@ -108,6 +108,8 @@ static inline u64 oct_mdio_readq(u64 addr)
>  	return cvmx_read_csr(addr);
>  }
>  #else
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +
>  #define oct_mdio_writeq(val, addr)	writeq(val, (void *)addr)
>  #define oct_mdio_readq(addr)		readq((void *)addr)
>  #endif
> 


-- 
~Randy

[-- Attachment #2: cavium.i386.config --]
[-- Type: application/x-config, Size: 107345 bytes --]

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: David Ahern @ 2019-07-31 21:50 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, jakub.kicinski, sthemmin, mlxsw
In-Reply-To: <20190730060817.GD2312@nanopsycho.orion>

On 7/30/19 12:08 AM, Jiri Pirko wrote:
> Mon, Jul 29, 2019 at 10:17:25PM CEST, dsahern@gmail.com wrote:
>> On 7/27/19 3:44 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Devlink from the beginning counts with network namespaces, but the
>>> instances has been fixed to init_net. The first patch allows user
>>> to move existing devlink instances into namespaces:
>>>
>>
>> so you intend for an asic, for example, to have multiple devlink
>> instances where each instance governs a set of related ports (e.g.,
>> ports that share a set of hardware resources) and those instances can be
>> managed from distinct network namespaces?
> 
> No, no multiple devlink instances for asic intended.

So it should be allowed for an asic to have resources split across
network namespaces. e.g., something like this:

   namespace 1 |  namespace 2  | ... | namespace N
               |               |     |
  { ports 1 }  |   { ports 2 } | ... | { ports N }
               |               |     |
   devlink 1   |    devlink 2  | ... |  devlink N
  =================================================
                   driver


^ permalink raw reply

* Re: [PATCH net v3] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: David Ahern @ 2019-07-31 21:42 UTC (permalink / raw)
  To: Su Yanjun, davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel
In-Reply-To: <1564537972-76503-1-git-send-email-suyj.fnst@cn.fujitsu.com>

On 7/30/19 7:52 PM, Su Yanjun wrote:
> When the egress interface does not have a link local address, it can
> not communicate with other hosts.
> 
> In RFC4861, 7.2.2 says
> "If the source address of the packet prompting the solicitation is the
> same as one of the addresses assigned to the outgoing interface, that
> address SHOULD be placed in the IP Source Address of the outgoing
> solicitation.  Otherwise, any one of the addresses assigned to the
> interface should be used."
> 
> In this patch we try get a global address if we get ll address failed.
> 
> Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
> ---
> Changes since V2:
> 	- Let banned_flags under the scope of its use.
> ---
>  include/net/addrconf.h |  2 ++
>  net/ipv6/addrconf.c    | 34 ++++++++++++++++++++++++++++++++++
>  net/ipv6/ndisc.c       | 10 +++++++---
>  3 files changed, 43 insertions(+), 3 deletions(-)
> 


This change looks fine to me given the RFC reference, so for that part:
Reviewed-by: David Ahern <dsahern@gmail.com>

Bigger picture is the issue Mark raised that a different RFC says all
links should have an LLA, so use of IN6_ADDR_GEN_MODE_NONE means
userspace is expected to create and add the LLA. Lack of an LLA is a
misconfigured system. If that is enforced via some to be developed
patch, then this patch would not be needed.

^ permalink raw reply

* Re: KASAN: use-after-free Read in nr_rx_frame (2)
From: syzbot @ 2019-07-31 21:32 UTC (permalink / raw)
  To: davem, dvyukov, linux-hams, linux-kernel, netdev, ralf,
	syzkaller-bugs, xiyou.wangcong
In-Reply-To: <000000000000e42667058e554371@google.com>

syzbot has bisected this bug to:

commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jun 27 21:30:58 2019 +0000

     netrom: fix a memory leak in nr_rx_frame()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=122ddaec600000
start commit:   629f8205 Merge tag 'for-linus-20190730' of git://git.kerne..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=112ddaec600000
console output: https://syzkaller.appspot.com/x/log.txt?x=162ddaec600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e397351d2615e10
dashboard link: https://syzkaller.appspot.com/bug?extid=701728447042217b67c1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14a6e008600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11937d92600000

Reported-by: syzbot+701728447042217b67c1@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* [PATCH net] net: phy: fix race in genphy_update_link
From: Heiner Kallweit @ 2019-07-31 21:05 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: netdev@vger.kernel.org, liuyonglong

In phy_start_aneg() autoneg is started, and immediately after that
link and autoneg status are read. As reported in [0] it can happen that
at time of this read the PHY has reset the "aneg complete" bit but not
yet the "link up" bit, what can result in a false link-up detection.
To fix this don't report link as up if we're in aneg mode and PHY
doesn't signal "aneg complete".

[0] https://marc.info/?t=156413509900003&r=1&w=2

Fixes: 4950c2ba49cc ("net: phy: fix autoneg mismatch case in genphy_read_status")
Reported-by: liuyonglong <liuyonglong@huawei.com>
Tested-by: liuyonglong <liuyonglong@huawei.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 6b5cb87f3..7ddd91df9 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
 	phydev->link = status & BMSR_LSTATUS ? 1 : 0;
 	phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
 
+	/* Consider the case that autoneg was started and "aneg complete"
+	 * bit has been reset, but "link up" bit not yet.
+	 */
+	if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
+		phydev->link = 0;
+
 	return 0;
 }
 EXPORT_SYMBOL(genphy_update_link);
-- 
2.22.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox