Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next] selftests: pmtu: Minimum MTU for vti6 is 68
From: Stefano Brivio @ 2018-04-26 17:41 UTC (permalink / raw)
  To: David S . Miller
  Cc: Steffen Klassert, Xin Long, Alexey Kodanev, Jarod Wilson,
	Sabrina Dubroca, netdev

A vti6 interface can carry IPv4 packets too.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tools/testing/selftests/net/pmtu.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 1e428781a625..7651fd4d86fe 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -368,7 +368,7 @@ test_pmtu_vti6_link_add_mtu() {
 
 	fail=0
 
-	min=1280
+	min=68			# vti6 can carry IPv4 packets too
 	max=$((65535 - 40))
 	# Check invalid values first
 	for v in $((min - 1)) $((max + 1)); do
@@ -384,7 +384,7 @@ test_pmtu_vti6_link_add_mtu() {
 	done
 
 	# Now check valid values
-	for v in 1280 1300 $((65535 - 40)); do
+	for v in 68 1280 1300 $((65535 - 40)); do
 		${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
 		mtu="$(link_get_mtu "${ns_a}" vti6_a)"
 		${ns_a} ip link del vti6_a
-- 
2.15.1

^ permalink raw reply related

* [PATCH ipsec] vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too
From: Stefano Brivio @ 2018-04-26 17:39 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Xin Long, Alexey Kodanev, Jarod Wilson, Sabrina Dubroca, netdev

A vti6 interface can carry IPv4 as well, so it makes no sense to
enforce a minimum MTU of IPV6_MIN_MTU.

If the user sets an MTU below IPV6_MIN_MTU, IPv6 will be
disabled on the interface, courtesy of addrconf_notify().

Reported-by: Xin Long <lucien.xin@gmail.com>
Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking")
Fixes: c6741fbed6dc ("vti6: Properly adjust vti6 MTU from MTU of lower device")
Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/ipv6/ip6_vti.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index c214ffec02f0..ca957dd93a29 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -669,7 +669,7 @@ static void vti6_link_config(struct ip6_tnl *t, bool keep_mtu)
 	else
 		mtu = ETH_DATA_LEN - LL_MAX_HEADER - sizeof(struct ipv6hdr);
 
-	dev->mtu = max_t(int, mtu, IPV6_MIN_MTU);
+	dev->mtu = max_t(int, mtu, IPV4_MIN_MTU);
 }
 
 /**
@@ -881,7 +881,7 @@ static void vti6_dev_setup(struct net_device *dev)
 	dev->priv_destructor = vti6_dev_free;
 
 	dev->type = ARPHRD_TUNNEL6;
-	dev->min_mtu = IPV6_MIN_MTU;
+	dev->min_mtu = IPV4_MIN_MTU;
 	dev->max_mtu = IP_MAX_MTU - sizeof(struct ipv6hdr);
 	dev->flags |= IFF_NOARP;
 	dev->addr_len = sizeof(struct in6_addr);
-- 
2.15.1

^ permalink raw reply related

* RE: [PATCH net-next] microchipT1phy: Add driver for Microchip LAN87XX T1 PHYs
From: Nisar.Sayed @ 2018-04-26 17:24 UTC (permalink / raw)
  To: andrew; +Cc: f.fainelli, davem, UNGLinuxDriver, netdev
In-Reply-To: <20180426122743.GB13467@lunn.ch>

Hi Andrew,

> > Fine, will change the filename.
> 
> > The reason for moving to separate file is that we have a series of
> > T1 standard PHYs, which support cable diagnostics, signal quality
> > indicator(SQI) and sleep and wakeup (TC10) support etc. we planned to
> > keep all T1 standard PHYs separate to support additional features
> > supported by these PHYs.
> 
> Is there anything shared with the other microchip PHYs? If there is potential
> for code sharing, you should do it.

Yes, there will be no code sharing between existing microchip PHYs and the newly getting added T1 phys.

> 
> > > > + */
> > > > +#ifndef _MICROCHIPT1PHY_H_
> > > > +#define _MICROCHIPT1PHY_H_
> > > > +
> > > > +/* Interrupt Source Register */
> > > > +#define LAN87XX_INTERRUPT_SOURCE                (0x18)
> > > > +
> > > > +/* Interrupt Mask Register */
> > > > +#define LAN87XX_INTERRUPT_MASK                  (0x19)
> > > > +#define LAN87XX_MASK_LINK_UP                    (0x0004)
> > > > +#define LAN87XX_MASK_LINK_DOWN                  (0x0002)
> > >
> > > What's the point of that header file if all definitions are consumed
> > > by the same driver?
> > >
> >
> > We have planned a series of patches where we planned to use this further.
> 
> Are you adding multiple files which share the header? If not, just add the
> defines to the C code.
> 
>     Andrew

We have a plan, I think as you suggested better to go with defines in C codes itself now.
Maybe we can create/move during future submissions.

- Nisar

^ permalink raw reply

* Re: [PATCH v3 net-next 1/3] lan78xx: Lan7801 Support for Fixed PHY
From: Florian Fainelli @ 2018-04-26 17:24 UTC (permalink / raw)
  To: Raghuram Chary J, davem; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180426171114.19753-2-raghuramchary.jallipalli@microchip.com>

On 04/26/2018 10:11 AM, Raghuram Chary J wrote:
> Adding Fixed PHY support to the lan78xx driver.
> 
> Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
> ---
> v0->v1:
>    * Remove driver version #define
>    * Modify netdev_info to netdev_dbg
>    * Move lan7801 specific to new routine and add switch case
>    * Minor cleanup
> 
> v1->v2:
>    * Removed fixedphy variable and used phy_is_pseudo_fixed_link() check.
> v2->v3:
>    * Revert driver version, debug statment changes for separate patch.
>    * Modify lan7801 specific routine with return type struct phy_device.
> ---
>  drivers/net/usb/Kconfig   |   1 +
>  drivers/net/usb/lan78xx.c | 103 ++++++++++++++++++++++++++++++++--------------
>  2 files changed, 74 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
> index f28bd74ac275..418b0904cecb 100644
> --- a/drivers/net/usb/Kconfig
> +++ b/drivers/net/usb/Kconfig
> @@ -111,6 +111,7 @@ config USB_LAN78XX
>  	select MII
>  	select PHYLIB
>  	select MICROCHIP_PHY
> +	select FIXED_PHY
>  	help
>  	  This option adds support for Microchip LAN78XX based USB 2
>  	  & USB 3 10/100/1000 Ethernet adapters.
> diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
> index 0867f7275852..ef169d73fadc 100644
> --- a/drivers/net/usb/lan78xx.c
> +++ b/drivers/net/usb/lan78xx.c
> @@ -36,7 +36,7 @@
>  #include <linux/irq.h>
>  #include <linux/irqchip/chained_irq.h>
>  #include <linux/microchipphy.h>
> -#include <linux/phy.h>
> +#include <linux/phy_fixed.h>
>  #include "lan78xx.h"
>  
>  #define DRIVER_AUTHOR	"WOOJUNG HUH <woojung.huh@microchip.com>"
> @@ -2003,52 +2003,90 @@ static int ksz9031rnx_fixup(struct phy_device *phydev)
>  	return 1;
>  }
>  
> -static int lan78xx_phy_init(struct lan78xx_net *dev)
> +static struct phy_device *lan7801_phy_init(struct phy_device *phydev,
> +					   struct lan78xx_net *dev)

This is confusing, why cannot you just pass a struct lan78xx_net
reference and return a phy_device reference in turn if either:

- phy_find_first() succeeds, or
- you registered a fixed PHY device?

To get there, just restructure how lan78xx_phy_init() does things by
moving the phy_find_first() under the switch()/case for
ID_REV_CHIP_ID_7800_ and ID_REV_CHIP_ID_7850_ and then you can have
lan7801_phy_init() do the phy_find_first().

Or better just merge lan78xx_phy_init() with lan7801_phy_init() maybe?
-- 
Florian

^ permalink raw reply

* Re: [PATCH v3 net-next 2/3] lan78xx: Remove DRIVER_VERSION for lan78xx driver
From: Florian Fainelli @ 2018-04-26 17:20 UTC (permalink / raw)
  To: Raghuram Chary J, davem; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180426171114.19753-3-raghuramchary.jallipalli@microchip.com>

On 04/26/2018 10:11 AM, Raghuram Chary J wrote:
> Remove driver version info from the lan78xx driver.
> 
> Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH v3 net-next 3/3] lan78xx: Modify error messages
From: Florian Fainelli @ 2018-04-26 17:20 UTC (permalink / raw)
  To: Raghuram Chary J, davem; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180426171114.19753-4-raghuramchary.jallipalli@microchip.com>

On 04/26/2018 10:11 AM, Raghuram Chary J wrote:
> Modify the error messages when phy registration fails.
> 
> Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 1/2 v2] netns: restrict uevents
From: Eric W. Biederman @ 2018-04-26 17:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Miller, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <20180426170324.GA10061@gmail.com>

Christian Brauner <christian.brauner@canonical.com> writes:

> On Thu, Apr 26, 2018 at 11:47:19AM -0500, Eric W. Biederman wrote:
>> Christian Brauner <christian.brauner@canonical.com> writes:
>> 
>> > On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
>> >> Christian Brauner <christian.brauner@canonical.com> writes:
>> >> 
>> >> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >> >
>> >> >  Bah. This code is obviously correct and probably wrong.
>> >> >
>> >> >  How do we deliver uevents for network devices that are outside of the
>> >> >  initial user namespace? The kernel still needs to deliver those.
>> >> >
>> >> >  The logic to figure out which network namespace a device needs to be
>> >> >  delivered to is is present in kobj_bcast_filter. That logic will almost
>> >> >  certainly need to be turned inside out. Sign not as easy as I would
>> >> >  have hoped.
>> >> >
>> >> > My first patch that we discussed put additional filtering logic into kobj_bcast_filter for that very reason. But I can move that logic
>> >> > out and come up with a new patch.
>> >> 
>> >> I may have mis-understood.
>> >> 
>> >> I heard and am still hearing additional filtering to reduce the places
>> >> the packet is delievered.
>> >> 
>> >> I am saying something needs to change to increase the number of places
>> >> the packet is delivered.
>> >> 
>> >> For the special class of devices that kobj_bcast_filter would apply to
>> >> those need to be delivered to netowrk namespaces  that are no longer on
>> >> uevent_sock_list.
>> >> 
>> >> So the code fundamentally needs to split into two paths.  Ordinary
>> >> devices that use uevent_sock_list.  Network devices that are just
>> >> delivered in their own network namespace.
>> >> 
>> >> netlink_broadcast_filtered gets to go away completely.
>> >
>> > The split *might* make sense but I think you're wrong about removing the
>> > kobj_bcast_filter. The current filter doesn't operate on the uevent
>> > socket in uevent_sock_list itself it rather operates on the sockets in
>> > mc_list. And if socket in mc_list can have a different network namespace
>> > then the uevent_socket itself then your way won't work. That's why my
>> > original patch added additional filtering in there. The way I see it we
>> > need something like:
>> 
>> We already filter the sockets in the mc_list by network namespace.
>
> Oh really? That's good to know. I haven't found where in the code this
> actually happens. I thought that when netlink_bind() is called anyone
> could register themselves in mc_list.

The code in af_netlink.c does:
> static void do_one_broadcast(struct sock *sk,
> 				    struct netlink_broadcast_data *p)
> {
> 	struct netlink_sock *nlk = nlk_sk(sk);
> 	int val;
> 
> 	if (p->exclude_sk == sk)
> 		return;
> 
> 	if (nlk->portid == p->portid || p->group - 1 >= nlk->ngroups ||
> 	    !test_bit(p->group - 1, nlk->groups))
> 		return;
> 
> 	if (!net_eq(sock_net(sk), p->net)) {
            ^^^^^^^^^^^^ Here
> 		if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> 			return;
                ^^^^^^^^^^^ Here
> 
> 		if (!peernet_has_id(sock_net(sk), p->net))
> 			return;
> 
> 		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> 				     CAP_NET_BROADCAST))
> 			return;
> 	}

Which if you are not a magic NETLINK_F_LISTEN_ALL_NSID socket filters
you out if you are the wrong network namespace.


>> When a packet is transmitted with netlink_broadcast it is only
>> transmitted within a single network namespace.
>> 
>> Even in the case of a NETLINK_F_LISTEN_ALL_NSID socket the skb is tagged
>> with it's source network namespace so no confusion will result, and the
>> permission checks have been done to make it safe. So you can safely
>> ignore that case.  Please ignore that case.  It only needs to be
>> considered if refactoring af_netlink.c
>> 
>> When I added netlink_broadcast_filtered I imagined that we would need
>> code that worked across network namespaces that worked for different
>> namespaces.   So it looked like we would need the level of granularity
>> that you can get with netlink_broadcast_filtered.  It turns out we don't
>> and that it was a case of over design.  As the only split we care about
>> is per network namespace there is no need for
>> netlink_broadcast_filtered.
>> 
>> > init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
>> > user_ns_broadcast_filtered(uevent_sock_list,kobj_bcast_filter);
>> >
>> > The question that remains is whether we can rely on the network
>> > namespace information we can gather from the kobject_ns_type_operations
>> > to decide where we want to broadcast that event to. So something
>> > *like*:
>> 
>> We can.  We already do.  That is what kobj_bcast_filter implements.
>> 
>> > 	ops = kobj_ns_ops(kobj);
>> > 	if (!ops && kobj->kset) {
>> > 		struct kobject *ksobj = &kobj->kset->kobj;
>> > 		if (ksobj->parent != NULL)
>> > 			ops = kobj_ns_ops(ksobj->parent);
>> > 	}
>> >
>> > 	if (ops && ops->netlink_ns && kobj->ktype->namespace)
>> > 		if (ops->type == KOBJ_NS_TYPE_NET)
>> > 			net = kobj->ktype->namespace(kobj);
>> 
>> Please note the only entry in the enumeration in the kobj_ns_type
>> enumeration other than KOBJ_NS_TYPE_NONE is KOBJ_NS_TYPE_NET.  So the
>> check for ops->type in this case is redundant.
>
> Yes, I know the reason for doing it explicitly is to block the case
> where kobjects get tagged with other namespaces. So we'd need to be
> vigilant should that ever happen but fine.

It is fine to keep the check.

I was intending to point out that it is much more likely that we remove
the enumeration and remove some of the extra abstraction, than another
namespace is implemented there.

>> That is something else that could be simplifed.  At the time it was the
>> necessary to get the sysfs changes merged.
>> 
>> > 	if (!net || net->user_ns == &init_user_ns)
>> > 		ret = init_user_ns_broadcast(env, action_string, devpath);
>> > 	else
>> > 		ret = user_ns_broadcast(net->uevent_sock->sk, env,
>> > 					action_string, devpath);
>> 
>> Almost.
>> 
>> 	if (!net)
>>         	kobject_uevent_net_broadcast(kobj, env, action_string,
>>         					dev_path);
>> 	else
>>         	netlink_broadcast(net->uevent_sock->sk, skb, 0, 1, GFP_KERNEL);
>> 
>> 
>> I am handwaving to get the skb in the netlink_broadcast case but that
>> should be enough for you to see what I am thinking.
>
> I have added a helper alloc_uevent_skb() that can be used in both cases.
>
> static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
> 					const char *action_string,
> 					const char *devpath)
> {
> 	struct sk_buff *skb = NULL;
> 	char *scratch;
> 	size_t len;
>
> 	/* allocate message with maximum possible size */
> 	len = strlen(action_string) + strlen(devpath) + 2;
> 	skb = alloc_skb(len + env->buflen, GFP_KERNEL);
> 	if (!skb)
> 		return NULL;
>
> 	/* add header */
> 	scratch = skb_put(skb, len);
> 	sprintf(scratch, "%s@%s", action_string, devpath);
>
> 	skb_put_data(skb, env->buf, env->buflen);
>
> 	NETLINK_CB(skb).dst_group = 1;
>
> 	return skb;
> }
>
>> 
>> My only concern with the above is that we almost certainly need to fix
>> the credentials on the skb so that userspace does not drop the packet
>> sent to a network namespace because it has the credentials that will
>> cause userspace to drop the packet today.
>> 
>> But it should be straight forward to look at net->user_ns, to fix the
>> credentials.
>
> Yes, afaict, the only thing that needs to be updated is the uid.

I suspect there may also be a gid.

Eric

^ permalink raw reply

* [PATCH v3 net-next 3/3] lan78xx: Modify error messages
From: Raghuram Chary J @ 2018-04-26 17:11 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
In-Reply-To: <20180426171114.19753-1-raghuramchary.jallipalli@microchip.com>

Modify the error messages when phy registration fails.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
 drivers/net/usb/lan78xx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 0bd973516d56..64c741f012aa 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2039,14 +2039,14 @@ static struct phy_device *lan7801_phy_init(struct phy_device *phydev,
 		ret = phy_register_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0,
 						 ksz9031rnx_fixup);
 		if (ret < 0) {
-			netdev_err(dev->net, "fail to register fixup\n");
+			netdev_err(dev->net, "fail to register fixup for PHY_KSZ9031RNX\n");
 			return NULL;
 		}
 		/* external PHY fixup for LAN8835 */
 		ret = phy_register_fixup_for_uid(PHY_LAN8835, 0xfffffff0,
 						 lan8835_fixup);
 		if (ret < 0) {
-			netdev_err(dev->net, "fail to register fixup\n");
+			netdev_err(dev->net, "fail to register fixup for PHY_LAN8835\n");
 			return NULL;
 		}
 		/* add more external PHY fixup here if needed */
-- 
2.16.2

^ permalink raw reply related

* [PATCH v3 net-next 2/3] lan78xx: Remove DRIVER_VERSION for lan78xx driver
From: Raghuram Chary J @ 2018-04-26 17:11 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
In-Reply-To: <20180426171114.19753-1-raghuramchary.jallipalli@microchip.com>

Remove driver version info from the lan78xx driver.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
 drivers/net/usb/lan78xx.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index ef169d73fadc..0bd973516d56 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -42,7 +42,6 @@
 #define DRIVER_AUTHOR	"WOOJUNG HUH <woojung.huh@microchip.com>"
 #define DRIVER_DESC	"LAN78XX USB 3.0 Gigabit Ethernet Devices"
 #define DRIVER_NAME	"lan78xx"
-#define DRIVER_VERSION	"1.0.6"
 
 #define TX_TIMEOUT_JIFFIES		(5 * HZ)
 #define THROTTLE_JIFFIES		(HZ / 8)
@@ -1477,7 +1476,6 @@ static void lan78xx_get_drvinfo(struct net_device *net,
 	struct lan78xx_net *dev = netdev_priv(net);
 
 	strncpy(info->driver, DRIVER_NAME, sizeof(info->driver));
-	strncpy(info->version, DRIVER_VERSION, sizeof(info->version));
 	usb_make_path(dev->udev, info->bus_info, sizeof(info->bus_info));
 }
 
-- 
2.16.2

^ permalink raw reply related

* [PATCH v3 net-next 1/3] lan78xx: Lan7801 Support for Fixed PHY
From: Raghuram Chary J @ 2018-04-26 17:11 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
In-Reply-To: <20180426171114.19753-1-raghuramchary.jallipalli@microchip.com>

Adding Fixed PHY support to the lan78xx driver.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
v0->v1:
   * Remove driver version #define
   * Modify netdev_info to netdev_dbg
   * Move lan7801 specific to new routine and add switch case
   * Minor cleanup

v1->v2:
   * Removed fixedphy variable and used phy_is_pseudo_fixed_link() check.
v2->v3:
   * Revert driver version, debug statment changes for separate patch.
   * Modify lan7801 specific routine with return type struct phy_device.
---
 drivers/net/usb/Kconfig   |   1 +
 drivers/net/usb/lan78xx.c | 103 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 74 insertions(+), 30 deletions(-)

diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index f28bd74ac275..418b0904cecb 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -111,6 +111,7 @@ config USB_LAN78XX
 	select MII
 	select PHYLIB
 	select MICROCHIP_PHY
+	select FIXED_PHY
 	help
 	  This option adds support for Microchip LAN78XX based USB 2
 	  & USB 3 10/100/1000 Ethernet adapters.
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 0867f7275852..ef169d73fadc 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -36,7 +36,7 @@
 #include <linux/irq.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/microchipphy.h>
-#include <linux/phy.h>
+#include <linux/phy_fixed.h>
 #include "lan78xx.h"
 
 #define DRIVER_AUTHOR	"WOOJUNG HUH <woojung.huh@microchip.com>"
@@ -2003,52 +2003,90 @@ static int ksz9031rnx_fixup(struct phy_device *phydev)
 	return 1;
 }
 
-static int lan78xx_phy_init(struct lan78xx_net *dev)
+static struct phy_device *lan7801_phy_init(struct phy_device *phydev,
+					   struct lan78xx_net *dev)
 {
+	u32 buf;
 	int ret;
-	u32 mii_adv;
-	struct phy_device *phydev;
+	struct fixed_phy_status fphy_status = {
+		.link = 1,
+		.speed = SPEED_1000,
+		.duplex = DUPLEX_FULL,
+	};
 
-	phydev = phy_find_first(dev->mdiobus);
 	if (!phydev) {
-		netdev_err(dev->net, "no PHY found\n");
-		return -EIO;
-	}
-
-	if ((dev->chipid == ID_REV_CHIP_ID_7800_) ||
-	    (dev->chipid == ID_REV_CHIP_ID_7850_)) {
-		phydev->is_internal = true;
-		dev->interface = PHY_INTERFACE_MODE_GMII;
-
-	} else if (dev->chipid == ID_REV_CHIP_ID_7801_) {
+		netdev_dbg(dev->net, "PHY Not Found!! Registering Fixed PHY\n");
+		phydev = fixed_phy_register(PHY_POLL, &fphy_status, -1,
+					    NULL);
+		if (IS_ERR(phydev)) {
+			netdev_err(dev->net, "No PHY/fixed_PHY found\n");
+			return NULL;
+		}
+		netdev_dbg(dev->net, "Registered FIXED PHY\n");
+		dev->interface = PHY_INTERFACE_MODE_RGMII;
+		ret = lan78xx_write_reg(dev, MAC_RGMII_ID,
+					MAC_RGMII_ID_TXC_DELAY_EN_);
+		ret = lan78xx_write_reg(dev, RGMII_TX_BYP_DLL, 0x3D00);
+		ret = lan78xx_read_reg(dev, HW_CFG, &buf);
+		buf |= HW_CFG_CLK125_EN_;
+		buf |= HW_CFG_REFCLK25_EN_;
+		ret = lan78xx_write_reg(dev, HW_CFG, buf);
+	} else {
 		if (!phydev->drv) {
 			netdev_err(dev->net, "no PHY driver found\n");
-			return -EIO;
+			return NULL;
 		}
-
 		dev->interface = PHY_INTERFACE_MODE_RGMII;
-
 		/* external PHY fixup for KSZ9031RNX */
 		ret = phy_register_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0,
 						 ksz9031rnx_fixup);
 		if (ret < 0) {
 			netdev_err(dev->net, "fail to register fixup\n");
-			return ret;
+			return NULL;
 		}
 		/* external PHY fixup for LAN8835 */
 		ret = phy_register_fixup_for_uid(PHY_LAN8835, 0xfffffff0,
 						 lan8835_fixup);
 		if (ret < 0) {
 			netdev_err(dev->net, "fail to register fixup\n");
-			return ret;
+			return NULL;
 		}
 		/* add more external PHY fixup here if needed */
 
 		phydev->is_internal = false;
-	} else {
-		netdev_err(dev->net, "unknown ID found\n");
-		ret = -EIO;
-		goto error;
+	}
+	return phydev;
+}
+
+static int lan78xx_phy_init(struct lan78xx_net *dev)
+{
+	int ret;
+	u32 mii_adv;
+	struct phy_device *phydev;
+
+	phydev = phy_find_first(dev->mdiobus);
+	switch (dev->chipid) {
+	case ID_REV_CHIP_ID_7801_:
+		phydev = lan7801_phy_init(phydev, dev);
+		if (!phydev) {
+			netdev_err(dev->net, "lan7801: PHY Init Failed");
+			return -EIO;
+		}
+		break;
+
+	case ID_REV_CHIP_ID_7800_:
+	case ID_REV_CHIP_ID_7850_:
+		if (!phydev) {
+			netdev_err(dev->net, "no PHY found\n");
+			return -EIO;
+		}
+		phydev->is_internal = true;
+		dev->interface = PHY_INTERFACE_MODE_GMII;
+		break;
+
+	default:
+		netdev_err(dev->net, "Unknown CHIP ID found\n");
+		return -EIO;
 	}
 
 	/* if phyirq is not set, use polling mode in phylib */
@@ -2067,6 +2105,12 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 	if (ret) {
 		netdev_err(dev->net, "can't attach PHY to %s\n",
 			   dev->mdiobus->id);
+		if (dev->chipid == ID_REV_CHIP_ID_7801_) {
+			phy_unregister_fixup_for_uid(PHY_KSZ9031RNX,
+						     0xfffffff0);
+			phy_unregister_fixup_for_uid(PHY_LAN8835,
+						     0xfffffff0);
+		}
 		return -EIO;
 	}
 
@@ -2084,12 +2128,6 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 	dev->fc_autoneg = phydev->autoneg;
 
 	return 0;
-
-error:
-	phy_unregister_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0);
-	phy_unregister_fixup_for_uid(PHY_LAN8835, 0xfffffff0);
-
-	return ret;
 }
 
 static int lan78xx_set_rx_max_frame_length(struct lan78xx_net *dev, int size)
@@ -3487,6 +3525,7 @@ static void lan78xx_disconnect(struct usb_interface *intf)
 	struct lan78xx_net		*dev;
 	struct usb_device		*udev;
 	struct net_device		*net;
+	struct phy_device		*phydev;
 
 	dev = usb_get_intfdata(intf);
 	usb_set_intfdata(intf, NULL);
@@ -3495,12 +3534,16 @@ static void lan78xx_disconnect(struct usb_interface *intf)
 
 	udev = interface_to_usbdev(intf);
 	net = dev->net;
+	phydev = net->phydev;
 
 	phy_unregister_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0);
 	phy_unregister_fixup_for_uid(PHY_LAN8835, 0xfffffff0);
 
 	phy_disconnect(net->phydev);
 
+	if (phy_is_pseudo_fixed_link(phydev))
+		fixed_phy_unregister(phydev);
+
 	unregister_netdev(net);
 
 	cancel_delayed_work_sync(&dev->wq);
-- 
2.16.2

^ permalink raw reply related

* [PATCH v3 net-next 0/3] lan78xx updates along with Fixed
From: Raghuram Chary J @ 2018-04-26 17:11 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli

These series of patches handle few modifications in driver
and adds support for fixed phy.

Raghuram Chary J (3):
  lan78xx: Lan7801 Support for Fixed PHY
  lan78xx: Remove DRIVER_VERSION for lan78xx driver
  lan78xx: Modify error messages

 drivers/net/usb/Kconfig   |   1 +
 drivers/net/usb/lan78xx.c | 109 +++++++++++++++++++++++++++++++---------------
 2 files changed, 76 insertions(+), 34 deletions(-)

-- 
2.16.2

^ permalink raw reply

* Re: [PATCH net-next] tcp: remove mss check in tcp_select_initial_window()
From: Neal Cardwell @ 2018-04-26 17:05 UTC (permalink / raw)
  To: Wei Wang
  Cc: David Miller, Netdev, Yuchung Cheng, Eric Dumazet,
	Soheil Hassas Yeganeh
In-Reply-To: <20180426165810.164524-1-tracywwnj@gmail.com>

On Thu, Apr 26, 2018 at 12:58 PM Wei Wang <weiwan@google.com> wrote:

> From: Wei Wang <weiwan@google.com>

> In tcp_select_initial_window(), we only set rcv_wnd to
> tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
> rcv_wnd is kept at the full receive space of the socket which is a
> value way larger than tcp_default_init_rwnd().
> With larger initial rcv_wnd value, receive buffer autotuning logic
> takes longer to kick in and increase the receive buffer.

> In a TCP throughput test where receiver has rmem[2] set to 125MB
> (wscale is 11), we see the connection gets recvbuf limited at the
> beginning of the connection and gets less throughput overall.

> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Wei!

neal

^ permalink raw reply

* Re: [PATCH net-next 1/2 v2] netns: restrict uevents
From: Christian Brauner @ 2018-04-26 17:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <871sf1q5ig.fsf@xmission.com>

On Thu, Apr 26, 2018 at 11:47:19AM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
> 
> > On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> 
> >> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >
> >> >  Bah. This code is obviously correct and probably wrong.
> >> >
> >> >  How do we deliver uevents for network devices that are outside of the
> >> >  initial user namespace? The kernel still needs to deliver those.
> >> >
> >> >  The logic to figure out which network namespace a device needs to be
> >> >  delivered to is is present in kobj_bcast_filter. That logic will almost
> >> >  certainly need to be turned inside out. Sign not as easy as I would
> >> >  have hoped.
> >> >
> >> > My first patch that we discussed put additional filtering logic into kobj_bcast_filter for that very reason. But I can move that logic
> >> > out and come up with a new patch.
> >> 
> >> I may have mis-understood.
> >> 
> >> I heard and am still hearing additional filtering to reduce the places
> >> the packet is delievered.
> >> 
> >> I am saying something needs to change to increase the number of places
> >> the packet is delivered.
> >> 
> >> For the special class of devices that kobj_bcast_filter would apply to
> >> those need to be delivered to netowrk namespaces  that are no longer on
> >> uevent_sock_list.
> >> 
> >> So the code fundamentally needs to split into two paths.  Ordinary
> >> devices that use uevent_sock_list.  Network devices that are just
> >> delivered in their own network namespace.
> >> 
> >> netlink_broadcast_filtered gets to go away completely.
> >
> > The split *might* make sense but I think you're wrong about removing the
> > kobj_bcast_filter. The current filter doesn't operate on the uevent
> > socket in uevent_sock_list itself it rather operates on the sockets in
> > mc_list. And if socket in mc_list can have a different network namespace
> > then the uevent_socket itself then your way won't work. That's why my
> > original patch added additional filtering in there. The way I see it we
> > need something like:
> 
> We already filter the sockets in the mc_list by network namespace.

Oh really? That's good to know. I haven't found where in the code this
actually happens. I thought that when netlink_bind() is called anyone
could register themselves in mc_list.

> 
> When a packet is transmitted with netlink_broadcast it is only
> transmitted within a single network namespace.
> 
> Even in the case of a NETLINK_F_LISTEN_ALL_NSID socket the skb is tagged
> with it's source network namespace so no confusion will result, and the
> permission checks have been done to make it safe. So you can safely
> ignore that case.  Please ignore that case.  It only needs to be
> considered if refactoring af_netlink.c
> 
> When I added netlink_broadcast_filtered I imagined that we would need
> code that worked across network namespaces that worked for different
> namespaces.   So it looked like we would need the level of granularity
> that you can get with netlink_broadcast_filtered.  It turns out we don't
> and that it was a case of over design.  As the only split we care about
> is per network namespace there is no need for
> netlink_broadcast_filtered.
> 
> > init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
> > user_ns_broadcast_filtered(uevent_sock_list,kobj_bcast_filter);
> >
> > The question that remains is whether we can rely on the network
> > namespace information we can gather from the kobject_ns_type_operations
> > to decide where we want to broadcast that event to. So something
> > *like*:
> 
> We can.  We already do.  That is what kobj_bcast_filter implements.
> 
> > 	ops = kobj_ns_ops(kobj);
> > 	if (!ops && kobj->kset) {
> > 		struct kobject *ksobj = &kobj->kset->kobj;
> > 		if (ksobj->parent != NULL)
> > 			ops = kobj_ns_ops(ksobj->parent);
> > 	}
> >
> > 	if (ops && ops->netlink_ns && kobj->ktype->namespace)
> > 		if (ops->type == KOBJ_NS_TYPE_NET)
> > 			net = kobj->ktype->namespace(kobj);
> 
> Please note the only entry in the enumeration in the kobj_ns_type
> enumeration other than KOBJ_NS_TYPE_NONE is KOBJ_NS_TYPE_NET.  So the
> check for ops->type in this case is redundant.

Yes, I know the reason for doing it explicitly is to block the case
where kobjects get tagged with other namespaces. So we'd need to be
vigilant should that ever happen but fine.

> 
> That is something else that could be simplifed.  At the time it was the
> necessary to get the sysfs changes merged.
> 
> > 	if (!net || net->user_ns == &init_user_ns)
> > 		ret = init_user_ns_broadcast(env, action_string, devpath);
> > 	else
> > 		ret = user_ns_broadcast(net->uevent_sock->sk, env,
> > 					action_string, devpath);
> 
> Almost.
> 
> 	if (!net)
>         	kobject_uevent_net_broadcast(kobj, env, action_string,
>         					dev_path);
> 	else
>         	netlink_broadcast(net->uevent_sock->sk, skb, 0, 1, GFP_KERNEL);
> 
> 
> I am handwaving to get the skb in the netlink_broadcast case but that
> should be enough for you to see what I am thinking.

I have added a helper alloc_uevent_skb() that can be used in both cases.

static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
					const char *action_string,
					const char *devpath)
{
	struct sk_buff *skb = NULL;
	char *scratch;
	size_t len;

	/* allocate message with maximum possible size */
	len = strlen(action_string) + strlen(devpath) + 2;
	skb = alloc_skb(len + env->buflen, GFP_KERNEL);
	if (!skb)
		return NULL;

	/* add header */
	scratch = skb_put(skb, len);
	sprintf(scratch, "%s@%s", action_string, devpath);

	skb_put_data(skb, env->buf, env->buflen);

	NETLINK_CB(skb).dst_group = 1;

	return skb;
}

> 
> My only concern with the above is that we almost certainly need to fix
> the credentials on the skb so that userspace does not drop the packet
> sent to a network namespace because it has the credentials that will
> cause userspace to drop the packet today.
> 
> But it should be straight forward to look at net->user_ns, to fix the
> credentials.

Yes, afaict, the only thing that needs to be updated is the uid.

Thanks Eric!
Christian

^ permalink raw reply

* [PATCH net-next] tcp: remove mss check in tcp_select_initial_window()
From: Wei Wang @ 2018-04-26 16:58 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Yuchung Cheng, Eric Dumazet, Soheil Hassas Yeganeh, Wei Wang

From: Wei Wang <weiwan@google.com>

In tcp_select_initial_window(), we only set rcv_wnd to
tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
rcv_wnd is kept at the full receive space of the socket which is a
value way larger than tcp_default_init_rwnd().
With larger initial rcv_wnd value, receive buffer autotuning logic
takes longer to kick in and increase the receive buffer.

In a TCP throughput test where receiver has rmem[2] set to 125MB
(wscale is 11), we see the connection gets recvbuf limited at the
beginning of the connection and gets less throughput overall.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_output.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95feffb6d53f..d07c0dcc99aa 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -229,11 +229,9 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
 		}
 	}
 
-	if (mss > (1 << *rcv_wscale)) {
-		if (!init_rcv_wnd) /* Use default unless specified otherwise */
-			init_rcv_wnd = tcp_default_init_rwnd(mss);
-		*rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
-	}
+	if (!init_rcv_wnd) /* Use default unless specified otherwise */
+		init_rcv_wnd = tcp_default_init_rwnd(mss);
+	*rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
 
 	/* Set the clamp no higher than max representable value */
 	(*window_clamp) = min_t(__u32, U16_MAX << (*rcv_wscale), *window_clamp);
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* Re: [PATCH net-next 1/2 v2] netns: restrict uevents
From: Eric W. Biederman @ 2018-04-26 16:47 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Miller, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <20180426161353.GA2014@gmail.com>

Christian Brauner <christian.brauner@canonical.com> writes:

> On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
>> Christian Brauner <christian.brauner@canonical.com> writes:
>> 
>> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >
>> >  Bah. This code is obviously correct and probably wrong.
>> >
>> >  How do we deliver uevents for network devices that are outside of the
>> >  initial user namespace? The kernel still needs to deliver those.
>> >
>> >  The logic to figure out which network namespace a device needs to be
>> >  delivered to is is present in kobj_bcast_filter. That logic will almost
>> >  certainly need to be turned inside out. Sign not as easy as I would
>> >  have hoped.
>> >
>> > My first patch that we discussed put additional filtering logic into kobj_bcast_filter for that very reason. But I can move that logic
>> > out and come up with a new patch.
>> 
>> I may have mis-understood.
>> 
>> I heard and am still hearing additional filtering to reduce the places
>> the packet is delievered.
>> 
>> I am saying something needs to change to increase the number of places
>> the packet is delivered.
>> 
>> For the special class of devices that kobj_bcast_filter would apply to
>> those need to be delivered to netowrk namespaces  that are no longer on
>> uevent_sock_list.
>> 
>> So the code fundamentally needs to split into two paths.  Ordinary
>> devices that use uevent_sock_list.  Network devices that are just
>> delivered in their own network namespace.
>> 
>> netlink_broadcast_filtered gets to go away completely.
>
> The split *might* make sense but I think you're wrong about removing the
> kobj_bcast_filter. The current filter doesn't operate on the uevent
> socket in uevent_sock_list itself it rather operates on the sockets in
> mc_list. And if socket in mc_list can have a different network namespace
> then the uevent_socket itself then your way won't work. That's why my
> original patch added additional filtering in there. The way I see it we
> need something like:

We already filter the sockets in the mc_list by network namespace.

When a packet is transmitted with netlink_broadcast it is only
transmitted within a single network namespace.

Even in the case of a NETLINK_F_LISTEN_ALL_NSID socket the skb is tagged
with it's source network namespace so no confusion will result, and the
permission checks have been done to make it safe. So you can safely
ignore that case.  Please ignore that case.  It only needs to be
considered if refactoring af_netlink.c

When I added netlink_broadcast_filtered I imagined that we would need
code that worked across network namespaces that worked for different
namespaces.   So it looked like we would need the level of granularity
that you can get with netlink_broadcast_filtered.  It turns out we don't
and that it was a case of over design.  As the only split we care about
is per network namespace there is no need for
netlink_broadcast_filtered.

> init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
> user_ns_broadcast_filtered(uevent_sock_list,kobj_bcast_filter);
>
> The question that remains is whether we can rely on the network
> namespace information we can gather from the kobject_ns_type_operations
> to decide where we want to broadcast that event to. So something
> *like*:

We can.  We already do.  That is what kobj_bcast_filter implements.

> 	ops = kobj_ns_ops(kobj);
> 	if (!ops && kobj->kset) {
> 		struct kobject *ksobj = &kobj->kset->kobj;
> 		if (ksobj->parent != NULL)
> 			ops = kobj_ns_ops(ksobj->parent);
> 	}
>
> 	if (ops && ops->netlink_ns && kobj->ktype->namespace)
> 		if (ops->type == KOBJ_NS_TYPE_NET)
> 			net = kobj->ktype->namespace(kobj);

Please note the only entry in the enumeration in the kobj_ns_type
enumeration other than KOBJ_NS_TYPE_NONE is KOBJ_NS_TYPE_NET.  So the
check for ops->type in this case is redundant.

That is something else that could be simplifed.  At the time it was the
necessary to get the sysfs changes merged.

> 	if (!net || net->user_ns == &init_user_ns)
> 		ret = init_user_ns_broadcast(env, action_string, devpath);
> 	else
> 		ret = user_ns_broadcast(net->uevent_sock->sk, env,
> 					action_string, devpath);

Almost.

	if (!net)
        	kobject_uevent_net_broadcast(kobj, env, action_string,
        					dev_path);
	else
        	netlink_broadcast(net->uevent_sock->sk, skb, 0, 1, GFP_KERNEL);


I am handwaving to get the skb in the netlink_broadcast case but that
should be enough for you to see what I am thinking.

My only concern with the above is that we almost certainly need to fix
the credentials on the skb so that userspace does not drop the packet
sent to a network namespace because it has the credentials that will
cause userspace to drop the packet today.

But it should be straight forward to look at net->user_ns, to fix the
credentials.

Eric

^ permalink raw reply

* Re: [PATCH net-next 1/2 v2] netns: restrict uevents
From: Christian Brauner @ 2018-04-26 16:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <87wowww6p8.fsf@xmission.com>

On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
> 
> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >  Bah. This code is obviously correct and probably wrong.
> >
> >  How do we deliver uevents for network devices that are outside of the
> >  initial user namespace? The kernel still needs to deliver those.
> >
> >  The logic to figure out which network namespace a device needs to be
> >  delivered to is is present in kobj_bcast_filter. That logic will almost
> >  certainly need to be turned inside out. Sign not as easy as I would
> >  have hoped.
> >
> > My first patch that we discussed put additional filtering logic into kobj_bcast_filter for that very reason. But I can move that logic
> > out and come up with a new patch.
> 
> I may have mis-understood.
> 
> I heard and am still hearing additional filtering to reduce the places
> the packet is delievered.
> 
> I am saying something needs to change to increase the number of places
> the packet is delivered.
> 
> For the special class of devices that kobj_bcast_filter would apply to
> those need to be delivered to netowrk namespaces  that are no longer on
> uevent_sock_list.
> 
> So the code fundamentally needs to split into two paths.  Ordinary
> devices that use uevent_sock_list.  Network devices that are just
> delivered in their own network namespace.
> 
> netlink_broadcast_filtered gets to go away completely.

The split *might* make sense but I think you're wrong about removing the
kobj_bcast_filter. The current filter doesn't operate on the uevent
socket in uevent_sock_list itself it rather operates on the sockets in
mc_list. And if socket in mc_list can have a different network namespace
then the uevent_socket itself then your way won't work. That's why my
original patch added additional filtering in there. The way I see it we
need something like:

init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
user_ns_broadcast_filtered(uevent_sock_list,kobj_bcast_filter);

The question that remains is whether we can rely on the network
namespace information we can gather from the kobject_ns_type_operations
to decide where we want to broadcast that event to. So something *like*:

	ops = kobj_ns_ops(kobj);
	if (!ops && kobj->kset) {
		struct kobject *ksobj = &kobj->kset->kobj;
		if (ksobj->parent != NULL)
			ops = kobj_ns_ops(ksobj->parent);
	}

	if (ops && ops->netlink_ns && kobj->ktype->namespace)
		if (ops->type == KOBJ_NS_TYPE_NET)
			net = kobj->ktype->namespace(kobj);

	if (!net || net->user_ns == &init_user_ns)
		ret = init_user_ns_broadcast(env, action_string, devpath);
	else
		ret = user_ns_broadcast(net->uevent_sock->sk, env,
					action_string, devpath);

Christian

> The logic of figuring out the network namespace though becomes trickier.
> 
> Now it may make sense to have all of that as an additional patch on top
> of this one or perhaps a precursor patch that addresses the problem.  We
> will unfortunately drop those uevents today because their uids are not
> valid.  But they are not delivered anywhere else so to allow them to be
> received we need to fix them.
> 
> Eric
> 
> >
> >  Christian Brauner <christian.brauner@ubuntu.com> writes:
> >  > commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
> >  >
> >  > enabled sending hotplug events into all network namespaces back in 2010.
> >  > Over time the set of uevents that get sent into all network namespaces has
> >  > shrunk a little. We have now reached the point where hotplug events for all
> >  > devices that carry a namespace tag are filtered according to that
> >  > namespace. Specifically, they are filtered whenever the namespace tag of
> >  > the kobject does not match the namespace tag of the netlink socket. One
> >  > example are network devices. Uevents for network devices only show up in
> >  > the network namespaces these devices are moved to or created in.
> >  >
> >  > However, any uevent for a kobject that does not have a namespace tag
> >  > associated with it will not be filtered and we will broadcast it into all
> >  > network namespaces. This behavior stopped making sense when user namespaces
> >  > were introduced.
> >  >
> >  > This patch restricts uevents to the initial user namespace for a couple of
> >  > reasons that have been extensively discusses on the mailing list [1].
> >  > - Thundering herd:
> >  > Broadcasting uevents into all network namespaces introduces significant
> >  > overhead.
> >  > All processes that listen to uevents running in non-initial user
> >  > namespaces will end up responding to uevents that will be meaningless to
> >  > them. Mainly, because non-initial user namespaces cannot easily manage
> >  > devices unless they have a privileged host-process helping them out. This
> >  > means that there will be a thundering herd of activity when there
> >  > shouldn't be any.
> >  > - Uevents from non-root users are already filtered in userspace:
> >  > Uevents are filtered by userspace in a user namespace because the
> >  > received uid != 0. Instead the uid associated with the event will be
> >  > 65534 == "nobody" because the global root uid is not mapped.
> >  > This means we can safely and without introducing regressions modify the
> >  > kernel to not send uevents into all network namespaces whose owning user
> >  > namespace is not the initial user namespace because we know that
> >  > userspace will ignore the message because of the uid anyway. I have
> >  > a) verified that is is true for every udev implementation out there b)
> >  > that this behavior has been present in all udev implementations from the
> >  > very beginning.
> >  > - Removing needless overhead/Increasing performance:
> >  > Currently, the uevent socket for each network namespace is added to the
> >  > global variable uevent_sock_list. The list itself needs to be protected
> >  > by a mutex. So everytime a uevent is generated the mutex is taken on the
> >  > list. The mutex is held *from the creation of the uevent (memory
> >  > allocation, string creation etc. until all uevent sockets have been
> >  > handled*. This is aggravated by the fact that for each uevent socket that
> >  > has listeners the mc_list must be walked as well which means we're
> >  > talking O(n^2) here. Given that a standard Linux workload usually has
> >  > quite a lot of network namespaces and - in the face of containers - a lot
> >  > of user namespaces this quickly becomes a performance problem (see
> >  > "Thundering herd" above). By just recording uevent sockets of network
> >  > namespaces that are owned by the initial user namespace we significantly
> >  > increase performance in this codepath.
> >  > - Injecting uevents:
> >  > There's a valid argument that containers might be interested in receiving
> >  > device events especially if they are delegated to them by a privileged
> >  > userspace process. One prime example are SR-IOV enabled devices that are
> >  > explicitly designed to be handed of to other users such as VMs or
> >  > containers.
> >  > This use-case can now be correctly handled since
> >  > commit 692ec06d7c92 ("netns: send uevent messages"). This commit
> >  > introduced the ability to send uevents from userspace. As such we can let
> >  > a sufficiently privileged (CAP_SYS_ADMIN in the owning user namespace of
> >  > the network namespace of the netlink socket) userspace process make a
> >  > decision what uevents should be sent. This removes the need to blindly
> >  > broadcast uevents into all user namespaces and provides a performant and
> >  > safe solution to this problem.
> >  > - Filtering logic:
> >  > This patch filters by *owning user namespace of the network namespace a
> >  > given task resides in* and not by user namespace of the task per se. This
> >  > means if the user namespace of a given task is unshared but the network
> >  > namespace is kept and is owned by the initial user namespace a listener
> >  > that is opening the uevent socket in that network namespace can still
> >  > listen to uevents.
> >  >
> >  > [1]: https://lkml.org/lkml/2018/4/4/739
> >  > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> >  > ---
> >  > Changelog v1->v2:
> >  > * patch unchanged
> >  > Changelog v0->v1:
> >  > * patch unchanged
> >  > ---
> >  > lib/kobject_uevent.c | 18 ++++++++++++------
> >  > 1 file changed, 12 insertions(+), 6 deletions(-)
> >  >
> >  > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >  > index 15ea216a67ce..f5f5038787ac 100644
> >  > --- a/lib/kobject_uevent.c
> >  > +++ b/lib/kobject_uevent.c
> >  > @@ -703,9 +703,13 @@ static int uevent_net_init(struct net *net)
> >  > 
> >  > net->uevent_sock = ue_sk;
> >  > 
> >  > - mutex_lock(&uevent_sock_mutex);
> >  > - list_add_tail(&ue_sk->list, &uevent_sock_list);
> >  > - mutex_unlock(&uevent_sock_mutex);
> >  > + /* Restrict uevents to initial user namespace. */
> >  > + if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
> >  > + mutex_lock(&uevent_sock_mutex);
> >  > + list_add_tail(&ue_sk->list, &uevent_sock_list);
> >  > + mutex_unlock(&uevent_sock_mutex);
> >  > + }
> >  > +
> >  > return 0;
> >  > }
> >  > 
> >  > @@ -713,9 +717,11 @@ static void uevent_net_exit(struct net *net)
> >  > {
> >  > struct uevent_sock *ue_sk = net->uevent_sock;
> >  > 
> >  > - mutex_lock(&uevent_sock_mutex);
> >  > - list_del(&ue_sk->list);
> >  > - mutex_unlock(&uevent_sock_mutex);
> >  > + if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
> >  > + mutex_lock(&uevent_sock_mutex);
> >  > + list_del(&ue_sk->list);
> >  > + mutex_unlock(&uevent_sock_mutex);
> >  > + }
> >  > 
> >  > netlink_kernel_release(ue_sk->sk);
> >  > kfree(ue_sk);

^ permalink raw reply

* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Mikulas Patocka @ 2018-04-26 16:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James Bottomley, Michal Hocko, David Rientjes, dm-devel,
	eric.dumazet, netdev, jasowang, Randy Dunlap, linux-kernel,
	Matthew Wilcox, linux-mm, edumazet, Andrew Morton, virtualization,
	David Miller, Vlastimil Babka
In-Reply-To: <20180426184845-mutt-send-email-mst@kernel.org>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2389 bytes --]



On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 11:44:21AM -0400, Mikulas Patocka wrote:
> > 
> > 
> > On Thu, 26 Apr 2018, James Bottomley wrote:
> > 
> > > On Thu, 2018-04-26 at 11:05 -0400, Mikulas Patocka wrote:
> > > > 
> > > > On Thu, 26 Apr 2018, James Bottomley wrote:
> > > [...]
> > > > > Perhaps find out beforehand instead of insisting on an approach
> > > > without
> > > > > knowing.  On openSUSE the grub config is built from the files in
> > > > > /etc/grub.d/ so any package can add a kernel option (and various
> > > > > conditions around activating it) simply by adding a new file.
> > > > 
> > > > And then, different versions of the debug kernel will clash when 
> > > > attempting to create the same file.
> > > 
> > > Don't be silly ... there are many ways of coping with that in rpm/dpkg.
> > 
> > I know you can deal with it - but how many lines of code will that 
> > consume? Multiplied by the total number of rpm-based distros.
> > 
> > Mikulas
> 
> I don't think debug kernels should inject faults by default.

But it is not injecting any faults. It is using the fault-injection 
framework because Michal said that it should use it. The original version 
of the patch didn't use the fault-injection framework at all.

The kvmalloc function can take two branches, the kmalloc branch and 
vmalloc branch. The vmalloc branch is taken rarely, so the kernel contains 
bugs regarding this path - and the patch increases the probability that 
the vmalloc patch is taken, to discover the bugs early and make them 
reproducible.

> IIUC debug kernels mainly exist so people who experience e.g. memory
> corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
> will *already* catch a failure early. Nothing special needs to be done.

The patch helps people debug such memory coprruptions (such as using DMA 
API on the result of kvmalloc).

> There is a much smaller class of people like QA who go actively looking
> for trouble. That's the kind of thing fault injection is good for, and
> IMO you do not want your QA team to test a separate kernel - otherwise
> you are never quite sure whatever was tested will work in the field.
> So a config option won't help them either.
> 
> How do you make sure QA tests a specific corner case? Add it to
> the test plan :)
> 
> I don't speak for Red Hat, etc.
> 
> -- 
> MST

Mikulas

^ permalink raw reply

* [PATCH 2/2] net: stmmac: dwmac-meson: extend phy mode setting
From: Yixun Lan @ 2018-04-26 16:05 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: Yixun Lan, Kevin Hilman, Carlo Caione, Rob Herring,
	Martin Blumenstingl, linux-amlogic, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20180426160508.29380-1-yixun.lan@amlogic.com>

  In the Meson-AXG SoC, the phy mode setting of PRG_ETH0 in the glue layer
is extended from bit[0] to bit[2:0].
  There is no problem if we configure it to the RGMII 1000M PHY mode,
since the register setting is coincidentally compatible with previous one,
but for the RMII 100M PHY mode, the configuration need to be changed to
value - b100.
  This patch was verified with a RTL8201F 100M ethernet PHY.

Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
---
 .../ethernet/stmicro/stmmac/dwmac-meson8b.c   | 95 ++++++++++++++++---
 1 file changed, 84 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 7cb794094a70..e3688b6dd87c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -18,6 +18,7 @@
 #include <linux/io.h>
 #include <linux/ioport.h>
 #include <linux/module.h>
+#include <linux/of_device.h>
 #include <linux/of_net.h>
 #include <linux/mfd/syscon.h>
 #include <linux/platform_device.h>
@@ -29,6 +30,10 @@
 
 #define PRG_ETH0_RGMII_MODE		BIT(0)
 
+#define PRG_ETH0_EXT_PHY_MODE_MASK	GENMASK(2, 0)
+#define PRG_ETH0_EXT_RGMII_MODE		1
+#define PRG_ETH0_EXT_RMII_MODE		4
+
 /* mux to choose between fclk_div2 (bit unset) and mpll2 (bit set) */
 #define PRG_ETH0_CLK_M250_SEL_SHIFT	4
 #define PRG_ETH0_CLK_M250_SEL_MASK	GENMASK(4, 4)
@@ -46,10 +51,16 @@
 #define PRG_ETH0_TX_AND_PHY_REF_CLK	BIT(12)
 
 #define MUX_CLK_NUM_PARENTS		2
+struct meson8b_dwmac_data {
+	bool ext_phy_mode;
+};
 
 struct meson8b_dwmac {
 	struct device		*dev;
 	void __iomem		*regs;
+
+	const struct meson8b_dwmac_data *data;
+
 	phy_interface_t		phy_mode;
 	struct clk		*rgmii_tx_clk;
 	u32			tx_delay_ns;
@@ -171,6 +182,46 @@ static int meson8b_init_rgmii_tx_clk(struct meson8b_dwmac *dwmac)
 	return 0;
 }
 
+static int meson8b_init_set_mode(struct meson8b_dwmac *dwmac)
+{
+	bool ext_phy_mode = dwmac->data->ext_phy_mode;
+
+	switch (dwmac->phy_mode) {
+	case PHY_INTERFACE_MODE_RGMII:
+	case PHY_INTERFACE_MODE_RGMII_RXID:
+	case PHY_INTERFACE_MODE_RGMII_ID:
+	case PHY_INTERFACE_MODE_RGMII_TXID:
+		/* enable RGMII mode */
+		if (ext_phy_mode)
+			meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+						PRG_ETH0_EXT_PHY_MODE_MASK,
+						PRG_ETH0_EXT_RGMII_MODE);
+		else
+			meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+						PRG_ETH0_RGMII_MODE,
+						PRG_ETH0_RGMII_MODE);
+
+		break;
+	case PHY_INTERFACE_MODE_RMII:
+		/* disable RGMII mode -> enables RMII mode */
+		if (ext_phy_mode)
+			meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+						PRG_ETH0_EXT_PHY_MODE_MASK,
+						PRG_ETH0_EXT_RMII_MODE);
+		else
+			meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+						PRG_ETH0_RGMII_MODE, 0);
+
+		break;
+	default:
+		dev_err(dwmac->dev, "fail to set phy-mode %s\n",
+			phy_modes(dwmac->phy_mode));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 {
 	int ret;
@@ -188,10 +239,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 
 	case PHY_INTERFACE_MODE_RGMII_ID:
 	case PHY_INTERFACE_MODE_RGMII_TXID:
-		/* enable RGMII mode */
-		meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
-					PRG_ETH0_RGMII_MODE);
-
 		/* only relevant for RMII mode -> disable in RGMII mode */
 		meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
 					PRG_ETH0_INVERTED_RMII_CLK, 0);
@@ -224,10 +271,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 		break;
 
 	case PHY_INTERFACE_MODE_RMII:
-		/* disable RGMII mode -> enables RMII mode */
-		meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
-					0);
-
 		/* invert internal clk_rmii_i to generate 25/2.5 tx_rx_clk */
 		meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
 					PRG_ETH0_INVERTED_RMII_CLK,
@@ -274,6 +317,11 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 		goto err_remove_config_dt;
 	}
 
+	dwmac->data = (const struct meson8b_dwmac_data *)
+		of_device_get_match_data(&pdev->dev);
+	if (!dwmac->data)
+		return -EINVAL;
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
 	dwmac->regs = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(dwmac->regs)) {
@@ -298,6 +346,10 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 	if (ret)
 		goto err_remove_config_dt;
 
+	ret = meson8b_init_set_mode(dwmac);
+	if (ret)
+		goto err_remove_config_dt;
+
 	ret = meson8b_init_prg_eth(dwmac);
 	if (ret)
 		goto err_remove_config_dt;
@@ -316,10 +368,31 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 	return ret;
 }
 
+static const struct meson8b_dwmac_data meson8b_dwmac_data = {
+	.ext_phy_mode = false,
+};
+
+static const struct meson8b_dwmac_data meson_axg_dwmac_data = {
+	.ext_phy_mode = true,
+};
+
 static const struct of_device_id meson8b_dwmac_match[] = {
-	{ .compatible = "amlogic,meson8b-dwmac" },
-	{ .compatible = "amlogic,meson8m2-dwmac" },
-	{ .compatible = "amlogic,meson-gxbb-dwmac" },
+	{
+		.compatible = "amlogic,meson8b-dwmac",
+		.data = &meson8b_dwmac_data,
+	},
+	{
+		.compatible = "amlogic,meson8m2-dwmac",
+		.data = &meson8b_dwmac_data,
+	},
+	{
+		.compatible = "amlogic,meson-gxbb-dwmac",
+		.data = &meson8b_dwmac_data,
+	},
+	{
+		.compatible = "amlogic,meson-axg-dwmac",
+		.data = &meson_axg_dwmac_data,
+	},
 	{ }
 };
 MODULE_DEVICE_TABLE(of, meson8b_dwmac_match);
-- 
2.17.0

^ permalink raw reply related

* [PATCH 1/2] dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
From: Yixun Lan @ 2018-04-26 16:05 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: Yixun Lan, Kevin Hilman, Carlo Caione, Rob Herring,
	Martin Blumenstingl, linux-amlogic, linux-arm-kernel,
	linux-kernel, devicetree
In-Reply-To: <20180426160508.29380-1-yixun.lan@amlogic.com>

We need to introduce a new compatible name for the Meson-AXG SoC
in order to support the RMII 100M ethernet PHY, since the PRG_ETH0
register of the dwmac glue layer is changed from previous old SoC.

Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
---
 Documentation/devicetree/bindings/net/meson-dwmac.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt b/Documentation/devicetree/bindings/net/meson-dwmac.txt
index 61cada22ae6c..1321bb194ed9 100644
--- a/Documentation/devicetree/bindings/net/meson-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/meson-dwmac.txt
@@ -11,6 +11,7 @@ Required properties on all platforms:
 			- "amlogic,meson8b-dwmac"
 			- "amlogic,meson8m2-dwmac"
 			- "amlogic,meson-gxbb-dwmac"
+			- "amlogic,meson-axg-dwmac"
 		Additionally "snps,dwmac" and any applicable more
 		detailed version number described in net/stmmac.txt
 		should be used.
-- 
2.17.0

^ permalink raw reply related

* [PATCH 0/2] net: stmmac: dwmac-meson: 100M phy mode support for AXG SoC
From: Yixun Lan @ 2018-04-26 16:05 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: Yixun Lan, Kevin Hilman, Carlo Caione, Rob Herring,
	Martin Blumenstingl, linux-amlogic, linux-arm-kernel,
	linux-kernel, devicetree

Due to the dwmac glue layer register changed, we need to 
introduce a new compatible name for the Meson-AXG SoC
to support for the RMII 100M ethernet PHY.

Yixun Lan (2):
  dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
  net: stmmac: dwmac-meson: extend phy mode setting

 .../devicetree/bindings/net/meson-dwmac.txt   |  1 +
 .../ethernet/stmicro/stmmac/dwmac-meson8b.c   | 96 ++++++++++++++++---
 2 files changed, 86 insertions(+), 11 deletions(-)

-- 
2.17.0

^ permalink raw reply

* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Michael S. Tsirkin @ 2018-04-26 15:59 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: James Bottomley, Michal Hocko, David Rientjes, dm-devel,
	eric.dumazet, netdev, jasowang, Randy Dunlap, linux-kernel,
	Matthew Wilcox, linux-mm, edumazet, Andrew Morton, virtualization,
	David Miller, Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804261142480.21152@file01.intranet.prod.int.rdu2.redhat.com>

On Thu, Apr 26, 2018 at 11:44:21AM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 26 Apr 2018, James Bottomley wrote:
> 
> > On Thu, 2018-04-26 at 11:05 -0400, Mikulas Patocka wrote:
> > > 
> > > On Thu, 26 Apr 2018, James Bottomley wrote:
> > [...]
> > > > Perhaps find out beforehand instead of insisting on an approach
> > > without
> > > > knowing.  On openSUSE the grub config is built from the files in
> > > > /etc/grub.d/ so any package can add a kernel option (and various
> > > > conditions around activating it) simply by adding a new file.
> > > 
> > > And then, different versions of the debug kernel will clash when 
> > > attempting to create the same file.
> > 
> > Don't be silly ... there are many ways of coping with that in rpm/dpkg.
> 
> I know you can deal with it - but how many lines of code will that 
> consume? Multiplied by the total number of rpm-based distros.
> 
> Mikulas

I don't think debug kernels should inject faults by default.

IIUC debug kernels mainly exist so people who experience e.g. memory
corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
will *already* catch a failure early. Nothing special needs to be done.

There is a much smaller class of people like QA who go actively looking
for trouble. That's the kind of thing fault injection is good for, and
IMO you do not want your QA team to test a separate kernel - otherwise
you are never quite sure whatever was tested will work in the field.
So a config option won't help them either.

How do you make sure QA tests a specific corner case? Add it to
the test plan :)

I don't speak for Red Hat, etc.

-- 
MST

^ permalink raw reply

* Re: [PATCH net-next 6/6] mlxsw: spectrum_span: Allow bridge for gretap mirror
From: Stephen Hemminger @ 2018-04-26 15:58 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, netdev, bridge, davem, jiri, petrm, mlxsw
In-Reply-To: <1e3e120f-821a-2d99-b720-0585ce9c1803@cumulusnetworks.com>

On Thu, 26 Apr 2018 13:50:12 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> On 26/04/18 12:06, Ido Schimmel wrote:
> > From: Petr Machata <petrm@mellanox.com>
> > 
> > When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
> > underlay address (i.e. the remote address of the tunnel) may be routed
> > to a bridge.
> > 
> > In that case, look up the resolved neighbor Ethernet address in that
> > bridge's FDB. Then configure the offload to direct the mirrored traffic
> > to that port, possibly with tagging.
> > 
> > Signed-off-by: Petr Machata <petrm@mellanox.com>
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > ---
> >   .../net/ethernet/mellanox/mlxsw/spectrum_span.c    | 102 +++++++++++++++++++--
> >   .../net/ethernet/mellanox/mlxsw/spectrum_span.h    |   1 +
> >   2 files changed, 97 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
> > index 65a77708ff61..92fb81839852 100644
> > --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
> > +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
> > @@ -32,6 +32,7 @@
> >    * POSSIBILITY OF SUCH DAMAGE.
> >    */
> >   
> > +#include <linux/if_bridge.h>
> >   #include <linux/list.h>
> >   #include <net/arp.h>
> >   #include <net/gre.h>
> > @@ -39,8 +40,9 @@
> >   #include <net/ip6_tunnel.h>
> >   
> >   #include "spectrum.h"
> > -#include "spectrum_span.h"
> >   #include "spectrum_ipip.h"
> > +#include "spectrum_span.h"
> > +#include "spectrum_switchdev.h"
> >   
> >   int mlxsw_sp_span_init(struct mlxsw_sp *mlxsw_sp)
> >   {
> > @@ -167,6 +169,79 @@ mlxsw_sp_span_entry_unoffloadable(struct mlxsw_sp_span_parms *sparmsp)
> >   	return 0;
> >   }
> >   
> > +static struct net_device *
> > +mlxsw_sp_span_entry_bridge_8021q(const struct net_device *br_dev,
> > +				 unsigned char *dmac,
> > +				 u16 *p_vid)
> > +{
> > +	struct net_bridge_vlan_group *vg = br_vlan_group_rtnl(br_dev);
> > +	u16 pvid = br_vlan_group_pvid(vg);
> > +	struct net_device *edev = NULL;
> > +	struct net_bridge_vlan *v;
> > +  
> 
> Hi,
> These structures are not really exported anywhere outside of br_private.h
> Instead of passing them around and risking someone else actually trying to
> dereference an incomplete type, why don't you add just 2 new helpers -
> br_vlan_pvid(netdevice) and br_vlan_info(netdevice, vid, *bridge_vlan_info)
> 
> br_vlan_info can return the exported and already available type "bridge_vlan_info"
> (defined in uapi/linux/if_bridge.h) into the bridge_vlan_info arg
> and br_vlan_pvid is obvious - return the current dev pvid if available.
> 
> Another bonus you'll avoid dealing with the bridge's vlan internals.

I am particularly worried that any locking changes will impact the device
driver. Also lock inversion issues, and implied (or not RTNL).
Also what if a value is changed, there is no notification path back to the device.

Having an API that only takes net_device and vlan seems preferable.

^ permalink raw reply

* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Mikulas Patocka @ 2018-04-26 15:55 UTC (permalink / raw)
  To: James Bottomley
  Cc: Michal Hocko, David Rientjes, dm-devel, eric.dumazet, mst, netdev,
	jasowang, Randy Dunlap, linux-kernel, Matthew Wilcox, linux-mm,
	edumazet, Andrew Morton, virtualization, David Miller,
	Vlastimil Babka
In-Reply-To: <1524756256.3226.7.camel@HansenPartnership.com>



On Thu, 26 Apr 2018, James Bottomley wrote:

> So you're shifting your argument from "I have to do it as a Kconfig
> option because the distros require it" to "distributions will build
> separate kernel packages for this, but won't do enabling in a non
> kernel package"?  To be honest, I think the argument is nuts but I
> don't really care.  From my point of view it's usually me explaining to
> people how to debug stuff and "you have to build your own kernel with
> this Kconfig option" compared to "add this to the kernel command line
> and reboot" is much more effort for the debugger.
> 
> James

If you have to explain to the user that he needs to turn it on, it is 
already wrong.

In order to find the kvmalloc abuses, it should be tested by as many users 
as possible. And it could be tested by as many users as possible, if it 
can be enabled in a VISIBLE place (i.e. menuconfig) - or (in my opinion 
even better) it should be bound to an CONFIG_ option that is already 
enabled for debugging kernel - then you won't have to explain anything to 
the user at all.

Hardly anyone - except for people who read this thread - will know about 
the new commandline parameters or debugfs files.

I'm not arguing that the commandline parameter or debugfs files are wrong. 
They are OK to overridde the default settings for advanced users. But they 
are useless for common users because common users won't know about them.

Mikulas

^ permalink raw reply

* Re: [PATCH net-next] bridge: use hlist_entry_safe
From: Stephen Hemminger @ 2018-04-26 15:54 UTC (permalink / raw)
  To: YueHaibing; +Cc: davem, netdev, bridge
In-Reply-To: <20180426030705.6632-1-yuehaibing@huawei.com>

On Thu, 26 Apr 2018 11:07:05 +0800
YueHaibing <yuehaibing@huawei.com> wrote:

> Use hlist_entry_safe() instead of open-coding it.
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  net/bridge/br_forward.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index b4eed11..7a7fd67 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -274,8 +274,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
>  		struct net_bridge_port *port, *lport, *rport;
>  
>  		lport = p ? p->port : NULL;
> -		rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
> -			     NULL;
> +		rport = hlist_entry_safe(rp, struct net_bridge_port, rlist);
>  
>  		if ((unsigned long)lport > (unsigned long)rport) {
>
  			port = lport;

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply

* Re: [PATCH net] sctp: clear the new asoc's stream outcnt in sctp_stream_update
From: Marcelo Ricardo Leitner @ 2018-04-26 15:53 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <7a1180e29789ab0aa339ae8b456a100520ffcdc5.1524727304.git.lucien.xin@gmail.com>

On Thu, Apr 26, 2018 at 03:21:44PM +0800, Xin Long wrote:
> When processing a duplicate cookie-echo chunk, sctp moves the new
> temp asoc's stream out/in into the old asoc, and later frees this
> new temp asoc.
>
> But now after this move, the new temp asoc's stream->outcnt is not
> cleared while stream->out is set to NULL, which would cause a same
> crash as the one fixed in Commit 79d0895140e9 ("sctp: fix error
> path in sctp_stream_init") when freeing this asoc later.
>
> This fix is to clear this outcnt in sctp_stream_update.
>
> Fixes: f952be79cebd ("sctp: introduce struct sctp_stream_out_ext")
> Reported-by: Jianwen Ji <jiji@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  net/sctp/stream.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/sctp/stream.c b/net/sctp/stream.c
> index f799043..f1f1d1b 100644
> --- a/net/sctp/stream.c
> +++ b/net/sctp/stream.c
> @@ -240,6 +240,8 @@ void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new)
>
>  	new->out = NULL;
>  	new->in  = NULL;
> +	new->outcnt = 0;
> +	new->incnt  = 0;
>  }
>
>  static int sctp_send_reconf(struct sctp_association *asoc,
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox