Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH RFC iproute2-next] iplink: allow to change iplink value
From: Marek Behún @ 2019-08-24  2:42 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

Allow to change the interface to which a given interface is linked to.
This is useful in the case of multi-CPU port DSA, for changing the CPU
port of a given user port.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
---
 ip/iplink.c           | 16 +++++-----------
 man/man8/ip-link.8.in |  7 +++++++
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 212a0885..d52c0aaf 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -579,7 +579,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 {
 	char *name = NULL;
 	char *dev = NULL;
-	char *link = NULL;
 	int ret, len;
 	char abuf[32];
 	int qlen = -1;
@@ -590,6 +589,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 	int numrxqueues = -1;
 	int link_netnsid = -1;
 	int index = 0;
+	int link = -1;
 	int group = -1;
 	int addr_len = 0;
 
@@ -620,7 +620,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 				invarg("Invalid \"index\" value", *argv);
 		} else if (matches(*argv, "link") == 0) {
 			NEXT_ARG();
-			link = *argv;
+			link = ll_name_to_index(*argv);
+			if (!link)
+				return nodev(*argv);
+			addattr32(&req->n, sizeof(*req), IFLA_LINK, link);
 		} else if (matches(*argv, "address") == 0) {
 			NEXT_ARG();
 			addr_len = ll_addr_a2n(abuf, sizeof(abuf), *argv);
@@ -1004,15 +1007,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 			exit(-1);
 		}
 
-		if (link) {
-			int ifindex;
-
-			ifindex = ll_name_to_index(link);
-			if (!ifindex)
-				return nodev(link);
-			addattr32(&req->n, sizeof(*req), IFLA_LINK, ifindex);
-		}
-
 		req->i.ifi_index = index;
 	}
 
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index a8ae72d2..800aed05 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -149,6 +149,9 @@ ip-link \- network device configuration
 .br
 .RB "[ " nomaster " ]"
 .br
+.RB "[ " link
+.IR DEVICE " ]"
+.br
 .RB "[ " vrf
 .IR NAME " ]"
 .br
@@ -2131,6 +2134,10 @@ set master device of the device (enslave device).
 .BI nomaster
 unset master device of the device (release device).
 
+.TP
+.BI link " DEVICE"
+set device to which this device is linked to.
+
 .TP
 .BI addrgenmode " eui64|none|stable_secret|random"
 set the IPv6 address generation mode
-- 
2.21.0


^ permalink raw reply related

* [PATCH RFC net-next 3/3] net: dsa: implement ndo_set_netlink for chaning port's CPU port
From: Marek Behún @ 2019-08-24  2:42 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

Implement ndo_set_iflink for DSA slave device. In multi-CPU port setup
this should be used to change to which CPU destination port a given port
should be connected.

This adds a new operation into the DSA switch operations structure,
port_change_cpu_port. A driver implementing this function has the
ability to change CPU destination port of a given port.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
 include/net/dsa.h |  6 ++++++
 net/dsa/slave.c   | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 64bd70608f2f..4f3f0032b886 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -545,6 +545,12 @@ struct dsa_switch_ops {
 	 */
 	netdev_tx_t (*port_deferred_xmit)(struct dsa_switch *ds, int port,
 					  struct sk_buff *skb);
+
+	/*
+	 * Multi-CPU port support
+	 */
+	int	(*port_change_cpu_port)(struct dsa_switch *ds, int port,
+					struct dsa_port *new_cpu_dp);
 };
 
 struct dsa_switch_driver {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 33f41178afcc..bafaadeca912 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -64,6 +64,40 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
 	return dsa_slave_to_master(dev)->ifindex;
 }
 
+static int dsa_slave_set_iflink(struct net_device *dev, int iflink)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct net_device *new_cpu_dev;
+	struct dsa_port *new_cpu_dp;
+	int err;
+
+	if (!dp->ds->ops->port_change_cpu_port)
+		return -EOPNOTSUPP;
+
+	new_cpu_dev = dev_get_by_index(dev_net(dev), iflink);
+	if (!new_cpu_dev)
+		return -ENODEV;
+
+	new_cpu_dp = new_cpu_dev->dsa_ptr;
+	if (!new_cpu_dp)
+		return -EINVAL;
+
+	/* new CPU port has to be on the same switch tree */
+	if (new_cpu_dp->dst != dp->dst)
+		return -EINVAL;
+
+	err = dp->ds->ops->port_change_cpu_port(dp->ds, dp->index, new_cpu_dp);
+	if (err)
+		return err;
+
+	/* should this be done atomically? */
+	dp->cpu_dp = new_cpu_dp;
+	p->xmit = new_cpu_dp->tag_ops->xmit;
+
+	return 0;
+}
+
 static int dsa_slave_open(struct net_device *dev)
 {
 	struct net_device *master = dsa_slave_to_master(dev);
@@ -1176,6 +1210,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_fdb_dump		= dsa_slave_fdb_dump,
 	.ndo_do_ioctl		= dsa_slave_ioctl,
 	.ndo_get_iflink		= dsa_slave_get_iflink,
+	.ndo_set_iflink		= dsa_slave_set_iflink,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_netpoll_setup	= dsa_slave_netpoll_setup,
 	.ndo_netpoll_cleanup	= dsa_slave_netpoll_cleanup,
-- 
2.21.0


^ permalink raw reply related

* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-24  3:56 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <20190823134337.37e4b215@x1.home>



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, August 24, 2019 1:14 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Fri, 23 Aug 2019 18:00:30 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, August 23, 2019 10:47 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 16:14:04 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > > > Idea is to have mdev alias as optional.
> > > > > > Each mdev_parent says whether it wants mdev_core to generate
> > > > > > an alias or not. So only networking device drivers would set it to true.
> > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > either during creation time. User continue to provide only uuid.
> > > > >
> > > > > Ok
> > > > >
> > > > > > I am tempted to have alias collision detection only within
> > > > > > children mdevs of the same parent, but doing so will always
> > > > > > mandate to prefix in netdev name. And currently we are left
> > > > > > with only 3 characters to prefix it, so that may not be good either.
> > > > > > Hence, I think mdev core wide alias is better with 12 characters.
> > > > >
> > > > > I suppose it depends on the API, if the vendor driver can ask
> > > > > the mdev core for an alias as part of the device creation
> > > > > process, then it could manage the netdev namespace for all its
> > > > > devices, choosing how many characters to use, and fail the
> > > > > creation if it can't meet a uniqueness requirement.  IOW,
> > > > > mdev-core would always provide a full
> > > > > sha1 and therefore gets itself out of the uniqueness/collision aspects.
> > > > >
> > > > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > > > mdev core allowed to create a mdev.
> > >
> > > The mdev vendor driver has the opportunity to fail the device
> > > creation in mdev_parent_ops.create().
> > >
> > That is not helpful for below reasons.
> > 1. vendor driver doesn't have visibility in other vendor's alias.
> > 2. Even for single vendor, it needs to maintain global list of devices to see
> collision.
> > 3. multiple vendors needs to implement same scheme.
> >
> > Mdev core should be the owner. Shifting ownership from one layer to a
> > lower layer in vendor driver doesn't solve the problem (if there is
> > one, which I think doesn't exist).
> >
> > > > And then devlink core chooses
> > > > only 6 bytes (12 characters) and there is collision. Things fall
> > > > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > > > core's ownership to provide unique aliases.
> > >
> > > You're suggesting/contemplating multiple solutions here, 3-char
> > > prefix + 12- char sha1 vs <parent netdev> + ?-char sha1.  Also, the
> > > 15-char total limit is imposed by an external subsystem, where the
> > > vendor driver is the gateway between that subsystem and mdev.  How
> > > would mdev integrate with another subsystem that maybe only has
> > > 9-chars available?  Would the vendor driver API specify "I need an
> > > alias" or would it specify "I need an X-char length alias"?
> > Yes, Vendor driver should say how long the alias it wants.
> > However before we implement that, I suggest let such
> > vendor/user/driver arrive which needs that. Such variable length alias
> > can be added at that time and even with that alias collision can be
> > detected by single mdev module.
> 
> If we agree that different alias lengths are possible, then I would request that
> minimally an mdev sample driver be modified to request an alias with a length
> that can be adjusted without recompiling in order to exercise the collision path.
> 
Yes. this can be done. But I fail to understand the need to do so.
It is not the responsibility of the mdev core to show case sha1 collision efficiency/deficiency.
So why do you insist exercise it?

> If mdev-core is guaranteeing uniqueness, does this indicate that each alias
> length constitutes a separate namespace?  ie. strictly a strcmp(), not a
> strncmp() to the shorter alias.
> 
Yes.


> > > Does it make sense that mdev-core would fail creation of a device if
> > > there's a collision in the 12-char address space between different
> > > subsystems?  For example, does enm0123456789ab really
> > > collide with xyz0123456789ab?
> > I think so, because at mdev level its 12-char alias matters.
> > Choosing the prefix not adding prefix is really a user space choice.
> >
> > >  So if
> > > mdev were to provided a 40-char sha1, is it possible that the vendor
> > > driver could consume this in its create callback, truncate it to the
> > > number of chars required by the vendor driver's subsystem, and
> > > determine whether a collision exists?
> > We shouldn't shift the problem from mdev to multiple vendor drivers to
> > detect collision.
> >
> > I still think that user providing alias is better because it knows the
> > use-case system in use, and eliminates these collision issue.
> 
> How is a user provided alias immune from collisions?  The burden is on the user
> to provide both a unique uuid and a unique alias.  That makes it trivial to create
> a collision.
> 
Than such collision should have occurred for other subsystem such as netdev while creating vlan, macvlan, ipvlan, vxlan and more devices who are named by the user.
But that isn't the case.

> > > > > > I do not understand how an extra character reduces collision,
> > > > > > if that's what you meant.
> > > > >
> > > > > If the default were for example 3-chars, we might already have
> > > > > device 'abc'.  A collision would expose one more char of the new
> > > > > device, so we might add device with alias 'abcd'.  I mentioned
> > > > > previously that this leaves an issue for userspace that we can't
> > > > > change the alias of device abc, so without additional
> > > > > information, userspace can only determine via elimination the
> > > > > mapping of alias to device, but userspace has more information
> > > > > available to it in the form of sysfs links.
> > > > > > Module options are almost not encouraged anymore with other
> > > > > > subsystems/drivers.
> > > > >
> > > > > We don't live in a world of absolutes.  I agree that the
> > > > > defaults should work in the vast majority of cases.  Requiring a
> > > > > user to twiddle module options to make things work is
> > > > > undesirable, verging on a bug.  A module option to enable some
> > > > > specific feature, unsafe condition, or test that is outside of
> > > > > the typical use case is reasonable, imo.
> > > > > > For testing collision rate, a sample user space script and
> > > > > > sample mtty is easy and get us collision count too. We
> > > > > > shouldn't put that using module option in production kernel.
> > > > > > I practically have the code ready to play with; Changing 12 to
> > > > > > smaller value is easy with module reload.
> > > > > >
> > > > > > #define MDEV_ALIAS_LEN 12
> > > > >
> > > > > If it can't be tested with a shipping binary, it probably won't
> > > > > be tested.  Thanks,
> > > > It is not the role of mdev core to expose collision
> > > > efficiency/deficiency of the sha1. It can be tested outside before
> > > > mdev choose to use it.
> > >
> > > The testing I'm considering is the user and kernel response to a
> > > collision.
> > > > I am saying we should test with 12 characters with 10,000 or more
> > > > devices and see how collision occurs. Even if collision occurs,
> > > > mdev returns EEXIST status indicating user to pick a different
> > > > UUID for those rare conditions.
> > >
> > > The only way we're going to see collision with a 12-char sha1 is if
> > > we burn the CPU cycles to find uuids that collide in that space.
> > > 10,000 devices is not remotely enough to generate a collision in
> > > that address space.  That puts a prerequisite in place that in order
> > > to test collision, someone needs to know certain magic inputs.
> > > OTOH, if we could use a shorter abbreviation, collisions are trivial
> > > to test experimentally.  Thanks,
> > Yes, and therefore a sane user who wants to create more mdevs,
> > wouldn't intentionally stress it to see failures.
> 
> I don't understand this logic.  I'm simply asking that we have a way to test the
> collision behavior without changing the binary.  The path we're driving towards
> seems to be making this easier and easier.  If the vendor can request an alias of
> a specific length, then a sample driver with a module option to set the desired
> alias length to 1-char makes it trivially easy to induce a collision.  
Sure it is easy to test collision, but my point is - mdev core is not sha1 test module.
Hence adding functionality of variable alias length to test collision doesn't make sense.
When the actual user arrives who needs small alias, we will be able to add additional pieces very easily.

> It doesn't
> even need to be exposed in a real driver.  Besides, when do we ever get to
> design interfaces that only worry about sane users???  Thanks,
> 
I intent to say that a sane user who wants to create mdev's will just work fine with less collision.
If there is collision EEXIST is returns and sane user picks different UUID.
If user is intentionally picking UUIDs in such a way that triggers sha1 collision, his intention is likely to not create mdevs for actual use.
And if interface returns error code it is still fine.

> Alex

^ permalink raw reply

* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-24  4:45 UTC (permalink / raw)
  To: Parav Pandit, Alex Williamson
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866008B0571B90DAFFADA97D1A70@AM0PR05MB4866.eurprd05.prod.outlook.com>

Hi Alex,

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> owner@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Saturday, August 24, 2019 9:26 AM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S .
> Miller <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>;
> Cornelia Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > I don't understand this logic.  I'm simply asking that we have a way
> > to test the collision behavior without changing the binary.  The path
> > we're driving towards seems to be making this easier and easier.  If
> > the vendor can request an alias of a specific length, then a sample
> > driver with a module option to set the desired alias length to 1-char makes
> it trivially easy to induce a collision.
> Sure it is easy to test collision, but my point is - mdev core is not sha1 test
> module.
> Hence adding functionality of variable alias length to test collision doesn't
> make sense.
> When the actual user arrives who needs small alias, we will be able to add
> additional pieces very easily.

My initial thoughts to add parent_ops to have bool flag to generate alias or not.
However, instead of bool, keeping it unsigned int to say, zero to skip alias and non-zero length to convey generate alias.
This will serve both the purpose with trivial handling.



^ permalink raw reply

* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-24  4:59 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866008B0571B90DAFFADA97D1A70@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Sat, 24 Aug 2019 03:56:08 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Saturday, August 24, 2019 1:14 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> > Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > 
> > On Fri, 23 Aug 2019 18:00:30 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, August 23, 2019 10:47 PM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > On Fri, 23 Aug 2019 16:14:04 +0000
> > > > Parav Pandit <parav@mellanox.com> wrote:
> > > >  
> > > > > > > Idea is to have mdev alias as optional.
> > > > > > > Each mdev_parent says whether it wants mdev_core to generate
> > > > > > > an alias or not. So only networking device drivers would set it to true.
> > > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > > either during creation time. User continue to provide only uuid.  
> > > > > >
> > > > > > Ok
> > > > > >  
> > > > > > > I am tempted to have alias collision detection only within
> > > > > > > children mdevs of the same parent, but doing so will always
> > > > > > > mandate to prefix in netdev name. And currently we are left
> > > > > > > with only 3 characters to prefix it, so that may not be good either.
> > > > > > > Hence, I think mdev core wide alias is better with 12 characters.  
> > > > > >
> > > > > > I suppose it depends on the API, if the vendor driver can ask
> > > > > > the mdev core for an alias as part of the device creation
> > > > > > process, then it could manage the netdev namespace for all its
> > > > > > devices, choosing how many characters to use, and fail the
> > > > > > creation if it can't meet a uniqueness requirement.  IOW,
> > > > > > mdev-core would always provide a full
> > > > > > sha1 and therefore gets itself out of the uniqueness/collision aspects.
> > > > > >  
> > > > > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > > > > mdev core allowed to create a mdev.  
> > > >
> > > > The mdev vendor driver has the opportunity to fail the device
> > > > creation in mdev_parent_ops.create().
> > > >  
> > > That is not helpful for below reasons.
> > > 1. vendor driver doesn't have visibility in other vendor's alias.
> > > 2. Even for single vendor, it needs to maintain global list of devices to see  
> > collision.  
> > > 3. multiple vendors needs to implement same scheme.
> > >
> > > Mdev core should be the owner. Shifting ownership from one layer to a
> > > lower layer in vendor driver doesn't solve the problem (if there is
> > > one, which I think doesn't exist).
> > >  
> > > > > And then devlink core chooses
> > > > > only 6 bytes (12 characters) and there is collision. Things fall
> > > > > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > > > > core's ownership to provide unique aliases.  
> > > >
> > > > You're suggesting/contemplating multiple solutions here, 3-char
> > > > prefix + 12- char sha1 vs <parent netdev> + ?-char sha1.  Also, the
> > > > 15-char total limit is imposed by an external subsystem, where the
> > > > vendor driver is the gateway between that subsystem and mdev.  How
> > > > would mdev integrate with another subsystem that maybe only has
> > > > 9-chars available?  Would the vendor driver API specify "I need an
> > > > alias" or would it specify "I need an X-char length alias"?  
> > > Yes, Vendor driver should say how long the alias it wants.
> > > However before we implement that, I suggest let such
> > > vendor/user/driver arrive which needs that. Such variable length alias
> > > can be added at that time and even with that alias collision can be
> > > detected by single mdev module.  
> > 
> > If we agree that different alias lengths are possible, then I would request that
> > minimally an mdev sample driver be modified to request an alias with a length
> > that can be adjusted without recompiling in order to exercise the collision path.
> >   
> Yes. this can be done. But I fail to understand the need to do so.
> It is not the responsibility of the mdev core to show case sha1
> collision efficiency/deficiency. So why do you insist exercise it?

I don't understand what you're trying to imply with "show case sha1
collision efficiency/deficiency".  Are you suggesting that I'm asking
for this feature to experimentally test the probability of collisions
at different character lengths?  We can use shell scripts for that.
I'm simply observing that collisions are possible based on user input,
but they're not practical to test for at the character lengths we're
using.  Therefore, how do I tell QA to develop a tests to make sure the
kernel and userspace tools that might be involved behave correctly when
this rare event occurs?

As I mentioned previously, we can burn the cpu cyles to find some uuids
which will collide with our aliases, but the more accessible approach
seems to be to have a tune-able to reduce the alias address space such
that we can simply throw enough random uuids into the test to guarantee
a collision.  Simply generating 10,000 devices with a 12-character
alias, as you suggested previously, has effectively a 0% probability of
generating a collision.

If we accept that different vendor drivers might have different alias
requirements, and therefore the vendor driver should have the ability
to specify an alias length, then this all fits very nicely into
modifying a sample driver to request a sufficiently short alias such
that we can use it to test the behavior of mdev-core and surrounding
code when an alias collision occurs.

> > If mdev-core is guaranteeing uniqueness, does this indicate that
> > each alias length constitutes a separate namespace?  ie. strictly a
> > strcmp(), not a strncmp() to the shorter alias.
> >   
> Yes.
> 
> 
> > > > Does it make sense that mdev-core would fail creation of a
> > > > device if there's a collision in the 12-char address space
> > > > between different subsystems?  For example, does
> > > > enm0123456789ab really collide with xyz0123456789ab?  
> > > I think so, because at mdev level its 12-char alias matters.
> > > Choosing the prefix not adding prefix is really a user space
> > > choice. 
> > > >  So if
> > > > mdev were to provided a 40-char sha1, is it possible that the
> > > > vendor driver could consume this in its create callback,
> > > > truncate it to the number of chars required by the vendor
> > > > driver's subsystem, and determine whether a collision exists?  
> > > We shouldn't shift the problem from mdev to multiple vendor
> > > drivers to detect collision.
> > >
> > > I still think that user providing alias is better because it
> > > knows the use-case system in use, and eliminates these collision
> > > issue.  
> > 
> > How is a user provided alias immune from collisions?  The burden is
> > on the user to provide both a unique uuid and a unique alias.  That
> > makes it trivial to create a collision.
> >   
> Than such collision should have occurred for other subsystem such as
> netdev while creating vlan, macvlan, ipvlan, vxlan and more devices
> who are named by the user. But that isn't the case.
> 
> > > > > > > I do not understand how an extra character reduces
> > > > > > > collision, if that's what you meant.  
> > > > > >
> > > > > > If the default were for example 3-chars, we might already
> > > > > > have device 'abc'.  A collision would expose one more char
> > > > > > of the new device, so we might add device with alias
> > > > > > 'abcd'.  I mentioned previously that this leaves an issue
> > > > > > for userspace that we can't change the alias of device abc,
> > > > > > so without additional information, userspace can only
> > > > > > determine via elimination the mapping of alias to device,
> > > > > > but userspace has more information available to it in the
> > > > > > form of sysfs links.  
> > > > > > > Module options are almost not encouraged anymore with
> > > > > > > other subsystems/drivers.  
> > > > > >
> > > > > > We don't live in a world of absolutes.  I agree that the
> > > > > > defaults should work in the vast majority of cases.
> > > > > > Requiring a user to twiddle module options to make things
> > > > > > work is undesirable, verging on a bug.  A module option to
> > > > > > enable some specific feature, unsafe condition, or test
> > > > > > that is outside of the typical use case is reasonable,
> > > > > > imo.  
> > > > > > > For testing collision rate, a sample user space script and
> > > > > > > sample mtty is easy and get us collision count too. We
> > > > > > > shouldn't put that using module option in production
> > > > > > > kernel. I practically have the code ready to play with;
> > > > > > > Changing 12 to smaller value is easy with module reload.
> > > > > > >
> > > > > > > #define MDEV_ALIAS_LEN 12  
> > > > > >
> > > > > > If it can't be tested with a shipping binary, it probably
> > > > > > won't be tested.  Thanks,  
> > > > > It is not the role of mdev core to expose collision
> > > > > efficiency/deficiency of the sha1. It can be tested outside
> > > > > before mdev choose to use it.  
> > > >
> > > > The testing I'm considering is the user and kernel response to a
> > > > collision.  
> > > > > I am saying we should test with 12 characters with 10,000 or
> > > > > more devices and see how collision occurs. Even if collision
> > > > > occurs, mdev returns EEXIST status indicating user to pick a
> > > > > different UUID for those rare conditions.  
> > > >
> > > > The only way we're going to see collision with a 12-char sha1
> > > > is if we burn the CPU cycles to find uuids that collide in that
> > > > space. 10,000 devices is not remotely enough to generate a
> > > > collision in that address space.  That puts a prerequisite in
> > > > place that in order to test collision, someone needs to know
> > > > certain magic inputs. OTOH, if we could use a shorter
> > > > abbreviation, collisions are trivial to test experimentally.
> > > > Thanks,  
> > > Yes, and therefore a sane user who wants to create more mdevs,
> > > wouldn't intentionally stress it to see failures.  
> > 
> > I don't understand this logic.  I'm simply asking that we have a
> > way to test the collision behavior without changing the binary.
> > The path we're driving towards seems to be making this easier and
> > easier.  If the vendor can request an alias of a specific length,
> > then a sample driver with a module option to set the desired alias
> > length to 1-char makes it trivially easy to induce a collision.    
> Sure it is easy to test collision, but my point is - mdev core is not
> sha1 test module. Hence adding functionality of variable alias length
> to test collision doesn't make sense. When the actual user arrives
> who needs small alias, we will be able to add additional pieces very
> easily.
> 
> > It doesn't
> > even need to be exposed in a real driver.  Besides, when do we ever
> > get to design interfaces that only worry about sane users???
> > Thanks, 
> I intent to say that a sane user who wants to create mdev's will just
> work fine with less collision. If there is collision EEXIST is
> returns and sane user picks different UUID. If user is intentionally
> picking UUIDs in such a way that triggers sha1 collision, his
> intention is likely to not create mdevs for actual use. And if
> interface returns error code it is still fine.

This is exactly the scenarios that I'm asking "how do we test that it
works as we expect".  I can test that passing identical uuids into the
mdev create interface only allows the first to succeed.  With a 12-char
sha1 alias, it's not practical to construct a test to validate the
alias collision behavior.  Do you suggest we rely only on code
inspection instead?  Thanks,

Alex

^ permalink raw reply

* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-24  5:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <20190823225929.38fd86f5@x1.home>



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, August 24, 2019 10:29 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S .
> Miller <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>;
> Cornelia Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Sat, 24 Aug 2019 03:56:08 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Saturday, August 24, 2019 1:14 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 18:00:30 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Friday, August 23, 2019 10:47 PM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko
> > > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > > <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > > > > kernel@vger.kernel.org; cjia <cjia@nvidia.com>;
> > > > > netdev@vger.kernel.org
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > > On Fri, 23 Aug 2019 16:14:04 +0000 Parav Pandit
> > > > > <parav@mellanox.com> wrote:
> > > > >
> > > > > > > > Idea is to have mdev alias as optional.
> > > > > > > > Each mdev_parent says whether it wants mdev_core to
> > > > > > > > generate an alias or not. So only networking device drivers
> would set it to true.
> > > > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > > > either during creation time. User continue to provide only uuid.
> > > > > > >
> > > > > > > Ok
> > > > > > >
> > > > > > > > I am tempted to have alias collision detection only within
> > > > > > > > children mdevs of the same parent, but doing so will
> > > > > > > > always mandate to prefix in netdev name. And currently we
> > > > > > > > are left with only 3 characters to prefix it, so that may not be
> good either.
> > > > > > > > Hence, I think mdev core wide alias is better with 12 characters.
> > > > > > >
> > > > > > > I suppose it depends on the API, if the vendor driver can
> > > > > > > ask the mdev core for an alias as part of the device
> > > > > > > creation process, then it could manage the netdev namespace
> > > > > > > for all its devices, choosing how many characters to use,
> > > > > > > and fail the creation if it can't meet a uniqueness
> > > > > > > requirement.  IOW, mdev-core would always provide a full
> > > > > > > sha1 and therefore gets itself out of the uniqueness/collision
> aspects.
> > > > > > >
> > > > > > This doesn't work. At mdev core level 20 bytes sha1 are
> > > > > > unique, so mdev core allowed to create a mdev.
> > > > >
> > > > > The mdev vendor driver has the opportunity to fail the device
> > > > > creation in mdev_parent_ops.create().
> > > > >
> > > > That is not helpful for below reasons.
> > > > 1. vendor driver doesn't have visibility in other vendor's alias.
> > > > 2. Even for single vendor, it needs to maintain global list of
> > > > devices to see
> > > collision.
> > > > 3. multiple vendors needs to implement same scheme.
> > > >
> > > > Mdev core should be the owner. Shifting ownership from one layer
> > > > to a lower layer in vendor driver doesn't solve the problem (if
> > > > there is one, which I think doesn't exist).
> > > >
> > > > > > And then devlink core chooses
> > > > > > only 6 bytes (12 characters) and there is collision. Things
> > > > > > fall apart. Since mdev provides unique uuid based scheme, it's
> > > > > > the mdev core's ownership to provide unique aliases.
> > > > >
> > > > > You're suggesting/contemplating multiple solutions here, 3-char
> > > > > prefix + 12- char sha1 vs <parent netdev> + ?-char sha1.  Also,
> > > > > the 15-char total limit is imposed by an external subsystem,
> > > > > where the vendor driver is the gateway between that subsystem
> > > > > and mdev.  How would mdev integrate with another subsystem that
> > > > > maybe only has 9-chars available?  Would the vendor driver API
> > > > > specify "I need an alias" or would it specify "I need an X-char length
> alias"?
> > > > Yes, Vendor driver should say how long the alias it wants.
> > > > However before we implement that, I suggest let such
> > > > vendor/user/driver arrive which needs that. Such variable length
> > > > alias can be added at that time and even with that alias collision
> > > > can be detected by single mdev module.
> > >
> > > If we agree that different alias lengths are possible, then I would
> > > request that minimally an mdev sample driver be modified to request
> > > an alias with a length that can be adjusted without recompiling in order
> to exercise the collision path.
> > >
> > Yes. this can be done. But I fail to understand the need to do so.
> > It is not the responsibility of the mdev core to show case sha1
> > collision efficiency/deficiency. So why do you insist exercise it?
> 
> I don't understand what you're trying to imply with "show case sha1 collision
> efficiency/deficiency".  Are you suggesting that I'm asking for this feature to
> experimentally test the probability of collisions at different character
> lengths?  We can use shell scripts for that.
> I'm simply observing that collisions are possible based on user input, but
> they're not practical to test for at the character lengths we're using.
> Therefore, how do I tell QA to develop a tests to make sure the kernel and
> userspace tools that might be involved behave correctly when this rare event
> occurs?
>
Ok. so you want to have code coverage and want to add a knob for that.
That is fine. I will have the mdev_parent->ops.alias_len as API instead of bool.
And extend mtty module parameter to set the alias length.

Unfortunately similar code coverage doesn't exist for API like mdev_get/set_iommu_device() in sample of real vendor driver.
And QA is not able to test this functionality without tainting the kernel.

^ permalink raw reply

* [PATCH 4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue
From: Tim Froidcoeur @ 2019-08-24  6:03 UTC (permalink / raw)
  To: matthieu.baerts
  Cc: aprout, cpaasch, davem, edumazet, gregkh, jonathan.lemon, jtl,
	linux-kernel, mkubecek, ncardwell, sashal, stable, tim.froidcoeur,
	ycheng, netdev
In-Reply-To: <529376a4-cf63-f225-ce7c-4747e9966938@tessares.net>

Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
triggers following stack trace:

[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
[25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
[25244.888167] Call Trace:
[25244.889182]  <IRQ>
[25244.890001]  tcp_fragment+0x9c/0x2cf
[25244.891295]  tcp_write_xmit+0x68f/0x988
[25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
[25244.894347]  tcp_data_snd_check+0x2a/0xc8
[25244.895775]  tcp_rcv_established+0x2a8/0x30d
[25244.897282]  tcp_v4_do_rcv+0xb2/0x158
[25244.898666]  tcp_v4_rcv+0x692/0x956
[25244.899959]  ip_local_deliver_finish+0xeb/0x169
[25244.901547]  __netif_receive_skb_core+0x51c/0x582
[25244.903193]  ? inet_gro_receive+0x239/0x247
[25244.904756]  netif_receive_skb_internal+0xab/0xc6
[25244.906395]  napi_gro_receive+0x8a/0xc0
[25244.907760]  receive_buf+0x9a1/0x9cd
[25244.909160]  ? load_balance+0x17a/0x7b7
[25244.910536]  ? vring_unmap_one+0x18/0x61
[25244.911932]  ? detach_buf+0x60/0xfa
[25244.913234]  virtnet_poll+0x128/0x1e1
[25244.914607]  net_rx_action+0x12a/0x2b1
[25244.915953]  __do_softirq+0x11c/0x26b
[25244.917269]  ? handle_irq_event+0x44/0x56
[25244.918695]  irq_exit+0x61/0xa0
[25244.919947]  do_IRQ+0x9d/0xbb
[25244.921065]  common_interrupt+0x85/0x85
[25244.922479]  </IRQ>

tcp_rtx_queue_tail() (called by tcp_fragment()) can call
tcp_write_queue_prev() on the first packet in the queue, which will trigger
the BUG in tcp_write_queue_prev(), because there is no previous packet.

This happens when the retransmit queue is empty, for example in case of a
zero window.

Patch is needed for 4.4, 4.9 and 4.14 stable branches.

Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
Change-Id: I839bde7167ae59e2f7d916c913507372445765c5
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
---
 include/net/tcp.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9de2c8cdcc51..1e70ca75c8bf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1705,6 +1705,10 @@ static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
 {
 	struct sk_buff *skb = tcp_send_head(sk);

+	/* empty retransmit queue, for example due to zero window */
+	if (skb == tcp_write_queue_head(sk))
+		return NULL;
+
 	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
 }

--
2.23.0


-- 


Disclaimer: https://www.tessares.net/mail-disclaimer/ 
<https://www.tessares.net/mail-disclaimer/>



^ permalink raw reply related

* Re: Aw: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
From: René van Dorst @ 2019-08-24  7:41 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Russell King, Stefan Roese
In-Reply-To: <trinity-df75d11a-c27f-4941-a880-b017ebabd3dc-1566583013438@3c-app-gmx-bs75>

Hi Frank,

Quoting Frank Wunderlich <frank-w@public-files.de>:

> tested on bpi-r2 (mt7623/mt7530) and bpi-r64 (mt7622/rtl8367)
>

Thanks for testing!

> as reported to rene directly rx-path needs some rework because  
> current rx-speed
> on bpi-r2 is 865 Mbits/sec instead of ~940 Mbits/sec

I still think it is a result of the extra code in the rx path when mt76x8
was introduced.

Greats,

René

>
> Tested-by: Frank Wunderlich <frank-w@public-files.de>
>
> regards Frank
>
>
>> Gesendet: Freitag, 23. August 2019 um 15:45 Uhr
>> Von: "René van Dorst" <opensource@vdorst.com>
>> An: "John Crispin" <john@phrozen.org>, "Sean Wang"  
>> <sean.wang@mediatek.com>, "Nelson Chang"  
>> <nelson.chang@mediatek.com>, "David S . Miller"  
>> <davem@davemloft.net>, "Matthias Brugger" <matthias.bgg@gmail.com>
>> Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,  
>> linux-mediatek@lists.infradead.org, linux-mips@vger.kernel.org,  
>> "Russell King" <linux@armlinux.org.uk>, "Frank Wunderlich"  
>> <frank-w@public-files.de>, "Stefan Roese" <sr@denx.de>, "René van  
>> Dorst" <opensource@vdorst.com>
>> Betreff: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
>>
>> These patches converts mediatek driver to PHYLINK API.
>>
>> v2->v3:
>> * Phylink improvements and clean-ups after review
>> v1->v2:
>> * Rebase for mt76x8 changes
>> * Phylink improvements and clean-ups after review
>> * SGMII port doesn't support 2.5Gbit in SGMII mode only in BASE-X mode.
>>   Refactor the code.
>>
>> René van Dorst (3):
>>   net: ethernet: mediatek: Add basic PHYLINK support
>>   net: ethernet: mediatek: Re-add support SGMII
>>   dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the
>>     new phylink API
>>
>>  .../arm/mediatek/mediatek,sgmiisys.txt        |   2 -
>>  .../dts/mediatek/mt7622-bananapi-bpi-r64.dts  |  28 +-
>>  arch/arm64/boot/dts/mediatek/mt7622.dtsi      |   1 -
>>  drivers/net/ethernet/mediatek/Kconfig         |   2 +-
>>  drivers/net/ethernet/mediatek/mtk_eth_path.c  |  75 +--
>>  drivers/net/ethernet/mediatek/mtk_eth_soc.c   | 529 ++++++++++++------
>>  drivers/net/ethernet/mediatek/mtk_eth_soc.h   |  68 ++-
>>  drivers/net/ethernet/mediatek/mtk_sgmii.c     |  65 ++-
>>  8 files changed, 477 insertions(+), 293 deletions(-)
>>
>> --
>> 2.20.1
>>
>>




^ permalink raw reply

* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Jiri Pirko @ 2019-08-24  7:42 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Horatiu Vultur, roopa, nikolay, davem, UNGLinuxDriver,
	alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <e47a318c-6446-71cd-660c-8592037d8166@gmail.com>

Sat, Aug 24, 2019 at 01:25:02AM CEST, f.fainelli@gmail.com wrote:
>On 8/22/19 12:07 PM, Horatiu Vultur wrote:
>> Current implementation of the SW bridge is setting the interfaces in
>> promisc mode when they are added to bridge if learning of the frames is
>> enabled.
>> In case of Ocelot which has HW capabilities to switch frames, it is not
>> needed to set the ports in promisc mode because the HW already capable of
>> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
>> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
>> the ports in promisc mode to do the switching.
>
>Then do not do anything when the ndo_set_rx_mode() for the ocelot
>network device is called and indicates that IFF_PROMISC is set and that
>your network port is a bridge port member. That is what mlxsw does AFAICT.

Correct.

>
>As other pointed out, the Linux bridge implements a software bridge by
>default, and because it needs to operate on a wide variety of network
>devices, all with different capabilities, the easiest way to make sure
>that all management (IGMP, BPDU, etc. ) as well as non-management
>traffic can make it to the bridge ports, is to put the network devices
>in promiscuous mode. If this is suboptimal for you, you can take
>shortcuts in your driver that do not hinder the overall functionality.
>
>> This optimization takes places only if all the interfaces that are part
>> of the bridge have this flag and have the same network driver.
>> 
>> If the bridge interfaces is added in promisc mode then also the ports part
>> of the bridge are set in promisc mode.
>> 
>> Horatiu Vultur (3):
>>   net: Add HW_BRIDGE offload feature
>>   net: mscc: Use NETIF_F_HW_BRIDGE
>>   net: mscc: Implement promisc mode.
>> 
>>  drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
>>  include/linux/netdev_features.h    |  3 +++
>>  net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
>>  net/core/ethtool.c                 |  1 +
>>  4 files changed, 56 insertions(+), 3 deletions(-)
>> 
>
>
>-- 
>Florian

^ permalink raw reply

* [PATCH] r8152: Set memory to all 0xFFs on failed reg reads
From: Prashant Malani @ 2019-08-24  8:36 UTC (permalink / raw)
  To: hayeswang, davem; +Cc: grundler, netdev, Prashant Malani

get_registers() blindly copies the memory written to by the
usb_control_msg() call even if the underlying urb failed.

This could lead to junk register values being read by the driver, since
some indirect callers of get_registers() ignore the return values. One
example is:
  ocp_read_dword() ignores the return value of generic_ocp_read(), which
  calls get_registers().

So, emulate PCI "Master Abort" behavior by setting the buffer to all
0xFFs when usb_control_msg() fails.

This patch is copied from the r8152 driver (v2.12.0) published by
Realtek (www.realtek.com).

Signed-off-by: Prashant Malani <pmalani@chromium.org>
---
 drivers/net/usb/r8152.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 0cc03a9ff545..eee0f5007ee3 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -799,8 +799,11 @@ int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 	ret = usb_control_msg(tp->udev, usb_rcvctrlpipe(tp->udev, 0),
 			      RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
 			      value, index, tmp, size, 500);
+	if (ret < 0)
+		memset(data, 0xff, size);
+	else
+		memcpy(data, tmp, size);

-	memcpy(data, tmp, size);
 	kfree(tmp);

 	return ret;
-- 
2.23.0.187.g17f5b7556c-goog

^ permalink raw reply related

* Re: [PATCH net-next v3 1/3] net: ethernet: mediatek: Add basic PHYLINK support
From: Russell King - ARM Linux admin @ 2019-08-24  9:11 UTC (permalink / raw)
  To: René van Dorst
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190823134516.27559-2-opensource@vdorst.com>

On Fri, Aug 23, 2019 at 03:45:14PM +0200, René van Dorst wrote:
> This convert the basics to PHYLINK API.
> SGMII support is not in this patch.
> 
> Signed-off-by: René van Dorst <opensource@vdorst.com>
> --
> v2->v3:
> * Make link_down() similar as link_up() suggested by Russell King

Yep, almost there, but...

> +static void mtk_mac_link_down(struct phylink_config *config, unsigned int mode,
> +			      phy_interface_t interface)
> +{
> +	struct mtk_mac *mac = container_of(config, struct mtk_mac,
> +					   phylink_config);
> +	u32 mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
>  
> +	mcr &= (MAC_MCR_TX_EN | MAC_MCR_RX_EN);

... this clears all bits _except_ for the tx and rx enable (which will
remain set) - you probably wanted a ~ before the (.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: ethernet: mediatek: Re-add support SGMII
From: Russell King - ARM Linux admin @ 2019-08-24  9:21 UTC (permalink / raw)
  To: René van Dorst
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190823134516.27559-3-opensource@vdorst.com>

On Fri, Aug 23, 2019 at 03:45:15PM +0200, René van Dorst wrote:
> +	switch (state->interface) {
> +	case PHY_INTERFACE_MODE_SGMII:
> +		phylink_set(mask, 10baseT_Half);
> +		phylink_set(mask, 10baseT_Full);
> +		phylink_set(mask, 100baseT_Half);
> +		phylink_set(mask, 100baseT_Full);

You also want 1000baseX_Full here - the connected PHY could have a fiber
interface on it.

> +		/* fall through */
> +	case PHY_INTERFACE_MODE_TRGMII:
>  		phylink_set(mask, 1000baseT_Full);

I don't know enough about this interface type to comment whether it
should support 1000baseX_Full - if this is connected to a PHY that may
support fiber, then it ought to set it.

> +		break;
> +	case PHY_INTERFACE_MODE_2500BASEX:
> +		phylink_set(mask, 2500baseX_Full);
> +		/* fall through */
> +	case PHY_INTERFACE_MODE_1000BASEX:
> +		phylink_set(mask, 1000baseX_Full);

Both should be set.  The reasoning here is that if you have a
Fiberchannel 4Gbaud SFP plugged in and connected directly to the
MAC, it can operate at either 2500Base-X or 1000Base-X.  If we
decide to operate at 2500Base-X, then PHY_INTERFACE_MODE_2500BASEX
will be chosen.  Otherwise, PHY_INTERFACE_MODE_1000BASEX will be
used.

The user can use ethtool to control which interface mode is used
by adjusting the advertise mask and/or placing the interface in
manual mode and setting the speed directly.  This will change
the PHY_INTERFACE_MODE_xxxxBASEX (via phylink_helper_basex_speed())
between the two settings.

If we lose 2500baseX_Full when 1000Base-X is selected, the user
will not be able to go back to 2500Base-X mode.

Yes, it's a little confusing and has slightly different rules
from the other modes - partly due to phylink_helper_basex_speed().
These are the only interface modes that we dynamically switch
between depending on the settings that the user configures via
ethtool.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
From: Sander Eikelenboom @ 2019-08-24 10:22 UTC (permalink / raw)
  To: Eric Dumazet, netdev, linux-kernel
In-Reply-To: <59dd3497-6d08-1e0e-7a4f-b121b850a24f@gmail.com>

On 17/08/2019 18:35, Eric Dumazet wrote:
> 
> 
> On 8/17/19 10:24 AM, Sander Eikelenboom wrote:
>> On 12/08/2019 19:56, Eric Dumazet wrote:
>>>
>>>
>>> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>>>> L.S.,
>>>>
>>>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>>>> one of my Xen VM's (which gets quite some network load) crashed.
>>>> See below for the stacktrace.
>>>>
>>>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be an option at the moment. 
>>>> I haven't encountered this on 5.2, so it seems to be an regression against 5.2.
>>>>
>>>> Any ideas ?
>>>>
>>>> --
>>>> Sander
>>>>
>>>>
>>>> [16930.653595] general protection fault: 0000 [#1] SMP NOPTI
>>>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 5.3.0-rc3-20190809-doflr+ #1
>>>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
>>>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>>>> [16930.653741] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
>>>> [16930.653762] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b
>>>
>>> crash in " mov    0x20(%rax),%eax"   and RAX=fffe888005bf62c0 (not a valid kernel address)
>>>
>>> Look like one bit corruption maybe.
>>>
>>> Nothing comes to mind really between 5.2 and 53 that could explain this.
>>>
>>>> [16930.653791] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
>>>> [16930.653819] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
>>>> [16930.653848] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
>>>> [16930.653875] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
>>>> [16930.653913] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>>>> [16930.653943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [16930.653965] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
>>>> [16930.653993] Call Trace:
>>>> [16930.654005]  <IRQ>
>>>> [16930.654018]  tcp_ack+0xbb0/0x1230
>>>> [16930.654033]  tcp_rcv_established+0x2e8/0x630
>>>> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
>>>> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
>>>> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
>>>> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
>>>> [16930.654128]  ip_local_deliver+0x4d/0xe0
>>>> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>>>> [16930.654163]  ip_rcv+0x4c/0xd0
>>>> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
>>>> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
>>>> [16930.654219]  napi_gro_receive+0xe7/0x140
>>>> [16930.654237]  xennet_poll+0x9be/0xae0
>>>> [16930.654254]  net_rx_action+0x136/0x340
>>>> [16930.654271]  __do_softirq+0xdd/0x2cf
>>>> [16930.654287]  irq_exit+0x7a/0xa0
>>>> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
>>>> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
>>>> [16930.654339]  </IRQ>
>>>> [16930.654349] RIP: 0033:0x55de0d87db99
>>>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
>>>> [16930.654432] RSP: 002b:00007ffd5531eec8 EFLAGS: 00000a87 ORIG_RAX: ffffffffffffff0c
>>>> [16930.655004] RAX: 0000000000000002 RBX: 000055de0f3e8e50 RCX: 000000000000007f
>>>> [16930.655034] RDX: 000055de0f3dc2d2 RSI: 0000000000003492 RDI: 0000000000000002
>>>> [16930.655062] RBP: 0000000000007fff R08: 00000000000080ea R09: 00000000000001f0
>>>> [16930.655089] R10: 000055de0f3d8e40 R11: 0000000000000094 R12: 000055de0f3e0f2a
>>>> [16930.655116] R13: 0000000000000010 R14: 0000000000007f16 R15: 0000000000000080
>>>> [16930.655144] Modules linked in:
>>>> [16930.655200] ---[ end trace 533367c95501b645 ]---
>>>> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
>>>> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>>>> [16930.655312] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
>>>> [16930.655331] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b
>>>> [16930.655360] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
>>>> [16930.655387] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
>>>> [16930.655414] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
>>>> [16930.655441] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
>>>> [16930.655475] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>>>> [16930.655502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [16930.655525] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
>>>> [16930.655553] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [16930.655789] Kernel Offset: disabled
>>>>
>>
>> Hi Eric,
>>
>> Got another VM crash, with a slightly different stacktrace this time around.
>> Still networking though.
>>
>> --
>> Sander
>>
>> [112522.697498] general protection fault: 0000 [#1] SMP NOPTI
>> [112522.697555] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc4-20190812-doflr+ #1
>> [112522.697592] RIP: 0010:skb_shift+0x63/0x430
>> [112522.697608] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 
> 
> 
> 
> crash in "testb  $0x8,0x3(%rcx)"  with RCX==fffe8880117da6c0
> 
> Same strange looking address on x86_64
> 
> I have no idea.
> 
>> [112522.697673] RSP: 0018:ffffc900000039b0 EFLAGS: 00010286
>> [112522.697693] RAX: 00000000000005a0 RBX: ffff8880117fb800 RCX: fffe8880117da6c0
>> [112522.697721] RDX: 00000000000005a0 RSI: ffff8880117fb800 RDI: ffff88800ae58000
>> [112522.697748] RBP: ffffc900000039e8 R08: 000000000004cfe0 R09: 00000000000005a0
>> [112522.697775] R10: 00000000000005a0 R11: ffff8880117fb800 R12: 0000000000000000
>> [112522.697803] R13: 00000000c95a98c2 R14: 0000000000000000 R15: ffff88800ae58000
>> [112522.697839] FS:  0000000000000000(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>> [112522.697869] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [112522.697895] CR2: 00007f9210d8e078 CR3: 000000000b660000 CR4: 00000000000006f0
>> [112522.697925] Call Trace:
>> [112522.697938]  <IRQ>
>> [112522.697951]  tcp_sacktag_walk+0x2af/0x480
>> [112522.697967]  tcp_sacktag_write_queue+0x34d/0x820
>> [112522.697986]  ? ip_forward_options.cold.0+0x1c/0x1c
>> [112522.698007]  tcp_ack+0xb8c/0x1230
>> [112522.698023]  ? tcp_event_new_data_sent+0x4a/0x90
>> [112522.698043]  tcp_rcv_established+0x14c/0x630
>> [112522.698064]  tcp_v4_do_rcv+0x129/0x1d0
>> [112522.698081]  tcp_v4_rcv+0xac9/0xcb0
>> [112522.698099]  ip_protocol_deliver_rcu+0x27/0x1b0
>> [112522.698119]  ip_local_deliver_finish+0x3f/0x50
>> [112522.698139]  ip_local_deliver+0x4d/0xe0
>> [112522.698155]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>> [112522.698177]  ip_rcv+0x4c/0xd0
>> [112522.698194]  __netif_receive_skb_one_core+0x79/0x90
>> [112522.698215]  netif_receive_skb_internal+0x2a/0xa0
>> [112522.698237]  napi_gro_receive+0xe7/0x140
>> [112522.698255]  xennet_poll+0x9be/0xae0
>> [112522.698271]  net_rx_action+0x136/0x340
>> [112522.698288]  __do_softirq+0xdd/0x2cf
>> [112522.698304]  irq_exit+0x7a/0xa0
>> [112522.698321]  xen_evtchn_do_upcall+0x27/0x40
>> [112522.698340]  xen_hvm_callback_vector+0xf/0x20
>> [112522.698359]  </IRQ>
>> [112522.698373] RIP: 0010:native_safe_halt+0xe/0x10
>> [112522.698392] Code: 48 8b 04 25 c0 6b 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c4 eb 80 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 54 fb 41 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 44 fb 41 00 f4 c3 90 90 41 55 41 54
>> [112522.699522] RSP: 0018:ffffffff82a03e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
>> [112522.699552] RAX: 0001a54800000000 RBX: 0000000000000000 RCX: 0000000000000001
>> [112522.699580] RDX: 0000000002b9f9b6 RSI: 0000000000000087 RDI: 0000000000000000
>> [112522.699608] RBP: 0000000000000000 R08: 000000001eb5c3cb R09: ffffffff82a08460
>> [112522.699634] R10: 000000000002e46e R11: 0000000000000000 R12: 0000000000000000
>> [112522.699662] R13: 0000000000000000 R14: ffffffff8326e0a0 R15: 0000000000000000
>> [112522.699692]  default_idle+0x17/0x140
>> [112522.699709]  do_idle+0x1ee/0x210
>> [112522.699726]  cpu_startup_entry+0x14/0x20
>> [112522.699743]  start_kernel+0x4e9/0x50b
>> [112522.699760]  secondary_startup_64+0xa4/0xb0
>> [112522.699780] Modules linked in:
>> [112522.699829] ---[ end trace 3b8db3603485e952 ]---
>> [112522.699850] RIP: 0010:skb_shift+0x63/0x430
>> [112522.699866] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 d4
>> [112522.699938] RSP: 0018:ffffc900000039b0 EFLAGS: 00010286
>> [112522.699959] RAX: 00000000000005a0 RBX: ffff8880117fb800 RCX: fffe8880117da6c0
>> [112522.699986] RDX: 00000000000005a0 RSI: ffff8880117fb800 RDI: ffff88800ae58000
>> [112522.700013] RBP: ffffc900000039e8 R08: 000000000004cfe0 R09: 00000000000005a0
>> [112522.700041] R10: 00000000000005a0 R11: ffff8880117fb800 R12: 0000000000000000
>> [112522.700067] R13: 00000000c95a98c2 R14: 0000000000000000 R15: ffff88800ae58000
>> [112522.700111] FS:  0000000000000000(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>> [112522.700140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [112522.700165] CR2: 00007f9210d8e078 CR3: 000000000b660000 CR4: 00000000000006f0
>> [112522.700201] Kernel panic - not syncing: Fatal exception in interrupt
>> [112522.702992] Kernel Offset: disabled
>>
>>

And another one (on an RC5 kernel now), but I probably know the answer :( ...
Is there any extra debugging options I can turn on that could shed some more light on this ?
(it doesn't seem to be an incident and happens at about 3 to 4 days of uptime, so bisecting is not a realistic option).

--
Sander



[327011.399239] general protection fault: 0000 [#1] SMP NOPTI
[327011.399271] CPU: 0 PID: 19440 Comm: rsync Not tainted 5.3.0-rc5-20190820-doflr+ #1
[327011.399309] RIP: 0010:skb_shift+0x63/0x430
[327011.399326] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 d4
[327011.399407] RSP: 0000:ffffc900000039b0 EFLAGS: 00010286
[327011.399427] RAX: 00000000000005a0 RBX: ffff8880115fb800 RCX: fffe8880115152c0
[327011.399456] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff88801b0f2200
[327011.399485] RBP: ffffc900000039e8 R08: 0000000000070260 R09: 00000000000005a0
[327011.399515] R10: 00000000000005a0 R11: ffff8880115fb800 R12: 0000000000000000
[327011.399544] R13: 000000000ca599b0 R14: 0000000000000000 R15: ffff88801b0f2200
[327011.399581] FS:  00007f20c1b8bb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
[327011.399610] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[327011.399633] CR2: 00007ffac4c245cc CR3: 0000000005544000 CR4: 00000000000006f0
[327011.399663] Call Trace:
[327011.399675]  <IRQ>
[327011.399688]  tcp_sacktag_walk+0x2af/0x480
[327011.399705]  ? __alloc_skb+0x52/0x1d0
[327011.399721]  tcp_sacktag_write_queue+0x34d/0x820
[327011.399743]  ? xennet_tx_setup_grant+0xbd/0x140
[327011.399762]  tcp_ack+0xb8c/0x1230
[327011.399778]  ? gnttab_foreach_grant_in_range+0x83/0xf0
[327011.399799]  ? xennet_tx_setup_grant+0x140/0x140
[327011.399819]  ? xennet_make_txreqs+0x81/0xb0
[327011.399838]  tcp_rcv_established+0x14c/0x630
[327011.399860]  tcp_v4_do_rcv+0x129/0x1d0
[327011.399876]  tcp_v4_rcv+0xac9/0xcb0
[327011.399892]  ip_protocol_deliver_rcu+0x27/0x1b0
[327011.399912]  ip_local_deliver_finish+0x3f/0x50
[327011.399932]  ip_local_deliver+0x4d/0xe0
[327011.399957]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[327011.399977]  ip_rcv+0x4c/0xd0
[327011.400312]  __netif_receive_skb_one_core+0x79/0x90
[327011.400347]  netif_receive_skb_internal+0x2a/0xa0
[327011.400367]  napi_gro_receive+0xe7/0x140
[327011.400384]  xennet_poll+0x9be/0xae0
[327011.400401]  net_rx_action+0x136/0x340
[327011.400419]  __do_softirq+0xdd/0x2cf
[327011.400437]  irq_exit+0x7a/0xa0
[327011.400454]  xen_evtchn_do_upcall+0x27/0x40
[327011.400474]  xen_hvm_callback_vector+0xf/0x20
[327011.400501]  </IRQ>
[327011.400514] RIP: 0033:0x5620499a31aa
[327011.400532] Code: 43 4c 0f b6 34 0e 8b 8b 80 00 00 00 d3 e7 89 f9 48 8b 7b 60 31 f1 23 4b 7c 48 8b 73 68 83 fd 07 89 4b 70 48 8d 0c 4e 0f b7 31 <66> 42 89 34 47 66 89 11 0f 84 88 01 00 00 8b 8b 90 00 00 00 8b 83
[327011.400606] RSP: 002b:00007ffe82874060 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff0c
[327011.400638] RAX: 0000000000001f7c RBX: 000056204a861190 RCX: 000056204a8cd986
[327011.400671] RDX: 000000000000e084 RSI: 0000000000000000 RDI: 000056204a8bae50
[327011.400703] RBP: 0000000000000000 R08: 0000000000006084 R09: 0000000000003fdf
[327011.400735] R10: 0000000000000000 R11: 0000000000000001 R12: 00005620499b9500
[327011.400769] R13: 000000000000206a R14: 00000000ffffffff R15: 000056204a861190
[327011.400802] Modules linked in:
[327011.400884] ---[ end trace 660afb6bf8586996 ]---
[327011.400910] RIP: 0010:skb_shift+0x63/0x430
[327011.400929] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 d4
[327011.401012] RSP: 0000:ffffc900000039b0 EFLAGS: 00010286
[327011.401042] RAX: 00000000000005a0 RBX: ffff8880115fb800 RCX: fffe8880115152c0
[327011.401081] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff88801b0f2200
[327011.401116] RBP: ffffc900000039e8 R08: 0000000000070260 R09: 00000000000005a0
[327011.401157] R10: 00000000000005a0 R11: ffff8880115fb800 R12: 0000000000000000
[327011.401198] R13: 000000000ca599b0 R14: 0000000000000000 R15: ffff88801b0f2200
[327011.401244] FS:  00007f20c1b8bb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
[327011.401293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[327011.404822] CR2: 00007ffac4c245cc CR3: 0000000005544000 CR4: 00000000000006f0
[327011.404866] Kernel panic - not syncing: Fatal exception in interrupt
[327011.405103] Kernel Offset: disabled

^ permalink raw reply

* Re: [RFC 4/4] net: cdc_ncm: Add ACPI MAC address pass through functionality
From: Bjørn Mork @ 2019-08-24 10:43 UTC (permalink / raw)
  To: Charles.Hyde
  Cc: linux-usb, linux-acpi, gregkh, Mario.Limonciello, oliver, netdev,
	nic_swsd
In-Reply-To: <ec7435e0529243a99f6949ee9efbede5@AUSX13MPS303.AMER.DELL.COM>

<Charles.Hyde@dellteam.com> writes:

> @@ -930,11 +931,18 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_
>  	usb_set_intfdata(ctx->control, dev);
>  
>  	if (ctx->ether_desc) {
> +		struct sockaddr sa;
> +
>  		temp = usbnet_get_ethernet_addr(dev, ctx->ether_desc->iMACAddress);
>  		if (temp) {
>  			dev_dbg(&intf->dev, "failed to get mac address\n");
>  			goto error2;
>  		}
> +		if (get_acpi_mac_passthru(&intf->dev, &sa) == 0) {
> +			memcpy(dev->net->dev_addr, sa.sa_data, ETH_ALEN);
> +			if (usbnet_set_ethernet_addr(dev) < 0)
> +				usbnet_get_ethernet_addr(dev, ctx->ether_desc->iMACAddress);
> +		}
>  		dev_info(&intf->dev, "MAC-Address: %pM\n", dev->net->dev_addr);
>  	}

So you want to run a Dell specific ACPI method every time anyone plugs
some NCM class device into a host supporing ACPI?  That's going to annoy
the hell out of 99.9997% of the x86, ia64 and arm64 users of this
driver.

Call ACPI once when the driver loads, and only if running on an actual
Dell system where this method is supported.  There must be some ACPI
device ID you can match on to know if this method is supported or not?


Bjørn

^ permalink raw reply

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Will Deacon @ 2019-08-24 11:25 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nick Desaulniers, Andrew Morton, Sedat Dilek, Josh Poimboeuf,
	Yonghong Song, clang-built-linux, Catalin Marinas,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrey Konovalov, Greg Kroah-Hartman, Enrico Weigelt,
	Suzuki K Poulose, Thomas Gleixner, Masayoshi Mizuma,
	Shaokun Zhang, Alexios Zavras, Allison Randal, Linux ARM,
	linux-kernel, Network Development, bpf
In-Reply-To: <CANiq72nfn4zxAO63GEEoUjumC6Jwi5_jdcD_5Xzt1vZRgh52fg@mail.gmail.com>

On Fri, Aug 23, 2019 at 09:35:08PM +0200, Miguel Ojeda wrote:
> On Thu, Aug 15, 2019 at 11:12 AM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> >
> > Btw, I guess that is the Oops you were mentioning in the cover letter?
> 
> Pinging about this...

Which bit are you pinging about? This patch (12/16) has been in -next for a
while and is queued in the arm64 tree for 5.4. The Oops/boot issue is
addressed in patch 14 which probably needs to be sent as a separate patch
(with a commit message) if it's targetting 5.3 and, I assume, routed via
somebody like akpm.

Will

^ permalink raw reply

* Fw: [Bug 204681] New: Kernel BUG/Oops:  tc qdisc delete with tc filter action xt -j CONNMARK
From: Stephen Hemminger @ 2019-08-24 11:57 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Fri, 23 Aug 2019 23:44:26 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 204681] New: Kernel BUG/Oops:  tc qdisc delete with tc filter action xt -j CONNMARK


https://bugzilla.kernel.org/show_bug.cgi?id=204681

            Bug ID: 204681
           Summary: Kernel BUG/Oops:  tc qdisc delete with tc filter
                    action xt -j CONNMARK
           Product: Networking
           Version: 2.5
    Kernel Version: 5.2.6 (x86_64) / 4.19.62 (mips32)/ 4.14.133 (arm7l)
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: itugrok@yahoo.com
        Regression: No

Created attachment 284581
  --> https://bugzilla.kernel.org/attachment.cgi?id=284581&action=edit  
BUG/Oops: kernel 5.2.6/x86_64

Overview:
=========

Several uses of "tc filter .. action xt" work as expected and also allow final
qdisc/filter deletion: e.g. xt_DSCP and xt_CLASSIFY.

However, trying to delete a qdisc/filter using xt_CONNMARK results in a kernel
oops or hang/crash on all platforms and kernel versions tested.


Steps to Reproduce:
===================

# tc qdisc add dev lo clsact
# tc filter add dev lo egress protocol ip matchall action xt -j CONNMARK
--save-mark
# tc qdisc del dev lo clsact
<Kernel Oops>


Systems Tested:
===============

Ubuntu 18.04 LTS (mainline kernel 5.2.6/x86_64, iptables 1.6.1, iproute2 4.15)
(Kernel build: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.2.6/)

Ubuntu 18.04 LTS (distro kernel 4.15/x86_64, iptables 1.6.1, iproute2 4.15)

OpenWrt master, r10666-fc5d46dc62 (kernel 4.19.62, mips32_be, iptables 1.8.3,
iproute2 5.0.0)

OpenWrt 19.07, r10324-8bf8de95a2 (kernel 4.14.133, armv7l, iptables 1.8.3,
iproute2 5.0.0)


Kernel Logs:
============

The clearest call traces are from the Ubuntu systems, and are attached.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Fw: [Bug 204669] New: TLS: module crash with TLS record double free
From: Stephen Hemminger @ 2019-08-24 12:03 UTC (permalink / raw)
  To: netdev

This may have already been fixed, but forwarding to check.

Begin forwarded message:

Date: Thu, 22 Aug 2019 12:34:10 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 204669] New: TLS: module crash with TLS record double free


https://bugzilla.kernel.org/show_bug.cgi?id=204669

            Bug ID: 204669
           Summary: TLS: module crash with TLS record double free
           Product: Networking
           Version: 2.5
    Kernel Version: v5.3-rc4
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: IPV4
          Assignee: stephen@networkplumber.org
          Reporter: mallesh537@gmail.com
        Regression: No

Created attachment 284565
  --> https://bugzilla.kernel.org/attachment.cgi?id=284565&action=edit  
tls crash log

TLS module is crashing While running SSL record encryption using
Klts_send_[file] 

Precondition:
1) Installed 5.3-rc4.
2) Nitrox5 card pluggin.


Steps to produce the issue:
1) Install n5pf.ko.(drivers/crypto/cavium/nitrox)
2) Install tls.ko if not is installed by default(net/tls)
3) Taken uperf tool from git.
   3.1) Modified uperf to use tls module by using setsocket.
   3.2) Modified uperf tool to support sendfile with SSL.


Test:
1) Running uperf with 4threads.
2) Each Thread send the data using sendfile over SSL protocol.


After few seconds kernel is crashing because of record list corruption


[  270.888952] ------------[ cut here ]------------
[  270.890450] list_del corruption, ffff91cc3753a800->prev is LIST_POISON2
(dead000000000122)
[  270.891194] WARNING: CPU: 1 PID: 7387 at lib/list_debug.c:50
__list_del_entry_valid+0x62/0x90
[  270.892037] Modules linked in: n5pf(OE) netconsole tls(OE) bonding
intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd mei_me cryptd
glue_helper ipmi_si sg mei lpc_ich pcspkr joydev ioatdma i2c_i801 ipmi_devintf
ipmi_msghandler wmi ip_tables xfs libcrc32c sd_mod mgag200 drm_vram_helper ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm isci libsas
ahci scsi_transport_sas libahci crc32c_intel serio_raw igb libata ptp pps_core
dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
nitrox_drv]
[  270.896836] CPU: 1 PID: 7387 Comm: uperf Kdump: loaded Tainted: G          
OE     5.3.0-rc4 #1
[  270.897711] Hardware name: Supermicro SYS-1027R-N3RF/X9DRW, BIOS 3.0c
03/24/2014
[  270.898597] RIP: 0010:__list_del_entry_valid+0x62/0x90
[  270.899478] Code: 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 e0 f9 ee 8d e8 b2
cf c8 ff 0f 0b 31 c0 c3 48 89 fe 48 c7 c7 18 fa ee 8d e8 9e cf c8 ff <0f> 0b 31
c0 c3 48 89 f2 48 89 fe 48 c7 c7 50 fa ee 8d e8 87 cf c8
[  270.901321] RSP: 0018:ffffb6ea86eb7c20 EFLAGS: 00010282
[  270.902240] RAX: 0000000000000000 RBX: ffff91cc3753c000 RCX:
0000000000000000
[  270.903157] RDX: ffff91bc3f867080 RSI: ffff91bc3f857738 RDI:
ffff91bc3f857738
[  270.904074] RBP: ffff91bc36020940 R08: 0000000000000560 R09:
0000000000000000
[  270.904988] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[  270.905902] R13: ffff91cc3753a800 R14: ffff91cc37cc6400 R15:
ffff91cc3753a800
[  270.906809] FS:  00007f454a88d700(0000) GS:ffff91bc3f840000(0000)
knlGS:0000000000000000
[  270.907715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.908606] CR2: 00007f453c00292c CR3: 000000103554e003 CR4:
00000000001606e0
[  270.909490] Call Trace:
[  270.910373]  tls_tx_records+0x138/0x1c0 [tls]
[  270.911262]  tls_sw_sendpage+0x3e0/0x420 [tls]
[  270.912154]  inet_sendpage+0x52/0x90
[  270.913045]  ? direct_splice_actor+0x40/0x40
[  270.913941]  kernel_sendpage+0x1a/0x30
[  270.914831]  sock_sendpage+0x20/0x30
[  270.915714]  pipe_to_sendpage+0x62/0x90
[  270.916592]  __splice_from_pipe+0x80/0x180
[  270.917461]  ? direct_splice_actor+0x40/0x40
[  270.918334]  splice_from_pipe+0x5d/0x90
[  270.919208]  direct_splice_actor+0x35/0x40
[  270.920086]  splice_direct_to_actor+0x103/0x230
[  270.920966]  ? generic_pipe_buf_nosteal+0x10/0x10
[  270.921850]  do_splice_direct+0x9a/0xd0
[  270.922733]  do_sendfile+0x1c9/0x3d0
[  270.923612]  __x64_sys_sendfile64+0x5c/0xc0


(gdb) list *(tls_tx_records+0x138)
0x2d18 is in tls_tx_records (./include/linux/list.h:131).
126      * Note: list_empty() on entry does not return true after this, the
entry is
127      * in an undefined state.
128      */
129     static inline void __list_del_entry(struct list_head *entry)
130     {
131             if (!__list_del_entry_valid(entry))
132                     return;
133     
134             __list_del(entry->prev, entry->next);
135     }
(gdb) 
(gdb) list *(tls_sw_sendpage+0x3e0)
0x48e0 is in tls_sw_sendpage (/home/mjatharkonda/5_3_rc4/tls/tls_sw.c:1211).
1206    
1207            if (num_async) {
1208                    /* Transmit if any encryptions have completed */
1209                    if (test_and_clear_bit(BIT_TX_SCHEDULED,
&ctx->tx_bitmask)) {
1210                            cancel_delayed_work(&ctx->tx_work.work);
1211                            tls_tx_records(sk, flags);
1212                    }
1213            }
1214    sendpage_end:
1215            ret = sk_stream_error(sk, flags, ret);
(gdb) 



Attached complete crash log

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Re: [Bridge] [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Stephen Hemminger @ 2019-08-24 12:05 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <1566500850-6247-1-git-send-email-horatiu.vultur@microchip.com>

On Thu, 22 Aug 2019 21:07:27 +0200
Horatiu Vultur <horatiu.vultur@microchip.com> wrote:

> Current implementation of the SW bridge is setting the interfaces in
> promisc mode when they are added to bridge if learning of the frames is
> enabled.
> In case of Ocelot which has HW capabilities to switch frames, it is not
> needed to set the ports in promisc mode because the HW already capable of
> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
> the ports in promisc mode to do the switching.
> This optimization takes places only if all the interfaces that are part
> of the bridge have this flag and have the same network driver.
> 
> If the bridge interfaces is added in promisc mode then also the ports part
> of the bridge are set in promisc mode.
> 
> Horatiu Vultur (3):
>   net: Add HW_BRIDGE offload feature
>   net: mscc: Use NETIF_F_HW_BRIDGE
>   net: mscc: Implement promisc mode.
> 
>  drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
>  include/linux/netdev_features.h    |  3 +++
>  net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
>  net/core/ethtool.c                 |  1 +
>  4 files changed, 56 insertions(+), 3 deletions(-)
> 

IMHO there are already enough nerd knobs in bridge to support this.
If you hardware can't do real bridging, it is only doing MACVLAN so that
is what you should support instead.

^ permalink raw reply

* Re: [PATCH spi for-5.4 0/5] Deterministic SPI latency with NXP DSPI driver
From: Vladimir Oltean @ 2019-08-24 12:13 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Mark Brown, Hubert Feurstein, Miroslav Lichvar, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <20190823052217.GD2502@localhost>

On Fri, 23 Aug 2019 at 08:22, Richard Cochran <richardcochran@gmail.com> wrote:
>
> On Thu, Aug 22, 2019 at 07:13:12PM +0300, Vladimir Oltean wrote:
> > You do think that I understand the problem? But I don't!
>
> ;^)
>
> > > And who generates Local_sync_resp?
> > >
> >
> > Local_sync_resp is the same as Local_sync_req except maybe with a
> > custom tag added by the switch. Irrelevant as long as the DSA master
> > can timestamp it.
>
> So this is point why it won't work.  The time stamping logic in the
> switch only recognizes PTP frames.
>
> Thanks,
> Richard

So to summarize the pros and cons of a PHC-to-PHC loopback sync:
Pros:
- At least two orders of magnitude improvement in offset compared to a
software timestamping solution, even in the situation where the
software timestamp is fully optimized
- Does not depend on the availability of PPS hardware
- In the case where both MACs support this, the synchronization can
simply reuse the DSA link with no dedicated circuitry
Cons:
- DSA framework for retrieving timestamps would need to be reworked
- The solution would have to be implemented in the kernel
- A separate protocol from PTP would have to be devised
- Not all DSA masters support hardware timestamping. Of those that do,
not all may support timestamping generic frames
- Not all PTP-capable DSA switches support timestamping generic frames
- Not all DSA switches may be able to loop back traffic from their CPU
port. I think this is called "hairpinning".
- The solution only covers the sync of the top-most switch in the DSA
tree. The hairpinning described above would need to be selective as
well, not just possible.

So at this point, the solution is not generic enough for me to be
compelled to prototype it. Taking system clock timestamps in the SPI
driver is "good enough". We'll just need to work out a way with Mark
that this can be added to the SPI subsystem, given the valid
objections already expressed.

Thanks,
-Vladimir

^ permalink raw reply

* pull-request: ieee802154 for net 2019-08-24
From: Stefan Schmidt @ 2019-08-24 12:19 UTC (permalink / raw)
  To: davem; +Cc: linux-wpan, alex.aring, netdev

Hello Dave.

An update from ieee802154 for your *net* tree.

Yue  Haibing fixed two bugs discovered by KASAN in the hwsim driver for
ieee802154 and Colin Ian King cleaned up a redundant variable assignment.

If there are any problems let me know.

regards
Stefan Schmidt

The following changes since commit 6c0afef5fb0c27758f4d52b2210c61b6bd8b4470:

  ipv6/flowlabel: wait rcu grace period before put_pid() (2019-04-29 23:30:13 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan.git ieee802154-for-davem-2019-08-24

for you to fetch changes up to 074014abdf2bd2a00da3dd14a6ae04cafc1d62cc:

  net: ieee802154: remove redundant assignment to rc (2019-08-14 01:10:41 +0200)

----------------------------------------------------------------
Colin Ian King (1):
      net: ieee802154: remove redundant assignment to rc

YueHaibing (2):
      ieee802154: hwsim: Fix error handle path in hwsim_init_module
      ieee802154: hwsim: unregister hw while hwsim_subscribe_all_others fails

 drivers/net/ieee802154/mac802154_hwsim.c | 8 +++++---
 net/ieee802154/socket.c                  | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next v3 1/3] net: ethernet: mediatek: Add basic PHYLINK support
From: René van Dorst @ 2019-08-24 12:33 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190824091106.GC13294@shell.armlinux.org.uk>

Hi Russell,

Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:

> On Fri, Aug 23, 2019 at 03:45:14PM +0200, René van Dorst wrote:
>> This convert the basics to PHYLINK API.
>> SGMII support is not in this patch.
>>
>> Signed-off-by: René van Dorst <opensource@vdorst.com>
>> --
>> v2->v3:
>> * Make link_down() similar as link_up() suggested by Russell King
>
> Yep, almost there, but...
>
>> +static void mtk_mac_link_down(struct phylink_config *config,  
>> unsigned int mode,
>> +			      phy_interface_t interface)
>> +{
>> +	struct mtk_mac *mac = container_of(config, struct mtk_mac,
>> +					   phylink_config);
>> +	u32 mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
>>
>> +	mcr &= (MAC_MCR_TX_EN | MAC_MCR_RX_EN);
>
> ... this clears all bits _except_ for the tx and rx enable (which will
> remain set) - you probably wanted a ~ before the (.

Yes that is what it should be.
I only want to clear the TX_EN en RX_EN bits.

Greats,

René

>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up




^ permalink raw reply

* Re: [PATCH spi for-5.4 2/5] spi: Add a PTP system timestamp to the transfer structure
From: Vladimir Oltean @ 2019-08-24 12:38 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, Miroslav Lichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <20190822171137.GB23391@sirena.co.uk>

On Thu, 22 Aug 2019 at 21:19, Mark Brown <broonie@kernel.org> wrote:
>
> On Sun, Aug 18, 2019 at 09:25:57PM +0300, Vladimir Oltean wrote:
>
> > @@ -1391,6 +1402,13 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
> >               goto out;
> >       }
> >
> > +     if (!ctlr->ptp_sts_supported) {
> > +             list_for_each_entry(xfer, &mesg->transfers, transfer_list) {
> > +                     xfer->ptp_sts_word_pre = 0;
> > +                     ptp_read_system_prets(xfer->ptp_sts);
> > +             }
> > +     }
> > +
>
> We can do better than this for controllers which use transfer_one().
>

You mean I should guard this "if", and the one below, with "&&
!ctlr->transfer_one"?

> > +const void *spi_xfer_ptp_sts_word(struct spi_transfer *xfer, bool pre)
> > +{
>
> xfer can be const here too.
>
> > + * @ptp_sts_supported: If the driver sets this to true, it must provide a
> > + *   time snapshot in @spi_transfer->ptp_sts as close as possible to the
> > + *   moment in time when @spi_transfer->ptp_sts_word_pre and
> > + *   @spi_transfer->ptp_sts_word_post were transmitted.
> > + *   If the driver does not set this, the SPI core takes the snapshot as
> > + *   close to the driver hand-over as possible.
>
> A couple of issues here.  The big one is that for PIO transfers
> this is going to either complicate the code or introduce overhead
> in individual drivers for an extremely niche use case.  I guess
> most drivers won't implement it which makes this a bit moot but
> then this is a concern that pushes back against the idea of
> implementing the feature.
>

The concern is the overhead in terms of code, or runtime performance?
Arguably the applications that require deterministic latency are
actually going to push for overall less overhead at runtime, even if
that comes at a cost in terms of code size. The spi-fsl-dspi driver
does not perform worse by any metric after this rework.

> The other is that it's not 100% clear what you're looking to
> timestamp here - is it when the data goes on the wire, is it when
> the data goes on the FIFO (which could be relatively large)?  I'm
> guessing you're looking for the physical transfer here, if that's
> the case should there be some effort to compensate for the delays
> in the controller?

The goal is to timestamp the moment when the SPI slave sees word N of
the data. Luckily the DSPI driver raises the TCF (Transfer Complete
Flag) once that word has been transmitted, which I used to my
advantage. The EOQ mode behaves similarly, but has a granularity of 4
words. The controller delays are hence implicitly included in the
software timestamp.
But the question is valid and I expect that such compensation might be
needed for some hardware, provided that it can be measured and
guaranteed. In fact Hubert did add such logic to the v3 of his MDIO
patch: https://lkml.org/lkml/2019/8/20/195 There were some objections
mainly related to the certainty of those offset corrections. I don't
want to "future-proof" the API now with features I have no use of, but
such compensation logic might come in the future.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Miguel Ojeda @ 2019-08-24 12:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Andrew Morton, Sedat Dilek, Josh Poimboeuf,
	Yonghong Song, clang-built-linux, Catalin Marinas,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrey Konovalov, Greg Kroah-Hartman, Enrico Weigelt,
	Suzuki K Poulose, Thomas Gleixner, Masayoshi Mizuma,
	Shaokun Zhang, Alexios Zavras, Allison Randal, Linux ARM,
	linux-kernel, Network Development, bpf
In-Reply-To: <20190824112542.7guulvdenm35ihs7@willie-the-truck>

On Sat, Aug 24, 2019 at 1:25 PM Will Deacon <will@kernel.org> wrote:
>
> Which bit are you pinging about? This patch (12/16) has been in -next for a
> while and is queued in the arm64 tree for 5.4. The Oops/boot issue is
> addressed in patch 14 which probably needs to be sent as a separate patch
> (with a commit message) if it's targetting 5.3 and, I assume, routed via
> somebody like akpm.

I was pinging about the bit I was quoting, i.e. whether the Oops in
the cover letter was #14 indeed. Also, since Nick said he wanted to
get this ASAP through compiler-attributes, I assumed he wanted it to
be in 5.3, but I have not seen the independent patch.

Since he seems busy, I will write a better commit message myself and
send it to Linus next week.

Cheers,
Miguel

^ permalink raw reply

* Re: [RFC PATCH 1/2] gianfar: convert to phylink
From: Vladimir Oltean @ 2019-08-24 12:49 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Arseny Solokha, Claudiu Manoil, Ioana Ciornei, Andrew Lunn,
	netdev, Florian Fainelli
In-Reply-To: <20190730102329.GZ1330@shell.armlinux.org.uk>

Hi Russell,

On Tue, 30 Jul 2019 at 13:23, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Tue, Jul 30, 2019 at 02:39:58AM +0300, Vladimir Oltean wrote:
> > To be honest I don't have a complete answer to that question. The
> > literature recommends writing 0x01a0 to the MII_ADVERTISE (0x4)
> > register of the MAC PCS for 1000Base-X, and 0x4001 for SGMII.
>
> That looks entirely sane for both modes.
>
> 0x01a0 for 802.3z (1000BASE-X) is defined in 802.3 37.2.5.1.3 as:
>         bit 5 - full duplex = 1
>         bit 6 - half duplex = 0
>         bit 7 - pause = 1
>         bit 8 - asym_pause = 1
>
> The description of the bits match the config word that is sent in the
> link with the exception of bit 14, which is the acknowledgement bit.
> Normally, in 802.3z, bit 14 will not be set in the transmitted config
> word until we have received the config word from the other end of the
> link.
>
> For SGMII, 0x4001 is the acknowledgement code word, which is defined
> in the SGMII spec as "tx_config_Reg[15:0] sent from the MAC to the
> PHY" which requires:
>         bit 0 - must be 1
>         bit 1 .. 13, 15 - must be 0, reserved for future use
>         bit 14 - must be 1 (auto-negotiation acknowledgement in 802.3z)
>
> > The FMan driver which uses the TSEC MAC does exactly that:
> > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/ethernet/freescale/fman/fman_dtsec.c#n58
> > However I can't seem to be able to trace down the definition of bit 14
> > from 0x4001 - it's reserved in all the manuals I see. I have a hunch
> > that it selects the format of the Config_Reg base page between
> > 1000Base-X and SGMII.
>
> It could be that is what bit 14 is being used for, or it could just be
> that they've taken the values from the appropriate specs, and the
> hardware behaves the same way irrespective of whether it is in SGMII
> or 1000BASE-X mode.
>

Mystery solved - indeed it appears that there is no hardware logic for
propagating the AN results from the PCS to the MAC. Hence there is no
hardware distinction between 1000Base-X and SGMII. That must be
handled in PHYLINK.

> > > +static int gfar_mac_link_state(struct phylink_config *config,
> > > +                              struct phylink_link_state *state)
> > > +{
> > > +       if (state->interface == PHY_INTERFACE_MODE_SGMII ||
> > > +           state->interface == PHY_INTERFACE_MODE_1000BASEX) {
> >
> > What if you reduce the indentation level by 1 here, by just exiting if
> > the interface mode is not SGMII?
>
> It's only called for in-band negotiation modes anyway, so the above
> protection probably doesn't gain much.
>

Or that.

> > > +               struct gfar_private *priv =
> > > +                       netdev_priv(to_net_dev(config->dev));
> > > +               u16 tbi_cr;
> > > +
> > > +               if (!priv->tbi_phy)
> > > +                       return -ENODEV;
> > > +
> > > +               tbi_cr = phy_read(priv->tbi_phy, MII_TBI_CR);
> > > +
> > > +               state->duplex = !!(tbi_cr & TBI_CR_FULL_DUPLEX);
> >
> > Woah there. Aren't you supposed to first ensure state->an_complete is
> > ok, based on TBI_MII_Register_Set_SR[AN_Done]? There's also a
> > Link_Status bit in that register that you could retrieve.
>
> Indeed.
>
> > > +               if ((tbi_cr & TBI_CR_SPEED_1000_MASK) == TBI_CR_SPEED_1000_MASK)
> > > +                       state->speed = SPEED_1000;
> > > +       }
> >
> > See the Speed_Bit table from TBI_MII_Register_Set_ANLPBPA_SGMII for
> > the link partner (aka SGMII PHY) advertisement. You have to do a
> > logical-and between that and your own. Also please make sure you
> > really are in SGMII AN and not 1000 Base-X when interpreting registers
> > 4 & 5 as one way or another.
>
> From what you've said above, yes, this needs to do exactly that.
>
> > > -static noinline void gfar_update_link_state(struct gfar_private *priv)
> > > +static void gfar_mac_config(struct phylink_config *config, unsigned int mode,
> > > +                           const struct phylink_link_state *state)
> > >  {
> > > +       struct gfar_private *priv = netdev_priv(to_net_dev(config->dev));
> > >         struct gfar __iomem *regs = priv->gfargrp[0].regs;
> > > -       struct net_device *ndev = priv->ndev;
> > > -       struct phy_device *phydev = ndev->phydev;
> > > -       struct gfar_priv_rx_q *rx_queue = NULL;
> > > -       int i;
> > > +       u32 maccfg1, new_maccfg1;
> > > +       u32 maccfg2, new_maccfg2;
> > > +       u32 ecntrl, new_ecntrl;
> > > +       u32 tx_flow, new_tx_flow;
> >
> > Don't introduce new_ variables. Is there any issue if you
> > unconditionally write to the MAC registers?
>
> We do this in every driver, as mac_config() can be called with only a
> small number of changes in the settings, and it is important not to
> upset the MAC for minor updates.
>
> An example of this is when we are in SGMII mode with an attached PHY.
> SGMII will communicate the speed and duplex, but not the results of
> the pause negotiation.  We read that from the attached PHY and report
> it back by calling mac_config() - but at that point, we don't want to
> cause the established link to bounce.
>
> So, mac_config() should be implemented to avoid upsetting an already
> established link where possible (unless the configuration items that
> affect the link have changed.)
>

Ok.

> > > +static void gfar_mac_an_restart(struct phylink_config *config)
> > > +{
> > > +       /* Not supported */
> > > +}
> >
> > What about running gfar_configure_serdes again?
>
> The intention here is to cause the 802.3z link to renegotiate...
>

Yes, gfar_configure_serdes does this:

    phy_write(tbiphy, MII_BMCR,
          BMCR_ANENABLE | BMCR_ANRESTART | BMCR_FULLDPLX |
          BMCR_SPEED1000);

> >
> > > +
> > > +static void gfar_mac_link_down(struct phylink_config *config, unsigned int mode,
> > > +                              phy_interface_t interface)
> > > +{
> > > +       /* Not supported */
> > > +}
> > > +
> >
> > What about disabling RX_EN and TX_EN from MACCFG1?
> >
> > > +static void gfar_mac_link_up(struct phylink_config *config, unsigned int mode,
> > > +                            phy_interface_t interface, struct phy_device *phy)
> > > +{
> > > +       /* Not supported */
> > >  }
> > >
> >
> > What about enabling RX_EN and TX_EN from MACCFG1?
>
> Note that both of these functions must still allow the link to be
> established if we are using in-band negotiation - but they are
> expected to start/stop the transmission of packets.
>

I don't think AN pages count as Ethernet frames, and the RX_EN and
TX_EN bits are MAC settings anyway - it is the PCS below it that would
process them.

> > > @@ -149,8 +148,13 @@ extern const char gfar_driver_version[];
> > >  #define GFAR_SUPPORTED_GBIT SUPPORTED_1000baseT_Full
> > >
> > >  /* TBI register addresses */
> > > +#define MII_TBI_CR             0x00
> > >  #define MII_TBICON             0x11
> > >
> > > +/* TBI_CR register bit fields */
> > > +#define TBI_CR_FULL_DUPLEX     0x0100
> > > +#define TBI_CR_SPEED_1000_MASK 0x0040
> > > +
> >
> > I think BIT() definitions are preferred, even if that means you have
> > to convert existing code first.
>
> If MII_TBI_CR is the BMCR of the PCS, how about using the definitions
> that we already have in the kernel:
>
> BMCR_SPEED1000 is TBI_CR_SPEED_1000_MASK
> BMCR_FULLDPLX is TBI_CR_FULL_DUPLEX
>
> ?

Or that, yes.

>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH 14/16] include/linux: prefer __section from compiler_attributes.h
From: Miguel Ojeda @ 2019-08-24 12:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Andrew Morton, Sedat Dilek, Josh Poimboeuf,
	Yonghong Song, clang-built-linux, Luc Van Oostenryck,
	Lai Jiangshan, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra (Intel), Nicholas Piggin, Jiri Kosina,
	Ard Biesheuvel, Michael Ellerman, Masahiro Yamada,
	Hans Liljestrand, Elena Reshetova, David Windsor, Marc Zyngier,
	Ming Lei, Dou Liyang, Julien Thierry, Mauro Carvalho Chehab,
	Jens Axboe, linux-kernel, linux-sparse, rcu, Network Development,
	bpf
In-Reply-To: <20190813083257.nnsxf5khnqagl46s@willie-the-truck>

On Tue, Aug 13, 2019 at 10:33 AM Will Deacon <will@kernel.org> wrote:
>
> -ENOCOMMITMESSAGE
>
> Otherwise, patch looks good to me.

Do you want Ack, Review or nothing?

Cheers,
Miguel

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox