Netdev List


* Re: [PATCHv2 1/1] net: rds: add service level support in rds-info
From: santosh.shilimkar @ 2019-08-24  1:25 UTC
  To: Zhu Yanjun, davem, netdev, linux-rdma, rds-devel, gerd.rausch
In-Reply-To: <1566608656-30836-1-git-send-email-yanjun.zhu@oracle.com>

On 8/23/19 6:04 PM, Zhu Yanjun wrote:
>  From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
> is used to identify different flows within an IBA subnet.
> It is carried in the local route header of the packet.
> 
> Before this commit, run "rds-info -I". The outputs are as
> below:
> "
> RDS IB Connections:
>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
> 192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> "
> After this commit, the output is as below:
> "
> RDS IB Connections:
>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
> 192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> "
> 
> The commit fe3475af3bdf ("net: rds: add per rds connection cache
> statistics") adds cache_allocs in struct rds_info_rdma_connection
> as below:
> struct rds_info_rdma_connection {
> ...
>          __u32           rdma_mr_max;
>          __u32           rdma_mr_size;
>          __u8            tos;
>          __u32           cache_allocs;
>   };
> The peer struct in rds-tools of struct rds_info_rdma_connection is as
> below:
> struct rds_info_rdma_connection {
> ...
>          uint32_t        rdma_mr_max;
>          uint32_t        rdma_mr_size;
>          uint8_t         tos;
>          uint8_t         sl;
>          uint32_t        cache_allocs;
> };
> The difference between userspace and kernel is the member variable sl.
> In the kernel struct, the member variable sl is missing. This will
> introduce risks. So it is necessary to use this commit to avoid this risk.
> 
> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
> CC: Joe Jin <joe.jin@oracle.com>
> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
> V1->V2: fix typos in commit logs.
> ---
I did ask you, when you posted the patch, whether you had run
backward compatibility tests. You replied that you had run all the
tests and said "So do not worry about backward compatibility.  This
commit will work well with older rds-tools2.0.5 and 2.0.6."

https://www.spinics.net/lists/netdev/msg574691.html

I was worried about exactly the kind of issue described in the commit.

Anyways thanks for the fixup patch. Should be applied to stable
as well.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Regards,
Santosh
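
The layout concern raised in the commit message can be illustrated with a
minimal C sketch. It models only the trailing members quoted above; the
struct names kernel_tail and tools_tail and both helper functions are
hypothetical, and the sketch assumes that neither struct is packed and that
the ABI aligns 32-bit integers to 4 bytes. Under those assumptions the two
layouts agree on the offset of cache_allocs and on the total size, and the
byte that userspace reads as sl falls on padding in the kernel-side layout,
which would be consistent with the SL column showing 0 before the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified tail of the kernel-side struct before the fix (no sl). */
struct kernel_tail {
	uint32_t rdma_mr_max;
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint32_t cache_allocs;
};

/* Simplified tail of the rds-tools struct (has sl). */
struct tools_tail {
	uint32_t rdma_mr_max;
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint8_t  sl;
	uint32_t cache_allocs;
};

/* Returns 1 when cache_allocs sits at the same offset in both layouts
 * and both layouts have the same size.
 */
static int layouts_agree(void)
{
	return offsetof(struct kernel_tail, cache_allocs) ==
	       offsetof(struct tools_tail, cache_allocs) &&
	       sizeof(struct kernel_tail) == sizeof(struct tools_tail);
}

/* Distance in bytes from tos to the byte userspace interprets as sl. */
static size_t sl_offset_after_tos(void)
{
	return offsetof(struct tools_tail, sl) -
	       offsetof(struct tools_tail, tos);
}
```

With a packed layout the result would differ, so the real headers should be
checked before drawing conclusions from this sketch.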



* Re: [net-next 07/14] ice: Rename ethtool private flag for lldp
From: Jakub Kicinski @ 2019-08-24  1:31 UTC
  To: Jeff Kirsher; +Cc: davem, Dave Ertman, netdev, nhorman, sassmann, Andrew Bowers
In-Reply-To: <20190823233750.7997-8-jeffrey.t.kirsher@intel.com>

On Fri, 23 Aug 2019 16:37:43 -0700, Jeff Kirsher wrote:
> From: Dave Ertman <david.m.ertman@intel.com>
> 
> The current flag name of "enable-fw-lldp" is a bit cumbersome.
> 
> Change priv-flag name to "fw-lldp-agent" with a value of on or
> off.  This is more straight-forward in meaning.
> 
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Just flagging this for Dave, it was introduced in v5.2 by:

commit 3a257a1404f8bf751a258ab92262dcb2cce39eef
Author: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Date:   Thu Feb 28 15:24:31 2019 -0800

    ice: Add code to control FW LLDP and DCBX
    
    This patch adds code to start or stop LLDP and DCBX in firmware through
    use of ethtool private flags.
    
    Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

And changed once already in v5.3-rc by:

commit 31eafa403b9945997cf5b321ae3560f072b74efe
Author: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Date:   Tue Apr 16 10:24:25 2019 -0700

    ice: Implement LLDP persistence
    
    Implement LLDP persistence across reboots, start and stop of LLDP agent.
    Add additional parameter to ice_aq_start_lldp and ice_aq_stop_lldp.
    
    Also change the ethtool private flag from "disable-fw-lldp" to
    "enable-fw-lldp". This change will flip the boolean logic of the
    functionality of the flag (on = enable, off = disable). The change
    in name and functionality is to differentiate between the
    pre-persistence and post-persistence states.
    
    Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Perhaps the rename should target net? IDK how much driver flag renaming
is okay otherwise, I guess this will only affect Intel users.


* Re: [PATCHv2 1/1] net: rds: add service level support in rds-info
From: Zhu Yanjun @ 2019-08-24  1:36 UTC
  To: santosh.shilimkar, davem, netdev, linux-rdma, rds-devel,
	gerd.rausch
In-Reply-To: <6e5bc371-d613-e8f7-7b57-0b1bc2e10e9d@oracle.com>


On 2019/8/24 9:25, santosh.shilimkar@oracle.com wrote:
> On 8/23/19 6:04 PM, Zhu Yanjun wrote:
>>  From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
>> is used to identify different flows within an IBA subnet.
>> It is carried in the local route header of the packet.
>>
>> Before this commit, run "rds-info -I". The outputs are as
>> below:
>> "
>> RDS IB Connections:
>>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
>> 192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> "
>> After this commit, the output is as below:
>> "
>> RDS IB Connections:
>>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
>> 192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> "
>>
>> The commit fe3475af3bdf ("net: rds: add per rds connection cache
>> statistics") adds cache_allocs in struct rds_info_rdma_connection
>> as below:
>> struct rds_info_rdma_connection {
>> ...
>>          __u32           rdma_mr_max;
>>          __u32           rdma_mr_size;
>>          __u8            tos;
>>          __u32           cache_allocs;
>>   };
>> The peer struct in rds-tools of struct rds_info_rdma_connection is as
>> below:
>> struct rds_info_rdma_connection {
>> ...
>>          uint32_t        rdma_mr_max;
>>          uint32_t        rdma_mr_size;
>>          uint8_t         tos;
>>          uint8_t         sl;
>>          uint32_t        cache_allocs;
>> };
>> The difference between userspace and kernel is the member variable sl.
>> In the kernel struct, the member variable sl is missing. This will
>> introduce risks. So it is necessary to use this commit to avoid this 
>> risk.
>>
>> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache 
>> statistics")
>> CC: Joe Jin <joe.jin@oracle.com>
>> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
>> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> ---
>> V1->V2: fix typos in commit logs.
>> ---
> I did ask you when you posted the patch about whether you did
> backward compatibility tests for which you said, you did all the
> tests and said "So do not worry about backward compatibility. This
> commit will work well with older rds-tools2.0.5 and 2.0.6."
>
> https://www.spinics.net/lists/netdev/msg574691.html
>
> I was worried about exactly such issue as described in commit.

Sorry. My bad. I will do more work to make rds robust.

Thanks a lot for your Ack.

Zhu Yanjun

>
> Anyways thanks for the fixup patch. Should be applied to stable
> as well.
>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>
> Regards,
> Santosh
>
>


* [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
From: Jakub Kicinski @ 2019-08-24  2:00 UTC
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Jakub Kicinski

From: Jiong Wang <jiong.wang@netronome.com>

NFP uses Local Memory to model the stack. LM_addr can be used as the base
of a region of 16 32-bit words of Local Memory. If the stack offset is
beyond the current region, the local index needs to be updated. The update
needs at least three cycles to take effect, therefore the sequence normally
looks like:

  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  nop

If the local index switch happens on a narrow load, then the instruction
preparing the value that zeroes the high 32 bits of the destination register
can be counted as one cycle, so the sequence can be something like:

  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  immed[gprB_5, 0]

However, the zero extension optimization can eliminate the zeroing of the
high 32 bits. In that case the IMMED insn above is not available and the
first sequence needs to be generated.

Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 4054b70d7719..5afcb3c4c2ef 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1163,7 +1163,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	     bool clr_gpr, lmem_step step)
 {
 	s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off;
-	bool first = true, last;
+	bool first = true, narrow_ld, last;
 	bool needs_inc = false;
 	swreg stack_off_reg;
 	u8 prev_gpr = 255;
@@ -1209,13 +1209,22 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 		needs_inc = true;
 	}
+
+	narrow_ld = clr_gpr && size < 8;
+
 	if (lm3) {
+		unsigned int nop_cnt;
+
 		emit_csr_wr(nfp_prog, imm_b(nfp_prog), NFP_CSR_ACT_LM_ADDR3);
-		/* For size < 4 one slot will be filled by zeroing of upper. */
-		wrp_nops(nfp_prog, clr_gpr && size < 8 ? 2 : 3);
+		/* For size < 4 one slot will be filled by zeroing of upper,
+		 * but be careful, that zeroing could be eliminated by zext
+		 * optimization.
+		 */
+		nop_cnt = narrow_ld && meta->flags & FLAG_INSN_DO_ZEXT ? 2 : 3;
+		wrp_nops(nfp_prog, nop_cnt);
 	}
 
-	if (clr_gpr && size < 8)
+	if (narrow_ld)
 		wrp_zext(nfp_prog, meta, gpr);
 
 	while (size) {
-- 
2.21.0
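
The NOP-count decision described in the commit message can be restated as a
small standalone C sketch. The helper name lm_switch_nop_count and its
do_zext parameter are hypothetical; do_zext stands in for the
FLAG_INSN_DO_ZEXT test in the patch. It only mirrors the selection logic and
does not emit any instructions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the driver: number of NOPs to emit
 * after writing the local-memory index CSR. The zeroing instruction can
 * fill one of the three delay slots, but only if it is really emitted.
 */
static unsigned int lm_switch_nop_count(bool clr_gpr, unsigned int size,
					bool do_zext)
{
	bool narrow_ld = clr_gpr && size < 8;

	return (narrow_ld && do_zext) ? 2 : 3;
}
```

For example, a 4-byte load whose zero extension was optimized away yields 3,
which is the case the old code got wrong.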



* [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behún @ 2019-08-24  2:42 UTC
  To: netdev
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger, Marek Behún

Hi,
this is my attempt to solve the multi-CPU port issue for DSA.

Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
If more than one CPU port is found in a tree, the code assigns CPU ports
to user/DSA ports in a round robin way. So for the simplest case where
we have one switch with N ports, 2 of them of type CPU connected to eth0
and eth1, and the other ports labeled lan1, lan2, ..., the code
assigns them to CPU ports this way:
  lan1 <-> eth0
  lan2 <-> eth1
  lan3 <-> eth0
  lan4 <-> eth1
  lan5 <-> eth0
  ...
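
The assignment above can be sketched in plain C, outside the kernel. The
function assign_round_robin and its array-of-indices interface are
hypothetical and only illustrate the wrap-around counter; the real patch
stores struct dsa_port pointers instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: assign[i] receives the index of the CPU port
 * chosen for user/DSA port i, cycling through num_cpus CPU ports.
 * Nothing is written when there is no CPU port at all.
 */
static void assign_round_robin(size_t num_ports, size_t num_cpus,
			       size_t *assign)
{
	size_t i, next = 0;

	if (!num_cpus)
		return;

	for (i = 0; i < num_ports; i++) {
		assign[i] = next++;
		if (next == num_cpus)
			next = 0;
	}
}
```

With five ports and two CPU ports this produces the indices 0, 1, 0, 1, 0,
matching the lan1..lan5 listing if index 0 is eth0 and index 1 is eth1.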

Patch 2 adds a new operation to the net device operations structure.
Currently we use the iflink property of a net device to report to which
CPU port a given switch port is connected. The ip link utility from
iproute2 reports this as "lan1@eth0". We add a new net device operation,
ndo_set_iflink, which can be used to set this property. We call this
function from the netlink handlers.

Patch 3 implements this new ndo_set_iflink operation for DSA slave
device. Thus the userspace can request a change of CPU port of a given
port.

I am also sending a patch for iproute2-next, to add support for setting
this iflink value.

Marek

Marek Behún (3):
  net: dsa: allow for multiple CPU ports
  net: add ndo for setting the iflink property
  net: dsa: implement ndo_set_iflink for changing port's CPU port

 include/linux/netdevice.h |  5 +++
 include/net/dsa.h         | 11 ++++-
 net/core/dev.c            | 15 +++++++
 net/core/rtnetlink.c      |  7 ++++
 net/dsa/dsa2.c            | 84 +++++++++++++++++++++++++--------------
 net/dsa/slave.c           | 35 ++++++++++++++++
 6 files changed, 126 insertions(+), 31 deletions(-)

-- 
2.21.0


* [PATCH RFC net-next 2/3] net: add ndo for setting the iflink property
From: Marek Behún @ 2019-08-24  2:42 UTC
  To: netdev
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

In DSA the iflink value is used to report to which CPU port a given
switch port is connected. Since we want to support multi-CPU DSA, we
want the user to be able to change this value.

Add an ndo_set_iflink method to the ndo structure as a counterpart to
ndo_get_iflink. Also create dev_set_iflink and call this from the
netlink code, so that userspace can change the iflink value.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
 include/linux/netdevice.h |  5 +++++
 net/core/dev.c            | 15 +++++++++++++++
 net/core/rtnetlink.c      |  7 +++++++
 3 files changed, 27 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 55ac223553f8..45eeb6da8583 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1201,6 +1201,8 @@ struct tlsdev_ops;
  *	TX queue.
  * int (*ndo_get_iflink)(const struct net_device *dev);
  *	Called to get the iflink value of this device.
+ * int (*ndo_set_iflink)(struct net_device *dev, int iflink);
+ *	Called to set the iflink value of this device.
  * void (*ndo_change_proto_down)(struct net_device *dev,
  *				 bool proto_down);
  *	This function is used to pass protocol port error state information
@@ -1415,6 +1417,8 @@ struct net_device_ops {
 						      int queue_index,
 						      u32 maxrate);
 	int			(*ndo_get_iflink)(const struct net_device *dev);
+	int			(*ndo_set_iflink)(struct net_device *dev,
+						  int iflink);
 	int			(*ndo_change_proto_down)(struct net_device *dev,
 							 bool proto_down);
 	int			(*ndo_fill_metadata_dst)(struct net_device *dev,
@@ -2606,6 +2610,7 @@ void dev_add_offload(struct packet_offload *po);
 void dev_remove_offload(struct packet_offload *po);
 
 int dev_get_iflink(const struct net_device *dev);
+int dev_set_iflink(struct net_device *dev, int iflink);
 int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
 struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
 				      unsigned short mask);
diff --git a/net/core/dev.c b/net/core/dev.c
index 49589ed2018d..966bab196694 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -693,6 +693,21 @@ int dev_get_iflink(const struct net_device *dev)
 }
 EXPORT_SYMBOL(dev_get_iflink);
 
+/**
+ *	dev_set_iflink - set 'iflink' value of an interface
+ *	@dev: target interface
+ *	@iflink: new value
+ *
+ *	Change the interface to which this interface is linked.
+ */
+int dev_set_iflink(struct net_device *dev, int iflink)
+{
+	if (dev->netdev_ops && dev->netdev_ops->ndo_set_iflink)
+		return dev->netdev_ops->ndo_set_iflink(dev, iflink);
+
+	return -EOPNOTSUPP;
+}
+
 /**
  *	dev_fill_metadata_dst - Retrieve tunnel egress information.
  *	@dev: targeted interface
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1ee6460f8275..106d5e23ae6f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2507,6 +2507,13 @@ static int do_setlink(const struct sk_buff *skb,
 		status |= DO_SETLINK_MODIFIED;
 	}
 
+	if (tb[IFLA_LINK]) {
+		err = dev_set_iflink(dev, nla_get_u32(tb[IFLA_LINK]));
+		if (err)
+			goto errout;
+		status |= DO_SETLINK_MODIFIED;
+	}
+
 	if (tb[IFLA_CARRIER]) {
 		err = dev_change_carrier(dev, nla_get_u8(tb[IFLA_CARRIER]));
 		if (err)
-- 
2.21.0
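
The dispatch pattern used by dev_set_iflink in this patch (call the driver
hook if present, otherwise report that the operation is unsupported) can be
shown in a self-contained C sketch. All demo_* names are hypothetical
stand-ins for the kernel types and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_dev;

/* Hypothetical ops table with one optional callback. */
struct demo_ops {
	int (*set_iflink)(struct demo_dev *dev, int iflink);
};

struct demo_dev {
	const struct demo_ops *ops;
	int iflink;
};

/* Forward to the driver hook when it exists, else -EOPNOTSUPP. */
static int demo_set_iflink(struct demo_dev *dev, int iflink)
{
	if (dev->ops && dev->ops->set_iflink)
		return dev->ops->set_iflink(dev, iflink);

	return -EOPNOTSUPP;
}

/* Example driver hook: simply record the new value. */
static int demo_store_iflink(struct demo_dev *dev, int iflink)
{
	dev->iflink = iflink;
	return 0;
}
```

One consequence of this pattern worth keeping in mind: with the do_setlink
hunk above, a request carrying IFLA_LINK for a device without the hook
would fail with -EOPNOTSUPP rather than being ignored.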



* [PATCH RFC net-next 1/3] net: dsa: allow for multiple CPU ports
From: Marek Behún @ 2019-08-24  2:42 UTC
  To: netdev
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

Allow for multiple CPU ports in a DSA switch tree. By default assign the
CPU ports to user ports in a round robin way, i.e. if there are two CPU
ports connected to eth0 and eth1, and five user ports (lan1..lan5),
assign them as:
  lan1 <-> eth0
  lan2 <-> eth1
  lan3 <-> eth0
  lan4 <-> eth1
  lan5 <-> eth0

Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
 include/net/dsa.h |  5 +--
 net/dsa/dsa2.c    | 84 +++++++++++++++++++++++++++++++----------------
 2 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 147b757ef8ea..64bd70608f2f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -123,9 +123,10 @@ struct dsa_switch_tree {
 	struct dsa_platform_data	*pd;
 
 	/*
-	 * The switch port to which the CPU is attached.
+	 * The switch ports to which the CPU is attached.
 	 */
-	struct dsa_port		*cpu_dp;
+	size_t			num_cpu_dps;
+	struct dsa_port		*cpu_dps[DSA_MAX_PORTS];
 
 	/*
 	 * Data for the individual switch chips.
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 8c4eccb0cfe6..c5af89079a6b 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -194,11 +194,12 @@ static bool dsa_tree_setup_routing_table(struct dsa_switch_tree *dst)
 	return complete;
 }
 
-static struct dsa_port *dsa_tree_find_first_cpu(struct dsa_switch_tree *dst)
+static void dsa_tree_fill_cpu_ports(struct dsa_switch_tree *dst)
 {
 	struct dsa_switch *ds;
 	struct dsa_port *dp;
 	int device, port;
+	int count = 0;
 
 	for (device = 0; device < DSA_MAX_SWITCHES; device++) {
 		ds = dst->ds[device];
@@ -208,28 +209,38 @@ static struct dsa_port *dsa_tree_find_first_cpu(struct dsa_switch_tree *dst)
 		for (port = 0; port < ds->num_ports; port++) {
 			dp = &ds->ports[port];
 
-			if (dsa_port_is_cpu(dp))
-				return dp;
+			if (dsa_port_is_cpu(dp)) {
+				if (count == ARRAY_SIZE(dst->cpu_dps)) {
+					pr_warn("Tree has too many CPU ports\n");
+					dst->num_cpu_dps = count;
+					return;
+				}
+
+				dst->cpu_dps[count++] = dp;
+			}
 		}
 	}
 
-	return NULL;
+	dst->num_cpu_dps = count;
 }
 
-static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
+static int dsa_tree_setup_default_cpus(struct dsa_switch_tree *dst)
 {
 	struct dsa_switch *ds;
 	struct dsa_port *dp;
-	int device, port;
+	int device, port, i;
 
-	/* DSA currently only supports a single CPU port */
-	dst->cpu_dp = dsa_tree_find_first_cpu(dst);
-	if (!dst->cpu_dp) {
+	dsa_tree_fill_cpu_ports(dst);
+	if (!dst->num_cpu_dps) {
 		pr_warn("Tree has no master device\n");
 		return -EINVAL;
 	}
 
-	/* Assign the default CPU port to all ports of the fabric */
+	/* Assign the default CPU port to all ports of the fabric in a round
+	 * robin way. This should work nicely for all sane switch tree designs.
+	 */
+	i = 0;
+
 	for (device = 0; device < DSA_MAX_SWITCHES; device++) {
 		ds = dst->ds[device];
 		if (!ds)
@@ -238,18 +249,20 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
 		for (port = 0; port < ds->num_ports; port++) {
 			dp = &ds->ports[port];
 
-			if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp))
-				dp->cpu_dp = dst->cpu_dp;
+			if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp)) {
+				dp->cpu_dp = dst->cpu_dps[i++];
+				if (i == dst->num_cpu_dps)
+					i = 0;
+			}
 		}
 	}
 
 	return 0;
 }
 
-static void dsa_tree_teardown_default_cpu(struct dsa_switch_tree *dst)
+static void dsa_tree_teardown_default_cpus(struct dsa_switch_tree *dst)
 {
-	/* DSA currently only supports a single CPU port */
-	dst->cpu_dp = NULL;
+	dst->num_cpu_dps = 0;
 }
 
 static int dsa_port_setup(struct dsa_port *dp)
@@ -493,21 +506,34 @@ static void dsa_tree_teardown_switches(struct dsa_switch_tree *dst)
 	}
 }
 
-static int dsa_tree_setup_master(struct dsa_switch_tree *dst)
+static int dsa_tree_setup_masters(struct dsa_switch_tree *dst)
 {
-	struct dsa_port *cpu_dp = dst->cpu_dp;
-	struct net_device *master = cpu_dp->master;
+	int i;
+	int err;
 
-	/* DSA currently supports a single pair of CPU port and master device */
-	return dsa_master_setup(master, cpu_dp);
+	for (i = 0; i < dst->num_cpu_dps; ++i) {
+		struct dsa_port *cpu_dp = dst->cpu_dps[i];
+		struct net_device *master = cpu_dp->master;
+
+		err = dsa_master_setup(master, cpu_dp);
+		if (err)
+			goto teardown;
+	}
+
+	return 0;
+teardown:
+	for (--i; i >= 0; --i)
+		dsa_master_teardown(dst->cpu_dps[i]->master);
+
+	return err;
 }
 
-static void dsa_tree_teardown_master(struct dsa_switch_tree *dst)
+static void dsa_tree_teardown_masters(struct dsa_switch_tree *dst)
 {
-	struct dsa_port *cpu_dp = dst->cpu_dp;
-	struct net_device *master = cpu_dp->master;
+	int i;
 
-	return dsa_master_teardown(master);
+	for (i = 0; i < dst->num_cpu_dps; ++i)
+		dsa_master_teardown(dst->cpu_dps[i]->master);
 }
 
 static int dsa_tree_setup(struct dsa_switch_tree *dst)
@@ -525,7 +551,7 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
 	if (!complete)
 		return 0;
 
-	err = dsa_tree_setup_default_cpu(dst);
+	err = dsa_tree_setup_default_cpus(dst);
 	if (err)
 		return err;
 
@@ -533,7 +559,7 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
 	if (err)
 		goto teardown_default_cpu;
 
-	err = dsa_tree_setup_master(dst);
+	err = dsa_tree_setup_masters(dst);
 	if (err)
 		goto teardown_switches;
 
@@ -546,7 +572,7 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
 teardown_switches:
 	dsa_tree_teardown_switches(dst);
 teardown_default_cpu:
-	dsa_tree_teardown_default_cpu(dst);
+	dsa_tree_teardown_default_cpus(dst);
 
 	return err;
 }
@@ -556,11 +582,11 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst)
 	if (!dst->setup)
 		return;
 
-	dsa_tree_teardown_master(dst);
+	dsa_tree_teardown_masters(dst);
 
 	dsa_tree_teardown_switches(dst);
 
-	dsa_tree_teardown_default_cpu(dst);
+	dsa_tree_teardown_default_cpus(dst);
 
 	pr_info("DSA: tree %d torn down\n", dst->index);
 
-- 
2.21.0
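
The error handling in dsa_tree_setup_masters follows the usual
"set up in order, unwind in reverse on failure" idiom. A minimal standalone
sketch of that idiom is given here; the demo_* names, the fail_at parameter
and the global state array are hypothetical and exist only to make the
behaviour observable.

```c
#include <assert.h>

#define DEMO_MAX 8

static int demo_active[DEMO_MAX];

/* Pretend to set up item i; fail when i equals fail_at. */
static int demo_setup_one(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	demo_active[i] = 1;
	return 0;
}

static void demo_teardown_one(int i)
{
	demo_active[i] = 0;
}

/* Set up items 0..n-1; on failure tear down those already set up,
 * in reverse order, and return the error.
 */
static int demo_setup_all(int n, int fail_at)
{
	int i, err = 0;

	for (i = 0; i < n; ++i) {
		err = demo_setup_one(i, fail_at);
		if (err)
			goto teardown;
	}

	return 0;
teardown:
	for (--i; i >= 0; --i)
		demo_teardown_one(i);

	return err;
}

/* Count items that are currently set up. */
static int demo_count_active(void)
{
	int i, count = 0;

	for (i = 0; i < DEMO_MAX; ++i)
		count += demo_active[i];
	return count;
}
```

The item that failed is not torn down, only the ones before it, which is
why the unwind loop starts with --i.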



* [PATCH RFC iproute2-next] iplink: allow changing the iflink value
From: Marek Behún @ 2019-08-24  2:42 UTC
  To: netdev
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

Allow changing the interface to which a given interface is linked.
This is useful in the case of multi-CPU port DSA, for changing the CPU
port of a given user port.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
---
 ip/iplink.c           | 16 +++++-----------
 man/man8/ip-link.8.in |  7 +++++++
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 212a0885..d52c0aaf 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -579,7 +579,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 {
 	char *name = NULL;
 	char *dev = NULL;
-	char *link = NULL;
 	int ret, len;
 	char abuf[32];
 	int qlen = -1;
@@ -590,6 +589,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 	int numrxqueues = -1;
 	int link_netnsid = -1;
 	int index = 0;
+	int link = -1;
 	int group = -1;
 	int addr_len = 0;
 
@@ -620,7 +620,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 				invarg("Invalid \"index\" value", *argv);
 		} else if (matches(*argv, "link") == 0) {
 			NEXT_ARG();
-			link = *argv;
+			link = ll_name_to_index(*argv);
+			if (!link)
+				return nodev(*argv);
+			addattr32(&req->n, sizeof(*req), IFLA_LINK, link);
 		} else if (matches(*argv, "address") == 0) {
 			NEXT_ARG();
 			addr_len = ll_addr_a2n(abuf, sizeof(abuf), *argv);
@@ -1004,15 +1007,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 			exit(-1);
 		}
 
-		if (link) {
-			int ifindex;
-
-			ifindex = ll_name_to_index(link);
-			if (!ifindex)
-				return nodev(link);
-			addattr32(&req->n, sizeof(*req), IFLA_LINK, ifindex);
-		}
-
 		req->i.ifi_index = index;
 	}
 
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index a8ae72d2..800aed05 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -149,6 +149,9 @@ ip-link \- network device configuration
 .br
 .RB "[ " nomaster " ]"
 .br
+.RB "[ " link
+.IR DEVICE " ]"
+.br
 .RB "[ " vrf
 .IR NAME " ]"
 .br
@@ -2131,6 +2134,10 @@ set master device of the device (enslave device).
 .BI nomaster
 unset master device of the device (release device).
 
+.TP
+.BI link " DEVICE"
+set the device to which this device is linked.
+
 .TP
 .BI addrgenmode " eui64|none|stable_secret|random"
 set the IPv6 address generation mode
-- 
2.21.0
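
The name-to-index step that the patch moves into the argument parser can be
sketched with the standard if_nametoindex() call. This is not the iproute2
code: ll_name_to_index is an iproute2 helper with its own cache, and the
wrapper resolve_link below is a hypothetical stand-in that only shows the
"0 means no such device" convention.

```c
#include <assert.h>
#include <net/if.h>

/* Hypothetical wrapper: return the interface index for name,
 * or -1 when no such interface exists.
 */
static int resolve_link(const char *name)
{
	unsigned int idx = if_nametoindex(name);

	return idx ? (int)idx : -1;
}
```

Resolving the name at parse time means an unknown device is reported before
the netlink request is sent.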



* [PATCH RFC net-next 3/3] net: dsa: implement ndo_set_iflink for changing port's CPU port
From: Marek Behún @ 2019-08-24  2:42 UTC
  To: netdev
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

Implement ndo_set_iflink for DSA slave devices. In a multi-CPU port setup
this can be used to change the CPU destination port to which a given port
is connected.

This adds a new operation into the DSA switch operations structure,
port_change_cpu_port. A driver implementing this function has the
ability to change the CPU destination port of a given port.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
 include/net/dsa.h |  6 ++++++
 net/dsa/slave.c   | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 64bd70608f2f..4f3f0032b886 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -545,6 +545,12 @@ struct dsa_switch_ops {
 	 */
 	netdev_tx_t (*port_deferred_xmit)(struct dsa_switch *ds, int port,
 					  struct sk_buff *skb);
+
+	/*
+	 * Multi-CPU port support
+	 */
+	int	(*port_change_cpu_port)(struct dsa_switch *ds, int port,
+					struct dsa_port *new_cpu_dp);
 };
 
 struct dsa_switch_driver {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 33f41178afcc..bafaadeca912 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -64,6 +64,40 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
 	return dsa_slave_to_master(dev)->ifindex;
 }
 
+static int dsa_slave_set_iflink(struct net_device *dev, int iflink)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct net_device *new_cpu_dev;
+	struct dsa_port *new_cpu_dp;
+	int err;
+
+	if (!dp->ds->ops->port_change_cpu_port)
+		return -EOPNOTSUPP;
+
+	new_cpu_dev = __dev_get_by_index(dev_net(dev), iflink);
+	if (!new_cpu_dev)
+		return -ENODEV;
+
+	new_cpu_dp = new_cpu_dev->dsa_ptr;
+	if (!new_cpu_dp)
+		return -EINVAL;
+
+	/* new CPU port has to be on the same switch tree */
+	if (new_cpu_dp->dst != dp->dst)
+		return -EINVAL;
+
+	err = dp->ds->ops->port_change_cpu_port(dp->ds, dp->index, new_cpu_dp);
+	if (err)
+		return err;
+
+	/* should this be done atomically? */
+	dp->cpu_dp = new_cpu_dp;
+	p->xmit = new_cpu_dp->tag_ops->xmit;
+
+	return 0;
+}
+
 static int dsa_slave_open(struct net_device *dev)
 {
 	struct net_device *master = dsa_slave_to_master(dev);
@@ -1176,6 +1210,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_fdb_dump		= dsa_slave_fdb_dump,
 	.ndo_do_ioctl		= dsa_slave_ioctl,
 	.ndo_get_iflink		= dsa_slave_get_iflink,
+	.ndo_set_iflink		= dsa_slave_set_iflink,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_netpoll_setup	= dsa_slave_netpoll_setup,
 	.ndo_netpoll_cleanup	= dsa_slave_netpoll_cleanup,
-- 
2.21.0



* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-24  3:56 UTC
  To: Alex Williamson
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <20190823134337.37e4b215@x1.home>



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, August 24, 2019 1:14 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Fri, 23 Aug 2019 18:00:30 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, August 23, 2019 10:47 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 16:14:04 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > > > Idea is to have mdev alias as optional.
> > > > > > Each mdev_parent says whether it wants mdev_core to generate
> > > > > > an alias or not. So only networking device drivers would set it to true.
> > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > either during creation time. User continue to provide only uuid.
> > > > >
> > > > > Ok
> > > > >
> > > > > > I am tempted to have alias collision detection only within
> > > > > > children mdevs of the same parent, but doing so will always
> > > > > > mandate to prefix in netdev name. And currently we are left
> > > > > > with only 3 characters to prefix it, so that may not be good either.
> > > > > > Hence, I think mdev core wide alias is better with 12 characters.
> > > > >
> > > > > I suppose it depends on the API, if the vendor driver can ask
> > > > > the mdev core for an alias as part of the device creation
> > > > > process, then it could manage the netdev namespace for all its
> > > > > devices, choosing how many characters to use, and fail the
> > > > > creation if it can't meet a uniqueness requirement.  IOW,
> > > > > mdev-core would always provide a full
> > > > > sha1 and therefore gets itself out of the uniqueness/collision aspects.
> > > > >
> > > > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > > > mdev core allowed to create a mdev.
> > >
> > > The mdev vendor driver has the opportunity to fail the device
> > > creation in mdev_parent_ops.create().
> > >
> > That is not helpful for below reasons.
> > 1. vendor driver doesn't have visibility in other vendor's alias.
> > 2. Even for single vendor, it needs to maintain global list of devices to see
> > collision.
> > 3. multiple vendors needs to implement same scheme.
> >
> > Mdev core should be the owner. Shifting ownership from one layer to a
> > lower layer in vendor driver doesn't solve the problem (if there is
> > one, which I think doesn't exist).
> >
> > > > And then devlink core chooses
> > > > only 6 bytes (12 characters) and there is collision. Things fall
> > > > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > > > core's ownership to provide unique aliases.
> > >
> > > You're suggesting/contemplating multiple solutions here, 3-char
> > > prefix + 12- char sha1 vs <parent netdev> + ?-char sha1.  Also, the
> > > 15-char total limit is imposed by an external subsystem, where the
> > > vendor driver is the gateway between that subsystem and mdev.  How
> > > would mdev integrate with another subsystem that maybe only has
> > > 9-chars available?  Would the vendor driver API specify "I need an
> > > alias" or would it specify "I need an X-char length alias"?
> > Yes, Vendor driver should say how long the alias it wants.
> > However before we implement that, I suggest let such
> > vendor/user/driver arrive which needs that. Such variable length alias
> > can be added at that time and even with that alias collision can be
> > detected by single mdev module.
> 
> If we agree that different alias lengths are possible, then I would request that
> minimally an mdev sample driver be modified to request an alias with a length
> that can be adjusted without recompiling in order to exercise the collision path.
> 
Yes, this can be done, but I fail to understand the need to do so.
It is not the responsibility of the mdev core to show case sha1 collision efficiency/deficiency.
So why do you insist on exercising it?

> If mdev-core is guaranteeing uniqueness, does this indicate that each alias
> length constitutes a separate namespace?  ie. strictly a strcmp(), not a
> strncmp() to the shorter alias.
> 
Yes.
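
To illustrate the "separate namespace per alias length" answer above, here is a minimal sketch and not mdev code: alias_exists() and alias_prefix_clash() are hypothetical helper names. It shows that an exact strcmp() match lets 'abc' and 'abcd' coexist, while a strncmp() against the shorter alias would treat them as a collision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Exact match: aliases of different lengths never collide. */
static bool alias_exists(const char *const *aliases, size_t n,
			 const char *candidate)
{
	assert(candidate != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(aliases[i], candidate) == 0)
			return true;
	return false;
}

/* Prefix match against the shorter string: the rejected alternative. */
static bool alias_prefix_clash(const char *const *aliases, size_t n,
			       const char *candidate)
{
	size_t clen;

	assert(candidate != NULL);
	clen = strlen(candidate);
	for (size_t i = 0; i < n; i++) {
		size_t alen = strlen(aliases[i]);
		size_t m = alen < clen ? alen : clen;

		if (strncmp(aliases[i], candidate, m) == 0)
			return true;
	}
	return false;
}
```

With an existing alias 'abc', alias_exists() accepts a new alias 'abcd' while alias_prefix_clash() would reject it.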


> > > Does it make sense that mdev-core would fail creation of a device if
> > > there's a collision in the 12-char address space between different
> > > subsystems?  For example, does enm0123456789ab really
> > > collide with xyz0123456789ab?
> > I think so, because at mdev level its 12-char alias matters.
> > Choosing the prefix not adding prefix is really a user space choice.
> >
> > >  So if
> > > mdev were to provided a 40-char sha1, is it possible that the vendor
> > > driver could consume this in its create callback, truncate it to the
> > > number of chars required by the vendor driver's subsystem, and
> > > determine whether a collision exists?
> > We shouldn't shift the problem from mdev to multiple vendor drivers to
> > detect collision.
> >
> > I still think that user providing alias is better because it knows the
> > use-case system in use, and eliminates these collision issue.
> 
> How is a user provided alias immune from collisions?  The burden is on the user
> to provide both a unique uuid and a unique alias.  That makes it trivial to create
> a collision.
> 
Then such collisions should also have occurred in other subsystems such as netdev when creating vlan, macvlan, ipvlan, vxlan and other devices that are named by the user.
But that isn't the case.

> > > > > > I do not understand how an extra character reduces collision,
> > > > > > if that's what you meant.
> > > > >
> > > > > If the default were for example 3-chars, we might already have
> > > > > device 'abc'.  A collision would expose one more char of the new
> > > > > device, so we might add device with alias 'abcd'.  I mentioned
> > > > > previously that this leaves an issue for userspace that we can't
> > > > > change the alias of device abc, so without additional
> > > > > information, userspace can only determine via elimination the
> > > > > mapping of alias to device, but userspace has more information
> > > > > available to it in the form of sysfs links.
> > > > > > Module options are almost not encouraged anymore with other
> > > > > > subsystems/drivers.
> > > > >
> > > > > We don't live in a world of absolutes.  I agree that the
> > > > > defaults should work in the vast majority of cases.  Requiring a
> > > > > user to twiddle module options to make things work is
> > > > > undesirable, verging on a bug.  A module option to enable some
> > > > > specific feature, unsafe condition, or test that is outside of
> > > > > the typical use case is reasonable, imo.
> > > > > > For testing collision rate, a sample user space script and
> > > > > > sample mtty is easy and get us collision count too. We
> > > > > > shouldn't put that using module option in production kernel.
> > > > > > I practically have the code ready to play with; Changing 12 to
> > > > > > smaller value is easy with module reload.
> > > > > >
> > > > > > #define MDEV_ALIAS_LEN 12
> > > > >
> > > > > If it can't be tested with a shipping binary, it probably won't
> > > > > be tested.  Thanks,
> > > > It is not the role of mdev core to expose collision
> > > > efficiency/deficiency of the sha1. It can be tested outside before
> > > > mdev choose to use it.
> > >
> > > The testing I'm considering is the user and kernel response to a
> > > collision.
> > > > I am saying we should test with 12 characters with 10,000 or more
> > > > devices and see how collision occurs. Even if collision occurs,
> > > > mdev returns EEXIST status indicating user to pick a different
> > > > UUID for those rare conditions.
> > >
> > > The only way we're going to see collision with a 12-char sha1 is if
> > > we burn the CPU cycles to find uuids that collide in that space.
> > > 10,000 devices is not remotely enough to generate a collision in
> > > that address space.  That puts a prerequisite in place that in order
> > > to test collision, someone needs to know certain magic inputs.
> > > OTOH, if we could use a shorter abbreviation, collisions are trivial
> > > to test experimentally.  Thanks,
> > Yes, and therefore a sane user who wants to create more mdevs,
> > wouldn't intentionally stress it to see failures.
> 
> I don't understand this logic.  I'm simply asking that we have a way to test the
> collision behavior without changing the binary.  The path we're driving towards
> seems to be making this easier and easier.  If the vendor can request an alias of
> a specific length, then a sample driver with a module option to set the desired
> alias length to 1-char makes it trivially easy to induce a collision.  
Sure, it is easy to test collisions, but my point is that mdev core is not a sha1 test module.
Hence adding variable alias length functionality just to test collisions doesn't make sense.
When an actual user arrives who needs a shorter alias, we will be able to add the additional pieces very easily.

> It doesn't
> even need to be exposed in a real driver.  Besides, when do we ever get to
> design interfaces that only worry about sane users???  Thanks,
> 
I meant to say that a sane user who wants to create mdevs will rarely hit a collision, so things will just work.
If there is a collision, EEXIST is returned and a sane user picks a different UUID.
If a user intentionally picks UUIDs in a way that triggers a sha1 collision, their intention is likely not to create mdevs for actual use.
And if the interface returns an error code in that case, that is still fine.

> Alex


* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-24  4:45 UTC (permalink / raw)
  To: Parav Pandit, Alex Williamson
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866008B0571B90DAFFADA97D1A70@AM0PR05MB4866.eurprd05.prod.outlook.com>

Hi Alex,

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> owner@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Saturday, August 24, 2019 9:26 AM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S .
> Miller <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>;
> Cornelia Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > I don't understand this logic.  I'm simply asking that we have a way
> > to test the collision behavior without changing the binary.  The path
> > we're driving towards seems to be making this easier and easier.  If
> > the vendor can request an alias of a specific length, then a sample
> > driver with a module option to set the desired alias length to 1-char makes
> > it trivially easy to induce a collision.
> Sure it is easy to test collision, but my point is - mdev core is not sha1 test
> module.
> Hence adding functionality of variable alias length to test collision doesn't
> make sense.
> When the actual user arrives who needs small alias, we will be able to add
> additional pieces very easily.

My initial thought was to add a bool flag to parent_ops saying whether to generate an alias or not.
However, instead of a bool, I will make it an unsigned int: zero means skip the alias, and a non-zero value is the length of the alias to generate.
This will serve both purposes with trivial handling.
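
A rough sketch of that zero-or-length convention follows. It is an illustration under assumptions, not the actual patch: make_alias() is a hypothetical name, and the alias is assumed to be the leading hex characters of a 40-character sha1 digest that the caller has already computed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_HEX_LEN 40

/*
 * Hypothetical helper: alias_len == 0 means "no alias wanted",
 * otherwise copy the first alias_len hex characters of the digest.
 * Returns the alias length, 0 when skipped, or -1 on a bad request.
 */
static int make_alias(const char *sha1_hex, unsigned int alias_len,
		      char *out, size_t out_size)
{
	assert(sha1_hex != NULL && out != NULL);

	if (alias_len == 0) {
		if (out_size)
			out[0] = '\0';
		return 0;
	}
	if (alias_len > SHA1_HEX_LEN || out_size < (size_t)alias_len + 1 ||
	    strlen(sha1_hex) < alias_len)
		return -1;

	memcpy(out, sha1_hex, alias_len);
	out[alias_len] = '\0';
	return (int)alias_len;
}
```

A sample driver could then feed a module parameter into alias_len, so a very short alias can be requested for testing without rebuilding.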




* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-24  4:59 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866008B0571B90DAFFADA97D1A70@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Sat, 24 Aug 2019 03:56:08 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Saturday, August 24, 2019 1:14 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> > Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > 
> > On Fri, 23 Aug 2019 18:00:30 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, August 23, 2019 10:47 PM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > On Fri, 23 Aug 2019 16:14:04 +0000
> > > > Parav Pandit <parav@mellanox.com> wrote:
> > > >  
> > > > > > > Idea is to have mdev alias as optional.
> > > > > > > Each mdev_parent says whether it wants mdev_core to generate
> > > > > > > an alias or not. So only networking device drivers would set it to true.
> > > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > > either during creation time. User continue to provide only uuid.  
> > > > > >
> > > > > > Ok
> > > > > >  
> > > > > > > I am tempted to have alias collision detection only within
> > > > > > > children mdevs of the same parent, but doing so will always
> > > > > > > mandate to prefix in netdev name. And currently we are left
> > > > > > > with only 3 characters to prefix it, so that may not be good either.
> > > > > > > Hence, I think mdev core wide alias is better with 12 characters.  
> > > > > >
> > > > > > I suppose it depends on the API, if the vendor driver can ask
> > > > > > the mdev core for an alias as part of the device creation
> > > > > > process, then it could manage the netdev namespace for all its
> > > > > > devices, choosing how many characters to use, and fail the
> > > > > > creation if it can't meet a uniqueness requirement.  IOW,
> > > > > > mdev-core would always provide a full
> > > > > > sha1 and therefore gets itself out of the uniqueness/collision aspects.
> > > > > >  
> > > > > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > > > > mdev core allowed to create a mdev.  
> > > >
> > > > The mdev vendor driver has the opportunity to fail the device
> > > > creation in mdev_parent_ops.create().
> > > >  
> > > That is not helpful for below reasons.
> > > 1. vendor driver doesn't have visibility in other vendor's alias.
> > > 2. Even for single vendor, it needs to maintain global list of devices to see
> > > collision.
> > > 3. multiple vendors needs to implement same scheme.
> > >
> > > Mdev core should be the owner. Shifting ownership from one layer to a
> > > lower layer in vendor driver doesn't solve the problem (if there is
> > > one, which I think doesn't exist).
> > >  
> > > > > And then devlink core chooses
> > > > > only 6 bytes (12 characters) and there is collision. Things fall
> > > > > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > > > > core's ownership to provide unique aliases.  
> > > >
> > > > You're suggesting/contemplating multiple solutions here, 3-char
> > > > prefix + 12- char sha1 vs <parent netdev> + ?-char sha1.  Also, the
> > > > 15-char total limit is imposed by an external subsystem, where the
> > > > vendor driver is the gateway between that subsystem and mdev.  How
> > > > would mdev integrate with another subsystem that maybe only has
> > > > 9-chars available?  Would the vendor driver API specify "I need an
> > > > alias" or would it specify "I need an X-char length alias"?  
> > > Yes, Vendor driver should say how long the alias it wants.
> > > However before we implement that, I suggest let such
> > > vendor/user/driver arrive which needs that. Such variable length alias
> > > can be added at that time and even with that alias collision can be
> > > detected by single mdev module.  
> > 
> > If we agree that different alias lengths are possible, then I would request that
> > minimally an mdev sample driver be modified to request an alias with a length
> > that can be adjusted without recompiling in order to exercise the collision path.
> >   
> Yes. this can be done. But I fail to understand the need to do so.
> It is not the responsibility of the mdev core to show case sha1
> collision efficiency/deficiency. So why do you insist exercise it?

I don't understand what you're trying to imply with "show case sha1
collision efficiency/deficiency".  Are you suggesting that I'm asking
for this feature to experimentally test the probability of collisions
at different character lengths?  We can use shell scripts for that.
I'm simply observing that collisions are possible based on user input,
but they're not practical to test for at the character lengths we're
using.  Therefore, how do I tell QA to develop tests to make sure the
kernel and userspace tools that might be involved behave correctly when
this rare event occurs?

As I mentioned previously, we can burn the CPU cycles to find some uuids
which will collide with our aliases, but the more accessible approach
seems to be to have a tunable to reduce the alias address space such
that we can simply throw enough random uuids into the test to guarantee
a collision.  Simply generating 10,000 devices with a 12-character
alias, as you suggested previously, has effectively a 0% probability of
generating a collision.
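
As a back-of-the-envelope check of that "effectively 0%" figure, here is the usual birthday approximation. It assumes the 12 alias characters are hex digits of the sha1 (6 bytes, as discussed earlier in the thread) and behave as uniformly random:

```latex
N = 16^{12} = 2^{48} \approx 2.8 \times 10^{14}, \qquad n = 10^{4}

P(\text{collision}) \approx \frac{n(n-1)}{2N}
  = \frac{10^{4} \cdot 9999}{2 \cdot 2^{48}}
  \approx 1.8 \times 10^{-7}
```

By contrast, a 1-character alias has only N = 16 possible values, so any 17 distinct uuids must produce a collision by the pigeonhole principle.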

If we accept that different vendor drivers might have different alias
requirements, and therefore the vendor driver should have the ability
to specify an alias length, then this all fits very nicely into
modifying a sample driver to request a sufficiently short alias such
that we can use it to test the behavior of mdev-core and surrounding
code when an alias collision occurs.

> > If mdev-core is guaranteeing uniqueness, does this indicate that
> > each alias length constitutes a separate namespace?  ie. strictly a
> > strcmp(), not a strncmp() to the shorter alias.
> >   
> Yes.
> 
> 
> > > > Does it make sense that mdev-core would fail creation of a
> > > > device if there's a collision in the 12-char address space
> > > > between different subsystems?  For example, does
> > > > enm0123456789ab really collide with xyz0123456789ab?  
> > > I think so, because at mdev level its 12-char alias matters.
> > > Choosing the prefix not adding prefix is really a user space
> > > choice. 
> > > >  So if
> > > > mdev were to provided a 40-char sha1, is it possible that the
> > > > vendor driver could consume this in its create callback,
> > > > truncate it to the number of chars required by the vendor
> > > > driver's subsystem, and determine whether a collision exists?  
> > > We shouldn't shift the problem from mdev to multiple vendor
> > > drivers to detect collision.
> > >
> > > I still think that user providing alias is better because it
> > > knows the use-case system in use, and eliminates these collision
> > > issue.  
> > 
> > How is a user provided alias immune from collisions?  The burden is
> > on the user to provide both a unique uuid and a unique alias.  That
> > makes it trivial to create a collision.
> >   
> Than such collision should have occurred for other subsystem such as
> netdev while creating vlan, macvlan, ipvlan, vxlan and more devices
> who are named by the user. But that isn't the case.
> 
> > > > > > > I do not understand how an extra character reduces
> > > > > > > collision, if that's what you meant.  
> > > > > >
> > > > > > If the default were for example 3-chars, we might already
> > > > > > have device 'abc'.  A collision would expose one more char
> > > > > > of the new device, so we might add device with alias
> > > > > > 'abcd'.  I mentioned previously that this leaves an issue
> > > > > > for userspace that we can't change the alias of device abc,
> > > > > > so without additional information, userspace can only
> > > > > > determine via elimination the mapping of alias to device,
> > > > > > but userspace has more information available to it in the
> > > > > > form of sysfs links.  
> > > > > > > Module options are almost not encouraged anymore with
> > > > > > > other subsystems/drivers.  
> > > > > >
> > > > > > We don't live in a world of absolutes.  I agree that the
> > > > > > defaults should work in the vast majority of cases.
> > > > > > Requiring a user to twiddle module options to make things
> > > > > > work is undesirable, verging on a bug.  A module option to
> > > > > > enable some specific feature, unsafe condition, or test
> > > > > > that is outside of the typical use case is reasonable,
> > > > > > imo.  
> > > > > > > For testing collision rate, a sample user space script and
> > > > > > > sample mtty is easy and get us collision count too. We
> > > > > > > shouldn't put that using module option in production
> > > > > > > kernel. I practically have the code ready to play with;
> > > > > > > Changing 12 to smaller value is easy with module reload.
> > > > > > >
> > > > > > > #define MDEV_ALIAS_LEN 12  
> > > > > >
> > > > > > If it can't be tested with a shipping binary, it probably
> > > > > > won't be tested.  Thanks,  
> > > > > It is not the role of mdev core to expose collision
> > > > > efficiency/deficiency of the sha1. It can be tested outside
> > > > > before mdev choose to use it.  
> > > >
> > > > The testing I'm considering is the user and kernel response to a
> > > > collision.  
> > > > > I am saying we should test with 12 characters with 10,000 or
> > > > > more devices and see how collision occurs. Even if collision
> > > > > occurs, mdev returns EEXIST status indicating user to pick a
> > > > > different UUID for those rare conditions.  
> > > >
> > > > The only way we're going to see collision with a 12-char sha1
> > > > is if we burn the CPU cycles to find uuids that collide in that
> > > > space. 10,000 devices is not remotely enough to generate a
> > > > collision in that address space.  That puts a prerequisite in
> > > > place that in order to test collision, someone needs to know
> > > > certain magic inputs. OTOH, if we could use a shorter
> > > > abbreviation, collisions are trivial to test experimentally.
> > > > Thanks,  
> > > Yes, and therefore a sane user who wants to create more mdevs,
> > > wouldn't intentionally stress it to see failures.  
> > 
> > I don't understand this logic.  I'm simply asking that we have a
> > way to test the collision behavior without changing the binary.
> > The path we're driving towards seems to be making this easier and
> > easier.  If the vendor can request an alias of a specific length,
> > then a sample driver with a module option to set the desired alias
> > length to 1-char makes it trivially easy to induce a collision.    
> Sure it is easy to test collision, but my point is - mdev core is not
> sha1 test module. Hence adding functionality of variable alias length
> to test collision doesn't make sense. When the actual user arrives
> who needs small alias, we will be able to add additional pieces very
> easily.
> 
> > It doesn't
> > even need to be exposed in a real driver.  Besides, when do we ever
> > get to design interfaces that only worry about sane users???
> > Thanks, 
> I intent to say that a sane user who wants to create mdev's will just
> work fine with less collision. If there is collision EEXIST is
> returns and sane user picks different UUID. If user is intentionally
> picking UUIDs in such a way that triggers sha1 collision, his
> intention is likely to not create mdevs for actual use. And if
> interface returns error code it is still fine.

These are exactly the scenarios for which I'm asking "how do we test that it
works as we expect".  I can test that passing identical uuids into the
mdev create interface only allows the first to succeed.  With a 12-char
sha1 alias, it's not practical to construct a test to validate the
alias collision behavior.  Do you suggest we rely only on code
inspection instead?  Thanks,

Alex


* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-24  5:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <20190823225929.38fd86f5@x1.home>



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, August 24, 2019 10:29 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S .
> Miller <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>;
> Cornelia Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Sat, 24 Aug 2019 03:56:08 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Saturday, August 24, 2019 1:14 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 18:00:30 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Friday, August 23, 2019 10:47 PM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko
> > > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > > <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > > > > kernel@vger.kernel.org; cjia <cjia@nvidia.com>;
> > > > > netdev@vger.kernel.org
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > > On Fri, 23 Aug 2019 16:14:04 +0000 Parav Pandit
> > > > > <parav@mellanox.com> wrote:
> > > > >
> > > > > > > > Idea is to have mdev alias as optional.
> > > > > > > > Each mdev_parent says whether it wants mdev_core to
> > > > > > > > > generate an alias or not. So only networking device drivers
> > > > > > > > > would set it to true.
> > > > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > > > either during creation time. User continue to provide only uuid.
> > > > > > >
> > > > > > > Ok
> > > > > > >
> > > > > > > > I am tempted to have alias collision detection only within
> > > > > > > > children mdevs of the same parent, but doing so will
> > > > > > > > always mandate to prefix in netdev name. And currently we
> > > > > > > > > are left with only 3 characters to prefix it, so that may not be
> > > > > > > > > good either.
> > > > > > > > Hence, I think mdev core wide alias is better with 12 characters.
> > > > > > >
> > > > > > > I suppose it depends on the API, if the vendor driver can
> > > > > > > ask the mdev core for an alias as part of the device
> > > > > > > creation process, then it could manage the netdev namespace
> > > > > > > for all its devices, choosing how many characters to use,
> > > > > > > and fail the creation if it can't meet a uniqueness
> > > > > > > requirement.  IOW, mdev-core would always provide a full
> > > > > > > > sha1 and therefore gets itself out of the uniqueness/collision
> > > > > > > > aspects.
> > > > > > >
> > > > > > This doesn't work. At mdev core level 20 bytes sha1 are
> > > > > > unique, so mdev core allowed to create a mdev.
> > > > >
> > > > > The mdev vendor driver has the opportunity to fail the device
> > > > > creation in mdev_parent_ops.create().
> > > > >
> > > > That is not helpful for below reasons.
> > > > 1. vendor driver doesn't have visibility in other vendor's alias.
> > > > > 2. Even for single vendor, it needs to maintain global list of
> > > > > devices to see collision.
> > > > 3. multiple vendors needs to implement same scheme.
> > > >
> > > > Mdev core should be the owner. Shifting ownership from one layer
> > > > to a lower layer in vendor driver doesn't solve the problem (if
> > > > there is one, which I think doesn't exist).
> > > >
> > > > > > And then devlink core chooses
> > > > > > only 6 bytes (12 characters) and there is collision. Things
> > > > > > fall apart. Since mdev provides unique uuid based scheme, it's
> > > > > > the mdev core's ownership to provide unique aliases.
> > > > >
> > > > > You're suggesting/contemplating multiple solutions here, 3-char
> > > > > prefix + 12- char sha1 vs <parent netdev> + ?-char sha1.  Also,
> > > > > the 15-char total limit is imposed by an external subsystem,
> > > > > where the vendor driver is the gateway between that subsystem
> > > > > and mdev.  How would mdev integrate with another subsystem that
> > > > > maybe only has 9-chars available?  Would the vendor driver API
> > > > > > specify "I need an alias" or would it specify "I need an X-char length
> > > > > > alias"?
> > > > Yes, Vendor driver should say how long the alias it wants.
> > > > However before we implement that, I suggest let such
> > > > vendor/user/driver arrive which needs that. Such variable length
> > > > alias can be added at that time and even with that alias collision
> > > > can be detected by single mdev module.
> > >
> > > If we agree that different alias lengths are possible, then I would
> > > request that minimally an mdev sample driver be modified to request
> > > > an alias with a length that can be adjusted without recompiling in order
> > > > to exercise the collision path.
> > >
> > Yes. this can be done. But I fail to understand the need to do so.
> > It is not the responsibility of the mdev core to show case sha1
> > collision efficiency/deficiency. So why do you insist exercise it?
> 
> I don't understand what you're trying to imply with "show case sha1 collision
> efficiency/deficiency".  Are you suggesting that I'm asking for this feature to
> experimentally test the probability of collisions at different character
> lengths?  We can use shell scripts for that.
> I'm simply observing that collisions are possible based on user input, but
> they're not practical to test for at the character lengths we're using.
> Therefore, how do I tell QA to develop a tests to make sure the kernel and
> userspace tools that might be involved behave correctly when this rare event
> occurs?
>
OK, so you want code coverage and want to add a knob for that.
That is fine. I will add mdev_parent->ops.alias_len to the API instead of a bool,
and extend mtty with a module parameter to set the alias length.

Unfortunately, similar code coverage doesn't exist for APIs like mdev_get/set_iommu_device() in a sample or real vendor driver.
And QA is not able to test that functionality without tainting the kernel.


* [PATCH 4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue
From: Tim Froidcoeur @ 2019-08-24  6:03 UTC (permalink / raw)
  To: matthieu.baerts
  Cc: aprout, cpaasch, davem, edumazet, gregkh, jonathan.lemon, jtl,
	linux-kernel, mkubecek, ncardwell, sashal, stable, tim.froidcoeur,
	ycheng, netdev
In-Reply-To: <529376a4-cf63-f225-ce7c-4747e9966938@tessares.net>

Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
triggers the following stack trace:

[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
[25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
[25244.888167] Call Trace:
[25244.889182]  <IRQ>
[25244.890001]  tcp_fragment+0x9c/0x2cf
[25244.891295]  tcp_write_xmit+0x68f/0x988
[25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
[25244.894347]  tcp_data_snd_check+0x2a/0xc8
[25244.895775]  tcp_rcv_established+0x2a8/0x30d
[25244.897282]  tcp_v4_do_rcv+0xb2/0x158
[25244.898666]  tcp_v4_rcv+0x692/0x956
[25244.899959]  ip_local_deliver_finish+0xeb/0x169
[25244.901547]  __netif_receive_skb_core+0x51c/0x582
[25244.903193]  ? inet_gro_receive+0x239/0x247
[25244.904756]  netif_receive_skb_internal+0xab/0xc6
[25244.906395]  napi_gro_receive+0x8a/0xc0
[25244.907760]  receive_buf+0x9a1/0x9cd
[25244.909160]  ? load_balance+0x17a/0x7b7
[25244.910536]  ? vring_unmap_one+0x18/0x61
[25244.911932]  ? detach_buf+0x60/0xfa
[25244.913234]  virtnet_poll+0x128/0x1e1
[25244.914607]  net_rx_action+0x12a/0x2b1
[25244.915953]  __do_softirq+0x11c/0x26b
[25244.917269]  ? handle_irq_event+0x44/0x56
[25244.918695]  irq_exit+0x61/0xa0
[25244.919947]  do_IRQ+0x9d/0xbb
[25244.921065]  common_interrupt+0x85/0x85
[25244.922479]  </IRQ>

tcp_rtx_queue_tail() (called by tcp_fragment()) can call
tcp_write_queue_prev() on the first packet in the queue, which will trigger
the BUG in tcp_write_queue_prev(), because there is no previous packet.

This happens when the retransmit queue is empty, for example in case of a
zero window.

Patch is needed for 4.4, 4.9 and 4.14 stable branches.

Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
Change-Id: I839bde7167ae59e2f7d916c913507372445765c5
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
---
 include/net/tcp.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9de2c8cdcc51..1e70ca75c8bf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1705,6 +1705,10 @@ static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
 {
 	struct sk_buff *skb = tcp_send_head(sk);

+	/* empty retransmit queue, for example due to zero window */
+	if (skb == tcp_write_queue_head(sk))
+		return NULL;
+
 	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
 }

--
2.23.0
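
Editorial sketch of the check added above, written as a small user-space model rather than kernel code. It assumes the single write queue used by the affected stable branches, where sent packets sit in front of the send head. The names model_queue, MODEL_NONE and model_rtx_queue_tail are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: packets [0, send_head) have been sent and form the
 * retransmit part of the queue; packets [send_head, len) are still unsent.
 * send_head == len means nothing is pending to be sent.
 */
struct model_queue {
	size_t len;
	size_t send_head;
};

#define MODEL_NONE ((size_t)-1)

/* Index of the last sent packet, or MODEL_NONE if nothing has been sent. */
static size_t model_rtx_queue_tail(const struct model_queue *q)
{
	assert(q->send_head <= q->len);

	/*
	 * Counterpart of the added check: the send head is the first packet
	 * in the queue, so there is no previous packet to return.
	 */
	if (q->send_head == 0)
		return MODEL_NONE;

	return q->send_head - 1;
}
```

With send_head at 0 the model returns MODEL_NONE instead of stepping before the first element, which corresponds to the NULL return in the patch.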


^ permalink raw reply related

* Re: Aw: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
From: René van Dorst @ 2019-08-24  7:41 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Russell King, Stefan Roese
In-Reply-To: <trinity-df75d11a-c27f-4941-a880-b017ebabd3dc-1566583013438@3c-app-gmx-bs75>

Hi Frank,

Quoting Frank Wunderlich <frank-w@public-files.de>:

> tested on bpi-r2 (mt7623/mt7530) and bpi-r64 (mt7622/rtl8367)
>

Thanks for testing!

> As reported to René directly, the rx path needs some rework because the
> current rx speed on bpi-r2 is 865 Mbits/sec instead of ~940 Mbits/sec.

I still think it is a result of the extra code in the rx path when mt76x8
was introduced.

Greets,

René

>
> Tested-by: Frank Wunderlich <frank-w@public-files.de>
>
> regards Frank
>
>
>> Sent: Friday, 23 August 2019 at 15:45
>> From: "René van Dorst" <opensource@vdorst.com>
>> To: "John Crispin" <john@phrozen.org>, "Sean Wang"
>> <sean.wang@mediatek.com>, "Nelson Chang"
>> <nelson.chang@mediatek.com>, "David S . Miller"
>> <davem@davemloft.net>, "Matthias Brugger" <matthias.bgg@gmail.com>
>> Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
>> linux-mediatek@lists.infradead.org, linux-mips@vger.kernel.org,
>> "Russell King" <linux@armlinux.org.uk>, "Frank Wunderlich"
>> <frank-w@public-files.de>, "Stefan Roese" <sr@denx.de>, "René van
>> Dorst" <opensource@vdorst.com>
>> Subject: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
>>
>> These patches convert the mediatek driver to the PHYLINK API.
>>
>> v2->v3:
>> * Phylink improvements and clean-ups after review
>> v1->v2:
>> * Rebase for mt76x8 changes
>> * Phylink improvements and clean-ups after review
>> * SGMII port doesn't support 2.5Gbit in SGMII mode, only in BASE-X mode.
>>   Refactor the code.
>>
>> René van Dorst (3):
>>   net: ethernet: mediatek: Add basic PHYLINK support
>>   net: ethernet: mediatek: Re-add support SGMII
>>   dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the
>>     new phylink API
>>
>>  .../arm/mediatek/mediatek,sgmiisys.txt        |   2 -
>>  .../dts/mediatek/mt7622-bananapi-bpi-r64.dts  |  28 +-
>>  arch/arm64/boot/dts/mediatek/mt7622.dtsi      |   1 -
>>  drivers/net/ethernet/mediatek/Kconfig         |   2 +-
>>  drivers/net/ethernet/mediatek/mtk_eth_path.c  |  75 +--
>>  drivers/net/ethernet/mediatek/mtk_eth_soc.c   | 529 ++++++++++++------
>>  drivers/net/ethernet/mediatek/mtk_eth_soc.h   |  68 ++-
>>  drivers/net/ethernet/mediatek/mtk_sgmii.c     |  65 ++-
>>  8 files changed, 477 insertions(+), 293 deletions(-)
>>
>> --
>> 2.20.1
>>
>>




^ permalink raw reply

* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Jiri Pirko @ 2019-08-24  7:42 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Horatiu Vultur, roopa, nikolay, davem, UNGLinuxDriver,
	alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <e47a318c-6446-71cd-660c-8592037d8166@gmail.com>

Sat, Aug 24, 2019 at 01:25:02AM CEST, f.fainelli@gmail.com wrote:
>On 8/22/19 12:07 PM, Horatiu Vultur wrote:
>> Current implementation of the SW bridge is setting the interfaces in
>> promisc mode when they are added to bridge if learning of the frames is
>> enabled.
>> In case of Ocelot which has HW capabilities to switch frames, it is not
>> needed to set the ports in promisc mode because the HW is already capable of
>> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
>> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
>> the ports in promisc mode to do the switching.
>
>Then do not do anything when the ndo_set_rx_mode() for the ocelot
>network device is called and indicates that IFF_PROMISC is set and that
>your network port is a bridge port member. That is what mlxsw does AFAICT.

Correct.

>
>As others pointed out, the Linux bridge implements a software bridge by
>default, and because it needs to operate on a wide variety of network
>devices, all with different capabilities, the easiest way to make sure
>that all management (IGMP, BPDU, etc. ) as well as non-management
>traffic can make it to the bridge ports, is to put the network devices
>in promiscuous mode. If this is suboptimal for you, you can take
>shortcuts in your driver that do not hinder the overall functionality.
>
>> This optimization takes place only if all the interfaces that are part
>> of the bridge have this flag and have the same network driver.
>> 
>> If the bridge interface is added in promisc mode then the ports that are
>> part of the bridge are also set in promisc mode.
>> 
>> Horatiu Vultur (3):
>>   net: Add HW_BRIDGE offload feature
>>   net: mscc: Use NETIF_F_HW_BRIDGE
>>   net: mscc: Implement promisc mode.
>> 
>>  drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
>>  include/linux/netdev_features.h    |  3 +++
>>  net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
>>  net/core/ethtool.c                 |  1 +
>>  4 files changed, 56 insertions(+), 3 deletions(-)
>> 
>
>
>-- 
>Florian

^ permalink raw reply

* [PATCH] r8152: Set memory to all 0xFFs on failed reg reads
From: Prashant Malani @ 2019-08-24  8:36 UTC (permalink / raw)
  To: hayeswang, davem; +Cc: grundler, netdev, Prashant Malani

get_registers() blindly copies the memory written to by the
usb_control_msg() call even if the underlying urb failed.

This could lead to junk register values being read by the driver, since
some indirect callers of get_registers() ignore the return values. One
example is:
  ocp_read_dword() ignores the return value of generic_ocp_read(), which
  calls get_registers().

So, emulate PCI "Master Abort" behavior by setting the buffer to all
0xFFs when usb_control_msg() fails.

This patch is copied from the r8152 driver (v2.12.0) published by
Realtek (www.realtek.com).

Signed-off-by: Prashant Malani <pmalani@chromium.org>
---
 drivers/net/usb/r8152.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 0cc03a9ff545..eee0f5007ee3 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -799,8 +799,11 @@ int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 	ret = usb_control_msg(tp->udev, usb_rcvctrlpipe(tp->udev, 0),
 			      RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
 			      value, index, tmp, size, 500);
+	if (ret < 0)
+		memset(data, 0xff, size);
+	else
+		memcpy(data, tmp, size);

-	memcpy(data, tmp, size);
 	kfree(tmp);

 	return ret;
-- 
2.23.0.187.g17f5b7556c-goog
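
Editorial sketch of the fallback in the diff above, as a user-space model. The helper name model_finish_read and its parameters are hypothetical; 'ret' stands in for the result of the control transfer and 'tmp' for the bounce buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the patched tail of get_registers():
 * on a failed transfer (ret < 0) the caller's buffer is filled with 0xff
 * instead of whatever happened to be in the bounce buffer.
 */
static int model_finish_read(int ret, const void *tmp, void *data, size_t size)
{
	assert(tmp != NULL && data != NULL);

	if (ret < 0)
		memset(data, 0xff, size);
	else
		memcpy(data, tmp, size);

	return ret;
}
```

A caller that ignores the return value then sees an all-ones register value on failure, which is easier to recognise than stale buffer contents.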

^ permalink raw reply related

* Re: [PATCH net-next v3 1/3] net: ethernet: mediatek: Add basic PHYLINK support
From: Russell King - ARM Linux admin @ 2019-08-24  9:11 UTC (permalink / raw)
  To: René van Dorst
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190823134516.27559-2-opensource@vdorst.com>

On Fri, Aug 23, 2019 at 03:45:14PM +0200, René van Dorst wrote:
> This converts the basics to the PHYLINK API.
> SGMII support is not in this patch.
> 
> Signed-off-by: René van Dorst <opensource@vdorst.com>
> --
> v2->v3:
> * Make link_down() similar to link_up(), as suggested by Russell King

Yep, almost there, but...

> +static void mtk_mac_link_down(struct phylink_config *config, unsigned int mode,
> +			      phy_interface_t interface)
> +{
> +	struct mtk_mac *mac = container_of(config, struct mtk_mac,
> +					   phylink_config);
> +	u32 mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
>  
> +	mcr &= (MAC_MCR_TX_EN | MAC_MCR_RX_EN);

... this clears all bits _except_ for the tx and rx enable (which will
remain set) - you probably wanted a ~ before the (.
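
Editorial sketch of the difference between the two masks, as stand-alone C. The bit positions below are illustrative assumptions, not the real MAC_MCR_* definitions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; the driver header defines the real ones. */
#define EX_MCR_TX_EN (UINT32_C(1) << 14)
#define EX_MCR_RX_EN (UINT32_C(1) << 13)

/* As posted: keeps only the two enable bits and clears everything else. */
static uint32_t ex_mask_as_posted(uint32_t mcr)
{
	return mcr & (EX_MCR_TX_EN | EX_MCR_RX_EN);
}

/* With the '~': clears the two enable bits and keeps everything else. */
static uint32_t ex_mask_with_tilde(uint32_t mcr)
{
	mcr &= ~(EX_MCR_TX_EN | EX_MCR_RX_EN);
	assert((mcr & (EX_MCR_TX_EN | EX_MCR_RX_EN)) == 0);
	return mcr;
}
```

For a register value with both enable bits and one unrelated bit set, the first function returns just the enable bits, while the second returns just the unrelated bit.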

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: ethernet: mediatek: Re-add support SGMII
From: Russell King - ARM Linux admin @ 2019-08-24  9:21 UTC (permalink / raw)
  To: René van Dorst
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190823134516.27559-3-opensource@vdorst.com>

On Fri, Aug 23, 2019 at 03:45:15PM +0200, René van Dorst wrote:
> +	switch (state->interface) {
> +	case PHY_INTERFACE_MODE_SGMII:
> +		phylink_set(mask, 10baseT_Half);
> +		phylink_set(mask, 10baseT_Full);
> +		phylink_set(mask, 100baseT_Half);
> +		phylink_set(mask, 100baseT_Full);

You also want 1000baseX_Full here - the connected PHY could have a fiber
interface on it.

> +		/* fall through */
> +	case PHY_INTERFACE_MODE_TRGMII:
>  		phylink_set(mask, 1000baseT_Full);

I don't know enough about this interface type to comment whether it
should support 1000baseX_Full - if this is connected to a PHY that may
support fiber, then it ought to set it.

> +		break;
> +	case PHY_INTERFACE_MODE_2500BASEX:
> +		phylink_set(mask, 2500baseX_Full);
> +		/* fall through */
> +	case PHY_INTERFACE_MODE_1000BASEX:
> +		phylink_set(mask, 1000baseX_Full);

Both should be set.  The reasoning here is that if you have a
Fiberchannel 4Gbaud SFP plugged in and connected directly to the
MAC, it can operate at either 2500Base-X or 1000Base-X.  If we
decide to operate at 2500Base-X, then PHY_INTERFACE_MODE_2500BASEX
will be chosen.  Otherwise, PHY_INTERFACE_MODE_1000BASEX will be
used.

The user can use ethtool to control which interface mode is used
by adjusting the advertise mask and/or placing the interface in
manual mode and setting the speed directly.  This will change
the PHY_INTERFACE_MODE_xxxxBASEX (via phylink_helper_basex_speed())
between the two settings.

If we lose 2500baseX_Full when 1000Base-X is selected, the user
will not be able to go back to 2500Base-X mode.

Yes, it's a little confusing and has slightly different rules
from the other modes - partly due to phylink_helper_basex_speed().
These are the only interface modes that we dynamically switch
between depending on the settings that the user configures via
ethtool.
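
Editorial sketch of the point about the two BASE-X modes, as stand-alone C. The EX_* bits and the ex_iface enum are hypothetical stand-ins for the ethtool link modes and PHY interface modes, and the sketch assumes a MAC that can run both speeds.

```c
#include <assert.h>

/* Hypothetical stand-ins for the link-mode bits and interface modes. */
#define EX_1000baseX_Full (1u << 0)
#define EX_2500baseX_Full (1u << 1)

enum ex_iface { EX_IF_1000BASEX, EX_IF_2500BASEX };

/*
 * As posted: 2500BASEX falls through to 1000BASEX, so 1000BASEX on its
 * own never reports 2500baseX_Full.
 */
static unsigned int ex_mask_as_posted(enum ex_iface iface)
{
	unsigned int mask = 0;

	switch (iface) {
	case EX_IF_2500BASEX:
		mask |= EX_2500baseX_Full;
		/* fall through */
	case EX_IF_1000BASEX:
		mask |= EX_1000baseX_Full;
		break;
	}
	return mask;
}

/* As suggested in the review: both BASE-X modes report both link modes. */
static unsigned int ex_mask_suggested(enum ex_iface iface)
{
	assert(iface == EX_IF_1000BASEX || iface == EX_IF_2500BASEX);
	return EX_1000baseX_Full | EX_2500baseX_Full;
}
```

In the first variant, once 1000BASEX is selected the 2500baseX_Full bit is gone from the mask, which is the situation described above where the user cannot switch back.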


^ permalink raw reply

* Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
From: Sander Eikelenboom @ 2019-08-24 10:22 UTC (permalink / raw)
  To: Eric Dumazet, netdev, linux-kernel
In-Reply-To: <59dd3497-6d08-1e0e-7a4f-b121b850a24f@gmail.com>

On 17/08/2019 18:35, Eric Dumazet wrote:
> 
> 
> On 8/17/19 10:24 AM, Sander Eikelenboom wrote:
>> On 12/08/2019 19:56, Eric Dumazet wrote:
>>>
>>>
>>> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>>>> L.S.,
>>>>
>>>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9)),
>>>> one of my Xen VM's (which gets quite some network load) crashed.
>>>> See below for the stacktrace.
>>>>
>>>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be an option at the moment. 
>>>> I haven't encountered this on 5.2, so it seems to be a regression against 5.2.
>>>>
>>>> Any ideas ?
>>>>
>>>> --
>>>> Sander
>>>>
>>>>
>>>> [16930.653595] general protection fault: 0000 [#1] SMP NOPTI
>>>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 5.3.0-rc3-20190809-doflr+ #1
>>>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
>>>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>>>> [16930.653741] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
>>>> [16930.653762] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b
>>>
>>> crash in " mov    0x20(%rax),%eax"   and RAX=fffe888005bf62c0 (not a valid kernel address)
>>>
>>> Looks like a one-bit corruption, maybe.
>>>
>>> Nothing really comes to mind between 5.2 and 5.3 that could explain this.
>>>
>>>> [16930.653791] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
>>>> [16930.653819] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
>>>> [16930.653848] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
>>>> [16930.653875] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
>>>> [16930.653913] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>>>> [16930.653943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [16930.653965] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
>>>> [16930.653993] Call Trace:
>>>> [16930.654005]  <IRQ>
>>>> [16930.654018]  tcp_ack+0xbb0/0x1230
>>>> [16930.654033]  tcp_rcv_established+0x2e8/0x630
>>>> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
>>>> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
>>>> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
>>>> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
>>>> [16930.654128]  ip_local_deliver+0x4d/0xe0
>>>> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>>>> [16930.654163]  ip_rcv+0x4c/0xd0
>>>> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
>>>> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
>>>> [16930.654219]  napi_gro_receive+0xe7/0x140
>>>> [16930.654237]  xennet_poll+0x9be/0xae0
>>>> [16930.654254]  net_rx_action+0x136/0x340
>>>> [16930.654271]  __do_softirq+0xdd/0x2cf
>>>> [16930.654287]  irq_exit+0x7a/0xa0
>>>> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
>>>> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
>>>> [16930.654339]  </IRQ>
>>>> [16930.654349] RIP: 0033:0x55de0d87db99
>>>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
>>>> [16930.654432] RSP: 002b:00007ffd5531eec8 EFLAGS: 00000a87 ORIG_RAX: ffffffffffffff0c
>>>> [16930.655004] RAX: 0000000000000002 RBX: 000055de0f3e8e50 RCX: 000000000000007f
>>>> [16930.655034] RDX: 000055de0f3dc2d2 RSI: 0000000000003492 RDI: 0000000000000002
>>>> [16930.655062] RBP: 0000000000007fff R08: 00000000000080ea R09: 00000000000001f0
>>>> [16930.655089] R10: 000055de0f3d8e40 R11: 0000000000000094 R12: 000055de0f3e0f2a
>>>> [16930.655116] R13: 0000000000000010 R14: 0000000000007f16 R15: 0000000000000080
>>>> [16930.655144] Modules linked in:
>>>> [16930.655200] ---[ end trace 533367c95501b645 ]---
>>>> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
>>>> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>>>> [16930.655312] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
>>>> [16930.655331] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b
>>>> [16930.655360] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
>>>> [16930.655387] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
>>>> [16930.655414] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
>>>> [16930.655441] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
>>>> [16930.655475] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>>>> [16930.655502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [16930.655525] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
>>>> [16930.655553] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [16930.655789] Kernel Offset: disabled
>>>>
>>
>> Hi Eric,
>>
>> Got another VM crash, with a slightly different stacktrace this time around.
>> Still networking though.
>>
>> --
>> Sander
>>
>> [112522.697498] general protection fault: 0000 [#1] SMP NOPTI
>> [112522.697555] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc4-20190812-doflr+ #1
>> [112522.697592] RIP: 0010:skb_shift+0x63/0x430
>> [112522.697608] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 
> 
> 
> 
> crash in "testb  $0x8,0x3(%rcx)"  with RCX==fffe8880117da6c0
> 
> Same strange looking address on x86_64
> 
> I have no idea.
> 
>> [112522.697673] RSP: 0018:ffffc900000039b0 EFLAGS: 00010286
>> [112522.697693] RAX: 00000000000005a0 RBX: ffff8880117fb800 RCX: fffe8880117da6c0
>> [112522.697721] RDX: 00000000000005a0 RSI: ffff8880117fb800 RDI: ffff88800ae58000
>> [112522.697748] RBP: ffffc900000039e8 R08: 000000000004cfe0 R09: 00000000000005a0
>> [112522.697775] R10: 00000000000005a0 R11: ffff8880117fb800 R12: 0000000000000000
>> [112522.697803] R13: 00000000c95a98c2 R14: 0000000000000000 R15: ffff88800ae58000
>> [112522.697839] FS:  0000000000000000(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>> [112522.697869] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [112522.697895] CR2: 00007f9210d8e078 CR3: 000000000b660000 CR4: 00000000000006f0
>> [112522.697925] Call Trace:
>> [112522.697938]  <IRQ>
>> [112522.697951]  tcp_sacktag_walk+0x2af/0x480
>> [112522.697967]  tcp_sacktag_write_queue+0x34d/0x820
>> [112522.697986]  ? ip_forward_options.cold.0+0x1c/0x1c
>> [112522.698007]  tcp_ack+0xb8c/0x1230
>> [112522.698023]  ? tcp_event_new_data_sent+0x4a/0x90
>> [112522.698043]  tcp_rcv_established+0x14c/0x630
>> [112522.698064]  tcp_v4_do_rcv+0x129/0x1d0
>> [112522.698081]  tcp_v4_rcv+0xac9/0xcb0
>> [112522.698099]  ip_protocol_deliver_rcu+0x27/0x1b0
>> [112522.698119]  ip_local_deliver_finish+0x3f/0x50
>> [112522.698139]  ip_local_deliver+0x4d/0xe0
>> [112522.698155]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>> [112522.698177]  ip_rcv+0x4c/0xd0
>> [112522.698194]  __netif_receive_skb_one_core+0x79/0x90
>> [112522.698215]  netif_receive_skb_internal+0x2a/0xa0
>> [112522.698237]  napi_gro_receive+0xe7/0x140
>> [112522.698255]  xennet_poll+0x9be/0xae0
>> [112522.698271]  net_rx_action+0x136/0x340
>> [112522.698288]  __do_softirq+0xdd/0x2cf
>> [112522.698304]  irq_exit+0x7a/0xa0
>> [112522.698321]  xen_evtchn_do_upcall+0x27/0x40
>> [112522.698340]  xen_hvm_callback_vector+0xf/0x20
>> [112522.698359]  </IRQ>
>> [112522.698373] RIP: 0010:native_safe_halt+0xe/0x10
>> [112522.698392] Code: 48 8b 04 25 c0 6b 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c4 eb 80 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 54 fb 41 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 44 fb 41 00 f4 c3 90 90 41 55 41 54
>> [112522.699522] RSP: 0018:ffffffff82a03e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
>> [112522.699552] RAX: 0001a54800000000 RBX: 0000000000000000 RCX: 0000000000000001
>> [112522.699580] RDX: 0000000002b9f9b6 RSI: 0000000000000087 RDI: 0000000000000000
>> [112522.699608] RBP: 0000000000000000 R08: 000000001eb5c3cb R09: ffffffff82a08460
>> [112522.699634] R10: 000000000002e46e R11: 0000000000000000 R12: 0000000000000000
>> [112522.699662] R13: 0000000000000000 R14: ffffffff8326e0a0 R15: 0000000000000000
>> [112522.699692]  default_idle+0x17/0x140
>> [112522.699709]  do_idle+0x1ee/0x210
>> [112522.699726]  cpu_startup_entry+0x14/0x20
>> [112522.699743]  start_kernel+0x4e9/0x50b
>> [112522.699760]  secondary_startup_64+0xa4/0xb0
>> [112522.699780] Modules linked in:
>> [112522.699829] ---[ end trace 3b8db3603485e952 ]---
>> [112522.699850] RIP: 0010:skb_shift+0x63/0x430
>> [112522.699866] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 d4
>> [112522.699938] RSP: 0018:ffffc900000039b0 EFLAGS: 00010286
>> [112522.699959] RAX: 00000000000005a0 RBX: ffff8880117fb800 RCX: fffe8880117da6c0
>> [112522.699986] RDX: 00000000000005a0 RSI: ffff8880117fb800 RDI: ffff88800ae58000
>> [112522.700013] RBP: ffffc900000039e8 R08: 000000000004cfe0 R09: 00000000000005a0
>> [112522.700041] R10: 00000000000005a0 R11: ffff8880117fb800 R12: 0000000000000000
>> [112522.700067] R13: 00000000c95a98c2 R14: 0000000000000000 R15: ffff88800ae58000
>> [112522.700111] FS:  0000000000000000(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>> [112522.700140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [112522.700165] CR2: 00007f9210d8e078 CR3: 000000000b660000 CR4: 00000000000006f0
>> [112522.700201] Kernel panic - not syncing: Fatal exception in interrupt
>> [112522.702992] Kernel Offset: disabled
>>
>>

And another one (on an rc5 kernel now), but I probably know the answer :( ...
Are there any extra debugging options I can turn on that could shed some more light on this?
(It doesn't seem to be a one-off incident: it happens after about 3 to 4 days of uptime, so bisecting is not a realistic option.)
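
Editorial sketch of the "one bit" observation from earlier in the thread, as stand-alone C. It assumes the intended pointers were in the ffff888... range, like the other kernel pointers in the register dumps; the helper name single_bit_diff is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the index of the only differing bit between two values,
 * or -1 if they differ in zero bits or in more than one bit.
 */
static int single_bit_diff(uint64_t seen, uint64_t expected)
{
	uint64_t diff = seen ^ expected;
	int bit = 0;

	/* diff & (diff - 1) is zero only when at most one bit is set. */
	if (diff == 0 || (diff & (diff - 1)) != 0)
		return -1;

	while ((diff & 1) == 0) {
		diff >>= 1;
		bit++;
	}
	assert(bit < 64);
	return bit;
}
```

Under that assumption, the faulting values fffe888005bf62c0, fffe8880117da6c0 and fffe8880115152c0 each differ from their ffff888... counterpart in bit 48 only.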

--
Sander



[327011.399239] general protection fault: 0000 [#1] SMP NOPTI
[327011.399271] CPU: 0 PID: 19440 Comm: rsync Not tainted 5.3.0-rc5-20190820-doflr+ #1
[327011.399309] RIP: 0010:skb_shift+0x63/0x430
[327011.399326] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 d4
[327011.399407] RSP: 0000:ffffc900000039b0 EFLAGS: 00010286
[327011.399427] RAX: 00000000000005a0 RBX: ffff8880115fb800 RCX: fffe8880115152c0
[327011.399456] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff88801b0f2200
[327011.399485] RBP: ffffc900000039e8 R08: 0000000000070260 R09: 00000000000005a0
[327011.399515] R10: 00000000000005a0 R11: ffff8880115fb800 R12: 0000000000000000
[327011.399544] R13: 000000000ca599b0 R14: 0000000000000000 R15: ffff88801b0f2200
[327011.399581] FS:  00007f20c1b8bb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
[327011.399610] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[327011.399633] CR2: 00007ffac4c245cc CR3: 0000000005544000 CR4: 00000000000006f0
[327011.399663] Call Trace:
[327011.399675]  <IRQ>
[327011.399688]  tcp_sacktag_walk+0x2af/0x480
[327011.399705]  ? __alloc_skb+0x52/0x1d0
[327011.399721]  tcp_sacktag_write_queue+0x34d/0x820
[327011.399743]  ? xennet_tx_setup_grant+0xbd/0x140
[327011.399762]  tcp_ack+0xb8c/0x1230
[327011.399778]  ? gnttab_foreach_grant_in_range+0x83/0xf0
[327011.399799]  ? xennet_tx_setup_grant+0x140/0x140
[327011.399819]  ? xennet_make_txreqs+0x81/0xb0
[327011.399838]  tcp_rcv_established+0x14c/0x630
[327011.399860]  tcp_v4_do_rcv+0x129/0x1d0
[327011.399876]  tcp_v4_rcv+0xac9/0xcb0
[327011.399892]  ip_protocol_deliver_rcu+0x27/0x1b0
[327011.399912]  ip_local_deliver_finish+0x3f/0x50
[327011.399932]  ip_local_deliver+0x4d/0xe0
[327011.399957]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[327011.399977]  ip_rcv+0x4c/0xd0
[327011.400312]  __netif_receive_skb_one_core+0x79/0x90
[327011.400347]  netif_receive_skb_internal+0x2a/0xa0
[327011.400367]  napi_gro_receive+0xe7/0x140
[327011.400384]  xennet_poll+0x9be/0xae0
[327011.400401]  net_rx_action+0x136/0x340
[327011.400419]  __do_softirq+0xdd/0x2cf
[327011.400437]  irq_exit+0x7a/0xa0
[327011.400454]  xen_evtchn_do_upcall+0x27/0x40
[327011.400474]  xen_hvm_callback_vector+0xf/0x20
[327011.400501]  </IRQ>
[327011.400514] RIP: 0033:0x5620499a31aa
[327011.400532] Code: 43 4c 0f b6 34 0e 8b 8b 80 00 00 00 d3 e7 89 f9 48 8b 7b 60 31 f1 23 4b 7c 48 8b 73 68 83 fd 07 89 4b 70 48 8d 0c 4e 0f b7 31 <66> 42 89 34 47 66 89 11 0f 84 88 01 00 00 8b 8b 90 00 00 00 8b 83
[327011.400606] RSP: 002b:00007ffe82874060 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff0c
[327011.400638] RAX: 0000000000001f7c RBX: 000056204a861190 RCX: 000056204a8cd986
[327011.400671] RDX: 000000000000e084 RSI: 0000000000000000 RDI: 000056204a8bae50
[327011.400703] RBP: 0000000000000000 R08: 0000000000006084 R09: 0000000000003fdf
[327011.400735] R10: 0000000000000000 R11: 0000000000000001 R12: 00005620499b9500
[327011.400769] R13: 000000000000206a R14: 00000000ffffffff R15: 000056204a861190
[327011.400802] Modules linked in:
[327011.400884] ---[ end trace 660afb6bf8586996 ]---
[327011.400910] RIP: 0010:skb_shift+0x63/0x430
[327011.400929] Code: bc 00 00 00 48 03 8f c0 00 00 00 f6 41 03 08 74 07 48 83 79 28 00 75 d0 8b 8e bc 00 00 00 48 03 8e c0 00 00 00 48 85 f6 74 0a <f6> 41 03 08 0f 85 09 03 00 00 49 89 fd 8b bf bc 00 00 00 41 89 d4
[327011.401012] RSP: 0000:ffffc900000039b0 EFLAGS: 00010286
[327011.401042] RAX: 00000000000005a0 RBX: ffff8880115fb800 RCX: fffe8880115152c0
[327011.401081] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff88801b0f2200
[327011.401116] RBP: ffffc900000039e8 R08: 0000000000070260 R09: 00000000000005a0
[327011.401157] R10: 00000000000005a0 R11: ffff8880115fb800 R12: 0000000000000000
[327011.401198] R13: 000000000ca599b0 R14: 0000000000000000 R15: ffff88801b0f2200
[327011.401244] FS:  00007f20c1b8bb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
[327011.401293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[327011.404822] CR2: 00007ffac4c245cc CR3: 0000000005544000 CR4: 00000000000006f0
[327011.404866] Kernel panic - not syncing: Fatal exception in interrupt
[327011.405103] Kernel Offset: disabled

^ permalink raw reply

* Re: [RFC 4/4] net: cdc_ncm: Add ACPI MAC address pass through functionality
From: Bjørn Mork @ 2019-08-24 10:43 UTC (permalink / raw)
  To: Charles.Hyde
  Cc: linux-usb, linux-acpi, gregkh, Mario.Limonciello, oliver, netdev,
	nic_swsd
In-Reply-To: <ec7435e0529243a99f6949ee9efbede5@AUSX13MPS303.AMER.DELL.COM>

<Charles.Hyde@dellteam.com> writes:

> @@ -930,11 +931,18 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_
>  	usb_set_intfdata(ctx->control, dev);
>  
>  	if (ctx->ether_desc) {
> +		struct sockaddr sa;
> +
>  		temp = usbnet_get_ethernet_addr(dev, ctx->ether_desc->iMACAddress);
>  		if (temp) {
>  			dev_dbg(&intf->dev, "failed to get mac address\n");
>  			goto error2;
>  		}
> +		if (get_acpi_mac_passthru(&intf->dev, &sa) == 0) {
> +			memcpy(dev->net->dev_addr, sa.sa_data, ETH_ALEN);
> +			if (usbnet_set_ethernet_addr(dev) < 0)
> +				usbnet_get_ethernet_addr(dev, ctx->ether_desc->iMACAddress);
> +		}
>  		dev_info(&intf->dev, "MAC-Address: %pM\n", dev->net->dev_addr);
>  	}

So you want to run a Dell specific ACPI method every time anyone plugs
some NCM class device into a host supporting ACPI?  That's going to annoy
the hell out of 99.9997% of the x86, ia64 and arm64 users of this
driver.

Call ACPI once when the driver loads, and only if running on an actual
Dell system where this method is supported.  There must be some ACPI
device ID you can match on to know if this method is supported or not?


Bjørn

^ permalink raw reply

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Will Deacon @ 2019-08-24 11:25 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nick Desaulniers, Andrew Morton, Sedat Dilek, Josh Poimboeuf,
	Yonghong Song, clang-built-linux, Catalin Marinas,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrey Konovalov, Greg Kroah-Hartman, Enrico Weigelt,
	Suzuki K Poulose, Thomas Gleixner, Masayoshi Mizuma,
	Shaokun Zhang, Alexios Zavras, Allison Randal, Linux ARM,
	linux-kernel, Network Development, bpf
In-Reply-To: <CANiq72nfn4zxAO63GEEoUjumC6Jwi5_jdcD_5Xzt1vZRgh52fg@mail.gmail.com>

On Fri, Aug 23, 2019 at 09:35:08PM +0200, Miguel Ojeda wrote:
> On Thu, Aug 15, 2019 at 11:12 AM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> >
> > Btw, I guess that is the Oops you were mentioning in the cover letter?
> 
> Pinging about this...

Which bit are you pinging about? This patch (12/16) has been in -next for a
while and is queued in the arm64 tree for 5.4. The Oops/boot issue is
addressed in patch 14 which probably needs to be sent as a separate patch
(with a commit message) if it's targeting 5.3 and, I assume, routed via
somebody like akpm.

Will

^ permalink raw reply

* Fw: [Bug 204681] New: Kernel BUG/Oops:  tc qdisc delete with tc filter action xt -j CONNMARK
From: Stephen Hemminger @ 2019-08-24 11:57 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Fri, 23 Aug 2019 23:44:26 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 204681] New: Kernel BUG/Oops:  tc qdisc delete with tc filter action xt -j CONNMARK


https://bugzilla.kernel.org/show_bug.cgi?id=204681

            Bug ID: 204681
           Summary: Kernel BUG/Oops:  tc qdisc delete with tc filter
                    action xt -j CONNMARK
           Product: Networking
           Version: 2.5
    Kernel Version: 5.2.6 (x86_64) / 4.19.62 (mips32)/ 4.14.133 (armv7l)
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: itugrok@yahoo.com
        Regression: No

Created attachment 284581
  --> https://bugzilla.kernel.org/attachment.cgi?id=284581&action=edit  
BUG/Oops: kernel 5.2.6/x86_64

Overview:
=========

Several uses of "tc filter .. action xt" work as expected and also allow final
qdisc/filter deletion: e.g. xt_DSCP and xt_CLASSIFY.

However, trying to delete a qdisc/filter using xt_CONNMARK results in a kernel
oops or hang/crash on all platforms and kernel versions tested.


Steps to Reproduce:
===================

# tc qdisc add dev lo clsact
# tc filter add dev lo egress protocol ip matchall action xt -j CONNMARK --save-mark
# tc qdisc del dev lo clsact
<Kernel Oops>


Systems Tested:
===============

Ubuntu 18.04 LTS (mainline kernel 5.2.6/x86_64, iptables 1.6.1, iproute2 4.15)
(Kernel build: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.2.6/)

Ubuntu 18.04 LTS (distro kernel 4.15/x86_64, iptables 1.6.1, iproute2 4.15)

OpenWrt master, r10666-fc5d46dc62 (kernel 4.19.62, mips32_be, iptables 1.8.3,
iproute2 5.0.0)

OpenWrt 19.07, r10324-8bf8de95a2 (kernel 4.14.133, armv7l, iptables 1.8.3,
iproute2 5.0.0)


Kernel Logs:
============

The clearest call traces are from the Ubuntu systems, and are attached.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Fw: [Bug 204669] New: TLS: module crash with TLS record double free
From: Stephen Hemminger @ 2019-08-24 12:03 UTC (permalink / raw)
  To: netdev

This may have already been fixed, but forwarding to check.

Begin forwarded message:

Date: Thu, 22 Aug 2019 12:34:10 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 204669] New: TLS: module crash with TLS record double free


https://bugzilla.kernel.org/show_bug.cgi?id=204669

            Bug ID: 204669
           Summary: TLS: module crash with TLS record double free
           Product: Networking
           Version: 2.5
    Kernel Version: v5.3-rc4
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: IPV4
          Assignee: stephen@networkplumber.org
          Reporter: mallesh537@gmail.com
        Regression: No

Created attachment 284565
  --> https://bugzilla.kernel.org/attachment.cgi?id=284565&action=edit  
tls crash log

TLS module is crashing While running SSL record encryption using
Klts_send_[file] 

Preconditions:
1) Kernel 5.3-rc4 is installed.
2) A Nitrox5 card is plugged in.


Steps to reproduce the issue:
1) Install n5pf.ko (drivers/crypto/cavium/nitrox).
2) Install tls.ko if it is not installed by default (net/tls).
3) Take the uperf tool from git.
   3.1) Modify uperf to use the tls module via setsockopt.
   3.2) Modify uperf to support sendfile with SSL.


Test:
1) Run uperf with 4 threads.
2) Each thread sends the data using sendfile over the SSL protocol.
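
The report does not include the modified uperf code. As a rough sketch of
what steps 3.1, 3.2 and the test describe, the snippet below attaches the
kernel "tls" ULP to a TCP socket with setsockopt(), installs a TLS 1.2
AES-GCM-128 transmit key, and then pushes a file with sendfile(). The helper
names (fill_tls12_gcm128, enable_ktls_tx, send_file_over_ktls) and the choice
of cipher are assumptions made for illustration; they are not taken from
uperf or from the reporter's patch.

```c
#include <assert.h>
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallbacks for older libc headers that lack these constants. */
#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

/* Hypothetical helper: fill the TX crypto state for TLS 1.2 AES-GCM-128.
 * In a real program the key material comes from a userspace TLS handshake. */
void fill_tls12_gcm128(struct tls12_crypto_info_aes_gcm_128 *ci,
                       const unsigned char *key, const unsigned char *iv,
                       const unsigned char *salt, const unsigned char *seq)
{
    assert(ci && key && iv && salt && seq);
    memset(ci, 0, sizeof(*ci));
    ci->info.version = TLS_1_2_VERSION;
    ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci->key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(ci->iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(ci->salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci->rec_seq, seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
}

/* Hypothetical helper: attach the "tls" ULP to a connected TCP socket and
 * install the transmit key. Returns 0 on success, -1 on error (errno set). */
int enable_ktls_tx(int fd, const struct tls12_crypto_info_aes_gcm_128 *ci)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;
    if (setsockopt(fd, SOL_TLS, TLS_TX, ci, sizeof(*ci)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: send up to count bytes of file_fd, starting at
 * offset 0, over the kTLS socket. The kernel encrypts the records. */
ssize_t send_file_over_ktls(int sock_fd, int file_fd, size_t count)
{
    off_t offset = 0;
    return sendfile(sock_fd, file_fd, &offset, count);
}
```

With several threads each calling send_file_over_ktls() on their own
connection, the transmit path would go through the sendfile/splice chain
seen in the call trace of this report.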


After a few seconds the kernel crashes because of record list corruption.


[  270.888952] ------------[ cut here ]------------
[  270.890450] list_del corruption, ffff91cc3753a800->prev is LIST_POISON2 (dead000000000122)
[  270.891194] WARNING: CPU: 1 PID: 7387 at lib/list_debug.c:50 __list_del_entry_valid+0x62/0x90
[  270.892037] Modules linked in: n5pf(OE) netconsole tls(OE) bonding
intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd mei_me cryptd
glue_helper ipmi_si sg mei lpc_ich pcspkr joydev ioatdma i2c_i801 ipmi_devintf
ipmi_msghandler wmi ip_tables xfs libcrc32c sd_mod mgag200 drm_vram_helper ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm isci libsas
ahci scsi_transport_sas libahci crc32c_intel serio_raw igb libata ptp pps_core
dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
nitrox_drv]
[  270.896836] CPU: 1 PID: 7387 Comm: uperf Kdump: loaded Tainted: G           OE     5.3.0-rc4 #1
[  270.897711] Hardware name: Supermicro SYS-1027R-N3RF/X9DRW, BIOS 3.0c 03/24/2014
[  270.898597] RIP: 0010:__list_del_entry_valid+0x62/0x90
[  270.899478] Code: 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 e0 f9 ee 8d e8 b2
cf c8 ff 0f 0b 31 c0 c3 48 89 fe 48 c7 c7 18 fa ee 8d e8 9e cf c8 ff <0f> 0b 31
c0 c3 48 89 f2 48 89 fe 48 c7 c7 50 fa ee 8d e8 87 cf c8
[  270.901321] RSP: 0018:ffffb6ea86eb7c20 EFLAGS: 00010282
[  270.902240] RAX: 0000000000000000 RBX: ffff91cc3753c000 RCX: 0000000000000000
[  270.903157] RDX: ffff91bc3f867080 RSI: ffff91bc3f857738 RDI: ffff91bc3f857738
[  270.904074] RBP: ffff91bc36020940 R08: 0000000000000560 R09: 0000000000000000
[  270.904988] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  270.905902] R13: ffff91cc3753a800 R14: ffff91cc37cc6400 R15: ffff91cc3753a800
[  270.906809] FS:  00007f454a88d700(0000) GS:ffff91bc3f840000(0000) knlGS:0000000000000000
[  270.907715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.908606] CR2: 00007f453c00292c CR3: 000000103554e003 CR4: 00000000001606e0
[  270.909490] Call Trace:
[  270.910373]  tls_tx_records+0x138/0x1c0 [tls]
[  270.911262]  tls_sw_sendpage+0x3e0/0x420 [tls]
[  270.912154]  inet_sendpage+0x52/0x90
[  270.913045]  ? direct_splice_actor+0x40/0x40
[  270.913941]  kernel_sendpage+0x1a/0x30
[  270.914831]  sock_sendpage+0x20/0x30
[  270.915714]  pipe_to_sendpage+0x62/0x90
[  270.916592]  __splice_from_pipe+0x80/0x180
[  270.917461]  ? direct_splice_actor+0x40/0x40
[  270.918334]  splice_from_pipe+0x5d/0x90
[  270.919208]  direct_splice_actor+0x35/0x40
[  270.920086]  splice_direct_to_actor+0x103/0x230
[  270.920966]  ? generic_pipe_buf_nosteal+0x10/0x10
[  270.921850]  do_splice_direct+0x9a/0xd0
[  270.922733]  do_sendfile+0x1c9/0x3d0
[  270.923612]  __x64_sys_sendfile64+0x5c/0xc0


(gdb) list *(tls_tx_records+0x138)
0x2d18 is in tls_tx_records (./include/linux/list.h:131).
126      * Note: list_empty() on entry does not return true after this, the entry is
127      * in an undefined state.
128      */
129     static inline void __list_del_entry(struct list_head *entry)
130     {
131             if (!__list_del_entry_valid(entry))
132                     return;
133     
134             __list_del(entry->prev, entry->next);
135     }
(gdb) 
(gdb) list *(tls_sw_sendpage+0x3e0)
0x48e0 is in tls_sw_sendpage (/home/mjatharkonda/5_3_rc4/tls/tls_sw.c:1211).
1206    
1207            if (num_async) {
1208                    /* Transmit if any encryptions have completed */
1209                    if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
1210                            cancel_delayed_work(&ctx->tx_work.work);
1211                            tls_tx_records(sk, flags);
1212                    }
1213            }
1214    sendpage_end:
1215            ret = sk_stream_error(sk, flags, ret);
(gdb) 
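
For readers unfamiliar with the warning above: list deletion in the kernel
overwrites the removed entry's pointers with poison values, and the debug
check refuses to unlink an entry that still carries them. The user-space
sketch below imitates that behaviour to show how deleting the same record
twice is caught. It is a simplified stand-in, not the kernel source; the
poison constants assume x86_64 (LIST_POISON2 matches the dead000000000122
value in the log), and list_del_checked is a name invented for this example.
It does not claim to identify the actual root cause in tls_tx_records().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Poison markers (x86_64 values assumed). They are only compared,
 * never dereferenced. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000122ULL)

/* Make head an empty circular list. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry just before head, i.e. at the tail of the list. */
void list_add_tail(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Simplified validity check: reject an entry that was already deleted
 * (poisoned pointers) or whose neighbours no longer point back at it. */
bool list_del_entry_valid(const struct list_head *entry)
{
    if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
        return false;
    return entry->prev->next == entry && entry->next->prev == entry;
}

/* Hypothetical helper: unlink entry and poison it.
 * Returns false, leaving the list untouched, if the check fails. */
bool list_del_checked(struct list_head *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
    return true;
}
```

Calling list_del_checked() twice on the same entry succeeds the first time
and is rejected the second time, which is the class of bug the "TLS record
double free" in the subject line refers to.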



Attached complete crash log

-- 
You are receiving this mail because:
You are the assignee for the bug.


* Re: [Bridge] [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Stephen Hemminger @ 2019-08-24 12:05 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <1566500850-6247-1-git-send-email-horatiu.vultur@microchip.com>

On Thu, 22 Aug 2019 21:07:27 +0200
Horatiu Vultur <horatiu.vultur@microchip.com> wrote:

> The current implementation of the SW bridge sets the interfaces in
> promisc mode when they are added to the bridge if learning of the frames is
> enabled.
> In the case of Ocelot, which has HW capabilities to switch frames, it is not
> necessary to set the ports in promisc mode because the HW is already capable
> of doing that. Therefore add the NETIF_F_HW_BRIDGE feature to indicate that
> the HW has bridge capabilities, so the SW bridge doesn't need to set
> the ports in promisc mode to do the switching.
> This optimization takes place only if all the interfaces that are part
> of the bridge have this flag and have the same network driver.
> 
> If the bridge interface is added in promisc mode, then the ports that are
> part of the bridge are also set in promisc mode.
> 
> Horatiu Vultur (3):
>   net: Add HW_BRIDGE offload feature
>   net: mscc: Use NETIF_F_HW_BRIDGE
>   net: mscc: Implement promisc mode.
> 
>  drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
>  include/linux/netdev_features.h    |  3 +++
>  net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
>  net/core/ethtool.c                 |  1 +
>  4 files changed, 56 insertions(+), 3 deletions(-)
> 

IMHO there are already enough nerd knobs in the bridge to support this.
If your hardware can't do real bridging, it is only doing MACVLAN, so that
is what you should support instead.

