Netdev List
* Re: [PATCH] mlx4_ib: Increase the timeout for CM cache
From: Håkon Bugge @ 2019-02-06  8:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: netdev, OFED mailing list, rds-devel, linux-kernel,
	Jack Morgenstein
In-Reply-To: <20190205223608.GA23110@ziepe.ca>



> On 5 Feb 2019, at 23:36, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Thu, Jan 31, 2019 at 06:09:51PM +0100, Håkon Bugge wrote:
>> Using CX-3 virtual functions, either from a bare-metal machine or
>> pass-through from a VM, MAD packets are proxied through the PF driver.
>> 
>> Since the VMs have separate name spaces for MAD Transaction Ids
>> (TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
>> in a cache.
>> 
>> Following the RDMA CM protocol, it is clear when an entry has to be
>> evicted from the cache. But life is not perfect: remote peers may die
>> or be rebooted. Hence, there is a timeout to wipe out a cache entry
>> when the PF driver assumes the remote peer has gone.
>> 
>> We have experienced an excessive number of DREQ retries during
>> fail-over testing, when running with eight VMs per database server.
>> 
>> The problem has been reproduced in a bare-metal system using one VM
>> per physical node. In this environment, running 256 processes in each
>> VM, each process uses RDMA CM to create an RC QP between itself and
>> all (256) remote processes. All in all 16K QPs.
>> 
>> When tearing down these 16K QPs, excessive DREQ retries (and
>> duplicates) are observed. With some cat/paste/awk wizardry on the
>> infiniband_cm sysfs, we observe:
>> 
>> cm_rx_duplicates:
>>      dreq:       5007
>> cm_rx_msgs:
>>      drep:       3838
>>      dreq:      13018
>>       rep:       8128
>>       req:       8256
>>       rtu:       8256
>> cm_tx_msgs:
>>      drep:       8011
>>      dreq:      68856
>>       rep:       8256
>>       req:       8128
>>       rtu:       8128
>> cm_tx_retries:
>>      dreq:      60483
>> 
>> Note that the active/passive side is distributed.
>> 
>> Enabling pr_debug in cm.c gives tons of:
>> 
>> [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave:
>> 1,sl_cm_id: 0xd393089f} is NULL!
>> 
>> By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
>> tear-down phase of the application is reduced from 113 to 67
>> seconds. Retries/duplicates are also significantly reduced:
>> 
>> cm_rx_duplicates:
>>      dreq:       7726
>> []
>> cm_tx_retries:
>>      drep:          1
>>      dreq:       7779
>> 
>> Increasing the timeout further didn't help, as these duplicates and
>> retries stem from a CMA timeout that is too short, which was 20
>> (~4 seconds) on the systems. By increasing the CMA timeout to 22
>> (~17 seconds), both numbers dropped to about one hundred.
>> 
>> Adjustment of the CMA timeout is _not_ part of this commit.
>> 
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>> ---
>> drivers/infiniband/hw/mlx4/cm.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Jack? What do you think?

I am tempted to send a v2 making this a sysctl tuneable. This is because full-rack testing using 8 servers, each with 8 VMs, showed only a 33% reduction in the occurrences of "mlx4_ib_multiplex_cm_handler: id{slave:1,sl_cm_id: 0xd393089f} is NULL" with this commit.

But sure, Jack's opinion matters.


Thxs, Håkon

> 
>> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
>> index fedaf8260105..8c79a480f2b7 100644
>> --- a/drivers/infiniband/hw/mlx4/cm.c
>> +++ b/drivers/infiniband/hw/mlx4/cm.c
>> @@ -39,7 +39,7 @@
>> 
>> #include "mlx4_ib.h"
>> 
>> -#define CM_CLEANUP_CACHE_TIMEOUT  (5 * HZ)
>> +#define CM_CLEANUP_CACHE_TIMEOUT  (30 * HZ)
>> 
>> struct id_map_entry {
>> 	struct rb_node node;
>> -- 
>> 2.20.1



* [PATCH v2] rhashtable: make walk safe from softirq context
From: Johannes Berg @ 2019-02-06  9:07 UTC (permalink / raw)
  To: linux-wireless, netdev
  Cc: Jouni Malinen, Thomas Graf, Herbert Xu, Johannes Berg

From: Johannes Berg <johannes.berg@intel.com>

When an rhashtable walk is done from softirq context, we rightfully
get a lockdep complaint saying that we could get a softirq in the
middle of a rehash, and thus deadlock on &ht->lock. This happened
e.g. in mac80211 as it does a walk in softirq context.

Fix this by using spin_lock_bh() wherever we use the &ht->lock.

Initially, I thought it would be sufficient to do this only in the
rehash (rhashtable_rehash_table), but I changed my mind:
 * the caller doesn't really need to disable softirqs across all
   of the rhashtable_walk_* functions, only those parts that they
   actually do within the lock need it
 * maybe more importantly, it would still lead to massive lockdep
   complaints - false positives, but hard to fix - because lockdep
   wouldn't know about different ht->lock instances, and thus one
   user of the code doing a walk w/o any locking (when it only ever
   uses process context this is fine) vs. another user like in wifi
   where we noticed this problem would still cause it to complain.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 lib/rhashtable.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 852ffa5160f1..30d14f8d9985 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -327,10 +327,10 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
 	/* Publish the new table pointer. */
 	rcu_assign_pointer(ht->tbl, new_tbl);
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	list_for_each_entry(walker, &old_tbl->walkers, list)
 		walker->tbl = NULL;
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 
 	/* Wait for readers. All new readers will see the new
 	 * table, and thus no references to the old table will
@@ -670,11 +670,11 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
 	iter->skip = 0;
 	iter->end_of_table = 0;
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	iter->walker.tbl =
 		rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
 	list_add(&iter->walker.list, &iter->walker.tbl->walkers);
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
 
@@ -686,10 +686,10 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
  */
 void rhashtable_walk_exit(struct rhashtable_iter *iter)
 {
-	spin_lock(&iter->ht->lock);
+	spin_lock_bh(&iter->ht->lock);
 	if (iter->walker.tbl)
 		list_del(&iter->walker.list);
-	spin_unlock(&iter->ht->lock);
+	spin_unlock_bh(&iter->ht->lock);
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_exit);
 
@@ -719,10 +719,10 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
 
 	rcu_read_lock();
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	if (iter->walker.tbl)
 		list_del(&iter->walker.list);
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 
 	if (iter->end_of_table)
 		return 0;
@@ -938,12 +938,12 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
 
 	ht = iter->ht;
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	if (tbl->rehash < tbl->size)
 		list_add(&iter->walker.list, &tbl->walkers);
 	else
 		iter->walker.tbl = NULL;
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 
 out:
 	rcu_read_unlock();
-- 
2.17.2



* Re: [PATCH iproute2-next] tc: full JSON support for 'bpf' actions
From: Davide Caratti @ 2019-02-06  9:14 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger; +Cc: netdev
In-Reply-To: <fa55aa0f-b40f-fec3-2626-4a23e73fd1aa@gmail.com>

On Tue, 2019-02-05 at 14:59 -0800, David Ahern wrote:
> On 2/5/19 2:53 PM, Stephen Hemminger wrote:
> > On Thu, 31 Jan 2019 18:58:09 +0100
> > Davide Caratti <dcaratti@redhat.com> wrote:
> > 
> > > +		print_uint(PRINT_ANY, "code", "%hu ", ops[i].code);
> > > +		print_uint(PRINT_ANY, "jt", "%hhu ", ops[i].jt);
> > > +		print_uint(PRINT_ANY, "jf", "%hhu ", ops[i].jf);
> > 
> > Did you know that print_uint promotes the argument to unsigned int

this happens frequently in iproute2 code,

> > then you are printing it with %hhu which expects only a u8.

but indeed, %hu and %hhu made the former cast just useless: thanks for
noticing.

> I did look at the print_hhu option and it seems really weird that you
> use "print_hhu(..., "%hhu", ...)" which is why I took the patch as is.
> There are existing examples of print_uint with '%hu' too.
> The print_ functions really should be renamed (print_uchar,
> print_ushort, etc).

maybe this can be done more reliably with an automatic tool, like
coccinelle. There are only 5 lines with the print_uint(...
"%hu" ...) pattern, so this can be made uniform easily with a small patch.

-- 
davide 



* Re: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue settings
From: Michal Kubecek @ 2019-02-06  9:18 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Kirsher, linville, Nicholas Nunley, nhorman, sassmann
In-Reply-To: <20190206000106.24364-3-jeffrey.t.kirsher@intel.com>

On Tue, Feb 05, 2019 at 04:01:03PM -0800, Jeff Kirsher wrote:
> Introduce a new ioctl for setting per-queue parameters.
> Users can apply commands to specific queues by setting SUB_COMMAND and
> queue_mask with the following ethtool command:
> 
>  ethtool --set-perqueue-command DEVNAME [queue_mask %x] SUB_COMMAND

The "set" part is IMHO a bit confusing in combination with "show" type
subcommands.

> +static int set_queue_mask(u32 *queue_mask, char *str)
> +{
> +	int len = strlen(str);
> +	int index = __KERNEL_DIV_ROUND_UP(len * 4, 32);
> +	char tmp[9];
> +	char *end = str + len;
> +	int i, num;
> +	__u32 mask;
> +	int n_queues = 0;
> +
> +	if (len > MAX_NUM_QUEUE)
> +		return -EINVAL;
> +
> +	for (i = 0; i < index; i++) {
> +		num = end - str;
> +
> +		if (num >= 8) {
> +			end -= 8;
> +			num = 8;
> +		} else {
> +			end = str;
> +		}
> +		strncpy(tmp, end, num);
> +		tmp[num] = '\0';
> +
> +		queue_mask[i] = strtoul(tmp, NULL, 16);
> +
> +		mask = queue_mask[i];
> +		while (mask > 0) {
> +			if (mask & 0x1)
> +				n_queues++;
> +			mask = mask >> 1;
> +		}
> +	}
> +
> +	return n_queues;
> +}

Could you allow optional prefix "0x" as we do for link modes? Maybe
parse_hex_u32_bitmap() could be used for both.

> +static int do_perqueue(struct cmd_context *ctx)
> +{
> +	__u32 queue_mask[__KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32)] = {0};
> +	int i, n_queues = 0;
> +
> +	if (ctx->argc == 0)
> +		exit_bad_args();
> +
> +	/*
> +	 * The sub commands will be applied to
> +	 * all queues if no queue_mask set
> +	 */
> +	if (strncmp(*ctx->argp, "queue_mask", 10)) {
> +		n_queues = find_max_num_queues(ctx);
> +		if (n_queues < 0) {
> +			perror("Cannot get number of queues");
> +			return -EFAULT;
> +		}
> +		for (i = 0; i < n_queues / 32; i++)
> +			queue_mask[i] = ~0;
> +		queue_mask[i] = (1 << (n_queues - i * 32)) - 1;
> +		fprintf(stdout,
> +			"The sub commands will be applied to all %d queues\n",
> +			n_queues);
> +	} else {
> +		ctx->argc--;
> +		ctx->argp++;
> +		n_queues = set_queue_mask(queue_mask, *ctx->argp);
> +		if (n_queues < 0) {
> +			perror("Invalid queue mask");
> +			return n_queues;
> +		}
> +		ctx->argc--;
> +		ctx->argp++;
> +	}
> +
> +	i = find_option(ctx->argc, ctx->argp);
> +	if (i < 0)
> +		exit_bad_args();
> +
> +	/* no sub_command support yet */
> +
> +	return 0;
> +}

Technically the return value should be -EOPNOTSUPP here but as the next
patch fixes that, it doesn't really matter.

Michal Kubecek


* Re: [PATCH] net: stmmac: Variable "val" in function sun8i_dwmac_set_syscon() could be uninitialized
From: Maxime Ripard @ 2019-02-06  9:18 UTC (permalink / raw)
  To: Yizhuo
  Cc: csong, zhiyunq, Giuseppe Cavallaro, Alexandre Torgue,
	Chen-Yu Tsai, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <20190205221559.17545-1-yzhai003@ucr.edu>

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

Hi,

On Tue, Feb 05, 2019 at 02:15:59PM -0800, Yizhuo wrote:
> In function sun8i_dwmac_set_syscon(), local variable "val" could
> be uninitialized if function regmap_read() returns -EINVAL.
> However, it will be used directly in the if statement, which
> is potentially unsafe.
> 
> Signed-off-by: Yizhuo <yzhai003@ucr.edu>
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> index 39c2122a4f26..11d481c9e7ab 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> @@ -639,9 +639,14 @@ static int sun8i_dwmac_set_syscon(struct stmmac_priv *priv)
>  	struct sunxi_priv_data *gmac = priv->plat->bsp_priv;
>  	struct device_node *node = priv->device->of_node;
>  	int ret;
> -	u32 reg, val;
> +	u32 reg, val = 0;

I guess we don't need to initialize it anymore with the check you add?

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* [PATCH v3] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Pankaj Bansal @ 2019-02-06  9:40 UTC (permalink / raw)
  To: Shawn Guo, Leo Li, Andrew Lunn, Florian Fainelli
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Pankaj Bansal

The two external MDIO buses, used to communicate with PHY devices that are
external to the SoC, are muxed on the LX2160AQDS board.

These buses can be routed to any one of the eight IO slots on the LX2160AQDS
board, depending on the value in FPGA register 0x54.

Additionally, the external MDIO1 is used to communicate with the onboard
RGMII PHY devices.

MDIO1 is controlled by bits 4-7 of the FPGA register and MDIO2 is
controlled by bits 0-3 of the FPGA register.

Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
---

Notes:
    V3:
    - Add status = disabled in soc file and status = okay in board file
      for external MDIO nodes
    - Add interrupts property in external mdio nodes in soc file
    V2:
    - removed unnecessary TODO statements
    - removed device_type from mdio nodes
    - change the case of hex number to lowercase
    - removed board specific comments from soc file

 .../boot/dts/freescale/fsl-lx2160a-qds.dts   | 123 +++++++++++++++++
 .../boot/dts/freescale/fsl-lx2160a.dtsi      |  22 +++
 2 files changed, 145 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
index 99a22abbe725..079264b391a2 100644
--- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
@@ -35,6 +35,14 @@
 	status = "okay";
 };
 
+&emdio1 {
+	status = "okay";
+};
+
+&emdio2 {
+	status = "okay";
+};
+
 &esdhc0 {
 	status = "okay";
 };
@@ -46,6 +54,121 @@
 &i2c0 {
 	status = "okay";
 
+	fpga@66 {
+		compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
+		reg = <0x66>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		mdio-mux-1@54 {
+			mdio-parent-bus = <&emdio1>;
+			reg = <0x54>;		 /* BRDCFG4 */
+			mux-mask = <0xf8>;      /* EMI1_MDIO */
+			#address-cells=<1>;
+			#size-cells = <0>;
+
+			mdio@0 {
+				reg = <0x00>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@40 {
+				reg = <0x40>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@c0 {
+				reg = <0xc0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@c8 {
+				reg = <0xc8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@d0 {
+				reg = <0xd0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@d8 {
+				reg = <0xd8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@e0 {
+				reg = <0xe0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@e8 {
+				reg = <0xe8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@f0 {
+				reg = <0xf0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@f8 {
+				reg = <0xf8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+		};
+
+		mdio-mux-2@54 {
+			mdio-parent-bus = <&emdio2>;
+			reg = <0x54>;		 /* BRDCFG4 */
+			mux-mask = <0x07>;      /* EMI2_MDIO */
+			#address-cells=<1>;
+			#size-cells = <0>;
+
+			mdio@0 {
+				reg = <0x00>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@1 {
+				reg = <0x01>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@2 {
+				reg = <0x02>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@3 {
+				reg = <0x03>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@4 {
+				reg = <0x04>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@5 {
+				reg = <0x05>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@6 {
+				reg = <0x06>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@7 {
+				reg = <0x07>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+		};
+	};
+
 	i2c-mux@77 {
 		compatible = "nxp,pca9547";
 		reg = <0x77>;
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
index a79f5c1ea56d..7def5252ac1a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -762,5 +762,27 @@
 				     <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
 			dma-coherent;
 		};
+
+		/* WRIOP0: 0x8b8_0000, E-MDIO1: 0x1_6000 */
+		emdio1: mdio@8b96000 {
+			compatible = "fsl,fman-memac-mdio";
+			reg = <0x0 0x8b96000 0x0 0x1000>;
+			interrupts = <GIC_SPI 90 IRQ_TYPE_LEVEL_HIGH>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+			little-endian;	/* force the driver in LE mode */
+			status = "disabled";
+		};
+
+		/* WRIOP0: 0x8b8_0000, E-MDIO2: 0x1_7000 */
+		emdio2: mdio@8b97000 {
+			compatible = "fsl,fman-memac-mdio";
+			reg = <0x0 0x8b97000 0x0 0x1000>;
+			interrupts = <GIC_SPI 91 IRQ_TYPE_LEVEL_HIGH>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+			little-endian;	/* force the driver in LE mode */
+			status = "disabled";
+		};
 	};
 };
-- 
2.17.1



* Re: [PATCH bpf-next] tools: bpftool: doc, add text about feature-subcommand
From: Quentin Monnet @ 2019-02-06  9:42 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann; +Cc: netdev
In-Reply-To: <20190206014556.3548-1-bhole_prashant_q7@lab.ntt.co.jp>

2019-02-06 10:45 UTC+0900 ~ Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> This patch adds missing information about feature-subcommand in
> bpftool.rst
> 
> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>

My bad, thanks a lot for the update!

> ---
>   tools/bpf/bpftool/Documentation/bpftool.rst | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
> index 27153bb816ac..0685d5ada3ce 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool.rst
> @@ -16,7 +16,7 @@ SYNOPSIS
>   
>   	**bpftool** **version**
>   
> -	*OBJECT* := { **map** | **program** | **cgroup** | **perf** | **net** }
> +	*OBJECT* := { **map** | **program** | **cgroup** | **perf** | **net** | **feature** }
>   
>   	*OPTIONS* := { { **-V** | **--version** } | { **-h** | **--help** }
>   	| { **-j** | **--json** } [{ **-p** | **--pretty** }] }
> @@ -34,6 +34,8 @@ SYNOPSIS
>   
>   	*NET-COMMANDS* := { **show** | **list** | **help** }
>   
> +        *FEATURE-COMMANDS* := { **probe** | **help** }

Could you please fix this line to use a tab instead of spaces for 
indent, as for the rest of the file?

> +
>   DESCRIPTION
>   ===========
>   	*bpftool* allows for inspection and simple modification of BPF objects
> 



* Re: [PATCH bpf-next] tools: bpftool: doc, fix incorrect text
From: Quentin Monnet @ 2019-02-06  9:48 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann; +Cc: netdev
In-Reply-To: <20190206014723.1424-1-bhole_prashant_q7@lab.ntt.co.jp>

2019-02-06 10:47 UTC+0900 ~ Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> Documentation about cgroup, feature, prog uses wrong header
> 'MAP COMMANDS' while listing commands. This patch corrects the header
> in respective doc files.
> 
> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Thanks!


* Re: [PATCH v2 1/6] ethtool: move option parsing related code into function
From: Michal Kubecek @ 2019-02-06  9:56 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Kirsher, linville, Nicholas Nunley, nhorman, sassmann
In-Reply-To: <20190206000106.24364-1-jeffrey.t.kirsher@intel.com>

On Tue, Feb 05, 2019 at 04:01:01PM -0800, Jeff Kirsher wrote:
> From: Nicholas Nunley <nicholas.d.nunley@intel.com>
> 
> Move option parsing code into find_option function.
> 
> No behavior changes.
> 
> Based on patch by Kan Liang <kan.liang@intel.com>
> 
> Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  ethtool.c | 49 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 2f7e96b..8b7c224 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -5265,6 +5265,29 @@ static int show_usage(struct cmd_context *ctx)
>  	return 0;
>  }
>  
> +static int find_option(int argc, char **argp)
> +{
> +	const char *opt;
> +	size_t len;
> +	int k;
> +
> +	for (k = 0; args[k].opts; k++) {
> +		opt = args[k].opts;
> +		for (;;) {
> +			len = strcspn(opt, "|");
> +			if (strncmp(*argp, opt, len) == 0 &&
> +			    (*argp)[len] == 0)
> +				return k;
> +
> +			if (opt[len] == 0)
> +				break;
> +			opt += len + 1;
> +		}
> +	}
> +
> +	return -1;
> +}
> +
>  int main(int argc, char **argp)
>  {
>  	int (*func)(struct cmd_context *);
> @@ -5284,24 +5307,14 @@ int main(int argc, char **argp)
>  	 */
>  	if (argc == 0)
>  		exit_bad_args();
> -	for (k = 0; args[k].opts; k++) {
> -		const char *opt;
> -		size_t len;
> -		opt = args[k].opts;
> -		for (;;) {
> -			len = strcspn(opt, "|");
> -			if (strncmp(*argp, opt, len) == 0 &&
> -			    (*argp)[len] == 0) {
> -				argp++;
> -				argc--;
> -				func = args[k].func;
> -				want_device = args[k].want_device;
> -				goto opt_found;
> -			}
> -			if (opt[len] == 0)
> -				break;
> -			opt += len + 1;
> -		}
> +
> +	k = find_option(argc, argp);
> +	if (k >= 0) {
> +		argp++;
> +		argc--;
> +		func = args[k].func;
> +		want_device = args[k].want_device;
> +		goto opt_found;
>  	}
>  	if ((*argp)[0] == '-')
>  		exit_bad_args();

After hiding the loop into find_option() helper, you can put the
following few lines into else branch and get rid of the goto.

Michal Kubecek


* Re: [PATCH net-next v7 0/8] devlink: Add configuration parameters support for devlink_port
From: Vasundhara Volam @ 2019-02-06 10:13 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Jakub Kicinski, Netdev, David Miller, michael.chan@broadcom.com,
	Jiri Pirko
In-Reply-To: <20190205165133.GF21401@unicorn.suse.cz>

On Tue, Feb 5, 2019 at 10:21 PM Michal Kubecek <mkubecek@suse.cz> wrote:
>
> On Tue, Feb 05, 2019 at 09:53:26AM +0530, Vasundhara Volam wrote:
> > On Tue, Feb 5, 2019 at 8:26 AM Jakub Kicinski
> > >
> > > No?  We were talking about using the soon-to-come ethtool netlink
> > > API with additional indication that given configuration request is
> > > supposed to be persisted.  Adding more devlink parameters is exactly
> > > the opposite of what you should be doing.
> >
> > Okay. So, till then can we have the devlink wake_on_lan parameter or
> > you want this to be removed? Could you please clarify?
> >
> > Once ethtool netlink API is available with persisted support, I can remove
> > this wake_on_lan parameter from devlink. Thanks.
>
> Once you provide an interface for userspace and applications start using
> it, it's hard to get rid of it. As an extreme example, the legacy ioctl
> interface used by ifconfig has been declared obsolete since kernel 2.2.0
> (January 1999, i.e. 20 years ago) and we still have to maintain it.
>
Okay, got it. I will revert only the wake_on_lan parameter and send the patch.
We will wait for the soon-to-come ethtool netlink API.

Thank you.
> Michal Kubecek


* [PATCH 6/8] net: thunderx: add mutex to protect mailbox from concurrent calls for same VF
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

In some cases nicvf_send_msg_to_pf() could be called concurrently for the
same NIC VF, overwriting the mailbox contents and NICVF data and thereby
breaking the messaging sequence with the PF.

This commit adds a mutex for the NICVF to protect the mailbox registers
and the NICVF messaging control data from concurrent access.

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 drivers/net/ethernet/cavium/thunder/nic.h        |  2 ++
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 13 ++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 227343625e83..86cda3f4b37b 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -329,6 +329,8 @@ struct nicvf {
 	spinlock_t              rx_mode_wq_lock;
 	/* workqueue for handling kernel ndo_set_rx_mode() calls */
 	struct workqueue_struct *nicvf_rx_mode_wq;
+	/* mutex to protect VF's mailbox contents from concurrent access */
+	struct mutex            rx_mode_mtx;
 
 	/* PTP timestamp */
 	struct cavium_ptp	*ptp_clock;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 30c7f54b4f17..a05e2989ec76 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -124,6 +124,9 @@ int nicvf_send_msg_to_pf(struct nicvf *nic, union nic_mbx *mbx)
 {
 	int timeout = NIC_MBOX_MSG_TIMEOUT;
 	int sleep = 10;
+	int ret = 0;
+
+	mutex_lock(&nic->rx_mode_mtx);
 
 	nic->pf_acked = false;
 	nic->pf_nacked = false;
@@ -136,7 +139,8 @@ int nicvf_send_msg_to_pf(struct nicvf *nic, union nic_mbx *mbx)
 			netdev_err(nic->netdev,
 				   "PF NACK to mbox msg 0x%02x from VF%d\n",
 				   (mbx->msg.msg & 0xFF), nic->vf_id);
-			return -EINVAL;
+			ret = -EINVAL;
+			break;
 		}
 		msleep(sleep);
 		if (nic->pf_acked)
@@ -146,10 +150,12 @@ int nicvf_send_msg_to_pf(struct nicvf *nic, union nic_mbx *mbx)
 			netdev_err(nic->netdev,
 				   "PF didn't ACK to mbox msg 0x%02x from VF%d\n",
 				   (mbx->msg.msg & 0xFF), nic->vf_id);
-			return -EBUSY;
+			ret = -EBUSY;
+			break;
 		}
 	}
-	return 0;
+	mutex_unlock(&nic->rx_mode_mtx);
+	return ret;
 }
 
 /* Checks if VF is able to comminicate with PF
@@ -2211,6 +2217,7 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 							nic->vf_id);
 	INIT_WORK(&nic->rx_mode_work.work, nicvf_set_rx_mode_task);
 	spin_lock_init(&nic->rx_mode_wq_lock);
+	mutex_init(&nic->rx_mode_mtx);
 
 	err = register_netdev(netdev);
 	if (err) {
-- 
2.17.2


* [PATCH 7/8] net: thunderx: implement helpers to read mailbox IRQ status
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

This commit implements routines to read the mailbox IRQ status
for a particular VF on the PF side, and the mailbox IRQ status
from the PF on the VF side.

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 drivers/net/ethernet/cavium/thunder/nic_main.c     | 13 +++++++++++++
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 14 ++++++++++++++
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |  1 +
 3 files changed, 28 insertions(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 620dbe082ca0..a32c1bd75794 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -104,6 +104,19 @@ static u64 nic_reg_read(struct nicpf *nic, u64 offset)
 	return readq_relaxed(nic->reg_base + offset);
 }
 
+static int nic_is_mbox_intr_active(struct nicpf *nic, int vf_id)
+{
+	int ret = 0;
+
+	if (vf_id < NIC_VF_PER_MBX_REG) {
+		ret = nic_reg_read(nic, NIC_PF_MAILBOX_INT) & BIT_ULL(vf_id);
+	} else {
+		ret = nic_reg_read(nic, NIC_PF_MAILBOX_INT + sizeof(u64)) &
+			BIT_ULL(vf_id - NIC_VF_PER_MBX_REG);
+	}
+	return ret;
+}
+
 /* PF -> VF mailbox communication APIs */
 static void nic_enable_mbx_intr(struct nicpf *nic)
 {
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index 5b4d3badcb73..e7ee7005657c 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -1801,6 +1801,20 @@ void nicvf_clear_intr(struct nicvf *nic, int int_type, int q_idx)
 	nicvf_reg_write(nic, NIC_VF_INT, mask);
 }
 
+/* Check if interrupt is active */
+int nicvf_check_is_intr_active(struct nicvf *nic, int int_type, int q_idx)
+{
+	u64 mask = nicvf_int_type_to_mask(int_type, q_idx);
+
+	if (!mask) {
+		netdev_dbg(nic->netdev,
+			   "Failed to read interrupt status: unknown type\n");
+		return 0;
+	}
+
+	return (mask & nicvf_reg_read(nic, NIC_VF_INT));
+}
+
 /* Check if interrupt is enabled */
 int nicvf_is_intr_enabled(struct nicvf *nic, int int_type, int q_idx)
 {
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
index 5e9a03cf1b4d..58f6fbe48bce 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
@@ -358,6 +358,7 @@ void nicvf_enable_intr(struct nicvf *nic, int int_type, int q_idx);
 void nicvf_disable_intr(struct nicvf *nic, int int_type, int q_idx);
 void nicvf_clear_intr(struct nicvf *nic, int int_type, int q_idx);
 int nicvf_is_intr_enabled(struct nicvf *nic, int int_type, int q_idx);
+int nicvf_check_is_intr_active(struct nicvf *nic, int int_type, int q_idx);
 
 /* Register access APIs */
 void nicvf_reg_write(struct nicvf *nic, u64 offset, u64 val);
-- 
2.17.2


* [PATCH 8/8] net: thunderx: check status of mailbox IRQ before sending a message
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

To prevent mailbox data from being overwritten on the VF side, we need to
check whether there is an active mailbox IRQ from the PF, and proceed with
sending the message to the PF only if there is none. Holding a spinlock in
the IRQ handler and the message send routine won't help: by the time the
code flow reaches the IRQ handler and acquires the spinlock, the message
send routine could already have been invoked and thus have overwritten the
data in the mailbox.

The same is true for the PF while sending messages to a VF.

This commit implements a mailbox IRQ status check before sending a
message to a VF from the PF, and likewise before sending a message to
the PF from a VF.

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 .../net/ethernet/cavium/thunder/nic_main.c    | 39 ++++++++-----------
 .../net/ethernet/cavium/thunder/nicvf_main.c  |  3 ++
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index a32c1bd75794..e0041692caef 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -64,7 +64,6 @@ struct nicpf {
 	u32			*speed;
 	u16			cpi_base[MAX_NUM_VFS_SUPPORTED];
 	u16			rssi_base[MAX_NUM_VFS_SUPPORTED];
-	bool			mbx_lock[MAX_NUM_VFS_SUPPORTED];
 
 	/* MSI-X */
 	u8			num_vec;
@@ -954,8 +953,6 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 	int i;
 	int ret = 0;
 
-	nic->mbx_lock[vf] = true;
-
 	mbx_addr = nic_get_mbx_addr(vf);
 	mbx_data = (u64 *)&mbx;
 
@@ -975,7 +972,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 			nic->duplex[vf] = 0;
 			nic->speed[vf] = 0;
 		}
-		goto unlock;
+		return;
 	case NIC_MBOX_MSG_QS_CFG:
 		reg_addr = NIC_PF_QSET_0_127_CFG |
 			   (mbx.qs.num << NIC_QS_ID_SHIFT);
@@ -1044,7 +1041,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 		break;
 	case NIC_MBOX_MSG_RSS_SIZE:
 		nic_send_rss_size(nic, vf);
-		goto unlock;
+		return;
 	case NIC_MBOX_MSG_RSS_CFG:
 	case NIC_MBOX_MSG_RSS_CFG_CONT:
 		nic_config_rss(nic, &mbx.rss_cfg);
@@ -1062,19 +1059,19 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 		break;
 	case NIC_MBOX_MSG_ALLOC_SQS:
 		nic_alloc_sqs(nic, &mbx.sqs_alloc);
-		goto unlock;
+		return;
 	case NIC_MBOX_MSG_NICVF_PTR:
 		nic->nicvf[vf] = mbx.nicvf.nicvf;
 		break;
 	case NIC_MBOX_MSG_PNICVF_PTR:
 		nic_send_pnicvf(nic, vf);
-		goto unlock;
+		return;
 	case NIC_MBOX_MSG_SNICVF_PTR:
 		nic_send_snicvf(nic, &mbx.nicvf);
-		goto unlock;
+		return;
 	case NIC_MBOX_MSG_BGX_STATS:
 		nic_get_bgx_stats(nic, &mbx.bgx_stats);
-		goto unlock;
+		return;
 	case NIC_MBOX_MSG_LOOPBACK:
 		ret = nic_config_loopback(nic, &mbx.lbk);
 		break;
@@ -1083,7 +1080,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 		break;
 	case NIC_MBOX_MSG_PFC:
 		nic_pause_frame(nic, vf, &mbx.pfc);
-		goto unlock;
+		return;
 	case NIC_MBOX_MSG_PTP_CFG:
 		nic_config_timestamp(nic, vf, &mbx.ptp);
 		break;
@@ -1134,8 +1131,6 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 			mbx.msg.msg, vf);
 		nic_mbx_send_nack(nic, vf);
 	}
-unlock:
-	nic->mbx_lock[vf] = false;
 }
 
 static irqreturn_t nic_mbx_intr_handler(int irq, void *nic_irq)
@@ -1313,18 +1308,18 @@ static void nic_poll_for_link(struct work_struct *work)
 		if (nic->link[vf] == link.link_up)
 			continue;
 
-		if (!nic->mbx_lock[vf]) {
-			nic->link[vf] = link.link_up;
-			nic->duplex[vf] = link.duplex;
-			nic->speed[vf] = link.speed;
+		nic->link[vf] = link.link_up;
+		nic->duplex[vf] = link.duplex;
+		nic->speed[vf] = link.speed;
 
-			/* Send a mbox message to VF with current link status */
-			mbx.link_status.link_up = link.link_up;
-			mbx.link_status.duplex = link.duplex;
-			mbx.link_status.speed = link.speed;
-			mbx.link_status.mac_type = link.mac_type;
+		/* Send a mbox message to VF with current link status */
+		mbx.link_status.link_up = link.link_up;
+		mbx.link_status.duplex = link.duplex;
+		mbx.link_status.speed = link.speed;
+		mbx.link_status.mac_type = link.mac_type;
+
+		if (!nic_is_mbox_intr_active(nic, vf))
 			nic_send_msg_to_vf(nic, vf, &mbx);
-		}
 	}
 	queue_delayed_work(nic->check_link, &nic->dwork, HZ * 2);
 }
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index a05e2989ec76..66e19c207467 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -128,6 +128,9 @@ int nicvf_send_msg_to_pf(struct nicvf *nic, union nic_mbx *mbx)
 
 	mutex_lock(&nic->rx_mode_mtx);
 
+	while (nicvf_check_is_intr_active(nic, NICVF_INTR_MBOX, 0))
+		msleep(1);
+
 	nic->pf_acked = false;
 	nic->pf_nacked = false;
 
-- 
2.17.2

^ permalink raw reply related

* [PATCH 3/8] net: thunderx: make CFG_DONE message to run through generic send-ack sequence
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

At the end of NIC VF initialization, the driver sends the CFG_DONE message
to the PF without using the nicvf_send_msg_to_pf() routine. This could
potentially overwrite data in the mailbox. This commit sends the CFG_DONE
message the same way as the other configuration messages, through the
nicvf_send_msg_to_pf() routine.

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 drivers/net/ethernet/cavium/thunder/nic_main.c |  2 +-
 .../net/ethernet/cavium/thunder/nicvf_main.c   | 18 +++++++++++++++---
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 6c8dcb65ff03..90497a27df18 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -1039,7 +1039,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 	case NIC_MBOX_MSG_CFG_DONE:
 		/* Last message of VF config msg sequence */
 		nic_enable_vf(nic, vf, true);
-		goto unlock;
+		break;
 	case NIC_MBOX_MSG_SHUTDOWN:
 		/* First msg in VF teardown sequence */
 		if (vf >= nic->num_vf_en)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index abf24e7dff2d..b0e8a04e0f1e 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -169,6 +169,20 @@ static int nicvf_check_pf_ready(struct nicvf *nic)
 	return 1;
 }
 
+static int nicvf_send_cfg_done(struct nicvf *nic)
+{
+	union nic_mbx mbx = {};
+
+	mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
+	if (nicvf_send_msg_to_pf(nic, &mbx)) {
+		netdev_err(nic->netdev,
+			   "PF didn't respond to CFG DONE msg\n");
+		return 0;
+	}
+
+	return 1;
+}
+
 static void nicvf_read_bgx_stats(struct nicvf *nic, struct bgx_stats_msg *bgx)
 {
 	if (bgx->rx)
@@ -1416,7 +1430,6 @@ int nicvf_open(struct net_device *netdev)
 	struct nicvf *nic = netdev_priv(netdev);
 	struct queue_set *qs = nic->qs;
 	struct nicvf_cq_poll *cq_poll = NULL;
-	union nic_mbx mbx = {};
 
 	/* wait till all queued set_rx_mode tasks completes if any */
 	drain_workqueue(nic->nicvf_rx_mode_wq);
@@ -1515,8 +1528,7 @@ int nicvf_open(struct net_device *netdev)
 		nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx);
 
 	/* Send VF config done msg to PF */
-	mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
-	nicvf_write_to_mbx(nic, &mbx);
+	nicvf_send_cfg_done(nic);
 
 	return 0;
 cleanup:
-- 
2.17.2

^ permalink raw reply related

* [PATCH 0/8] nic: thunderx: fix communication races between VF & PF
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev

The ThunderX CN88XX NIC Virtual Function driver uses a mailbox interface
to communicate with the physical function driver. Each VF has its own pair
of mailbox registers to read from and write to. The mailbox registers
have no protection against possible races, so it has to be implemented
in software.

After long-term testing with a loop of 'ip link set <ifname> up/down'
commands, two scenarios were found in which a race condition appears:
 1. The VF receives a link change message from the PF while, at the same
time, the VF sends an RX mode configuration message to the PF from a
separate thread.
 2. The PF receives an RX mode configuration message from a VF while, at
the same time, the PF detects a link status change in a separate thread
and sends the corresponding message to that VF.

In both cases the mailbox data gets overwritten, the NIC VF messaging
control data is updated incorrectly and the communication sequence breaks.

This patch series addresses these race conditions in VF & PF communication.

Vadim Lomovtsev (8):
  net: thunderx: correct typo in macro name
  net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs
    to private for each of them.
  net: thunderx: make CFG_DONE message to run through generic send-ack
    sequence
  net: thunderx: add nicvf_send_msg_to_pf result check for
    set_rx_mode_task
  net: thunderx: rework xcast message structure to make it fit into 64
    bit
  net: thunderx: add mutex to protect mailbox from concurrent calls for
    same VF
  net: thunderx: implement helpers to read mailbox IRQ status
  net: thunderx: check status of mailbox IRQ before sending a message

 drivers/net/ethernet/cavium/thunder/nic.h     | 12 +--
 .../net/ethernet/cavium/thunder/nic_main.c    | 58 +++++++------
 .../net/ethernet/cavium/thunder/nicvf_main.c  | 82 +++++++++++++------
 .../ethernet/cavium/thunder/nicvf_queues.c    | 14 ++++
 .../ethernet/cavium/thunder/nicvf_queues.h    |  1 +
 .../net/ethernet/cavium/thunder/thunder_bgx.c |  2 +-
 .../net/ethernet/cavium/thunder/thunder_bgx.h |  2 +-
 7 files changed, 112 insertions(+), 59 deletions(-)

-- 
2.17.2

^ permalink raw reply

* [PATCH 4/8] net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_task
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

The set_rx_mode task sends a number of messages to the PF for receive
mode configuration. If any of them fails, we need to stop sending
messages and release the allocated memory.

This commit implements a check of the nicvf_send_msg_to_pf() result.
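The stop-on-first-failure pattern with a single cleanup label can be
sketched in userspace C as follows. All names here (struct addr_list,
send_fn, addr_list_new(), reject_zero_mac(), send_addr_list()) are
hypothetical stand-ins for illustration; the sender callback plays the
role of nicvf_send_msg_to_pf() and is assumed to return a negative value
on failure.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-allocated address list, freed by the consumer */
struct addr_list {
	size_t   count;
	uint64_t mc[];
};

/* Hypothetical sender callback: negative return value means failure */
typedef int (*send_fn)(uint64_t mac);

struct addr_list *addr_list_new(const uint64_t *macs, size_t count)
{
	struct addr_list *l = malloc(sizeof(*l) + count * sizeof(l->mc[0]));

	if (!l)
		return NULL;
	l->count = count;
	memcpy(l->mc, macs, count * sizeof(l->mc[0]));
	return l;
}

/* Example sender that fails for an all-zero address */
int reject_zero_mac(uint64_t mac)
{
	return mac ? 0 : -1;
}

/* Send every address, stop at the first failure, always free the list.
 * Returns the number of addresses sent successfully.
 */
size_t send_addr_list(struct addr_list *addrs, send_fn send)
{
	size_t sent = 0;

	if (addrs) {
		for (size_t i = 0; i < addrs->count; i++) {
			if (send(addrs->mc[i]) < 0)
				goto free_list;
			sent++;
		}
	}
free_list:
	free(addrs); /* free(NULL) is a no-op, so one exit path suffices */
	return sent;
}
```

Moving the free to one label means the list is released exactly once on
both the success path and every failure path.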

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index b0e8a04e0f1e..dbd8862d60d6 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1956,7 +1956,8 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 
 	/* flush DMAC filters and reset RX mode */
 	mbx.xcast.msg = NIC_MBOX_MSG_RESET_XCAST;
-	nicvf_send_msg_to_pf(nic, &mbx);
+	if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
+		goto free_mc;
 
 	if (mode & BGX_XCAST_MCAST_FILTER) {
 		/* once enabling filtering, we need to signal to PF to add
@@ -1964,7 +1965,8 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 		 */
 		mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
 		mbx.xcast.data.mac = 0;
-		nicvf_send_msg_to_pf(nic, &mbx);
+		if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
+			goto free_mc;
 	}
 
 	/* check if we have any specific MACs to be added to PF DMAC filter */
@@ -1973,9 +1975,9 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 		for (idx = 0; idx < mc_addrs->count; idx++) {
 			mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
 			mbx.xcast.data.mac = mc_addrs->mc[idx];
-			nicvf_send_msg_to_pf(nic, &mbx);
+			if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
+				goto free_mc;
 		}
-		kfree(mc_addrs);
 	}
 
 	/* and finally set rx mode for PF accordingly */
@@ -1983,6 +1985,8 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 	mbx.xcast.data.mode = mode;
 
 	nicvf_send_msg_to_pf(nic, &mbx);
+free_mc:
+	kfree(mc_addrs);
 }
 
 static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
-- 
2.17.2

^ permalink raw reply related

* [PATCH 5/8] net: thunderx: rework xcast message structure to make it fit into 64 bit
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

To communicate with the PF, each ThunderX NIC VF uses a mailbox, which is
a pair of 64-bit registers available to both the VF and the PF.

This commit changes the xcast message structure so that it fits into
64 bits.
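The size difference between the two layouts can be checked with a small
userspace sketch. It assumes GCC or Clang on a 64-bit Linux target, where
a 64-bit bit-field is accepted and packed after the two u8 members;
bit-field layout is implementation-defined, so other compilers may give
different sizes. The struct and function names are made up for this
example, only the member layout follows the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: the u64 in the union forces 8-byte alignment */
struct xcast_old {
	uint8_t msg;
	union {
		uint8_t  mode;
		uint64_t mac;
	} data;
};

/* Layout after the patch: msg, mode and a 48-bit MAC share one u64 */
struct xcast_new {
	uint8_t  msg;
	uint8_t  mode;
	uint64_t mac:48;
};

size_t xcast_old_size(void)
{
	return sizeof(struct xcast_old);
}

size_t xcast_new_size(void)
{
	return sizeof(struct xcast_new);
}

/* Store a value in the 48-bit field and read it back */
uint64_t xcast_mac_roundtrip(uint64_t mac)
{
	struct xcast_new x = { 0 };

	x.mac = mac; /* bits above bit 47 are dropped */
	return x.mac;
}
```

Under these assumptions the old layout takes 16 bytes and the new one 8,
and a MAC address still fits because it is only 48 bits wide.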

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 drivers/net/ethernet/cavium/thunder/nic.h        | 6 ++----
 drivers/net/ethernet/cavium/thunder/nic_main.c   | 4 ++--
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 6 +++---
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 376a96bce33f..227343625e83 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -577,10 +577,8 @@ struct set_ptp {
 
 struct xcast {
 	u8    msg;
-	union {
-		u8    mode;
-		u64   mac;
-	} data;
+	u8    mode;
+	u64   mac:48;
 };
 
 /* 128 bit shared memory between PF and each VF */
diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 90497a27df18..620dbe082ca0 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -1094,7 +1094,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 		bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
 		lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
 		bgx_set_dmac_cam_filter(nic->node, bgx, lmac,
-					mbx.xcast.data.mac,
+					mbx.xcast.mac,
 					vf < NIC_VF_PER_MBX_REG ? vf :
 					vf - NIC_VF_PER_MBX_REG);
 		break;
@@ -1106,7 +1106,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
 		}
 		bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
 		lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
-		bgx_set_xcast_mode(nic->node, bgx, lmac, mbx.xcast.data.mode);
+		bgx_set_xcast_mode(nic->node, bgx, lmac, mbx.xcast.mode);
 		break;
 	default:
 		dev_err(&nic->pdev->dev,
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index dbd8862d60d6..30c7f54b4f17 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1964,7 +1964,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 		 * its' own LMAC to the filter to accept packets for it.
 		 */
 		mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
-		mbx.xcast.data.mac = 0;
+		mbx.xcast.mac = 0;
 		if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
 			goto free_mc;
 	}
@@ -1974,7 +1974,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 		/* now go through kernel list of MACs and add them one by one */
 		for (idx = 0; idx < mc_addrs->count; idx++) {
 			mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
-			mbx.xcast.data.mac = mc_addrs->mc[idx];
+			mbx.xcast.mac = mc_addrs->mc[idx];
 			if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
 				goto free_mc;
 		}
@@ -1982,7 +1982,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 
 	/* and finally set rx mode for PF accordingly */
 	mbx.xcast.msg = NIC_MBOX_MSG_SET_XCAST;
-	mbx.xcast.data.mode = mode;
+	mbx.xcast.mode = mode;
 
 	nicvf_send_msg_to_pf(nic, &mbx);
 free_mc:
-- 
2.17.2

^ permalink raw reply related

* [PATCH 2/8] net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them.
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

Sharing one work queue for ndo_set_rx_mode() receive mode configuration
across all VFs makes each VF wait until the set_rx_mode() call completes
for another VF whenever a close, set receive mode or change flags call
has already been invoked. This could allow the device state to change
before the corresponding receive mode configuration call completes, which
could make the call meaningless, corrupt data or break the configuration
sequence.

We don't need any delay in the NIC VF configuration sequence, so a delayed
work item with a delay of 0 makes no sense.

This commit implements one work queue per NIC VF for the set_rx_mode task
so that the VFs work independently, and replaces delayed_work with
work_struct.

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 drivers/net/ethernet/cavium/thunder/nic.h     |  4 ++-
 .../net/ethernet/cavium/thunder/nicvf_main.c  | 30 ++++++++++---------
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index f4d81765221e..376a96bce33f 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -271,7 +271,7 @@ struct xcast_addr_list {
 };
 
 struct nicvf_work {
-	struct delayed_work    work;
+	struct work_struct     work;
 	u8                     mode;
 	struct xcast_addr_list *mc;
 };
@@ -327,6 +327,8 @@ struct nicvf {
 	struct nicvf_work       rx_mode_work;
 	/* spinlock to protect workqueue arguments from concurrent access */
 	spinlock_t              rx_mode_wq_lock;
+	/* workqueue for handling kernel ndo_set_rx_mode() calls */
+	struct workqueue_struct *nicvf_rx_mode_wq;
 
 	/* PTP timestamp */
 	struct cavium_ptp	*ptp_clock;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 88f8a8fa93cd..abf24e7dff2d 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -68,9 +68,6 @@ module_param(cpi_alg, int, 0444);
 MODULE_PARM_DESC(cpi_alg,
 		 "PFC algorithm (0=none, 1=VLAN, 2=VLAN16, 3=IP Diffserv)");
 
-/* workqueue for handling kernel ndo_set_rx_mode() calls */
-static struct workqueue_struct *nicvf_rx_mode_wq;
-
 static inline u8 nicvf_netdev_qidx(struct nicvf *nic, u8 qidx)
 {
 	if (nic->sqs_mode)
@@ -1311,6 +1308,9 @@ int nicvf_stop(struct net_device *netdev)
 	struct nicvf_cq_poll *cq_poll = NULL;
 	union nic_mbx mbx = {};
 
+	/* wait till all queued set_rx_mode tasks completes */
+	drain_workqueue(nic->nicvf_rx_mode_wq);
+
 	mbx.msg.msg = NIC_MBOX_MSG_SHUTDOWN;
 	nicvf_send_msg_to_pf(nic, &mbx);
 
@@ -1418,6 +1418,9 @@ int nicvf_open(struct net_device *netdev)
 	struct nicvf_cq_poll *cq_poll = NULL;
 	union nic_mbx mbx = {};
 
+	/* wait till all queued set_rx_mode tasks completes if any */
+	drain_workqueue(nic->nicvf_rx_mode_wq);
+
 	netif_carrier_off(netdev);
 
 	err = nicvf_register_misc_interrupt(nic);
@@ -1973,7 +1976,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
 static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
 {
 	struct nicvf_work *vf_work = container_of(work_arg, struct nicvf_work,
-						  work.work);
+						  work);
 	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
 	u8 mode;
 	struct xcast_addr_list *mc;
@@ -2030,7 +2033,7 @@ static void nicvf_set_rx_mode(struct net_device *netdev)
 	kfree(nic->rx_mode_work.mc);
 	nic->rx_mode_work.mc = mc_list;
 	nic->rx_mode_work.mode = mode;
-	queue_delayed_work(nicvf_rx_mode_wq, &nic->rx_mode_work.work, 0);
+	queue_work(nic->nicvf_rx_mode_wq, &nic->rx_mode_work.work);
 	spin_unlock(&nic->rx_mode_wq_lock);
 }
 
@@ -2187,7 +2190,10 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	INIT_WORK(&nic->reset_task, nicvf_reset_task);
 
-	INIT_DELAYED_WORK(&nic->rx_mode_work.work, nicvf_set_rx_mode_task);
+	nic->nicvf_rx_mode_wq = alloc_ordered_workqueue("nicvf_rx_mode_wq_VF%d",
+							WQ_MEM_RECLAIM,
+							nic->vf_id);
+	INIT_WORK(&nic->rx_mode_work.work, nicvf_set_rx_mode_task);
 	spin_lock_init(&nic->rx_mode_wq_lock);
 
 	err = register_netdev(netdev);
@@ -2228,13 +2234,15 @@ static void nicvf_remove(struct pci_dev *pdev)
 	nic = netdev_priv(netdev);
 	pnetdev = nic->pnicvf->netdev;
 
-	cancel_delayed_work_sync(&nic->rx_mode_work.work);
-
 	/* Check if this Qset is assigned to different VF.
 	 * If yes, clean primary and all secondary Qsets.
 	 */
 	if (pnetdev && (pnetdev->reg_state == NETREG_REGISTERED))
 		unregister_netdev(pnetdev);
+	if (nic->nicvf_rx_mode_wq) {
+		destroy_workqueue(nic->nicvf_rx_mode_wq);
+		nic->nicvf_rx_mode_wq = NULL;
+	}
 	nicvf_unregister_interrupts(nic);
 	pci_set_drvdata(pdev, NULL);
 	if (nic->drv_stats)
@@ -2261,17 +2269,11 @@ static struct pci_driver nicvf_driver = {
 static int __init nicvf_init_module(void)
 {
 	pr_info("%s, ver %s\n", DRV_NAME, DRV_VERSION);
-	nicvf_rx_mode_wq = alloc_ordered_workqueue("nicvf_generic",
-						   WQ_MEM_RECLAIM);
 	return pci_register_driver(&nicvf_driver);
 }
 
 static void __exit nicvf_cleanup_module(void)
 {
-	if (nicvf_rx_mode_wq) {
-		destroy_workqueue(nicvf_rx_mode_wq);
-		nicvf_rx_mode_wq = NULL;
-	}
 	pci_unregister_driver(&nicvf_driver);
 }
 
-- 
2.17.2

^ permalink raw reply related

* [PATCH 1/8] net: thunderx: correct typo in macro name
From: Vadim Lomovtsev @ 2019-02-06 10:13 UTC (permalink / raw)
  To: sgoutham@cavium.com, rric@kernel.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: dnelson@redhat.com, Vadim Lomovtsev
In-Reply-To: <20190206101351.16744-1-vlomovtsev@marvell.com>

Correct STREERING to STEERING in the macro name for the BGX steering register.

Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
---
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 2 +-
 drivers/net/ethernet/cavium/thunder/thunder_bgx.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index e337da6ba2a4..673c57b8023f 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -1217,7 +1217,7 @@ static void bgx_init_hw(struct bgx *bgx)
 
 	/* Disable MAC steering (NCSI traffic) */
 	for (i = 0; i < RX_TRAFFIC_STEER_RULE_COUNT; i++)
-		bgx_reg_write(bgx, 0, BGX_CMR_RX_STREERING + (i * 8), 0x00);
+		bgx_reg_write(bgx, 0, BGX_CMR_RX_STEERING + (i * 8), 0x00);
 }
 
 static u8 bgx_get_lane2sds_cfg(struct bgx *bgx, struct lmac *lmac)
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
index cbdd20b9ee6f..5cbc54e9eb19 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
@@ -60,7 +60,7 @@
 #define  RX_DMACX_CAM_EN			BIT_ULL(48)
 #define  RX_DMACX_CAM_LMACID(x)			(((u64)x) << 49)
 #define  RX_DMAC_COUNT				32
-#define BGX_CMR_RX_STREERING		0x300
+#define BGX_CMR_RX_STEERING		0x300
 #define  RX_TRAFFIC_STEER_RULE_COUNT		8
 #define BGX_CMR_CHAN_MSK_AND		0x450
 #define BGX_CMR_BIST_STATUS		0x460
-- 
2.17.2

^ permalink raw reply related

* [PATCH] net: sxgbe: fix unintended sign extension
From: Colin King @ 2019-02-06 10:25 UTC (permalink / raw)
  To: Byungho An, Girish K S, Vipul Pandya, David S . Miller, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Shifting a u8 by 24 will cause the value to be promoted to an integer. If
the top bit of the u8 is set then the following conversion to an unsigned
long will sign extend the value causing the upper 32 bits to be set in
the result.

Fix this by casting the u8 value to an unsigned long before the shift.

Detected by CoverityScan, CID#1195586 ("Unintended sign extension")
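The effect described above can be reproduced with a short userspace
sketch. It is an illustration under assumptions, not the driver code:
uint64_t stands in for a 64-bit unsigned long, the function names are
made up, and the first function spells out the int promotion explicitly
(relying on the usual two's-complement conversion that GCC and Clang
implement) rather than shifting a promoted int into the sign bit.

```c
#include <stdint.h>

/* Models the original expression: the top byte lands in a signed 32-bit
 * int, which is then sign-extended when widened to 64 bits.
 */
uint64_t pack_sign_extended(const uint8_t addr[4])
{
	int32_t hi = (int32_t)((uint32_t)addr[3] << 24);
	int32_t word = hi | (addr[2] << 16) | (addr[1] << 8) | addr[0];

	return (uint64_t)(int64_t)word;
}

/* Models the fix: widen the top byte to 64 bits before shifting */
uint64_t pack_fixed(const uint8_t addr[4])
{
	return ((uint64_t)addr[3] << 24) | (addr[2] << 16) |
	       (addr[1] << 8) | addr[0];
}
```

For addr = { 0x01, 0x02, 0x03, 0x80 } the first function yields
0xFFFFFFFF80030201 and the second yields 0x80030201.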

Fixes: 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
index 6d22dd500790..51fe161e5da9 100644
--- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
+++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
@@ -1822,7 +1822,8 @@ static void sxgbe_set_umac_addr(void __iomem *ioaddr, unsigned char *addr,
 	 * is RO.
 	 */
 	writel(data | SXGBE_HI_REG_AE, ioaddr + SXGBE_ADDR_HIGH(reg_n));
-	data = (addr[3] << 24) | (addr[2] << 16) | (addr[1] << 8) | addr[0];
+	data = ((unsigned long)addr[3] << 24) | (addr[2] << 16) |
+		(addr[1] << 8) | addr[0];
 	writel(data, ioaddr + SXGBE_ADDR_LOW(reg_n));
 }
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue settings
From: Michal Kubecek @ 2019-02-06 10:32 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Kirsher, linville, Nicholas Nunley, nhorman, sassmann
In-Reply-To: <20190206000106.24364-3-jeffrey.t.kirsher@intel.com>

On Tue, Feb 05, 2019 at 04:01:03PM -0800, Jeff Kirsher wrote:
> +static int find_max_num_queues(struct cmd_context *ctx)
> +{
> +	struct ethtool_channels echannels;
> +
> +	echannels.cmd = ETHTOOL_GCHANNELS;
> +	if (send_ioctl(ctx, &echannels))
> +		return -1;
> +
> +	return MAX(MAX(echannels.rx_count, echannels.tx_count),
> +		   echannels.combined_count);
> +}

Is the outer MAX() correct here? From the documentation of the -L option,
it rather seems we might want

	return MAX(echannels.rx_count, echannels.tx_count) +
	       echannels.combined_count;

But I can't find any NIC around here that has a non-zero rx_count or
tx_count, so I cannot check.
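To make the difference between the two formulas concrete, here is a
small standalone sketch. struct channel_counts and both function names
are hypothetical; only the three count fields mirror those used from
struct ethtool_channels above.

```c
/* Hypothetical subset of the channel counts reported by the device */
struct channel_counts {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int combined_count;
};

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Formula from the patch: largest of the three counts */
unsigned int queues_max_of_all(const struct channel_counts *c)
{
	return max_uint(max_uint(c->rx_count, c->tx_count),
			c->combined_count);
}

/* Suggested alternative: separate channels come on top of combined ones */
unsigned int queues_separate_plus_combined(const struct channel_counts *c)
{
	return max_uint(c->rx_count, c->tx_count) + c->combined_count;
}
```

With rx_count = 2, tx_count = 4 and combined_count = 8 the first formula
gives 8 and the second gives 12. With rx_count = tx_count = 0 both give
the combined count, which is why a NIC reporting only combined channels
cannot tell them apart.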

Michal Kubecek

^ permalink raw reply

* Re: stmmac / meson8b-dwmac
From: Emiliano Ingrassia @ 2019-02-06 10:36 UTC (permalink / raw)
  To: Martin Blumenstingl, Simon Huelck
  Cc: Gpeppe.cavallaro, alexandre.torgue, linux-amlogic, netdev
In-Reply-To: <CAFBinCAkXN42piiYsc_Qmuqv0w4dbK86gq10AbZyrB-zKuh9hg@mail.gmail.com>

Hi Martin, Hi Simon,

On Mon, Feb 04, 2019 at 03:34:41PM +0100, Martin Blumenstingl wrote:
> On Thu, Jan 17, 2019 at 10:23 PM Simon Huelck <simonmail@gmx.de> wrote:
> [...]
> > >> I got problems with my ODROID c2 running on 4.19.16 ( and some releases
> > >> earlier ). the stmmac / dwmac driver doesnt provide the 800M/900M
> > >> performance that i was used to earlier.
> > >>

Simon, did you ever reach 1 Gbps full-duplex speed?
If so, which kernel version did you use?

> > >>
> > >> Now im stuck near 550M/600M in the same environment. but what really
> > >> confuses me that duplex does hurt even more.
> > > interesting that you see this on the Odroid-C2 as well.
> > > previously I have only observed it on an Odroid-C1
> > >
> > >> PC --- VLAN3 --> switch --VLAN3--> ODROID
> > >>
> > >> NAS <-- VLAN1 -- switch <-- VLAN1-- ODROID
> > >>
> > >>
> > >> this means when im doing a iperf from PC to NAS, that my ODROID has load
> > >> on RX/TX same time (duplex). this shouldnt be an issue , all is 1GBits
> > >> FD. And in the past that wasnt an issue.
> +Cc Emiliano who has seen a similar duplex issue on his Odroid-C1: [0]
> (please note that all kernels prior to v5.1 with the pending patches
> from [1] applied are only receiving data on RXD0 and RXD1 but not on
> RXD2 and RXD3)
>
> Emiliano, can you confirm the duplex issue observed by Simon is
> similar to the one you see on your Odroid-C1?
>

It could be, but if I understand correctly, Simon's speed is also limited
in half-duplex transmission (~550/600 Mbps), while we can reach at least
900 Mbps.

> > >>
> > >>
> > >> Now what happens:
> > >>
> > >> - benchmark between PC - ODROID is roughly 550M
> > >>
> > >> - benchmark between NAS - ODROID is roughly 550M
> > >>
> > >> - benchmark between PC - NAS is only around 300M
> > >>
> > >>
> > >> and like i said i was easliy able to hit 800 or even 900M to my NAS
> > >> earlier. I applied some .dtb fixes for interrupt levels for the
> > >> meson-gx.dtsi and meson-gxbb-odroid-c2.dtb, which will be mainlined ,
> > >> but the effect stayed identical.
> > > good that you have the interrupt patches already applied
> > > I believe it don't fix any performance issues - it's a fix for the
> > > Ethernet controller seemingly getting "stuck" (not processing data
> > > anymore). however, that already rules out one potential issue
> > >
> > >> are you aware of this problem ? Earlier kernel versions were all
> > >> perfectly fine and i stepped ( self compiled) kernel through all major
> > >> releases since odroid c2 was mainlined.
> Guiseppe, Alexandre: what kind of data do you need from us if we see
> the speeds drop (in both directions) when we send and receive at the
> same time?
>
> [...]
> > the problem is that i dont have these kernel sources anymore :-(. but i
> > can provide some testing and numbers. maybe i dig if i got these kernel
> > configs somewhere around but i did not change much during migrating
> do you remember the kernel version where it worked fine?
>
> > im using a zyxel gs1900-8 switch and a qnap ts231p , and as i said i
> > didnt change my setup. i was able to hit 100MByte/s from my NAS , so
> > close to the benchmarks of 900MBit/s
> I typically only do small transfers or I have traffic only in one direction.
> thus it's likely that I missed this in my own tests
>
>
> Regards
> Martin
>
>
> [0] http://lists.infradead.org/pipermail/linux-amlogic/2018-December/009679.html
> [1] https://patchwork.kernel.org/cover/10744905/

Regards,

Emiliano

^ permalink raw reply

* [PATCH] net: sfp: do not probe SFP module before we're attached
From: Russell King @ 2019-02-06 10:52 UTC (permalink / raw)
  To: netdev; +Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit, David S. Miller

When we probe an SFP module, we expect to be able to call the upstream
device's module_insert() function so that the upstream link can be
configured.  However, when the upstream device is delayed, we currently
may end up probing the module before the upstream device is available,
and lose the module_insert() call.

Avoid this by holding off probing the module until the SFP bus is
properly connected to both the SFP socket driver and the upstream
driver.
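The gating described above can be modelled in a few lines of userspace
C. This is a simplified sketch with hypothetical names (struct sfp_model,
sfp_model_*()); the real state machine in sfp.c has more states, timers
and locking. It only shows that an insert event is ignored until the
socket is attached, and that attaching replays the insert for a module
that is already present.

```c
#include <stdbool.h>

enum mod_state { MOD_EMPTY, MOD_PROBE };

/* Hypothetical, heavily reduced model of the SFP socket state */
struct sfp_model {
	bool attached;            /* bus connected to socket and upstream */
	bool present;             /* module physically inserted */
	enum mod_state mod_state;
};

/* Insert event: start probing only once the socket is attached */
void sfp_model_insert_event(struct sfp_model *s)
{
	if (s->mod_state == MOD_EMPTY && s->attached)
		s->mod_state = MOD_PROBE;
}

/* Attach: mark the socket attached and replay a pending insertion */
void sfp_model_attach(struct sfp_model *s)
{
	s->attached = true;
	if (s->present)
		sfp_model_insert_event(s);
}

/* Detach: treat it like a removal so a later attach probes again */
void sfp_model_detach(struct sfp_model *s)
{
	s->attached = false;
	s->mod_state = MOD_EMPTY;
}
```

In this model a module inserted before the upstream device shows up
stays in MOD_EMPTY and moves to MOD_PROBE only on attach, so the probe
cannot run ahead of the upstream device.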

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/sfp-bus.c |  2 ++
 drivers/net/phy/sfp.c     | 30 +++++++++++++++++++++---------
 drivers/net/phy/sfp.h     |  2 ++
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
index ad9db652874d..fef701bfad62 100644
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -347,6 +347,7 @@ static int sfp_register_bus(struct sfp_bus *bus)
 				return ret;
 		}
 	}
+	bus->socket_ops->attach(bus->sfp);
 	if (bus->started)
 		bus->socket_ops->start(bus->sfp);
 	bus->netdev->sfp_bus = bus;
@@ -362,6 +363,7 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
 	if (bus->registered) {
 		if (bus->started)
 			bus->socket_ops->stop(bus->sfp);
+		bus->socket_ops->detach(bus->sfp);
 		if (bus->phydev && ops && ops->disconnect_phy)
 			ops->disconnect_phy(bus->upstream);
 	}
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index fd8bb998ae52..68c8fbf099f8 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -184,6 +184,7 @@ struct sfp {
 
 	struct gpio_desc *gpio[GPIO_MAX];
 
+	bool attached;
 	unsigned int state;
 	struct delayed_work poll;
 	struct delayed_work timeout;
@@ -1475,7 +1476,7 @@ static void sfp_sm_event(struct sfp *sfp, unsigned int event)
 	 */
 	switch (sfp->sm_mod_state) {
 	default:
-		if (event == SFP_E_INSERT) {
+		if (event == SFP_E_INSERT && sfp->attached) {
 			sfp_module_tx_disable(sfp);
 			sfp_sm_ins_next(sfp, SFP_MOD_PROBE, T_PROBE_INIT);
 		}
@@ -1607,6 +1608,19 @@ static void sfp_sm_event(struct sfp *sfp, unsigned int event)
 	mutex_unlock(&sfp->sm_mutex);
 }
 
+static void sfp_attach(struct sfp *sfp)
+{
+	sfp->attached = true;
+	if (sfp->state & SFP_F_PRESENT)
+		sfp_sm_event(sfp, SFP_E_INSERT);
+}
+
+static void sfp_detach(struct sfp *sfp)
+{
+	sfp->attached = false;
+	sfp_sm_event(sfp, SFP_E_REMOVE);
+}
+
 static void sfp_start(struct sfp *sfp)
 {
 	sfp_sm_event(sfp, SFP_E_DEV_UP);
@@ -1667,6 +1681,8 @@ static int sfp_module_eeprom(struct sfp *sfp, struct ethtool_eeprom *ee,
 }
 
 static const struct sfp_socket_ops sfp_module_ops = {
+	.attach = sfp_attach,
+	.detach = sfp_detach,
 	.start = sfp_start,
 	.stop = sfp_stop,
 	.module_info = sfp_module_info,
@@ -1834,10 +1850,6 @@ static int sfp_probe(struct platform_device *pdev)
 	dev_info(sfp->dev, "Host maximum power %u.%uW\n",
 		 sfp->max_power_mW / 1000, (sfp->max_power_mW / 100) % 10);
 
-	sfp->sfp_bus = sfp_register_socket(sfp->dev, sfp, &sfp_module_ops);
-	if (!sfp->sfp_bus)
-		return -ENOMEM;
-
 	/* Get the initial state, and always signal TX disable,
 	 * since the network interface will not be up.
 	 */
@@ -1848,10 +1860,6 @@ static int sfp_probe(struct platform_device *pdev)
 		sfp->state |= SFP_F_RATE_SELECT;
 	sfp_set_state(sfp, sfp->state);
 	sfp_module_tx_disable(sfp);
-	rtnl_lock();
-	if (sfp->state & SFP_F_PRESENT)
-		sfp_sm_event(sfp, SFP_E_INSERT);
-	rtnl_unlock();
 
 	for (i = 0; i < GPIO_MAX; i++) {
 		if (gpio_flags[i] != GPIOD_IN || !sfp->gpio[i])
@@ -1884,6 +1892,10 @@ static int sfp_probe(struct platform_device *pdev)
 		dev_warn(sfp->dev,
 			 "No tx_disable pin: SFP modules will always be emitting.\n");
 
+	sfp->sfp_bus = sfp_register_socket(sfp->dev, sfp, &sfp_module_ops);
+	if (!sfp->sfp_bus)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/drivers/net/phy/sfp.h b/drivers/net/phy/sfp.h
index 31b0acf337e2..64f54b0bbd8c 100644
--- a/drivers/net/phy/sfp.h
+++ b/drivers/net/phy/sfp.h
@@ -7,6 +7,8 @@
 struct sfp;
 
 struct sfp_socket_ops {
+	void (*attach)(struct sfp *sfp);
+	void (*detach)(struct sfp *sfp);
 	void (*start)(struct sfp *sfp);
 	void (*stop)(struct sfp *sfp);
 	int (*module_info)(struct sfp *sfp, struct ethtool_modinfo *modinfo);
-- 
2.7.4
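
The gating described in the commit message above can be modelled outside
the kernel. The following is only a minimal userspace sketch, not the
driver code: every mock_* name is hypothetical, the state machine is
reduced to two states, and locking (sm_mutex, rtnl) is left out entirely.
It assumes the only behaviour of interest is that an insert event is
ignored until attach, and that attach replays the insert for a module
that is already present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's events and states. */
enum mock_event { MOCK_E_INSERT, MOCK_E_REMOVE };
enum mock_mod_state { MOCK_MOD_EMPTY, MOCK_MOD_PROBE };

struct mock_sfp {
	bool attached;              /* set once the upstream side is connected */
	bool present;               /* module physically present in the cage */
	enum mock_mod_state state;  /* reduced module state machine */
	unsigned int probes;        /* number of probes started so far */
};

/* An insert event starts a probe only while attached; otherwise it is ignored. */
static void mock_sm_event(struct mock_sfp *s, enum mock_event ev)
{
	assert(s != NULL);

	switch (ev) {
	case MOCK_E_INSERT:
		if (s->state == MOCK_MOD_EMPTY && s->attached) {
			s->state = MOCK_MOD_PROBE;
			s->probes++;
		}
		break;
	case MOCK_E_REMOVE:
		s->state = MOCK_MOD_EMPTY;
		break;
	}
}

/* On attach, replay the insert for a module that was already present. */
static void mock_attach(struct mock_sfp *s)
{
	s->attached = true;
	if (s->present)
		mock_sm_event(s, MOCK_E_INSERT);
}

/* On detach, drop back to the empty state so no probe outlives the upstream. */
static void mock_detach(struct mock_sfp *s)
{
	s->attached = false;
	mock_sm_event(s, MOCK_E_REMOVE);
}
```

In this model, an insert event delivered before mock_attach() leaves the
state at MOCK_MOD_EMPTY, and the probe starts only once mock_attach()
runs with the module present, which is the ordering the patch is after.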



* [PATCH] MAINTAINERS: add maintainer for SFF/SFP/SFP+ support
From: Russell King @ 2019-02-06 10:54 UTC (permalink / raw)
  To: netdev; +Cc: Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit

Add maintainer entry for SFF/SFP/SFP+ support.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 MAINTAINERS | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f3a5c97e3419..f10b60dcb0bb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13510,6 +13510,15 @@ L:	netdev@vger.kernel.org
 S:	Supported
 F:	drivers/net/ethernet/sfc/
 
+SFF/SFP/SFP+ MODULE SUPPORT
+M:	Russell King <linux@armlinux.org.uk>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/phy/phylink.c
+F:	drivers/net/phy/sfp*
+F:	include/linux/phylink.h
+F:	include/linux/sfp.h
+
 SGI GRU DRIVER
 M:	Dimitri Sivanich <sivanich@sgi.com>
 S:	Maintained
-- 
2.7.4



* Re: [PATCH iproute2-next] Introduce ip-brctl shell script
From: Stefano Brivio @ 2019-02-06 10:55 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern
  Cc: Roopa Prabhu, Nikolay Aleksandrov, Phil Sutter, Eric Garver,
	Tomas Dolezal, Lennert Buytenhek, netdev
In-Reply-To: <20190205145033.5d90dc42@hermes.lan>

On Tue, 5 Feb 2019 14:50:33 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> Providing brctl or ifconfig scripts is possible, but it really should
> be put in a sample or demo directory and not installed by default.
> It would just cause too much pain to distributions.

On one hand, I did it this way exactly to make it easy for
distributions (after all, I work for a distributor). The rationale is
that it's easier to remove things in package scripts than to add
them. Also, implementing a Debian package diversion looked more
elegant this way. I have no idea about other distributions, though.

On the other hand, I didn't even think of adding that to examples/.
Maybe that would alleviate David's concern about having to maintain it
forever (if it breaks temporarily, or if we need to remove it for some
reason, it's not a drama).

> I love concise human readable output and hate long winded VMS style
> commands.

That's exactly where I feel the current tools included in iproute2 fall
rather short. Compare "brctl show" to the closest equivalent "IFS='
'
for b in $(ip -br link show type bridge); do ip -br link show type
bridge_slave master ${b%% *}; done".

Sure, as Roopa said, we could and should improve 'bridge', and by now
I'm even almost convinced it's doable without breaking the existing
syntax, but it's not something we're doing overnight.

-- 
Stefano


