Netdev List
* Re: [PATCH net] flow_dissect: call init_default_flow_dissectors() earlier
From: Andre Noll @ 2016-11-22 19:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, Duyck, Alexander H, edumazet@google.com,
	linux-kernel@vger.kernel.org, ast@kernel.org, willemb@google.com,
	gregkh@linuxfoundation.org, jslaby@suse.cz, davem@davemloft.net,
	yibyang@cisco.com, netdev
In-Reply-To: <1479842250.8455.452.camel@edumazet-glaptop3.roam.corp.google.com>


On Tue, Nov 22, 11:17, Eric Dumazet wrote
> -late_initcall_sync(init_default_flow_dissectors);
> +core_initcall(init_default_flow_dissectors);

Indeed, that fixed it. Feel free to add

Tested-by: Andre Noll <maan@tuebingen.mpg.de>

Thanks a lot
Andre
-- 
Max Planck Institute for Developmental Biology
Spemannstraße 35, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829
http://people.tuebingen.mpg.de/maan/



* Re: [PATCH net] flow_dissect: call init_default_flow_dissectors() earlier
From: David Miller @ 2016-11-22 19:44 UTC (permalink / raw)
  To: eric.dumazet
  Cc: maan, jiri, alexander.h.duyck, edumazet, linux-kernel, ast,
	willemb, gregkh, jslaby, yibyang, netdev
In-Reply-To: <1479842250.8455.452.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Nov 2016 11:17:30 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Andre Noll reported panics after my recent fix (commit 34fad54c2537
> "net: __skb_flow_dissect() must cap its return value")
> 
> After some more headaches, Alexander root caused the problem to
> init_default_flow_dissectors() being called too late, in case
> a network driver like IGB is not a module and receives DHCP message
> very early.
> 
> Fix is to call init_default_flow_dissectors() much earlier,
> as it is a core infrastructure and does not depend on another
> kernel service.
> 
> Fixes: 06635a35d13d4 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Andre Noll <maan@tuebingen.mpg.de>
> Diagnosed-by: Alexander Duyck <alexander.h.duyck@intel.com>

Applied and queued up for -stable, I'll try to fast-track this.
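
The ordering problem described in the quoted commit message can be
modelled in user space. What follows is only a sketch, not kernel code:
the numeric ranks, the two stub functions and boot_with_dissector_at()
are assumptions made up for illustration. It shows that a built-in
driver probing at device level sees an uninitialised dissector when the
init runs late, and an initialised one when it runs at core level.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative ranks; only the relative order matters in this model. */
enum initcall_level {
	LEVEL_CORE = 1,		/* stands in for core_initcall() */
	LEVEL_DEVICE = 6,	/* stands in for a built-in driver's init */
	LEVEL_LATE_SYNC = 8,	/* stands in for late_initcall_sync() */
};

struct initcall {
	enum initcall_level level;
	void (*fn)(void);
};

static int dissector_ready;		/* set once the dissector init has run */
static int early_packet_ok = -1;	/* what the first received packet saw */

static void init_dissector(void)
{
	dissector_ready = 1;
}

/* Hypothetical built-in NIC that receives a packet as soon as it probes. */
static void probe_builtin_nic(void)
{
	early_packet_ok = dissector_ready;
}

static int by_level(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;

	return (int)x->level - (int)y->level;
}

/* Returns 1 if the first packet found the dissector initialised, else 0. */
static int boot_with_dissector_at(enum initcall_level level)
{
	struct initcall calls[] = {
		{ LEVEL_DEVICE, probe_builtin_nic },
		{ level, init_dissector },
	};
	size_t i, n = sizeof(calls) / sizeof(calls[0]);

	dissector_ready = 0;
	early_packet_ok = -1;
	qsort(calls, n, sizeof(calls[0]), by_level);
	for (i = 0; i < n; i++)
		calls[i].fn();
	return early_packet_ok;
}
```

Calling boot_with_dissector_at(LEVEL_LATE_SYNC) models the failing
configuration, and boot_with_dissector_at(LEVEL_CORE) models the fix.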


* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Vishwanathapura, Niranjana @ 2016-11-22 19:49 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma, netdev, Dennis Dalessandro
In-Reply-To: <20161122170407.GE3956@obsidianresearch.com>

Ok, I do understand Jason's point that we should probably not put this driver
under drivers/infiniband/sw/.., as this driver is not an HCA.
It is a ULP similar to ipoib, built on top of Omni-Path irrespective of
whether we register a hfi_vnic_bus or a direct custom interface with HFI1.
This ULP will transmit and receive Omni-Path packets over the fabric, and is
dependent on the IB MAD interface and the HFI1 driver.

Doug,
Will it be acceptable if we put it under 'drivers/infiniband/ulp/hfi_vnic'?

Niranjana


* Re: [PATCH net-next 4/5] net: phy: bcm7xxx: Add support for downshift/Wirespeed
From: Andrew Lunn @ 2016-11-22 20:02 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju, vivien.didelot
In-Reply-To: <20161122194058.29820-5-f.fainelli@gmail.com>

> +static int bcm7xxx_28nm_set_tunable(struct phy_device *phydev,
> +				    struct ethtool_tunable *tuna,
> +				    const void *data)
> +{
> +	u8 count = *(u8 *)data;
> +	int ret;
> +
> +	switch (tuna->id) {
> +	case ETHTOOL_PHY_DOWNSHIFT:
> +		ret = bcm_phy_downshift_set(phydev, count);
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (ret)
> +		return ret;
> +
> +	/* Disable EEE advertisement since this prevents the PHY
> +	 * from successfully linking up, trigger auto-negotiation restart
> +	 * to let the MAC decide what to do.
> +	 */
> +	ret = bcm_phy_set_eee(phydev, count == DOWNSHIFT_DEV_DISABLE);
> +	if (ret)
> +		return ret;
> +
> +	return genphy_restart_aneg(phydev);
> +}

Hi Florian

Is the locking O.K. here? The core code does not take the phy lock.
But I think your shadow register accesses at least need to be
protected by the lock?

Maybe we should think about this locking a bit. It is normal for the
lock to be held when using ops in the phy driver structure. The
exception is suspend/resume. Maybe we should also take the lock before
calling the phydev->drv->get_tunable() and phydev->drv->set_tunable()?

	  Andrew


* Re: [PATCH net-next 4/5] net: phy: bcm7xxx: Add support for downshift/Wirespeed
From: Florian Fainelli @ 2016-11-22 20:07 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju, vivien.didelot
In-Reply-To: <20161122200228.GG14947@lunn.ch>

On 11/22/2016 12:02 PM, Andrew Lunn wrote:
>> +static int bcm7xxx_28nm_set_tunable(struct phy_device *phydev,
>> +				    struct ethtool_tunable *tuna,
>> +				    const void *data)
>> +{
>> +	u8 count = *(u8 *)data;
>> +	int ret;
>> +
>> +	switch (tuna->id) {
>> +	case ETHTOOL_PHY_DOWNSHIFT:
>> +		ret = bcm_phy_downshift_set(phydev, count);
>> +		break;
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Disable EEE advertisement since this prevents the PHY
>> +	 * from successfully linking up, trigger auto-negotiation restart
>> +	 * to let the MAC decide what to do.
>> +	 */
>> +	ret = bcm_phy_set_eee(phydev, count == DOWNSHIFT_DEV_DISABLE);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return genphy_restart_aneg(phydev);
>> +}
> 
> Hi Florian
> 
> Is the locking O.K. here? The core code does not take the phy lock.
> But I think your shadow register accesses at least need to be
> protected by the lock?

There should be some kind of protection, but I was expecting it to be
done at the caller level, so that when {get,set}_tunable run, they are
serialized with respect to each other. Looking at the code, this is
clearly not the case.

> 
> Maybe we should think about this locking a bit. It is normal for the
> lock to be held when using ops in the phy driver structure. The
> exception is suspend/resume. Maybe we should also take the lock before
> calling the phydev->drv->get_tunable() and phydev->drv->set_tunable()?

Yes, that certainly seems like a good approach to me, let me cook a
patch doing that.
-- 
Florian
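
The caller-side locking discussed above can be sketched in user space
with a pthread mutex. This is a model under stated assumptions, not the
ethtool core: struct phy_dev_model, model_set_downshift() and
model_set_phy_tunable() are hypothetical names. The point is only that
the driver callback does no locking of its own and relies on the
wrapper holding the lock.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a PHY device with its per-device mutex. */
struct phy_dev_model {
	pthread_mutex_t lock;
	unsigned char downshift_count;
	int (*set_tunable)(struct phy_dev_model *phy, unsigned char count);
};

/* Driver callback: assumes the caller already holds phy->lock. */
static int model_set_downshift(struct phy_dev_model *phy, unsigned char count)
{
	phy->downshift_count = count;
	return 0;
}

/* Caller-side wrapper: the lock is taken around the driver op, so every
 * driver gets serialization without open-coding it. */
static int model_set_phy_tunable(struct phy_dev_model *phy, unsigned char count)
{
	int ret;

	if (!phy->set_tunable)
		return -EOPNOTSUPP;

	pthread_mutex_lock(&phy->lock);
	ret = phy->set_tunable(phy, count);
	pthread_mutex_unlock(&phy->lock);

	return ret;
}
```

With this shape, a driver that also takes the same lock inside its
callback would deadlock, which is why any driver-local locking has to
go once the lock is centralized.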


* [PATCH net-next] ethtool: Protect {get,set}_phy_tunable with PHY device mutex
From: Florian Fainelli @ 2016-11-22 20:13 UTC (permalink / raw)
  To: netdev
  Cc: davem, bcm-kernel-feedback-list, andrew, allan.nielsen,
	raju.lakkaraju, vivien.didelot, Florian Fainelli

PHY drivers should be able to rely on the caller of {get,set}_tunable to
have acquired the PHY device mutex, in order to serialize both against
concurrent calls of these functions and against PHY state machine
changes. All ethtool PHY-level functions do this, except
{get,set}_tunable, so we make them consistent here as well.

Fixes: 968ad9da7e0e ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/core/ethtool.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index e9b4556751ff..0adb3bec5b5a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2466,7 +2466,9 @@ static int get_phy_tunable(struct net_device *dev, void __user *useraddr)
 	data = kmalloc(tuna.len, GFP_USER);
 	if (!data)
 		return -ENOMEM;
+	mutex_lock(&phydev->lock);
 	ret = phydev->drv->get_tunable(phydev, &tuna, data);
+	mutex_unlock(&phydev->lock);
 	if (ret)
 		goto out;
 	useraddr += sizeof(tuna);
@@ -2501,7 +2503,9 @@ static int set_phy_tunable(struct net_device *dev, void __user *useraddr)
 	ret = -EFAULT;
 	if (copy_from_user(data, useraddr, tuna.len))
 		goto out;
+	mutex_lock(&phydev->lock);
 	ret = phydev->drv->set_tunable(phydev, &tuna, data);
+	mutex_unlock(&phydev->lock);
 
 out:
 	kfree(data);
-- 
2.9.3


* [PATCH ethtool v4 0/2] Adding downshift support to ethtool
From: Allan W. Nielsen @ 2016-11-22 20:32 UTC (permalink / raw)
  To: netdev; +Cc: andrew, f.fainelli, raju.lakkaraju, allan.nielsen

(downshift feature is applied in the net-next tree - d3c19c0a72)

This series adds support for downshift (using phy-tunables).

Downshifting can either be turned on/off, or it can be configured to a
specific count.

"count" is optional.

Change set:
v1:
- Initial version of set/get phy tunable with downshift feature.
v2:
- (ethtool) Syntax is changed from "--set-phy-tunable downshift on|off|%d"
  to "--set-phy-tunable [downshift on|off [count N]]" - as requested by
  Andrew.
v3:
- Fixed spelling in "ethtool-copy.h:sync with net"
- Fixed "if send_ioctl() returns an error, print the error message and then
  still print the value of count".
v4:
- Fixing spelling in the example included in the commit message
- Improve the description in the man-page

Raju Lakkaraju (2):
  ethtool-copy.h:sync with net
  Ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE and PHY
    downshift

 ethtool-copy.h |  18 +++++++-
 ethtool.8.in   |  40 ++++++++++++++++
 ethtool.c      | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 201 insertions(+), 1 deletion(-)

-- 
2.7.3


* [PATCH ethtool v4 1/2] ethtool-copy.h:sync with net
From: Allan W. Nielsen @ 2016-11-22 20:32 UTC (permalink / raw)
  To: netdev; +Cc: andrew, f.fainelli, raju.lakkaraju, allan.nielsen, Raju Lakkaraju
In-Reply-To: <1479846737-18669-1-git-send-email-allan.nielsen@microsemi.com>

From: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>

This covers kernel changes up to:

commit f5a4732f85613b3fb43f8bc33a017e3db3b3605a
Author: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Date:   Wed Nov 9 16:33:09 2016 +0530

    ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables

    For operation in cabling environments that are incompatible with
    1000BASE-T, PHY device may provide an automatic link speed downshift
    operation. When enabled, the device automatically changes its 1000BASE-T
    auto-negotiation to the next slower speed after a configured number of
    failed attempts at 1000BASE-T.  This feature is useful in setting up in
    networks using older cable installations that include only pairs A and B,
    and not pairs C and D.

    Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
    Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>

Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
---
 ethtool-copy.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 70748f5..2e2448f 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -247,6 +247,19 @@ struct ethtool_tunable {
 	void	*data[0];
 };
 
+#define DOWNSHIFT_DEV_DEFAULT_COUNT	0xff
+#define DOWNSHIFT_DEV_DISABLE		0
+
+enum phy_tunable_id {
+	ETHTOOL_PHY_ID_UNSPEC,
+	ETHTOOL_PHY_DOWNSHIFT,
+	/*
+	 * Add your fresh new phy tunable attribute above and remember to update
+	 * phy_tunable_strings[] in net/core/ethtool.c
+	 */
+	__ETHTOOL_PHY_TUNABLE_COUNT,
+};
+
 /**
  * struct ethtool_regs - hardware register dump
  * @cmd: Command number = %ETHTOOL_GREGS
@@ -547,6 +560,7 @@ struct ethtool_pauseparam {
  * @ETH_SS_FEATURES: Device feature names
  * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
  * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
+ * @ETH_SS_PHY_TUNABLES: PHY tunable names
  */
 enum ethtool_stringset {
 	ETH_SS_TEST		= 0,
@@ -557,6 +571,7 @@ enum ethtool_stringset {
 	ETH_SS_RSS_HASH_FUNCS,
 	ETH_SS_TUNABLES,
 	ETH_SS_PHY_STATS,
+	ETH_SS_PHY_TUNABLES,
 };
 
 /**
@@ -1312,7 +1327,8 @@ struct ethtool_per_queue_op {
 
 #define ETHTOOL_GLINKSETTINGS	0x0000004c /* Get ethtool_link_settings */
 #define ETHTOOL_SLINKSETTINGS	0x0000004d /* Set ethtool_link_settings */
-
+#define ETHTOOL_PHY_GTUNABLE	0x0000004e /* Get PHY tunable configuration */
+#define ETHTOOL_PHY_STUNABLE	0x0000004f /* Set PHY tunable configuration */
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
-- 
2.7.3


* [PATCH ethtool v4 2/2] Ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE and PHY downshift
From: Allan W. Nielsen @ 2016-11-22 20:32 UTC (permalink / raw)
  To: netdev; +Cc: andrew, f.fainelli, raju.lakkaraju, allan.nielsen, Raju Lakkaraju
In-Reply-To: <1479846737-18669-1-git-send-email-allan.nielsen@microsemi.com>

From: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>

Add ethtool get and set tunable to access PHY drivers.

Ethtool Help: ethtool -h for PHY tunables
    ethtool --set-phy-tunable DEVNAME      Set PHY tunable
                [ downshift on|off [count N] ]
    ethtool --get-phy-tunable DEVNAME      Get PHY tunable
                [ downshift ]

Ethtool ex:
  ethtool --set-phy-tunable eth0 downshift on
  ethtool --set-phy-tunable eth0 downshift off
  ethtool --set-phy-tunable eth0 downshift on count 2

  ethtool --get-phy-tunable eth0 downshift
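
How the command lines above map onto the single u8 passed to the kernel
can be sketched as follows. The two constants are the ones this series
syncs into ethtool-copy.h; the helper downshift_tunable_value() and its
parameters are hypothetical and only mirror the validation rules (no
count with "off", no zero count with "on").

```c
#include <stdint.h>

#define DOWNSHIFT_DEV_DEFAULT_COUNT	0xff
#define DOWNSHIFT_DEV_DISABLE		0

/*
 * Returns 0 and stores the tunable value in *out, or -1 if the
 * combination of arguments is rejected.
 */
static int downshift_tunable_value(int enable, int has_count,
				   uint8_t count, uint8_t *out)
{
	if (!enable && has_count)
		return -1;	/* "count" is meaningless with "off" */
	if (enable && has_count && count == 0)
		return -1;	/* zero is reserved for "disabled" */

	if (!enable)
		*out = DOWNSHIFT_DEV_DISABLE;
	else if (has_count)
		*out = count;
	else
		*out = DOWNSHIFT_DEV_DEFAULT_COUNT;	/* let the PHY pick */

	return 0;
}
```

So "downshift on" sends 0xff, "downshift off" sends 0, and
"downshift on count 2" sends 2.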

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
---
 ethtool.8.in |  40 +++++++++++++++++
 ethtool.c    | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 184 insertions(+)

diff --git a/ethtool.8.in b/ethtool.8.in
index 9631847..5c36c06 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -340,6 +340,18 @@ ethtool \- query or control network driver and hardware settings
 .B2 tx-lpi on off
 .BN tx-timer
 .BN advertise
+.HP
+.B ethtool \-\-set\-phy\-tunable
+.I devname
+.RB [
+.B downshift
+.A1 on off
+.BN count
+.RB ]
+.HP
+.B ethtool \-\-get\-phy\-tunable
+.I devname
+.RB [ downshift ]
 .
 .\" Adjust lines (i.e. full justification) and hyphenate.
 .ad
@@ -947,6 +959,34 @@ Values are as for
 Sets the amount of time the device should stay in idle mode prior to asserting
 its Tx LPI (in microseconds). This has meaning only when Tx LPI is enabled.
 .RE
+.TP
+.B \-\-set\-phy\-tunable
+Sets the PHY tunable parameters.
+.RS 4
+.TP
+.A2 downshift on off
+Specifies whether downshift should be enabled
+.TS
+nokeep;
+lB	l.
+.BI count \ N
+Sets the PHY downshift re-tries count.
+.TE
+.PD
+.RE
+.TP
+.B \-\-get\-phy\-tunable
+Gets the PHY tunable parameters.
+.RS 4
+.TP
+.B downshift
+For operation in cabling environments that are incompatible with 1000BASE-T,
+PHY device provides an automatic link speed downshift operation.
+Link speed downshift after N failed 1000BASE-T auto-negotiation attempts.
+Downshift is useful where cable does not have the 4 pairs instance.
+
+Gets the PHY downshift count/status.
+.RE
 .SH BUGS
 Not supported (in part or whole) on all network drivers.
 .SH AUTHOR
diff --git a/ethtool.c b/ethtool.c
index 49ac94e..7dcd005 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -4520,6 +4520,146 @@ static int do_seee(struct cmd_context *ctx)
 	return 0;
 }
 
+static int do_get_phy_tunable(struct cmd_context *ctx)
+{
+	int argc = ctx->argc;
+	char **argp = ctx->argp;
+	int err, i;
+	u8 downshift_changed = 0;
+
+	if (argc < 1)
+		exit_bad_args();
+	for (i = 0; i < argc; i++) {
+		if (!strcmp(argp[i], "downshift")) {
+			downshift_changed = 1;
+			i += 1;
+			if (i < argc)
+				exit_bad_args();
+		} else  {
+			exit_bad_args();
+		}
+	}
+
+	if (downshift_changed) {
+		struct ethtool_tunable ds;
+		u8 count = 0;
+
+		ds.cmd = ETHTOOL_PHY_GTUNABLE;
+		ds.id = ETHTOOL_PHY_DOWNSHIFT;
+		ds.type_id = ETHTOOL_TUNABLE_U8;
+		ds.len = 1;
+		ds.data[0] = &count;
+		err = send_ioctl(ctx, &ds);
+		if (err < 0) {
+			perror("Cannot Get PHY downshift count");
+			return 87;
+		}
+		count = *((u8 *)&ds.data[0]);
+		if (count)
+			fprintf(stdout, "Downshift count: %d\n", count);
+		else
+			fprintf(stdout, "Downshift disabled\n");
+	}
+
+	return err;
+}
+
+static int parse_named_bool(struct cmd_context *ctx, const char *name, u8 *on)
+{
+	if (ctx->argc < 2)
+		return 0;
+
+	if (strcmp(*ctx->argp, name))
+		return 0;
+
+	if (!strcmp(*(ctx->argp + 1), "on")) {
+		*on = 1;
+	} else if (!strcmp(*(ctx->argp + 1), "off")) {
+		*on = 0;
+	} else {
+		fprintf(stderr, "Invalid boolean\n");
+		exit_bad_args();
+	}
+
+	ctx->argc -= 2;
+	ctx->argp += 2;
+
+	return 1;
+}
+
+static int parse_named_u8(struct cmd_context *ctx, const char *name, u8 *val)
+{
+	if (ctx->argc < 2)
+		return 0;
+
+	if (strcmp(*ctx->argp, name))
+		return 0;
+
+	*val = get_uint_range(*(ctx->argp + 1), 0, 0xff);
+
+	ctx->argc -= 2;
+	ctx->argp += 2;
+
+	return 1;
+}
+
+static int do_set_phy_tunable(struct cmd_context *ctx)
+{
+	int err = 0;
+	u8 ds_cnt = DOWNSHIFT_DEV_DEFAULT_COUNT;
+	u8 ds_changed = 0, ds_has_cnt = 0, ds_enable = 0;
+
+	if (ctx->argc == 0)
+		exit_bad_args();
+
+	/* Parse arguments */
+	while (ctx->argc) {
+		if (parse_named_bool(ctx, "downshift", &ds_enable)) {
+			ds_changed = 1;
+			ds_has_cnt = parse_named_u8(ctx, "count", &ds_cnt);
+		} else {
+			exit_bad_args();
+		}
+	}
+
+	/* Validate parameters */
+	if (ds_changed) {
+		if (!ds_enable && ds_has_cnt) {
+			fprintf(stderr, "'count' may not be set when downshift "
+				        "is off.\n");
+			exit_bad_args();
+		}
+
+		if (ds_enable && ds_has_cnt && ds_cnt == 0) {
+			fprintf(stderr, "'count' may not be zero.\n");
+			exit_bad_args();
+		}
+
+		if (!ds_enable)
+			ds_cnt = DOWNSHIFT_DEV_DISABLE;
+	}
+
+	/* Do it */
+	if (ds_changed) {
+		struct ethtool_tunable ds;
+		u8 count;
+
+		ds.cmd = ETHTOOL_PHY_STUNABLE;
+		ds.id = ETHTOOL_PHY_DOWNSHIFT;
+		ds.type_id = ETHTOOL_TUNABLE_U8;
+		ds.len = 1;
+		ds.data[0] = &count;
+		*((u8 *)&ds.data[0]) = ds_cnt;
+		err = send_ioctl(ctx, &ds);
+		if (err < 0) {
+			perror("Cannot Set PHY downshift count");
+			err = 87;
+		}
+	}
+
+	return err;
+}
+
 #ifndef TEST_ETHTOOL
 int send_ioctl(struct cmd_context *ctx, void *cmd)
 {
@@ -4681,6 +4821,10 @@ static const struct option {
 	  "		[ advertise %x ]\n"
 	  "		[ tx-lpi on|off ]\n"
 	  "		[ tx-timer %d ]\n"},
+	{ "--set-phy-tunable", 1, do_set_phy_tunable, "Set PHY tunable",
+	  "		[ downshift on|off [count N] ]\n"},
+	{ "--get-phy-tunable", 1, do_get_phy_tunable, "Get PHY tunable",
+	  "		[ downshift ]\n"},
 	{ "-h|--help", 0, show_usage, "Show this help" },
 	{ "--version", 0, do_version, "Show version number" },
 	{}
-- 
2.7.3


* Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
From: Daniel Borkmann @ 2016-11-22 20:41 UTC (permalink / raw)
  To: Cong Wang, Jiri Pirko
  Cc: Roi Dayan, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Or Gerlitz, Cong Wang, John Fastabend
In-Reply-To: <CAM_iQpVTtqQRg7KgQbMQSxxVJRB8a46DGgfU52m=SxF1uYDFWQ@mail.gmail.com>

On 11/22/2016 08:28 PM, Cong Wang wrote:
> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>> Hmm, I don't think we want to have such an additional test in fast
>>> path for each and every classifier. Can we think of ways to avoid that?
>>>
>>> My question is, since we unlink individual instances from such tp-internal
>>> lists through RCU and release the instance through call_rcu() as well as
>>> the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>>> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>> not respecting grace period?
>>
>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>> to null.

But that's not really an answer to my question. ;)

> We do need to respect the grace period if we touch the globally visible
> data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
> right place.

I think there may be multiple issues actually.

At the time we go into tc_classify(), from ingress as well as egress side,
we're under RCU, but the BH variant. In the cls delete()/destroy() callbacks, we
use call_rcu() and kfree_rcu() everywhere, same as for tcf_destroy() where
we use kfree_rcu() on tp, although we iterate tps (and implicitly inner filters)
via rcu_dereference_bh() from reader side. Is there a reason why we don't
use call_rcu_bh() variant on destruction for all this instead?

Just looking at cls_bpf and others, what does RCU_INIT_POINTER(tp->root, NULL)
protect against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
tcf_destroy() cases. Still active readers under RCU BH can race against this
(tp->root being NULL), as the commit identified. Only the get() callback checks
for head against NULL, but both are serialized under rtnl, and the only place
we call this is tc_ctl_tfilter(). Even if we create a new tp, head should not
be NULL there, if it was assigned during the init() cb, but contains an empty
list. (It's different for things like cls_cgroup, though.) So, I'm wondering
if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead (unless I'm
missing something obvious)?

> Also I don't know why you blame my commit, this problem should already
> exist prior to my commit, probably date back to John's RCU patches.

It seems so.


* Re: [PATCH net-next 4/5] net: phy: bcm7xxx: Add support for downshift/Wirespeed
From: Andrew Lunn @ 2016-11-22 20:57 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju, vivien.didelot
In-Reply-To: <1902f0f0-46e5-d3b3-90c1-10867f4fb826@gmail.com>

> > Maybe we should think about this locking a bit. It is normal for the
> > lock to be held when using ops in the phy driver structure. The
> > exception is suspend/resume. Maybe we should also take the lock before
> > calling the phydev->drv->get_tunable() and phydev->drv->set_tunable()?
> 
> Yes, that certainly seems like a good approach to me, let me cook a
> patch doing that.

Hi Florian

There are a couple of mutex locks/unlocks you will need to remove from
mscc.c when you centralize this mutex.

       Andrew


* Re: [PATCH net-next 1/4] net: mvneta: Convert to be 64 bits compatible
From: Arnd Bergmann @ 2016-11-22 21:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Gregory CLEMENT, David S. Miller, linux-kernel, netdev,
	Thomas Petazzoni, Andrew Lunn, Jason Cooper, Marcin Wojtas,
	Sebastian Hesselbarth
In-Reply-To: <20161122164844.19566-2-gregory.clement@free-electrons.com>

On Tuesday, November 22, 2016 5:48:41 PM CET Gregory CLEMENT wrote:
> +#ifdef CONFIG_64BIT
> +       void *data_tmp;
> +
> +       /* In Neta HW only 32 bits data is supported, so in order to
> +        * obtain whole 64 bits address from RX descriptor, we store
> +        * the upper 32 bits when allocating buffer, and put it back
> +        * when using buffer cookie for accessing packet in memory.
> +        * Frags should be allocated from single 'memory' region,
> +        * hence common upper address half should be sufficient.
> +        */
> +       data_tmp = mvneta_frag_alloc(pp->frag_size);
> +       if (data_tmp) {
> +               pp->data_high = (u64)upper_32_bits((u64)data_tmp) << 32;
> +               mvneta_frag_free(pp->frag_size, data_tmp);
> +       }
> 

How does this work when the region spans a n*4GB address boundary?

	Arnd
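
To make the concern concrete, here is a small user-space sketch of the
scheme in the quoted hunk: the upper 32 bits are sampled once from a
test allocation, and later buffers are rebuilt from that saved half
plus the 32-bit value the hardware can carry. The helper names and the
addresses used with them are made up for illustration.

```c
#include <stdint.h>

/* Upper half sampled once, as in (u64)upper_32_bits(addr) << 32. */
static uint64_t saved_high(uint64_t addr)
{
	return (addr >> 32) << 32;
}

/* The part of the address that fits in a 32-bit descriptor field. */
static uint32_t cookie_low(uint64_t addr)
{
	return (uint32_t)addr;
}

/* Rebuild the full address from the saved upper half and the cookie. */
static uint64_t rebuild(uint64_t high, uint32_t low)
{
	return high | low;
}
```

If the sample allocation sits at 0x1fffff000 and a later buffer lands
at 0x200001000, rebuild() yields 0x100001000, which is not the buffer
that was allocated. A buffer at 0x1ffffe000, inside the same 4GB
window, round-trips correctly.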


* Re: [PATCH net-next 4/5] net: phy: bcm7xxx: Add support for downshift/Wirespeed
From: Florian Fainelli @ 2016-11-22 21:01 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli
  Cc: netdev, davem, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju, vivien.didelot
In-Reply-To: <20161122205747.GH14947@lunn.ch>

On 11/22/2016 12:57 PM, Andrew Lunn wrote:
>>> Maybe we should think about this locking a bit. It is normal for the
>>> lock to be held when using ops in the phy driver structure. The
>>> exception is suspend/resume. Maybe we should also take the lock before
>>> calling the phydev->drv->get_tunable() and phydev->drv->set_tunable()?
>>
>> Yes, that certainly seems like a good approach to me, let me cook a
>> patch doing that.
> 
> Hi Florian
> 
> There are a couple of mutex locks/unlocks you will need to remove from
> mscc.c when you centralize this mutex.

Good point, thanks, let me review the mscc PHY driver and propose a more
proper fix.
-- 
Florian


* [PATCH net-next V2 0/7] Mellanox 100G mlx5 SRIOV switchdev update
From: Saeed Mahameed @ 2016-11-22 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed

Hi Dave,

This series from Roi and Or further enhances the new SRIOV switchdev mode.

Roi's patches deal with allowing users to configure through devlink
the level of inline headers that the VF should be setting in order for
the eswitch HW to do proper matching. We also enforce that the matching
required for offloaded TC rules is aligned with that level on the PF driver.

Or's patches deal with allowing the user to control the VF operational
link state through admin directives on the mlx5 VF rep link. Also in this series
is an implementation of HW and SW counters for the mlx5 VF rep, which is aligned
with the design set by commit a5ea31f57309 'Merge branch net-offloaded-stats'.

v1 --> v2:
* constified the net-device param of get offloaded stats ndo in mlxsw
  (pointed out by 0-day screaming at us...)
* added Or's Reviewed-by tags for Roi's patches

This series was generated against commit
e796f49d826a ("net: ieee802154: constify ieee802154_ops structures")

Thanks,
Saeed.

Or Gerlitz (3):
  net: Add net-device param to the get offloaded stats ndo
  net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev
    mode
  net/mlx5e: Support VF vport link state control for SRIOV switchdev
    mode

Roi Dayan (4):
  devlink: Add E-Switch inline mode control
  net/mlx5: Enable to query min inline for a specific vport
  net/mlx5: E-Switch, Add control for inline mode
  net/mlx5e: Enforce min inline mode when offloading flows

 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  15 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  42 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 144 +++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  46 ++++++-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   4 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 141 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  14 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |   2 +-
 include/linux/mlx5/vport.h                         |  10 +-
 include/linux/netdevice.h                          |   4 +-
 include/net/devlink.h                              |   2 +
 include/uapi/linux/devlink.h                       |   8 ++
 net/core/devlink.c                                 |  70 +++++++---
 net/core/rtnetlink.c                               |   4 +-
 17 files changed, 438 insertions(+), 72 deletions(-)

-- 
2.7.4


* [PATCH net-next V2 4/7] devlink: Add E-Switch inline mode control
From: Saeed Mahameed @ 2016-11-22 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479849000-14902-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

Some HWs need the VF driver to put part of the packet headers on the
TX descriptor so the e-switch can do proper matching and steering.

The supported modes: none, link, network, transport.
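
As an illustration of how the four mode names line up with the enum
this patch adds to the uapi header, here is a user-space sketch. The
enum is copied from the patch; inline_mode_from_name() is a
hypothetical helper, not part of devlink or of any userspace tool, and
it relies on the names being listed in the same order as the enum.

```c
#include <stddef.h>
#include <string.h>

enum devlink_eswitch_inline_mode {
	DEVLINK_ESWITCH_INLINE_MODE_NONE,
	DEVLINK_ESWITCH_INLINE_MODE_LINK,
	DEVLINK_ESWITCH_INLINE_MODE_NETWORK,
	DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT,
};

/*
 * Hypothetical helper: translate a mode name into the enum value.
 * Returns 0 on success, -1 if the name is not one of the four modes.
 */
static int inline_mode_from_name(const char *name,
				 enum devlink_eswitch_inline_mode *mode)
{
	/* Order must match the enum above. */
	static const char *const names[] = {
		"none", "link", "network", "transport",
	};
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!strcmp(name, names[i])) {
			*mode = (enum devlink_eswitch_inline_mode)i;
			return 0;
		}
	}
	return -1;
}
```

Each step up the list asks the VF driver to inline one more header
layer, which is why the value fits the u8 attribute defined below.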

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/net/devlink.h        |  2 ++
 include/uapi/linux/devlink.h |  8 +++++
 net/core/devlink.c           | 70 ++++++++++++++++++++++++++++++++------------
 3 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 211bd3c..d29e5fc 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -92,6 +92,8 @@ struct devlink_ops {
 
 	int (*eswitch_mode_get)(struct devlink *devlink, u16 *p_mode);
 	int (*eswitch_mode_set)(struct devlink *devlink, u16 mode);
+	int (*eswitch_inline_mode_get)(struct devlink *devlink, u8 *p_inline_mode);
+	int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 915bfa7..9014c33 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -102,6 +102,13 @@ enum devlink_eswitch_mode {
 	DEVLINK_ESWITCH_MODE_SWITCHDEV,
 };
 
+enum devlink_eswitch_inline_mode {
+	DEVLINK_ESWITCH_INLINE_MODE_NONE,
+	DEVLINK_ESWITCH_INLINE_MODE_LINK,
+	DEVLINK_ESWITCH_INLINE_MODE_NETWORK,
+	DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT,
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -133,6 +140,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_SB_OCC_CUR,		/* u32 */
 	DEVLINK_ATTR_SB_OCC_MAX,		/* u32 */
 	DEVLINK_ATTR_ESWITCH_MODE,		/* u16 */
+	DEVLINK_ATTR_ESWITCH_INLINE_MODE,	/* u8 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index c14f8b6..2b5bf9e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1394,26 +1394,45 @@ static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb,
 
 static int devlink_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
 				enum devlink_command cmd, u32 portid,
-				u32 seq, int flags, u16 mode)
+				u32 seq, int flags)
 {
+	const struct devlink_ops *ops = devlink->ops;
 	void *hdr;
+	int err = 0;
+	u16 mode;
+	u8 inline_mode;
 
 	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
 	if (!hdr)
 		return -EMSGSIZE;
 
-	if (devlink_nl_put_handle(msg, devlink))
-		goto nla_put_failure;
+	err = devlink_nl_put_handle(msg, devlink);
+	if (err)
+		goto out;
 
-	if (nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode))
-		goto nla_put_failure;
+	err = ops->eswitch_mode_get(devlink, &mode);
+	if (err)
+		goto out;
+	err = nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode);
+	if (err)
+		goto out;
+
+	if (ops->eswitch_inline_mode_get) {
+		err = ops->eswitch_inline_mode_get(devlink, &inline_mode);
+		if (err)
+			goto out;
+		err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_INLINE_MODE,
+				 inline_mode);
+		if (err)
+			goto out;
+	}
 
 	genlmsg_end(msg, hdr);
 	return 0;
 
-nla_put_failure:
+out:
 	genlmsg_cancel(msg, hdr);
-	return -EMSGSIZE;
+	return err;
 }
 
 static int devlink_nl_cmd_eswitch_mode_get_doit(struct sk_buff *skb,
@@ -1422,22 +1441,17 @@ static int devlink_nl_cmd_eswitch_mode_get_doit(struct sk_buff *skb,
 	struct devlink *devlink = info->user_ptr[0];
 	const struct devlink_ops *ops = devlink->ops;
 	struct sk_buff *msg;
-	u16 mode;
 	int err;
 
 	if (!ops || !ops->eswitch_mode_get)
 		return -EOPNOTSUPP;
 
-	err = ops->eswitch_mode_get(devlink, &mode);
-	if (err)
-		return err;
-
 	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
 	if (!msg)
 		return -ENOMEM;
 
 	err = devlink_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_MODE_GET,
-				   info->snd_portid, info->snd_seq, 0, mode);
+				   info->snd_portid, info->snd_seq, 0);
 
 	if (err) {
 		nlmsg_free(msg);
@@ -1453,15 +1467,32 @@ static int devlink_nl_cmd_eswitch_mode_set_doit(struct sk_buff *skb,
 	struct devlink *devlink = info->user_ptr[0];
 	const struct devlink_ops *ops = devlink->ops;
 	u16 mode;
+	u8 inline_mode;
+	int err = 0;
 
-	if (!info->attrs[DEVLINK_ATTR_ESWITCH_MODE])
-		return -EINVAL;
+	if (!ops)
+		return -EOPNOTSUPP;
 
-	mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
+	if (info->attrs[DEVLINK_ATTR_ESWITCH_MODE]) {
+		if (!ops->eswitch_mode_set)
+			return -EOPNOTSUPP;
+		mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
+		err = ops->eswitch_mode_set(devlink, mode);
+		if (err)
+			return err;
+	}
 
-	if (ops && ops->eswitch_mode_set)
-		return ops->eswitch_mode_set(devlink, mode);
-	return -EOPNOTSUPP;
+	if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) {
+		if (!ops->eswitch_inline_mode_set)
+			return -EOPNOTSUPP;
+		inline_mode = nla_get_u8(
+				info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]);
+		err = ops->eswitch_inline_mode_set(devlink, inline_mode);
+		if (err)
+			return err;
+	}
+
+	return 0;
 }
 
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
@@ -1478,6 +1509,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_SB_THRESHOLD] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
 	[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
-- 
2.7.4


* [PATCH net-next V2 2/7] net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode
From: Saeed Mahameed @ 2016-11-22 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479849000-14902-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Switchdev driver net-device port statistics should follow the model introduced
in commit a5ea31f57309 'Merge branch net-offloaded-stats'.

For VF reps we return the SRIOV eswitch vport stats as the usual ones and SW stats
if asked. For the PF, if we're in the switchdev mode, we return the uplink stats
and SW stats if asked, otherwise as before. The uplink stats are implemented using
the PPCNT 802_3 counters which are already being read/cached by the driver.
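
A minimal sketch of the selection described above, not the driver code itself. The names link_stats, port_counters, port_kind and port_get_stats are hypothetical stand-ins for the kernel structures. It only illustrates which counter set feeds the regular stats for each kind of port, and that the vport counters are flipped because the e-switch counts from the VF's point of view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rtnl_link_stats64. */
struct link_stats {
	uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
};

enum port_kind { PORT_PLAIN_NIC, PORT_UPLINK_REP, PORT_VF_REP };

struct port_counters {
	struct link_stats sw;     /* counted by the driver datapath */
	struct link_stats vport;  /* e-switch vport counters, VF's view */
	struct link_stats uplink; /* physical port (802.3) counters */
};

/* What the VF transmitted is what the representor "received". */
static void rep_stats_from_vport(struct link_stats *rep,
				 const struct link_stats *vf)
{
	rep->rx_packets = vf->tx_packets;
	rep->rx_bytes   = vf->tx_bytes;
	rep->tx_packets = vf->rx_packets;
	rep->tx_bytes   = vf->rx_bytes;
}

/* Pick the counter set reported as the port's regular stats. */
static void port_get_stats(enum port_kind kind,
			   const struct port_counters *c,
			   struct link_stats *out)
{
	assert(c != NULL && out != NULL);

	switch (kind) {
	case PORT_VF_REP:
		rep_stats_from_vport(out, &c->vport);
		break;
	case PORT_UPLINK_REP:
		*out = c->uplink;
		break;
	default:
		*out = c->sw; /* behaviour "as before" */
		break;
	}
}
```

In this sketch the SW counters stay available in port_counters.sw, which mirrors the idea that they are reported separately when asked for.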

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  31 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 111 +++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   1 +
 4 files changed, 128 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ac09767..ebf5dbc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -874,6 +874,7 @@ int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv);
 void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv);
 int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr);
 void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
+void mlx5e_update_hw_rep_counters(struct mlx5e_priv *priv);
 
 int mlx5e_create_direct_rqts(struct mlx5e_priv *priv);
 void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
@@ -890,12 +891,16 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
 void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
-struct rtnl_link_stats64 *
-mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 void mlx5e_add_vxlan_port(struct net_device *netdev,
 			  struct udp_tunnel_info *ti);
 void mlx5e_del_vxlan_port(struct net_device *netdev,
 			  struct udp_tunnel_info *ti);
 
+int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
+			    void *sp);
+bool mlx5e_has_offload_stats(const struct net_device *dev, int attr_id);
+
+bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv);
+bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6957608..8e8d809 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -470,16 +470,6 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
 	kfree(rq->mpwqe.info);
 }
 
-static bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv)
-{
-	struct mlx5_eswitch_rep *rep = (struct mlx5_eswitch_rep *)priv->ppriv;
-
-	if (rep && rep->vport != FDB_UPLINK_VPORT)
-		return true;
-
-	return false;
-}
-
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -2664,7 +2654,7 @@ static int mlx5e_ndo_setup_tc(struct net_device *dev, u32 handle,
 	return mlx5e_setup_tc(dev, tc->tc);
 }
 
-struct rtnl_link_stats64 *
+static struct rtnl_link_stats64 *
 mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
@@ -2672,13 +2662,20 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	struct mlx5e_vport_stats *vstats = &priv->stats.vport;
 	struct mlx5e_pport_stats *pstats = &priv->stats.pport;
 
-	stats->rx_packets = sstats->rx_packets;
-	stats->rx_bytes   = sstats->rx_bytes;
-	stats->tx_packets = sstats->tx_packets;
-	stats->tx_bytes   = sstats->tx_bytes;
+	if (mlx5e_is_uplink_rep(priv)) {
+		stats->rx_packets = PPORT_802_3_GET(pstats, a_frames_received_ok);
+		stats->rx_bytes   = PPORT_802_3_GET(pstats, a_octets_received_ok);
+		stats->tx_packets = PPORT_802_3_GET(pstats, a_frames_transmitted_ok);
+		stats->tx_bytes   = PPORT_802_3_GET(pstats, a_octets_transmitted_ok);
+	} else {
+		stats->rx_packets = sstats->rx_packets;
+		stats->rx_bytes   = sstats->rx_bytes;
+		stats->tx_packets = sstats->tx_packets;
+		stats->tx_bytes   = sstats->tx_bytes;
+		stats->tx_dropped = sstats->tx_queue_dropped;
+	}
 
 	stats->rx_dropped = priv->stats.qcnt.rx_out_of_buffer;
-	stats->tx_dropped = sstats->tx_queue_dropped;
 
 	stats->rx_length_errors =
 		PPORT_802_3_GET(pstats, a_in_range_length_errors) +
@@ -3290,6 +3287,8 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller     = mlx5e_netpoll,
 #endif
+	.ndo_has_offload_stats	 = mlx5e_has_offload_stats,
+	.ndo_get_offload_stats	 = mlx5e_get_offload_stats,
 };
 
 static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index a84825d..e0d1a56 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -72,7 +72,29 @@ static void mlx5e_rep_get_strings(struct net_device *dev,
 	}
 }
 
-static void mlx5e_update_sw_rep_counters(struct mlx5e_priv *priv)
+static void mlx5e_rep_update_hw_counters(struct mlx5e_priv *priv)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct rtnl_link_stats64 *vport_stats;
+	struct ifla_vf_stats vf_stats;
+	int err;
+
+	err = mlx5_eswitch_get_vport_stats(esw, rep->vport, &vf_stats);
+	if (err) {
+		pr_warn("vport %d error %d reading stats\n", rep->vport, err);
+		return;
+	}
+
+	vport_stats = &priv->stats.vf_vport;
+	/* flip tx/rx as we are reporting the counters for the switch vport */
+	vport_stats->rx_packets = vf_stats.tx_packets;
+	vport_stats->rx_bytes   = vf_stats.tx_bytes;
+	vport_stats->tx_packets = vf_stats.rx_packets;
+	vport_stats->tx_bytes   = vf_stats.rx_bytes;
+}
+
+static void mlx5e_rep_update_sw_counters(struct mlx5e_priv *priv)
 {
 	struct mlx5e_sw_stats *s = &priv->stats.sw;
 	struct mlx5e_rq_stats *rq_stats;
@@ -95,6 +117,12 @@ static void mlx5e_update_sw_rep_counters(struct mlx5e_priv *priv)
 	}
 }
 
+static void mlx5e_rep_update_stats(struct mlx5e_priv *priv)
+{
+	mlx5e_rep_update_sw_counters(priv);
+	mlx5e_rep_update_hw_counters(priv);
+}
+
 static void mlx5e_rep_get_ethtool_stats(struct net_device *dev,
 					struct ethtool_stats *stats, u64 *data)
 {
@@ -106,7 +134,7 @@ static void mlx5e_rep_get_ethtool_stats(struct net_device *dev,
 
 	mutex_lock(&priv->state_lock);
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state))
-		mlx5e_update_sw_rep_counters(priv);
+		mlx5e_rep_update_sw_counters(priv);
 	mutex_unlock(&priv->state_lock);
 
 	for (i = 0; i < NUM_VPORT_REP_COUNTERS; i++)
@@ -245,6 +273,77 @@ static int mlx5e_rep_ndo_setup_tc(struct net_device *dev, u32 handle,
 	}
 }
 
+bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv)
+{
+	struct mlx5_eswitch_rep *rep = (struct mlx5_eswitch_rep *)priv->ppriv;
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+	if (rep && rep->vport == FDB_UPLINK_VPORT && esw->mode == SRIOV_OFFLOADS)
+		return true;
+
+	return false;
+}
+
+bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv)
+{
+	struct mlx5_eswitch_rep *rep = (struct mlx5_eswitch_rep *)priv->ppriv;
+
+	if (rep && rep->vport != FDB_UPLINK_VPORT)
+		return true;
+
+	return false;
+}
+
+bool mlx5e_has_offload_stats(const struct net_device *dev, int attr_id)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+
+	switch (attr_id) {
+	case IFLA_OFFLOAD_XSTATS_CPU_HIT:
+		if (mlx5e_is_vf_vport_rep(priv) || mlx5e_is_uplink_rep(priv))
+			return true;
+	}
+
+	return false;
+}
+
+static int
+mlx5e_get_sw_stats64(const struct net_device *dev,
+		     struct rtnl_link_stats64 *stats)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_sw_stats *sstats = &priv->stats.sw;
+
+	stats->rx_packets = sstats->rx_packets;
+	stats->rx_bytes   = sstats->rx_bytes;
+	stats->tx_packets = sstats->tx_packets;
+	stats->tx_bytes   = sstats->tx_bytes;
+
+	stats->tx_dropped = sstats->tx_queue_dropped;
+
+	return 0;
+}
+
+int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
+			    void *sp)
+{
+	switch (attr_id) {
+	case IFLA_OFFLOAD_XSTATS_CPU_HIT:
+		return mlx5e_get_sw_stats64(dev, sp);
+	}
+
+	return -EINVAL;
+}
+
+static struct rtnl_link_stats64 *
+mlx5e_rep_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+
+	memcpy(stats, &priv->stats.vf_vport, sizeof(*stats));
+	return stats;
+}
+
 static const struct switchdev_ops mlx5e_rep_switchdev_ops = {
 	.switchdev_port_attr_get	= mlx5e_attr_get,
 };
@@ -255,9 +354,9 @@ static const struct net_device_ops mlx5e_netdev_ops_rep = {
 	.ndo_start_xmit          = mlx5e_xmit,
 	.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
 	.ndo_setup_tc            = mlx5e_rep_ndo_setup_tc,
-	.ndo_get_stats64         = mlx5e_get_stats,
-	.ndo_udp_tunnel_add	 = mlx5e_add_vxlan_port,
-	.ndo_udp_tunnel_del	 = mlx5e_del_vxlan_port,
+	.ndo_get_stats64         = mlx5e_rep_get_stats,
+	.ndo_has_offload_stats	 = mlx5e_has_offload_stats,
+	.ndo_get_offload_stats	 = mlx5e_get_offload_stats,
 };
 
 static void mlx5e_build_rep_netdev_priv(struct mlx5_core_dev *mdev,
@@ -407,7 +506,7 @@ static struct mlx5e_profile mlx5e_rep_profile = {
 	.cleanup_rx		= mlx5e_cleanup_rep_rx,
 	.init_tx		= mlx5e_init_rep_tx,
 	.cleanup_tx		= mlx5e_cleanup_nic_tx,
-	.update_stats           = mlx5e_update_sw_rep_counters,
+	.update_stats           = mlx5e_rep_update_stats,
 	.max_nch		= mlx5e_get_rep_max_num_channels,
 	.max_tc			= 1,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 5da6a1c..f202f87 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -407,6 +407,7 @@ struct mlx5e_stats {
 	struct mlx5e_vport_stats vport;
 	struct mlx5e_pport_stats pport;
 	struct mlx5e_pcie_stats pcie;
+	struct rtnl_link_stats64 vf_vport;
 };
 
 static const struct counter_desc mlx5e_pme_status_desc[] = {
-- 
2.7.4


* [PATCH net-next V2 3/7] net/mlx5e: Support VF vport link state control for SRIOV switchdev mode
From: Saeed Mahameed @ 2016-11-22 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479849000-14902-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Reflect the administrative link changes done on the VF representor to the
VF e-switch vport. This means that running ip link set down/up on the VF
rep modifies the e-switch vport state, which in turn makes well-behaved VF
drivers set their carrier accordingly.
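
The flow above can be sketched in isolation as follows. This is an illustrative model only: rep_port, set_vport_state, rep_open and rep_close are hypothetical names, and the fail_set_state field is a knob added here to simulate a firmware failure. The sketch assumes that closing the device also drops carrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vport_admin_state { VPORT_ADMIN_DOWN, VPORT_ADMIN_UP };

/* Hypothetical model of a VF representor and its e-switch vport. */
struct rep_port {
	bool opened;
	bool carrier;
	enum vport_admin_state vport_state; /* state seen by the VF */
	bool fail_set_state;                /* simulate a failing command */
};

static int set_vport_state(struct rep_port *p, enum vport_admin_state s)
{
	if (p->fail_set_state)
		return -1;
	p->vport_state = s;
	return 0;
}

/* Bringing the representor up also brings the VF's vport up. */
static int rep_open(struct rep_port *p)
{
	assert(p != NULL);
	p->opened = true;
	if (!set_vport_state(p, VPORT_ADMIN_UP))
		p->carrier = true;
	return 0;
}

/* Bringing the representor down takes the VF's vport down first. */
static int rep_close(struct rep_port *p)
{
	assert(p != NULL);
	(void)set_vport_state(p, VPORT_ADMIN_DOWN);
	p->opened = false;
	p->carrier = false;
	return 0;
}
```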

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 33 ++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index e0d1a56..5e33f6b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -236,6 +236,35 @@ void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
 	mlx5e_tc_init(priv);
 }
 
+static int mlx5e_rep_open(struct net_device *dev)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	int err;
+
+	err = mlx5e_open(dev);
+	if (err)
+		return err;
+
+	err = mlx5_eswitch_set_vport_state(esw, rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_UP);
+	if (!err)
+		netif_carrier_on(dev);
+
+	return 0;
+}
+
+static int mlx5e_rep_close(struct net_device *dev)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+	(void)mlx5_eswitch_set_vport_state(esw, rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_DOWN);
+
+	return mlx5e_close(dev);
+}
+
 static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
 					char *buf, size_t len)
 {
@@ -349,8 +378,8 @@ static const struct switchdev_ops mlx5e_rep_switchdev_ops = {
 };
 
 static const struct net_device_ops mlx5e_netdev_ops_rep = {
-	.ndo_open                = mlx5e_open,
-	.ndo_stop                = mlx5e_close,
+	.ndo_open                = mlx5e_rep_open,
+	.ndo_stop                = mlx5e_rep_close,
 	.ndo_start_xmit          = mlx5e_xmit,
 	.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
 	.ndo_setup_tc            = mlx5e_rep_ndo_setup_tc,
-- 
2.7.4


* [PATCH net-next V2 5/7] net/mlx5: Enable to query min inline for a specific vport
From: Saeed Mahameed @ 2016-11-22 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479849000-14902-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

Also move the inline capabilities enum to a shared header, vport.h.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  6 ------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++++------
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   | 14 ++++++++------
 include/linux/mlx5/vport.h                        | 10 ++++++++--
 4 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ebf5dbc..a2b32ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -150,12 +150,6 @@ static inline int mlx5_max_log_rq_size(int wq_type)
 	}
 }
 
-enum {
-	MLX5E_INLINE_MODE_L2,
-	MLX5E_INLINE_MODE_VPORT_CONTEXT,
-	MLX5_INLINE_MODE_NOT_REQUIRED,
-};
-
 struct mlx5e_tx_wqe {
 	struct mlx5_wqe_ctrl_seg ctrl;
 	struct mlx5_wqe_eth_seg  eth;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8e8d809..19403d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -957,7 +957,7 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
 	sq->bf_buf_size = (1 << MLX5_CAP_GEN(mdev, log_bf_reg_size)) / 2;
 	sq->max_inline  = param->max_inline;
 	sq->min_inline_mode =
-		MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5E_INLINE_MODE_VPORT_CONTEXT ?
+		MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5_CAP_INLINE_MODE_VPORT_CONTEXT ?
 		param->min_inline_mode : 0;
 
 	err = mlx5e_alloc_sq_db(sq, cpu_to_node(c->cpu));
@@ -3417,14 +3417,13 @@ static void mlx5e_query_min_inline(struct mlx5_core_dev *mdev,
 				   u8 *min_inline_mode)
 {
 	switch (MLX5_CAP_ETH(mdev, wqe_inline_mode)) {
-	case MLX5E_INLINE_MODE_L2:
+	case MLX5_CAP_INLINE_MODE_L2:
 		*min_inline_mode = MLX5_INLINE_MODE_L2;
 		break;
-	case MLX5E_INLINE_MODE_VPORT_CONTEXT:
-		mlx5_query_nic_vport_min_inline(mdev,
-						min_inline_mode);
+	case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT:
+		mlx5_query_nic_vport_min_inline(mdev, 0, min_inline_mode);
 		break;
-	case MLX5_INLINE_MODE_NOT_REQUIRED:
+	case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
 		*min_inline_mode = MLX5_INLINE_MODE_NONE;
 		break;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 525f17a..269e440 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -113,15 +113,17 @@ static int mlx5_modify_nic_vport_context(struct mlx5_core_dev *mdev, void *in,
 	return mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out));
 }
 
-void mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
-				     u8 *min_inline_mode)
+int mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
+				    u16 vport, u8 *min_inline)
 {
 	u32 out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
+	int err;
 
-	mlx5_query_nic_vport_context(mdev, 0, out, sizeof(out));
-
-	*min_inline_mode = MLX5_GET(query_nic_vport_context_out, out,
-				    nic_vport_context.min_wqe_inline_mode);
+	err = mlx5_query_nic_vport_context(mdev, vport, out, sizeof(out));
+	if (!err)
+		*min_inline = MLX5_GET(query_nic_vport_context_out, out,
+				       nic_vport_context.min_wqe_inline_mode);
+	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_min_inline);
 
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index 451b0bd..ec35157 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -36,6 +36,12 @@
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/device.h>
 
+enum {
+	MLX5_CAP_INLINE_MODE_L2,
+	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
+	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
+};
+
 u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod, u16 vport);
 u8 mlx5_query_vport_admin_state(struct mlx5_core_dev *mdev, u8 opmod,
 				u16 vport);
@@ -43,8 +49,8 @@ int mlx5_modify_vport_admin_state(struct mlx5_core_dev *mdev, u8 opmod,
 				  u16 vport, u8 state);
 int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 				     u16 vport, u8 *addr);
-void mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
-				     u8 *min_inline);
+int mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
+				    u16 vport, u8 *min_inline);
 int mlx5_modify_nic_vport_min_inline(struct mlx5_core_dev *mdev,
 				     u16 vport, u8 min_inline);
 int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *dev,
-- 
2.7.4


* [PATCH net-next V2 1/7] net: Add net-device param to the get offloaded stats ndo
From: Saeed Mahameed @ 2016-11-22 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479849000-14902-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Some drivers need to check a few internal, device-specific matters to decide
whether a device has offloaded stats. To be used in a downstream mlx5 commit.
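
A small stand-alone sketch of why the extra parameter helps, using hypothetical types (device, device_ops) and a made-up attribute id rather than the real netdev structures. With the device passed in, one shared callback can answer differently per device instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XSTATS_CPU_HIT 1 /* hypothetical attribute id for this sketch */

struct device;

struct device_ops {
	/* New-style callback: receives the device being queried. */
	bool (*has_offload_stats)(const struct device *dev, int attr_id);
};

struct device {
	const struct device_ops *ops;
	bool is_representor; /* per-device state the driver must consult */
};

/* One callback shared by all devices of this hypothetical driver. */
static bool demo_has_offload_stats(const struct device *dev, int attr_id)
{
	return attr_id == XSTATS_CPU_HIT && dev->is_representor;
}

static const struct device_ops demo_ops = {
	.has_offload_stats = demo_has_offload_stats,
};

/* Core-side helper: forwards the device to the driver callback. */
static bool core_has_offload_stats(const struct device *dev, int attr_id)
{
	assert(dev != NULL);
	return dev->ops && dev->ops->has_offload_stats &&
	       dev->ops->has_offload_stats(dev, attr_id);
}
```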

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
 include/linux/netdevice.h                      | 4 ++--
 net/core/rtnetlink.c                           | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 4a1f9d5..e0d7d5a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -857,7 +857,7 @@ mlxsw_sp_port_get_sw_stats64(const struct net_device *dev,
 	return 0;
 }
 
-static bool mlxsw_sp_port_has_offload_stats(int attr_id)
+static bool mlxsw_sp_port_has_offload_stats(const struct net_device *dev, int attr_id)
 {
 	switch (attr_id) {
 	case IFLA_OFFLOAD_XSTATS_CPU_HIT:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e84800e..ae32a27 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -925,7 +925,7 @@ struct netdev_xdp {
  *	3. Update dev->stats asynchronously and atomically, and define
  *	   neither operation.
  *
- * bool (*ndo_has_offload_stats)(int attr_id)
+ * bool (*ndo_has_offload_stats)(const struct net_device *dev, int attr_id)
  *	Return true if this device supports offload stats of this attr_id.
  *
  * int (*ndo_get_offload_stats)(int attr_id, const struct net_device *dev,
@@ -1165,7 +1165,7 @@ struct net_device_ops {
 
 	struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev,
 						     struct rtnl_link_stats64 *storage);
-	bool			(*ndo_has_offload_stats)(int attr_id);
+	bool			(*ndo_has_offload_stats)(const struct net_device *dev, int attr_id);
 	int			(*ndo_get_offload_stats)(int attr_id,
 							 const struct net_device *dev,
 							 void *attr_data);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index db313ec..f5a8d8a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3665,7 +3665,7 @@ static int rtnl_get_offload_stats(struct sk_buff *skb, struct net_device *dev,
 		if (!size)
 			continue;
 
-		if (!dev->netdev_ops->ndo_has_offload_stats(attr_id))
+		if (!dev->netdev_ops->ndo_has_offload_stats(dev, attr_id))
 			continue;
 
 		attr = nla_reserve_64bit(skb, attr_id, size,
@@ -3706,7 +3706,7 @@ static int rtnl_get_offload_stats_size(const struct net_device *dev)
 
 	for (attr_id = IFLA_OFFLOAD_XSTATS_FIRST;
 	     attr_id <= IFLA_OFFLOAD_XSTATS_MAX; attr_id++) {
-		if (!dev->netdev_ops->ndo_has_offload_stats(attr_id))
+		if (!dev->netdev_ops->ndo_has_offload_stats(dev, attr_id))
 			continue;
 		size = rtnl_get_offload_stats_attr_size(attr_id);
 		nla_size += nla_total_size_64bit(size);
-- 
2.7.4


* [PATCH net-next V2 6/7] net/mlx5: E-Switch, Add control for inline mode
From: Saeed Mahameed @ 2016-11-22 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479849000-14902-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

Implement devlink show and set of the HW inline mode.
The supported modes are: none, link, network, transport.
We currently support one mode for all vports, so set is applied to all vports.
When the eswitch is first initialized, the inline mode is queried from the FW.
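
The "one mode for all vports" rule can be sketched as an all-or-nothing update with rollback. This is a simplified model under stated assumptions: the eswitch struct, modify_vport and eswitch_set_inline_mode below are hypothetical, and fail_vport is a knob added here to simulate a failing firmware command.

```c
#include <assert.h>
#include <stdint.h>

enum inline_mode { INLINE_NONE, INLINE_L2, INLINE_IP, INLINE_TCP_UDP };

#define MAX_VPORTS 8

/* Hypothetical, simplified e-switch state. */
struct eswitch {
	int num_vports;                 /* vport 0 is the PF and is skipped */
	uint8_t vport_mode[MAX_VPORTS]; /* per-vport value held by the HW */
	uint8_t inline_mode;            /* cached e-switch-wide mode */
	int fail_vport;                 /* vport whose update fails, 0 = none */
};

static int modify_vport(struct eswitch *esw, int vport, uint8_t mode)
{
	if (vport == esw->fail_vport)
		return -1;
	esw->vport_mode[vport] = mode;
	return 0;
}

/* Apply the mode to every VF vport, or leave all of them unchanged. */
static int eswitch_set_inline_mode(struct eswitch *esw, uint8_t mode)
{
	int vport, err;

	assert(esw->num_vports <= MAX_VPORTS);

	for (vport = 1; vport < esw->num_vports; vport++) {
		err = modify_vport(esw, vport, mode);
		if (err)
			goto revert;
	}

	esw->inline_mode = mode;
	return 0;

revert:
	/* Restore the vports that were already switched. */
	while (--vport > 0)
		modify_vport(esw, vport, esw->inline_mode);
	return err;
}
```

The cached value is only updated after every vport accepted the new mode, so a later query keeps reporting the mode the hardware is actually using.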

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   4 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 141 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   2 +
 4 files changed, 148 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 9734ac8..d6807c3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1798,6 +1798,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	esw->total_vports = total_vports;
 	esw->enabled_vports = 0;
 	esw->mode = SRIOV_NONE;
+	esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
 
 	dev->priv.eswitch = esw;
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 40482e8..cf1aa56 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -200,6 +200,7 @@ struct mlx5_esw_offload {
 	struct mlx5_flow_group *vport_rx_group;
 	struct mlx5_eswitch_rep *vport_reps;
 	DECLARE_HASHTABLE(encap_tbl, 8);
+	u8 inline_mode;
 };
 
 struct mlx5_eswitch {
@@ -309,6 +310,9 @@ void mlx5_eswitch_sqs2vport_stop(struct mlx5_eswitch *esw,
 
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode);
 int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode);
+int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode);
+int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode);
+int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode);
 void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
 				     int vport_index,
 				     struct mlx5_eswitch_rep *rep);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 731f286..5c01550 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -657,6 +657,14 @@ static int esw_offloads_start(struct mlx5_eswitch *esw)
 		if (err1)
 			esw_warn(esw->dev, "Failed setting eswitch back to legacy, err %d\n", err);
 	}
+	if (esw->offloads.inline_mode == MLX5_INLINE_MODE_NONE) {
+		if (mlx5_eswitch_inline_mode_get(esw,
+						 num_vfs,
+						 &esw->offloads.inline_mode)) {
+			esw->offloads.inline_mode = MLX5_INLINE_MODE_L2;
+			esw_warn(esw->dev, "Inline mode is different between vports\n");
+		}
+	}
 	return err;
 }
 
@@ -771,6 +779,50 @@ static int esw_mode_to_devlink(u16 mlx5_mode, u16 *mode)
 	return 0;
 }
 
+static int esw_inline_mode_from_devlink(u8 mode, u8 *mlx5_mode)
+{
+	switch (mode) {
+	case DEVLINK_ESWITCH_INLINE_MODE_NONE:
+		*mlx5_mode = MLX5_INLINE_MODE_NONE;
+		break;
+	case DEVLINK_ESWITCH_INLINE_MODE_LINK:
+		*mlx5_mode = MLX5_INLINE_MODE_L2;
+		break;
+	case DEVLINK_ESWITCH_INLINE_MODE_NETWORK:
+		*mlx5_mode = MLX5_INLINE_MODE_IP;
+		break;
+	case DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT:
+		*mlx5_mode = MLX5_INLINE_MODE_TCP_UDP;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int esw_inline_mode_to_devlink(u8 mlx5_mode, u8 *mode)
+{
+	switch (mlx5_mode) {
+	case MLX5_INLINE_MODE_NONE:
+		*mode = DEVLINK_ESWITCH_INLINE_MODE_NONE;
+		break;
+	case MLX5_INLINE_MODE_L2:
+		*mode = DEVLINK_ESWITCH_INLINE_MODE_LINK;
+		break;
+	case MLX5_INLINE_MODE_IP:
+		*mode = DEVLINK_ESWITCH_INLINE_MODE_NETWORK;
+		break;
+	case MLX5_INLINE_MODE_TCP_UDP:
+		*mode = DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 {
 	struct mlx5_core_dev *dev;
@@ -815,6 +867,95 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 	return esw_mode_to_devlink(dev->priv.eswitch->mode, mode);
 }
 
+int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	int num_vports = esw->enabled_vports;
+	int err;
+	int vport;
+	u8 mlx5_mode;
+
+	if (!MLX5_CAP_GEN(dev, vport_group_manager))
+		return -EOPNOTSUPP;
+
+	if (esw->mode == SRIOV_NONE)
+		return -EOPNOTSUPP;
+
+	if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
+	    MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+		return -EOPNOTSUPP;
+
+	err = esw_inline_mode_from_devlink(mode, &mlx5_mode);
+	if (err)
+		goto out;
+
+	for (vport = 1; vport < num_vports; vport++) {
+		err = mlx5_modify_nic_vport_min_inline(dev, vport, mlx5_mode);
+		if (err) {
+			esw_warn(dev, "Failed to set min inline on vport %d\n",
+				 vport);
+			goto revert_inline_mode;
+		}
+	}
+
+	esw->offloads.inline_mode = mlx5_mode;
+	return 0;
+
+revert_inline_mode:
+	while (--vport > 0)
+		mlx5_modify_nic_vport_min_inline(dev,
+						 vport,
+						 esw->offloads.inline_mode);
+out:
+	return err;
+}
+
+int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+
+	if (!MLX5_CAP_GEN(dev, vport_group_manager))
+		return -EOPNOTSUPP;
+
+	if (esw->mode == SRIOV_NONE)
+		return -EOPNOTSUPP;
+
+	if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
+	    MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+		return -EOPNOTSUPP;
+
+	return esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode);
+}
+
+int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
+{
+	struct mlx5_core_dev *dev = esw->dev;
+	int vport;
+	u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2;
+
+	if (!MLX5_CAP_GEN(dev, vport_group_manager))
+		return -EOPNOTSUPP;
+
+	if (esw->mode == SRIOV_NONE)
+		return -EOPNOTSUPP;
+
+	if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
+	    MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+		return -EOPNOTSUPP;
+
+	for (vport = 1; vport <= nvfs; vport++) {
+		mlx5_query_nic_vport_min_inline(dev, vport, &mlx5_mode);
+		if (vport > 1 && prev_mlx5_mode != mlx5_mode)
+			return -EINVAL;
+		prev_mlx5_mode = mlx5_mode;
+	}
+
+	*mode = mlx5_mode;
+	return 0;
+}
+
 void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
 				     int vport_index,
 				     struct mlx5_eswitch_rep *__rep)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f28df33..b440a16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1239,6 +1239,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 #ifdef CONFIG_MLX5_CORE_EN
 	.eswitch_mode_set = mlx5_devlink_eswitch_mode_set,
 	.eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
+	.eswitch_inline_mode_set = mlx5_devlink_eswitch_inline_mode_set,
+	.eswitch_inline_mode_get = mlx5_devlink_eswitch_inline_mode_get,
 #endif
 };
 
-- 
2.7.4


* [PATCH net-next V2 7/7] net/mlx5e: Enforce min inline mode when offloading flows
From: Saeed Mahameed @ 2016-11-22 21:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479849000-14902-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

A flow should be offloaded only if the matches are
allowed according to min inline mode.
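
A hedged sketch of the rule stated above. The flow_match struct and both helpers are hypothetical, and the final comparison is an assumption: it treats the inline modes as ordered (L2 < IP < TCP/UDP) and accepts a flow only when the configured mode covers the deepest header the flow matches on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ordered by how many headers are inlined (assumed ordering). */
enum inline_mode { INLINE_L2 = 1, INLINE_IP, INLINE_TCP_UDP };

/* Hypothetical summary of which header fields a flow matches on. */
struct flow_match {
	bool ip_proto; /* non-zero mask on the IP protocol field */
	bool ip_addrs; /* non-zero mask on source or destination IP */
	bool l4_ports; /* non-zero mask on TCP/UDP ports */
};

/* Deepest header the HW must see inline to evaluate this match. */
static uint8_t flow_required_inline(const struct flow_match *m)
{
	uint8_t need = INLINE_L2;

	assert(m != NULL);
	if (m->ip_proto || m->ip_addrs)
		need = INLINE_IP;
	if (m->l4_ports)
		need = INLINE_TCP_UDP;
	return need;
}

/* Offload only if the configured mode inlines enough headers. */
static bool flow_can_offload(const struct flow_match *m, uint8_t configured)
{
	return configured >= flow_required_inline(m);
}
```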

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 46 +++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4b99112..4d06fab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -279,8 +279,10 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
 	return 0;
 }
 
-static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec,
-			    struct tc_cls_flower_offload *f)
+static int __parse_cls_flower(struct mlx5e_priv *priv,
+			      struct mlx5_flow_spec *spec,
+			      struct tc_cls_flower_offload *f,
+			      u8 *min_inline)
 {
 	void *headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
 				       outer_headers);
@@ -289,6 +291,8 @@ static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec
 	u16 addr_type = 0;
 	u8 ip_proto = 0;
 
+	*min_inline = MLX5_INLINE_MODE_L2;
+
 	if (f->dissector->used_keys &
 	    ~(BIT(FLOW_DISSECTOR_KEY_CONTROL) |
 	      BIT(FLOW_DISSECTOR_KEY_BASIC) |
@@ -362,6 +366,9 @@ static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec
 			 mask->ip_proto);
 		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
 			 key->ip_proto);
+
+		if (mask->ip_proto)
+			*min_inline = MLX5_INLINE_MODE_IP;
 	}
 
 	if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
@@ -432,6 +439,9 @@ static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec
 		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
 				    dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
 		       &key->dst, sizeof(key->dst));
+
+		if (mask->src || mask->dst)
+			*min_inline = MLX5_INLINE_MODE_IP;
 	}
 
 	if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
@@ -457,6 +467,10 @@ static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec
 		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
 				    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
 		       &key->dst, sizeof(key->dst));
+
+		if (ipv6_addr_type(&mask->src) != IPV6_ADDR_ANY ||
+		    ipv6_addr_type(&mask->dst) != IPV6_ADDR_ANY)
+			*min_inline = MLX5_INLINE_MODE_IP;
 	}
 
 	if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_PORTS)) {
@@ -497,11 +511,39 @@ static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec
 				   "Only UDP and TCP transport are supported\n");
 			return -EINVAL;
 		}
+
+		if (mask->src || mask->dst)
+			*min_inline = MLX5_INLINE_MODE_TCP_UDP;
 	}
 
 	return 0;
 }
 
+static int parse_cls_flower(struct mlx5e_priv *priv,
+			    struct mlx5_flow_spec *spec,
+			    struct tc_cls_flower_offload *f)
+{
+	struct mlx5_core_dev *dev = priv->mdev;
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	u8 min_inline;
+	int err;
+
+	err = __parse_cls_flower(priv, spec, f, &min_inline);
+
+	if (!err && esw->mode == SRIOV_OFFLOADS &&
+	    rep->vport != FDB_UPLINK_VPORT) {
+		if (min_inline > esw->offloads.inline_mode) {
+			netdev_warn(priv->netdev,
+				    "Flow is not offloaded due to min inline setting, required %d actual %d\n",
+				    min_inline, esw->offloads.inline_mode);
+			return -EOPNOTSUPP;
+		}
+	}
+
+	return err;
+}
+
 static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 				u32 *action, u32 *flow_tag)
 {
-- 
2.7.4
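
The rule this patch enforces can be shown outside the driver. The following is a minimal standalone sketch, not the mlx5 code: the enum and struct names are hypothetical stand-ins, and the only property assumed is the ordering L2 < IP < TCP_UDP that the patch's `min_inline > esw->offloads.inline_mode` comparison implies. A flow that matches on deeper headers needs a deeper inline mode, and it can be offloaded only if the configured mode is at least that deep.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's MLX5_INLINE_MODE_* values.
 * Only their relative order matters for this sketch. */
enum inline_mode {
	INLINE_MODE_L2,
	INLINE_MODE_IP,
	INLINE_MODE_TCP_UDP,
};

/* Hypothetical, simplified view of which header fields a flow matches on
 * (true means the corresponding mask is non-zero). */
struct flow_match {
	bool ip_proto;
	bool ip_addrs;
	bool l4_ports;
};

/* Deepest header layer the flow needs the hardware to see. */
static enum inline_mode required_inline_mode(const struct flow_match *m)
{
	enum inline_mode mode = INLINE_MODE_L2;

	if (m->ip_proto || m->ip_addrs)
		mode = INLINE_MODE_IP;
	if (m->l4_ports)
		mode = INLINE_MODE_TCP_UDP;
	return mode;
}

/* A flow is offloadable only if the configured mode covers what it needs. */
static bool can_offload(const struct flow_match *m, enum inline_mode configured)
{
	return required_inline_mode(m) <= configured;
}
```

With this sketch, a flow matching on L4 ports is rejected when the configured mode is IP, while an L2-only flow is accepted under any mode.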

^ permalink raw reply related

* Re: [PATCH net-next] ethtool: Protect {get,set}_phy_tunable with PHY device mutex
From: Florian Fainelli @ 2016-11-22 21:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, bcm-kernel-feedback-list, andrew, allan.nielsen,
	raju.lakkaraju, vivien.didelot
In-Reply-To: <20161122201316.10830-1-f.fainelli@gmail.com>

On 11/22/2016 12:13 PM, Florian Fainelli wrote:
> PHY drivers should be able to rely on the caller of {get,set}_tunable to
> have acquired the PHY device mutex, in order to both serialize against
> concurrent calls of these functions, but also against PHY state machine
> changes. All ethtool PHY-level functions do this, except
> {get,set}_tunable, so we make them consistent here as well.
> 
> Fixes: 968ad9da7e0e ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

David, please discard, this is going to create problems for the
Microsemi PHY driver since it also acquires phydev->lock. (The patch has
been marked accordingly in patchwork.)
Thanks!
-- 
Florian

^ permalink raw reply

* [PATCH net-next v2] ethtool: Protect {get,set}_phy_tunable with PHY device mutex
From: Florian Fainelli @ 2016-11-22 21:55 UTC (permalink / raw)
  To: netdev
  Cc: davem, bcm-kernel-feedback-list, andrew, allan.nielsen,
	raju.lakkaraju, vivien.didelot, Florian Fainelli

PHY drivers should be able to rely on the caller of {get,set}_tunable to
have acquired the PHY device mutex, in order to both serialize against
concurrent calls of these functions, but also against PHY state machine
changes. All ethtool PHY-level functions do this, except
{get,set}_tunable, so we make them consistent here as well.

We need to update the Microsemi PHY driver in the same commit to avoid
introducing either deadlocks, or lack of proper locking.

Fixes: 968ad9da7e0e ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE")
Fixes: 310d9ad57ae0 ("net: phy: Add downshift get/set support in Microsemi PHYs driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v2:

- also patch drivers/net/phy/mscc.c in the same commit

 drivers/net/phy/mscc.c | 16 +++++-----------
 net/core/ethtool.c     |  4 ++++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
index 92018ba6209e..7a3740c7bf6d 100644
--- a/drivers/net/phy/mscc.c
+++ b/drivers/net/phy/mscc.c
@@ -115,10 +115,9 @@ static int vsc85xx_downshift_get(struct phy_device *phydev, u8 *count)
 	int rc;
 	u16 reg_val;
 
-	mutex_lock(&phydev->lock);
 	rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED);
 	if (rc != 0)
-		goto out_unlock;
+		goto out;
 
 	reg_val = phy_read(phydev, MSCC_PHY_ACTIPHY_CNTL);
 	reg_val &= DOWNSHIFT_CNTL_MASK;
@@ -128,9 +127,7 @@ static int vsc85xx_downshift_get(struct phy_device *phydev, u8 *count)
 		*count = ((reg_val & ~DOWNSHIFT_EN) >> DOWNSHIFT_CNTL_POS) + 2;
 	rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
 
-out_unlock:
-	mutex_unlock(&phydev->lock);
-
+out:
 	return rc;
 }
 
@@ -150,23 +147,20 @@ static int vsc85xx_downshift_set(struct phy_device *phydev, u8 count)
 		count = (((count - 2) << DOWNSHIFT_CNTL_POS) | DOWNSHIFT_EN);
 	}
 
-	mutex_lock(&phydev->lock);
 	rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED);
 	if (rc != 0)
-		goto out_unlock;
+		goto out;
 
 	reg_val = phy_read(phydev, MSCC_PHY_ACTIPHY_CNTL);
 	reg_val &= ~(DOWNSHIFT_CNTL_MASK);
 	reg_val |= count;
 	rc = phy_write(phydev, MSCC_PHY_ACTIPHY_CNTL, reg_val);
 	if (rc != 0)
-		goto out_unlock;
+		goto out;
 
 	rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
 
-out_unlock:
-	mutex_unlock(&phydev->lock);
-
+out:
 	return rc;
 }
 
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index e9b4556751ff..0adb3bec5b5a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2466,7 +2466,9 @@ static int get_phy_tunable(struct net_device *dev, void __user *useraddr)
 	data = kmalloc(tuna.len, GFP_USER);
 	if (!data)
 		return -ENOMEM;
+	mutex_lock(&phydev->lock);
 	ret = phydev->drv->get_tunable(phydev, &tuna, data);
+	mutex_unlock(&phydev->lock);
 	if (ret)
 		goto out;
 	useraddr += sizeof(tuna);
@@ -2501,7 +2503,9 @@ static int set_phy_tunable(struct net_device *dev, void __user *useraddr)
 	ret = -EFAULT;
 	if (copy_from_user(data, useraddr, tuna.len))
 		goto out;
+	mutex_lock(&phydev->lock);
 	ret = phydev->drv->set_tunable(phydev, &tuna, data);
+	mutex_unlock(&phydev->lock);
 
 out:
 	kfree(data);
-- 
2.9.3
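
Why the driver and the core have to change together can be seen in a small standalone model. This is only a sketch under stated assumptions: `toy_mutex`, `toy_phy` and all function names are hypothetical, and the toy lock reports -EDEADLK on a second acquisition where a real non-recursive mutex would hang. It shows that once the caller holds the lock across the callback, a callback that also takes the lock can no longer work, while one that relies on the caller is fine.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquisition returns -EDEADLK
 * instead of blocking forever, so the problem is observable. */
struct toy_mutex {
	bool held;
};

static int toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return -EDEADLK;
	m->held = true;
	return 0;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

struct toy_phy {
	struct toy_mutex lock;
	int downshift;
};

typedef int (*get_tunable_fn)(struct toy_phy *phy, int *val);

/* Driver callback that takes the lock itself (pre-patch driver style). */
static int get_tunable_locking(struct toy_phy *phy, int *val)
{
	int rc = toy_lock(&phy->lock);

	if (rc)
		return rc;
	*val = phy->downshift;
	toy_unlock(&phy->lock);
	return 0;
}

/* Driver callback that relies on the caller holding the lock. */
static int get_tunable_caller_locked(struct toy_phy *phy, int *val)
{
	*val = phy->downshift;
	return 0;
}

/* Core path: holds the lock across the driver callback. */
static int core_get_tunable(struct toy_phy *phy, get_tunable_fn fn, int *val)
{
	int rc = toy_lock(&phy->lock);

	if (rc)
		return rc;
	rc = fn(phy, val);
	toy_unlock(&phy->lock);
	return rc;
}
```

Calling `core_get_tunable()` with `get_tunable_locking` fails with -EDEADLK in this model, which corresponds to the deadlock the v1 patch would have introduced for a driver that locks internally; with `get_tunable_caller_locked` the call succeeds.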

^ permalink raw reply related

* Re: [net] 34fad54c25: kernel BUG at include/linux/skbuff.h:1935!
From: Linus Torvalds @ 2016-11-22 22:04 UTC (permalink / raw)
  To: kernel test robot, David Miller, Eric Dumazet
  Cc: LKP, LKML, Alexei Starovoitov, Willem de Bruijn, Alexander Duyck,
	Network Development
In-Reply-To: <582b7c30.nXQXP2V4/6pFiYwt%xiaolong.ye@intel.com>

David, Eric,

 what's the situation on this issue? The bisection looks a bit odd,
but the commit in question does end up changing the key_control->thoff
value for the failure case, so maybe that in turn ends up screwing up
a later skb_pull.

I'm not seeing anything that might fix this in the last networking
pull, but I may have missed something.

I also noticed that the kernel test robot had screwed up the
participants list for some reason, and had

  "Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>, David S.
Miller" <davem@davemloft.net>

as one of the participants. So there's some odd commit parsing issue
there somewhere. But Alexander seems to have seen this report despite
that, it just never went anywhere that I can tell.

                Linus

On Tue, Nov 15, 2016 at 1:20 PM, kernel test robot
<xiaolong.ye@intel.com> wrote:
>
> FYI, we noticed the following commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> commit 34fad54c2537f7c99d07375e50cb30aa3c23bd83 ("net: __skb_flow_dissect() must cap its return value")
>
> in testcase: pbzip2
> with following parameters:
>
>         nr_threads: 25%
>         blocksize: 900K
>         cpufreq_governor: performance
>
>
>
> on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
>
> caused below changes:
>
>
> +------------------------------------------------------------------+------------+------------+
> |                                                                  | 79774d6bfa | 34fad54c25 |
> +------------------------------------------------------------------+------------+------------+
> | boot_successes                                                   | 0          | 2          |
> | boot_failures                                                    | 2          | 20         |
> | invoked_oom-killer:gfp_mask=0x                                   | 2          | 2          |
> | Mem-Info                                                         | 2          | 2          |
> | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 2          | 2          |
> | kernel_BUG_at_include/linux/skbuff.h                             | 0          | 16         |
> | invalid_opcode:#[##]SMP                                          | 0          | 16         |
> | RIP:eth_type_trans                                               | 0          | 16         |
> | Kernel_panic-not_syncing:Fatal_exception_in_interrupt            | 0          | 15         |
> | calltrace:hub_event                                              | 0          | 1          |
> | WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup                        | 0          | 2          |
> | calltrace:parport_pc_init                                        | 0          | 2          |
> | calltrace:SyS_finit_module                                       | 0          | 2          |
> | WARNING:at_lib/kobject.c:#kobject_add_internal                   | 0          | 2          |
> +------------------------------------------------------------------+------------+------------+
>
>
>
> [   19.375251] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
> [   19.388892] Sending DHCP requests .
> [   19.388892] ------------[ cut here ]------------
> [   19.388894] kernel BUG at include/linux/skbuff.h:1935!
> [   19.388895] invalid opcode: 0000 [#1] SMP
> [   19.388896] Modules linked in:
> [   19.388897] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc3-00320-g34fad54 #1
> [   19.388898] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [   19.388899] task: ffffffff81e0e4c0 task.stack: ffffffff81e00000
> [   19.388904] RIP: 0010:[<ffffffff81837c48>]  [<ffffffff81837c48>] eth_type_trans+0xe8/0x140
> [   19.388904] RSP: 0000:ffff88081e803db8  EFLAGS: 00010297
> [   19.388905] RAX: 0000000000000152 RBX: ffff88080221f200 RCX: 0000000000001073
> [   19.388905] RDX: ffff8808013afdc0 RSI: ffff880801114000 RDI: ffff880819407c00
> [   19.388906] RBP: ffff88081e803e20 R08: ffff880801114000 R09: 0000000000000800
> [   19.388907] R10: ffff8808013afec0 R11: ffffea003fd5a880 R12: ffff880819407c00
> [   19.388907] R13: ffff881033408000 R14: ffffc9000843e000 R15: 0000000000000158
> [   19.388908] FS:  0000000000000000(0000) GS:ffff88081e800000(0000) knlGS:0000000000000000
> [   19.388909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   19.388910] CR2: ffff88103ffff000 CR3: 0000000001e07000 CR4: 00000000001406f0
> [   19.388910] Stack:
> [   19.388912]  ffffffff816905a7 ffffea003fd5a880 ffffea0000000008 ffff88080221f050
> [   19.388913]  ffff88080221f000 0000004000000160 ffffea003fd5a880 0000000000000000
> [   19.388915]  0000000000000040 0000000000000000 ffff88080221f050 ffff88100d216000
> [   19.388915] Call Trace:
> [   19.388919]  <IRQ>
> [   19.388919]  [<ffffffff816905a7>] ? igb_clean_rx_irq+0x6a7/0x7d0
> [   19.388921]  [<ffffffff81690a52>] igb_poll+0x382/0x700
> [   19.388922]  [<ffffffff81690a67>] ? igb_poll+0x397/0x700
> [   19.388925]  [<ffffffff8180f2d7>] net_rx_action+0x217/0x360
> [   19.388928]  [<ffffffff81957fb4>] __do_softirq+0x104/0x2ab
> [   19.388931]  [<ffffffff81086961>] irq_exit+0xf1/0x100
> [   19.388932]  [<ffffffff81957cf4>] do_IRQ+0x54/0xd0
> [   19.388935]  [<ffffffff81955b8c>] common_interrupt+0x8c/0x8c
> [   19.388938]  <EOI>
> [   19.388938]  [<ffffffff817c1d12>] ? cpuidle_enter_state+0x122/0x2e0
> [   19.388939]  [<ffffffff817c1f07>] cpuidle_enter+0x17/0x20
> [   19.388942]  [<ffffffff810c64c3>] call_cpuidle+0x23/0x40
> [   19.388944]  [<ffffffff810c66f4>] cpu_startup_entry+0x114/0x200
> [   19.388946]  [<ffffffff81947675>] rest_init+0x85/0x90
> [   19.388950]  [<ffffffff81ffbf5c>] start_kernel+0x407/0x414
> [   19.388952]  [<ffffffff81ffb120>] ? early_idt_handler_array+0x120/0x120
> [   19.388953]  [<ffffffff81ffb2d6>] x86_64_start_reservations+0x2a/0x2c
> [   19.388955]  [<ffffffff81ffb415>] x86_64_start_kernel+0x13d/0x14c
> [   19.388968] Code: 00 04 00 00 c9 c3 48 33 86 70 03 00 00 48 c1 e0 10 48 85 c0 0f b6 87 90 00 00 00 75 28 83 e0 f8 83 c8 01 88 87 90 00 00 00 eb 82 <0f> 0b 0f b6 87 90 00 00 00 83 e0 f8 83 c8 03 88 87 90 00 00 00
> [   19.388970] RIP  [<ffffffff81837c48>] eth_type_trans+0xe8/0x140
> [   19.388970]  RSP <ffff88081e803db8>
> [   19.388996] ---[ end trace 107996155a43a15c ]---
> [   19.393422] Kernel panic - not syncing: Fatal exception in interrupt
>
>
> To reproduce:
>
>         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run     job.yaml
>
>
>
> Thanks,
> Kernel Test Robot

^ permalink raw reply

* Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration
From: Jiri Pirko @ 2016-11-22 22:08 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, Florian Fainelli, netdev, davem, bridge, stephen,
	vivien.didelot, jiri, idosch
In-Reply-To: <20161122174829.GD14947@lunn.ch>

Tue, Nov 22, 2016 at 06:48:29PM CET, andrew@lunn.ch wrote:
>Hi Ido
> 
>> First of all, I want to be sure that when we say "CPU port", we're
>> talking about the same thing. In mlxsw, the CPU port is a pipe between
>> the device and the host, through which all packets trapped to the host
>> go through. So, when a packet is trapped, the driver reads its Rx
>> descriptor, checks through which port it ingressed, resolves its netdev,
>> sets skb->dev accordingly and injects it to the Rx path via
>> netif_receive_skb(). The CPU port itself isn't represented using a
>> netdev.
>
>With DSA, we have a real physical ethernet network interface for the
>'cpu' port. It connects to one of the ports of the switch. Frames on

Every port should be visible as a netdevice, including the CPU port.
Would it make sense to have representors for those?

>this interface have an extra header, indicating which switch port it
>came from, and we similarly resolve it to a slave netdev, strip off
>the header and inject it into the receive path via
>netif_receive_skb().
>
>	Andrew
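
The receive-side demultiplexing described in the quoted text can be sketched in a few lines. Everything here is an assumption made for illustration: the one-byte tag, `TOY_TAG_LEN`, `TOY_NUM_PORTS` and `toy_untag()` are hypothetical and do not correspond to any real DSA tagging format, which differs per switch vendor. The sketch only shows the two steps mentioned above: read the source port from the extra header, then strip that header before handing the frame on.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TAG_LEN   1	/* hypothetical: one-byte tag at the front of the frame */
#define TOY_NUM_PORTS 4	/* hypothetical number of switch ports */

struct toy_frame {
	const uint8_t *data;
	size_t len;
};

/*
 * Read the source port from the tag and advance the frame past it.
 * Returns the port index, or -1 if the frame is too short or the
 * port is out of range; the frame is left untouched on error.
 */
static int toy_untag(struct toy_frame *f)
{
	uint8_t port;

	if (f->len < TOY_TAG_LEN)
		return -1;

	port = f->data[0];
	if (port >= TOY_NUM_PORTS)
		return -1;

	f->data += TOY_TAG_LEN;
	f->len -= TOY_TAG_LEN;
	return port;
}
```

In a real driver the returned port index would select the per-port netdev to set as skb->dev before the frame is passed to netif_receive_skb().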

^ permalink raw reply

