netdev.vger.kernel.org archive mirror
* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
       [not found] <533b5eff-49b6-16c3-9873-dda3fb05c3d4@solarflare.com>
@ 2018-02-27 17:38 ` David Miller
  2018-02-27 17:55   ` Edward Cree
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2018-02-27 17:38 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev, linville


Edward, none of these postings are making it to the list.

The problem is there are syntax errors in your email headers.

Any time a person's name contains a special character like ".",
that entire string must be enclosed in double quotes.

This is the case for "John W. Linville" so please add proper
quotes around such names and resend your patch series.

Thank you.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-02-27 17:38 ` David Miller
@ 2018-02-27 17:55   ` Edward Cree
  2018-02-27 19:28     ` John W. Linville
  0 siblings, 1 reply; 18+ messages in thread
From: Edward Cree @ 2018-02-27 17:55 UTC (permalink / raw)
  To: David Miller; +Cc: linux-net-drivers, netdev, linville

On 27/02/18 17:38, David Miller wrote:
> The problem is there are syntax errors in your email headers.
>
> Any time a person's name contains a special character like ".",
> that entire string must be enclosed in double quotes.
>
> This is the case for "John W. Linville" so please add proper
> quotes around such names and resend your patch series.
Thank you for spotting this!  I looked at the headers and failed
 to notice anything wrong with them.
I'm surprised that git-imap-send doesn't check for this...

Will resend with that fixed.


* [PATCH RESEND net-next 0/2] ntuple filters with RSS
@ 2018-02-27 17:59 Edward Cree
  2018-02-27 18:02 ` [PATCH net-next 1/2] net: ethtool: extend RXNFC API to support RSS spreading of filter matches Edward Cree
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Edward Cree @ 2018-02-27 17:59 UTC (permalink / raw)
  To: linux-net-drivers, David Miller; +Cc: netdev, John W. Linville

This series introduces the ability to mark an ethtool steering filter to use
 RSS spreading, and the ability to create and configure multiple RSS contexts
 with different indirection tables, hash keys, and hash fields.
An implementation for the sfc driver (for 7000-series and later SFC NICs) is
 included in patch 2/2.

The anticipated use case of this feature is for steering traffic destined for
 a container (or virtual machine) to the subset of CPUs on which processes in
 the container (or the VM's vCPUs) are bound, while retaining the scalability
 of RSS spreading from the viewpoint inside the container.
The use of both a base queue number (ring_cookie) and indirection table is
 intended to allow re-use of a single RSS context to target multiple sets of
 CPUs.  For instance, if an 8-core system is hosting three containers on CPUs
 [1,2], [3,4] and [6,7], then a single RSS context with an equal-weight [0,1]
 indirection table could be used to target all three containers by setting
 ring_cookie to 1, 3 and 6 on the respective filters.

Edward Cree (2):
  net: ethtool: extend RXNFC API to support RSS spreading of filter
    matches
  sfc: support RSS spreading of ethtool ntuple filters

 drivers/net/ethernet/sfc/ef10.c       | 273 ++++++++++++++++++++++------------
 drivers/net/ethernet/sfc/efx.c        |  65 +++++++-
 drivers/net/ethernet/sfc/efx.h        |  12 +-
 drivers/net/ethernet/sfc/ethtool.c    | 153 ++++++++++++++++---
 drivers/net/ethernet/sfc/farch.c      |  11 +-
 drivers/net/ethernet/sfc/filter.h     |   7 +-
 drivers/net/ethernet/sfc/net_driver.h |  44 +++++-
 drivers/net/ethernet/sfc/nic.h        |   2 -
 drivers/net/ethernet/sfc/siena.c      |  26 ++--
 include/linux/ethtool.h               |   5 +
 include/uapi/linux/ethtool.h          |  32 +++-
 net/core/ethtool.c                    |  64 ++++++--
 12 files changed, 523 insertions(+), 171 deletions(-)


* [PATCH net-next 1/2] net: ethtool: extend RXNFC API to support RSS spreading of filter matches
  2018-02-27 17:59 [PATCH RESEND net-next 0/2] ntuple filters with RSS Edward Cree
@ 2018-02-27 18:02 ` Edward Cree
  2018-02-27 18:03 ` [PATCH net-next 2/2] sfc: support RSS spreading of ethtool ntuple filters Edward Cree
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Edward Cree @ 2018-02-27 18:02 UTC (permalink / raw)
  To: linux-net-drivers, David Miller; +Cc: netdev, John W. Linville

We use a two-step process to configure a filter with RSS spreading.  First,
 the RSS context is allocated and configured using ETHTOOL_SRSSH; this
 returns an identifier (rss_context) which can then be passed to subsequent
 invocations of ETHTOOL_SRXCLSRLINS to specify that the offset from the RSS
 indirection table lookup should be added to the queue number (ring_cookie)
 when delivering the packet.  Drivers for devices which can only use the
 indirection table entry directly (not add it to a base queue number)
 should reject rule insertions combining RSS with a nonzero ring_cookie.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/ethtool.h      |  5 ++++
 include/uapi/linux/ethtool.h | 32 +++++++++++++++++-----
 net/core/ethtool.c           | 64 +++++++++++++++++++++++++++++++++-----------
 3 files changed, 80 insertions(+), 21 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 2ec41a7eb54f..ebe41811ed34 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -371,6 +371,11 @@ struct ethtool_ops {
 			    u8 *hfunc);
 	int	(*set_rxfh)(struct net_device *, const u32 *indir,
 			    const u8 *key, const u8 hfunc);
+	int	(*get_rxfh_context)(struct net_device *, u32 *indir, u8 *key,
+				    u8 *hfunc, u32 rss_context);
+	int	(*set_rxfh_context)(struct net_device *, const u32 *indir,
+				    const u8 *key, const u8 hfunc,
+				    u32 *rss_context, bool delete);
 	void	(*get_channels)(struct net_device *, struct ethtool_channels *);
 	int	(*set_channels)(struct net_device *, struct ethtool_channels *);
 	int	(*get_dump_flag)(struct net_device *, struct ethtool_dump *);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 44a0b675a6bc..20da156aaf64 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -914,12 +914,15 @@ static inline __u64 ethtool_get_flow_spec_ring_vf(__u64 ring_cookie)
  * @flow_type: Type of flow to be affected, e.g. %TCP_V4_FLOW
  * @data: Command-dependent value
  * @fs: Flow classification rule
+ * @rss_context: RSS context to be affected
  * @rule_cnt: Number of rules to be affected
  * @rule_locs: Array of used rule locations
  *
  * For %ETHTOOL_GRXFH and %ETHTOOL_SRXFH, @data is a bitmask indicating
  * the fields included in the flow hash, e.g. %RXH_IP_SRC.  The following
- * structure fields must not be used.
+ * structure fields must not be used, except that if @flow_type includes
+ * the %FLOW_RSS flag, then @rss_context determines which RSS context to
+ * act on.
  *
  * For %ETHTOOL_GRXRINGS, @data is set to the number of RX rings/queues
  * on return.
@@ -931,7 +934,9 @@ static inline __u64 ethtool_get_flow_spec_ring_vf(__u64 ring_cookie)
  * set in @data then special location values should not be used.
  *
  * For %ETHTOOL_GRXCLSRULE, @fs.@location specifies the location of an
- * existing rule on entry and @fs contains the rule on return.
+ * existing rule on entry and @fs contains the rule on return; if
+ * @fs.@flow_type includes the %FLOW_RSS flag, then @rss_context is
+ * filled with the RSS context ID associated with the rule.
  *
  * For %ETHTOOL_GRXCLSRLALL, @rule_cnt specifies the array size of the
  * user buffer for @rule_locs on entry.  On return, @data is the size
@@ -942,7 +947,11 @@ static inline __u64 ethtool_get_flow_spec_ring_vf(__u64 ring_cookie)
  * For %ETHTOOL_SRXCLSRLINS, @fs specifies the rule to add or update.
  * @fs.@location either specifies the location to use or is a special
  * location value with %RX_CLS_LOC_SPECIAL flag set.  On return,
- * @fs.@location is the actual rule location.
+ * @fs.@location is the actual rule location.  If @fs.@flow_type
+ * includes the %FLOW_RSS flag, @rss_context is the RSS context ID to
+ * use for flow spreading traffic which matches this rule.  The value
+ * from the rxfh indirection table will be added to @fs.@ring_cookie
+ * to choose which ring to deliver to.
  *
  * For %ETHTOOL_SRXCLSRLDEL, @fs.@location specifies the location of an
  * existing rule on entry.
@@ -963,7 +972,10 @@ struct ethtool_rxnfc {
 	__u32				flow_type;
 	__u64				data;
 	struct ethtool_rx_flow_spec	fs;
-	__u32				rule_cnt;
+	union {
+		__u32			rule_cnt;
+		__u32			rss_context;
+	};
 	__u32				rule_locs[0];
 };
 
@@ -990,7 +1002,11 @@ struct ethtool_rxfh_indir {
 /**
  * struct ethtool_rxfh - command to get/set RX flow hash indir or/and hash key.
  * @cmd: Specific command number - %ETHTOOL_GRSSH or %ETHTOOL_SRSSH
- * @rss_context: RSS context identifier.
+ * @rss_context: RSS context identifier.  Context 0 is the default for normal
+ *	traffic; other contexts can be referenced as the destination for RX flow
+ *	classification rules.  %ETH_RXFH_CONTEXT_ALLOC is used with command
+ *	%ETHTOOL_SRSSH to allocate a new RSS context; on return this field will
+ *	contain the ID of the newly allocated context.
  * @indir_size: On entry, the array size of the user buffer for the
  *	indirection table, which may be zero, or (for %ETHTOOL_SRSSH),
  *	%ETH_RXFH_INDIR_NO_CHANGE.  On return from %ETHTOOL_GRSSH,
@@ -1009,7 +1025,8 @@ struct ethtool_rxfh_indir {
  * size should be returned.  For %ETHTOOL_SRSSH, an @indir_size of
  * %ETH_RXFH_INDIR_NO_CHANGE means that indir table setting is not requested
  * and a @indir_size of zero means the indir table should be reset to default
- * values. An hfunc of zero means that hash function setting is not requested.
+ * values (if @rss_context == 0) or that the RSS context should be deleted.
+ * An hfunc of zero means that hash function setting is not requested.
  */
 struct ethtool_rxfh {
 	__u32   cmd;
@@ -1021,6 +1038,7 @@ struct ethtool_rxfh {
 	__u32	rsvd32;
 	__u32   rss_config[0];
 };
+#define ETH_RXFH_CONTEXT_ALLOC		0xffffffff
 #define ETH_RXFH_INDIR_NO_CHANGE	0xffffffff
 
 /**
@@ -1635,6 +1653,8 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
 #define	FLOW_EXT	0x80000000
 #define	FLOW_MAC_EXT	0x40000000
+/* Flag to enable RSS spreading of traffic matching rule (nfc only) */
+#define	FLOW_RSS	0x20000000
 
 /* L3-L4 network traffic flow hash options */
 #define	RXH_L2DA	(1 << 1)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 107b122c8969..7c8685e47351 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1028,6 +1028,15 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
 	if (copy_from_user(&info, useraddr, info_size))
 		return -EFAULT;
 
+	/* If FLOW_RSS was requested then user-space must be using the
+	 * new definition, as FLOW_RSS is newer.
+	 */
+	if (cmd == ETHTOOL_GRXFH && info.flow_type & FLOW_RSS) {
+		info_size = sizeof(info);
+		if (copy_from_user(&info, useraddr, info_size))
+			return -EFAULT;
+	}
+
 	if (info.cmd == ETHTOOL_GRXCLSRLALL) {
 		if (info.rule_cnt > 0) {
 			if (info.rule_cnt <= KMALLOC_MAX_SIZE / sizeof(u32))
@@ -1257,9 +1266,11 @@ static noinline_for_stack int ethtool_get_rxfh(struct net_device *dev,
 	user_key_size = rxfh.key_size;
 
 	/* Check that reserved fields are 0 for now */
-	if (rxfh.rss_context || rxfh.rsvd8[0] || rxfh.rsvd8[1] ||
-	    rxfh.rsvd8[2] || rxfh.rsvd32)
+	if (rxfh.rsvd8[0] || rxfh.rsvd8[1] || rxfh.rsvd8[2] || rxfh.rsvd32)
 		return -EINVAL;
+	/* Most drivers don't handle rss_context, check it's 0 as well */
+	if (rxfh.rss_context && !ops->get_rxfh_context)
+		return -EOPNOTSUPP;
 
 	rxfh.indir_size = dev_indir_size;
 	rxfh.key_size = dev_key_size;
@@ -1282,7 +1293,12 @@ static noinline_for_stack int ethtool_get_rxfh(struct net_device *dev,
 	if (user_key_size)
 		hkey = rss_config + indir_bytes;
 
-	ret = dev->ethtool_ops->get_rxfh(dev, indir, hkey, &dev_hfunc);
+	if (rxfh.rss_context)
+		ret = dev->ethtool_ops->get_rxfh_context(dev, indir, hkey,
+							 &dev_hfunc,
+							 rxfh.rss_context);
+	else
+		ret = dev->ethtool_ops->get_rxfh(dev, indir, hkey, &dev_hfunc);
 	if (ret)
 		goto out;
 
@@ -1312,6 +1328,7 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
 	u8 *hkey = NULL;
 	u8 *rss_config;
 	u32 rss_cfg_offset = offsetof(struct ethtool_rxfh, rss_config[0]);
+	bool delete = false;
 
 	if (!ops->get_rxnfc || !ops->set_rxfh)
 		return -EOPNOTSUPP;
@@ -1325,9 +1342,11 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
 		return -EFAULT;
 
 	/* Check that reserved fields are 0 for now */
-	if (rxfh.rss_context || rxfh.rsvd8[0] || rxfh.rsvd8[1] ||
-	    rxfh.rsvd8[2] || rxfh.rsvd32)
+	if (rxfh.rsvd8[0] || rxfh.rsvd8[1] || rxfh.rsvd8[2] || rxfh.rsvd32)
 		return -EINVAL;
+	/* Most drivers don't handle rss_context, check it's 0 as well */
+	if (rxfh.rss_context && !ops->set_rxfh_context)
+		return -EOPNOTSUPP;
 
 	/* If either indir, hash key or function is valid, proceed further.
 	 * Must request at least one change: indir size, hash key or function.
@@ -1352,7 +1371,8 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
 	if (ret)
 		goto out;
 
-	/* rxfh.indir_size == 0 means reset the indir table to default.
+	/* rxfh.indir_size == 0 means reset the indir table to default (master
+	 * context) or delete the context (other RSS contexts).
 	 * rxfh.indir_size == ETH_RXFH_INDIR_NO_CHANGE means leave it unchanged.
 	 */
 	if (rxfh.indir_size &&
@@ -1365,9 +1385,13 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
 		if (ret)
 			goto out;
 	} else if (rxfh.indir_size == 0) {
-		indir = (u32 *)rss_config;
-		for (i = 0; i < dev_indir_size; i++)
-			indir[i] = ethtool_rxfh_indir_default(i, rx_rings.data);
+		if (rxfh.rss_context == 0) {
+			indir = (u32 *)rss_config;
+			for (i = 0; i < dev_indir_size; i++)
+				indir[i] = ethtool_rxfh_indir_default(i, rx_rings.data);
+		} else {
+			delete = true;
+		}
 	}
 
 	if (rxfh.key_size) {
@@ -1380,15 +1404,25 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
 		}
 	}
 
-	ret = ops->set_rxfh(dev, indir, hkey, rxfh.hfunc);
+	if (rxfh.rss_context)
+		ret = ops->set_rxfh_context(dev, indir, hkey, rxfh.hfunc,
+					    &rxfh.rss_context, delete);
+	else
+		ret = ops->set_rxfh(dev, indir, hkey, rxfh.hfunc);
 	if (ret)
 		goto out;
 
-	/* indicate whether rxfh was set to default */
-	if (rxfh.indir_size == 0)
-		dev->priv_flags &= ~IFF_RXFH_CONFIGURED;
-	else if (rxfh.indir_size != ETH_RXFH_INDIR_NO_CHANGE)
-		dev->priv_flags |= IFF_RXFH_CONFIGURED;
+	if (copy_to_user(useraddr + offsetof(struct ethtool_rxfh, rss_context),
+			 &rxfh.rss_context, sizeof(rxfh.rss_context)))
+		ret = -EFAULT;
+
+	if (!rxfh.rss_context) {
+		/* indicate whether rxfh was set to default */
+		if (rxfh.indir_size == 0)
+			dev->priv_flags &= ~IFF_RXFH_CONFIGURED;
+		else if (rxfh.indir_size != ETH_RXFH_INDIR_NO_CHANGE)
+			dev->priv_flags |= IFF_RXFH_CONFIGURED;
+	}
 
 out:
 	kfree(rss_config);


* [PATCH net-next 2/2] sfc: support RSS spreading of ethtool ntuple filters
  2018-02-27 17:59 [PATCH RESEND net-next 0/2] ntuple filters with RSS Edward Cree
  2018-02-27 18:02 ` [PATCH net-next 1/2] net: ethtool: extend RXNFC API to support RSS spreading of filter matches Edward Cree
@ 2018-02-27 18:03 ` Edward Cree
  2018-03-01 20:51   ` kbuild test robot
  2018-02-27 23:47 ` [PATCH RESEND net-next 0/2] ntuple filters with RSS Jakub Kicinski
  2018-03-01 18:36 ` David Miller
  3 siblings, 1 reply; 18+ messages in thread
From: Edward Cree @ 2018-02-27 18:03 UTC (permalink / raw)
  To: linux-net-drivers, David Miller; +Cc: netdev, John W. Linville

Use a linked list to associate user-facing context IDs with FW-facing
 context IDs (since the latter can change after an MC reset).

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c       | 273 ++++++++++++++++++++++------------
 drivers/net/ethernet/sfc/efx.c        |  65 +++++++-
 drivers/net/ethernet/sfc/efx.h        |  12 +-
 drivers/net/ethernet/sfc/ethtool.c    | 153 ++++++++++++++++---
 drivers/net/ethernet/sfc/farch.c      |  11 +-
 drivers/net/ethernet/sfc/filter.h     |   7 +-
 drivers/net/ethernet/sfc/net_driver.h |  44 +++++-
 drivers/net/ethernet/sfc/nic.h        |   2 -
 drivers/net/ethernet/sfc/siena.c      |  26 ++--
 9 files changed, 443 insertions(+), 150 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 75fbf58e421c..30d69bac6b8f 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -28,9 +28,6 @@ enum {
 	EFX_EF10_TEST = 1,
 	EFX_EF10_REFILL,
 };
-
-/* The reserved RSS context value */
-#define EFX_EF10_RSS_CONTEXT_INVALID	0xffffffff
 /* The maximum size of a shared RSS context */
 /* TODO: this should really be from the mcdi protocol export */
 #define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64UL
@@ -697,7 +694,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	}
 	nic_data->warm_boot_count = rc;
 
-	nic_data->rx_rss_context = EFX_EF10_RSS_CONTEXT_INVALID;
+	efx->rss_context.context_id = EFX_EF10_RSS_CONTEXT_INVALID;
 
 	nic_data->vport_id = EVB_PORT_ID_ASSIGNED;
 
@@ -1489,8 +1486,8 @@ static int efx_ef10_init_nic(struct efx_nic *efx)
 	}
 
 	/* don't fail init if RSS setup doesn't work */
-	rc = efx->type->rx_push_rss_config(efx, false, efx->rx_indir_table, NULL);
-	efx->rss_active = (rc == 0);
+	rc = efx->type->rx_push_rss_config(efx, false,
+					   efx->rss_context.rx_indir_table, NULL);
 
 	return 0;
 }
@@ -1507,7 +1504,7 @@ static void efx_ef10_reset_mc_allocations(struct efx_nic *efx)
 	nic_data->must_restore_filters = true;
 	nic_data->must_restore_piobufs = true;
 	efx_ef10_forget_old_piobufs(efx);
-	nic_data->rx_rss_context = EFX_EF10_RSS_CONTEXT_INVALID;
+	efx->rss_context.context_id = EFX_EF10_RSS_CONTEXT_INVALID;
 
 	/* Driver-created vswitches and vports must be re-created */
 	nic_data->must_probe_vswitching = true;
@@ -2703,27 +2700,30 @@ static int efx_ef10_get_rss_flags(struct efx_nic *efx, u32 context, u32 *flags)
  * Defaults are 4-tuple for TCP and 2-tuple for UDP and other-IP, so we
  * just need to set the UDP ports flags (for both IP versions).
  */
-static void efx_ef10_set_rss_flags(struct efx_nic *efx, u32 context)
+static void efx_ef10_set_rss_flags(struct efx_nic *efx,
+				   struct efx_rss_context *ctx)
 {
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_RSS_CONTEXT_SET_FLAGS_IN_LEN);
 	u32 flags;
 
 	BUILD_BUG_ON(MC_CMD_RSS_CONTEXT_SET_FLAGS_OUT_LEN != 0);
 
-	if (efx_ef10_get_rss_flags(efx, context, &flags) != 0)
+	if (efx_ef10_get_rss_flags(efx, ctx->context_id, &flags) != 0)
 		return;
-	MCDI_SET_DWORD(inbuf, RSS_CONTEXT_SET_FLAGS_IN_RSS_CONTEXT_ID, context);
+	MCDI_SET_DWORD(inbuf, RSS_CONTEXT_SET_FLAGS_IN_RSS_CONTEXT_ID,
+		       ctx->context_id);
 	flags |= RSS_MODE_HASH_PORTS << MC_CMD_RSS_CONTEXT_GET_FLAGS_OUT_UDP_IPV4_RSS_MODE_LBN;
 	flags |= RSS_MODE_HASH_PORTS << MC_CMD_RSS_CONTEXT_GET_FLAGS_OUT_UDP_IPV6_RSS_MODE_LBN;
 	MCDI_SET_DWORD(inbuf, RSS_CONTEXT_SET_FLAGS_IN_FLAGS, flags);
 	if (!efx_mcdi_rpc(efx, MC_CMD_RSS_CONTEXT_SET_FLAGS, inbuf, sizeof(inbuf),
 			  NULL, 0, NULL))
 		/* Succeeded, so UDP 4-tuple is now enabled */
-		efx->rx_hash_udp_4tuple = true;
+		ctx->rx_hash_udp_4tuple = true;
 }
 
-static int efx_ef10_alloc_rss_context(struct efx_nic *efx, u32 *context,
-				      bool exclusive, unsigned *context_size)
+static int efx_ef10_alloc_rss_context(struct efx_nic *efx, bool exclusive,
+				      struct efx_rss_context *ctx,
+				      unsigned *context_size)
 {
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_RSS_CONTEXT_ALLOC_IN_LEN);
 	MCDI_DECLARE_BUF(outbuf, MC_CMD_RSS_CONTEXT_ALLOC_OUT_LEN);
@@ -2739,7 +2739,7 @@ static int efx_ef10_alloc_rss_context(struct efx_nic *efx, u32 *context,
 				    EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE);
 
 	if (!exclusive && rss_spread == 1) {
-		*context = EFX_EF10_RSS_CONTEXT_INVALID;
+		ctx->context_id = EFX_EF10_RSS_CONTEXT_INVALID;
 		if (context_size)
 			*context_size = 1;
 		return 0;
@@ -2762,29 +2762,26 @@ static int efx_ef10_alloc_rss_context(struct efx_nic *efx, u32 *context,
 	if (outlen < MC_CMD_RSS_CONTEXT_ALLOC_OUT_LEN)
 		return -EIO;
 
-	*context = MCDI_DWORD(outbuf, RSS_CONTEXT_ALLOC_OUT_RSS_CONTEXT_ID);
+	ctx->context_id = MCDI_DWORD(outbuf, RSS_CONTEXT_ALLOC_OUT_RSS_CONTEXT_ID);
 
 	if (context_size)
 		*context_size = rss_spread;
 
 	if (nic_data->datapath_caps &
 	    1 << MC_CMD_GET_CAPABILITIES_OUT_ADDITIONAL_RSS_MODES_LBN)
-		efx_ef10_set_rss_flags(efx, *context);
+		efx_ef10_set_rss_flags(efx, ctx);
 
 	return 0;
 }
 
-static void efx_ef10_free_rss_context(struct efx_nic *efx, u32 context)
+static int efx_ef10_free_rss_context(struct efx_nic *efx, u32 context)
 {
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_RSS_CONTEXT_FREE_IN_LEN);
-	int rc;
 
 	MCDI_SET_DWORD(inbuf, RSS_CONTEXT_FREE_IN_RSS_CONTEXT_ID,
 		       context);
-
-	rc = efx_mcdi_rpc(efx, MC_CMD_RSS_CONTEXT_FREE, inbuf, sizeof(inbuf),
+	return efx_mcdi_rpc(efx, MC_CMD_RSS_CONTEXT_FREE, inbuf, sizeof(inbuf),
 			    NULL, 0, NULL);
-	WARN_ON(rc != 0);
 }
 
 static int efx_ef10_populate_rss_table(struct efx_nic *efx, u32 context,
@@ -2796,15 +2793,15 @@ static int efx_ef10_populate_rss_table(struct efx_nic *efx, u32 context,
 
 	MCDI_SET_DWORD(tablebuf, RSS_CONTEXT_SET_TABLE_IN_RSS_CONTEXT_ID,
 		       context);
-	BUILD_BUG_ON(ARRAY_SIZE(efx->rx_indir_table) !=
+	BUILD_BUG_ON(ARRAY_SIZE(efx->rss_context.rx_indir_table) !=
 		     MC_CMD_RSS_CONTEXT_SET_TABLE_IN_INDIRECTION_TABLE_LEN);
 
-	/* This iterates over the length of efx->rx_indir_table, but copies
-	 * bytes from rx_indir_table.  That's because the latter is a pointer
-	 * rather than an array, but should have the same length.
-	 * The efx->rx_hash_key loop below is similar.
+	/* This iterates over the length of efx->rss_context.rx_indir_table, but
+	 * copies bytes from rx_indir_table.  That's because the latter is a
+	 * pointer rather than an array, but should have the same length.
+	 * The efx->rss_context.rx_hash_key loop below is similar.
 	 */
-	for (i = 0; i < ARRAY_SIZE(efx->rx_indir_table); ++i)
+	for (i = 0; i < ARRAY_SIZE(efx->rss_context.rx_indir_table); ++i)
 		MCDI_PTR(tablebuf,
 			 RSS_CONTEXT_SET_TABLE_IN_INDIRECTION_TABLE)[i] =
 				(u8) rx_indir_table[i];
@@ -2816,9 +2813,9 @@ static int efx_ef10_populate_rss_table(struct efx_nic *efx, u32 context,
 
 	MCDI_SET_DWORD(keybuf, RSS_CONTEXT_SET_KEY_IN_RSS_CONTEXT_ID,
 		       context);
-	BUILD_BUG_ON(ARRAY_SIZE(efx->rx_hash_key) !=
+	BUILD_BUG_ON(ARRAY_SIZE(efx->rss_context.rx_hash_key) !=
 		     MC_CMD_RSS_CONTEXT_SET_KEY_IN_TOEPLITZ_KEY_LEN);
-	for (i = 0; i < ARRAY_SIZE(efx->rx_hash_key); ++i)
+	for (i = 0; i < ARRAY_SIZE(efx->rss_context.rx_hash_key); ++i)
 		MCDI_PTR(keybuf, RSS_CONTEXT_SET_KEY_IN_TOEPLITZ_KEY)[i] = key[i];
 
 	return efx_mcdi_rpc(efx, MC_CMD_RSS_CONTEXT_SET_KEY, keybuf,
@@ -2827,27 +2824,27 @@ static int efx_ef10_populate_rss_table(struct efx_nic *efx, u32 context,
 
 static void efx_ef10_rx_free_indir_table(struct efx_nic *efx)
 {
-	struct efx_ef10_nic_data *nic_data = efx->nic_data;
+	int rc;
 
-	if (nic_data->rx_rss_context != EFX_EF10_RSS_CONTEXT_INVALID)
-		efx_ef10_free_rss_context(efx, nic_data->rx_rss_context);
-	nic_data->rx_rss_context = EFX_EF10_RSS_CONTEXT_INVALID;
+	if (efx->rss_context.context_id != EFX_EF10_RSS_CONTEXT_INVALID) {
+		rc = efx_ef10_free_rss_context(efx, efx->rss_context.context_id);
+		WARN_ON(rc != 0);
+	}
+	efx->rss_context.context_id = EFX_EF10_RSS_CONTEXT_INVALID;
 }
 
 static int efx_ef10_rx_push_shared_rss_config(struct efx_nic *efx,
 					      unsigned *context_size)
 {
-	u32 new_rx_rss_context;
 	struct efx_ef10_nic_data *nic_data = efx->nic_data;
-	int rc = efx_ef10_alloc_rss_context(efx, &new_rx_rss_context,
-					    false, context_size);
+	int rc = efx_ef10_alloc_rss_context(efx, false, &efx->rss_context,
+					    context_size);
 
 	if (rc != 0)
 		return rc;
 
-	nic_data->rx_rss_context = new_rx_rss_context;
 	nic_data->rx_rss_context_exclusive = false;
-	efx_set_default_rx_indir_table(efx);
+	efx_set_default_rx_indir_table(efx, &efx->rss_context);
 	return 0;
 }
 
@@ -2855,50 +2852,79 @@ static int efx_ef10_rx_push_exclusive_rss_config(struct efx_nic *efx,
 						 const u32 *rx_indir_table,
 						 const u8 *key)
 {
+	u32 old_rx_rss_context = efx->rss_context.context_id;
 	struct efx_ef10_nic_data *nic_data = efx->nic_data;
 	int rc;
-	u32 new_rx_rss_context;
 
-	if (nic_data->rx_rss_context == EFX_EF10_RSS_CONTEXT_INVALID ||
+	if (efx->rss_context.context_id == EFX_EF10_RSS_CONTEXT_INVALID ||
 	    !nic_data->rx_rss_context_exclusive) {
-		rc = efx_ef10_alloc_rss_context(efx, &new_rx_rss_context,
-						true, NULL);
+		rc = efx_ef10_alloc_rss_context(efx, true, &efx->rss_context,
+						NULL);
 		if (rc == -EOPNOTSUPP)
 			return rc;
 		else if (rc != 0)
 			goto fail1;
-	} else {
-		new_rx_rss_context = nic_data->rx_rss_context;
 	}
 
-	rc = efx_ef10_populate_rss_table(efx, new_rx_rss_context,
+	rc = efx_ef10_populate_rss_table(efx, efx->rss_context.context_id,
 					 rx_indir_table, key);
 	if (rc != 0)
 		goto fail2;
 
-	if (nic_data->rx_rss_context != new_rx_rss_context)
-		efx_ef10_rx_free_indir_table(efx);
-	nic_data->rx_rss_context = new_rx_rss_context;
+	if (efx->rss_context.context_id != old_rx_rss_context &&
+	    old_rx_rss_context != EFX_EF10_RSS_CONTEXT_INVALID)
+		WARN_ON(efx_ef10_free_rss_context(efx, old_rx_rss_context) != 0);
 	nic_data->rx_rss_context_exclusive = true;
-	if (rx_indir_table != efx->rx_indir_table)
-		memcpy(efx->rx_indir_table, rx_indir_table,
-		       sizeof(efx->rx_indir_table));
-	if (key != efx->rx_hash_key)
-		memcpy(efx->rx_hash_key, key, efx->type->rx_hash_key_size);
+	if (rx_indir_table != efx->rss_context.rx_indir_table)
+		memcpy(efx->rss_context.rx_indir_table, rx_indir_table,
+		       sizeof(efx->rss_context.rx_indir_table));
+	if (key != efx->rss_context.rx_hash_key)
+		memcpy(efx->rss_context.rx_hash_key, key,
+		       efx->type->rx_hash_key_size);
 
 	return 0;
 
 fail2:
-	if (new_rx_rss_context != nic_data->rx_rss_context)
-		efx_ef10_free_rss_context(efx, new_rx_rss_context);
+	if (old_rx_rss_context != efx->rss_context.context_id) {
+		WARN_ON(efx_ef10_free_rss_context(efx, efx->rss_context.context_id) != 0);
+		efx->rss_context.context_id = old_rx_rss_context;
+	}
 fail1:
 	netif_err(efx, hw, efx->net_dev, "%s: failed rc=%d\n", __func__, rc);
 	return rc;
 }
 
-static int efx_ef10_rx_pull_rss_config(struct efx_nic *efx)
+static int efx_ef10_rx_push_rss_context_config(struct efx_nic *efx,
+					       struct efx_rss_context *ctx,
+					       const u32 *rx_indir_table,
+					       const u8 *key)
+{
+	int rc;
+
+	if (ctx->context_id == EFX_EF10_RSS_CONTEXT_INVALID) {
+		rc = efx_ef10_alloc_rss_context(efx, true, ctx, NULL);
+		if (rc)
+			return rc;
+	}
+
+	if (!rx_indir_table) /* Delete this context */
+		return efx_ef10_free_rss_context(efx, ctx->context_id);
+
+	rc = efx_ef10_populate_rss_table(efx, ctx->context_id,
+					 rx_indir_table, key);
+	if (rc)
+		return rc;
+
+	memcpy(ctx->rx_indir_table, rx_indir_table,
+	       sizeof(efx->rss_context.rx_indir_table));
+	memcpy(ctx->rx_hash_key, key, efx->type->rx_hash_key_size);
+
+	return 0;
+}
+
+static int efx_ef10_rx_pull_rss_context_config(struct efx_nic *efx,
+					       struct efx_rss_context *ctx)
 {
-	struct efx_ef10_nic_data *nic_data = efx->nic_data;
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_RSS_CONTEXT_GET_TABLE_IN_LEN);
 	MCDI_DECLARE_BUF(tablebuf, MC_CMD_RSS_CONTEXT_GET_TABLE_OUT_LEN);
 	MCDI_DECLARE_BUF(keybuf, MC_CMD_RSS_CONTEXT_GET_KEY_OUT_LEN);
@@ -2908,12 +2934,12 @@ static int efx_ef10_rx_pull_rss_config(struct efx_nic *efx)
 	BUILD_BUG_ON(MC_CMD_RSS_CONTEXT_GET_TABLE_IN_LEN !=
 		     MC_CMD_RSS_CONTEXT_GET_KEY_IN_LEN);
 
-	if (nic_data->rx_rss_context == EFX_EF10_RSS_CONTEXT_INVALID)
+	if (ctx->context_id == EFX_EF10_RSS_CONTEXT_INVALID)
 		return -ENOENT;
 
 	MCDI_SET_DWORD(inbuf, RSS_CONTEXT_GET_TABLE_IN_RSS_CONTEXT_ID,
-		       nic_data->rx_rss_context);
-	BUILD_BUG_ON(ARRAY_SIZE(efx->rx_indir_table) !=
+		       ctx->context_id);
+	BUILD_BUG_ON(ARRAY_SIZE(ctx->rx_indir_table) !=
 		     MC_CMD_RSS_CONTEXT_GET_TABLE_OUT_INDIRECTION_TABLE_LEN);
 	rc = efx_mcdi_rpc(efx, MC_CMD_RSS_CONTEXT_GET_TABLE, inbuf, sizeof(inbuf),
 			  tablebuf, sizeof(tablebuf), &outlen);
@@ -2923,13 +2949,13 @@ static int efx_ef10_rx_pull_rss_config(struct efx_nic *efx)
 	if (WARN_ON(outlen != MC_CMD_RSS_CONTEXT_GET_TABLE_OUT_LEN))
 		return -EIO;
 
-	for (i = 0; i < ARRAY_SIZE(efx->rx_indir_table); i++)
-		efx->rx_indir_table[i] = MCDI_PTR(tablebuf,
+	for (i = 0; i < ARRAY_SIZE(ctx->rx_indir_table); i++)
+		ctx->rx_indir_table[i] = MCDI_PTR(tablebuf,
 				RSS_CONTEXT_GET_TABLE_OUT_INDIRECTION_TABLE)[i];
 
 	MCDI_SET_DWORD(inbuf, RSS_CONTEXT_GET_KEY_IN_RSS_CONTEXT_ID,
-		       nic_data->rx_rss_context);
-	BUILD_BUG_ON(ARRAY_SIZE(efx->rx_hash_key) !=
+		       ctx->context_id);
+	BUILD_BUG_ON(ARRAY_SIZE(ctx->rx_hash_key) !=
 		     MC_CMD_RSS_CONTEXT_SET_KEY_IN_TOEPLITZ_KEY_LEN);
 	rc = efx_mcdi_rpc(efx, MC_CMD_RSS_CONTEXT_GET_KEY, inbuf, sizeof(inbuf),
 			  keybuf, sizeof(keybuf), &outlen);
@@ -2939,13 +2965,38 @@ static int efx_ef10_rx_pull_rss_config(struct efx_nic *efx)
 	if (WARN_ON(outlen != MC_CMD_RSS_CONTEXT_GET_KEY_OUT_LEN))
 		return -EIO;
 
-	for (i = 0; i < ARRAY_SIZE(efx->rx_hash_key); ++i)
-		efx->rx_hash_key[i] = MCDI_PTR(
+	for (i = 0; i < ARRAY_SIZE(ctx->rx_hash_key); ++i)
+		ctx->rx_hash_key[i] = MCDI_PTR(
 				keybuf, RSS_CONTEXT_GET_KEY_OUT_TOEPLITZ_KEY)[i];
 
 	return 0;
 }
 
+static int efx_ef10_rx_pull_rss_config(struct efx_nic *efx)
+{
+	return efx_ef10_rx_pull_rss_context_config(efx, &efx->rss_context);
+}
+
+static void efx_ef10_rx_restore_rss_contexts(struct efx_nic *efx)
+{
+	struct efx_rss_context *ctx;
+	int rc;
+
+	list_for_each_entry(ctx, &efx->rss_context.list, list) {
+		/* previous NIC RSS context is gone */
+		ctx->context_id = EFX_EF10_RSS_CONTEXT_INVALID;
+		/* so try to allocate a new one */
+		rc = efx_ef10_rx_push_rss_context_config(efx, ctx,
+							 ctx->rx_indir_table,
+							 ctx->rx_hash_key);
+		if (rc)
+			netif_warn(efx, probe, efx->net_dev,
+				   "failed to restore RSS context %u, rc=%d"
+				   "; RSS filters may fail to be applied\n",
+				   ctx->user_id, rc);
+	}
+}
+
 static int efx_ef10_pf_rx_push_rss_config(struct efx_nic *efx, bool user,
 					  const u32 *rx_indir_table,
 					  const u8 *key)
@@ -2956,7 +3007,7 @@ static int efx_ef10_pf_rx_push_rss_config(struct efx_nic *efx, bool user,
 		return 0;
 
 	if (!key)
-		key = efx->rx_hash_key;
+		key = efx->rss_context.rx_hash_key;
 
 	rc = efx_ef10_rx_push_exclusive_rss_config(efx, rx_indir_table, key);
 
@@ -2965,7 +3016,8 @@ static int efx_ef10_pf_rx_push_rss_config(struct efx_nic *efx, bool user,
 		bool mismatch = false;
 		size_t i;
 
-		for (i = 0; i < ARRAY_SIZE(efx->rx_indir_table) && !mismatch;
+		for (i = 0;
+		     i < ARRAY_SIZE(efx->rss_context.rx_indir_table) && !mismatch;
 		     i++)
 			mismatch = rx_indir_table[i] !=
 				ethtool_rxfh_indir_default(i, efx->rss_spread);
@@ -3000,11 +3052,9 @@ static int efx_ef10_vf_rx_push_rss_config(struct efx_nic *efx, bool user,
 					  const u8 *key
 					  __attribute__ ((unused)))
 {
-	struct efx_ef10_nic_data *nic_data = efx->nic_data;
-
 	if (user)
 		return -EOPNOTSUPP;
-	if (nic_data->rx_rss_context != EFX_EF10_RSS_CONTEXT_INVALID)
+	if (efx->rss_context.context_id != EFX_EF10_RSS_CONTEXT_INVALID)
 		return 0;
 	return efx_ef10_rx_push_shared_rss_config(efx, NULL);
 }
@@ -4109,6 +4159,7 @@ efx_ef10_filter_push_prep_set_match_fields(struct efx_nic *efx,
 static void efx_ef10_filter_push_prep(struct efx_nic *efx,
 				      const struct efx_filter_spec *spec,
 				      efx_dword_t *inbuf, u64 handle,
+				      struct efx_rss_context *ctx,
 				      bool replacing)
 {
 	struct efx_ef10_nic_data *nic_data = efx->nic_data;
@@ -4116,11 +4167,16 @@ static void efx_ef10_filter_push_prep(struct efx_nic *efx,
 
 	memset(inbuf, 0, MC_CMD_FILTER_OP_EXT_IN_LEN);
 
-	/* Remove RSS flag if we don't have an RSS context. */
-	if (flags & EFX_FILTER_FLAG_RX_RSS &&
-	    spec->rss_context == EFX_FILTER_RSS_CONTEXT_DEFAULT &&
-	    nic_data->rx_rss_context == EFX_EF10_RSS_CONTEXT_INVALID)
-		flags &= ~EFX_FILTER_FLAG_RX_RSS;
+	/* If RSS filter, caller better have given us an RSS context */
+	if (flags & EFX_FILTER_FLAG_RX_RSS) {
+		/* We don't have the ability to return an error, so we'll just
+		 * log a warning and disable RSS for the filter.
+		 */
+		if (WARN_ON_ONCE(!ctx))
+			flags &= ~EFX_FILTER_FLAG_RX_RSS;
+		else if (WARN_ON_ONCE(ctx->context_id == EFX_EF10_RSS_CONTEXT_INVALID))
+			flags &= ~EFX_FILTER_FLAG_RX_RSS;
+	}
 
 	if (replacing) {
 		MCDI_SET_DWORD(inbuf, FILTER_OP_IN_OP,
@@ -4146,21 +4202,18 @@ static void efx_ef10_filter_push_prep(struct efx_nic *efx,
 		       MC_CMD_FILTER_OP_IN_RX_MODE_RSS :
 		       MC_CMD_FILTER_OP_IN_RX_MODE_SIMPLE);
 	if (flags & EFX_FILTER_FLAG_RX_RSS)
-		MCDI_SET_DWORD(inbuf, FILTER_OP_IN_RX_CONTEXT,
-			       spec->rss_context !=
-			       EFX_FILTER_RSS_CONTEXT_DEFAULT ?
-			       spec->rss_context : nic_data->rx_rss_context);
+		MCDI_SET_DWORD(inbuf, FILTER_OP_IN_RX_CONTEXT, ctx->context_id);
 }
 
 static int efx_ef10_filter_push(struct efx_nic *efx,
-				const struct efx_filter_spec *spec,
-				u64 *handle, bool replacing)
+				const struct efx_filter_spec *spec, u64 *handle,
+				struct efx_rss_context *ctx, bool replacing)
 {
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_FILTER_OP_EXT_IN_LEN);
 	MCDI_DECLARE_BUF(outbuf, MC_CMD_FILTER_OP_EXT_OUT_LEN);
 	int rc;
 
-	efx_ef10_filter_push_prep(efx, spec, inbuf, *handle, replacing);
+	efx_ef10_filter_push_prep(efx, spec, inbuf, *handle, ctx, replacing);
 	rc = efx_mcdi_rpc(efx, MC_CMD_FILTER_OP, inbuf, sizeof(inbuf),
 			  outbuf, sizeof(outbuf), NULL);
 	if (rc == 0)
@@ -4253,6 +4306,7 @@ static s32 efx_ef10_filter_insert(struct efx_nic *efx,
 	DECLARE_BITMAP(mc_rem_map, EFX_EF10_FILTER_SEARCH_LIMIT);
 	struct efx_filter_spec *saved_spec;
 	unsigned int match_pri, hash;
+	struct efx_rss_context *ctx;
 	unsigned int priv_flags;
 	bool replacing = false;
 	int ins_index = -1;
@@ -4275,6 +4329,18 @@ static s32 efx_ef10_filter_insert(struct efx_nic *efx,
 	if (is_mc_recip)
 		bitmap_zero(mc_rem_map, EFX_EF10_FILTER_SEARCH_LIMIT);
 
+	if (spec->flags & EFX_FILTER_FLAG_RX_RSS) {
+		if (spec->rss_context)
+			ctx = efx_find_rss_context_entry(spec->rss_context,
+							 &efx->rss_context.list);
+		else
+			ctx = &efx->rss_context;
+		if (!ctx)
+			return -ENOENT;
+		if (ctx->context_id == EFX_EF10_RSS_CONTEXT_INVALID)
+			return -EOPNOTSUPP;
+	}
+
 	/* Find any existing filters with the same match tuple or
 	 * else a free slot to insert at.  If any of them are busy,
 	 * we have to wait and retry.
@@ -4390,7 +4456,7 @@ static s32 efx_ef10_filter_insert(struct efx_nic *efx,
 	spin_unlock_bh(&efx->filter_lock);
 
 	rc = efx_ef10_filter_push(efx, spec, &table->entry[ins_index].handle,
-				  replacing);
+				  ctx, replacing);
 
 	/* Finalise the software table entry */
 	spin_lock_bh(&efx->filter_lock);
@@ -4534,12 +4600,13 @@ static int efx_ef10_filter_remove_internal(struct efx_nic *efx,
 
 		new_spec.priority = EFX_FILTER_PRI_AUTO;
 		new_spec.flags = (EFX_FILTER_FLAG_RX |
-				  (efx_rss_enabled(efx) ?
+				  (efx_rss_active(&efx->rss_context) ?
 				   EFX_FILTER_FLAG_RX_RSS : 0));
 		new_spec.dmaq_id = 0;
-		new_spec.rss_context = EFX_FILTER_RSS_CONTEXT_DEFAULT;
+		new_spec.rss_context = 0;
 		rc = efx_ef10_filter_push(efx, &new_spec,
 					  &table->entry[filter_idx].handle,
+					  &efx->rss_context,
 					  true);
 
 		spin_lock_bh(&efx->filter_lock);
@@ -4783,7 +4850,8 @@ static s32 efx_ef10_filter_rfs_insert(struct efx_nic *efx,
 	cookie = replacing << 31 | ins_index << 16 | spec->dmaq_id;
 
 	efx_ef10_filter_push_prep(efx, spec, inbuf,
-				  table->entry[ins_index].handle, replacing);
+				  table->entry[ins_index].handle, NULL,
+				  replacing);
 	efx_mcdi_rpc_async(efx, MC_CMD_FILTER_OP, inbuf, sizeof(inbuf),
 			   MC_CMD_FILTER_OP_OUT_LEN,
 			   efx_ef10_filter_rfs_insert_complete, cookie);
@@ -5104,6 +5172,7 @@ static void efx_ef10_filter_table_restore(struct efx_nic *efx)
 	unsigned int invalid_filters = 0, failed = 0;
 	struct efx_ef10_filter_vlan *vlan;
 	struct efx_filter_spec *spec;
+	struct efx_rss_context *ctx;
 	unsigned int filter_idx;
 	u32 mcdi_flags;
 	int match_pri;
@@ -5133,17 +5202,34 @@ static void efx_ef10_filter_table_restore(struct efx_nic *efx)
 			invalid_filters++;
 			goto not_restored;
 		}
-		if (spec->rss_context != EFX_FILTER_RSS_CONTEXT_DEFAULT &&
-		    spec->rss_context != nic_data->rx_rss_context)
-			netif_warn(efx, drv, efx->net_dev,
-				   "Warning: unable to restore a filter with specific RSS context.\n");
+		if (spec->rss_context)
+			ctx = efx_find_rss_context_entry(spec->rss_context,
+							 &efx->rss_context.list);
+		else
+			ctx = &efx->rss_context;
+		if (spec->flags & EFX_FILTER_FLAG_RX_RSS) {
+			if (!ctx) {
+				netif_warn(efx, drv, efx->net_dev,
+					   "Warning: unable to restore a filter with nonexistent RSS context %u.\n",
+					   spec->rss_context);
+				invalid_filters++;
+				goto not_restored;
+			}
+			if (ctx->context_id == EFX_EF10_RSS_CONTEXT_INVALID) {
+				netif_warn(efx, drv, efx->net_dev,
+					   "Warning: unable to restore a filter with RSS context %u as it was not created.\n",
+					   spec->rss_context);
+				invalid_filters++;
+				goto not_restored;
+			}
+		}
 
 		table->entry[filter_idx].spec |= EFX_EF10_FILTER_FLAG_BUSY;
 		spin_unlock_bh(&efx->filter_lock);
 
 		rc = efx_ef10_filter_push(efx, spec,
 					  &table->entry[filter_idx].handle,
-					  false);
+					  ctx, false);
 		if (rc)
 			failed++;
 		spin_lock_bh(&efx->filter_lock);
@@ -6784,6 +6870,9 @@ const struct efx_nic_type efx_hunt_a0_nic_type = {
 	.tx_limit_len = efx_ef10_tx_limit_len,
 	.rx_push_rss_config = efx_ef10_pf_rx_push_rss_config,
 	.rx_pull_rss_config = efx_ef10_rx_pull_rss_config,
+	.rx_push_rss_context_config = efx_ef10_rx_push_rss_context_config,
+	.rx_pull_rss_context_config = efx_ef10_rx_pull_rss_context_config,
+	.rx_restore_rss_contexts = efx_ef10_rx_restore_rss_contexts,
 	.rx_probe = efx_ef10_rx_probe,
 	.rx_init = efx_ef10_rx_init,
 	.rx_remove = efx_ef10_rx_remove,
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 16757cfc5b29..7321a4cf6f4d 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1353,12 +1353,13 @@ static void efx_fini_io(struct efx_nic *efx)
 		pci_disable_device(efx->pci_dev);
 }
 
-void efx_set_default_rx_indir_table(struct efx_nic *efx)
+void efx_set_default_rx_indir_table(struct efx_nic *efx,
+				    struct efx_rss_context *ctx)
 {
 	size_t i;
 
-	for (i = 0; i < ARRAY_SIZE(efx->rx_indir_table); i++)
-		efx->rx_indir_table[i] =
+	for (i = 0; i < ARRAY_SIZE(ctx->rx_indir_table); i++)
+		ctx->rx_indir_table[i] =
 			ethtool_rxfh_indir_default(i, efx->rss_spread);
 }
 
@@ -1739,9 +1740,9 @@ static int efx_probe_nic(struct efx_nic *efx)
 	} while (rc == -EAGAIN);
 
 	if (efx->n_channels > 1)
-		netdev_rss_key_fill(&efx->rx_hash_key,
-				    sizeof(efx->rx_hash_key));
-	efx_set_default_rx_indir_table(efx);
+		netdev_rss_key_fill(efx->rss_context.rx_hash_key,
+				    sizeof(efx->rss_context.rx_hash_key));
+	efx_set_default_rx_indir_table(efx, &efx->rss_context);
 
 	netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	netif_set_real_num_rx_queues(efx->net_dev, efx->n_rx_channels);
@@ -2700,6 +2701,8 @@ int efx_reset_up(struct efx_nic *efx, enum reset_type method, bool ok)
 			   " VFs may not function\n", rc);
 #endif
 
+	if (efx->type->rx_restore_rss_contexts)
+		efx->type->rx_restore_rss_contexts(efx);
 	down_read(&efx->filter_sem);
 	efx_restore_filters(efx);
 	up_read(&efx->filter_sem);
@@ -3003,6 +3006,7 @@ static int efx_init_struct(struct efx_nic *efx,
 		efx->type->rx_hash_offset - efx->type->rx_prefix_size;
 	efx->rx_packet_ts_offset =
 		efx->type->rx_ts_offset - efx->type->rx_prefix_size;
+	INIT_LIST_HEAD(&efx->rss_context.list);
 	spin_lock_init(&efx->stats_lock);
 	efx->vi_stride = EFX_DEFAULT_VI_STRIDE;
 	efx->num_mac_stats = MC_CMD_MAC_NSTATS;
@@ -3072,6 +3076,55 @@ void efx_update_sw_stats(struct efx_nic *efx, u64 *stats)
 	stats[GENERIC_STAT_rx_noskb_drops] = atomic_read(&efx->n_rx_noskb_drops);
 }
 
+/* RSS contexts.  We're using linked lists and crappy O(n) algorithms, because
+ * (a) this is an infrequent control-plane operation and (b) n is small (max 64)
+ */
+struct efx_rss_context *efx_alloc_rss_context_entry(struct list_head *head)
+{
+	struct efx_rss_context *ctx, *new;
+	u32 id = 1; /* Don't use zero, that refers to the master RSS context */
+
+	/* Search for first gap in the numbering */
+	list_for_each_entry(ctx, head, list) {
+		if (ctx->user_id != id)
+			break;
+		id++;
+		/* Check for wrap.  If this happens, we have nearly 2^32
+		 * allocated RSS contexts, which seems unlikely.
+		 */
+		if (WARN_ON_ONCE(!id))
+			return NULL;
+	}
+
+	/* Create the new entry */
+	new = kmalloc(sizeof(struct efx_rss_context), GFP_KERNEL);
+	if (!new)
+		return NULL;
+	new->context_id = EFX_EF10_RSS_CONTEXT_INVALID;
+	new->rx_hash_udp_4tuple = false;
+
+	/* Insert the new entry into the gap */
+	new->user_id = id;
+	list_add_tail(&new->list, &ctx->list);
+	return new;
+}
+
+struct efx_rss_context *efx_find_rss_context_entry(u32 id, struct list_head *head)
+{
+	struct efx_rss_context *ctx;
+
+	list_for_each_entry(ctx, head, list)
+		if (ctx->user_id == id)
+			return ctx;
+	return NULL;
+}
+
+void efx_free_rss_context_entry(struct efx_rss_context *ctx)
+{
+	list_del(&ctx->list);
+	kfree(ctx);
+}
+
 /**************************************************************************
  *
  * PCI interface
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 0cddc5ad77b1..3429ae3f3b08 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -34,7 +34,8 @@ extern unsigned int efx_piobuf_size;
 extern bool efx_separate_tx_channels;
 
 /* RX */
-void efx_set_default_rx_indir_table(struct efx_nic *efx);
+void efx_set_default_rx_indir_table(struct efx_nic *efx,
+				    struct efx_rss_context *ctx);
 void efx_rx_config_page_split(struct efx_nic *efx);
 int efx_probe_rx_queue(struct efx_rx_queue *rx_queue);
 void efx_remove_rx_queue(struct efx_rx_queue *rx_queue);
@@ -182,6 +183,15 @@ static inline void efx_filter_rfs_expire(struct efx_channel *channel) {}
 #endif
 bool efx_filter_is_mc_recipient(const struct efx_filter_spec *spec);
 
+/* RSS contexts */
+struct efx_rss_context *efx_alloc_rss_context_entry(struct list_head *list);
+struct efx_rss_context *efx_find_rss_context_entry(u32 id, struct list_head *list);
+void efx_free_rss_context_entry(struct efx_rss_context *ctx);
+static inline bool efx_rss_active(struct efx_rss_context *ctx)
+{
+	return ctx->context_id != EFX_EF10_RSS_CONTEXT_INVALID;
+}
+
 /* Channels */
 int efx_channel_dummy_op_int(struct efx_channel *channel);
 void efx_channel_dummy_op_void(struct efx_channel *channel);
diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index 4db2dc2bf52f..64049e71e6e7 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -808,7 +808,8 @@ static inline void ip6_fill_mask(__be32 *mask)
 }
 
 static int efx_ethtool_get_class_rule(struct efx_nic *efx,
-				      struct ethtool_rx_flow_spec *rule)
+				      struct ethtool_rx_flow_spec *rule,
+				      u32 *rss_context)
 {
 	struct ethtool_tcpip4_spec *ip_entry = &rule->h_u.tcp_ip4_spec;
 	struct ethtool_tcpip4_spec *ip_mask = &rule->m_u.tcp_ip4_spec;
@@ -964,6 +965,11 @@ static int efx_ethtool_get_class_rule(struct efx_nic *efx,
 		rule->m_ext.vlan_tci = htons(0xfff);
 	}
 
+	if (spec.flags & EFX_FILTER_FLAG_RX_RSS) {
+		rule->flow_type |= FLOW_RSS;
+		*rss_context = spec.rss_context;
+	}
+
 	return rc;
 }
 
@@ -972,6 +978,8 @@ efx_ethtool_get_rxnfc(struct net_device *net_dev,
 		      struct ethtool_rxnfc *info, u32 *rule_locs)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
+	u32 rss_context = 0;
+	s32 rc;
 
 	switch (info->cmd) {
 	case ETHTOOL_GRXRINGS:
@@ -979,12 +987,20 @@ efx_ethtool_get_rxnfc(struct net_device *net_dev,
 		return 0;
 
 	case ETHTOOL_GRXFH: {
+		struct efx_rss_context *ctx = &efx->rss_context;
+
+		if (info->flow_type & FLOW_RSS && info->rss_context) {
+			ctx = efx_find_rss_context_entry(info->rss_context,
+							 &efx->rss_context.list);
+			if (!ctx)
+				return -ENOENT;
+		}
 		info->data = 0;
-		if (!efx->rss_active) /* No RSS */
+		if (!efx_rss_active(ctx)) /* No RSS */
 			return 0;
-		switch (info->flow_type) {
+		switch (info->flow_type & ~FLOW_RSS) {
 		case UDP_V4_FLOW:
-			if (efx->rx_hash_udp_4tuple)
+			if (ctx->rx_hash_udp_4tuple)
 				/* fall through */
 		case TCP_V4_FLOW:
 				info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
@@ -995,7 +1011,7 @@ efx_ethtool_get_rxnfc(struct net_device *net_dev,
 			info->data |= RXH_IP_SRC | RXH_IP_DST;
 			break;
 		case UDP_V6_FLOW:
-			if (efx->rx_hash_udp_4tuple)
+			if (ctx->rx_hash_udp_4tuple)
 				/* fall through */
 		case TCP_V6_FLOW:
 				info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
@@ -1023,10 +1039,14 @@ efx_ethtool_get_rxnfc(struct net_device *net_dev,
 	case ETHTOOL_GRXCLSRULE:
 		if (efx_filter_get_rx_id_limit(efx) == 0)
 			return -EOPNOTSUPP;
-		return efx_ethtool_get_class_rule(efx, &info->fs);
+		rc = efx_ethtool_get_class_rule(efx, &info->fs, &rss_context);
+		if (rc < 0)
+			return rc;
+		if (info->fs.flow_type & FLOW_RSS)
+			info->rss_context = rss_context;
+		return 0;
 
-	case ETHTOOL_GRXCLSRLALL: {
-		s32 rc;
+	case ETHTOOL_GRXCLSRLALL:
 		info->data = efx_filter_get_rx_id_limit(efx);
 		if (info->data == 0)
 			return -EOPNOTSUPP;
@@ -1036,7 +1056,6 @@ efx_ethtool_get_rxnfc(struct net_device *net_dev,
 			return rc;
 		info->rule_cnt = rc;
 		return 0;
-	}
 
 	default:
 		return -EOPNOTSUPP;
@@ -1054,7 +1073,8 @@ static inline bool ip6_mask_is_empty(__be32 mask[4])
 }
 
 static int efx_ethtool_set_class_rule(struct efx_nic *efx,
-				      struct ethtool_rx_flow_spec *rule)
+				      struct ethtool_rx_flow_spec *rule,
+				      u32 rss_context)
 {
 	struct ethtool_tcpip4_spec *ip_entry = &rule->h_u.tcp_ip4_spec;
 	struct ethtool_tcpip4_spec *ip_mask = &rule->m_u.tcp_ip4_spec;
@@ -1066,6 +1086,7 @@ static int efx_ethtool_set_class_rule(struct efx_nic *efx,
 	struct ethtool_usrip6_spec *uip6_mask = &rule->m_u.usr_ip6_spec;
 	struct ethhdr *mac_entry = &rule->h_u.ether_spec;
 	struct ethhdr *mac_mask = &rule->m_u.ether_spec;
+	enum efx_filter_flags flags = 0;
 	struct efx_filter_spec spec;
 	int rc;
 
@@ -1084,12 +1105,19 @@ static int efx_ethtool_set_class_rule(struct efx_nic *efx,
 	     rule->m_ext.data[1]))
 		return -EINVAL;
 
-	efx_filter_init_rx(&spec, EFX_FILTER_PRI_MANUAL,
-			   efx->rx_scatter ? EFX_FILTER_FLAG_RX_SCATTER : 0,
+	if (efx->rx_scatter)
+		flags |= EFX_FILTER_FLAG_RX_SCATTER;
+	if (rule->flow_type & FLOW_RSS)
+		flags |= EFX_FILTER_FLAG_RX_RSS;
+
+	efx_filter_init_rx(&spec, EFX_FILTER_PRI_MANUAL, flags,
 			   (rule->ring_cookie == RX_CLS_FLOW_DISC) ?
 			   EFX_FILTER_RX_DMAQ_ID_DROP : rule->ring_cookie);
 
-	switch (rule->flow_type & ~FLOW_EXT) {
+	if (rule->flow_type & FLOW_RSS)
+		spec.rss_context = rss_context;
+
+	switch (rule->flow_type & ~(FLOW_EXT | FLOW_RSS)) {
 	case TCP_V4_FLOW:
 	case UDP_V4_FLOW:
 		spec.match_flags = (EFX_FILTER_MATCH_ETHER_TYPE |
@@ -1265,7 +1293,8 @@ static int efx_ethtool_set_rxnfc(struct net_device *net_dev,
 
 	switch (info->cmd) {
 	case ETHTOOL_SRXCLSRLINS:
-		return efx_ethtool_set_class_rule(efx, &info->fs);
+		return efx_ethtool_set_class_rule(efx, &info->fs,
+						  info->rss_context);
 
 	case ETHTOOL_SRXCLSRLDEL:
 		return efx_filter_remove_id_safe(efx, EFX_FILTER_PRI_MANUAL,
@@ -1280,7 +1309,9 @@ static u32 efx_ethtool_get_rxfh_indir_size(struct net_device *net_dev)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 
-	return (efx->n_rx_channels == 1) ? 0 : ARRAY_SIZE(efx->rx_indir_table);
+	if (efx->n_rx_channels == 1)
+		return 0;
+	return ARRAY_SIZE(efx->rss_context.rx_indir_table);
 }
 
 static u32 efx_ethtool_get_rxfh_key_size(struct net_device *net_dev)
@@ -1303,9 +1334,11 @@ static int efx_ethtool_get_rxfh(struct net_device *net_dev, u32 *indir, u8 *key,
 	if (hfunc)
 		*hfunc = ETH_RSS_HASH_TOP;
 	if (indir)
-		memcpy(indir, efx->rx_indir_table, sizeof(efx->rx_indir_table));
+		memcpy(indir, efx->rss_context.rx_indir_table,
+		       sizeof(efx->rss_context.rx_indir_table));
 	if (key)
-		memcpy(key, efx->rx_hash_key, efx->type->rx_hash_key_size);
+		memcpy(key, efx->rss_context.rx_hash_key,
+		       efx->type->rx_hash_key_size);
 	return 0;
 }
 
@@ -1321,13 +1354,93 @@ static int efx_ethtool_set_rxfh(struct net_device *net_dev, const u32 *indir,
 		return 0;
 
 	if (!key)
-		key = efx->rx_hash_key;
+		key = efx->rss_context.rx_hash_key;
 	if (!indir)
-		indir = efx->rx_indir_table;
+		indir = efx->rss_context.rx_indir_table;
 
 	return efx->type->rx_push_rss_config(efx, true, indir, key);
 }
 
+static int efx_ethtool_get_rxfh_context(struct net_device *net_dev, u32 *indir,
+					u8 *key, u8 *hfunc, u32 rss_context)
+{
+	struct efx_nic *efx = netdev_priv(net_dev);
+	struct efx_rss_context *ctx;
+	int rc;
+
+	if (!efx->type->rx_pull_rss_context_config)
+		return -EOPNOTSUPP;
+	ctx = efx_find_rss_context_entry(rss_context, &efx->rss_context.list);
+	if (!ctx)
+		return -ENOENT;
+	rc = efx->type->rx_pull_rss_context_config(efx, ctx);
+	if (rc)
+		return rc;
+
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+	if (indir)
+		memcpy(indir, ctx->rx_indir_table, sizeof(ctx->rx_indir_table));
+	if (key)
+		memcpy(key, ctx->rx_hash_key, efx->type->rx_hash_key_size);
+	return 0;
+}
+
+static int efx_ethtool_set_rxfh_context(struct net_device *net_dev,
+					const u32 *indir, const u8 *key,
+					const u8 hfunc, u32 *rss_context,
+					bool delete)
+{
+	struct efx_nic *efx = netdev_priv(net_dev);
+	struct efx_rss_context *ctx;
+	bool allocated = false;
+	int rc;
+
+	if (!efx->type->rx_push_rss_context_config)
+		return -EOPNOTSUPP;
+	/* Hash function is Toeplitz, cannot be changed */
+	if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+		return -EOPNOTSUPP;
+	if (*rss_context == ETH_RXFH_CONTEXT_ALLOC) {
+		if (delete)
+			/* alloc + delete == Nothing to do */
+			return -EINVAL;
+		ctx = efx_alloc_rss_context_entry(&efx->rss_context.list);
+		if (!ctx)
+			return -ENOMEM;
+		ctx->context_id = EFX_EF10_RSS_CONTEXT_INVALID;
+		/* Initialise indir table and key to defaults */
+		efx_set_default_rx_indir_table(efx, ctx);
+		netdev_rss_key_fill(ctx->rx_hash_key, sizeof(ctx->rx_hash_key));
+		allocated = true;
+	} else {
+		ctx = efx_find_rss_context_entry(*rss_context,
+						 &efx->rss_context.list);
+		if (!ctx)
+			return -ENOENT;
+	}
+
+	if (delete) {
+		/* delete this context */
+		rc = efx->type->rx_push_rss_context_config(efx, ctx, NULL, NULL);
+		if (!rc)
+			efx_free_rss_context_entry(ctx);
+		return rc;
+	}
+
+	if (!key)
+		key = ctx->rx_hash_key;
+	if (!indir)
+		indir = ctx->rx_indir_table;
+
+	rc = efx->type->rx_push_rss_context_config(efx, ctx, indir, key);
+	if (rc && allocated)
+		efx_free_rss_context_entry(ctx);
+	else
+		*rss_context = ctx->user_id;
+	return rc;
+}
+
 static int efx_ethtool_get_ts_info(struct net_device *net_dev,
 				   struct ethtool_ts_info *ts_info)
 {
@@ -1403,6 +1516,8 @@ const struct ethtool_ops efx_ethtool_ops = {
 	.get_rxfh_key_size	= efx_ethtool_get_rxfh_key_size,
 	.get_rxfh		= efx_ethtool_get_rxfh,
 	.set_rxfh		= efx_ethtool_set_rxfh,
+	.get_rxfh_context	= efx_ethtool_get_rxfh_context,
+	.set_rxfh_context	= efx_ethtool_set_rxfh_context,
 	.get_ts_info		= efx_ethtool_get_ts_info,
 	.get_module_info	= efx_ethtool_get_module_info,
 	.get_module_eeprom	= efx_ethtool_get_module_eeprom,
diff --git a/drivers/net/ethernet/sfc/farch.c b/drivers/net/ethernet/sfc/farch.c
index 266b9bee1f3a..ad001e77d554 100644
--- a/drivers/net/ethernet/sfc/farch.c
+++ b/drivers/net/ethernet/sfc/farch.c
@@ -1630,12 +1630,12 @@ void efx_farch_rx_push_indir_table(struct efx_nic *efx)
 	size_t i = 0;
 	efx_dword_t dword;
 
-	BUILD_BUG_ON(ARRAY_SIZE(efx->rx_indir_table) !=
+	BUILD_BUG_ON(ARRAY_SIZE(efx->rss_context.rx_indir_table) !=
 		     FR_BZ_RX_INDIRECTION_TBL_ROWS);
 
 	for (i = 0; i < FR_BZ_RX_INDIRECTION_TBL_ROWS; i++) {
 		EFX_POPULATE_DWORD_1(dword, FRF_BZ_IT_QUEUE,
-				     efx->rx_indir_table[i]);
+				     efx->rss_context.rx_indir_table[i]);
 		efx_writed(efx, &dword,
 			   FR_BZ_RX_INDIRECTION_TBL +
 			   FR_BZ_RX_INDIRECTION_TBL_STEP * i);
@@ -1647,14 +1647,14 @@ void efx_farch_rx_pull_indir_table(struct efx_nic *efx)
 	size_t i = 0;
 	efx_dword_t dword;
 
-	BUILD_BUG_ON(ARRAY_SIZE(efx->rx_indir_table) !=
+	BUILD_BUG_ON(ARRAY_SIZE(efx->rss_context.rx_indir_table) !=
 		     FR_BZ_RX_INDIRECTION_TBL_ROWS);
 
 	for (i = 0; i < FR_BZ_RX_INDIRECTION_TBL_ROWS; i++) {
 		efx_readd(efx, &dword,
 			   FR_BZ_RX_INDIRECTION_TBL +
 			   FR_BZ_RX_INDIRECTION_TBL_STEP * i);
-		efx->rx_indir_table[i] = EFX_DWORD_FIELD(dword, FRF_BZ_IT_QUEUE);
+		efx->rss_context.rx_indir_table[i] = EFX_DWORD_FIELD(dword, FRF_BZ_IT_QUEUE);
 	}
 }
 
@@ -2032,8 +2032,7 @@ efx_farch_filter_from_gen_spec(struct efx_farch_filter_spec *spec,
 {
 	bool is_full = false;
 
-	if ((gen_spec->flags & EFX_FILTER_FLAG_RX_RSS) &&
-	    gen_spec->rss_context != EFX_FILTER_RSS_CONTEXT_DEFAULT)
+	if ((gen_spec->flags & EFX_FILTER_FLAG_RX_RSS) && gen_spec->rss_context)
 		return -EINVAL;
 
 	spec->priority = gen_spec->priority;
diff --git a/drivers/net/ethernet/sfc/filter.h b/drivers/net/ethernet/sfc/filter.h
index 8189a1cd973f..59021ad6d98d 100644
--- a/drivers/net/ethernet/sfc/filter.h
+++ b/drivers/net/ethernet/sfc/filter.h
@@ -125,7 +125,9 @@ enum efx_encap_type {
  * @match_flags: Match type flags, from &enum efx_filter_match_flags
  * @priority: Priority of the filter, from &enum efx_filter_priority
  * @flags: Miscellaneous flags, from &enum efx_filter_flags
- * @rss_context: RSS context to use, if %EFX_FILTER_FLAG_RX_RSS is set
+ * @rss_context: RSS context to use, if %EFX_FILTER_FLAG_RX_RSS is set.  This
+ *	is a user_id (with 0 meaning the driver/default RSS context), not an
+ *	MCFW context_id.
  * @dmaq_id: Source/target queue index, or %EFX_FILTER_RX_DMAQ_ID_DROP for
  *	an RX drop filter
  * @outer_vid: Outer VLAN ID to match, if %EFX_FILTER_MATCH_OUTER_VID is set
@@ -173,7 +175,6 @@ struct efx_filter_spec {
 };
 
 enum {
-	EFX_FILTER_RSS_CONTEXT_DEFAULT = 0xffffffff,
 	EFX_FILTER_RX_DMAQ_ID_DROP = 0xfff
 };
 
@@ -185,7 +186,7 @@ static inline void efx_filter_init_rx(struct efx_filter_spec *spec,
 	memset(spec, 0, sizeof(*spec));
 	spec->priority = priority;
 	spec->flags = EFX_FILTER_FLAG_RX | flags;
-	spec->rss_context = EFX_FILTER_RSS_CONTEXT_DEFAULT;
+	spec->rss_context = 0;
 	spec->dmaq_id = rxq_id;
 }
 
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index d20a8660ee48..203d64c88de5 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -704,6 +704,28 @@ union efx_multicast_hash {
 
 struct vfdi_status;
 
+/* The reserved RSS context value */
+#define EFX_EF10_RSS_CONTEXT_INVALID	0xffffffff
+/**
+ * struct efx_rss_context - A user-defined RSS context for filtering
+ * @list: node of linked list on which this struct is stored
+ * @context_id: the RSS_CONTEXT_ID returned by MC firmware, or
+ *	%EFX_EF10_RSS_CONTEXT_INVALID if this context is not present on the NIC.
+ *	For Siena, 0 if RSS is active, else %EFX_EF10_RSS_CONTEXT_INVALID.
+ * @user_id: the rss_context ID exposed to userspace over ethtool.
+ * @rx_hash_udp_4tuple: UDP 4-tuple hashing enabled
+ * @rx_hash_key: Toeplitz hash key for this RSS context
+ * @rx_indir_table: Indirection table for this RSS context
+ */
+struct efx_rss_context {
+	struct list_head list;
+	u32 context_id;
+	u32 user_id;
+	bool rx_hash_udp_4tuple;
+	u8 rx_hash_key[40];
+	u32 rx_indir_table[128];
+};
+
 /**
  * struct efx_nic - an Efx NIC
  * @name: Device name (net device name or bus id before net device registered)
@@ -764,11 +786,9 @@ struct vfdi_status;
  *	(valid only for NICs that set %EFX_RX_PKT_PREFIX_LEN; always negative)
  * @rx_packet_ts_offset: Offset of timestamp from start of packet data
  *	(valid only if channel->sync_timestamps_enabled; always negative)
- * @rx_hash_key: Toeplitz hash key for RSS
- * @rx_indir_table: Indirection table for RSS
  * @rx_scatter: Scatter mode enabled for receives
- * @rss_active: RSS enabled on hardware
- * @rx_hash_udp_4tuple: UDP 4-tuple hashing enabled
+ * @rss_context: Main RSS context.  Its @list member is the head of the list of
+ *	RSS contexts created by user requests
  * @int_error_count: Number of internal errors seen recently
  * @int_error_expire: Time at which error count will be expired
  * @irq_soft_enabled: Are IRQs soft-enabled? If not, IRQ handler will
@@ -909,11 +929,8 @@ struct efx_nic {
 	int rx_packet_hash_offset;
 	int rx_packet_len_offset;
 	int rx_packet_ts_offset;
-	u8 rx_hash_key[40];
-	u32 rx_indir_table[128];
 	bool rx_scatter;
-	bool rss_active;
-	bool rx_hash_udp_4tuple;
+	struct efx_rss_context rss_context;
 
 	unsigned int_error_count;
 	unsigned long int_error_expire;
@@ -1099,6 +1116,10 @@ struct efx_udp_tunnel {
  * @tx_write: Write TX descriptors and doorbell
  * @rx_push_rss_config: Write RSS hash key and indirection table to the NIC
  * @rx_pull_rss_config: Read RSS hash key and indirection table back from the NIC
+ * @rx_push_rss_context_config: Write RSS hash key and indirection table for
+ *	user RSS context to the NIC
+ * @rx_pull_rss_context_config: Read RSS hash key and indirection table for user
+ *	RSS context back from the NIC
  * @rx_probe: Allocate resources for RX queue
  * @rx_init: Initialise RX queue on the NIC
  * @rx_remove: Free resources for RX queue
@@ -1237,6 +1258,13 @@ struct efx_nic_type {
 	int (*rx_push_rss_config)(struct efx_nic *efx, bool user,
 				  const u32 *rx_indir_table, const u8 *key);
 	int (*rx_pull_rss_config)(struct efx_nic *efx);
+	int (*rx_push_rss_context_config)(struct efx_nic *efx,
+					  struct efx_rss_context *ctx,
+					  const u32 *rx_indir_table,
+					  const u8 *key);
+	int (*rx_pull_rss_context_config)(struct efx_nic *efx,
+					  struct efx_rss_context *ctx);
+	void (*rx_restore_rss_contexts)(struct efx_nic *efx);
 	int (*rx_probe)(struct efx_rx_queue *rx_queue);
 	void (*rx_init)(struct efx_rx_queue *rx_queue);
 	void (*rx_remove)(struct efx_rx_queue *rx_queue);
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 6549fc685a48..d080a414e8f2 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -374,7 +374,6 @@ enum {
  * @piobuf_size: size of a single PIO buffer
  * @must_restore_piobufs: Flag: PIO buffers have yet to be restored after MC
  *	reboot
- * @rx_rss_context: Firmware handle for our RSS context
  * @rx_rss_context_exclusive: Whether our RSS context is exclusive or shared
  * @stats: Hardware statistics
  * @workaround_35388: Flag: firmware supports workaround for bug 35388
@@ -415,7 +414,6 @@ struct efx_ef10_nic_data {
 	unsigned int piobuf_handle[EF10_TX_PIOBUF_COUNT];
 	u16 piobuf_size;
 	bool must_restore_piobufs;
-	u32 rx_rss_context;
 	bool rx_rss_context_exclusive;
 	u64 stats[EF10_STAT_COUNT];
 	bool workaround_35388;
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index ae8645ae4492..18aab25234ba 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -350,11 +350,11 @@ static int siena_rx_pull_rss_config(struct efx_nic *efx)
 	 * siena_rx_push_rss_config, below)
 	 */
 	efx_reado(efx, &temp, FR_CZ_RX_RSS_IPV6_REG1);
-	memcpy(efx->rx_hash_key, &temp, sizeof(temp));
+	memcpy(efx->rss_context.rx_hash_key, &temp, sizeof(temp));
 	efx_reado(efx, &temp, FR_CZ_RX_RSS_IPV6_REG2);
-	memcpy(efx->rx_hash_key + sizeof(temp), &temp, sizeof(temp));
+	memcpy(efx->rss_context.rx_hash_key + sizeof(temp), &temp, sizeof(temp));
 	efx_reado(efx, &temp, FR_CZ_RX_RSS_IPV6_REG3);
-	memcpy(efx->rx_hash_key + 2 * sizeof(temp), &temp,
+	memcpy(efx->rss_context.rx_hash_key + 2 * sizeof(temp), &temp,
 	       FRF_CZ_RX_RSS_IPV6_TKEY_HI_WIDTH / 8);
 	efx_farch_rx_pull_indir_table(efx);
 	return 0;
@@ -367,26 +367,26 @@ static int siena_rx_push_rss_config(struct efx_nic *efx, bool user,
 
 	/* Set hash key for IPv4 */
 	if (key)
-		memcpy(efx->rx_hash_key, key, sizeof(temp));
-	memcpy(&temp, efx->rx_hash_key, sizeof(temp));
+		memcpy(efx->rss_context.rx_hash_key, key, sizeof(temp));
+	memcpy(&temp, efx->rss_context.rx_hash_key, sizeof(temp));
 	efx_writeo(efx, &temp, FR_BZ_RX_RSS_TKEY);
 
 	/* Enable IPv6 RSS */
-	BUILD_BUG_ON(sizeof(efx->rx_hash_key) <
+	BUILD_BUG_ON(sizeof(efx->rss_context.rx_hash_key) <
 		     2 * sizeof(temp) + FRF_CZ_RX_RSS_IPV6_TKEY_HI_WIDTH / 8 ||
 		     FRF_CZ_RX_RSS_IPV6_TKEY_HI_LBN != 0);
-	memcpy(&temp, efx->rx_hash_key, sizeof(temp));
+	memcpy(&temp, efx->rss_context.rx_hash_key, sizeof(temp));
 	efx_writeo(efx, &temp, FR_CZ_RX_RSS_IPV6_REG1);
-	memcpy(&temp, efx->rx_hash_key + sizeof(temp), sizeof(temp));
+	memcpy(&temp, efx->rss_context.rx_hash_key + sizeof(temp), sizeof(temp));
 	efx_writeo(efx, &temp, FR_CZ_RX_RSS_IPV6_REG2);
 	EFX_POPULATE_OWORD_2(temp, FRF_CZ_RX_RSS_IPV6_THASH_ENABLE, 1,
 			     FRF_CZ_RX_RSS_IPV6_IP_THASH_ENABLE, 1);
-	memcpy(&temp, efx->rx_hash_key + 2 * sizeof(temp),
+	memcpy(&temp, efx->rss_context.rx_hash_key + 2 * sizeof(temp),
 	       FRF_CZ_RX_RSS_IPV6_TKEY_HI_WIDTH / 8);
 	efx_writeo(efx, &temp, FR_CZ_RX_RSS_IPV6_REG3);
 
-	memcpy(efx->rx_indir_table, rx_indir_table,
-	       sizeof(efx->rx_indir_table));
+	memcpy(efx->rss_context.rx_indir_table, rx_indir_table,
+	       sizeof(efx->rss_context.rx_indir_table));
 	efx_farch_rx_push_indir_table(efx);
 
 	return 0;
@@ -432,8 +432,8 @@ static int siena_init_nic(struct efx_nic *efx)
 			    EFX_RX_USR_BUF_SIZE >> 5);
 	efx_writeo(efx, &temp, FR_AZ_RX_CFG);
 
-	siena_rx_push_rss_config(efx, false, efx->rx_indir_table, NULL);
-	efx->rss_active = true;
+	siena_rx_push_rss_config(efx, false, efx->rss_context.rx_indir_table, NULL);
+	efx->rss_context.context_id = 0; /* indicates RSS is active */
 
 	/* Enable event logging */
 	rc = efx_mcdi_log_ctrl(efx, true, false, 0);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-02-27 17:55   ` Edward Cree
@ 2018-02-27 19:28     ` John W. Linville
  0 siblings, 0 replies; 18+ messages in thread
From: John W. Linville @ 2018-02-27 19:28 UTC (permalink / raw)
  To: Edward Cree; +Cc: David Miller, linux-net-drivers, netdev

On Tue, Feb 27, 2018 at 05:55:51PM +0000, Edward Cree wrote:
> On 27/02/18 17:38, David Miller wrote:
> > The problem is there are syntax errors in your email headers.
> >
> > Any time a person's name contains a special character like ".",
> > that entire string must be enclosed in double quotes.
> >
> > This is the case for "John W. Linville" so please add proper
> > quotes around such names and resend your patch series again.
> Thank you for spotting this!  I looked at the headers and failed
>  to notice anything wrong with them.
> I'm surprised that git-imap-send doesn't check for this...
> 
> Will resend with that fixed.

Haha, sorry for indirectly causing this issue! If it helps, you can
leave off the "W." -- I'll still know it's for me... :-)

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-02-27 17:59 [PATCH RESEND net-next 0/2] ntuple filters with RSS Edward Cree
  2018-02-27 18:02 ` [PATCH net-next 1/2] net: ethtool: extend RXNFC API to support RSS spreading of filter matches Edward Cree
  2018-02-27 18:03 ` [PATCH net-next 2/2] sfc: support RSS spreading of ethtool ntuple filters Edward Cree
@ 2018-02-27 23:47 ` Jakub Kicinski
  2018-02-28  1:24   ` Alexander Duyck
  2018-03-01 18:36 ` David Miller
  3 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2018-02-27 23:47 UTC (permalink / raw)
  To: Edward Cree
  Cc: linux-net-drivers, David Miller, netdev, John W. Linville,
	Or Gerlitz, Alexander Duyck

On Tue, 27 Feb 2018 17:59:12 +0000, Edward Cree wrote:
> This series introduces the ability to mark an ethtool steering filter to use
>  RSS spreading, and the ability to create and configure multiple RSS contexts
>  with different indirection tables, hash keys, and hash fields.
> An implementation for the sfc driver (for 7000-series and later SFC NICs) is
>  included in patch 2/2.
> 
> The anticipated use case of this feature is for steering traffic destined for
>  a container (or virtual machine) to the subset of CPUs on which processes in
>  the container (or the VM's vCPUs) are bound, while retaining the scalability
>  of RSS spreading from the viewpoint inside the container.
> The use of both a base queue number (ring_cookie) and indirection table is
>  intended to allow re-use of a single RSS context to target multiple sets of
>  CPUs.  For instance, if an 8-core system is hosting three containers on CPUs
>  [1,2], [3,4] and [6,7], then a single RSS context with an equal-weight [0,1]
>  indirection table could be used to target all three containers by setting
>  ring_cookie to 1, 3 and 6 on the respective filters.

Please, let's stop extending ethtool_rx_flow APIs.  I bit my tongue
when Intel was adding their "redirection to VF" based on ethtool ntuples
and look now they're adding the same functionality with flower :|  And
wonder how to handle two interfaces doing the same thing.

On the use case itself, I wonder how much sense that makes.  Can your
hardware not tag the packet as well so you could then mux it to
something like macvlan offload?

CC: Alex, Or

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-02-27 23:47 ` [PATCH RESEND net-next 0/2] ntuple filters with RSS Jakub Kicinski
@ 2018-02-28  1:24   ` Alexander Duyck
  2018-03-02 15:24     ` Edward Cree
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2018-02-28  1:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Edward Cree, linux-net-drivers, David Miller, netdev,
	John W. Linville, Or Gerlitz, Alexander Duyck

On Tue, Feb 27, 2018 at 3:47 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Tue, 27 Feb 2018 17:59:12 +0000, Edward Cree wrote:
>> This series introduces the ability to mark an ethtool steering filter to use
>>  RSS spreading, and the ability to create and configure multiple RSS contexts
>>  with different indirection tables, hash keys, and hash fields.
>> An implementation for the sfc driver (for 7000-series and later SFC NICs) is
>>  included in patch 2/2.
>>
>> The anticipated use case of this feature is for steering traffic destined for
>>  a container (or virtual machine) to the subset of CPUs on which processes in
>>  the container (or the VM's vCPUs) are bound, while retaining the scalability
>>  of RSS spreading from the viewpoint inside the container.
>> The use of both a base queue number (ring_cookie) and indirection table is
>>  intended to allow re-use of a single RSS context to target multiple sets of
>>  CPUs.  For instance, if an 8-core system is hosting three containers on CPUs
>>  [1,2], [3,4] and [6,7], then a single RSS context with an equal-weight [0,1]
>>  indirection table could be used to target all three containers by setting
>>  ring_cookie to 1, 3 and 6 on the respective filters.
>
> Please, let's stop extending ethtool_rx_flow APIs.  I bit my tongue
> when Intel was adding their "redirection to VF" based on ethtool ntuples
> and look now they're adding the same functionality with flower :|  And
> wonder how to handle two interfaces doing the same thing.
>
> On the use case itself, I wonder how much sense that makes.  Can your
> hardware not tag the packet as well so you could then mux it to
> something like macvlan offload?
>
> CC: Alex, Or

We did something like this for i40e. Basically we required creating
the queue groups using mqprio to keep them symmetric on Tx and Rx, and
then allowed for TC ingress filters to redirect traffic to those queue
groups.

- Alex
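
For reference, the i40e-style setup Alex describes is driven from userspace roughly like this (a sketch only: device name, queue layout, and match values are illustrative; `mode channel` is the i40e channel-queue mechanism):

```shell
# Create two symmetric Tx/Rx queue groups via mqprio channel mode
tc qdisc add dev eth0 root mqprio num_tc 2 map 0 1 queues 2@0 2@2 hw 1 mode channel
# Attach a classifier hook, then redirect matching ingress traffic
# to the second queue group (hw_tc 1)
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress protocol ip flower dst_ip 192.168.1.1 skip_sw hw_tc 1
```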

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-02-27 17:59 [PATCH RESEND net-next 0/2] ntuple filters with RSS Edward Cree
                   ` (2 preceding siblings ...)
  2018-02-27 23:47 ` [PATCH RESEND net-next 0/2] ntuple filters with RSS Jakub Kicinski
@ 2018-03-01 18:36 ` David Miller
  2018-03-02 16:01   ` Edward Cree
  3 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2018-03-01 18:36 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev, linville

From: Edward Cree <ecree@solarflare.com>
Date: Tue, 27 Feb 2018 17:59:12 +0000

> This series introduces the ability to mark an ethtool steering filter to use
>  RSS spreading, and the ability to create and configure multiple RSS contexts
>  with different indirection tables, hash keys, and hash fields.
> An implementation for the sfc driver (for 7000-series and later SFC NICs) is
>  included in patch 2/2.
> 
> The anticipated use case of this feature is for steering traffic destined for
>  a container (or virtual machine) to the subset of CPUs on which processes in
>  the container (or the VM's vCPUs) are bound, while retaining the scalability
>  of RSS spreading from the viewpoint inside the container.
> The use of both a base queue number (ring_cookie) and indirection table is
>  intended to allow re-use of a single RSS context to target multiple sets of
>  CPUs.  For instance, if an 8-core system is hosting three containers on CPUs
>  [1,2], [3,4] and [6,7], then a single RSS context with an equal-weight [0,1]
>  indirection table could be used to target all three containers by setting
>  ring_cookie to 1, 3 and 6 on the respective filters.

We really should have the ethtool interfaces under deep freeze until we
convert it to netlink or similar.

Second, this is a real hackish way to extend ethtool with new
semantics.  A structure changes layout based upon a flag bit setting
in an earlier member?  Yikes...

Lastly, there has been feedback asking how practical and useful this
facility actually is, and you must address that.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 2/2] sfc: support RSS spreading of ethtool ntuple filters
  2018-02-27 18:03 ` [PATCH net-next 2/2] sfc: support RSS spreading of ethtool ntuple filters Edward Cree
@ 2018-03-01 20:51   ` kbuild test robot
  0 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2018-03-01 20:51 UTC (permalink / raw)
  To: Edward Cree
  Cc: kbuild-all, linux-net-drivers, David Miller, netdev,
	John W. Linville

[-- Attachment #1: Type: text/plain, Size: 17927 bytes --]

Hi Edward,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Edward-Cree/ntuple-filters-with-RSS/20180302-031011
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   drivers/net//ethernet/sfc/ef10.c: In function 'efx_ef10_filter_insert':
>> drivers/net//ethernet/sfc/ef10.c:4458:5: warning: 'ctx' may be used uninitialized in this function [-Wmaybe-uninitialized]
     rc = efx_ef10_filter_push(efx, spec, &table->entry[ins_index].handle,
     ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          ctx, replacing);
          ~~~~~~~~~~~~~~~

vim +/ctx +4458 drivers/net//ethernet/sfc/ef10.c

8127d661 Ben Hutchings    2013-08-29  4300  
8127d661 Ben Hutchings    2013-08-29  4301  static s32 efx_ef10_filter_insert(struct efx_nic *efx,
8127d661 Ben Hutchings    2013-08-29  4302  				  struct efx_filter_spec *spec,
8127d661 Ben Hutchings    2013-08-29  4303  				  bool replace_equal)
8127d661 Ben Hutchings    2013-08-29  4304  {
8127d661 Ben Hutchings    2013-08-29  4305  	struct efx_ef10_filter_table *table = efx->filter_state;
8127d661 Ben Hutchings    2013-08-29  4306  	DECLARE_BITMAP(mc_rem_map, EFX_EF10_FILTER_SEARCH_LIMIT);
8127d661 Ben Hutchings    2013-08-29  4307  	struct efx_filter_spec *saved_spec;
8127d661 Ben Hutchings    2013-08-29  4308  	unsigned int match_pri, hash;
87dec16f Edward Cree      2018-02-27  4309  	struct efx_rss_context *ctx;
8127d661 Ben Hutchings    2013-08-29  4310  	unsigned int priv_flags;
8127d661 Ben Hutchings    2013-08-29  4311  	bool replacing = false;
8127d661 Ben Hutchings    2013-08-29  4312  	int ins_index = -1;
8127d661 Ben Hutchings    2013-08-29  4313  	DEFINE_WAIT(wait);
8127d661 Ben Hutchings    2013-08-29  4314  	bool is_mc_recip;
8127d661 Ben Hutchings    2013-08-29  4315  	s32 rc;
8127d661 Ben Hutchings    2013-08-29  4316  
8127d661 Ben Hutchings    2013-08-29  4317  	/* For now, only support RX filters */
8127d661 Ben Hutchings    2013-08-29  4318  	if ((spec->flags & (EFX_FILTER_FLAG_RX | EFX_FILTER_FLAG_TX)) !=
8127d661 Ben Hutchings    2013-08-29  4319  	    EFX_FILTER_FLAG_RX)
8127d661 Ben Hutchings    2013-08-29  4320  		return -EINVAL;
8127d661 Ben Hutchings    2013-08-29  4321  
7ac0dd9d Andrew Rybchenko 2016-06-15  4322  	rc = efx_ef10_filter_pri(table, spec);
8127d661 Ben Hutchings    2013-08-29  4323  	if (rc < 0)
8127d661 Ben Hutchings    2013-08-29  4324  		return rc;
8127d661 Ben Hutchings    2013-08-29  4325  	match_pri = rc;
8127d661 Ben Hutchings    2013-08-29  4326  
8127d661 Ben Hutchings    2013-08-29  4327  	hash = efx_ef10_filter_hash(spec);
8127d661 Ben Hutchings    2013-08-29  4328  	is_mc_recip = efx_filter_is_mc_recipient(spec);
8127d661 Ben Hutchings    2013-08-29  4329  	if (is_mc_recip)
8127d661 Ben Hutchings    2013-08-29  4330  		bitmap_zero(mc_rem_map, EFX_EF10_FILTER_SEARCH_LIMIT);
8127d661 Ben Hutchings    2013-08-29  4331  
87dec16f Edward Cree      2018-02-27  4332  	if (spec->flags & EFX_FILTER_FLAG_RX_RSS) {
87dec16f Edward Cree      2018-02-27  4333  		if (spec->rss_context)
87dec16f Edward Cree      2018-02-27  4334  			ctx = efx_find_rss_context_entry(spec->rss_context,
87dec16f Edward Cree      2018-02-27  4335  							 &efx->rss_context.list);
87dec16f Edward Cree      2018-02-27  4336  		else
87dec16f Edward Cree      2018-02-27  4337  			ctx = &efx->rss_context;
87dec16f Edward Cree      2018-02-27  4338  		if (!ctx)
87dec16f Edward Cree      2018-02-27  4339  			return -ENOENT;
87dec16f Edward Cree      2018-02-27  4340  		if (ctx->context_id == EFX_EF10_RSS_CONTEXT_INVALID)
87dec16f Edward Cree      2018-02-27  4341  			return -EOPNOTSUPP;
87dec16f Edward Cree      2018-02-27  4342  	}
87dec16f Edward Cree      2018-02-27  4343  
8127d661 Ben Hutchings    2013-08-29  4344  	/* Find any existing filters with the same match tuple or
8127d661 Ben Hutchings    2013-08-29  4345  	 * else a free slot to insert at.  If any of them are busy,
8127d661 Ben Hutchings    2013-08-29  4346  	 * we have to wait and retry.
8127d661 Ben Hutchings    2013-08-29  4347  	 */
8127d661 Ben Hutchings    2013-08-29  4348  	for (;;) {
8127d661 Ben Hutchings    2013-08-29  4349  		unsigned int depth = 1;
8127d661 Ben Hutchings    2013-08-29  4350  		unsigned int i;
8127d661 Ben Hutchings    2013-08-29  4351  
8127d661 Ben Hutchings    2013-08-29  4352  		spin_lock_bh(&efx->filter_lock);
8127d661 Ben Hutchings    2013-08-29  4353  
8127d661 Ben Hutchings    2013-08-29  4354  		for (;;) {
8127d661 Ben Hutchings    2013-08-29  4355  			i = (hash + depth) & (HUNT_FILTER_TBL_ROWS - 1);
8127d661 Ben Hutchings    2013-08-29  4356  			saved_spec = efx_ef10_filter_entry_spec(table, i);
8127d661 Ben Hutchings    2013-08-29  4357  
8127d661 Ben Hutchings    2013-08-29  4358  			if (!saved_spec) {
8127d661 Ben Hutchings    2013-08-29  4359  				if (ins_index < 0)
8127d661 Ben Hutchings    2013-08-29  4360  					ins_index = i;
8127d661 Ben Hutchings    2013-08-29  4361  			} else if (efx_ef10_filter_equal(spec, saved_spec)) {
8127d661 Ben Hutchings    2013-08-29  4362  				if (table->entry[i].spec &
8127d661 Ben Hutchings    2013-08-29  4363  				    EFX_EF10_FILTER_FLAG_BUSY)
8127d661 Ben Hutchings    2013-08-29  4364  					break;
8127d661 Ben Hutchings    2013-08-29  4365  				if (spec->priority < saved_spec->priority &&
7665d1ab Ben Hutchings    2013-11-21  4366  				    spec->priority != EFX_FILTER_PRI_AUTO) {
8127d661 Ben Hutchings    2013-08-29  4367  					rc = -EPERM;
8127d661 Ben Hutchings    2013-08-29  4368  					goto out_unlock;
8127d661 Ben Hutchings    2013-08-29  4369  				}
8127d661 Ben Hutchings    2013-08-29  4370  				if (!is_mc_recip) {
8127d661 Ben Hutchings    2013-08-29  4371  					/* This is the only one */
8127d661 Ben Hutchings    2013-08-29  4372  					if (spec->priority ==
8127d661 Ben Hutchings    2013-08-29  4373  					    saved_spec->priority &&
8127d661 Ben Hutchings    2013-08-29  4374  					    !replace_equal) {
8127d661 Ben Hutchings    2013-08-29  4375  						rc = -EEXIST;
8127d661 Ben Hutchings    2013-08-29  4376  						goto out_unlock;
8127d661 Ben Hutchings    2013-08-29  4377  					}
8127d661 Ben Hutchings    2013-08-29  4378  					ins_index = i;
8127d661 Ben Hutchings    2013-08-29  4379  					goto found;
8127d661 Ben Hutchings    2013-08-29  4380  				} else if (spec->priority >
8127d661 Ben Hutchings    2013-08-29  4381  					   saved_spec->priority ||
8127d661 Ben Hutchings    2013-08-29  4382  					   (spec->priority ==
8127d661 Ben Hutchings    2013-08-29  4383  					    saved_spec->priority &&
8127d661 Ben Hutchings    2013-08-29  4384  					    replace_equal)) {
8127d661 Ben Hutchings    2013-08-29  4385  					if (ins_index < 0)
8127d661 Ben Hutchings    2013-08-29  4386  						ins_index = i;
8127d661 Ben Hutchings    2013-08-29  4387  					else
8127d661 Ben Hutchings    2013-08-29  4388  						__set_bit(depth, mc_rem_map);
8127d661 Ben Hutchings    2013-08-29  4389  				}
8127d661 Ben Hutchings    2013-08-29  4390  			}
8127d661 Ben Hutchings    2013-08-29  4391  
8127d661 Ben Hutchings    2013-08-29  4392  			/* Once we reach the maximum search depth, use
8127d661 Ben Hutchings    2013-08-29  4393  			 * the first suitable slot or return -EBUSY if
8127d661 Ben Hutchings    2013-08-29  4394  			 * there was none
8127d661 Ben Hutchings    2013-08-29  4395  			 */
8127d661 Ben Hutchings    2013-08-29  4396  			if (depth == EFX_EF10_FILTER_SEARCH_LIMIT) {
8127d661 Ben Hutchings    2013-08-29  4397  				if (ins_index < 0) {
8127d661 Ben Hutchings    2013-08-29  4398  					rc = -EBUSY;
8127d661 Ben Hutchings    2013-08-29  4399  					goto out_unlock;
8127d661 Ben Hutchings    2013-08-29  4400  				}
8127d661 Ben Hutchings    2013-08-29  4401  				goto found;
8127d661 Ben Hutchings    2013-08-29  4402  			}
8127d661 Ben Hutchings    2013-08-29  4403  
8127d661 Ben Hutchings    2013-08-29  4404  			++depth;
8127d661 Ben Hutchings    2013-08-29  4405  		}
8127d661 Ben Hutchings    2013-08-29  4406  
8127d661 Ben Hutchings    2013-08-29  4407  		prepare_to_wait(&table->waitq, &wait, TASK_UNINTERRUPTIBLE);
8127d661 Ben Hutchings    2013-08-29  4408  		spin_unlock_bh(&efx->filter_lock);
8127d661 Ben Hutchings    2013-08-29  4409  		schedule();
8127d661 Ben Hutchings    2013-08-29  4410  	}
8127d661 Ben Hutchings    2013-08-29  4411  
8127d661 Ben Hutchings    2013-08-29  4412  found:
8127d661 Ben Hutchings    2013-08-29  4413  	/* Create a software table entry if necessary, and mark it
8127d661 Ben Hutchings    2013-08-29  4414  	 * busy.  We might yet fail to insert, but any attempt to
8127d661 Ben Hutchings    2013-08-29  4415  	 * insert a conflicting filter while we're waiting for the
8127d661 Ben Hutchings    2013-08-29  4416  	 * firmware must find the busy entry.
8127d661 Ben Hutchings    2013-08-29  4417  	 */
8127d661 Ben Hutchings    2013-08-29  4418  	saved_spec = efx_ef10_filter_entry_spec(table, ins_index);
8127d661 Ben Hutchings    2013-08-29  4419  	if (saved_spec) {
7665d1ab Ben Hutchings    2013-11-21  4420  		if (spec->priority == EFX_FILTER_PRI_AUTO &&
7665d1ab Ben Hutchings    2013-11-21  4421  		    saved_spec->priority >= EFX_FILTER_PRI_AUTO) {
8127d661 Ben Hutchings    2013-08-29  4422  			/* Just make sure it won't be removed */
7665d1ab Ben Hutchings    2013-11-21  4423  			if (saved_spec->priority > EFX_FILTER_PRI_AUTO)
7665d1ab Ben Hutchings    2013-11-21  4424  				saved_spec->flags |= EFX_FILTER_FLAG_RX_OVER_AUTO;
8127d661 Ben Hutchings    2013-08-29  4425  			table->entry[ins_index].spec &=
b59e6ef8 Ben Hutchings    2013-11-21  4426  				~EFX_EF10_FILTER_FLAG_AUTO_OLD;
8127d661 Ben Hutchings    2013-08-29  4427  			rc = ins_index;
8127d661 Ben Hutchings    2013-08-29  4428  			goto out_unlock;
8127d661 Ben Hutchings    2013-08-29  4429  		}
8127d661 Ben Hutchings    2013-08-29  4430  		replacing = true;
8127d661 Ben Hutchings    2013-08-29  4431  		priv_flags = efx_ef10_filter_entry_flags(table, ins_index);
8127d661 Ben Hutchings    2013-08-29  4432  	} else {
8127d661 Ben Hutchings    2013-08-29  4433  		saved_spec = kmalloc(sizeof(*spec), GFP_ATOMIC);
8127d661 Ben Hutchings    2013-08-29  4434  		if (!saved_spec) {
8127d661 Ben Hutchings    2013-08-29  4435  			rc = -ENOMEM;
8127d661 Ben Hutchings    2013-08-29  4436  			goto out_unlock;
8127d661 Ben Hutchings    2013-08-29  4437  		}
8127d661 Ben Hutchings    2013-08-29  4438  		*saved_spec = *spec;
8127d661 Ben Hutchings    2013-08-29  4439  		priv_flags = 0;
8127d661 Ben Hutchings    2013-08-29  4440  	}
8127d661 Ben Hutchings    2013-08-29  4441  	efx_ef10_filter_set_entry(table, ins_index, saved_spec,
8127d661 Ben Hutchings    2013-08-29  4442  				  priv_flags | EFX_EF10_FILTER_FLAG_BUSY);
8127d661 Ben Hutchings    2013-08-29  4443  
8127d661 Ben Hutchings    2013-08-29  4444  	/* Mark lower-priority multicast recipients busy prior to removal */
8127d661 Ben Hutchings    2013-08-29  4445  	if (is_mc_recip) {
8127d661 Ben Hutchings    2013-08-29  4446  		unsigned int depth, i;
8127d661 Ben Hutchings    2013-08-29  4447  
8127d661 Ben Hutchings    2013-08-29  4448  		for (depth = 0; depth < EFX_EF10_FILTER_SEARCH_LIMIT; depth++) {
8127d661 Ben Hutchings    2013-08-29  4449  			i = (hash + depth) & (HUNT_FILTER_TBL_ROWS - 1);
8127d661 Ben Hutchings    2013-08-29  4450  			if (test_bit(depth, mc_rem_map))
8127d661 Ben Hutchings    2013-08-29  4451  				table->entry[i].spec |=
8127d661 Ben Hutchings    2013-08-29  4452  					EFX_EF10_FILTER_FLAG_BUSY;
8127d661 Ben Hutchings    2013-08-29  4453  		}
8127d661 Ben Hutchings    2013-08-29  4454  	}
8127d661 Ben Hutchings    2013-08-29  4455  
8127d661 Ben Hutchings    2013-08-29  4456  	spin_unlock_bh(&efx->filter_lock);
8127d661 Ben Hutchings    2013-08-29  4457  
8127d661 Ben Hutchings    2013-08-29 @4458  	rc = efx_ef10_filter_push(efx, spec, &table->entry[ins_index].handle,
87dec16f Edward Cree      2018-02-27  4459  				  ctx, replacing);
8127d661 Ben Hutchings    2013-08-29  4460  
8127d661 Ben Hutchings    2013-08-29  4461  	/* Finalise the software table entry */
8127d661 Ben Hutchings    2013-08-29  4462  	spin_lock_bh(&efx->filter_lock);
8127d661 Ben Hutchings    2013-08-29  4463  	if (rc == 0) {
8127d661 Ben Hutchings    2013-08-29  4464  		if (replacing) {
8127d661 Ben Hutchings    2013-08-29  4465  			/* Update the fields that may differ */
7665d1ab Ben Hutchings    2013-11-21  4466  			if (saved_spec->priority == EFX_FILTER_PRI_AUTO)
7665d1ab Ben Hutchings    2013-11-21  4467  				saved_spec->flags |=
7665d1ab Ben Hutchings    2013-11-21  4468  					EFX_FILTER_FLAG_RX_OVER_AUTO;
8127d661 Ben Hutchings    2013-08-29  4469  			saved_spec->priority = spec->priority;
7665d1ab Ben Hutchings    2013-11-21  4470  			saved_spec->flags &= EFX_FILTER_FLAG_RX_OVER_AUTO;
8127d661 Ben Hutchings    2013-08-29  4471  			saved_spec->flags |= spec->flags;
8127d661 Ben Hutchings    2013-08-29  4472  			saved_spec->rss_context = spec->rss_context;
8127d661 Ben Hutchings    2013-08-29  4473  			saved_spec->dmaq_id = spec->dmaq_id;
8127d661 Ben Hutchings    2013-08-29  4474  		}
8127d661 Ben Hutchings    2013-08-29  4475  	} else if (!replacing) {
8127d661 Ben Hutchings    2013-08-29  4476  		kfree(saved_spec);
8127d661 Ben Hutchings    2013-08-29  4477  		saved_spec = NULL;
8127d661 Ben Hutchings    2013-08-29  4478  	}
8127d661 Ben Hutchings    2013-08-29  4479  	efx_ef10_filter_set_entry(table, ins_index, saved_spec, priv_flags);
8127d661 Ben Hutchings    2013-08-29  4480  
8127d661 Ben Hutchings    2013-08-29  4481  	/* Remove and finalise entries for lower-priority multicast
8127d661 Ben Hutchings    2013-08-29  4482  	 * recipients
8127d661 Ben Hutchings    2013-08-29  4483  	 */
8127d661 Ben Hutchings    2013-08-29  4484  	if (is_mc_recip) {
bb53f4d4 Martin Habets    2017-06-22  4485  		MCDI_DECLARE_BUF(inbuf, MC_CMD_FILTER_OP_EXT_IN_LEN);
8127d661 Ben Hutchings    2013-08-29  4486  		unsigned int depth, i;
8127d661 Ben Hutchings    2013-08-29  4487  
8127d661 Ben Hutchings    2013-08-29  4488  		memset(inbuf, 0, sizeof(inbuf));
8127d661 Ben Hutchings    2013-08-29  4489  
8127d661 Ben Hutchings    2013-08-29  4490  		for (depth = 0; depth < EFX_EF10_FILTER_SEARCH_LIMIT; depth++) {
8127d661 Ben Hutchings    2013-08-29  4491  			if (!test_bit(depth, mc_rem_map))
8127d661 Ben Hutchings    2013-08-29  4492  				continue;
8127d661 Ben Hutchings    2013-08-29  4493  
8127d661 Ben Hutchings    2013-08-29  4494  			i = (hash + depth) & (HUNT_FILTER_TBL_ROWS - 1);
8127d661 Ben Hutchings    2013-08-29  4495  			saved_spec = efx_ef10_filter_entry_spec(table, i);
8127d661 Ben Hutchings    2013-08-29  4496  			priv_flags = efx_ef10_filter_entry_flags(table, i);
8127d661 Ben Hutchings    2013-08-29  4497  
8127d661 Ben Hutchings    2013-08-29  4498  			if (rc == 0) {
8127d661 Ben Hutchings    2013-08-29  4499  				spin_unlock_bh(&efx->filter_lock);
8127d661 Ben Hutchings    2013-08-29  4500  				MCDI_SET_DWORD(inbuf, FILTER_OP_IN_OP,
8127d661 Ben Hutchings    2013-08-29  4501  					       MC_CMD_FILTER_OP_IN_OP_UNSUBSCRIBE);
8127d661 Ben Hutchings    2013-08-29  4502  				MCDI_SET_QWORD(inbuf, FILTER_OP_IN_HANDLE,
8127d661 Ben Hutchings    2013-08-29  4503  					       table->entry[i].handle);
8127d661 Ben Hutchings    2013-08-29  4504  				rc = efx_mcdi_rpc(efx, MC_CMD_FILTER_OP,
8127d661 Ben Hutchings    2013-08-29  4505  						  inbuf, sizeof(inbuf),
8127d661 Ben Hutchings    2013-08-29  4506  						  NULL, 0, NULL);
8127d661 Ben Hutchings    2013-08-29  4507  				spin_lock_bh(&efx->filter_lock);
8127d661 Ben Hutchings    2013-08-29  4508  			}
8127d661 Ben Hutchings    2013-08-29  4509  
8127d661 Ben Hutchings    2013-08-29  4510  			if (rc == 0) {
8127d661 Ben Hutchings    2013-08-29  4511  				kfree(saved_spec);
8127d661 Ben Hutchings    2013-08-29  4512  				saved_spec = NULL;
8127d661 Ben Hutchings    2013-08-29  4513  				priv_flags = 0;
8127d661 Ben Hutchings    2013-08-29  4514  			} else {
8127d661 Ben Hutchings    2013-08-29  4515  				priv_flags &= ~EFX_EF10_FILTER_FLAG_BUSY;
8127d661 Ben Hutchings    2013-08-29  4516  			}
8127d661 Ben Hutchings    2013-08-29  4517  			efx_ef10_filter_set_entry(table, i, saved_spec,
8127d661 Ben Hutchings    2013-08-29  4518  						  priv_flags);
8127d661 Ben Hutchings    2013-08-29  4519  		}
8127d661 Ben Hutchings    2013-08-29  4520  	}
8127d661 Ben Hutchings    2013-08-29  4521  
8127d661 Ben Hutchings    2013-08-29  4522  	/* If successful, return the inserted filter ID */
8127d661 Ben Hutchings    2013-08-29  4523  	if (rc == 0)
0ccb998b Jon Cooper       2017-02-17  4524  		rc = efx_ef10_make_filter_id(match_pri, ins_index);
8127d661 Ben Hutchings    2013-08-29  4525  
8127d661 Ben Hutchings    2013-08-29  4526  	wake_up_all(&table->waitq);
8127d661 Ben Hutchings    2013-08-29  4527  out_unlock:
8127d661 Ben Hutchings    2013-08-29  4528  	spin_unlock_bh(&efx->filter_lock);
8127d661 Ben Hutchings    2013-08-29  4529  	finish_wait(&table->waitq, &wait);
8127d661 Ben Hutchings    2013-08-29  4530  	return rc;
8127d661 Ben Hutchings    2013-08-29  4531  }
8127d661 Ben Hutchings    2013-08-29  4532  

:::::: The code at line 4458 was first introduced by commit
:::::: 8127d661e77f5ec410093bce411f540afa34593f sfc: Add support for Solarflare SFC9100 family

:::::: TO: Ben Hutchings <bhutchings@solarflare.com>
:::::: CC: Ben Hutchings <bhutchings@solarflare.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40831 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-02-28  1:24   ` Alexander Duyck
@ 2018-03-02 15:24     ` Edward Cree
  2018-03-02 18:55       ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Edward Cree @ 2018-03-02 15:24 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski
  Cc: linux-net-drivers, David Miller, netdev, John W. Linville,
	Or Gerlitz, Alexander Duyck

On Tue, Feb 27, 2018 at 3:47 PM, Jakub Kicinski <kubakici@wp.pl> wrote:

> Please, let's stop extending ethtool_rx_flow APIs.  I bit my tongue
> when Intel was adding their "redirection to VF" based on ethtool ntuples
> and look now they're adding the same functionality with flower :|  And
> wonder how to handle two interfaces doing the same thing.
Since sfc only supports ethtool NFC interfaces (we have no flower support,
 and I also wonder how one is to support both of those interfaces without
 producing an ugly mess), I'd much rather put this in ethtool than have to
 implement all of flower just so we can have this extension.
I guess part of the question is, which other drivers besides us would want
 to implement something like this, and what are their requirements?

> On the use case itself, I wonder how much sense that makes.  Can your
> hardware not tag the packet as well so you could then mux it to
> something like macvlan offload?
In practice the only way our hardware can "tag the packet" is by the
 selection of RX queue.  So you could for instance give a container its
 own RX queues (rather than just using the existing RX queues on the
 appropriate CPUs), and maybe in future hook those queues up to l2fwd
 offload somehow.
But that seems like a separate job (offloading the macvlan switching) to
 what this series is about (making the RX processing happen on the right
 CPUs).  Is software macvlan switching really noticeably slow, anyway?
Besides, more powerful filtering than just MAC addr might be needed, if,
 for instance, the container network is encapsulated.  In that case
 something like a UDP 4-tuple filter might be necessary (or, indeed, a
 filter looking at the VNID (VxLAN TNI) - which our hardware can do but
 ethtool doesn't currently have a way to specify).  AFAICT l2-fwd-offload
 can only be used for straight MAC addr, not for overlay networks like
 VxLAN or FOU?  At least, existing ndo_dfwd_add_station() implementations
 don't seem to check that dev is a macvlan...  Does it even support
 VLAN filters?  fm10k implementation doesn't seem to.
Anyway, like I say, filtering traffic onto its own queues seems to be
 orthogonal, or at least separate, to binding those queues into an
 upperdev for demux offload.

On 28/02/18 01:24, Alexander Duyck wrote:

> We did something like this for i40e. Basically we required creating
> the queue groups using mqprio to keep them symmetric on Tx and Rx, and
> then allowed for TC ingress filters to redirect traffic to those queue
> groups.
>
> - Alex
If we're not doing macvlan offload, I'm not sure what, if anything, the
 TX side would buy us.  So for now it seems to make sense for TX just to
 use the TXQ associated with the CPU from which the TX originates, which
 I believe already happens automatically.

-Ed

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-03-01 18:36 ` David Miller
@ 2018-03-02 16:01   ` Edward Cree
  2018-03-02 17:49     ` David Riddoch
  2018-03-07 15:24     ` David Miller
  0 siblings, 2 replies; 18+ messages in thread
From: Edward Cree @ 2018-03-02 16:01 UTC (permalink / raw)
  To: David Miller; +Cc: linux-net-drivers, netdev, linville

On 01/03/18 18:36, David Miller wrote:
> We really should have the ethtool interfaces under deep freeze until we
> convert it to netlink or similar.
> Second, this is a real hackish way to extend ethtool with new
> semantics.  A structure changes layout based upon a flag bit setting
> in an earlier member?  Yikes...
Yeah, while I'm reasonably confident it's ABI-compatible (presence of that
 flag in the past should always have led to drivers complaining they didn't
 recognise it), and it is somewhat similar to the existing FLOW_EXT flag,
 it is indeed rather ugly.  This is the only way I could see to do it
 without adding a whole new command number, which I felt might also be
 contentious (see: deep freeze) but is probably a better approach.

> Lastly, there has been feedback asking how practical and useful this
> facility actually is, and you must address that.
According to our marketing folks, there is end-user demand for this feature
 or something like it.  I didn't see any arguments why this isn't useful,
 just that other things might be useful too.  (Also, sorry it took me so
 long to address their feedback, but I had to do a bit of background
 reading before I could understand what Jakub was suggesting.)

-Ed

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-03-02 16:01   ` Edward Cree
@ 2018-03-02 17:49     ` David Riddoch
  2018-03-07 15:24     ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David Riddoch @ 2018-03-02 17:49 UTC (permalink / raw)
  To: Edward Cree, David Miller; +Cc: linux-net-drivers, netdev, linville


>> Lastly, there has been feedback asking how practical and useful this
>> facility actually is, and you must address that.
> According to our marketing folks, there is end-user demand for this feature
>   or something like it.
The main benefit comes on NUMA systems, when you have high-throughput 
applications or containers on multiple NUMA nodes.  Using RSS without 
steering gives poor efficiency, because traffic is often not received on 
the same node as the application.  With flow steering to a single queue 
you can get a bottleneck, as all traffic for a TCP/UDP port or container 
goes to one core.  aRFS doesn't scale to large numbers of flows.

This feature allows the admin to ensure packets are received on the same 
NUMA node as the application (improving efficiency) and avoids the 
single-core bottleneck.
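To make the use case concrete, the intended workflow looks roughly like this (a sketch using the RSS-context ethtool syntax this series proposes; the interface name, queue range and port are illustrative):

```shell
# Create a new RSS context spreading over queues 8-15, e.g. the
# queues whose IRQs are pinned to the application's NUMA node.
# (The tool reports the allocated context id on success.)
ethtool -X eth0 context new start 8 equal 8

# Steer the container's flows (here: TCP to port 8080) into that
# context, so they are spread across all eight queues rather than
# pinned to a single one.
ethtool -N eth0 flow-type tcp4 dst-port 8080 context 1
```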

David


* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-03-02 15:24     ` Edward Cree
@ 2018-03-02 18:55       ` Jakub Kicinski
  2018-03-02 23:24         ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2018-03-02 18:55 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexander Duyck, linux-net-drivers, David Miller, netdev,
	John W. Linville, Or Gerlitz, Alexander Duyck

On Fri, 2 Mar 2018 15:24:29 +0000, Edward Cree wrote:
> On Tue, Feb 27, 2018 at 3:47 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> 
> > Please, let's stop extending ethtool_rx_flow APIs.  I bit my tongue
> > when Intel was adding their "redirection to VF" based on ethtool ntuples
> > and look now they're adding the same functionality with flower :|  And
> > wonder how to handle two interfaces doing the same thing.  
> Since sfc only supports ethtool NFC interfaces (we have no flower support,
>  and I also wonder how one is to support both of those interfaces without
>  producing an ugly mess), I'd much rather put this in ethtool than have to
>  implement all of flower just so we can have this extension.

"Just this one extension" is exactly the attitude that can lead to
messy APIs :(

> I guess part of the question is, which other drivers besides us would want
>  to implement something like this, and what are their requirements?

I think every vendor is trying to come up with ways to make their HW
work with containers better these days.

> > On the use case itself, I wonder how much sense that makes.  Can your
> > hardware not tag the packet as well so you could then mux it to
> > something like macvlan offload?  
> In practice the only way our hardware can "tag the packet" is by the
>  selection of RX queue.  So you could for instance give a container its
>  own RX queues (rather than just using the existing RX queues on the
>  appropriate CPUs), and maybe in future hook those queues up to l2fwd
>  offload somehow.
> But that seems like a separate job (offloading the macvlan switching) to
>  what this series is about (making the RX processing happen on the right
>  CPUs).  Is software macvlan switching really noticeably slow, anyway?

OK, thanks for clarifying.

> Besides, more powerful filtering than just MAC addr might be needed, if,
>  for instance, the container network is encapsulated.  In that case
>  something like a UDP 4-tuple filter might be necessary (or, indeed, a
>  filter looking at the VNID (VxLAN TNI) - which our hardware can do but
>  ethtool doesn't currently have a way to specify).  AFAICT l2-fwd-offload
>  can only be used for straight MAC addr, not for overlay networks like
>  VxLAN or FOU?  At least, existing ndo_dfwd_add_station() implementations
>  don't seem to check that dev is a macvlan...  Does it even support
>  VLAN filters?  fm10k implementation doesn't seem to.

Exactly!  One can come up with many protocol combinations which flower
already has APIs for...  ethtool is not the place for it.

> Anyway, like I say, filtering traffic onto its own queues seems to be
>  orthogonal, or at least separate, to binding those queues into an
>  upperdev for demux offload.

It is, I was just trying to broaden the scope to more capable HW so we
design APIs that would serve all.

> On 28/02/18 01:24, Alexander Duyck wrote:
> 
> > We did something like this for i40e. Basically we required creating
> > the queue groups using mqprio to keep them symmetric on Tx and Rx, and
> > then allowed for TC ingress filters to redirect traffic to those queue
> > groups.
> >
> > - Alex  
> If we're not doing macvlan offload, I'm not sure what, if anything, the
>  TX side would buy us.  So for now it seems to make sense for TX just to
>  use the TXQ associated with the CPU from which the TX originates, which
>  I believe already happens automatically.

I don't think that's what Alex was referring to.  Please see
commit e284fc280473 ("i40e: Add and delete cloud filter") for
instance :)


* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-03-02 18:55       ` Jakub Kicinski
@ 2018-03-02 23:24         ` Alexander Duyck
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2018-03-02 23:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Edward Cree, linux-net-drivers, David Miller, netdev,
	John W. Linville, Or Gerlitz, Alexander Duyck

On Fri, Mar 2, 2018 at 10:55 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Fri, 2 Mar 2018 15:24:29 +0000, Edward Cree wrote:
>> On Tue, Feb 27, 2018 at 3:47 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>
>> > Please, let's stop extending ethtool_rx_flow APIs.  I bit my tongue
>> > when Intel was adding their "redirection to VF" based on ethtool ntuples
>> > and look now they're adding the same functionality with flower :|  And
>> > wonder how to handle two interfaces doing the same thing.
>> Since sfc only supports ethtool NFC interfaces (we have no flower support,
>>  and I also wonder how one is to support both of those interfaces without
>>  producing an ugly mess), I'd much rather put this in ethtool than have to
>>  implement all of flower just so we can have this extension.
>
> "Just this one extension" is exactly the attitude that can lead to
> messy APIs :(
>
>> I guess part of the question is, which other drivers besides us would want
>>  to implement something like this, and what are their requirements?
>
> I think every vendor is trying to come up with ways to make their HW
> work with containers better these days.
>
>> > On the use case itself, I wonder how much sense that makes.  Can your
>> > hardware not tag the packet as well so you could then mux it to
>> > something like macvlan offload?
>> In practice the only way our hardware can "tag the packet" is by the
>>  selection of RX queue.  So you could for instance give a container its
>>  own RX queues (rather than just using the existing RX queues on the
>>  appropriate CPUs), and maybe in future hook those queues up to l2fwd
>>  offload somehow.
>> But that seems like a separate job (offloading the macvlan switching) to
>>  what this series is about (making the RX processing happen on the right
>>  CPUs).  Is software macvlan switching really noticeably slow, anyway?
>
> OK, thanks for clarifying.
>
>> Besides, more powerful filtering than just MAC addr might be needed, if,
>>  for instance, the container network is encapsulated.  In that case
>>  something like a UDP 4-tuple filter might be necessary (or, indeed, a
>>  filter looking at the VNID (VxLAN TNI) - which our hardware can do but
>>  ethtool doesn't currently have a way to specify).  AFAICT l2-fwd-offload
>>  can only be used for straight MAC addr, not for overlay networks like
>>  VxLAN or FOU?  At least, existing ndo_dfwd_add_station() implementations
>>  don't seem to check that dev is a macvlan...  Does it even support
>>  VLAN filters?  fm10k implementation doesn't seem to.
>
> Exactly!  One can come up with many protocol combinations which flower
> already has APIs for...  ethtool is not the place for it.
>
>> Anyway, like I say, filtering traffic onto its own queues seems to be
>>  orthogonal, or at least separate, to binding those queues into an
>>  upperdev for demux offload.
>
> It is, I was just trying to broaden the scope to more capable HW so we
> design APIs that would serve all.
>
>> On 28/02/18 01:24, Alexander Duyck wrote:
>>
>> > We did something like this for i40e. Basically we required creating
>> > the queue groups using mqprio to keep them symmetric on Tx and Rx, and
>> > then allowed for TC ingress filters to redirect traffic to those queue
>> > groups.
>> >
>> > - Alex
>> If we're not doing macvlan offload, I'm not sure what, if anything, the
>>  TX side would buy us.  So for now it seems to make sense for TX just to
>>  use the TXQ associated with the CPU from which the TX originates, which
>>  I believe already happens automatically.
>
> I don't think that's what Alex was referring to.  Please see
> commit e284fc280473 ("i40e: Add and delete cloud filter") for
> instance :)

Right. And as far as the Tx queue association goes, right now we are
basing things off of skb->priority, which is easily controlled via
cgroups. So in theory you could associate a given cgroup with a
specific set of Tx queues using this approach.
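The cgroup control path described above can be sketched as follows (paths and the priority value are illustrative, and assume the net_prio cgroup controller is mounted; a driver that maps priorities to Tx queue groups, e.g. via mqprio, would then place this traffic on the corresponding queue set):

```shell
# Tasks in this cgroup get skb->priority 5 on traffic sent via eth0.
mkdir /sys/fs/cgroup/net_prio/container0
echo "eth0 5" > /sys/fs/cgroup/net_prio/container0/net_prio.ifpriomap
echo $$ > /sys/fs/cgroup/net_prio/container0/tasks
```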

Most of the filtering that Jakub pointed out is applied to the Rx side
to make sure the packets come in on the right queue set.

- Alex


* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-03-02 16:01   ` Edward Cree
  2018-03-02 17:49     ` David Riddoch
@ 2018-03-07 15:24     ` David Miller
  2018-03-07 15:40       ` Edward Cree
  1 sibling, 1 reply; 18+ messages in thread
From: David Miller @ 2018-03-07 15:24 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev, linville

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 2 Mar 2018 16:01:47 +0000

> On 01/03/18 18:36, David Miller wrote:
>> We really should have the ethtool interfaces under deep freeze until we
>> convert it to netlink or similar.
>> Second, this is a real hackish way to extend ethtool with new
>> semantics.  A structure changes layout based upon a flag bit setting
>> in an earlier member?  Yikes...
> Yeah, while I'm reasonably confident it's ABI-compatible (presence of that
>  flag in the past should always have led to drivers complaining they didn't
>  recognise it), and it is somewhat similar to the existing FLOW_EXT flag,
>  it is indeed rather ugly.  This is the only way I could see to do it
>  without adding a whole new command number, which I felt might also be
>  contentious (see: deep freeze) but is probably a better approach.
> 
>> Lastly, there has been feedback asking how practical and useful this
>> facility actually is, and you must address that.
> According to our marketing folks, there is end-user demand for this feature
>  or something like it.  I didn't see any arguments why this isn't useful,
>  just that other things might be useful too.  (Also, sorry it took me so
>  long to address their feedback, but I had to do a bit of background
>  reading before I could understand what Jakub was suggesting.)

Ok.

Since nobody is really working on the ethtool --> devlink/netlink conversion,
it really isn't reasonable for me to block useful changes like yours.

So please resubmit this series and I will apply it.

Thanks.


* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-03-07 15:24     ` David Miller
@ 2018-03-07 15:40       ` Edward Cree
  2018-03-07 20:55         ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Edward Cree @ 2018-03-07 15:40 UTC (permalink / raw)
  To: David Miller; +Cc: linux-net-drivers, netdev, linville

On 07/03/18 15:24, David Miller wrote:
> Ok.
>
> Since nobody is really working on the ethtool --> devlink/netlink conversion,
> it really isn't reasonable for me to block useful changes like yours.
>
> So please resubmit this series and I will apply it.
>
> Thanks.
Ok, thanks.  Should I stick with the hackish union-and-flag-bit, or define a
 new ethtool command number for the extended command?


* Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
  2018-03-07 15:40       ` Edward Cree
@ 2018-03-07 20:55         ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2018-03-07 20:55 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev, linville

From: Edward Cree <ecree@solarflare.com>
Date: Wed, 7 Mar 2018 15:40:39 +0000

> On 07/03/18 15:24, David Miller wrote:
>> Ok.
>>
>> Since nobody is really working on the ethtool --> devlink/netlink conversion,
>> it really isn't reasonable for me to block useful changes like yours.
>>
>> So please resubmit this series and I will apply it.
>>
>> Thanks.
> Ok, thanks.  Should I stick with the hackish union-and-flag-bit, or define a
>  new ethtool command number for the extended command?

I'd say stick with the union-and-flag-bit hack.


end of thread, other threads:[~2018-03-07 20:55 UTC | newest]

Thread overview: 18+ messages
2018-02-27 17:59 [PATCH RESEND net-next 0/2] ntuple filters with RSS Edward Cree
2018-02-27 18:02 ` [PATCH net-next 1/2] net: ethtool: extend RXNFC API to support RSS spreading of filter matches Edward Cree
2018-02-27 18:03 ` [PATCH net-next 2/2] sfc: support RSS spreading of ethtool ntuple filters Edward Cree
2018-03-01 20:51   ` kbuild test robot
2018-02-27 23:47 ` [PATCH RESEND net-next 0/2] ntuple filters with RSS Jakub Kicinski
2018-02-28  1:24   ` Alexander Duyck
2018-03-02 15:24     ` Edward Cree
2018-03-02 18:55       ` Jakub Kicinski
2018-03-02 23:24         ` Alexander Duyck
2018-03-01 18:36 ` David Miller
2018-03-02 16:01   ` Edward Cree
2018-03-02 17:49     ` David Riddoch
2018-03-07 15:24     ` David Miller
2018-03-07 15:40       ` Edward Cree
2018-03-07 20:55         ` David Miller
     [not found] <533b5eff-49b6-16c3-9873-dda3fb05c3d4@solarflare.com>
2018-02-27 17:38 ` David Miller
2018-02-27 17:55   ` Edward Cree
2018-02-27 19:28     ` John W. Linville
