Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Nicolas de Pesloüan @ 2011-02-25 23:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, kaber, eric.dumazet, netdev, shemminger, fubar,
	andy
In-Reply-To: <20110223190541.GB2783@psychotron.redhat.com>

Le 23/02/2011 20:05, Jiri Pirko a écrit :
> This patch converts bonding to use rx_handler. Results in cleaner
> __netif_receive_skb() with much less exceptions needed. Also
> bond-specific work is moved into bond code.
>
> Did performance test using pktgen and counting incoming packets by
> iptables. No regression noted.
>
> Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>
> v1->v2:
>          using skb_iif instead of new input_dev to remember original
> 	device
>
> v2->v3:
> 	do another loop in case skb->dev is changed. That way orig_dev
> 	core can be left untouched.

Hi Jiri,

Eventually taking enough time for a review.

I think we should split this change :

1/ Change __netif_receive_skb() to call rx_handler for diverted net_device, until rx_handler is NULL.

2/ Convert currently existing rx_handlers (bridge and macvlan) to use this new "loop" feature, 
removing the need to call netif_rx() inside their respective rx_handler and also removing the 
associated overhead.

3/ Convert bonding to use rx_handlers.

Also, on step 1, we definitely need to clarify what orig_dev should be.

I now think that orig_dev should be "the device one level below the current one" or NULL if current 
device was not diverted from another one. It means that we should keep an array of crossed 
(diverted) devices and the associated orig_dev. This array would be used to pass the right orig_dev 
to protocol handlers, depending on the device they register on :

eth0 -> bond0 -> br0

A protocol handler registered on bond0 would receive eth0 as orig_dev.
A protocol handler registered on br0 would receive bond0 as orig_dev.

[snip]

> @@ -3167,32 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)

[snip]

> +another_round:
> +
> +	__this_cpu_inc(softnet_data.processed);
> +
>   #ifdef CONFIG_NET_CLS_ACT
>   	if (skb->tc_verd&  TC_NCLS) {
>   		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
> @@ -3209,8 +3157,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>   #endif
>
>   	list_for_each_entry_rcu(ptype,&ptype_all, list) {
> -		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
> -		    ptype->dev == orig_dev) {
> +		if (!ptype->dev || ptype->dev == skb->dev) {
>   			if (pt_prev)
>   				ret = deliver_skb(skb, pt_prev, orig_dev);
>   			pt_prev = ptype;
> @@ -3224,16 +3171,20 @@ static int __netif_receive_skb(struct sk_buff *skb)
>   ncls:
>   #endif
>

Why do you loop to ptype_all before calling rx_handler ?

I don't understand why ptype_all and ptype_base are not handled at the same place in current 
__netif_receive_skb() but I think we should take the opportunity to change that, unless someone know 
of a good reason not to do so.

> -	/* Handle special case of bridge or macvlan */
>   	rx_handler = rcu_dereference(skb->dev->rx_handler);
>   	if (rx_handler) {

	Nicolas.

^ permalink raw reply

* [ethtool PATCH 0/4] Add support for network flow classifier rules
From: Alexander Duyck @ 2011-02-25 23:48 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, bhutchings; +Cc: netdev

This patch series implements the user-space portion of network flow
classifier rules.  The original patches were applied to the kernel a couple
of years ago, but the user-space side was never applied.  As such I have
gone though and updated the original patch from Santwona Behera to make it
applicable to the current ethtool git tree.

In addition this updates the network flow classification rules to handle
displaying the same fields as ntuple filters due to the fact that ntuple
display functionality had some serious issues, and provided no means for a
driver to display rules if they are stored internally.

The formatting of the man page and the help text may still need some work.
I was having difficulties determining the best way to layout the
class-rule-add and flow-type help instructions since they are mutually
exclusive but both contain a large number of possible options.

Finally there was one minor change adding ESP hashing as a separate option
to Rx-hashing that was contained in the original patch that I have moved
out into a separate patch.

---

Alexander Duyck (3):
      Add support for displaying a ntuple contained in an rx_flow_spec
      Remove strings based approach for displaying ntuple
      Add support for ESP as a separate protocol from AH

Santwona Behera (1):
      v2 Add RX packet classification interface

 Makefile.am      |    3 
 ethtool-bitops.h |   25 +
 ethtool-copy.h   |   23 +
 ethtool-util.h   |   42 ++
 ethtool.8.in     |  204 +++++----
 ethtool.c        |  344 ++++++---------
 rxclass.c        | 1241 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 1573 insertions(+), 309 deletions(-)
 create mode 100644 ethtool-bitops.h
 create mode 100644 rxclass.c

-- 

^ permalink raw reply

* [ethtool PATCH 1/4] Add support for ESP as a separate protocol from AH
From: Alexander Duyck @ 2011-02-25 23:48 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, bhutchings; +Cc: netdev
In-Reply-To: <20110225233902.8409.74474.stgit@gitlad.jf.intel.com>

This change is meant to split out ESP from AH.  Currently they are present
as both a combined value, and two separate values.  In order to try and
support eventually splitting the two out into separate values this change
makes it so that ESP can be called out separately in ethtool.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 ethtool.8.in |    2 +-
 ethtool.c    |   21 ++++++++++++++++-----
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 133825b..7dec259 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -50,7 +50,7 @@
 .\"
 .\"	\(*FL - flow type values
 .\"
-.ds FL \fBtcp4\fP|\fBudp4\fP|\fBah4\fP|\fBsctp4\fP|\fBtcp6\fP|\fBudp6\fP|\fBah6\fP|\fBsctp6\fP
+.ds FL \fBtcp4\fP|\fBudp4\fP|\fBah4\fP|\fBesp4\fP|\fBsctp4\fP|\fBtcp6\fP|\fBudp6\fP|\fBah6\fP|\fBesp6\fP|\fBsctp6\fP
 .\"
 .\"	\(*HO - hash options
 .\"
diff --git a/ethtool.c b/ethtool.c
index 1afdfe4..14740d5 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -32,7 +32,6 @@
 #include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <stdio.h>
-#include <string.h>
 #include <errno.h>
 #include <net/if.h>
 #include <sys/utsname.h>
@@ -232,15 +231,15 @@ static struct option {
     { "-S", "--statistics", MODE_GSTATS, "Show adapter statistics" },
     { "-n", "--show-nfc", MODE_GNFC, "Show Rx network flow classification "
 		"options",
-		"		[ rx-flow-hash tcp4|udp4|ah4|sctp4|"
-		"tcp6|udp6|ah6|sctp6 ]\n" },
+		"		[ rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|"
+		"tcp6|udp6|ah6|esp6|sctp6 ]\n" },
     { "-f", "--flash", MODE_FLASHDEV, "FILENAME " "Flash firmware image "
     		"from the specified file to a region on the device",
 		"               [ REGION-NUMBER-TO-FLASH ]\n" },
     { "-N", "--config-nfc", MODE_SNFC, "Configure Rx network flow "
 		"classification options",
-		"		[ rx-flow-hash tcp4|udp4|ah4|sctp4|"
-		"tcp6|udp6|ah6|sctp6 m|v|t|s|d|f|n|r... ]\n" },
+		"		[ rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|"
+		"tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r... ]\n" },
     { "-x", "--show-rxfh-indir", MODE_GRXFHINDIR, "Show Rx flow hash "
 		"indirection" },
     { "-X", "--set-rxfh-indir", MODE_SRXFHINDIR, "Set Rx flow hash indirection",
@@ -778,6 +777,8 @@ static int rxflow_str_to_type(const char *str)
 		flow_type = UDP_V4_FLOW;
 	else if (!strcmp(str, "ah4"))
 		flow_type = AH_ESP_V4_FLOW;
+	else if (!strcmp(str, "esp4"))
+		flow_type = ESP_V4_FLOW;
 	else if (!strcmp(str, "sctp4"))
 		flow_type = SCTP_V4_FLOW;
 	else if (!strcmp(str, "tcp6"))
@@ -786,6 +787,8 @@ static int rxflow_str_to_type(const char *str)
 		flow_type = UDP_V6_FLOW;
 	else if (!strcmp(str, "ah6"))
 		flow_type = AH_ESP_V6_FLOW;
+	else if (!strcmp(str, "esp6"))
+		flow_type = ESP_V6_FLOW;
 	else if (!strcmp(str, "sctp6"))
 		flow_type = SCTP_V6_FLOW;
 	else if (!strcmp(str, "ether"))
@@ -1918,8 +1921,12 @@ static int dump_rxfhash(int fhash, u64 val)
 		fprintf(stdout, "SCTP over IPV4 flows");
 		break;
 	case AH_ESP_V4_FLOW:
+	case AH_V4_FLOW:
 		fprintf(stdout, "IPSEC AH over IPV4 flows");
 		break;
+	case ESP_V4_FLOW:
+		fprintf(stdout, "IPSEC ESP over IPV4 flows");
+		break;
 	case TCP_V6_FLOW:
 		fprintf(stdout, "TCP over IPV6 flows");
 		break;
@@ -1930,8 +1937,12 @@ static int dump_rxfhash(int fhash, u64 val)
 		fprintf(stdout, "SCTP over IPV6 flows");
 		break;
 	case AH_ESP_V6_FLOW:
+	case AH_V6_FLOW:
 		fprintf(stdout, "IPSEC AH over IPV6 flows");
 		break;
+	case ESP_V6_FLOW:
+		fprintf(stdout, "IPSEC ESP over IPV6 flows");
+		break;
 	default:
 		break;
 	}


^ permalink raw reply related

* [ethtool PATCH 2/4] Remove strings based approach for displaying ntuple
From: Alexander Duyck @ 2011-02-25 23:48 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, bhutchings; +Cc: netdev
In-Reply-To: <20110225233902.8409.74474.stgit@gitlad.jf.intel.com>

This change is meant to remove the strings based approach for displaying
ntuple filters.  A follow-on patch will replace that functionality with a
network flow classification based approach that will get the number of
filters, get their locations, and then request and display them
individually.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 ethtool-copy.h |    3 +--
 ethtool.c      |   44 --------------------------------------------
 2 files changed, 1 insertions(+), 46 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 75c3ae7..dd1080b 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -250,7 +250,6 @@ enum ethtool_stringset {
 	ETH_SS_TEST		= 0,
 	ETH_SS_STATS,
 	ETH_SS_PRIV_FLAGS,
-	ETH_SS_NTUPLE_FILTERS,
 };
 
 /* for passing string sets for data tagging */
@@ -580,7 +579,7 @@ struct ethtool_flash {
 #define ETHTOOL_FLASHDEV	0x00000033 /* Flash firmware to device */
 #define ETHTOOL_RESET		0x00000034 /* Reset hardware */
 #define ETHTOOL_SRXNTUPLE	0x00000035 /* Add an n-tuple filter to device */
-#define ETHTOOL_GRXNTUPLE	0x00000036 /* Get n-tuple filters from device */
+/* ETHTOOL_GRXNTUPLE		0x00000036 disabled due to multiple issues */
 #define ETHTOOL_GSSET_INFO	0x00000037 /* Get string set info */
 #define ETHTOOL_GRXFHINDIR	0x00000038 /* Get RX flow hash indir'n table */
 #define ETHTOOL_SRXFHINDIR	0x00000039 /* Set RX flow hash indir'n table */
diff --git a/ethtool.c b/ethtool.c
index 14740d5..2a084db 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3162,50 +3162,6 @@ static int do_srxntuple(int fd, struct ifreq *ifr)
 
 static int do_grxntuple(int fd, struct ifreq *ifr)
 {
-	struct ethtool_sset_info *sset_info;
-	struct ethtool_gstrings *strings;
-	int sz_str, n_strings, err, i;
-
-	sset_info = malloc(sizeof(struct ethtool_sset_info) + sizeof(u32));
-	sset_info->cmd = ETHTOOL_GSSET_INFO;
-	sset_info->sset_mask = (1ULL << ETH_SS_NTUPLE_FILTERS);
-	ifr->ifr_data = (caddr_t)sset_info;
-	err = send_ioctl(fd, ifr);
-
-	if ((err < 0) ||
-	    (!(sset_info->sset_mask & (1ULL << ETH_SS_NTUPLE_FILTERS)))) {
-		perror("Cannot get driver strings info");
-		return 100;
-	}
-
-	n_strings = sset_info->data[0];
-	free(sset_info);
-	sz_str = n_strings * ETH_GSTRING_LEN;
-
-	strings = calloc(1, sz_str + sizeof(struct ethtool_gstrings));
-	if (!strings) {
-		fprintf(stderr, "no memory available\n");
-		return 95;
-	}
-
-	strings->cmd = ETHTOOL_GRXNTUPLE;
-	strings->string_set = ETH_SS_NTUPLE_FILTERS;
-	strings->len = n_strings;
-	ifr->ifr_data = (caddr_t) strings;
-	err = send_ioctl(fd, ifr);
-	if (err < 0) {
-		perror("Cannot get Rx n-tuple information");
-		free(strings);
-		return 101;
-	}
-
-	n_strings = strings->len;
-	fprintf(stdout, "Rx n-tuple filters:\n");
-	for (i = 0; i < n_strings; i++)
-		fprintf(stdout, "%s", &strings->data[i * ETH_GSTRING_LEN]);
-
-	free(strings);
-
 	return 0;
 }
 


^ permalink raw reply related

* [ethtool PATCH 4/4] Add support for displaying a ntuple contained in an rx_flow_spec
From: Alexander Duyck @ 2011-02-25 23:49 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, bhutchings; +Cc: netdev
In-Reply-To: <20110225233902.8409.74474.stgit@gitlad.jf.intel.com>

This change is meant to add support for displaying a ntuple filter by using
the unused space contained within a standard rx_flow_spec.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 ethtool-copy.h |   20 ++++++++++++++++
 ethtool.8.in   |    6 +++++
 ethtool.c      |    3 ++
 rxclass.c      |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 0 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index dd1080b..4401238 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -376,10 +376,25 @@ struct ethtool_usrip4_spec {
 };
 
 /**
+ * struct ethtool_ntuple_spec_ext - flow spec extension for ntuple in nfc
+ * @unused: space unused by extension
+ * @vlan_etype: EtherType for vlan tagged packet to match
+ * @vlan_tci: VLAN tag to match
+ * @data: Driver-dependent data to match
+ */
+struct ethtool_ntuple_spec_ext {
+	__be32	unused[15];
+	__be16	vlan_etype;
+	__be16	vlan_tci;
+	__be32	data[2];
+};
+
+/**
  * struct ethtool_rx_flow_spec - specification for RX flow filter
  * @flow_type: Type of match to perform, e.g. %TCP_V4_FLOW
  * @h_u: Flow fields to match (dependent on @flow_type)
  * @m_u: Masks for flow field bits to be ignored
+ * @flow_type_ext: Type of extended match to perform, e.g. %NTUPLE_FLOW_EXT
  * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
  *	if packets should be discarded
  * @location: Index of filter in hardware table
@@ -394,8 +409,10 @@ struct ethtool_rx_flow_spec {
 		struct ethtool_ah_espip4_spec		esp_ip4_spec;
 		struct ethtool_usrip4_spec		usr_ip4_spec;
 		struct ethhdr				ether_spec;
+		struct ethtool_ntuple_spec_ext		ntuple_spec;
 		__u8					hdata[72];
 	} h_u, m_u;
+	__u32		flow_type_ext;
 	__u64		ring_cookie;
 	__u32		location;
 };
@@ -706,6 +723,9 @@ struct ethtool_flash {
 #define	IPV6_FLOW	0x11	/* hash only */
 #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
 
+/* Flow extension types for network flow classifier */
+#define NTUPLE_FLOW_EXT	0x01 /* indicates ntuple in nfc */
+
 /* L3-L4 network traffic flow hash options */
 #define	RXH_L2DA	(1 << 1)
 #define	RXH_VLAN	(1 << 2)
diff --git a/ethtool.8.in b/ethtool.8.in
index b68b010..9c768fb 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -278,6 +278,9 @@ ethtool \- query or control network driver and hardware settings
 .BM src-port
 .BM dst-port
 .BM spi
+.BM vlan-etype
+.BM vlan
+.BM user-def
 .BN action
 .BN loc
 .RB \ |\  flow-type \ \*(NC
@@ -764,6 +767,9 @@ Specify the value of the security parameter index field (applicable to
 AH/ESP packets)in the incoming packet to match along with an
 optional mask.
 .TP
+.BI vlan-etype \ N \\fR\ [\\fPvlan-etype-mask \ N \\fR]\\fP
+Includes the VLAN tag Ethertype and an optional mask.
+.TP
 .BI vlan \ N \\fR\ [\\fPvlan-mask \ N \\fR]\\fP
 Includes the VLAN tag and an optional mask.
 .TP
diff --git a/ethtool.c b/ethtool.c
index f4dfc39..d931353 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -254,6 +254,9 @@ static struct option {
 		"			[ src-port %d [src-port-mask %x] ]\n"
 		"			[ dst-port %d [dst-port-mask %x] ]\n"
 		"			[ spi %d [spi-mask %x] ]\n"
+		"			[ vlan-etype %x [vlan-etype-mask %x] ]\n"
+		"			[ vlan %x [vlan-mask %x] ]\n"
+		"			[ user-def %x [user-def-mask %x] ]\n"
 		"			[ action %d ]\n"
 		"			[ loc %d] |\n"
 		"		  flow-type ether|ip4|tcp4|udp4|sctp4|ah4|esp4\n"
diff --git a/rxclass.c b/rxclass.c
index f2a8c96..99071a4 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -30,6 +30,33 @@ struct rmgr_ctrl {
 static struct rmgr_ctrl rmgr;
 static int rmgr_init_done = 0;
 
+static void rmgr_print_nfc_spec_ext(struct ethtool_rx_flow_spec *fsp)
+{
+	u64 data, datam;
+	__u16 etype, etypem, tci, tcim;
+	
+	switch(fsp->flow_type_ext) {
+	case NTUPLE_FLOW_EXT:
+		etype = ntohs(fsp->h_u.ntuple_spec.vlan_etype);
+		etypem = ntohs(fsp->m_u.ntuple_spec.vlan_etype);
+		tci = ntohs(fsp->h_u.ntuple_spec.vlan_tci);
+		tcim = ntohs(fsp->m_u.ntuple_spec.vlan_tci);
+		data = (u64)ntohl(fsp->h_u.ntuple_spec.data[0]);
+		data = (u64)ntohl(fsp->h_u.ntuple_spec.data[1]) << 32;
+		datam = (u64)ntohl(fsp->m_u.ntuple_spec.data[0]);
+		datam = (u64)ntohl(fsp->m_u.ntuple_spec.data[1]) << 32;
+
+		fprintf(stdout,
+			"\tVLAN EtherType: 0x%x mask: 0x%x\n"
+			"\tVLAN: 0x%x mask: 0x%x\n"
+			"\tUser-defined: 0x%Lx mask: 0x%Lx\n",
+			etype, etypem, tci, tcim, data, datam);
+		break;
+	default:
+		break;
+	}
+}
+
 static void rmgr_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 {
 	unsigned char	*smac, *smacm, *dmac, *dmacm;
@@ -127,6 +154,7 @@ static void rmgr_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 		default:
 			break;
 		}
+		rmgr_print_nfc_spec_ext(fsp);
 		break;
 	case ETHER_FLOW:
 		dmac = fsp->h_u.ether_spec.h_dest;
@@ -148,6 +176,7 @@ static void rmgr_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 			dmac[0], dmac[1], dmac[2], dmac[3], dmac[4], dmac[5],
 			dmacm[0], dmacm[1], dmacm[2], dmacm[3], dmacm[4], dmacm[5],
 			proto, protom);
+		rmgr_print_nfc_spec_ext(fsp);
 		break;
 	default:
 		fprintf(stdout,
@@ -516,6 +545,7 @@ typedef enum {
 #define NFC_FLAG_PROTO		0x080
 #define NTUPLE_FLAG_VLAN	0x100
 #define NTUPLE_FLAG_UDEF	0x200
+#define NTUPLE_FLAG_VETH	0x400
 
 struct rule_opts {
 	const char	*name;
@@ -545,6 +575,15 @@ static struct rule_opts rule_nfc_tcp_ip4[] = {
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.data) },
 };
 
 static struct rule_opts rule_nfc_esp_ip4[] = {
@@ -564,6 +603,15 @@ static struct rule_opts rule_nfc_esp_ip4[] = {
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.data) },
 };
 
 static struct rule_opts rule_nfc_usr_ip4[] = {
@@ -592,6 +640,15 @@ static struct rule_opts rule_nfc_usr_ip4[] = {
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.data) },
 };
 
 static struct rule_opts rule_nfc_ether[] = {
@@ -608,6 +665,15 @@ static struct rule_opts rule_nfc_ether[] = {
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ntuple_spec.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ntuple_spec.data) },
 };
 
 static struct rule_opts rule_ntuple_tcp_ip4[] = {
@@ -1159,6 +1225,12 @@ int rxclass_parse_ruleopts(char **argp, int argc, void *fsp, u8 *loc_valid)
 	if (spec_type == ETH_SPEC_NFC) {
 		if (loc_valid && (flags & NFC_FLAG_LOC))
 			*loc_valid = 1;
+		if (flags & (NTUPLE_FLAG_VLAN |
+			     NTUPLE_FLAG_UDEF |
+			     NTUPLE_FLAG_VETH)) {
+			struct ethtool_rx_flow_spec *fs = fsp;
+			fs->flow_type_ext = NTUPLE_FLOW_EXT;
+		}
 	}
 
 	return 0;


^ permalink raw reply related

* [ethtool PATCH 3/4] v2 Add RX packet classification interface
From: Alexander Duyck @ 2011-02-25 23:48 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, bhutchings; +Cc: netdev
In-Reply-To: <20110225233902.8409.74474.stgit@gitlad.jf.intel.com>

From: Santwona Behera <santwona.behera@sun.com>

This patch was originally introduced as:
  [PATCH 1/3] [ethtool] Add rx pkt classification interface
  Signed-off-by: Santwona Behera <santwona.behera@sun.com>
  http://patchwork.ozlabs.org/patch/23223/

I have updated it to address a number of issues.  As a result I removed the
local caching of rules due to the fact that there were memory leaks in this
code and the rule manager would consume over 1Mb of space for an 8K table
when all that was needed was 1K in order to store which rules were active
and which were not.

In addition I dropped the use of regions as there were multiple issue found
including the fact that the regions were not properly expanding beyond 2
and the fact that the regions required reading all of the rules in order to
correctly expand beyond 2.  By dropping the regions from the rule manager
it is possible to write a much cleaner interface leaving region management
to be done by either the driver or by external management scripts.

I also added an ethtool bitops interface to allow for simple bit set and
test activities since the rule manager can most efficiently store the list
of active rules via a bitmap.

This patch now also merges the functionality of the packet classification
into the ntuple interface.  This is done by using the key word flow-type to
indicate the addition of an ntuple, class-rule-add to indicate the addition
of a network flow classifier, and class-rule-del to indicate the deletion
of a network flow classifier. Since ntuple display functionality was
already removed I have made the defalt for the -u option to display the
number of rings and all network flow classification filters.  If a single
rule is requested via class-rule %d then only that rule will be displayed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 Makefile.am      |    3 
 ethtool-bitops.h |   25 +
 ethtool-util.h   |   42 ++
 ethtool.8.in     |  198 +++++----
 ethtool.c        |  284 ++++++-------
 rxclass.c        | 1169 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1459 insertions(+), 262 deletions(-)
 create mode 100644 ethtool-bitops.h
 create mode 100644 rxclass.c

diff --git a/Makefile.am b/Makefile.am
index a0d2116..0262c31 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -8,7 +8,8 @@ ethtool_SOURCES = ethtool.c ethtool-copy.h ethtool-util.h	\
 		  amd8111e.c de2104x.c e100.c e1000.c igb.c	\
 		  fec_8xx.c ibm_emac.c ixgb.c ixgbe.c natsemi.c	\
 		  pcnet32.c realtek.c tg3.c marvell.c vioc.c	\
-		  smsc911x.c at76c50x-usb.c sfc.c stmmac.c
+		  smsc911x.c at76c50x-usb.c sfc.c stmmac.c	\
+		  rxclass.c
 
 dist-hook:
 	cp $(top_srcdir)/ethtool.spec $(distdir)
diff --git a/ethtool-bitops.h b/ethtool-bitops.h
new file mode 100644
index 0000000..93d32a4
--- /dev/null
+++ b/ethtool-bitops.h
@@ -0,0 +1,25 @@
+#ifndef ETHTOOL_BITOPS_H__
+#define ETHTOOL_BITOPS_H__
+
+#define BITS_PER_BYTE		8
+#define BITS_PER_LONG		BITS_PER_BYTE * sizeof(long)
+#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
+#define BITS_TO_LONGS(nr)	DIV_ROUND_UP(nr, BITS_PER_LONG)
+
+static inline void set_bit(int nr, unsigned long *addr)
+{
+	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
+}
+
+static inline void clear_bit(int nr, unsigned long *addr)
+{
+	addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
+}
+
+static __always_inline int test_bit(unsigned int nr, const unsigned long *addr)
+{
+	return ((1UL << (nr % BITS_PER_LONG)) &
+		(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0UL;
+}
+
+#endif
diff --git a/ethtool-util.h b/ethtool-util.h
index f053028..182658b 100644
--- a/ethtool-util.h
+++ b/ethtool-util.h
@@ -5,15 +5,18 @@
 
 #include <sys/types.h>
 #include <endian.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
+#include "ethtool-config.h"
+#include "ethtool-copy.h"
 
 /* ethtool.h expects these to be defined by <linux/types.h> */
 #ifndef HAVE_BE_TYPES
 typedef __uint16_t __be16;
 typedef __uint32_t __be32;
+typedef unsigned long long __be64;
 #endif
 
-#include "ethtool-copy.h"
-
 typedef unsigned long long u64;
 typedef __uint32_t u32;
 typedef __uint16_t u16;
@@ -23,11 +26,15 @@ typedef __int32_t s32;
 #if __BYTE_ORDER == __BIG_ENDIAN
 static inline u16 cpu_to_be16(u16 value)
 {
-    return value;
+	return value;
 }
 static inline u32 cpu_to_be32(u32 value)
 {
-    return value;
+	return value;
+}
+static inline u64 cpu_to_be64(u64 value)
+{
+	return value;
 }
 #else
 static inline u16 cpu_to_be16(u16 value)
@@ -38,6 +45,21 @@ static inline u32 cpu_to_be32(u32 value)
 {
 	return cpu_to_be16(value >> 16) | (cpu_to_be16(value) << 16);
 }
+static inline u64 cpu_to_be64(u64 value)
+{
+	return cpu_to_be32(value >> 32) | ((u64)cpu_to_be32(value) << 32);
+}
+#endif
+
+#define ntohll cpu_to_be64
+#define htonll cpu_to_be64
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#ifndef SIOCETHTOOL
+#define SIOCETHTOOL     0x8946
 #endif
 
 /* National Semiconductor DP83815, DP83816 */
@@ -103,4 +125,14 @@ int sfc_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
 int st_mac100_dump_regs(struct ethtool_drvinfo *info,
 			struct ethtool_regs *regs);
 int st_gmac_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
-#endif
+
+/* Rx flow classification */
+int rxclass_parse_ruleopts(char **optstr, int opt_cnt,
+			   void *fsp, __u8 *loc_valid);
+int rxclass_rule_getall(int fd, struct ifreq *ifr);
+int rxclass_rule_get(int fd, struct ifreq *ifr, __u32 loc);
+int rxclass_rule_ins(int fd, struct ifreq *ifr,
+		     struct ethtool_rx_flow_spec *fsp, __u8 loc_valid);
+int rxclass_rule_del(int fd, struct ifreq *ifr, __u32 loc);
+
+#endif /* ETHTOOL_UTIL_H__ */
diff --git a/ethtool.8.in b/ethtool.8.in
index 7dec259..b68b010 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -40,10 +40,20 @@
 [\\fB\\$1\\fP\ \\fIN\\fP]
 ..
 .\"
+.\"	.BM - same as above but has a mask field for format "[value N [value-mask N]]"
+.\"
+.de BM
+[\\fB\\$1\\fP\ \\fIN\\fP\ [\\fB\\$1\-mask\\fP\ \\fIN\\fP]]
+..
+.\"
 .\"	\(*MA - mac address
 .\"
 .ds MA \fIxx\fP\fB:\fP\fIyy\fP\fB:\fP\fIzz\fP\fB:\fP\fIaa\fP\fB:\fP\fIbb\fP\fB:\fP\fIcc\fP
 .\"
+.\"	\(*PA - IP address
+.\"
+.ds PA \fIx\fP\fB.\fP\fIx\fP\fB.\fP\fIx\fP\fB.\fP\fIx\fP
+.\"
 .\"	\(*WO - wol flags
 .\"
 .ds WO \fBp\fP|\fBu\fP|\fBm\fP|\fBb\fP|\fBa\fP|\fBg\fP|\fBs\fP|\fBd\fP...
@@ -55,6 +65,12 @@
 .\"	\(*HO - hash options
 .\"
 .ds HO \fBm\fP|\fBv\fP|\fBt\fP|\fBs\fP|\fBd\fP|\fBf\fP|\fBn\fP|\fBr\fP...
+.\"
+.\"	\(*NC - Network Classifier type values
+.\"
+.ds NC \fBether\fP|\fBip4\fP|\fBtcp4\fP|\fBudp4\fP|\fBsctp4\fP|\fBah4\fP|\fBesp4\fP
+
+.\"
 .\" Start URL.
 .de UR
 .  ds m1 \\$1\"
@@ -227,8 +243,7 @@ ethtool \- query or control network driver and hardware settings
 
 .B ethtool \-N
 .I ethX
-.RB [ rx-flow-hash \ \*(FL
-.RB \ \*(HO]
+.RB [ rx-flow-hash \ \*(FL \  \*(HO]
 
 .B ethtool \-x|\-\-show\-rxfh\-indir
 .I ethX
@@ -248,51 +263,38 @@ ethtool \- query or control network driver and hardware settings
 
 .B ethtool \-u|\-\-show\-ntuple
 .I ethX
+.BN class-rule
 
-.TP
 .BI ethtool\ \-U|\-\-config\-ntuple \ ethX
-.RB {
-.A3 flow-type tcp4 udp4 sctp4
-.RB [ src-ip
-.IR addr
-.RB [ src-ip-mask
-.IR mask ]]
-.RB [ dst-ip
-.IR addr
-.RB [ dst-ip-mask
-.IR mask ]]
-.RB [ src-port
-.IR port
-.RB [ src-port-mask
-.IR mask ]]
-.RB [ dst-port
-.IR port
-.RB [ dst-port-mask
-.IR mask ]]
-.br
-.RB | \ flow-type\ ether
-.RB [ src
-.IR mac-addr
-.RB [ src-mask
-.IR mask ]]
-.RB [ dst
-.IR mac-addr
-.RB [ dst-mask
-.IR mask ]]
-.RB [ proto
-.IR N
-.RB [ proto-mask
-.IR mask ]]\ }
-.br
-.RB [ vlan
-.IR VLAN-tag
-.RB [ vlan-mask
-.IR mask ]]
-.RB [ user-def
-.IR data
-.RB [ user-def-mask
-.IR mask ]]
-.RI action \ N
+.BN class-rule-del
+.RB [\  class-rule-add \ \*(NC
+.RB [ src \ \*(MA\ [ src-mask \ \*(MA]]
+.RB [ dst \ \*(MA\ [ dst-mask \ \*(MA]]
+.BM proto
+.RB [ src-ip \ \*(PA\ [ src-ip-mask \ \*(PA]]
+.RB [ dst-ip \ \*(PA\ [ dst-ip-mask \ \*(PA]]
+.BM tos
+.BM l4proto
+.BM src-port
+.BM dst-port
+.BM spi
+.BN action
+.BN loc
+.RB \ |\  flow-type \ \*(NC
+.RB [ src \ \*(MA\ [ src-mask \ \*(MA]]
+.RB [ dst \ \*(MA\ [ dst-mask \ \*(MA]]
+.BM proto
+.RB [ src-ip \ \*(PA\ [ src-ip-mask \ \*(PA]]
+.RB [ dst-ip \ \*(PA\ [ dst-ip-mask \ \*(PA]]
+.BM tos
+.BM l4proto
+.BM src-port
+.BM dst-port
+.BM spi
+.BM vlan
+.BM user-def
+.BN action
+.RB ]
 
 .SH DESCRIPTION
 .BI ethtool
@@ -654,7 +656,8 @@ Hash on bytes 0 and 1 of the Layer 4 header of the rx packet.
 Hash on bytes 2 and 3 of the Layer 4 header of the rx packet.
 .TP 3
 .B r
-Discard all packets of this flow type. When this option is set, all other options are ignored.
+Discard all packets of this flow type. When this option is set, all
+other options are ignored.
 .PD
 .RE
 .TP
@@ -685,14 +688,23 @@ Default region is 0 which denotes all regions in the flash.
 .TP
 .B \-u \-\-show-ntuple
 Get Rx ntuple filters and actions, then display them to the user.
+.TP
+.BI class-rule \ N
+Retrieves the RX classification rule with the given ID.
 .PD
 .RE
 .TP
 .B \-U \-\-config-ntuple
 Configure Rx ntuple filters and actions
 .TP
-.B flow-type tcp4|udp4|sctp4|ether
+.BI class-rule-del \ N
+Deletes the RX classification rule with the given ID.
+.PP
+.BR class-rule-add \ \*(NC
+.br
+.BR flow-type \ \*(NC
 .RS
+Adds an RX packet classification rule.
 .PD 0
 .TP 3
 .BR "tcp4" "    TCP over IPv4"
@@ -701,78 +713,80 @@ Configure Rx ntuple filters and actions
 .TP 3
 .BR "sctp4" "   SCTP over IPv4"
 .TP 3
+.BR "ah4" "     IPSEC AH over IPv4"
+.TP 3
+.BR "esp4" "    IPSEC ESP over IPv4"
+.TP 3
+.BR "ip4" "     Raw IPv4"
+.TP 3
 .BR "ether" "   Ethernet"
 .PD
 .RE
 .TP
-.BI src-ip \ addr
-Includes the source IP address, specified using dotted-quad notation
-or as a single 32-bit number.
-.TP
-.BI src-ip-mask \ mask
-Specify a mask for the source IP address.
-.TP
-.BI dst-ip \ addr
-Includes the destination IP address.
-.TP
-.BI dst-ip-mask \ mask
-Specify a mask for the destination IP address.
-.TP
-.BI src-port \ port
-Includes the source port.
-.TP
-.BI src-port-mask \ mask
-Specify a mask for the source port.
-.TP
-.BI dst-port \ port
-Includes the destination port.
+.BR src \ \*(MA\ [ src-mask \ \*(MA]
+Includes the source MAC address, specified as 6 bytes in hexadecimal
+separated by colons, along with an optional mask.
 .TP
-.BI dst-port-mask \ mask
-Specify a mask for the destination port.
+.BR dst \ \*(MA\ [ src-mask \ \*(MA]
+Includes the destination MAC address, specified as 6 bytes in hexadecimal
+separated by colons, along with an optional mask.
 .TP
-.BI src \ mac-addr
-Includes the source MAC address, specified as 6 bytes in hexadecimal
-separated by colons.
+.BI proto \ N \\fR\ [\\fPproto-mask \ N \\fR]\\fP
+Includes the Ethernet protocol number (ethertype) and an optional mask.
 .TP
-.BI src-mask \ mask
-Specify a mask for the source MAC address.
+.BR src-ip \ \*(PA\ [ src-ip-mask \ \*(PA]
+Specify the source IP address of the incoming packet to
+match along with an optional mask.
 .TP
-.BI dst \ mac-addr
-Includes the destination MAC address.
+.BR dst-ip \ \*(PA\ [ dst-ip-mask \ \*(PA]
+Specify the destination IP address of the incoming packet to
+match along with an optional mask.
 .TP
-.BI dst-mask \ mask
-Specify a mask for the destination MAC address.
+.BI tos \ N \\fR\ [\\fPtos-mask \ N \\fR]\\fP
+Specify the value of the Type of Service field in the incoming packet to
+match along with an optional mask.
 .TP
-.BI proto \ N
-Includes the Ethernet protocol number (ethertype).
+.BI l4proto \ N \\fR\ [\\fPl4proto-mask \ N \\fR]\\fP
+Includes the layer 4 protocol number and optional mask.
 .TP
-.BI proto-mask \ mask
-Specify a mask for the Ethernet protocol number.
+.BI src-port \ N \\fR\ [\\fPsrc-port-mask \ N \\fR]\\fP
+Specify the value of the source port field (applicable to
+TCP/UDP packets)in the incoming packet to match along with an
+optional mask.
 .TP
-.BI vlan \ VLAN-tag
-Includes the VLAN tag.
+.BI dst-port \ N \\fR\ [\\fPdst-port-mask \ N \\fR]\\fP
+Specify the value of the destination port field (applicable to
+TCP/UDP packets)in the incoming packet to match along with an
+optional mask.
 .TP
-.BI vlan-mask \ mask
-Specify a mask for the VLAN tag.
+.BI spi \ N \\fR\ [\\fPspi-mask \ N \\fR]\\fP
+Specify the value of the security parameter index field (applicable to
+AH/ESP packets)in the incoming packet to match along with an
+optional mask.
 .TP
-.BI user-def \ data
-Includes 64-bits of user-specific data.
+.BI vlan \ N \\fR\ [\\fPvlan-mask \ N \\fR]\\fP
+Includes the VLAN tag and an optional mask.
 .TP
-.BI user-def-mask \ mask
-Specify a mask for the user-specific data.
+.BI user-def \ N \\fR\ [\\fPuser-def-mask \ N \\fR]\\fP
+Includes 64-bits of user-specific data and an optional mask.
 .TP
 .BI action \ N
 Specifies the Rx queue to send packets to, or some other action.
 .RS
 .PD 0
 .TP 3
-.BR "-2" "             Clear the filter"
+.BR "-2" "             Clear the filter (ntuple only)"
 .TP 3
 .BR "-1" "             Drop the matched flow"
 .TP 3
 .BR "0 or higher" "    Rx queue to route the flow"
 .PD
 .RE
+.TP
+.BI loc \ N
+Specify the location/ID to insert the rule. This will overwrite
+any rule present in that location and will not go through any
+of the rule ordering process.
 .SH BUGS
 Not supported (in part or whole) on all network drivers.
 .SH AUTHOR
diff --git a/ethtool.c b/ethtool.c
index 2a084db..f4dfc39 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -6,6 +6,7 @@
  * Kernel 2.4 update Copyright 2001 Jeff Garzik <jgarzik@mandrakesoft.com>
  * Wake-on-LAN,natsemi,misc support by Tim Hockin <thockin@sun.com>
  * Portions Copyright 2002 Intel
+ * Portions Copyright (C) Sun Microsystems 2008
  * do_test support by Eli Kupermann <eli.kupermann@intel.com>
  * ETHTOOL_PHYS_ID support by Chris Leech <christopher.leech@intel.com>
  * e1000 support by Scott Feldman <scott.feldman@intel.com>
@@ -14,6 +15,7 @@
  * amd8111e support by Reeja John <reeja.john@amd.com>
  * long arguments by Andi Kleen.
  * SMSC LAN911x support by Steve Glendinning <steve.glendinning@smsc.com>
+ * Rx Network Flow Control configuration support <santwona.behera@sun.com>
  * Various features by Ben Hutchings <bhutchings@solarflare.com>;
  *	Copyright 2009, 2010 Solarflare Communications
  *
@@ -43,18 +45,13 @@
 #include <arpa/inet.h>
 
 #include <linux/sockios.h>
+#include <sys/socket.h>
+#include <arpa/inet.h>
 #include "ethtool-util.h"
 
-
-#ifndef SIOCETHTOOL
-#define SIOCETHTOOL     0x8946
-#endif
 #ifndef MAX_ADDR_LEN
 #define MAX_ADDR_LEN	32
 #endif
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
 
 #ifndef HAVE_NETIF_MSG
 enum {
@@ -100,7 +97,6 @@ static int do_gstats(int fd, struct ifreq *ifr);
 static int rxflow_str_to_type(const char *str);
 static int parse_rxfhashopts(char *optstr, u32 *data);
 static char *unparse_rxfhashopts(u64 opts);
-static void parse_rxntupleopts(int argc, char **argp, int first_arg);
 static int dump_rxfhash(int fhash, u64 val);
 static int do_srxclass(int fd, struct ifreq *ifr);
 static int do_grxclass(int fd, struct ifreq *ifr);
@@ -246,20 +242,37 @@ static struct option {
 		"		equal N | weight W0 W1 ...\n" },
     { "-U", "--config-ntuple", MODE_SNTUPLE, "Configure Rx ntuple filters "
 		"and actions",
-		"		{ flow-type tcp4|udp4|sctp4\n"
-		"		  [ src-ip ADDR [src-ip-mask MASK] ]\n"
-		"		  [ dst-ip ADDR [dst-ip-mask MASK] ]\n"
-		"		  [ src-port PORT [src-port-mask MASK] ]\n"
-		"		  [ dst-port PORT [dst-port-mask MASK] ]\n"
-		"		| flow-type ether\n"
-		"		  [ src MAC-ADDR [src-mask MASK] ]\n"
-		"		  [ dst MAC-ADDR [dst-mask MASK] ]\n"
-		"		  [ proto N [proto-mask MASK] ] }\n"
-		"		[ vlan VLAN-TAG [vlan-mask MASK] ]\n"
-		"		[ user-def DATA [user-def-mask MASK] ]\n"
-		"		action N\n" },
+		"		[ class-rule-del %d ]\n"
+		"		[ class-rule-add ether|ip4|tcp4|udp4|sctp4|ah4|esp4\n"
+		"			[ src %x:%x:%x:%x:%x:%x [src-mask %x:%x:%x:%x:%x:%x] ]\n"
+		"			[ dst %x:%x:%x:%x:%x:%x [dst-mask %x:%x:%x:%x:%x:%x] ]\n"
+		"			[ proto %d [proto-mask MASK] ]\n"
+		"			[ src-ip %d.%d.%d.%d [src-ip-mask %d.%d.%d.%d] ]\n"
+		"			[ dst-ip %d.%d.%d.%d [dst-ip-mask %d.%d.%d.%d] ]\n"
+		"			[ tos %d [tos-mask %x] ]\n"
+		"			[ l4proto %d [l4proto-mask MASK] ]\n"
+		"			[ src-port %d [src-port-mask %x] ]\n"
+		"			[ dst-port %d [dst-port-mask %x] ]\n"
+		"			[ spi %d [spi-mask %x] ]\n"
+		"			[ action %d ]\n"
+		"			[ loc %d] |\n"
+		"		  flow-type ether|ip4|tcp4|udp4|sctp4|ah4|esp4\n"
+		"			[ src %x:%x:%x:%x:%x:%x [src-mask %x:%x:%x:%x:%x:%x] ]\n"
+		"			[ dst %x:%x:%x:%x:%x:%x [dst-mask %x:%x:%x:%x:%x:%x] ]\n"
+		"			[ proto %d [proto-mask MASK] ]\n"
+		"			[ src-ip %d.%d.%d.%d [src-ip-mask %d.%d.%d.%d] ]\n"
+		"			[ dst-ip %d.%d.%d.%d [dst-ip-mask %d.%d.%d.%d] ]\n"
+		"			[ tos %d [tos-mask %x] ]\n"
+		"			[ l4proto %d [l4proto-mask MASK] ]\n"
+		"			[ src-port %d [src-port-mask %x] ]\n"
+		"			[ dst-port %d [dst-port-mask %x] ]\n"
+		"			[ spi %d [spi-mask %x] ]\n"
+		"			[ vlan %x [vlan-mask %x] ]\n"
+		"			[ user-def %x [user-def-mask %x] ]\n"
+		"			[ action %d ] ]\n" },
     { "-u", "--show-ntuple", MODE_GNTUPLE,
-		"Get Rx ntuple filters and actions\n" },
+		"Get Rx ntuple filters and actions",
+		"		[ class-rule %d ]\n"},
     { "-P", "--show-permaddr", MODE_PERMADDR,
 		"Show permanent hardware address" },
     { "-h", "--help", MODE_HELP, "Show this help" },
@@ -381,24 +394,6 @@ static int rxfhindir_equal = 0;
 static char **rxfhindir_weight = NULL;
 static int sntuple_changed = 0;
 static struct ethtool_rx_ntuple_flow_spec ntuple_fs;
-static int ntuple_ip4src_seen = 0;
-static int ntuple_ip4src_mask_seen = 0;
-static int ntuple_ip4dst_seen = 0;
-static int ntuple_ip4dst_mask_seen = 0;
-static int ntuple_psrc_seen = 0;
-static int ntuple_psrc_mask_seen = 0;
-static int ntuple_pdst_seen = 0;
-static int ntuple_pdst_mask_seen = 0;
-static int ntuple_ether_dst_seen = 0;
-static int ntuple_ether_dst_mask_seen = 0;
-static int ntuple_ether_src_seen = 0;
-static int ntuple_ether_src_mask_seen = 0;
-static int ntuple_ether_proto_seen = 0;
-static int ntuple_ether_proto_mask_seen = 0;
-static int ntuple_vlan_tag_seen = 0;
-static int ntuple_vlan_tag_mask_seen = 0;
-static int ntuple_user_def_seen = 0;
-static int ntuple_user_def_mask_seen = 0;
 static char *flash_file = NULL;
 static int flash = -1;
 static int flash_region = -1;
@@ -407,6 +402,12 @@ static int msglvl_changed;
 static u32 msglvl_wanted = 0;
 static u32 msglvl_mask = 0;
 
+static int rx_class_rule_get = -1;
+static int rx_class_rule_del = -1;
+static int rx_class_rule_added = 0;
+static struct ethtool_rx_flow_spec rx_rule_fs;
+static u8 rxclass_loc_valid = 0;
+
 static enum {
 	ONLINE=0,
 	OFFLINE,
@@ -519,58 +520,6 @@ static struct cmdline_info cmdline_coalesce[] = {
 	{ "tx-frames-high", CMDL_S32, &coal_tx_frames_high_wanted, &ecoal.tx_max_coalesced_frames_high },
 };
 
-static struct cmdline_info cmdline_ntuple_tcp_ip4[] = {
-	{ "src-ip", CMDL_IP4, &ntuple_fs.h_u.tcp_ip4_spec.ip4src, NULL,
-	  0, &ntuple_ip4src_seen },
-	{ "src-ip-mask", CMDL_IP4, &ntuple_fs.m_u.tcp_ip4_spec.ip4src, NULL,
-	  0, &ntuple_ip4src_mask_seen },
-	{ "dst-ip", CMDL_IP4, &ntuple_fs.h_u.tcp_ip4_spec.ip4dst, NULL,
-	  0, &ntuple_ip4dst_seen },
-	{ "dst-ip-mask", CMDL_IP4, &ntuple_fs.m_u.tcp_ip4_spec.ip4dst, NULL,
-	  0, &ntuple_ip4dst_mask_seen },
-	{ "src-port", CMDL_BE16, &ntuple_fs.h_u.tcp_ip4_spec.psrc, NULL,
-	  0, &ntuple_psrc_seen },
-	{ "src-port-mask", CMDL_BE16, &ntuple_fs.m_u.tcp_ip4_spec.psrc, NULL,
-	  0, &ntuple_psrc_mask_seen },
-	{ "dst-port", CMDL_BE16, &ntuple_fs.h_u.tcp_ip4_spec.pdst, NULL,
-	  0, &ntuple_pdst_seen },
-	{ "dst-port-mask", CMDL_BE16, &ntuple_fs.m_u.tcp_ip4_spec.pdst, NULL,
-	  0, &ntuple_pdst_mask_seen },
-	{ "vlan", CMDL_U16, &ntuple_fs.vlan_tag, NULL,
-	  0, &ntuple_vlan_tag_seen },
-	{ "vlan-mask", CMDL_U16, &ntuple_fs.vlan_tag_mask, NULL,
-	  0, &ntuple_vlan_tag_mask_seen },
-	{ "user-def", CMDL_U64, &ntuple_fs.data, NULL,
-	  0, &ntuple_user_def_seen },
-	{ "user-def-mask", CMDL_U64, &ntuple_fs.data_mask, NULL,
-	  0, &ntuple_user_def_mask_seen },
-	{ "action", CMDL_S32, &ntuple_fs.action, NULL },
-};
-
-static struct cmdline_info cmdline_ntuple_ether[] = {
-	{ "dst", CMDL_MAC, ntuple_fs.h_u.ether_spec.h_dest, NULL,
-	  0, &ntuple_ether_dst_seen },
-	{ "dst-mask", CMDL_MAC, ntuple_fs.m_u.ether_spec.h_dest, NULL,
-	  0, &ntuple_ether_dst_mask_seen },
-	{ "src", CMDL_MAC, ntuple_fs.h_u.ether_spec.h_source, NULL,
-	  0, &ntuple_ether_src_seen },
-	{ "src-mask", CMDL_MAC, ntuple_fs.m_u.ether_spec.h_source, NULL,
-	  0, &ntuple_ether_src_mask_seen },
-	{ "proto", CMDL_BE16, &ntuple_fs.h_u.ether_spec.h_proto, NULL,
-	  0, &ntuple_ether_proto_seen },
-	{ "proto-mask", CMDL_BE16, &ntuple_fs.m_u.ether_spec.h_proto, NULL,
-	  0, &ntuple_ether_proto_mask_seen },
-	{ "vlan", CMDL_U16, &ntuple_fs.vlan_tag, NULL,
-	  0, &ntuple_vlan_tag_seen },
-	{ "vlan-mask", CMDL_U16, &ntuple_fs.vlan_tag_mask, NULL,
-	  0, &ntuple_vlan_tag_mask_seen },
-	{ "user-def", CMDL_U64, &ntuple_fs.data, NULL,
-	  0, &ntuple_user_def_seen },
-	{ "user-def-mask", CMDL_U64, &ntuple_fs.data_mask, NULL,
-	  0, &ntuple_user_def_mask_seen },
-	{ "action", CMDL_S32, &ntuple_fs.action, NULL },
-};
-
 static struct cmdline_info cmdline_msglvl[] = {
 	{ "drv", CMDL_FLAG, &msglvl_wanted, NULL,
 	  NETIF_MSG_DRV, &msglvl_mask },
@@ -924,14 +873,49 @@ static void parse_cmdline(int argc, char **argp)
 			}
 			if (mode == MODE_SNTUPLE) {
 				if (!strcmp(argp[i], "flow-type")) {
+					if (rxclass_parse_ruleopts(&argp[i],
+								   argc - i,
+								   &ntuple_fs,
+								   NULL) < 0) {
+						show_usage(1);
+					} else {
+						i = argc;
+						sntuple_changed = 1;
+					}
+				} else if (!strcmp(argp[i], "class-rule-del")) {
 					i += 1;
 					if (i >= argc) {
 						show_usage(1);
 						break;
 					}
-					parse_rxntupleopts(argc, argp, i);
-					i = argc;
-					break;
+					rx_class_rule_del =
+						get_uint_range(argp[i], 0,
+							       INT_MAX);
+				} else if (!strcmp(argp[i], "class-rule-add")) {
+					if (rxclass_parse_ruleopts(&argp[i],
+								   argc - i,
+								   &rx_rule_fs,
+								   &rxclass_loc_valid) < 0) {
+						show_usage(1);
+					} else {
+						i = argc;
+						rx_class_rule_added = 1;
+					}
+				} else {
+					show_usage(1);
+				}
+				break;
+			}
+			if (mode == MODE_GNTUPLE) {
+				if (!strcmp(argp[i], "class-rule")) {
+					i += 1;
+					if (i >= argc) {
+						show_usage(1);
+						break;
+					}
+					rx_class_rule_get =
+						get_uint_range(argp[i], 0,
+							       INT_MAX);
 				} else {
 					show_usage(1);
 				}
@@ -981,8 +965,10 @@ static void parse_cmdline(int argc, char **argp)
 						show_usage(1);
 					else
 						rx_fhash_changed = 1;
-				} else
+				} else {
 					show_usage(1);
+				}
+
 				break;
 			}
 			if (mode == MODE_SRXFHINDIR) {
@@ -1594,66 +1580,6 @@ static char *unparse_rxfhashopts(u64 opts)
 	return buf;
 }
 
-static void parse_rxntupleopts(int argc, char **argp, int i)
-{
-	ntuple_fs.flow_type = rxflow_str_to_type(argp[i]);
-
-	switch (ntuple_fs.flow_type) {
-	case TCP_V4_FLOW:
-	case UDP_V4_FLOW:
-	case SCTP_V4_FLOW:
-		parse_generic_cmdline(argc, argp, i + 1,
-				      &sntuple_changed,
-				      cmdline_ntuple_tcp_ip4,
-				      ARRAY_SIZE(cmdline_ntuple_tcp_ip4));
-		if (!ntuple_ip4src_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.ip4src = 0xffffffff;
-		if (!ntuple_ip4dst_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.ip4dst = 0xffffffff;
-		if (!ntuple_psrc_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.psrc = 0xffff;
-		if (!ntuple_pdst_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.pdst = 0xffff;
-		ntuple_fs.m_u.tcp_ip4_spec.tos = 0xff;
-		break;
-	case ETHER_FLOW:
-		parse_generic_cmdline(argc, argp, i + 1,
-				      &sntuple_changed,
-				      cmdline_ntuple_ether,
-				      ARRAY_SIZE(cmdline_ntuple_ether));
-		if (!ntuple_ether_dst_seen)
-			memset(ntuple_fs.m_u.ether_spec.h_dest, 0xff, ETH_ALEN);
-		if (!ntuple_ether_src_seen)
-			memset(ntuple_fs.m_u.ether_spec.h_source, 0xff,
-			       ETH_ALEN);
-		if (!ntuple_ether_proto_seen)
-			ntuple_fs.m_u.ether_spec.h_proto = 0xffff;
-		break;
-	default:
-		fprintf(stderr, "Unsupported flow type \"%s\"\n", argp[i]);
-		exit(106);
-		break;
-	}
-
-	if (!ntuple_vlan_tag_seen)
-		ntuple_fs.vlan_tag_mask = 0xffff;
-	if (!ntuple_user_def_seen)
-		ntuple_fs.data_mask = 0xffffffffffffffffULL;
-
-	if ((ntuple_ip4src_mask_seen && !ntuple_ip4src_seen) ||
-	    (ntuple_ip4dst_mask_seen && !ntuple_ip4dst_seen) ||
-	    (ntuple_psrc_mask_seen && !ntuple_psrc_seen) ||
-	    (ntuple_pdst_mask_seen && !ntuple_pdst_seen) ||
-	    (ntuple_ether_dst_mask_seen && !ntuple_ether_dst_seen) ||
-	    (ntuple_ether_src_mask_seen && !ntuple_ether_src_seen) ||
-	    (ntuple_ether_proto_mask_seen && !ntuple_ether_proto_seen) ||
-	    (ntuple_vlan_tag_mask_seen && !ntuple_vlan_tag_seen) ||
-	    (ntuple_user_def_mask_seen && !ntuple_user_def_seen)) {
-		fprintf(stderr, "Cannot specify mask without value\n");
-		exit(107);
-	}
-}
-
 static struct {
 	const char *name;
 	int (*func)(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
@@ -2922,14 +2848,12 @@ static int do_gstats(int fd, struct ifreq *ifr)
 	return 0;
 }
 
-
 static int do_srxclass(int fd, struct ifreq *ifr)
 {
-	int err;
+	int err = 0;
+	struct ethtool_rxnfc nfccmd;
 
 	if (rx_fhash_changed) {
-		struct ethtool_rxnfc nfccmd;
-
 		nfccmd.cmd = ETHTOOL_SRXFH;
 		nfccmd.flow_type = rx_fhash_set;
 		nfccmd.data = rx_fhash_val;
@@ -2941,12 +2865,12 @@ static int do_srxclass(int fd, struct ifreq *ifr)
 
 	}
 
-	return 0;
+	return err ? 1 : 0;
 }
 
 static int do_grxclass(int fd, struct ifreq *ifr)
 {
-	int err;
+	int err = 0;
 
 	if (rx_fhash_get) {
 		struct ethtool_rxnfc nfccmd;
@@ -2961,7 +2885,7 @@ static int do_grxclass(int fd, struct ifreq *ifr)
 			dump_rxfhash(rx_fhash_get, nfccmd.data);
 	}
 
-	return 0;
+	return err ? 1 : 0;
 }
 
 static int do_grxfhindir(int fd, struct ifreq *ifr)
@@ -3142,7 +3066,17 @@ static int do_srxntuple(int fd, struct ifreq *ifr)
 {
 	int err;
 
-	if (sntuple_changed) {
+	if (rx_class_rule_added) {
+		err = rxclass_rule_ins(fd, ifr, &rx_rule_fs,
+				       rxclass_loc_valid);
+		if (err < 0)
+			fprintf(stderr, "Cannot insert RX classification rule\n");
+	} else if (rx_class_rule_del >= 0) {
+		err = rxclass_rule_del(fd, ifr, rx_class_rule_del);
+
+		if (err < 0)
+			fprintf(stderr, "Cannot delete RX classification rule\n");
+	} else if (sntuple_changed) {
 		struct ethtool_rx_ntuple ntuplecmd;
 
 		ntuplecmd.cmd = ETHTOOL_SRXNTUPLE;
@@ -3162,7 +3096,29 @@ static int do_srxntuple(int fd, struct ifreq *ifr)
 
 static int do_grxntuple(int fd, struct ifreq *ifr)
 {
-	return 0;
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	if (rx_class_rule_get >= 0) {
+		err = rxclass_rule_get(fd, ifr, rx_class_rule_get);
+		if (err < 0)
+			fprintf(stderr, "Cannot get RX classification rule\n");
+		return err ? 1 : 0;
+	}
+
+	nfccmd.cmd = ETHTOOL_GRXRINGS;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0)
+		perror("Cannot get RX rings");
+	else
+		fprintf(stdout, "%d RX rings available\n",
+			(int)nfccmd.data);
+
+	err = rxclass_rule_getall(fd, ifr);
+	if (err < 0)
+		fprintf(stderr, "RX classification rule retrieval failed\n");
+	return err ? 1 : 0;
 }
 
 static int send_ioctl(int fd, struct ifreq *ifr)
diff --git a/rxclass.c b/rxclass.c
new file mode 100644
index 0000000..f2a8c96
--- /dev/null
+++ b/rxclass.c
@@ -0,0 +1,1169 @@
+/*
+ * Copyright (C) 2008 Sun Microsystems, Inc. All rights reserved.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <linux/sockios.h>
+#include <arpa/inet.h>
+#include "ethtool-util.h"
+#include "ethtool-bitops.h"
+
+/*
+ * This is a rule manager implementation for ordering rx flow
+ * classification rules in a longest prefix first match order.
+ * The assumption is that this rule manager is the only one adding rules to
+ * the device's hardware classifier.
+ */
+
+struct rmgr_ctrl {
+	/* slot contains a bitmap indicating which filters are valid */
+	unsigned long		*slot;
+	__u32			n_rules;
+	__u32			size;
+};
+
+static struct rmgr_ctrl rmgr;
+static int rmgr_init_done = 0;
+
+static void rmgr_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
+{
+	unsigned char	*smac, *smacm, *dmac, *dmacm;
+	__u32		sip, dip, sipm, dipm;
+	__u16		proto, protom;
+
+	fprintf(stdout,	"Filter: %d\n", fsp->location);
+
+	switch (fsp->flow_type) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+	case IP_USER_FLOW:
+		sip = ntohl(fsp->h_u.tcp_ip4_spec.ip4src);
+		dip = ntohl(fsp->h_u.tcp_ip4_spec.ip4dst);
+		sipm = ntohl(fsp->m_u.tcp_ip4_spec.ip4src);
+		dipm = ntohl(fsp->m_u.tcp_ip4_spec.ip4dst);
+
+		switch (fsp->flow_type) {
+		case TCP_V4_FLOW:
+			fprintf(stdout, "\tRule Type: TCP over IPv4\n");
+			break;
+		case UDP_V4_FLOW:
+			fprintf(stdout, "\tRule Type: UDP over IPv4\n");
+			break;
+		case SCTP_V4_FLOW:
+			fprintf(stdout, "\tRule Type: SCTP over IPv4\n");
+			break;
+		case AH_V4_FLOW:
+			fprintf(stdout, "\tRule Type: IPSEC AH over IPv4\n");
+			break;
+		case ESP_V4_FLOW:
+			fprintf(stdout, "\tRule Type: IPSEC ESP over IPv4\n");
+			break;
+		case IP_USER_FLOW:
+			fprintf(stdout, "\tRule Type: Raw IPv4\n");
+			break;
+		default:
+			break;
+		}
+
+		fprintf(stdout,
+			"\tSrc IP addr: %d.%d.%d.%d mask: %d.%d.%d.%d\n"
+			"\tDest IP addr: %d.%d.%d.%d mask: %d.%d.%d.%d\n"
+			"\tTOS: 0x%x mask: 0x%x\n",
+			(sip & 0xff000000) >> 24,
+			(sip & 0xff0000) >> 16,
+			(sip & 0xff00) >> 8,
+			sip & 0xff,
+			(sipm & 0xff000000) >> 24,
+			(sipm & 0xff0000) >> 16,
+			(sipm & 0xff00) >> 8,
+			sipm & 0xff,
+			(dip & 0xff000000) >> 24,
+			(dip & 0xff0000) >> 16,
+			(dip & 0xff00) >> 8,
+			dip & 0xff,
+			(dipm & 0xff000000) >> 24,
+			(dipm & 0xff0000) >> 16,
+			(dipm & 0xff00) >> 8,
+			dipm & 0xff,
+			fsp->h_u.tcp_ip4_spec.tos,
+			fsp->m_u.tcp_ip4_spec.tos);
+
+		switch (fsp->flow_type) {
+		case TCP_V4_FLOW:
+		case UDP_V4_FLOW:
+		case SCTP_V4_FLOW:
+			fprintf(stdout,
+				"\tSrc port: %d mask: 0x%x\n"
+				"\tDest port: %d mask: 0x%x\n",
+				ntohs(fsp->h_u.tcp_ip4_spec.psrc),
+				ntohs(fsp->m_u.tcp_ip4_spec.psrc),
+				ntohs(fsp->h_u.tcp_ip4_spec.pdst),
+				ntohs(fsp->m_u.tcp_ip4_spec.pdst));
+			break;
+		case AH_V4_FLOW:
+		case ESP_V4_FLOW:
+			fprintf(stdout,
+				"\tSPI: %d mask: 0x%x\n",
+				ntohl(fsp->h_u.esp_ip4_spec.spi),
+				ntohl(fsp->m_u.esp_ip4_spec.spi));
+			break;
+		case IP_USER_FLOW:
+			fprintf(stdout,
+				"\tProtocol: %d mask: 0x%x\n"
+				"\tL4 bytes: 0x%x mask: 0x%x\n",
+				fsp->h_u.usr_ip4_spec.proto,
+				fsp->m_u.usr_ip4_spec.proto,
+				ntohl(fsp->h_u.usr_ip4_spec.l4_4_bytes),
+				ntohl(fsp->m_u.usr_ip4_spec.l4_4_bytes));
+			break;
+		default:
+			break;
+		}
+		break;
+	case ETHER_FLOW:
+		dmac = fsp->h_u.ether_spec.h_dest;
+		dmacm = fsp->m_u.ether_spec.h_dest;
+		smac = fsp->h_u.ether_spec.h_source;
+		smacm = fsp->m_u.ether_spec.h_source;
+		proto = ntohs(fsp->h_u.ether_spec.h_proto);
+		protom = ntohs(fsp->m_u.ether_spec.h_proto);
+
+		fprintf(stdout,
+			"\tFlow Type: Raw Ethernet\n"
+			"\tSrc MAC addr: %02X:%02X:%02X:%02X:%02X:%02X"
+			" mask: %02X:%02X:%02X:%02X:%02X:%02X\n"
+			"\tDest MAC addr: %02X:%02X:%02X:%02X:%02X:%02X"
+			" mask: %02X:%02X:%02X:%02X:%02X:%02X\n"
+			"\tEthertype: 0x%X mask: 0x%X\n",
+			smac[0], smac[1], smac[2], smac[3], smac[4], smac[5],
+			smacm[0], smacm[1], smacm[2], smacm[3], smacm[4], smacm[5],
+			dmac[0], dmac[1], dmac[2], dmac[3], dmac[4], dmac[5],
+			dmacm[0], dmacm[1], dmacm[2], dmacm[3], dmacm[4], dmacm[5],
+			proto, protom);
+		break;
+	default:
+		fprintf(stdout,
+			"\tUnknown Flow type: %d\n", fsp->flow_type);
+		break;
+	}
+
+	if (fsp->ring_cookie != RX_CLS_FLOW_DISC)
+		fprintf(stdout, "\tAction: Direct to queue %llu\n",
+			fsp->ring_cookie);
+	else
+		fprintf(stdout, "\tAction: Drop\n");
+
+	fprintf(stdout, "\n\n");
+}
+
+static void rmgr_print_rule(struct ethtool_rx_flow_spec *fsp)
+{
+	/* print the rule in this location */
+	switch (fsp->flow_type) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+	case ETHER_FLOW:
+		rmgr_print_nfc_rule(fsp);
+		break;
+	case IP_USER_FLOW:
+		if (fsp->h_u.usr_ip4_spec.ip_ver == ETH_RX_NFC_IP4) {
+			rmgr_print_nfc_rule(fsp);
+			break;
+		}
+		/* IPv6 User Flow falls through to the case below */
+	case TCP_V6_FLOW:
+	case UDP_V6_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V6_FLOW:
+		fprintf(stderr, "IPv6 flows not implemented\n");
+		break;
+	default:
+		fprintf(stderr, "rmgr: Unknown flow type\n");
+		break;
+	}
+}
+
+static int rmgr_ins(__u32 loc)
+{
+	/* verify location is in rule manager range */
+	if ((loc < 0) || (loc >= rmgr.size)) {
+		fprintf(stderr, "rmgr: Location out of range\n");
+		return -1;
+	}
+
+	/* set bit for the rule */
+	set_bit(loc, rmgr.slot);
+
+	return 0;
+}
+
+static int rmgr_find(__u32 loc)
+{
+	/* verify location is in rule manager range */
+	if ((loc < 0) || (loc >= rmgr.size)) {
+		fprintf(stderr, "rmgr: Location out of range\n");
+		return -1;
+	}
+
+	/* if slot is found return 0 indicating success */
+	if (test_bit(loc, rmgr.slot))
+		return 0;
+
+	/* rule not found */
+	fprintf(stderr, "rmgr: No such rule\n");
+	return -1;
+}
+
+static int rmgr_del(__u32 loc)
+{
+	/* verify rule exists before attempting to delete */
+	int err = rmgr_find(loc);
+	if (err)
+		return err;
+
+	/* clear bit for the rule */
+	clear_bit(loc, rmgr.slot);
+
+	return 0;
+}
+
+static int rmgr_add(struct ethtool_rx_flow_spec *fsp, __u8 loc_valid)
+{
+	__u32 loc = fsp->location;
+
+	/* location provided, insert rule and update regions to match rule */
+	if (loc_valid)
+		return rmgr_ins(loc);
+
+	/* find an open slot */
+	for (loc = 0; loc < rmgr.size; loc += BITS_PER_LONG) {
+		if ((rmgr.slot[loc / BITS_PER_LONG]) != ~0UL)
+			break;
+	}
+
+	/* find and use available location in slot */
+	for (; loc < rmgr.size; loc++) {
+		if (!test_bit(loc, rmgr.slot)) {
+			fsp->location = loc;
+			return rmgr_ins(loc);
+		}
+	}
+
+	/* No space to add this rule */
+	fprintf(stderr, "rmgr: Cannot find appropriate slot to insert rule\n");
+
+	return -1;
+}
+
+static int rmgr_init(int fd, struct ifreq *ifr)
+{
+	struct ethtool_rxnfc *nfccmd;
+	int err, i;
+	__u32 *rule_locs;
+
+	if (rmgr_init_done)
+		return 0;
+
+	/* clear rule manager settings */
+	memset(&rmgr, 0, sizeof(struct rmgr_ctrl));
+
+	/* allocate memory for count request */
+	nfccmd = calloc(1, sizeof(*nfccmd));
+	if (!nfccmd) {
+		perror("rmgr: Cannot allocate memory for RX class rule data");
+		return -1;
+	}
+
+	/* request count and store in rmgr.n_rules */
+	nfccmd->cmd = ETHTOOL_GRXCLSRLCNT;
+	ifr->ifr_data = (caddr_t)nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	rmgr.n_rules = nfccmd->rule_cnt;
+	free(nfccmd);
+	if (err < 0) {
+		perror("rmgr: Cannot get RX class rule count");
+		return -1;
+	}
+
+	/* alloc memory for request of location list */
+	nfccmd = calloc(1, sizeof(*nfccmd) + (rmgr.n_rules * sizeof(__u32)));
+	if (!nfccmd) {
+		perror("rmgr: Cannot allocate memory for RX class rule locations");
+		return -1;
+	}
+
+	/* request location list */
+	nfccmd->cmd = ETHTOOL_GRXCLSRLALL;
+	nfccmd->rule_cnt = rmgr.n_rules;
+	ifr->ifr_data = (caddr_t)nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot get RX class rules");
+		free(nfccmd);
+		return -1;
+	}
+
+	/* intitialize bitmap for storage of valid locations */
+	rmgr.size = nfccmd->data;
+	rmgr.slot = calloc(1, BITS_TO_LONGS(rmgr.size) * sizeof(long));
+	if (!rmgr.slot) {
+		perror("rmgr: Cannot allocate memory for RX class rules");
+		return -1;
+	}
+
+	/* write locations to bitmap */
+	rule_locs = nfccmd->rule_locs;
+	for (i = 0; i < rmgr.n_rules; i++) {
+		err = rmgr_ins(rule_locs[i]);
+		if (err < 0)
+			break;
+	}
+
+	/* free memory and set flag to avoid reinit */
+	free(nfccmd);
+	rmgr_init_done = 1;
+
+	return err;
+}
+
+static void rmgr_cleanup(void)
+{
+	if (!rmgr_init_done)
+		return;
+
+	rmgr_init_done = 0;
+
+	free(rmgr.slot);
+	rmgr.slot = NULL;
+	rmgr.size = 0;
+}
+
+int rxclass_rule_getall(int fd, struct ifreq *ifr)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err, i, j;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	fprintf(stdout, "Total %d rules\n\n", rmgr.n_rules);
+
+	/* fetch and display all available rules */
+	for (i = 0; i < rmgr.size; i += BITS_PER_LONG) {
+		if (rmgr.slot[i / BITS_PER_LONG] == 0UL)
+			continue;
+		for (j = 0; j < BITS_PER_LONG; j++) {
+			if (!test_bit(i + j, rmgr.slot))
+				continue;
+			nfccmd.cmd = ETHTOOL_GRXCLSRULE;
+			memset(&nfccmd.fs, 0,
+			       sizeof(struct ethtool_rx_flow_spec));
+			nfccmd.fs.location = i + j;
+			ifr->ifr_data = (caddr_t)&nfccmd;
+			err = ioctl(fd, SIOCETHTOOL, ifr);
+			if (err < 0) {
+				perror("rmgr: Cannot get RX class rule");
+				return -1;
+			}
+			rmgr_print_rule(&nfccmd.fs);
+		}
+	}
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+int rxclass_rule_get(int fd, struct ifreq *ifr, __u32 loc)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	/* verify rule exists before attempting to display */
+	err = rmgr_find(loc);
+	if (err < 0)
+		return err;
+
+	/* fetch rule from netdev and display */
+	nfccmd.cmd = ETHTOOL_GRXCLSRULE;
+	memset(&nfccmd.fs, 0, sizeof(struct ethtool_rx_flow_spec));
+	nfccmd.fs.location = loc;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot get RX class rule");
+		return -1;
+	}
+	rmgr_print_rule(&nfccmd.fs);
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+int rxclass_rule_ins(int fd, struct ifreq *ifr,
+		     struct ethtool_rx_flow_spec *fsp, __u8 loc_valid)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	/* verify rule location */
+	err = rmgr_add(fsp, loc_valid);
+	if (err < 0)
+		return err;
+
+	/* notify netdev of new rule */
+	nfccmd.cmd = ETHTOOL_SRXCLSRLINS;
+	nfccmd.fs = *fsp;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot insert RX class rule");
+		return -1;
+	}
+	rmgr.n_rules++;
+
+	printf("Added rule with ID %d\n", fsp->location);
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+int rxclass_rule_del(int fd, struct ifreq *ifr, __u32 loc)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	/* verify rule exists */
+	err = rmgr_del(loc);
+	if (err < 0)
+		return err;
+
+	/* notify netdev of rule removal */
+	nfccmd.cmd = ETHTOOL_SRXCLSRLDEL;
+	nfccmd.fs.location = loc;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot delete RX class rule");
+		return -1;
+	}
+	rmgr.n_rules--;
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+typedef enum {
+	OPT_NONE,
+	OPT_S32,
+	OPT_U8,
+	OPT_U16,
+	OPT_U32,
+	OPT_U64,
+	OPT_BE16,
+	OPT_BE32,
+	OPT_BE64,
+	OPT_IP4,
+	OPT_MAC,
+} rule_opt_type_t;
+
+typedef enum {
+	ETH_SPEC_NONE,
+	ETH_SPEC_NFC,
+	ETH_SPEC_NTUPLE,
+} rule_spec_type_t;
+
+#define NFC_FLAG_RING		0x001
+#define NFC_FLAG_LOC		0x002
+#define NFC_FLAG_SADDR		0x004
+#define NFC_FLAG_DADDR		0x008
+#define NFC_FLAG_SPORT		0x010
+#define NFC_FLAG_DPORT		0x020
+#define NFC_FLAG_SPI		0x030
+#define NFC_FLAG_TOS		0x040
+#define NFC_FLAG_PROTO		0x080
+#define NTUPLE_FLAG_VLAN	0x100
+#define NTUPLE_FLAG_UDEF	0x200
+
+struct rule_opts {
+	const char	*name;
+	rule_opt_type_t	type;
+	u32		flag;
+	int		offset;
+	int		moffset;
+};
+
+static struct rule_opts rule_nfc_tcp_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.tos) },
+	{ "src-port", OPT_BE16, NFC_FLAG_SPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.psrc),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.psrc) },
+	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.pdst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.pdst) },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+};
+
+static struct rule_opts rule_nfc_esp_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.tos) },
+	{ "spi", OPT_BE32, NFC_FLAG_SPI,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.spi),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.spi) },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+};
+
+static struct rule_opts rule_nfc_usr_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.tos) },
+	{ "l4proto", OPT_U8, NFC_FLAG_PROTO,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.proto),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.proto) },
+	{ "spi", OPT_BE32, NFC_FLAG_SPI,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) },
+	{ "src-port", OPT_BE16, NFC_FLAG_SPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) },
+	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes) + 2,
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) + 2 },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+};
+
+static struct rule_opts rule_nfc_ether[] = {
+	{ "src", OPT_MAC, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ether_spec.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ether_spec.h_dest) },
+	{ "dst", OPT_MAC, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ether_spec.h_source),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ether_spec.h_source) },
+	{ "proto", OPT_BE16, NFC_FLAG_PROTO,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ether_spec.h_proto),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ether_spec.h_proto) },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+};
+
+static struct rule_opts rule_ntuple_tcp_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.tcp_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.tcp_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.tcp_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.tcp_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.tcp_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.tcp_ip4_spec.tos) },
+	{ "src-port", OPT_BE16, NFC_FLAG_SPORT,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.tcp_ip4_spec.psrc),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.tcp_ip4_spec.psrc) },
+	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.tcp_ip4_spec.pdst),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.tcp_ip4_spec.pdst) },
+	{ "vlan", OPT_U16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag_mask) },
+	{ "user-def", OPT_U64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data_mask) },
+	{ "action", OPT_S32, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, action), -1 },
+};
+
+static struct rule_opts rule_ntuple_esp_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.esp_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.esp_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.esp_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.esp_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.esp_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.esp_ip4_spec.tos) },
+	{ "spi", OPT_BE32, NFC_FLAG_SPI,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.esp_ip4_spec.spi),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.esp_ip4_spec.spi) },
+	{ "vlan", OPT_U16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag_mask) },
+	{ "user-def", OPT_U64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data_mask) },
+	{ "action", OPT_S32, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, action), -1 },
+};
+
+static struct rule_opts rule_ntuple_usr_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.usr_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.usr_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.usr_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.usr_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.usr_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.usr_ip4_spec.tos) },
+	{ "l4proto", OPT_U8, NFC_FLAG_PROTO,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.usr_ip4_spec.proto),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.usr_ip4_spec.proto) },
+	{ "spi", OPT_BE32, NFC_FLAG_SPI,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.usr_ip4_spec.l4_4_bytes),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) },
+	{ "src-port", OPT_BE16, NFC_FLAG_SPORT,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.usr_ip4_spec.l4_4_bytes),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) },
+	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.usr_ip4_spec.l4_4_bytes) + 2,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) + 2 },
+	{ "vlan", OPT_U16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag_mask) },
+	{ "user-def", OPT_U64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data_mask) },
+	{ "action", OPT_S32, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, action), -1 },
+};
+
+static struct rule_opts rule_ntuple_ether[] = {
+	{ "src", OPT_MAC, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.ether_spec.h_dest),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.ether_spec.h_dest) },
+	{ "dst", OPT_MAC, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.ether_spec.h_source),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.ether_spec.h_source) },
+	{ "proto", OPT_BE16, NFC_FLAG_PROTO,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, h_u.ether_spec.h_proto),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, m_u.ether_spec.h_proto) },
+	{ "vlan", OPT_U16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, vlan_tag_mask) },
+	{ "user-def", OPT_U64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data),
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, data_mask) },
+	{ "action", OPT_S32, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_ntuple_flow_spec, action), -1 },
+};
+
+static int rxclass_get_long(char *str, long long *val, int size)
+{
+	long long max = ~0ULL >> (65 - size);
+	char *endp;
+
+	errno = 0;
+
+	*val = strtoll(str, &endp, 0);
+
+	if (*endp || errno || (*val > max) || (*val < ~max))
+		return -1;
+
+	return 0;
+}
+
+static int rxclass_get_ulong(char *str, unsigned long long *val, int size)
+{
+	long long max = ~0ULL >> (64 - size);
+	char *endp;
+
+	errno = 0;
+
+	*val = strtoull(str, &endp, 0);
+
+	if (*endp || errno || (*val > max))
+		return -1;
+
+	return 0;
+}
+
+static int rxclass_get_ipv4(char *str, __be32 *val)
+{
+	if (!strchr(str, '.')) {
+		unsigned long long v;
+		int err;
+
+		err = rxclass_get_ulong(str, &v, 32);
+		if (err)
+			return -1;
+
+		*val = htonl((u32)v);
+
+		return 0;
+	}
+
+	if (!inet_pton(AF_INET, str, val))
+		return -1;
+
+	return 0;
+}
+
+static int rxclass_get_ether(char *str, unsigned char *val)
+{
+	unsigned int buf[ETH_ALEN];
+	int count;
+
+	if (!strchr(str, ':'))
+		return -1;
+
+	count = sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x",
+		       &buf[0], &buf[1], &buf[2],
+		       &buf[3], &buf[4], &buf[5]);
+
+	if (count != ETH_ALEN)
+		return -1;
+
+	do {
+		count--;
+		val[count] = buf[count];
+	} while (count);
+
+	return 0;
+}
+
+static int rxclass_get_val(char *str, unsigned char *p, u32 *flags,
+			   const struct rule_opts *opt, rule_spec_type_t spec)
+{
+	unsigned long long mask = (spec == ETH_SPEC_NFC) ? ~0ULL : 0ULL; 
+	int err = 0;
+
+	if (*flags & opt->flag)
+		return -1;
+
+	*flags |= opt->flag;
+
+	switch (opt->type) {
+	case OPT_S32: {
+		long long val;
+		err = rxclass_get_long(str, &val, 32);
+		if (err)
+			return -1;
+		*(int *)&p[opt->offset] = (int)val;
+		if (opt->moffset >= 0)
+			*(int *)&p[opt->moffset] = (int)mask;
+		break;
+	}
+	case OPT_U8: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 8);
+		if (err)
+			return -1;
+		*(u8 *)&p[opt->offset] = (u8)val;
+		if (opt->moffset >= 0)
+			*(u8 *)&p[opt->moffset] = (u8)mask;
+		break;
+	}
+	case OPT_U16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(u16 *)&p[opt->offset] = (u16)val;
+		if (opt->moffset >= 0)
+			*(u16 *)&p[opt->moffset] = (u16)mask;
+		break;
+	}
+	case OPT_U32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(u32 *)&p[opt->offset] = (u32)val;
+		if (opt->moffset >= 0)
+			*(u32 *)&p[opt->moffset] = (u32)mask;
+		break;
+	}
+	case OPT_U64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(u64 *)&p[opt->offset] = (u64)val;
+		if (opt->moffset >= 0)
+			*(u64 *)&p[opt->moffset] = (u64)mask;
+		break;
+	}
+	case OPT_BE16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(__be16 *)&p[opt->offset] = htons((u16)val);
+		if (opt->moffset >= 0)
+			*(__be16 *)&p[opt->moffset] = (__be16)mask;
+		break;
+	}
+	case OPT_BE32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->offset] = htonl((u32)val);
+		if (opt->moffset >= 0)
+			*(__be32 *)&p[opt->moffset] = (__be32)mask;
+		break;
+	}
+	case OPT_BE64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(__be64 *)&p[opt->offset] = htonll((u64)val);
+		if (opt->moffset >= 0)
+			*(__be64 *)&p[opt->moffset] = (__be64)mask;
+		break;
+	}
+	case OPT_IP4: {
+		__be32 val;
+		err = rxclass_get_ipv4(str, &val);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->offset] = val;
+		if (opt->moffset >= 0)
+			*(__be32 *)&p[opt->moffset] = (__be32)mask;
+		break;
+	}
+	case OPT_MAC: {
+		unsigned char val[ETH_ALEN];
+		err = rxclass_get_ether(str, val);
+		if (err)
+			return -1;
+		memcpy(&p[opt->offset], val, ETH_ALEN);
+		if (opt->moffset >= 0)
+			memcpy(&p[opt->moffset], &mask, ETH_ALEN);
+		break;
+	}
+	case OPT_NONE:
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static int rxclass_get_mask(char *str, unsigned char *p,
+			    const struct rule_opts *opt)
+{
+	int err = 0;
+
+	if (opt->moffset < 0)
+		return -1;
+
+	switch (opt->type) {
+	case OPT_S32: {
+		long long val;
+		err = rxclass_get_long(str, &val, 32);
+		if (err)
+		*(int *)&p[opt->moffset] = (int)val;
+		break;
+	}
+	case OPT_U8: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 8);
+		if (err)
+			return -1;
+		*(u8 *)&p[opt->moffset] = (u8)val;
+		break;
+	}
+	case OPT_U16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(u16 *)&p[opt->moffset] = (u16)val;
+		break;
+	}
+	case OPT_U32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(u32 *)&p[opt->moffset] = (u32)val;
+		break;
+	}
+	case OPT_U64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(u64 *)&p[opt->moffset] = (u64)val;
+		break;
+	}
+	case OPT_BE16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(__be16 *)&p[opt->moffset] = htons((u16)val);
+		break;
+	}
+	case OPT_BE32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->moffset] = htonl((u32)val);
+		break;
+	}
+	case OPT_BE64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(__be64 *)&p[opt->moffset] = htonll((u64)val);
+		break;
+	}
+	case OPT_IP4: {
+		__be32 val;
+		err = rxclass_get_ipv4(str, &val);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->moffset] = val;
+		break;
+	}
+	case OPT_MAC: {
+		unsigned char val[ETH_ALEN];
+		err = rxclass_get_ether(str, val);
+		if (err)
+			return -1;
+		memcpy(&p[opt->moffset], val, ETH_ALEN);
+		break;
+	}
+	case OPT_NONE:
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+int rxclass_parse_ruleopts(char **argp, int argc, void *fsp, u8 *loc_valid)
+{
+	const struct rule_opts *options;
+	unsigned char *p = (unsigned char *)fsp;
+	int i = 0, n_opts, err;
+	u32 flags = 0;
+	rule_spec_type_t spec_type;
+	int flow_type;
+
+	if (*argp == NULL || **argp == '\0' || argc < 2)
+		goto syntax_err;
+
+	if (!strcmp(argp[0], "class-rule-add"))
+		spec_type = ETH_SPEC_NFC;
+	else if (!strcmp(argp[0], "flow-type"))
+		spec_type = ETH_SPEC_NTUPLE;
+	else
+		goto syntax_err;
+
+	if (!strcmp(argp[1], "tcp4"))
+		flow_type = TCP_V4_FLOW;
+	else if (!strcmp(argp[1], "udp4"))
+		flow_type = UDP_V4_FLOW;
+	else if (!strcmp(argp[1], "sctp4"))
+		flow_type = SCTP_V4_FLOW;
+	else if (!strcmp(argp[1], "ah4"))
+		flow_type = AH_V4_FLOW;
+	else if (!strcmp(argp[1], "esp4"))
+		flow_type = ESP_V4_FLOW;
+	else if (!strcmp(argp[1], "ip4"))
+		flow_type = IP_USER_FLOW;
+	else if (!strcmp(argp[1], "ether"))
+		flow_type = ETHER_FLOW;
+	else
+		goto syntax_err;
+
+	switch (spec_type) {
+	case ETH_SPEC_NFC:
+		memset(p, 0, sizeof(struct ethtool_rx_flow_spec));
+		*(u32 *)p = flow_type;
+
+		switch (flow_type) {
+		case TCP_V4_FLOW:
+		case UDP_V4_FLOW:
+		case SCTP_V4_FLOW:
+			options = rule_nfc_tcp_ip4;
+			n_opts = ARRAY_SIZE(rule_nfc_tcp_ip4);
+			break;
+		case AH_V4_FLOW:
+		case ESP_V4_FLOW:
+			options = rule_nfc_esp_ip4;
+			n_opts = ARRAY_SIZE(rule_nfc_esp_ip4);
+			break;
+		case IP_USER_FLOW:
+			options = rule_nfc_usr_ip4;
+			n_opts = ARRAY_SIZE(rule_nfc_usr_ip4);
+			break;
+		case ETHER_FLOW:
+			options = rule_nfc_ether;
+			n_opts = ARRAY_SIZE(rule_nfc_ether);
+			break;
+		default:
+			fprintf(stdout, "Add rule, invalid rule type[%s]\n",
+				argp[1]);
+			return -1;
+		}
+		break;
+	case ETH_SPEC_NTUPLE:
+		memset(p, 0xff,
+		       offsetof(struct ethtool_rx_ntuple_flow_spec, action));
+		memset(p + offsetof(struct ethtool_rx_ntuple_flow_spec, action),
+		       0, 4);
+		*(u32 *)fsp = flow_type;
+
+		switch (flow_type) {
+		case TCP_V4_FLOW:
+		case UDP_V4_FLOW:
+		case SCTP_V4_FLOW:
+			options = rule_ntuple_tcp_ip4;
+			n_opts = ARRAY_SIZE(rule_ntuple_tcp_ip4);
+			break;
+		case AH_V4_FLOW:
+		case ESP_V4_FLOW:
+			options = rule_ntuple_esp_ip4;
+			n_opts = ARRAY_SIZE(rule_nfc_esp_ip4);
+			break;
+		case IP_USER_FLOW:
+			options = rule_ntuple_usr_ip4;
+			n_opts = ARRAY_SIZE(rule_nfc_usr_ip4);
+			break;
+		case ETHER_FLOW:
+			options = rule_ntuple_ether;
+			n_opts = ARRAY_SIZE(rule_nfc_ether);
+			break;
+		default:
+			fprintf(stdout, "Add rule, invalid flow type[%s]\n",
+				argp[1]);
+			return -1;
+		}
+		break;
+	default:
+		fprintf(stdout, "Add rule, invalid command[%s]\n",
+			argp[0]);
+		return -1;
+	}
+
+	for (i = 2; i < argc;) {
+		const struct rule_opts *opt;
+		int idx;
+		for (opt = options, idx = 0; idx < n_opts; idx++, opt++) {
+			char mask_name[16];
+
+			if (strcmp(argp[i], opt->name))
+				continue;
+
+			i++;
+			if (i >= argc)
+				break;
+
+			err = rxclass_get_val(argp[i], p, &flags, opt, spec_type);
+			if (err) {
+				fprintf(stderr, "Invalid %s value[%s]\n",
+					opt->name, argp[i]);
+				return -1;
+			}
+
+			i++;
+			if (i >= argc)
+				break;
+
+			sprintf(mask_name, "%s-mask", opt->name);
+			if (strcmp(argp[i], mask_name))
+				break;
+
+			i++;
+			if (i >= argc)
+				goto syntax_err;
+
+			err = rxclass_get_mask(argp[i], p, opt);
+			if (err) {
+				fprintf(stderr, "Invalid %s mask[%s]\n",
+					opt->name, argp[i]);
+				return -1;
+			}
+
+			i++;
+
+			break;
+		}
+		if (idx == n_opts) {
+			fprintf(stdout, "Add rule, unreconized option[%s]\n", argp[i]);
+			return -1;
+		}
+	}
+
+	if (spec_type == ETH_SPEC_NFC) {
+		if (loc_valid && (flags & NFC_FLAG_LOC))
+			*loc_valid = 1;
+	}
+
+	return 0;
+
+syntax_err:
+	fprintf(stdout, "Add rule, invalid syntax\n");
+	return -1;
+}


^ permalink raw reply related

* Re: 2.6.37 regression: adding main interface to a bridge breaks vlan interface RX
From: chriss @ 2011-02-26  0:16 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTimFC5drwt44C7Hpn4YJU-Njksr8Zkiv_RXK_oSL@mail.gmail.com>

Jesse Gross <jesse <at> nicira.com> writes:

> 
> What driver is in use with the NIC you are seeing this on?
>

He there

the device in question is (as lspci told)
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit
Ethernet (rev 10)

handled by kernel module r8169.

I also tried to set up the vlan on the bridge. like br0.3. but that did not work.

regards//chriss

^ permalink raw reply

* Re: TX VLAN acceleration on bridges broken in 2.6.37?
From: Jan Niehusmann @ 2011-02-26  0:19 UTC (permalink / raw)
  To: Jesse Gross; +Cc: linux-kernel, netdev
In-Reply-To: <AANLkTinoJqWA6ffnUx2KW_83srNbo+k6r24hnNpPTGvW@mail.gmail.com>

On Fri, Feb 25, 2011 at 02:53:21PM -0800, Jesse Gross wrote:
> is specific to the e1000e driver.  I know that some other Intel NICs
> require vlan stripping on receive to be enabled for vlan insertion on
> transmit to work.  Since this driver has not been converted over to
> use the new vlan model yet, it only enables these things if a vlan is
> directly configured on it.  To confirm this can you try a few things:

My observations confirm your theory:

> * Directly configure the vlan on the device instead of going through the bridge.

- does work, but only if eth0 is not part of bridge (expected behaviour,
  afaik)

> * Use the bridge but also configure an unused vlan device on the
> physical interface.

- does work

> * Double check that tcpdump with the settings that you are using shows
> vlan tags in other situations.  In some cases you need to use the 'e'
> flag with tcpdump in order for it show vlan tags.  If it is the
> driver/NIC that is dropping the tags, tcpdump should still show them.

- indeed, -e is necessary to show the vlan tags. So my prior observation
  regarding tag visibility in tcpdump was wrong. The packets are still
  have a vlan tag in the non-working case. 

  (What actually is affected by the txvlan flag is the ability to filter
  for vlan tags with tcpdump.  so 'tcpdump -e -i eth0' shows the packets,
  'tcpdump -e -i eth0 vlan' only shows them with txvlan off. However,
  filtering for the vlan tag also doesn't work with the vlan interface
  on eth0.1, while the tagging actually works, as verified above.)

Jan

^ permalink raw reply

* Re: [net-next-2.6 PATCH 01/10] ethtool: prevent null pointer dereference with NTUPLE set but no set_rx_ntuple
From: Ben Hutchings @ 2011-02-26  0:21 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: davem, jeffrey.t.kirsher, netdev
In-Reply-To: <20110225233244.7920.26742.stgit@gitlad.jf.intel.com>

On Fri, 2011-02-25 at 15:32 -0800, Alexander Duyck wrote:
> This change is meant to prevent a possible null pointer dereference if
> NETIF_F_NTUPLE is defined but the set_rx_ntuple function pointer is not.

I think it would be a bug for NETIF_F_NTUPLE to be enabled on a device
that doesn't have this operation.  Are there any drivers for which this
is possible?

> This issue appears to affect all kernels since 2.6.34.

If this can actually happen, the fix should go to net-2.6 and
stable@kernel.org.  However, I think that the null deference is
impossible and this really just fixes the error code.

Ben.

> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  net/core/ethtool.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index c1a71bb..4843674 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -893,6 +893,9 @@ static noinline_for_stack int ethtool_set_rx_ntuple(struct net_device *dev,
>  	struct ethtool_rx_ntuple_flow_spec_container *fsc = NULL;
>  	int ret;
>  
> +	if (!ops->set_rx_ntuple)
> +		return -EOPNOTSUPP;
> +
>  	if (!(dev->features & NETIF_F_NTUPLE))
>  		return -EINVAL;
>  
> 

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [net-next-2.6 PATCH 01/10] ethtool: prevent null pointer dereference with NTUPLE set but no set_rx_ntuple
From: Alexander Duyck @ 2011-02-26  0:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem@davemloft.net, Kirsher, Jeffrey T, netdev@vger.kernel.org
In-Reply-To: <1298679675.3555.4.camel@localhost>

On 2/25/2011 4:21 PM, Ben Hutchings wrote:
> On Fri, 2011-02-25 at 15:32 -0800, Alexander Duyck wrote:
>> This change is meant to prevent a possible null pointer dereference if
>> NETIF_F_NTUPLE is defined but the set_rx_ntuple function pointer is not.
>
> I think it would be a bug for NETIF_F_NTUPLE to be enabled on a device
> that doesn't have this operation.  Are there any drivers for which this
> is possible?

Currently there are no drivers where this is possible.  However I 
encountered it as a result of testing the patches further on in this set.

>> This issue appears to affect all kernels since 2.6.34.
>
> If this can actually happen, the fix should go to net-2.6 and
> stable@kernel.org.  However, I think that the null deference is
> impossible and this really just fixes the error code.
>
> Ben.

It cannot occur with any of the in-kernel drivers since they all set the 
NETIF_F_NTUPLE flag and have the function defined.  However going 
forward I would like to have the option of using the network flow 
classifier interface instead of the set_rx_ntuple interface due to the 
fact that it supports many of the features I needed.

I believe this patch should apply to net-2.6 without any changes so if 
it is better placed there I will resubmit it specifically for net-2.6 
and stable.

Thanks,

Alex

>> Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>> ---
>>
>>   net/core/ethtool.c |    3 +++
>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
>> index c1a71bb..4843674 100644
>> --- a/net/core/ethtool.c
>> +++ b/net/core/ethtool.c
>> @@ -893,6 +893,9 @@ static noinline_for_stack int ethtool_set_rx_ntuple(struct net_device *dev,
>>   	struct ethtool_rx_ntuple_flow_spec_container *fsc = NULL;
>>   	int ret;
>>
>> +	if (!ops->set_rx_ntuple)
>> +		return -EOPNOTSUPP;
>> +
>>   	if (!(dev->features&  NETIF_F_NTUPLE))
>>   		return -EINVAL;
>>
>>
>


^ permalink raw reply

* [net-2.6 PATCH] ethtool: prevent null pointer dereference with NTUPLE set but no set_rx_ntuple
From: Alexander Duyck @ 2011-02-26  0:42 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, bhutchings; +Cc: netdev, stable

This change is meant to prevent a possible null pointer dereference if
NETIF_F_NTUPLE is defined but the set_rx_ntuple function pointer is not.

This issue appears to affect all kernels since 2.6.34.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 net/core/ethtool.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index c1a71bb..4843674 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -893,6 +893,9 @@ static noinline_for_stack int ethtool_set_rx_ntuple(struct net_device *dev,
 	struct ethtool_rx_ntuple_flow_spec_container *fsc = NULL;
 	int ret;
 
+	if (!ops->set_rx_ntuple)
+		return -EOPNOTSUPP;
+
 	if (!(dev->features & NETIF_F_NTUPLE))
 		return -EINVAL;
 


^ permalink raw reply related

* Re: SO_REUSEPORT - can it be done in kernel?
From: Herbert Xu @ 2011-02-26  0:57 UTC (permalink / raw)
  To: David Miller
  Cc: rick.jones2, tgraf, therbert, wsommerfeld, daniel.baluta, netdev
In-Reply-To: <20110225.112019.48513284.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:
>
> I think this is fundamentally a bind problem as well.

I'm fairly certain the bottleneck is indeed in the kernel, and
in the UDP stack in particular.

This is born out by a test where I used two named worker threads,
both working on the same socket.  Stracing shows that they're
working flat out only doing sendmsg/recvmsg.

The result was that they obtained (in aggregate) half the throughput
of a single worker thread.

I then retested by having them use two sockets and the performance
greatly improved.

Now this is actually expected since our UDP stack is essentially
single-threaded on the send side when only one socket is being
used, mostly due to the corking functionality.

I'm unsure how big a role the receive side scalability actually
plays in this case, but I suspect it isn't great.

Which is why I'm quite skeptical about this REUSEPORT patch as
IMHO the only reason it produces a great result is solely because
it is allowing parallel sends going out.

Rather than modifying all UDP applications out there to fix what
is fundamentally a kernel problem, I think what we should do is
fix the UDP stack so that it actually scales.

It isn't all that hard since the easy way would be to only take
the lock if we're already corked or about to cork.

For the receive side we also don't need REUSEPORT as we can simply
make our UDP stack multiqueue.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [net-next-2.6 PATCH 02/10] ethtool: add ntuple flow specifier to network flow classifier
From: Ben Hutchings @ 2011-02-26  1:00 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: davem, jeffrey.t.kirsher, netdev
In-Reply-To: <20110225233249.7920.70334.stgit@gitlad.jf.intel.com>

On Fri, 2011-02-25 at 15:32 -0800, Alexander Duyck wrote:
> This change is meant to add an ntuple define type to the rx network flow
> classification specifiers.  The idea is to allow ntuple to be displayed and
> possibly configured via the network flow classification interface.  To do
> this I added a ntuple_flow_spec_ext to the lsit of supported filters, and
> added a flow_type_ext value to the structure in an unused hole within the
> ethtool_rx_flow_spec structure.

There's a hole there on 64-bit architectures.  Unfortunately, on i386
and other architectures where u64 is not 64-bit-aligned, there isn't.
We actually need to add compat handling for the commands that use it.

Also, we don't want these flags to be ignored by older kernel versions
and drivers - they should reject specs that they don't understand.  So
any extension flags need to be added to flow_type.

> Due to the fact that the flow specifier structures are only 4 byte aligned
> instead of 8 I had to break the user data field into 2 sections.  In
> addition I added the vlan ethertype field since this is what ixgbe was
> using the user-data for currently and it allows for the fields to stay 4
> byte aligned while occupying space at the end of the flow_spec.
>
> In order to guarantee byte ordering I also thought it best to keep all
> fields in the flow_spec area a big endian value, as such I added vlan, vlan
> ethertype, and data as big endian values.

It's not important that byte order is consistent across architectures.

> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  include/linux/ethtool.h |   20 ++++++++++++++++++++
>  1 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index aac3e2e..3d1f8e0 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -378,10 +378,25 @@ struct ethtool_usrip4_spec {
>  };
>  
>  /**
> + * struct ethtool_ntuple_spec_ext - flow spec extension for ntuple in nfc
> + * @unused: space unused by extension
> + * @vlan_etype: EtherType for vlan tagged packet to match
> + * @vlan_tci: VLAN tag to match
> + * @data: Driver-dependent data to match
> + */
> +struct ethtool_ntuple_spec_ext {
> +	__be32	unused[15];
> +	__be16	vlan_etype;
> +	__be16	vlan_tci;
> +	__be32	data[2];
> +};
[...]

This is a really nasty way to reclaim space in the union.

Let's name the union, shrink it and insert the extra fields that way:

--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -377,27 +377,43 @@ struct ethtool_usrip4_spec {
 	__u8    proto;
 };
 
+union ethtool_flow_union {
+	struct ethtool_tcpip4_spec		tcp_ip4_spec;
+	struct ethtool_tcpip4_spec		udp_ip4_spec;
+	struct ethtool_tcpip4_spec		sctp_ip4_spec;
+	struct ethtool_ah_espip4_spec		ah_ip4_spec;
+	struct ethtool_ah_espip4_spec		esp_ip4_spec;
+	struct ethtool_usrip4_spec		usr_ip4_spec;
+	struct ethhdr				ether_spec;
+	__u8					hdata[52];
+};
+
+struct ethtool_flow_ext {
+	__be16	vlan_etype;
+	__be16	vlan_tci;
+	__be32	data[2];
+	__u32	reserved[2];
+};
+
 /**
  * struct ethtool_rx_flow_spec - specification for RX flow filter
  * @flow_type: Type of match to perform, e.g. %TCP_V4_FLOW
  * @h_u: Flow fields to match (dependent on @flow_type)
+ * @h_ext: Additional fields to match
  * @m_u: Masks for flow field bits to be ignored
+ * @m_ext: Masks for additional field bits to be ignored.
+ *	Note, all additional fields must be ignored unless @flow_type
+ *	includes the %FLOW_EXT flag.
  * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
  *	if packets should be discarded
  * @location: Index of filter in hardware table
  */
 struct ethtool_rx_flow_spec {
 	__u32		flow_type;
-	union {
-		struct ethtool_tcpip4_spec		tcp_ip4_spec;
-		struct ethtool_tcpip4_spec		udp_ip4_spec;
-		struct ethtool_tcpip4_spec		sctp_ip4_spec;
-		struct ethtool_ah_espip4_spec		ah_ip4_spec;
-		struct ethtool_ah_espip4_spec		esp_ip4_spec;
-		struct ethtool_usrip4_spec		usr_ip4_spec;
-		struct ethhdr				ether_spec;
-		__u8					hdata[72];
-	} h_u, m_u;
+	union ethtool_flow_union h_u;
+	struct ethtool_flow_ext h_ext;
+	union ethtool_flow_union m_u;
+	struct ethtool_flow_ext m_ext;
 	__u64		ring_cookie;
 	__u32		location;
 };
@@ -954,6 +970,8 @@ struct ethtool_ops {
 #define	IPV4_FLOW	0x10	/* hash only */
 #define	IPV6_FLOW	0x11	/* hash only */
 #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
+/* Flag to enable additional fields in struct ethtool_rx_flow_spec */
+#define	FLOW_EXT	0x80000000
 
 /* L3-L4 network traffic flow hash options */
 #define	RXH_L2DA	(1 << 1)
---

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: 2.6.37 regression: adding main interface to a bridge breaks vlan interface RX
From: Jesse Gross @ 2011-02-26  1:08 UTC (permalink / raw)
  To: chriss; +Cc: netdev
In-Reply-To: <loom.20110226T011347-702@post.gmane.org>

On Fri, Feb 25, 2011 at 4:16 PM, chriss <mail_to_chriss@gmx.net> wrote:
> Jesse Gross <jesse <at> nicira.com> writes:
>
>>
>> What driver is in use with the NIC you are seeing this on?
>>
>
> He there
>
> the device in question is (as lspci told)
> Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit
> Ethernet (rev 10)
>
> handled by kernel module r8169.

I'm guessing that you're hitting the special case in this code in
r8169.c:rtl8169_vlan_rx_register():

         /*
          * Do not disable RxVlan on 8110SCd.
          */
         if (tp->vlgrp || (tp->mac_version == RTL_GIGA_MAC_VER_05))
                 tp->cp_cmd |= RxVlan;
         else
                 tp->cp_cmd &= ~RxVlan;

Since before you were getting the vlans directly off the device there
was a vlan group configured.  However, now that the packets are going
through the bridge, the group is not being configured on the device
and the tag gets dropped.  Assuming that this is the case, the
solution is to convert the driver to use the new vlan model, which
does not require knowledge of the vlan group.

Can you confirm this by running tcpdump -eni br0?  I would expect that
you see the correct packets but without vlan tags.

^ permalink raw reply

* Re: TX VLAN acceleration on bridges broken in 2.6.37?
From: Jesse Gross @ 2011-02-26  1:16 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-kernel, netdev, Tantilov, Emil S, Kirsher, Jeffrey T
In-Reply-To: <20110226001908.GA10777@x61s.reliablesolutions.de>

On Fri, Feb 25, 2011 at 4:19 PM, Jan Niehusmann <jan@gondor.com> wrote:
> On Fri, Feb 25, 2011 at 02:53:21PM -0800, Jesse Gross wrote:
>> is specific to the e1000e driver.  I know that some other Intel NICs
>> require vlan stripping on receive to be enabled for vlan insertion on
>> transmit to work.  Since this driver has not been converted over to
>> use the new vlan model yet, it only enables these things if a vlan is
>> directly configured on it.  To confirm this can you try a few things:
>
> My observations confirm your theory:

OK, thanks for confirming.  The right solution is convert the driver
over to the new vlan model.  I don't know how soon I might get to
this, maybe it's something that the Intel guys can take a look at?

> - indeed, -e is necessary to show the vlan tags. So my prior observation
>  regarding tag visibility in tcpdump was wrong. The packets are still
>  have a vlan tag in the non-working case.
>
>  (What actually is affected by the txvlan flag is the ability to filter
>  for vlan tags with tcpdump.  so 'tcpdump -e -i eth0' shows the packets,
>  'tcpdump -e -i eth0 vlan' only shows them with txvlan off. However,
>  filtering for the vlan tag also doesn't work with the vlan interface
>  on eth0.1, while the tagging actually works, as verified above.)

Good to know, though that's a separate issue.

^ permalink raw reply

* Re: TX VLAN acceleration on bridges broken in 2.6.37?
From: Jeff Kirsher @ 2011-02-26  1:22 UTC (permalink / raw)
  To: Jesse Gross, Allan, Bruce W
  Cc: Jan Niehusmann, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, Tantilov, Emil S
In-Reply-To: <AANLkTikVvjsaG94-gtauvTLjH_RW5fmva8+N7Lk-ryQ0@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1610 bytes --]

On Fri, 2011-02-25 at 17:16 -0800, Jesse Gross wrote:
> On Fri, Feb 25, 2011 at 4:19 PM, Jan Niehusmann <jan@gondor.com> wrote:
> > On Fri, Feb 25, 2011 at 02:53:21PM -0800, Jesse Gross wrote:
> >> is specific to the e1000e driver.  I know that some other Intel NICs
> >> require vlan stripping on receive to be enabled for vlan insertion on
> >> transmit to work.  Since this driver has not been converted over to
> >> use the new vlan model yet, it only enables these things if a vlan is
> >> directly configured on it.  To confirm this can you try a few things:
> >
> > My observations confirm your theory:
> 
> OK, thanks for confirming.  The right solution is convert the driver
> over to the new vlan model.  I don't know how soon I might get to
> this, maybe it's something that the Intel guys can take a look at?

I have made sure that Bruce is aware of the issue.

We will see what we can do to get some patches created and under
testing.

> 
> > - indeed, -e is necessary to show the vlan tags. So my prior observation
> >  regarding tag visibility in tcpdump was wrong. The packets are still
> >  have a vlan tag in the non-working case.
> >
> >  (What actually is affected by the txvlan flag is the ability to filter
> >  for vlan tags with tcpdump.  so 'tcpdump -e -i eth0' shows the packets,
> >  'tcpdump -e -i eth0 vlan' only shows them with txvlan off. However,
> >  filtering for the vlan tag also doesn't work with the vlan interface
> >  on eth0.1, while the tagging actually works, as verified above.)
> 
> Good to know, though that's a separate issue.



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: pull request: wireless-next-2.6 2011-02-22
From: Gustavo F. Padovan @ 2011-02-26  1:36 UTC (permalink / raw)
  To: David Miller, linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20110225193618.GB2107@joana>

* Gustavo F. Padovan <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org> [2011-02-25 16:36:18 -0300]:

> Hi David,
> 
> * David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> [2011-02-25 11:15:00 -0800]:
> 
> > From: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> > Date: Thu, 24 Feb 2011 22:43:44 -0800 (PST)
> > 
> > > From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> > > Date: Tue, 22 Feb 2011 16:52:30 -0500
> > > 
> > >> Here is the latest batch of wireless bits intended for 2.6.39.  It seems
> > >> I neglected to send a pull request last week, so this one is a bit big
> > >> -- I apologize!
> > >> 
> > >> This includes a rather large batch of bluetooth bits by way of Gustavo.
> > >> It looks like a variety of bits, including some code refactoring, some
> > >> protocol support enhancements, some bugfixes, etc. -- nothing too
> > >> unusual.
> > >> 
> > >> Other items of interest include a new driver from Realtek, some ssb
> > >> support enhancements, and the usual sort of updates for mac80211 and a
> > >> variety of drivers.  Also included is a wireless-2.6 pull to resolve
> > >> some build breakage.
> > >> 
> > >> Please let me know if there are problems!
> > > 
> > > Pulled, thanks a lot John.
> > 
> > John a few things:
> > 
> > 1) I had to add some vmalloc.h includes to fix the build on sparc64,
> >    see commit b08cd667c4b6641c4d16a3f87f4550f81a6d69ac in net-next-2.6
> > 
> > 2) Something is screwey with the bluetooth config options now.
> > 
> >    I have an allmodconfig tree, and when I run "make oldconfig" after
> >    this pull, BT_L2CAP and BT_SCO both prompt me, claiming that they
> >    can only be built statically.
> > 
> >    I give it 'y' just to make it happen, for both, and afterways no
> >    matter how many times I rerun "make oldconfig" I keep seeing things
> >    like this in my build:
> > 
> > scripts/kconfig/conf --silentoldconfig Kconfig
> > include/config/auto.conf:986:warning: symbol value 'm' invalid for BT_SCO
> > include/config/auto.conf:3156:warning: symbol value 'm' invalid for BT_L2CAP
> > 
> >    First, what the heck is going on here?  Second, why the heck can't these
> >    non-trivial pieces of code be built modular any more?
> 
> We now have L2CAP and SCO built-in in the main bluetooth.ko module.
> 
> > 
> >    You can't make something "bool", have it depend on something that
> >    might be modular, and then build it into what could in fact be a
> >    module.  That's exactly what the bluetooth stuff seems to be doing
> >    now.
> 
> Seems I did the Kconfig change wrong, I'll fix it ASAP and send it to you
> guys.

I Figured the problem. When I first wrote this I based the work in other
Kconfig in net/ (as it was my very first time doing such kind of changes in a
Kconfig). For example, net/decnet/ and net/ax25/ do exactly the same as the
Bluetooth Kconfig. "bool" depending on "tristate" and build both together.

But doing another look after your report there is some places where this is
done a bit different, net/ipv6 and net/mac80211 are examples. Then I changed
to this new approach to remove the direct dependency from BT_L2CAP and BT_SCO.
Patch follows this e-mail.

That point me out that we may have other subsystems doing it wrong and we have
to fix this.

-- 
Gustavo F. Padovan
http://profusion.mobi

^ permalink raw reply

* [PATCH] Bluetooth: Fix BT_L2CAP and BT_SCO in Kconfig
From: Gustavo F. Padovan @ 2011-02-26  1:41 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20110226013639.GA2166@joana>

If we want something "bool" built-in in something "tristate" it can't
"depend on" the tristate config option.

Report by DaveM:

   I give it 'y' just to make it happen, for both, and afterways no
   matter how many times I rerun "make oldconfig" I keep seeing things
   like this in my build:

scripts/kconfig/conf --silentoldconfig Kconfig
include/config/auto.conf:986:warning: symbol value 'm' invalid for BT_SCO
include/config/auto.conf:3156:warning: symbol value 'm' invalid for BT_L2CAP

Reported-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Signed-off-by: Gustavo F. Padovan <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org>
---
 net/bluetooth/Kconfig |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/Kconfig b/net/bluetooth/Kconfig
index c6f9c2f..6ae5ec5 100644
--- a/net/bluetooth/Kconfig
+++ b/net/bluetooth/Kconfig
@@ -31,9 +31,10 @@ menuconfig BT
 	  to Bluetooth kernel modules are provided in the BlueZ packages.  For
 	  more information, see <http://www.bluez.org/>.
 
+if BT != n
+
 config BT_L2CAP
 	bool "L2CAP protocol support"
-	depends on BT
 	select CRC16
 	help
 	  L2CAP (Logical Link Control and Adaptation Protocol) provides
@@ -42,11 +43,12 @@ config BT_L2CAP
 
 config BT_SCO
 	bool "SCO links support"
-	depends on BT
 	help
 	  SCO link provides voice transport over Bluetooth.  SCO support is
 	  required for voice applications like Headset and Audio.
 
+endif
+
 source "net/bluetooth/rfcomm/Kconfig"
 
 source "net/bluetooth/bnep/Kconfig"
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH] pfkey: Use const where possible.
From: David Miller @ 2011-02-26  2:07 UTC (permalink / raw)
  To: netdev


This actually pointed out a (seemingly known) bug where we mangle the
pfkey header in a potentially shared SKB, which is fixed here.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/key/af_key.c |  201 +++++++++++++++++++++++++++++-------------------------
 1 files changed, 107 insertions(+), 94 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 5637285..7fb5457 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -70,7 +70,7 @@ static inline struct pfkey_sock *pfkey_sk(struct sock *sk)
 	return (struct pfkey_sock *)sk;
 }
 
-static int pfkey_can_dump(struct sock *sk)
+static int pfkey_can_dump(const struct sock *sk)
 {
 	if (3 * atomic_read(&sk->sk_rmem_alloc) <= 2 * sk->sk_rcvbuf)
 		return 1;
@@ -303,12 +303,13 @@ static int pfkey_do_dump(struct pfkey_sock *pfk)
 	return rc;
 }
 
-static inline void pfkey_hdr_dup(struct sadb_msg *new, struct sadb_msg *orig)
+static inline void pfkey_hdr_dup(struct sadb_msg *new,
+				 const struct sadb_msg *orig)
 {
 	*new = *orig;
 }
 
-static int pfkey_error(struct sadb_msg *orig, int err, struct sock *sk)
+static int pfkey_error(const struct sadb_msg *orig, int err, struct sock *sk)
 {
 	struct sk_buff *skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL);
 	struct sadb_msg *hdr;
@@ -369,13 +370,13 @@ static u8 sadb_ext_min_len[] = {
 };
 
 /* Verify sadb_address_{len,prefixlen} against sa_family.  */
-static int verify_address_len(void *p)
+static int verify_address_len(const void *p)
 {
-	struct sadb_address *sp = p;
-	struct sockaddr *addr = (struct sockaddr *)(sp + 1);
-	struct sockaddr_in *sin;
+	const struct sadb_address *sp = p;
+	const struct sockaddr *addr = (const struct sockaddr *)(sp + 1);
+	const struct sockaddr_in *sin;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-	struct sockaddr_in6 *sin6;
+	const struct sockaddr_in6 *sin6;
 #endif
 	int len;
 
@@ -411,16 +412,16 @@ static int verify_address_len(void *p)
 	return 0;
 }
 
-static inline int pfkey_sec_ctx_len(struct sadb_x_sec_ctx *sec_ctx)
+static inline int pfkey_sec_ctx_len(const struct sadb_x_sec_ctx *sec_ctx)
 {
 	return DIV_ROUND_UP(sizeof(struct sadb_x_sec_ctx) +
 			    sec_ctx->sadb_x_ctx_len,
 			    sizeof(uint64_t));
 }
 
-static inline int verify_sec_ctx_len(void *p)
+static inline int verify_sec_ctx_len(const void *p)
 {
-	struct sadb_x_sec_ctx *sec_ctx = (struct sadb_x_sec_ctx *)p;
+	const struct sadb_x_sec_ctx *sec_ctx = p;
 	int len = sec_ctx->sadb_x_ctx_len;
 
 	if (len > PAGE_SIZE)
@@ -434,7 +435,7 @@ static inline int verify_sec_ctx_len(void *p)
 	return 0;
 }
 
-static inline struct xfrm_user_sec_ctx *pfkey_sadb2xfrm_user_sec_ctx(struct sadb_x_sec_ctx *sec_ctx)
+static inline struct xfrm_user_sec_ctx *pfkey_sadb2xfrm_user_sec_ctx(const struct sadb_x_sec_ctx *sec_ctx)
 {
 	struct xfrm_user_sec_ctx *uctx = NULL;
 	int ctx_size = sec_ctx->sadb_x_ctx_len;
@@ -455,16 +456,16 @@ static inline struct xfrm_user_sec_ctx *pfkey_sadb2xfrm_user_sec_ctx(struct sadb
 	return uctx;
 }
 
-static int present_and_same_family(struct sadb_address *src,
-				   struct sadb_address *dst)
+static int present_and_same_family(const struct sadb_address *src,
+				   const struct sadb_address *dst)
 {
-	struct sockaddr *s_addr, *d_addr;
+	const struct sockaddr *s_addr, *d_addr;
 
 	if (!src || !dst)
 		return 0;
 
-	s_addr = (struct sockaddr *)(src + 1);
-	d_addr = (struct sockaddr *)(dst + 1);
+	s_addr = (const struct sockaddr *)(src + 1);
+	d_addr = (const struct sockaddr *)(dst + 1);
 	if (s_addr->sa_family != d_addr->sa_family)
 		return 0;
 	if (s_addr->sa_family != AF_INET
@@ -477,15 +478,15 @@ static int present_and_same_family(struct sadb_address *src,
 	return 1;
 }
 
-static int parse_exthdrs(struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int parse_exthdrs(struct sk_buff *skb, const struct sadb_msg *hdr, void **ext_hdrs)
 {
-	char *p = (char *) hdr;
+	const char *p = (char *) hdr;
 	int len = skb->len;
 
 	len -= sizeof(*hdr);
 	p += sizeof(*hdr);
 	while (len > 0) {
-		struct sadb_ext *ehdr = (struct sadb_ext *) p;
+		const struct sadb_ext *ehdr = (const struct sadb_ext *) p;
 		uint16_t ext_type;
 		int ext_len;
 
@@ -514,7 +515,7 @@ static int parse_exthdrs(struct sk_buff *skb, struct sadb_msg *hdr, void **ext_h
 				if (verify_sec_ctx_len(p))
 					return -EINVAL;
 			}
-			ext_hdrs[ext_type-1] = p;
+			ext_hdrs[ext_type-1] = (void *) p;
 		}
 		p   += ext_len;
 		len -= ext_len;
@@ -606,21 +607,21 @@ int pfkey_sockaddr_extract(const struct sockaddr *sa, xfrm_address_t *xaddr)
 }
 
 static
-int pfkey_sadb_addr2xfrm_addr(struct sadb_address *addr, xfrm_address_t *xaddr)
+int pfkey_sadb_addr2xfrm_addr(const struct sadb_address *addr, xfrm_address_t *xaddr)
 {
 	return pfkey_sockaddr_extract((struct sockaddr *)(addr + 1),
 				      xaddr);
 }
 
-static struct  xfrm_state *pfkey_xfrm_state_lookup(struct net *net, struct sadb_msg *hdr, void **ext_hdrs)
+static struct  xfrm_state *pfkey_xfrm_state_lookup(struct net *net, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
-	struct sadb_sa *sa;
-	struct sadb_address *addr;
+	const struct sadb_sa *sa;
+	const struct sadb_address *addr;
 	uint16_t proto;
 	unsigned short family;
 	xfrm_address_t *xaddr;
 
-	sa = (struct sadb_sa *) ext_hdrs[SADB_EXT_SA-1];
+	sa = (const struct sadb_sa *) ext_hdrs[SADB_EXT_SA-1];
 	if (sa == NULL)
 		return NULL;
 
@@ -629,18 +630,18 @@ static struct  xfrm_state *pfkey_xfrm_state_lookup(struct net *net, struct sadb_
 		return NULL;
 
 	/* sadb_address_len should be checked by caller */
-	addr = (struct sadb_address *) ext_hdrs[SADB_EXT_ADDRESS_DST-1];
+	addr = (const struct sadb_address *) ext_hdrs[SADB_EXT_ADDRESS_DST-1];
 	if (addr == NULL)
 		return NULL;
 
-	family = ((struct sockaddr *)(addr + 1))->sa_family;
+	family = ((const struct sockaddr *)(addr + 1))->sa_family;
 	switch (family) {
 	case AF_INET:
-		xaddr = (xfrm_address_t *)&((struct sockaddr_in *)(addr + 1))->sin_addr;
+		xaddr = (xfrm_address_t *)&((const struct sockaddr_in *)(addr + 1))->sin_addr;
 		break;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	case AF_INET6:
-		xaddr = (xfrm_address_t *)&((struct sockaddr_in6 *)(addr + 1))->sin6_addr;
+		xaddr = (xfrm_address_t *)&((const struct sockaddr_in6 *)(addr + 1))->sin6_addr;
 		break;
 #endif
 	default:
@@ -691,8 +692,8 @@ static inline int pfkey_mode_to_xfrm(int mode)
 }
 
 static unsigned int pfkey_sockaddr_fill(const xfrm_address_t *xaddr, __be16 port,
-				       struct sockaddr *sa,
-				       unsigned short family)
+					struct sockaddr *sa,
+					unsigned short family)
 {
 	switch (family) {
 	case AF_INET:
@@ -720,7 +721,7 @@ static unsigned int pfkey_sockaddr_fill(const xfrm_address_t *xaddr, __be16 port
 	return 0;
 }
 
-static struct sk_buff *__pfkey_xfrm_state2msg(struct xfrm_state *x,
+static struct sk_buff *__pfkey_xfrm_state2msg(const struct xfrm_state *x,
 					      int add_keys, int hsc)
 {
 	struct sk_buff *skb;
@@ -1010,7 +1011,7 @@ static struct sk_buff *__pfkey_xfrm_state2msg(struct xfrm_state *x,
 }
 
 
-static inline struct sk_buff *pfkey_xfrm_state2msg(struct xfrm_state *x)
+static inline struct sk_buff *pfkey_xfrm_state2msg(const struct xfrm_state *x)
 {
 	struct sk_buff *skb;
 
@@ -1019,26 +1020,26 @@ static inline struct sk_buff *pfkey_xfrm_state2msg(struct xfrm_state *x)
 	return skb;
 }
 
-static inline struct sk_buff *pfkey_xfrm_state2msg_expire(struct xfrm_state *x,
+static inline struct sk_buff *pfkey_xfrm_state2msg_expire(const struct xfrm_state *x,
 							  int hsc)
 {
 	return __pfkey_xfrm_state2msg(x, 0, hsc);
 }
 
 static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
-						struct sadb_msg *hdr,
-						void **ext_hdrs)
+						const struct sadb_msg *hdr,
+						void * const *ext_hdrs)
 {
 	struct xfrm_state *x;
-	struct sadb_lifetime *lifetime;
-	struct sadb_sa *sa;
-	struct sadb_key *key;
-	struct sadb_x_sec_ctx *sec_ctx;
+	const struct sadb_lifetime *lifetime;
+	const struct sadb_sa *sa;
+	const struct sadb_key *key;
+	const struct sadb_x_sec_ctx *sec_ctx;
 	uint16_t proto;
 	int err;
 
 
-	sa = (struct sadb_sa *) ext_hdrs[SADB_EXT_SA-1];
+	sa = (const struct sadb_sa *) ext_hdrs[SADB_EXT_SA-1];
 	if (!sa ||
 	    !present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1],
 				     ext_hdrs[SADB_EXT_ADDRESS_DST-1]))
@@ -1077,7 +1078,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 	     sa->sadb_sa_encrypt > SADB_X_CALG_MAX) ||
 	    sa->sadb_sa_encrypt > SADB_EALG_MAX)
 		return ERR_PTR(-EINVAL);
-	key = (struct sadb_key*) ext_hdrs[SADB_EXT_KEY_AUTH-1];
+	key = (const struct sadb_key*) ext_hdrs[SADB_EXT_KEY_AUTH-1];
 	if (key != NULL &&
 	    sa->sadb_sa_auth != SADB_X_AALG_NULL &&
 	    ((key->sadb_key_bits+7) / 8 == 0 ||
@@ -1104,14 +1105,14 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 	if (sa->sadb_sa_flags & SADB_SAFLAGS_NOPMTUDISC)
 		x->props.flags |= XFRM_STATE_NOPMTUDISC;
 
-	lifetime = (struct sadb_lifetime*) ext_hdrs[SADB_EXT_LIFETIME_HARD-1];
+	lifetime = (const struct sadb_lifetime*) ext_hdrs[SADB_EXT_LIFETIME_HARD-1];
 	if (lifetime != NULL) {
 		x->lft.hard_packet_limit = _KEY2X(lifetime->sadb_lifetime_allocations);
 		x->lft.hard_byte_limit = _KEY2X(lifetime->sadb_lifetime_bytes);
 		x->lft.hard_add_expires_seconds = lifetime->sadb_lifetime_addtime;
 		x->lft.hard_use_expires_seconds = lifetime->sadb_lifetime_usetime;
 	}
-	lifetime = (struct sadb_lifetime*) ext_hdrs[SADB_EXT_LIFETIME_SOFT-1];
+	lifetime = (const struct sadb_lifetime*) ext_hdrs[SADB_EXT_LIFETIME_SOFT-1];
 	if (lifetime != NULL) {
 		x->lft.soft_packet_limit = _KEY2X(lifetime->sadb_lifetime_allocations);
 		x->lft.soft_byte_limit = _KEY2X(lifetime->sadb_lifetime_bytes);
@@ -1119,7 +1120,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 		x->lft.soft_use_expires_seconds = lifetime->sadb_lifetime_usetime;
 	}
 
-	sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1];
+	sec_ctx = (const struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1];
 	if (sec_ctx != NULL) {
 		struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_sec_ctx(sec_ctx);
 
@@ -1133,7 +1134,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 			goto out;
 	}
 
-	key = (struct sadb_key*) ext_hdrs[SADB_EXT_KEY_AUTH-1];
+	key = (const struct sadb_key*) ext_hdrs[SADB_EXT_KEY_AUTH-1];
 	if (sa->sadb_sa_auth) {
 		int keysize = 0;
 		struct xfrm_algo_desc *a = xfrm_aalg_get_byid(sa->sadb_sa_auth);
@@ -1202,7 +1203,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 				  &x->id.daddr);
 
 	if (ext_hdrs[SADB_X_EXT_SA2-1]) {
-		struct sadb_x_sa2 *sa2 = (void*)ext_hdrs[SADB_X_EXT_SA2-1];
+		const struct sadb_x_sa2 *sa2 = ext_hdrs[SADB_X_EXT_SA2-1];
 		int mode = pfkey_mode_to_xfrm(sa2->sadb_x_sa2_mode);
 		if (mode < 0) {
 			err = -EINVAL;
@@ -1213,7 +1214,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 	}
 
 	if (ext_hdrs[SADB_EXT_ADDRESS_PROXY-1]) {
-		struct sadb_address *addr = ext_hdrs[SADB_EXT_ADDRESS_PROXY-1];
+		const struct sadb_address *addr = ext_hdrs[SADB_EXT_ADDRESS_PROXY-1];
 
 		/* Nobody uses this, but we try. */
 		x->sel.family = pfkey_sadb_addr2xfrm_addr(addr, &x->sel.saddr);
@@ -1224,7 +1225,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 		x->sel.family = x->props.family;
 
 	if (ext_hdrs[SADB_X_EXT_NAT_T_TYPE-1]) {
-		struct sadb_x_nat_t_type* n_type;
+		const struct sadb_x_nat_t_type* n_type;
 		struct xfrm_encap_tmpl *natt;
 
 		x->encap = kmalloc(sizeof(*x->encap), GFP_KERNEL);
@@ -1236,12 +1237,12 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 		natt->encap_type = n_type->sadb_x_nat_t_type_type;
 
 		if (ext_hdrs[SADB_X_EXT_NAT_T_SPORT-1]) {
-			struct sadb_x_nat_t_port* n_port =
+			const struct sadb_x_nat_t_port *n_port =
 				ext_hdrs[SADB_X_EXT_NAT_T_SPORT-1];
 			natt->encap_sport = n_port->sadb_x_nat_t_port_port;
 		}
 		if (ext_hdrs[SADB_X_EXT_NAT_T_DPORT-1]) {
-			struct sadb_x_nat_t_port* n_port =
+			const struct sadb_x_nat_t_port *n_port =
 				ext_hdrs[SADB_X_EXT_NAT_T_DPORT-1];
 			natt->encap_dport = n_port->sadb_x_nat_t_port_port;
 		}
@@ -1261,12 +1262,12 @@ out:
 	return ERR_PTR(err);
 }
 
-static int pfkey_reserved(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_reserved(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	return -EOPNOTSUPP;
 }
 
-static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	struct sk_buff *resp_skb;
@@ -1365,7 +1366,7 @@ static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, struct sadb_msg *h
 	return 0;
 }
 
-static int pfkey_acquire(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_acquire(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	struct xfrm_state *x;
@@ -1453,7 +1454,7 @@ static int key_notify_sa(struct xfrm_state *x, const struct km_event *c)
 	return 0;
 }
 
-static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_add(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	struct xfrm_state *x;
@@ -1492,7 +1493,7 @@ out:
 	return err;
 }
 
-static int pfkey_delete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_delete(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	struct xfrm_state *x;
@@ -1534,7 +1535,7 @@ out:
 	return err;
 }
 
-static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_get(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	__u8 proto;
@@ -1570,7 +1571,7 @@ static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr,
 	return 0;
 }
 
-static struct sk_buff *compose_sadb_supported(struct sadb_msg *orig,
+static struct sk_buff *compose_sadb_supported(const struct sadb_msg *orig,
 					      gfp_t allocation)
 {
 	struct sk_buff *skb;
@@ -1642,7 +1643,7 @@ out_put_algs:
 	return skb;
 }
 
-static int pfkey_register(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_register(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct pfkey_sock *pfk = pfkey_sk(sk);
 	struct sk_buff *supp_skb;
@@ -1671,7 +1672,7 @@ static int pfkey_register(struct sock *sk, struct sk_buff *skb, struct sadb_msg
 	return 0;
 }
 
-static int unicast_flush_resp(struct sock *sk, struct sadb_msg *ihdr)
+static int unicast_flush_resp(struct sock *sk, const struct sadb_msg *ihdr)
 {
 	struct sk_buff *skb;
 	struct sadb_msg *hdr;
@@ -1710,7 +1711,7 @@ static int key_notify_sa_flush(const struct km_event *c)
 	return 0;
 }
 
-static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_flush(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	unsigned proto;
@@ -1784,7 +1785,7 @@ static void pfkey_dump_sa_done(struct pfkey_sock *pfk)
 	xfrm_state_walk_done(&pfk->dump.u.state);
 }
 
-static int pfkey_dump(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_dump(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	u8 proto;
 	struct pfkey_sock *pfk = pfkey_sk(sk);
@@ -1805,19 +1806,29 @@ static int pfkey_dump(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr
 	return pfkey_do_dump(pfk);
 }
 
-static int pfkey_promisc(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_promisc(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct pfkey_sock *pfk = pfkey_sk(sk);
 	int satype = hdr->sadb_msg_satype;
+	bool reset_errno = false;
 
 	if (hdr->sadb_msg_len == (sizeof(*hdr) / sizeof(uint64_t))) {
-		/* XXX we mangle packet... */
-		hdr->sadb_msg_errno = 0;
+		reset_errno = true;
 		if (satype != 0 && satype != 1)
 			return -EINVAL;
 		pfk->promisc = satype;
 	}
-	pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, BROADCAST_ALL, NULL, sock_net(sk));
+	if (reset_errno && skb_cloned(skb))
+		skb = skb_copy(skb, GFP_KERNEL);
+	else
+		skb = skb_clone(skb, GFP_KERNEL);
+
+	if (reset_errno && skb) {
+		struct sadb_msg *new_hdr = (struct sadb_msg *) skb->data;
+		new_hdr->sadb_msg_errno = 0;
+	}
+
+	pfkey_broadcast(skb, GFP_KERNEL, BROADCAST_ALL, NULL, sock_net(sk));
 	return 0;
 }
 
@@ -1921,7 +1932,7 @@ parse_ipsecrequests(struct xfrm_policy *xp, struct sadb_x_policy *pol)
 	return 0;
 }
 
-static inline int pfkey_xfrm_policy2sec_ctx_size(struct xfrm_policy *xp)
+static inline int pfkey_xfrm_policy2sec_ctx_size(const struct xfrm_policy *xp)
 {
   struct xfrm_sec_ctx *xfrm_ctx = xp->security;
 
@@ -1933,9 +1944,9 @@ static inline int pfkey_xfrm_policy2sec_ctx_size(struct xfrm_policy *xp)
 	return 0;
 }
 
-static int pfkey_xfrm_policy2msg_size(struct xfrm_policy *xp)
+static int pfkey_xfrm_policy2msg_size(const struct xfrm_policy *xp)
 {
-	struct xfrm_tmpl *t;
+	const struct xfrm_tmpl *t;
 	int sockaddr_size = pfkey_sockaddr_size(xp->family);
 	int socklen = 0;
 	int i;
@@ -1955,7 +1966,7 @@ static int pfkey_xfrm_policy2msg_size(struct xfrm_policy *xp)
 		pfkey_xfrm_policy2sec_ctx_size(xp);
 }
 
-static struct sk_buff * pfkey_xfrm_policy2msg_prep(struct xfrm_policy *xp)
+static struct sk_buff * pfkey_xfrm_policy2msg_prep(const struct xfrm_policy *xp)
 {
 	struct sk_buff *skb;
 	int size;
@@ -1969,7 +1980,7 @@ static struct sk_buff * pfkey_xfrm_policy2msg_prep(struct xfrm_policy *xp)
 	return skb;
 }
 
-static int pfkey_xfrm_policy2msg(struct sk_buff *skb, struct xfrm_policy *xp, int dir)
+static int pfkey_xfrm_policy2msg(struct sk_buff *skb, const struct xfrm_policy *xp, int dir)
 {
 	struct sadb_msg *hdr;
 	struct sadb_address *addr;
@@ -2065,8 +2076,8 @@ static int pfkey_xfrm_policy2msg(struct sk_buff *skb, struct xfrm_policy *xp, in
 	pol->sadb_x_policy_priority = xp->priority;
 
 	for (i=0; i<xp->xfrm_nr; i++) {
+		const struct xfrm_tmpl *t = xp->xfrm_vec + i;
 		struct sadb_x_ipsecrequest *rq;
-		struct xfrm_tmpl *t = xp->xfrm_vec + i;
 		int req_size;
 		int mode;
 
@@ -2152,7 +2163,7 @@ static int key_notify_policy(struct xfrm_policy *xp, int dir, const struct km_ev
 
 }
 
-static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	int err = 0;
@@ -2273,7 +2284,7 @@ out:
 	return err;
 }
 
-static int pfkey_spddelete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_spddelete(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	int err;
@@ -2350,7 +2361,7 @@ out:
 	return err;
 }
 
-static int key_pol_get_resp(struct sock *sk, struct xfrm_policy *xp, struct sadb_msg *hdr, int dir)
+static int key_pol_get_resp(struct sock *sk, struct xfrm_policy *xp, const struct sadb_msg *hdr, int dir)
 {
 	int err;
 	struct sk_buff *out_skb;
@@ -2458,7 +2469,7 @@ static int ipsecrequests_to_migrate(struct sadb_x_ipsecrequest *rq1, int len,
 }
 
 static int pfkey_migrate(struct sock *sk, struct sk_buff *skb,
-			 struct sadb_msg *hdr, void **ext_hdrs)
+			 const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	int i, len, ret, err = -EINVAL;
 	u8 dir;
@@ -2556,7 +2567,7 @@ static int pfkey_migrate(struct sock *sk, struct sk_buff *skb,
 #endif
 
 
-static int pfkey_spdget(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_spdget(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	unsigned int dir;
@@ -2644,7 +2655,7 @@ static void pfkey_dump_sp_done(struct pfkey_sock *pfk)
 	xfrm_policy_walk_done(&pfk->dump.u.policy);
 }
 
-static int pfkey_spddump(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_spddump(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct pfkey_sock *pfk = pfkey_sk(sk);
 
@@ -2680,7 +2691,7 @@ static int key_notify_policy_flush(const struct km_event *c)
 
 }
 
-static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)
+static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr, void * const *ext_hdrs)
 {
 	struct net *net = sock_net(sk);
 	struct km_event c;
@@ -2709,7 +2720,7 @@ static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg
 }
 
 typedef int (*pfkey_handler)(struct sock *sk, struct sk_buff *skb,
-			     struct sadb_msg *hdr, void **ext_hdrs);
+			     const struct sadb_msg *hdr, void * const *ext_hdrs);
 static pfkey_handler pfkey_funcs[SADB_MAX + 1] = {
 	[SADB_RESERVED]		= pfkey_reserved,
 	[SADB_GETSPI]		= pfkey_getspi,
@@ -2736,7 +2747,7 @@ static pfkey_handler pfkey_funcs[SADB_MAX + 1] = {
 	[SADB_X_MIGRATE]	= pfkey_migrate,
 };
 
-static int pfkey_process(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr)
+static int pfkey_process(struct sock *sk, struct sk_buff *skb, const struct sadb_msg *hdr)
 {
 	void *ext_hdrs[SADB_EXT_MAX];
 	int err;
@@ -2781,7 +2792,8 @@ static struct sadb_msg *pfkey_get_base_msg(struct sk_buff *skb, int *errp)
 	return hdr;
 }
 
-static inline int aalg_tmpl_set(struct xfrm_tmpl *t, struct xfrm_algo_desc *d)
+static inline int aalg_tmpl_set(const struct xfrm_tmpl *t,
+				const struct xfrm_algo_desc *d)
 {
 	unsigned int id = d->desc.sadb_alg_id;
 
@@ -2791,7 +2803,8 @@ static inline int aalg_tmpl_set(struct xfrm_tmpl *t, struct xfrm_algo_desc *d)
 	return (t->aalgos >> id) & 1;
 }
 
-static inline int ealg_tmpl_set(struct xfrm_tmpl *t, struct xfrm_algo_desc *d)
+static inline int ealg_tmpl_set(const struct xfrm_tmpl *t,
+				const struct xfrm_algo_desc *d)
 {
 	unsigned int id = d->desc.sadb_alg_id;
 
@@ -2801,12 +2814,12 @@ static inline int ealg_tmpl_set(struct xfrm_tmpl *t, struct xfrm_algo_desc *d)
 	return (t->ealgos >> id) & 1;
 }
 
-static int count_ah_combs(struct xfrm_tmpl *t)
+static int count_ah_combs(const struct xfrm_tmpl *t)
 {
 	int i, sz = 0;
 
 	for (i = 0; ; i++) {
-		struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(i);
+		const struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(i);
 		if (!aalg)
 			break;
 		if (aalg_tmpl_set(t, aalg) && aalg->available)
@@ -2815,12 +2828,12 @@ static int count_ah_combs(struct xfrm_tmpl *t)
 	return sz + sizeof(struct sadb_prop);
 }
 
-static int count_esp_combs(struct xfrm_tmpl *t)
+static int count_esp_combs(const struct xfrm_tmpl *t)
 {
 	int i, k, sz = 0;
 
 	for (i = 0; ; i++) {
-		struct xfrm_algo_desc *ealg = xfrm_ealg_get_byidx(i);
+		const struct xfrm_algo_desc *ealg = xfrm_ealg_get_byidx(i);
 		if (!ealg)
 			break;
 
@@ -2828,7 +2841,7 @@ static int count_esp_combs(struct xfrm_tmpl *t)
 			continue;
 
 		for (k = 1; ; k++) {
-			struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(k);
+			const struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(k);
 			if (!aalg)
 				break;
 
@@ -2839,7 +2852,7 @@ static int count_esp_combs(struct xfrm_tmpl *t)
 	return sz + sizeof(struct sadb_prop);
 }
 
-static void dump_ah_combs(struct sk_buff *skb, struct xfrm_tmpl *t)
+static void dump_ah_combs(struct sk_buff *skb, const struct xfrm_tmpl *t)
 {
 	struct sadb_prop *p;
 	int i;
@@ -2851,7 +2864,7 @@ static void dump_ah_combs(struct sk_buff *skb, struct xfrm_tmpl *t)
 	memset(p->sadb_prop_reserved, 0, sizeof(p->sadb_prop_reserved));
 
 	for (i = 0; ; i++) {
-		struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(i);
+		const struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(i);
 		if (!aalg)
 			break;
 
@@ -2871,7 +2884,7 @@ static void dump_ah_combs(struct sk_buff *skb, struct xfrm_tmpl *t)
 	}
 }
 
-static void dump_esp_combs(struct sk_buff *skb, struct xfrm_tmpl *t)
+static void dump_esp_combs(struct sk_buff *skb, const struct xfrm_tmpl *t)
 {
 	struct sadb_prop *p;
 	int i, k;
@@ -2883,7 +2896,7 @@ static void dump_esp_combs(struct sk_buff *skb, struct xfrm_tmpl *t)
 	memset(p->sadb_prop_reserved, 0, sizeof(p->sadb_prop_reserved));
 
 	for (i=0; ; i++) {
-		struct xfrm_algo_desc *ealg = xfrm_ealg_get_byidx(i);
+		const struct xfrm_algo_desc *ealg = xfrm_ealg_get_byidx(i);
 		if (!ealg)
 			break;
 
@@ -2892,7 +2905,7 @@ static void dump_esp_combs(struct sk_buff *skb, struct xfrm_tmpl *t)
 
 		for (k = 1; ; k++) {
 			struct sadb_comb *c;
-			struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(k);
+			const struct xfrm_algo_desc *aalg = xfrm_aalg_get_byidx(k);
 			if (!aalg)
 				break;
 			if (!(aalg_tmpl_set(t, aalg) && aalg->available))
-- 
1.7.4.1


^ permalink raw reply related

* Re: SO_REUSEPORT - can it be done in kernel?
From: David Miller @ 2011-02-26  2:12 UTC (permalink / raw)
  To: herbert; +Cc: rick.jones2, tgraf, therbert, wsommerfeld, daniel.baluta, netdev
In-Reply-To: <20110226005718.GA19889@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 26 Feb 2011 08:57:18 +0800

> It isn't all that hard since the easy way would be to only take
> the lock if we're already corked or about to cork.

We take the lock unconditionally because we essentially have to after
UDP takes on the socket buffer accounting facilities similar to TCP.

^ permalink raw reply

* Re: SO_REUSEPORT - can it be done in kernel?
From: Herbert Xu @ 2011-02-26  2:48 UTC (permalink / raw)
  To: David Miller
  Cc: rick.jones2, tgraf, therbert, wsommerfeld, daniel.baluta, netdev
In-Reply-To: <20110225.181244.104056532.davem@davemloft.net>

On Fri, Feb 25, 2011 at 06:12:44PM -0800, David Miller wrote:
> 
> We take the lock unconditionally because we essentially have to after
> UDP takes on the socket buffer accounting facilities similar to TCP.

Well I just checked out the history tree (2.6.12) and it too had
the unconditional lock on the send path.  So this predates the
system-wide buffer limit change.

I'm looking at redoing this and the bulk of the work is going to
be restructuring ip_append_data/ip_push_pending_frames so that it
doesn't store the states in sk/inet_sk.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: SO_REUSEPORT - can it be done in kernel?
From: David Miller @ 2011-02-26  3:07 UTC (permalink / raw)
  To: herbert; +Cc: rick.jones2, tgraf, therbert, wsommerfeld, daniel.baluta, netdev
In-Reply-To: <20110226024848.GA20993@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 26 Feb 2011 10:48:48 +0800

> I'm looking at redoing this and the bulk of the work is going to
> be restructuring ip_append_data/ip_push_pending_frames so that it
> doesn't store the states in sk/inet_sk.

I suppose you're going to replace that stuff with an on-stack
control structure that gets passed around by reference or
similar?

^ permalink raw reply

* Re: SO_REUSEPORT - can it be done in kernel?
From: Herbert Xu @ 2011-02-26  3:11 UTC (permalink / raw)
  To: David Miller
  Cc: rick.jones2, tgraf, therbert, wsommerfeld, daniel.baluta, netdev
In-Reply-To: <20110225.190723.39180243.davem@davemloft.net>

On Fri, Feb 25, 2011 at 07:07:23PM -0800, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Sat, 26 Feb 2011 10:48:48 +0800
> 
> > I'm looking at redoing this and the bulk of the work is going to
> > be restructuring ip_append_data/ip_push_pending_frames so that it
> > doesn't store the states in sk/inet_sk.
> 
> I suppose you're going to replace that stuff with an on-stack
> control structure that gets passed around by reference or
> similar?

Either that or have ip_append_data do ip_push_pending_frames
directly.

That function's signature is a mess already and I need to think
about this a bit more :)

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [net-next-2.6 PATCH 02/10] ethtool: add ntuple flow specifier to network flow classifier
From: Alexander Duyck @ 2011-02-26  5:30 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Alexander Duyck, davem, jeffrey.t.kirsher, netdev
In-Reply-To: <1298682048.3555.18.camel@localhost>

On Fri, Feb 25, 2011 at 5:00 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Fri, 2011-02-25 at 15:32 -0800, Alexander Duyck wrote:
>> This change is meant to add an ntuple define type to the rx network flow
>> classification specifiers.  The idea is to allow ntuple to be displayed and
>> possibly configured via the network flow classification interface.  To do
>> this I added a ntuple_flow_spec_ext to the lsit of supported filters, and
>> added a flow_type_ext value to the structure in an unused hole within the
>> ethtool_rx_flow_spec structure.
>
> There's a hole there on 64-bit architectures.  Unfortunately, on i386
> and other architectures where u64 is not 64-bit-aligned, there isn't.
> We actually need to add compat handling for the commands that use it.
>
> Also, we don't want these flags to be ignored by older kernel versions
> and drivers - they should reject specs that they don't understand.  So
> any extension flags need to be added to flow_type.
>
>> Due to the fact that the flow specifier structures are only 4 byte aligned
>> instead of 8 I had to break the user data field into 2 sections.  In
>> addition I added the vlan ethertype field since this is what ixgbe was
>> using the user-data for currently and it allows for the fields to stay 4
>> byte aligned while occupying space at the end of the flow_spec.
>>
>> In order to guarantee byte ordering I also thought it best to keep all
>> fields in the flow_spec area a big endian value, as such I added vlan, vlan
>> ethertype, and data as big endian values.
>
> It's not important that byte order is consistent across architectures.
>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>>
>>  include/linux/ethtool.h |   20 ++++++++++++++++++++
>>  1 files changed, 20 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index aac3e2e..3d1f8e0 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -378,10 +378,25 @@ struct ethtool_usrip4_spec {
>>  };
>>
>>  /**
>> + * struct ethtool_ntuple_spec_ext - flow spec extension for ntuple in nfc
>> + * @unused: space unused by extension
>> + * @vlan_etype: EtherType for vlan tagged packet to match
>> + * @vlan_tci: VLAN tag to match
>> + * @data: Driver-dependent data to match
>> + */
>> +struct ethtool_ntuple_spec_ext {
>> +     __be32  unused[15];
>> +     __be16  vlan_etype;
>> +     __be16  vlan_tci;
>> +     __be32  data[2];
>> +};
> [...]
>
> This is a really nasty way to reclaim space in the union.
>
> Let's name the union, shrink it and insert the extra fields that way:
>
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -377,27 +377,43 @@ struct ethtool_usrip4_spec {
>        __u8    proto;
>  };
>
> +union ethtool_flow_union {
> +       struct ethtool_tcpip4_spec              tcp_ip4_spec;
> +       struct ethtool_tcpip4_spec              udp_ip4_spec;
> +       struct ethtool_tcpip4_spec              sctp_ip4_spec;
> +       struct ethtool_ah_espip4_spec           ah_ip4_spec;
> +       struct ethtool_ah_espip4_spec           esp_ip4_spec;
> +       struct ethtool_usrip4_spec              usr_ip4_spec;
> +       struct ethhdr                           ether_spec;
> +       __u8                                    hdata[52];
> +};
> +
> +struct ethtool_flow_ext {
> +       __be16  vlan_etype;
> +       __be16  vlan_tci;
> +       __be32  data[2];
> +       __u32   reserved[2];
> +};
> +

Any chance of getting the reserved fields moved to the top of the
structure?  My only concern is that we might end up with a flow spec
larger than 52 bytes at some point and moving the reserved fields to
the front might give us a little more wiggle room future
compatibility.

>  /**
>  * struct ethtool_rx_flow_spec - specification for RX flow filter
>  * @flow_type: Type of match to perform, e.g. %TCP_V4_FLOW
>  * @h_u: Flow fields to match (dependent on @flow_type)
> + * @h_ext: Additional fields to match
>  * @m_u: Masks for flow field bits to be ignored
> + * @m_ext: Masks for additional field bits to be ignored.
> + *     Note, all additional fields must be ignored unless @flow_type
> + *     includes the %FLOW_EXT flag.
>  * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
>  *     if packets should be discarded
>  * @location: Index of filter in hardware table
>  */
>  struct ethtool_rx_flow_spec {
>        __u32           flow_type;
> -       union {
> -               struct ethtool_tcpip4_spec              tcp_ip4_spec;
> -               struct ethtool_tcpip4_spec              udp_ip4_spec;
> -               struct ethtool_tcpip4_spec              sctp_ip4_spec;
> -               struct ethtool_ah_espip4_spec           ah_ip4_spec;
> -               struct ethtool_ah_espip4_spec           esp_ip4_spec;
> -               struct ethtool_usrip4_spec              usr_ip4_spec;
> -               struct ethhdr                           ether_spec;
> -               __u8                                    hdata[72];
> -       } h_u, m_u;
> +       union ethtool_flow_union h_u;
> +       struct ethtool_flow_ext h_ext;
> +       union ethtool_flow_union m_u;
> +       struct ethtool_flow_ext m_ext;
>        __u64           ring_cookie;
>        __u32           location;
>  };
> @@ -954,6 +970,8 @@ struct ethtool_ops {
>  #define        IPV4_FLOW       0x10    /* hash only */
>  #define        IPV6_FLOW       0x11    /* hash only */
>  #define        ETHER_FLOW      0x12    /* spec only (ether_spec) */
> +/* Flag to enable additional fields in struct ethtool_rx_flow_spec */
> +#define        FLOW_EXT        0x80000000
>
>  /* L3-L4 network traffic flow hash options */
>  #define        RXH_L2DA        (1 << 1)
> ---
>
> Ben.

This works for my purposes other than the one comment above.  However
if you are fine with it I am good with it since I can't think of any
filters that we might need in the near future that would require more
than 52 bytes.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

^ permalink raw reply

* Re: [PATCH] xps-mq: Transmit Packet Steering for multiqueue
From: David Miller @ 2011-02-26  7:09 UTC (permalink / raw)
  To: bhutchings; +Cc: therbert, eric.dumazet, shemminger, netdev
In-Reply-To: <1298312395.2608.65.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 21 Feb 2011 18:19:55 +0000

> On Wed, 2010-09-01 at 18:32 -0700, David Miller wrote:
>> 2) TX queue datastructures in the driver get reallocated using
>>    memory in that NUMA domain.
> 
> I've previously sent patches to add an ethtool API for NUMA control,
> which include the option to allocate on the same node where IRQs are
> handled.  However, there is currently no function to allocate
> DMA-coherent memory on a specified NUMA node (rather than the device's
> node).  This is likely to be beneficial for event rings and might be
> good for descriptor rings for some devices.  (The implementation I sent
> for sfc mistakenly switched it to allocating non-coherent memory, for
> which it *is* possible to specify the node.)

The thing to do is to work with someone like FUJITA Tomonori on this.

It's simply a matter of making new APIs that take the node specifier,
have the implementations either make use of or completely ignore the node,
and have the existing APIs pass in "-1" for the node or whatever the
CPP macro is for this :-)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox