Netdev List
* Re: 2.6.38-rc3-git1: Reported regressions 2.6.36 -> 2.6.37
From: Linus Torvalds @ 2011-02-03 15:42 UTC (permalink / raw)
  To: Carlos R. Mafra, Keith Packard, Dave Airlie
  Cc: Linux SCSI List, Linux ACPI, Network Development,
	Linux Wireless List, Linux Kernel Mailing List, DRI,
	Rafael J. Wysocki, Florian Mickler, Andrew Morton,
	Kernel Testers List, Linux PM List, Maciej Rutecki
In-Reply-To: <20110203112316.GA3718@linux-yscl.site>

On Thu, Feb 3, 2011 at 3:23 AM, Carlos R. Mafra <crmafra2@gmail.com> wrote:
> On Thu  3.Feb'11 at  1:03:41 +0100, Rafael J. Wysocki wrote:
>>
>> If you know of any other unresolved post-2.6.36 regressions, please let us know
>> either and we'll add them to the list.  Also, please let us know if any
>> of the entries below are invalid.
>
> I'm sorry if I'm overlooking something, but as far as I can see the regression
> reported here:
>
> https://lkml.org/lkml/2011/1/24/457
>
> is not in the list (update on that report: reverting that commit on top of
> 2.6.37 fixes the issue).

Ok, added Keith and Dave to the cc, since they are the signers of that commit.

> After some time, I also ended up finding an earlier report in the kernel bugzilla
> which I think is the same regression (it was bisected to the same commit):
>
> https://bugzilla.kernel.org/show_bug.cgi?id=24982
>
> but I do not see it in the list either, even though it's marked as a
> regression in the bugzilla.
>
> The issue was also present in 2.6.38-rc2 last time I tested.

Just to confirm, can you also check -rc3? I'm pretty sure nothing has
changed, but there were a few drm patches after -rc2, so it's always
good to double-check.

Keithp?

                        Linus


* Re: [PATCH] NETFILTER module xt_hmark new target for HASH MARK
From: Pablo Neira Ayuso @ 2011-02-03 15:42 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <1296742995.6662.57.camel@seasc0214>

On 03/02/11 15:23, Hans Schillstrom wrote:
> On Thu, 2011-02-03 at 14:51 +0100, Pablo Neira Ayuso wrote:
>> On 03/02/11 14:34, Hans Schillstrom wrote:
>> this assumption is not valid in NAT handlings.
> 
> That's true, because I want to avoid conntrack
> 
>> If you want consistent hashing with NAT handlings you'll have to make
>> this stateful and use the conntrack source and reply directions of the
>> original tuples (thus making it stateful). That may be a problem because
>> some people may want to use this without enabling connection tracking.
> 
> What about a compilation switch or a sysctl ?

or better some option for iptables.

>> Are you using this for (uplink) load balancing?
> 
> Actually in both ways 
>  - in front of a bunch of ipvs
>  - and in the payloads for outgoing traffic.
> 
>> Could you also include one realistic example in the patch description on
>> how this is used?
> Sure, I guess you mean some nice ascii graphics,  
> iptables and ip route commands

That would be great, for the record.

>> If this is accepted, I think this has to be merge with the (already
>> overloaded) MARK target.
> 
> I have no opinion about that, others might have.

Better put it in the MARK target with a new revision. I think that
Patrick is going to ask you this.

I don't know why I had the impression that MARK is overloaded; at first
glance the code actually looks fine.


* ipv6 doesn't choose correct source address
From: Stephen Clark @ 2011-02-03 15:10 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Hello,

I have a Linux 2.6.32.26-175.fc12 box with
two possible ipv6 default gateways. So I was trying
to have two different ipv6 addresses on my eth0 interface.
Then, by simply changing my default ipv6 gateway, I could have packets
routed appropriately.

But when I do this the source address in the packet always is the
first ipv6 address and not the one that should be associated with
the corresponding default route.

ip -6 addr sh
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
     inet6 2001:470:34::129:91/48 scope global
        valid_lft forever preferred_lft forever
     inet6 2001:4830:28::129:91/48 scope global
        valid_lft forever preferred_lft forever
     inet6 fe80::21c:c0ff:fe94:3a12/64 scope link
        valid_lft forever preferred_lft forever

ip -6 rout sh
default via 2001:4830:28::1 dev eth0  metric 1024  mtu 1500 advmss 1440 hoplimit 0

So I would expect packets leaving via this default route to use the
source address 2001:4830:28::129:91

but:
09:50:28.887483 IP6 2001:470:34::129:91 > 2001:558:1002:5:68:87:64:48: ICMP6, echo request, seq 1, length 64

I would have expected the source address to be: 2001:4830:28::129:91

This works correctly with ipv4.
ip addr sh eth0
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 00:1c:c0:94:3a:12 brd ff:ff:ff:ff:ff:ff
     inet 10.0.129.91/17 brd 10.0.255.255 scope global eth0
     inet 192.168.198.48/24 scope global eth0
     inet 10.254.150.91/24 scope global eth0
     inet 10.0.1.91/24 scope global eth0
     inet 192.168.3.6/24 scope global eth0
     inet 192.168.198.91/24 scope global secondary eth0

default via 192.168.198.252 dev eth0

09:52:06.968934 IP 192.168.198.48 > 68.87.64.48: ICMP echo request, id 34402, seq 1, length 64



Is this broken or a feature of ipv6 or a misconfiguration on my part?

Thanks,
Steve
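
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: IPv6 default source address selection (RFC 3484) does not look at which gateway the route uses. When several global addresses on the outgoing interface are otherwise equal, its last rule prefers the candidate that shares the longest prefix with the destination. The sketch below only computes that common prefix length for the addresses shown above; the helper names are hypothetical and this is not the kernel's code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>

/* Number of leading bits two IPv6 addresses share, or -1 on parse error. */
static int common_prefix_len(const char *a, const char *b)
{
	struct in6_addr x, y;
	int i, bits = 0;

	if (inet_pton(AF_INET6, a, &x) != 1 || inet_pton(AF_INET6, b, &y) != 1)
		return -1;

	for (i = 0; i < 16; i++) {
		unsigned char diff = x.s6_addr[i] ^ y.s6_addr[i];

		if (!diff) {
			bits += 8;	/* whole byte matches */
			continue;
		}
		while (!(diff & 0x80)) {	/* count matching high bits */
			bits++;
			diff <<= 1;
		}
		break;
	}
	return bits;
}

/* Index of the candidate sharing the longest prefix with dst
 * (the first candidate wins ties), or -1 if there are none. */
static int pick_source(const char *const cand[], size_t n, const char *dst)
{
	int best = -1, best_len = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		int len = common_prefix_len(cand[i], dst);

		if (len > best_len) {
			best_len = len;
			best = (int)i;
		}
	}
	return best;
}
```

For the destination 2001:558:1002:5:68:87:64:48, 2001:470:34::129:91 shares 23 leading bits and 2001:4830:28::129:91 shares 17, so a longest-prefix tie-break would pick the first address, which matches the capture above. If that is what is happening, a source hint on the route may be the place to look, but that is a guess.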

-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)





* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Oleg V. Ukhno @ 2011-02-03 14:54 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Nicolas de Pesloüan, John Fastabend, netdev@vger.kernel.org
In-Reply-To: <32505.1296669453@death>

On 02/02/2011 08:57 PM, Jay Vosburgh wrote:
> Nicolas de Pesloüan<nicolas.2p.debian@gmail.com>  wrote:

>> I just propose the following option and option values : "src_mac_select"
>> (instead of mac_select), with "default" and "slave_mac" (instead of
>> slave_src_mac) as possible values. In the future, we might need a
>> "dst_mac_select" option... :-)
>
> 	I originally thought of using the nomenclature you propose; my
> thinking for doing it the way I ended up with is to minimize the number
> of tunable knobs that bonding has (so, the dst_mac would be a setting
> for mac_select).  That works as long as there aren't a lot of settings
> that would be turned on simultaneously, since each combination would
> have to be a separate option, or the options parser would have to handle
> multiple settings (e.g., mac_select=src+dst or something like that).
>
> 	Anyway, after thinking about it some more, in the long run it's
> probably safer to separate these two, so, Oleg, use the above naming
> ("src_mac_select" with "default" and "slave_mac").
>
>> Also, are there any risks that this kind of session load-balancing won't
>> properly cooperate with multiqueue (as explained in "Overriding
>> Configuration for Special Cases" in Documentation/networking/bonding.txt)?
>> I think it is important to ensure we keep the ability to fine tune the
>> egress path selection
>
> 	I think the logic for the mac_select (or src_mac_select or
> whatever) just has to be done last, after the slave selection is done by
> the multiqueue stuff.  That's probably a good tidbit to put in the
> documentation as well.
>
> 	-J
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Thanks, everyone, for the comments.
I'll resubmit the modified patch once it is ready and tested, in about a
week or two, I think.

Oleg

>


-- 
Best regards,
Oleg Ukhno



* Re: [PATCH] tcp: Increase the initial congestion window to 10.
From: Hagen Paul Pfeifer @ 2011-02-03 14:43 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: David Miller, netdev, dccp, therbert
In-Reply-To: <47859e9273bb0c1b2c32fc1adfa450ee@localhost>


On Thu, 03 Feb 2011 15:00:28 +0100, Hagen Paul Pfeifer wrote:



> Davem, there is _no_ research how this huge IW will behave in environments
> with a small BDP. Belief it or not, but there are networks out there with a
> BDP similar to ~1980. Why for heaven's sake should this work in these
> environments? There are only _two_ extensive analysis one from Cherry and



s/Cherry/Jerry/  ;-)



Hagen


* Re: [PATCH] tcp: Increase the initial congestion window to 10.
From: Hagen Paul Pfeifer @ 2011-02-03 14:26 UTC (permalink / raw)
  To: John Heffner; +Cc: David Miller, netdev, dccp, therbert
In-Reply-To: <AANLkTinEXYXeZs1=wVQRAPDQ53YY0ne9NgCZN3x1hbaA@mail.gmail.com>


On Thu, 3 Feb 2011 09:17:14 -0500, John Heffner wrote:



> There's already a per-route tunable, right (RTAX_INITCWND)?



Yes, __u32 cwnd = (dst ? dst_metric(dst, RTAX_INITCWND) : 0);



I was a little bit nervous ... ;)



HGN
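
The one-liner above can be read as: use the per-route metric when the route sets one, otherwise fall back to a default. A minimal user-space sketch of that decision follows. Everything in it is hypothetical (toy_route, toy_init_cwnd, the default of 10 taken from the patch title, and the final clamp step); it is not the kernel's implementation.

```c
#include <stdint.h>

/* Hypothetical default, taken from the subject of this thread. */
#define TOY_DEFAULT_INIT_CWND 10u

/* Toy stand-in for a route entry; not the kernel's struct dst_entry. */
struct toy_route {
	uint32_t initcwnd;	/* 0 means "not set on this route" */
};

/* Pick the initial congestion window: the per-route value if present,
 * else the default, never above the caller-supplied clamp (assumed). */
static uint32_t toy_init_cwnd(const struct toy_route *rt, uint32_t cwnd_clamp)
{
	uint32_t cwnd = rt ? rt->initcwnd : 0;

	if (!cwnd)
		cwnd = TOY_DEFAULT_INIT_CWND;
	return cwnd < cwnd_clamp ? cwnd : cwnd_clamp;
}
```

A route carrying a small initcwnd therefore overrides the larger default for that destination only, which is the escape hatch for low-BDP paths being discussed here.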


* Re: [PATCH] NETFILTER module xt_hmark new target for HASH MARK
From: Hans Schillstrom @ 2011-02-03 14:23 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <4D4AB2D6.7070302@netfilter.org>

On Thu, 2011-02-03 at 14:51 +0100, Pablo Neira Ayuso wrote:
> On 03/02/11 14:34, Hans Schillstrom wrote:
> > +/*
> > + * Calc hash value, special care is taken on icmp and fragmented messages
> > + * i.e. fragmented messages don't use ports.
> > + */
> > +static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
> > +{
> [...]
> > +	ip_proto &= info->prmask;
> > +	/* get a consistent hash (same value on both flow directions) */
> > +	if (addr2 < addr1)
> > +		swap(addr1, addr2);
> 
> this assumption is not valid in NAT handlings.

That's true, because I want to avoid conntrack

> 
> If you want consistent hashing with NAT handlings you'll have to make
> this stateful and use the conntrack source and reply directions of the
> original tuples (thus making it stateful). That may be a problem because
> some people may want to use this without enabling connection tracking.

What about a compilation switch or a sysctl ?

> 
> Are you using this for (uplink) load balancing?

Actually in both ways 
 - in front of a bunch of ipvs
 - and in the payloads for outgoing traffic.

> Could you also include one realistic example in the patch description on
> how this is used?
Sure, I guess you mean some nice ascii graphics,  
iptables and ip route commands

> 
> If this is accepted, I think this has to be merge with the (already
> overloaded) MARK target.

I have no opinion about that, others might have.

Thanks
Hans
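
The direction-independent hashing discussed above can be sketched as follows: order the two addresses (and the two ports) before mixing them, so that a packet and its reply produce the same value. This is only an illustration under stated assumptions. toy_mix and toy_flow_hash are hypothetical names, the mixing function is a placeholder and not what the patch uses, and IPv4 addresses are taken as plain 32-bit integers.

```c
#include <stdint.h>

/* Placeholder mixing function, for illustration only. */
static uint32_t toy_mix(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = a * 2654435761u;

	h ^= b + 0x9e3779b9u + (h << 6) + (h >> 2);
	h ^= c + 0x9e3779b9u + (h << 6) + (h >> 2);
	return h;
}

/* Same hash for both directions of a flow: sort the endpoints first. */
static uint32_t toy_flow_hash(uint32_t saddr, uint32_t daddr,
			      uint16_t sport, uint16_t dport, uint8_t proto)
{
	uint32_t addr1 = saddr, addr2 = daddr;
	uint16_t port1 = sport, port2 = dport;

	if (addr2 < addr1) {		/* smaller address always first */
		uint32_t t = addr1;
		addr1 = addr2;
		addr2 = t;
	}
	if (port2 < port1) {		/* smaller port always first */
		uint16_t t = port1;
		port1 = port2;
		port2 = t;
	}
	return toy_mix(addr1, addr2,
		       (((uint32_t)port1 << 16) | port2) ^ proto);
}
```

This also shows the NAT concern raised in the review: once an address is rewritten, the reply carries a different address pair than the original packet, so sorting alone can no longer guarantee the same hash for both directions without state.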



* [PATCH v4 5/5] loopback: convert to hw_features
From: Michał Mirosław @ 2011-02-03 14:21 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings
In-Reply-To: <cover.1296741561.git.mirq-linux@rere.qmqm.pl>

This also enables TSOv6, TSO-ECN, and UFO as loopback clearly can handle them.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/loopback.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2d9663a..97b116b 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -129,10 +129,6 @@ static u32 always_on(struct net_device *dev)
 
 static const struct ethtool_ops loopback_ethtool_ops = {
 	.get_link		= always_on,
-	.set_tso		= ethtool_op_set_tso,
-	.get_tx_csum		= always_on,
-	.get_sg			= always_on,
-	.get_rx_csum		= always_on,
 };
 
 static int loopback_dev_init(struct net_device *dev)
@@ -169,9 +165,12 @@ static void loopback_setup(struct net_device *dev)
 	dev->type		= ARPHRD_LOOPBACK;	/* 0x0001*/
 	dev->flags		= IFF_LOOPBACK;
 	dev->priv_flags	       &= ~IFF_XMIT_DST_RELEASE;
+	dev->hw_features	= NETIF_F_ALL_TSO;
 	dev->features 		= NETIF_F_SG | NETIF_F_FRAGLIST
-		| NETIF_F_TSO
+		| NETIF_F_ALL_TSO
+		| NETIF_F_UFO
 		| NETIF_F_NO_CSUM
+		| NETIF_F_RXCSUM
 		| NETIF_F_HIGHDMA
 		| NETIF_F_LLTX
 		| NETIF_F_NETNS_LOCAL;
-- 
1.7.2.3



* [PATCH v2] ethtool: implement G/SFEATURES calls
From: Michał Mirosław @ 2011-02-03 14:21 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings
In-Reply-To: <cover.1296741561.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---

Changes from v1:
 - removed deprecation of -k/-K
 - mark never-changeable features in userspace

---

 ethtool-copy.h |   89 +++++++++++++++++++++++-
 ethtool.c      |  215 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 296 insertions(+), 8 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 75c3ae7..6430dbd 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -251,6 +251,7 @@ enum ethtool_stringset {
 	ETH_SS_STATS,
 	ETH_SS_PRIV_FLAGS,
 	ETH_SS_NTUPLE_FILTERS,
+	ETH_SS_FEATURES,
 };
 
 /* for passing string sets for data tagging */
@@ -523,6 +524,87 @@ struct ethtool_flash {
 	char	data[ETHTOOL_FLASH_MAX_FILENAME];
 };
 
+/* for returning and changing feature sets */
+
+/**
+ * struct ethtool_get_features_block - block with state of 32 features
+ * @available: mask of changeable features
+ * @requested: mask of features requested to be enabled if possible
+ * @active: mask of currently enabled features
+ * @never_changed: mask of features not changeable for any device
+ */
+struct ethtool_get_features_block {
+	__u32	available;
+	__u32	requested;
+	__u32	active;
+	__u32	never_changed;
+};
+
+/**
+ * struct ethtool_gfeatures - command to get state of device's features
+ * @cmd: command number = %ETHTOOL_GFEATURES
+ * @size: in: number of elements in the features[] array;
+ *       out: number of elements in features[] needed to hold all features
+ * @features: state of features
+ */
+struct ethtool_gfeatures {
+	__u32	cmd;
+	__u32	size;
+	struct ethtool_get_features_block features[0];
+};
+
+/**
+ * struct ethtool_set_features_block - block with request for 32 features
+ * @valid: mask of features to be changed
+ * @requested: values of features to be changed
+ */
+struct ethtool_set_features_block {
+	__u32	valid;
+	__u32	requested;
+};
+
+/**
+ * struct ethtool_sfeatures - command to request change in device's features
+ * @cmd: command number = %ETHTOOL_SFEATURES
+ * @size: array size of the features[] array
+ * @features: feature change masks
+ */
+struct ethtool_sfeatures {
+	__u32	cmd;
+	__u32	size;
+	struct ethtool_set_features_block features[0];
+};
+
+/*
+ * %ETHTOOL_SFEATURES changes features present in features[].valid to the
+ * values of corresponding bits in features[].requested. Bits in .requested
+ * not set in .valid or not changeable are ignored.
+ *
+ * Returns %EINVAL when .valid contains undefined or never-changeable bits
+ * or size is not equal to required number of features words (32-bit blocks).
+ * Returns >= 0 if request was completed; bits set in the value mean:
+ *   %ETHTOOL_F_UNSUPPORTED - there were bits set in .valid that are not
+ *	changeable (not present in %ETHTOOL_GFEATURES' features[].available)
+ *	those bits were ignored.
+ *   %ETHTOOL_F_WISH - some or all changes requested were recorded but the
+ *      resulting state of bits masked by .valid is not equal to .requested.
+ *      Probably there are other device-specific constraints on some features
+ *      in the set. When %ETHTOOL_F_UNSUPPORTED is set, .valid is considered
+ *      here as though ignored bits were cleared.
+ *
+ * Meaning of bits in the masks are obtained by %ETHTOOL_GSSET_INFO (number of
+ * bits in the arrays - always multiple of 32) and %ETHTOOL_GSTRINGS commands
+ * for ETH_SS_FEATURES string set. First entry in the table corresponds to least
+ * significant bit in features[0] fields. Empty strings mark undefined features.
+ */
+enum ethtool_sfeatures_retval_bits {
+	ETHTOOL_F_UNSUPPORTED__BIT,
+	ETHTOOL_F_WISH__BIT,
+};
+
+#define ETHTOOL_F_UNSUPPORTED   (1 << ETHTOOL_F_UNSUPPORTED__BIT)
+#define ETHTOOL_F_WISH          (1 << ETHTOOL_F_WISH__BIT)
+
 
 /* CMDs currently supported */
 #define ETHTOOL_GSET		0x00000001 /* Get settings. */
@@ -534,7 +616,9 @@ struct ethtool_flash {
 #define ETHTOOL_GMSGLVL		0x00000007 /* Get driver message level */
 #define ETHTOOL_SMSGLVL		0x00000008 /* Set driver msg level. */
 #define ETHTOOL_NWAY_RST	0x00000009 /* Restart autonegotiation. */
-#define ETHTOOL_GLINK		0x0000000a /* Get link status (ethtool_value) */
+/* Get link status for host, i.e. whether the interface *and* the
+ * physical port (if there is one) are up (ethtool_value). */
+#define ETHTOOL_GLINK		0x0000000a
 #define ETHTOOL_GEEPROM		0x0000000b /* Get EEPROM data */
 #define ETHTOOL_SEEPROM		0x0000000c /* Set EEPROM data. */
 #define ETHTOOL_GCOALESCE	0x0000000e /* Get coalesce config */
@@ -585,6 +669,9 @@ struct ethtool_flash {
 #define ETHTOOL_GRXFHINDIR	0x00000038 /* Get RX flow hash indir'n table */
 #define ETHTOOL_SRXFHINDIR	0x00000039 /* Set RX flow hash indir'n table */
 
+#define ETHTOOL_GFEATURES	0x0000003a /* Get device offload settings */
+#define ETHTOOL_SFEATURES	0x0000003b /* Change device offload settings */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
 #define SPARC_ETH_SSET		ETHTOOL_SSET
diff --git a/ethtool.c b/ethtool.c
index 1afdfe4..af43e47 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -39,6 +39,7 @@
 #include <limits.h>
 #include <ctype.h>
 
+#include <unistd.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
@@ -97,6 +98,9 @@ static int do_gcoalesce(int fd, struct ifreq *ifr);
 static int do_scoalesce(int fd, struct ifreq *ifr);
 static int do_goffload(int fd, struct ifreq *ifr);
 static int do_soffload(int fd, struct ifreq *ifr);
+static void parse_sfeatures_args(int argc, char **argp, int argi);
+static int do_gfeatures(int fd, struct ifreq *ifr);
+static int do_sfeatures(int fd, struct ifreq *ifr);
 static int do_gstats(int fd, struct ifreq *ifr);
 static int rxflow_str_to_type(const char *str);
 static int parse_rxfhashopts(char *optstr, u32 *data);
@@ -133,6 +137,8 @@ static enum {
 	MODE_SRING,
 	MODE_GOFFLOAD,
 	MODE_SOFFLOAD,
+	MODE_GFEATURES,
+	MODE_SFEATURES,
 	MODE_GSTATS,
 	MODE_GNFC,
 	MODE_SNFC,
@@ -211,6 +217,10 @@ static struct option {
 		"		[ ntuple on|off ]\n"
 		"		[ rxhash on|off ]\n"
     },
+    { "-w", "--show-features", MODE_GFEATURES, "Get offload status" },
+    { "-W", "--request-features", MODE_SFEATURES, "Set requested offload",
+		"		[ feature-name on|off [...] ]\n"
+		"		see --show-features output for feature-name strings\n" },
     { "-i", "--driver", MODE_GDRV, "Show driver information" },
     { "-d", "--register-dump", MODE_GREGS, "Do a register dump",
 		"		[ raw on|off ]\n"
@@ -743,8 +753,10 @@ static void parse_generic_cmdline(int argc, char **argp,
 				break;
 			}
 		}
-		if( !found)
+		if( !found) {
+			fprintf(stdout, "bad param: %s\n", argp[i]);
 			show_usage(1);
+		}
 	}
 }
 
@@ -829,6 +841,8 @@ static void parse_cmdline(int argc, char **argp)
 			    (mode == MODE_SRING) ||
 			    (mode == MODE_GOFFLOAD) ||
 			    (mode == MODE_SOFFLOAD) ||
+			    (mode == MODE_GFEATURES) ||
+			    (mode == MODE_SFEATURES) ||
 			    (mode == MODE_GSTATS) ||
 			    (mode == MODE_GNFC) ||
 			    (mode == MODE_SNFC) ||
@@ -919,6 +933,11 @@ static void parse_cmdline(int argc, char **argp)
 				i = argc;
 				break;
 			}
+			if (mode == MODE_SFEATURES) {
+				parse_sfeatures_args(argc, argp, i);
+				i = argc;
+				break;
+			}
 			if (mode == MODE_SNTUPLE) {
 				if (!strcmp(argp[i], "flow-type")) {
 					i += 1;
@@ -1947,21 +1966,30 @@ static int dump_rxfhash(int fhash, u64 val)
 	return 0;
 }
 
-static int doit(void)
+static int get_control_socket(struct ifreq *ifr)
 {
-	struct ifreq ifr;
 	int fd;
 
 	/* Setup our control structures. */
-	memset(&ifr, 0, sizeof(ifr));
-	strcpy(ifr.ifr_name, devname);
+	memset(ifr, 0, sizeof(*ifr));
+	strcpy(ifr->ifr_name, devname);
 
 	/* Open control socket. */
 	fd = socket(AF_INET, SOCK_DGRAM, 0);
-	if (fd < 0) {
+	if (fd < 0)
 		perror("Cannot get control socket");
+
+	return fd;
+}
+
+static int doit(void)
+{
+	struct ifreq ifr;
+	int fd;
+
+	fd = get_control_socket(&ifr);
+	if (fd < 0)
 		return 70;
-	}
 
 	/* all of these are expected to populate ifr->ifr_data as needed */
 	if (mode == MODE_GDRV) {
@@ -1998,6 +2026,10 @@ static int doit(void)
 		return do_goffload(fd, &ifr);
 	} else if (mode == MODE_SOFFLOAD) {
 		return do_soffload(fd, &ifr);
+	} else if (mode == MODE_GFEATURES) {
+		return do_gfeatures(fd, &ifr);
+	} else if (mode == MODE_SFEATURES) {
+		return do_sfeatures(fd, &ifr);
 	} else if (mode == MODE_GSTATS) {
 		return do_gstats(fd, &ifr);
 	} else if (mode == MODE_GNFC) {
@@ -2435,6 +2467,175 @@ static int do_soffload(int fd, struct ifreq *ifr)
 	return 0;
 }
 
+static int get_feature_strings(int fd, struct ifreq *ifr,
+	struct ethtool_gstrings **strs)
+{
+	struct ethtool_sset_info *sset_info;
+	struct ethtool_gstrings *strings;
+	int sz_str, n_strings, err;
+
+	sset_info = malloc(sizeof(struct ethtool_sset_info) + sizeof(u32));
+	sset_info->cmd = ETHTOOL_GSSET_INFO;
+	sset_info->sset_mask = (1ULL << ETH_SS_FEATURES);
+	ifr->ifr_data = (caddr_t)sset_info;
+	err = send_ioctl(fd, ifr);
+
+	if ((err < 0) ||
+	    (!(sset_info->sset_mask & (1ULL << ETH_SS_FEATURES)))) {
+		perror("Cannot get driver strings info");
+		return -100;
+	}
+
+	n_strings = sset_info->data[0];
+	free(sset_info);
+	sz_str = n_strings * ETH_GSTRING_LEN;
+
+	strings = calloc(1, sz_str + sizeof(struct ethtool_gstrings));
+	if (!strings) {
+		fprintf(stderr, "no memory available\n");
+		return -95;
+	}
+
+	strings->cmd = ETHTOOL_GSTRINGS;
+	strings->string_set = ETH_SS_FEATURES;
+	strings->len = n_strings;
+	ifr->ifr_data = (caddr_t) strings;
+	err = send_ioctl(fd, ifr);
+	if (err < 0) {
+		perror("Cannot get feature strings information");
+		free(strings);
+		return -96;
+	}
+
+	*strs = strings;
+	return n_strings;
+}
+
+struct ethtool_sfeatures *features_req;
+
+static void parse_sfeatures_args(int argc, char **argp, int argi)
+{
+	struct cmdline_info *cmdline_desc, *cp;
+	struct ethtool_gstrings *strings;
+	struct ifreq ifr;
+	int n_strings, sz_features, i;
+	int fd, changed = 0;
+
+	fd = get_control_socket(&ifr);
+	if (fd < 0)
+		exit(100);
+
+	n_strings = get_feature_strings(fd, &ifr, &strings);
+	if (n_strings < 0)
+		exit(-n_strings);
+
+	sz_features = sizeof(*features_req->features) * ((n_strings + 31) / 32);
+
+	cp = cmdline_desc = calloc(n_strings, sizeof(*cmdline_desc));
+	features_req = calloc(1, sizeof(*features_req) + sz_features);
+	if (!cmdline_desc || !features_req) {
+		fprintf(stderr, "no memory available\n");
+		exit(95);
+	}
+
+	features_req->size = (n_strings + 31) / 32;
+
+	for (i = 0; i < n_strings; ++i) {
+		if (!strings->data[i*ETH_GSTRING_LEN])
+			continue;
+
+		strings->data[i*ETH_GSTRING_LEN + ETH_GSTRING_LEN-1] = 0;
+		cp->name = (const char *)strings->data + i * ETH_GSTRING_LEN;
+		cp->type = CMDL_FLAG;
+		cp->flag_val = 1 << (i % 32);
+		cp->wanted_val = &features_req->features[i / 32].requested;
+		cp->seen_val = &features_req->features[i / 32].valid;
+		++cp;
+	}
+
+	parse_generic_cmdline(argc, argp, argi, &changed,
+		cmdline_desc, cp - cmdline_desc);
+
+	free(cmdline_desc);
+	free(strings);
+	close(fd);
+
+	if (!changed) {
+		free(features_req);
+		features_req = NULL;
+	}
+}
+
+static int do_gfeatures(int fd, struct ifreq *ifr)
+{
+	struct ethtool_gstrings *strings;
+	struct ethtool_gfeatures *features;
+	int n_strings, sz_features, err, i;
+
+	n_strings = get_feature_strings(fd, ifr, &strings);
+	if (n_strings < 0)
+		return -n_strings;
+
+	sz_features = sizeof(*features->features) * ((n_strings + 31) / 32);
+	features = calloc(1, sz_features + sizeof(*features));
+	if (!features) {
+		fprintf(stderr, "no memory available\n");
+		return 95;
+	}
+
+	features->cmd = ETHTOOL_GFEATURES;
+	features->size = (n_strings + 31) / 32;
+	ifr->ifr_data = (caddr_t) features;
+	err = send_ioctl(fd, ifr);
+	if (err < 0) {
+		perror("Cannot get feature status");
+		free(strings);
+		free(features);
+		return 97;
+	}
+
+	fprintf(stdout, "Offload state:  (name: enabled,wanted,changeable)\n");
+	for (i = 0; i < n_strings; i++) {
+		if (!strings->data[i * ETH_GSTRING_LEN])
+			continue;	/* empty */
+#define P_FLAG(f) \
+	(features->features[i / 32].f & (1 << (i % 32))) ? "yes" : " no"
+#define PN_FLAG(f) \
+	(features->features[i / 32].never_changed & (1 << (i % 32))) ? "---" : P_FLAG(f)
+		fprintf(stdout, "     %-*.*s %s,%s,%s\n",
+			ETH_GSTRING_LEN, ETH_GSTRING_LEN,
+			&strings->data[i * ETH_GSTRING_LEN],
+			P_FLAG(active), PN_FLAG(requested), PN_FLAG(available));
+#undef P_FLAG
+	}
+	free(strings);
+	free(features);
+
+	return 0;
+}
+
+static int do_sfeatures(int fd, struct ifreq *ifr)
+{
+	int err;
+
+	if (!features_req) {
+		fprintf(stderr, "no features changed\n");
+		return 97;
+	}
+
+	features_req->cmd = ETHTOOL_SFEATURES;
+	ifr->ifr_data = (caddr_t) features_req;
+	err = send_ioctl(fd, ifr);
+	if (err < 0) {
+		perror("Cannot change features");
+		free(features_req);
+		return 97;
+	}
+
+	return 0;
+}
+
+
 static int do_gset(int fd, struct ifreq *ifr)
 {
 	int err;
-- 
1.7.2.3
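
The header comment in this patch says that the first string in the ETH_SS_FEATURES table corresponds to the least significant bit of features[0], and the code sizes the array as (n_strings + 31) / 32 blocks. A small sketch of that index-to-bit mapping follows. toy_set_block only mirrors the two fields of ethtool_set_features_block, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Mirrors the two fields of struct ethtool_set_features_block. */
struct toy_set_block {
	uint32_t valid;		/* which bits the caller wants changed */
	uint32_t requested;	/* desired value for each of those bits */
};

/* Number of 32-bit blocks needed to hold n_features bits. */
static unsigned int toy_feature_blocks(unsigned int n_features)
{
	return (n_features + 31) / 32;
}

/* Mark feature number `index` as "to be changed" and record the wanted
 * state. Returns 0 on success, -1 if index does not fit in the array. */
static int toy_request_feature(struct toy_set_block *blocks,
			       unsigned int n_blocks,
			       unsigned int index, int enable)
{
	uint32_t mask = (uint32_t)1 << (index % 32);

	if (index / 32 >= n_blocks)
		return -1;

	blocks[index / 32].valid |= mask;
	if (enable)
		blocks[index / 32].requested |= mask;
	else
		blocks[index / 32].requested &= ~mask;
	return 0;
}
```

The sketch builds the mask from an unsigned 1 so that bit 31 is well defined; the patch writes 1 << (i % 32) with a signed int, which may be worth a second look for the last bit of each block.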



* [PATCH v4 1/5] net: Introduce new feature setting ops
From: Michał Mirosław @ 2011-02-03 14:21 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings
In-Reply-To: <cover.1296741561.git.mirq-linux@rere.qmqm.pl>

This introduces a new framework to handle device features setting.
It consists of:
  - new fields in struct net_device:
	+ hw_features - features that hw/driver supports toggling
	+ wanted_features - features that user wants enabled, when possible
  - new netdev_ops:
	+ feat = ndo_fix_features(dev, feat) - API checking constraints for
		enabling features or their combinations
	+ ndo_set_features(dev) - API updating hardware state to match
		changed dev->features
  - new ethtool commands:
	+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
		and trigger device reconfiguration if resulting dev->features
		changed
	+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)
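
As a reading aid, the interplay of the fields and callbacks listed above might be modelled roughly like this in user space. It is a sketch under assumptions, not the code in this patch: toy_dev, the TOY_F_* flags, toy_update_features and the example constraint (TSO needs SG) are all hypothetical.

```c
#include <stdint.h>

#define TOY_F_SG	1u
#define TOY_F_TSO	2u

struct toy_dev {
	uint32_t features;		/* currently active */
	uint32_t hw_features;		/* user-changeable */
	uint32_t wanted_features;	/* user-requested */
	uint32_t (*fix_features)(struct toy_dev *dev, uint32_t features);
	int (*set_features)(struct toy_dev *dev, uint32_t features);
};

/* Example constraint for the fix callback: no TSO without SG. */
static uint32_t toy_fix_tso_needs_sg(struct toy_dev *dev, uint32_t features)
{
	(void)dev;			/* must not modify device state */
	if (!(features & TOY_F_SG))
		features &= ~TOY_F_TSO;
	return features;
}

/* Recompute the active set. Returns 1 if it changed, 0 if not,
 * or a negative error from the set callback. */
static int toy_update_features(struct toy_dev *dev)
{
	/* Bits the user cannot toggle stay as they are; the rest
	 * follow what the user asked for. */
	uint32_t features = (dev->features & ~dev->hw_features) |
			    (dev->wanted_features & dev->hw_features);

	if (dev->fix_features)
		features = dev->fix_features(dev, features);
	if (features == dev->features)
		return 0;

	if (dev->set_features) {
		int err = dev->set_features(dev, features);

		if (err < 0)
			return err;
		if (err > 0)		/* driver updated features itself */
			return 1;
	}
	dev->features = features;
	return 1;
}
```

In this model a request for TSO alone is recorded in wanted_features but has no effect until SG is also requested, which is the "when possible" behaviour described for wanted_features.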

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 include/linux/ethtool.h   |   85 +++++++++++++++++++++++++
 include/linux/netdevice.h |   45 +++++++++++++-
 net/core/dev.c            |   49 +++++++++++++--
 net/core/ethtool.c        |  154 +++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 314 insertions(+), 19 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 1908929..806e716 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -251,6 +251,7 @@ enum ethtool_stringset {
 	ETH_SS_STATS,
 	ETH_SS_PRIV_FLAGS,
 	ETH_SS_NTUPLE_FILTERS,
+	ETH_SS_FEATURES,
 };
 
 /* for passing string sets for data tagging */
@@ -523,6 +524,87 @@ struct ethtool_flash {
 	char	data[ETHTOOL_FLASH_MAX_FILENAME];
 };
 
+/* for returning and changing feature sets */
+
+/**
+ * struct ethtool_get_features_block - block with state of 32 features
+ * @available: mask of changeable features
+ * @requested: mask of features requested to be enabled if possible
+ * @active: mask of currently enabled features
+ * @never_changed: mask of features not changeable for any device
+ */
+struct ethtool_get_features_block {
+	__u32	available;
+	__u32	requested;
+	__u32	active;
+	__u32	never_changed;
+};
+
+/**
+ * struct ethtool_gfeatures - command to get state of device's features
+ * @cmd: command number = %ETHTOOL_GFEATURES
+ * @size: in: number of elements in the features[] array;
+ *       out: number of elements in features[] needed to hold all features
+ * @features: state of features
+ */
+struct ethtool_gfeatures {
+	__u32	cmd;
+	__u32	size;
+	struct ethtool_get_features_block features[0];
+};
+
+/**
+ * struct ethtool_set_features_block - block with request for 32 features
+ * @valid: mask of features to be changed
+ * @requested: values of features to be changed
+ */
+struct ethtool_set_features_block {
+	__u32	valid;
+	__u32	requested;
+};
+
+/**
+ * struct ethtool_sfeatures - command to request change in device's features
+ * @cmd: command number = %ETHTOOL_SFEATURES
+ * @size: array size of the features[] array
+ * @features: feature change masks
+ */
+struct ethtool_sfeatures {
+	__u32	cmd;
+	__u32	size;
+	struct ethtool_set_features_block features[0];
+};
+
+/*
+ * %ETHTOOL_SFEATURES changes features present in features[].valid to the
+ * values of corresponding bits in features[].requested. Bits in .requested
+ * not set in .valid or not changeable are ignored.
+ *
+ * Returns %EINVAL when .valid contains undefined or never-changeable bits
+ * or size is not equal to required number of features words (32-bit blocks).
+ * Returns >= 0 if request was completed; bits set in the value mean:
+ *   %ETHTOOL_F_UNSUPPORTED - there were bits set in .valid that are not
+ *	changeable (not present in %ETHTOOL_GFEATURES' features[].available)
+ *	those bits were ignored.
+ *   %ETHTOOL_F_WISH - some or all changes requested were recorded but the
+ *      resulting state of bits masked by .valid is not equal to .requested.
+ *      Probably there are other device-specific constraints on some features
+ *      in the set. When %ETHTOOL_F_UNSUPPORTED is set, .valid is considered
+ *      here as though ignored bits were cleared.
+ *
+ * Meaning of bits in the masks are obtained by %ETHTOOL_GSSET_INFO (number of
+ * bits in the arrays - always multiple of 32) and %ETHTOOL_GSTRINGS commands
+ * for ETH_SS_FEATURES string set. First entry in the table corresponds to least
+ * significant bit in features[0] fields. Empty strings mark undefined features.
+ */
+enum ethtool_sfeatures_retval_bits {
+	ETHTOOL_F_UNSUPPORTED__BIT,
+	ETHTOOL_F_WISH__BIT,
+};
+
+#define ETHTOOL_F_UNSUPPORTED   (1 << ETHTOOL_F_UNSUPPORTED__BIT)
+#define ETHTOOL_F_WISH          (1 << ETHTOOL_F_WISH__BIT)
+
 #ifdef __KERNEL__
 
 #include <linux/rculist.h>
@@ -744,6 +826,9 @@ struct ethtool_ops {
 #define ETHTOOL_GRXFHINDIR	0x00000038 /* Get RX flow hash indir'n table */
 #define ETHTOOL_SRXFHINDIR	0x00000039 /* Set RX flow hash indir'n table */
 
+#define ETHTOOL_GFEATURES	0x0000003a /* Get device offload settings */
+#define ETHTOOL_SFEATURES	0x0000003b /* Change device offload settings */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
 #define SPARC_ETH_SSET		ETHTOOL_SSET
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c7d7074..4a3e554 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -783,6 +783,17 @@ struct netdev_tc_txq {
  *	Set hardware filter for RFS.  rxq_index is the target queue index;
  *	flow_id is a flow ID to be passed to rps_may_expire_flow() later.
  *	Return the filter ID on success, or a negative error code.
+ *
+ * u32 (*ndo_fix_features)(struct net_device *dev, u32 features);
+ *	Adjusts the requested feature flags according to device-specific
+ *	constraints, and returns the resulting flags. Must not modify
+ *	the device state.
+ *
+ * int (*ndo_set_features)(struct net_device *dev, u32 features);
+ *	Called to update device configuration to new features. Passed
+ *	feature set might be less than what was returned by ndo_fix_features().
+ *	Must return >0 or -errno if it changed dev->features itself.
+ *
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -862,6 +873,10 @@ struct net_device_ops {
 						     u16 rxq_index,
 						     u32 flow_id);
 #endif
+	u32			(*ndo_fix_features)(struct net_device *dev,
+						    u32 features);
+	int			(*ndo_set_features)(struct net_device *dev,
+						    u32 features);
 };
 
 /*
@@ -913,12 +928,18 @@ struct net_device {
 	struct list_head	napi_list;
 	struct list_head	unreg_list;
 
-	/* Net device features */
+	/* currently active device features */
 	u32			features;
-
+	/* user-changeable features */
+	u32			hw_features;
+	/* user-requested features */
+	u32			wanted_features;
 	/* VLAN feature mask */
 	u32			vlan_features;
 
+	/* Net device feature bits; if you change something,
+	 * also update netdev_features_strings[] in ethtool.c */
+
 #define NETIF_F_SG		1	/* Scatter/gather IO. */
 #define NETIF_F_IP_CSUM		2	/* Can checksum TCP/UDP over IPv4. */
 #define NETIF_F_NO_CSUM		4	/* Does not require checksum. F.e. loopack. */
@@ -954,6 +975,12 @@ struct net_device {
 #define NETIF_F_TSO6		(SKB_GSO_TCPV6 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_FSO		(SKB_GSO_FCOE << NETIF_F_GSO_SHIFT)
 
+	/* Features valid for ethtool to change */
+	/* = all defined minus driver/device-class-related */
+#define NETIF_F_NEVER_CHANGE	(NETIF_F_HIGHDMA | NETIF_F_VLAN_CHALLENGED | \
+				  NETIF_F_LLTX | NETIF_F_NETNS_LOCAL)
+#define NETIF_F_ETHTOOL_BITS	(0x1f3fffff & ~NETIF_F_NEVER_CHANGE)
+
 	/* List of features with software fallbacks. */
 #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
 				 NETIF_F_TSO6 | NETIF_F_UFO)
@@ -964,6 +991,12 @@ struct net_device {
 #define NETIF_F_V6_CSUM		(NETIF_F_GEN_CSUM | NETIF_F_IPV6_CSUM)
 #define NETIF_F_ALL_CSUM	(NETIF_F_V4_CSUM | NETIF_F_V6_CSUM)
 
+#define NETIF_F_ALL_TSO 	(NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
+
+#define NETIF_F_ALL_TX_OFFLOADS	(NETIF_F_ALL_CSUM | NETIF_F_SG | \
+				 NETIF_F_FRAGLIST | NETIF_F_ALL_TSO | \
+				 NETIF_F_SCTP_CSUM | NETIF_F_FCOE_CRC)
+
 	/*
 	 * If one device supports one of these features, then enable them
 	 * for all in netdev_increment_features.
@@ -972,6 +1005,9 @@ struct net_device {
 				 NETIF_F_SG | NETIF_F_HIGHDMA |		\
 				 NETIF_F_FRAGLIST)
 
+	/* changeable features with no special hardware requirements */
+#define NETIF_F_SOFT_FEATURES	(NETIF_F_GSO | NETIF_F_GRO)
+
 	/* Interface index. Unique device identifier	*/
 	int			ifindex;
 	int			iflink;
@@ -2405,8 +2441,13 @@ extern char *netdev_drivername(const struct net_device *dev, char *buffer, int l
 
 extern void linkwatch_run_queue(void);
 
+static inline u32 netdev_get_wanted_features(struct net_device *dev)
+{
+	return (dev->features & ~dev->hw_features) | dev->wanted_features;
+}
 u32 netdev_increment_features(u32 all, u32 one, u32 mask);
 u32 netdev_fix_features(struct net_device *dev, u32 features);
+void netdev_update_features(struct net_device *dev);
 
 void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 					struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 9109e26..92c690e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5246,6 +5246,12 @@ u32 netdev_fix_features(struct net_device *dev, u32 features)
 		features &= ~NETIF_F_TSO;
 	}
 
+	/* Software GSO depends on SG. */
+	if ((features & NETIF_F_GSO) && !(features & NETIF_F_SG)) {
+		netdev_info(dev, "Dropping NETIF_F_GSO since no SG feature.\n");
+		features &= ~NETIF_F_GSO;
+	}
+
 	/* UFO needs SG and checksumming */
 	if (features & NETIF_F_UFO) {
 		/* maybe split UFO into V4 and V6? */
@@ -5268,6 +5274,37 @@ u32 netdev_fix_features(struct net_device *dev, u32 features)
 }
 EXPORT_SYMBOL(netdev_fix_features);
 
+void netdev_update_features(struct net_device *dev)
+{
+	u32 features;
+	int err = 0;
+
+	features = netdev_get_wanted_features(dev);
+
+	if (dev->netdev_ops->ndo_fix_features)
+		features = dev->netdev_ops->ndo_fix_features(dev, features);
+
+	/* driver might be less strict about feature dependencies */
+	features = netdev_fix_features(dev, features);
+
+	if (dev->features == features)
+		return;
+
+	netdev_info(dev, "Features changed: 0x%08x -> 0x%08x\n",
+		dev->features, features);
+
+	if (dev->netdev_ops->ndo_set_features)
+		err = dev->netdev_ops->ndo_set_features(dev, features);
+
+	if (!err)
+		dev->features = features;
+	else if (err < 0)
+		netdev_err(dev,
+			"set_features() failed (%d); wanted 0x%08x, left 0x%08x\n",
+			err, features, dev->features);
+}
+EXPORT_SYMBOL(netdev_update_features);
+
 /**
  *	netif_stacked_transfer_operstate -	transfer operstate
  *	@rootdev: the root or lower level device to transfer state from
@@ -5402,11 +5439,13 @@ int register_netdevice(struct net_device *dev)
 	if (dev->iflink == -1)
 		dev->iflink = dev->ifindex;
 
-	dev->features = netdev_fix_features(dev, dev->features);
-
-	/* Enable software GSO if SG is supported. */
-	if (dev->features & NETIF_F_SG)
-		dev->features |= NETIF_F_GSO;
+	/* Transfer changeable features to wanted_features and enable
+	 * software offloads (GSO and GRO).
+	 */
+	dev->hw_features |= NETIF_F_SOFT_FEATURES;
+	dev->wanted_features = (dev->features & dev->hw_features)
+		| NETIF_F_SOFT_FEATURES;
+	netdev_update_features(dev);
 
 	/* Enable GRO and NETIF_F_HIGHDMA for vlans by default,
 	 * vlan_dev_init() will do the dev->features check, so these features
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 5984ee0..2f1b448 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -55,6 +55,7 @@ int ethtool_op_set_tx_csum(struct net_device *dev, u32 data)
 
 	return 0;
 }
+EXPORT_SYMBOL(ethtool_op_set_tx_csum);
 
 int ethtool_op_set_tx_hw_csum(struct net_device *dev, u32 data)
 {
@@ -171,6 +172,136 @@ EXPORT_SYMBOL(ethtool_ntuple_flush);
 
 /* Handlers for each ethtool command */
 
+#define ETHTOOL_DEV_FEATURE_WORDS	1
+
+static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
+{
+	struct ethtool_gfeatures cmd = {
+		.cmd = ETHTOOL_GFEATURES,
+		.size = ETHTOOL_DEV_FEATURE_WORDS,
+	};
+	struct ethtool_get_features_block features[ETHTOOL_DEV_FEATURE_WORDS] = {
+		{
+			.available = dev->hw_features,
+			.requested = dev->wanted_features,
+			.active = dev->features,
+			.never_changed = NETIF_F_NEVER_CHANGE,
+		},
+	};
+	u32 __user *sizeaddr;
+	u32 copy_size;
+
+	sizeaddr = useraddr + offsetof(struct ethtool_gfeatures, size);
+	if (get_user(copy_size, sizeaddr))
+		return -EFAULT;
+
+	if (copy_size > ETHTOOL_DEV_FEATURE_WORDS)
+		copy_size = ETHTOOL_DEV_FEATURE_WORDS;
+
+	if (copy_to_user(useraddr, &cmd, sizeof(cmd)))
+		return -EFAULT;
+	useraddr += sizeof(cmd);
+	if (copy_to_user(useraddr, features, copy_size * sizeof(*features)))
+		return -EFAULT;
+	return 0;
+}
+
+static int ethtool_set_features(struct net_device *dev, void __user *useraddr)
+{
+	struct ethtool_sfeatures cmd;
+	struct ethtool_set_features_block features[ETHTOOL_DEV_FEATURE_WORDS];
+	int ret = 0;
+
+	if (copy_from_user(&cmd, useraddr, sizeof(cmd)))
+		return -EFAULT;
+	useraddr += sizeof(cmd);
+
+	if (cmd.size != ETHTOOL_DEV_FEATURE_WORDS)
+		return -EINVAL;
+
+	if (copy_from_user(features, useraddr, sizeof(features)))
+		return -EFAULT;
+
+	if (features[0].valid & ~NETIF_F_ETHTOOL_BITS)
+		return -EINVAL;
+
+	if (features[0].valid & ~dev->hw_features) {
+		features[0].valid &= dev->hw_features;
+		ret |= ETHTOOL_F_UNSUPPORTED;
+	}
+
+	dev->wanted_features &= ~features[0].valid;
+	dev->wanted_features |= features[0].valid & features[0].requested;
+	netdev_update_features(dev);
+
+	if ((dev->wanted_features ^ dev->features) & features[0].valid)
+		ret |= ETHTOOL_F_WISH;
+
+	return ret;
+}
+
+static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GSTRING_LEN] = {
+	/* NETIF_F_SG */              "tx-scatter-gather",
+	/* NETIF_F_IP_CSUM */         "tx-checksum-ipv4",
+	/* NETIF_F_NO_CSUM */         "tx-checksum-unneeded",
+	/* NETIF_F_HW_CSUM */         "tx-checksum-ip-generic",
+	/* NETIF_F_IPV6_CSUM */       "tx-checksum-ipv6",
+	/* NETIF_F_HIGHDMA */         "highdma",
+	/* NETIF_F_FRAGLIST */        "tx-scatter-gather-fraglist",
+	/* NETIF_F_HW_VLAN_TX */      "tx-vlan-hw-insert",
+
+	/* NETIF_F_HW_VLAN_RX */      "rx-vlan-hw-parse",
+	/* NETIF_F_HW_VLAN_FILTER */  "rx-vlan-filter",
+	/* NETIF_F_VLAN_CHALLENGED */ "vlan-challenged",
+	/* NETIF_F_GSO */             "tx-generic-segmentation",
+	/* NETIF_F_LLTX */            "tx-lockless",
+	/* NETIF_F_NETNS_LOCAL */     "netns-local",
+	/* NETIF_F_GRO */             "rx-gro",
+	/* NETIF_F_LRO */             "rx-lro",
+
+	/* NETIF_F_TSO */             "tx-tcp-segmentation",
+	/* NETIF_F_UFO */             "tx-udp-fragmentation",
+	/* NETIF_F_GSO_ROBUST */      "tx-gso-robust",
+	/* NETIF_F_TSO_ECN */         "tx-tcp-ecn-segmentation",
+	/* NETIF_F_TSO6 */            "tx-tcp6-segmentation",
+	/* NETIF_F_FSO */             "tx-fcoe-segmentation",
+	"",
+	"",
+
+	/* NETIF_F_FCOE_CRC */        "tx-checksum-fcoe-crc",
+	/* NETIF_F_SCTP_CSUM */       "tx-checksum-sctp",
+	/* NETIF_F_FCOE_MTU */        "fcoe-mtu",
+	/* NETIF_F_NTUPLE */          "rx-ntuple-filter",
+	/* NETIF_F_RXHASH */          "rx-hashing",
+	"",
+	"",
+	"",
+};
+
+static int __ethtool_get_sset_count(struct net_device *dev, int sset)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+
+	if (sset == ETH_SS_FEATURES)
+		return ARRAY_SIZE(netdev_features_strings);
+	else if (ops && ops->get_sset_count)
+		return ops->get_sset_count(dev, sset);
+	else
+		return -EINVAL;
+}
+
+static void __ethtool_get_strings(struct net_device *dev,
+	u32 stringset, u8 *data)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+
+	if (stringset == ETH_SS_FEATURES)
+		memcpy(data, netdev_features_strings,
+			sizeof(netdev_features_strings));
+	else if (ops && ops->get_strings)
+		ops->get_strings(dev, stringset, data);
+}
+
 static int ethtool_get_settings(struct net_device *dev, void __user *useraddr)
 {
 	struct ethtool_cmd cmd = { .cmd = ETHTOOL_GSET };
@@ -251,14 +382,10 @@ static noinline_for_stack int ethtool_get_sset_info(struct net_device *dev,
 						    void __user *useraddr)
 {
 	struct ethtool_sset_info info;
-	const struct ethtool_ops *ops = dev->ethtool_ops;
 	u64 sset_mask;
 	int i, idx = 0, n_bits = 0, ret, rc;
 	u32 *info_buf = NULL;
 
-	if (!ops->get_sset_count)
-		return -EOPNOTSUPP;
-
 	if (copy_from_user(&info, useraddr, sizeof(info)))
 		return -EFAULT;
 
@@ -285,7 +412,7 @@ static noinline_for_stack int ethtool_get_sset_info(struct net_device *dev,
 		if (!(sset_mask & (1ULL << i)))
 			continue;
 
-		rc = ops->get_sset_count(dev, i);
+		rc = __ethtool_get_sset_count(dev, i);
 		if (rc >= 0) {
 			info.sset_mask |= (1ULL << i);
 			info_buf[idx++] = rc;
@@ -1287,17 +1414,13 @@ static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
 static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
 {
 	struct ethtool_gstrings gstrings;
-	const struct ethtool_ops *ops = dev->ethtool_ops;
 	u8 *data;
 	int ret;
 
-	if (!ops->get_strings || !ops->get_sset_count)
-		return -EOPNOTSUPP;
-
 	if (copy_from_user(&gstrings, useraddr, sizeof(gstrings)))
 		return -EFAULT;
 
-	ret = ops->get_sset_count(dev, gstrings.string_set);
+	ret = __ethtool_get_sset_count(dev, gstrings.string_set);
 	if (ret < 0)
 		return ret;
 
@@ -1307,7 +1430,7 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
 	if (!data)
 		return -ENOMEM;
 
-	ops->get_strings(dev, gstrings.string_set, data);
+	__ethtool_get_strings(dev, gstrings.string_set, data);
 
 	ret = -EFAULT;
 	if (copy_to_user(useraddr, &gstrings, sizeof(gstrings)))
@@ -1317,7 +1440,7 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
 		goto out;
 	ret = 0;
 
- out:
+out:
 	kfree(data);
 	return ret;
 }
@@ -1500,6 +1623,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GRXCLSRLCNT:
 	case ETHTOOL_GRXCLSRULE:
 	case ETHTOOL_GRXCLSRLALL:
+	case ETHTOOL_GFEATURES:
 		break;
 	default:
 		if (!capable(CAP_NET_ADMIN))
@@ -1693,6 +1817,12 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_SRXFHINDIR:
 		rc = ethtool_set_rxfh_indir(dev, useraddr);
 		break;
+	case ETHTOOL_GFEATURES:
+		rc = ethtool_get_features(dev, useraddr);
+		break;
+	case ETHTOOL_SFEATURES:
+		rc = ethtool_set_features(dev, useraddr);
+		break;
 	default:
 		rc = -EOPNOTSUPP;
 	}
-- 
1.7.2.3



* [PATCH v4 2/5] net: ethtool: use ndo_fix_features for offload setting
From: Michał Mirosław @ 2011-02-03 14:21 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings
In-Reply-To: <cover.1296741561.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>

---
 net/core/ethtool.c |  262 ++++++++++++++++++++++++----------------------------
 1 files changed, 119 insertions(+), 143 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 2f1b448..555accf 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -240,6 +240,96 @@ static int ethtool_set_features(struct net_device *dev, void __user *useraddr)
 	return ret;
 }
 
+static u32 ethtool_get_feature_mask(u32 eth_cmd)
+{
+	/* feature masks of legacy discrete ethtool ops */
+
+	switch (eth_cmd) {
+	case ETHTOOL_GTXCSUM:
+	case ETHTOOL_STXCSUM:
+		return NETIF_F_ALL_CSUM | NETIF_F_SCTP_CSUM;
+	case ETHTOOL_GSG:
+	case ETHTOOL_SSG:
+		return NETIF_F_SG;
+	case ETHTOOL_GTSO:
+	case ETHTOOL_STSO:
+		return NETIF_F_ALL_TSO;
+	case ETHTOOL_GUFO:
+	case ETHTOOL_SUFO:
+		return NETIF_F_UFO;
+	case ETHTOOL_GGSO:
+	case ETHTOOL_SGSO:
+		return NETIF_F_GSO;
+	case ETHTOOL_GGRO:
+	case ETHTOOL_SGRO:
+		return NETIF_F_GRO;
+	default:
+		BUG();
+	}
+}
+
+static int ethtool_get_one_feature(struct net_device *dev, char __user *useraddr,
+	u32 ethcmd)
+{
+	struct ethtool_value edata = {
+		.cmd = ethcmd,
+		.data = !!(dev->features & ethtool_get_feature_mask(ethcmd)),
+	};
+
+	if (copy_to_user(useraddr, &edata, sizeof(edata)))
+		return -EFAULT;
+	return 0;
+}
+
+static int __ethtool_set_tx_csum(struct net_device *dev, u32 data);
+static int __ethtool_set_sg(struct net_device *dev, u32 data);
+static int __ethtool_set_tso(struct net_device *dev, u32 data);
+static int __ethtool_set_ufo(struct net_device *dev, u32 data);
+
+static int ethtool_set_one_feature(struct net_device *dev,
+	void __user *useraddr, u32 ethcmd)
+{
+	struct ethtool_value edata;
+	u32 mask;
+
+	if (copy_from_user(&edata, useraddr, sizeof(edata)))
+		return -EFAULT;
+
+	mask = ethtool_get_feature_mask(ethcmd);
+	mask &= dev->hw_features;
+	if (mask) {
+		if (edata.data)
+			dev->wanted_features |= mask;
+		else
+			dev->wanted_features &= ~mask;
+
+		netdev_update_features(dev);
+		return 0;
+	}
+
+	/* Driver is not converted to ndo_fix_features or does not
+	 * support changing this offload. In the latter case it won't
+	 * have corresponding ethtool_ops field set.
+	 *
+	 * Following part is to be removed after all drivers advertise
+	 * their changeable features in netdev->hw_features and stop
+	 * using discrete offload setting ops.
+	 */
+
+	switch (ethcmd) {
+	case ETHTOOL_STXCSUM:
+		return __ethtool_set_tx_csum(dev, edata.data);
+	case ETHTOOL_SSG:
+		return __ethtool_set_sg(dev, edata.data);
+	case ETHTOOL_STSO:
+		return __ethtool_set_tso(dev, edata.data);
+	case ETHTOOL_SUFO:
+		return __ethtool_set_ufo(dev, edata.data);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GSTRING_LEN] = {
 	/* NETIF_F_SG */              "tx-scatter-gather",
 	/* NETIF_F_IP_CSUM */         "tx-checksum-ipv4",
@@ -1218,6 +1308,9 @@ static int __ethtool_set_sg(struct net_device *dev, u32 data)
 {
 	int err;
 
+	if (data && !(dev->features & NETIF_F_ALL_CSUM))
+		return -EINVAL;
+
 	if (!data && dev->ethtool_ops->set_tso) {
 		err = dev->ethtool_ops->set_tso(dev, 0);
 		if (err)
@@ -1232,26 +1325,21 @@ static int __ethtool_set_sg(struct net_device *dev, u32 data)
 	return dev->ethtool_ops->set_sg(dev, data);
 }
 
-static int ethtool_set_tx_csum(struct net_device *dev, char __user *useraddr)
+static int __ethtool_set_tx_csum(struct net_device *dev, u32 data)
 {
-	struct ethtool_value edata;
 	int err;
 
 	if (!dev->ethtool_ops->set_tx_csum)
 		return -EOPNOTSUPP;
 
-	if (copy_from_user(&edata, useraddr, sizeof(edata)))
-		return -EFAULT;
-
-	if (!edata.data && dev->ethtool_ops->set_sg) {
+	if (!data && dev->ethtool_ops->set_sg) {
 		err = __ethtool_set_sg(dev, 0);
 		if (err)
 			return err;
 	}
 
-	return dev->ethtool_ops->set_tx_csum(dev, edata.data);
+	return dev->ethtool_ops->set_tx_csum(dev, data);
 }
-EXPORT_SYMBOL(ethtool_op_set_tx_csum);
 
 static int ethtool_set_rx_csum(struct net_device *dev, char __user *useraddr)
 {
@@ -1269,108 +1357,28 @@ static int ethtool_set_rx_csum(struct net_device *dev, char __user *useraddr)
 	return dev->ethtool_ops->set_rx_csum(dev, edata.data);
 }
 
-static int ethtool_set_sg(struct net_device *dev, char __user *useraddr)
+static int __ethtool_set_tso(struct net_device *dev, u32 data)
 {
-	struct ethtool_value edata;
-
-	if (!dev->ethtool_ops->set_sg)
-		return -EOPNOTSUPP;
-
-	if (copy_from_user(&edata, useraddr, sizeof(edata)))
-		return -EFAULT;
-
-	if (edata.data &&
-	    !(dev->features & NETIF_F_ALL_CSUM))
-		return -EINVAL;
-
-	return __ethtool_set_sg(dev, edata.data);
-}
-
-static int ethtool_set_tso(struct net_device *dev, char __user *useraddr)
-{
-	struct ethtool_value edata;
-
 	if (!dev->ethtool_ops->set_tso)
 		return -EOPNOTSUPP;
 
-	if (copy_from_user(&edata, useraddr, sizeof(edata)))
-		return -EFAULT;
-
-	if (edata.data && !(dev->features & NETIF_F_SG))
+	if (data && !(dev->features & NETIF_F_SG))
 		return -EINVAL;
 
-	return dev->ethtool_ops->set_tso(dev, edata.data);
+	return dev->ethtool_ops->set_tso(dev, data);
 }
 
-static int ethtool_set_ufo(struct net_device *dev, char __user *useraddr)
+static int __ethtool_set_ufo(struct net_device *dev, u32 data)
 {
-	struct ethtool_value edata;
-
 	if (!dev->ethtool_ops->set_ufo)
 		return -EOPNOTSUPP;
-	if (copy_from_user(&edata, useraddr, sizeof(edata)))
-		return -EFAULT;
-	if (edata.data && !(dev->features & NETIF_F_SG))
+	if (data && !(dev->features & NETIF_F_SG))
 		return -EINVAL;
-	if (edata.data && !((dev->features & NETIF_F_GEN_CSUM) ||
+	if (data && !((dev->features & NETIF_F_GEN_CSUM) ||
 		(dev->features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
 			== (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM)))
 		return -EINVAL;
-	return dev->ethtool_ops->set_ufo(dev, edata.data);
-}
-
-static int ethtool_get_gso(struct net_device *dev, char __user *useraddr)
-{
-	struct ethtool_value edata = { ETHTOOL_GGSO };
-
-	edata.data = dev->features & NETIF_F_GSO;
-	if (copy_to_user(useraddr, &edata, sizeof(edata)))
-		return -EFAULT;
-	return 0;
-}
-
-static int ethtool_set_gso(struct net_device *dev, char __user *useraddr)
-{
-	struct ethtool_value edata;
-
-	if (copy_from_user(&edata, useraddr, sizeof(edata)))
-		return -EFAULT;
-	if (edata.data)
-		dev->features |= NETIF_F_GSO;
-	else
-		dev->features &= ~NETIF_F_GSO;
-	return 0;
-}
-
-static int ethtool_get_gro(struct net_device *dev, char __user *useraddr)
-{
-	struct ethtool_value edata = { ETHTOOL_GGRO };
-
-	edata.data = dev->features & NETIF_F_GRO;
-	if (copy_to_user(useraddr, &edata, sizeof(edata)))
-		return -EFAULT;
-	return 0;
-}
-
-static int ethtool_set_gro(struct net_device *dev, char __user *useraddr)
-{
-	struct ethtool_value edata;
-
-	if (copy_from_user(&edata, useraddr, sizeof(edata)))
-		return -EFAULT;
-
-	if (edata.data) {
-		u32 rxcsum = dev->ethtool_ops->get_rx_csum ?
-				dev->ethtool_ops->get_rx_csum(dev) :
-				ethtool_op_get_rx_csum(dev);
-
-		if (!rxcsum)
-			return -EINVAL;
-		dev->features |= NETIF_F_GRO;
-	} else
-		dev->features &= ~NETIF_F_GRO;
-
-	return 0;
+	return dev->ethtool_ops->set_ufo(dev, data);
 }
 
 static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
@@ -1703,33 +1711,6 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_SRXCSUM:
 		rc = ethtool_set_rx_csum(dev, useraddr);
 		break;
-	case ETHTOOL_GTXCSUM:
-		rc = ethtool_get_value(dev, useraddr, ethcmd,
-				       (dev->ethtool_ops->get_tx_csum ?
-					dev->ethtool_ops->get_tx_csum :
-					ethtool_op_get_tx_csum));
-		break;
-	case ETHTOOL_STXCSUM:
-		rc = ethtool_set_tx_csum(dev, useraddr);
-		break;
-	case ETHTOOL_GSG:
-		rc = ethtool_get_value(dev, useraddr, ethcmd,
-				       (dev->ethtool_ops->get_sg ?
-					dev->ethtool_ops->get_sg :
-					ethtool_op_get_sg));
-		break;
-	case ETHTOOL_SSG:
-		rc = ethtool_set_sg(dev, useraddr);
-		break;
-	case ETHTOOL_GTSO:
-		rc = ethtool_get_value(dev, useraddr, ethcmd,
-				       (dev->ethtool_ops->get_tso ?
-					dev->ethtool_ops->get_tso :
-					ethtool_op_get_tso));
-		break;
-	case ETHTOOL_STSO:
-		rc = ethtool_set_tso(dev, useraddr);
-		break;
 	case ETHTOOL_TEST:
 		rc = ethtool_self_test(dev, useraddr);
 		break;
@@ -1745,21 +1726,6 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GPERMADDR:
 		rc = ethtool_get_perm_addr(dev, useraddr);
 		break;
-	case ETHTOOL_GUFO:
-		rc = ethtool_get_value(dev, useraddr, ethcmd,
-				       (dev->ethtool_ops->get_ufo ?
-					dev->ethtool_ops->get_ufo :
-					ethtool_op_get_ufo));
-		break;
-	case ETHTOOL_SUFO:
-		rc = ethtool_set_ufo(dev, useraddr);
-		break;
-	case ETHTOOL_GGSO:
-		rc = ethtool_get_gso(dev, useraddr);
-		break;
-	case ETHTOOL_SGSO:
-		rc = ethtool_set_gso(dev, useraddr);
-		break;
 	case ETHTOOL_GFLAGS:
 		rc = ethtool_get_value(dev, useraddr, ethcmd,
 				       (dev->ethtool_ops->get_flags ?
@@ -1790,12 +1756,6 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_SRXCLSRLINS:
 		rc = ethtool_set_rxnfc(dev, ethcmd, useraddr);
 		break;
-	case ETHTOOL_GGRO:
-		rc = ethtool_get_gro(dev, useraddr);
-		break;
-	case ETHTOOL_SGRO:
-		rc = ethtool_set_gro(dev, useraddr);
-		break;
 	case ETHTOOL_FLASHDEV:
 		rc = ethtool_flash_device(dev, useraddr);
 		break;
@@ -1823,6 +1783,22 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_SFEATURES:
 		rc = ethtool_set_features(dev, useraddr);
 		break;
+	case ETHTOOL_GTXCSUM:
+	case ETHTOOL_GSG:
+	case ETHTOOL_GTSO:
+	case ETHTOOL_GUFO:
+	case ETHTOOL_GGSO:
+	case ETHTOOL_GGRO:
+		rc = ethtool_get_one_feature(dev, useraddr, ethcmd);
+		break;
+	case ETHTOOL_STXCSUM:
+	case ETHTOOL_SSG:
+	case ETHTOOL_STSO:
+	case ETHTOOL_SUFO:
+	case ETHTOOL_SGSO:
+	case ETHTOOL_SGRO:
+		rc = ethtool_set_one_feature(dev, useraddr, ethcmd);
+		break;
 	default:
 		rc = -EOPNOTSUPP;
 	}
-- 
1.7.2.3



* [PATCH v4 4/5] net: introduce NETIF_F_RXCSUM
From: Michał Mirosław @ 2011-02-03 14:21 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings
In-Reply-To: <cover.1296741561.git.mirq-linux@rere.qmqm.pl>

Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
offload. Integrate it with ndo_fix_features.

ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>

---
 include/linux/ethtool.h   |    1 -
 include/linux/netdevice.h |    5 ++++-
 net/core/ethtool.c        |   36 ++++++++++++------------------------
 3 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 806e716..54d776c 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -625,7 +625,6 @@ struct net_device;
 
 /* Some generic methods drivers may use in their ethtool_ops */
 u32 ethtool_op_get_link(struct net_device *dev);
-u32 ethtool_op_get_rx_csum(struct net_device *dev);
 u32 ethtool_op_get_tx_csum(struct net_device *dev);
 int ethtool_op_set_tx_csum(struct net_device *dev, u32 data);
 int ethtool_op_set_tx_hw_csum(struct net_device *dev, u32 data);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4a3e554..45080bb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -964,6 +964,7 @@ struct net_device {
 #define NETIF_F_FCOE_MTU	(1 << 26) /* Supports max FCoE MTU, 2158 bytes*/
 #define NETIF_F_NTUPLE		(1 << 27) /* N-tuple filters supported */
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
+#define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -979,7 +980,7 @@ struct net_device {
 	/* = all defined minus driver/device-class-related */
 #define NETIF_F_NEVER_CHANGE	(NETIF_F_HIGHDMA | NETIF_F_VLAN_CHALLENGED | \
 				  NETIF_F_LLTX | NETIF_F_NETNS_LOCAL)
-#define NETIF_F_ETHTOOL_BITS	(0x1f3fffff & ~NETIF_F_NEVER_CHANGE)
+#define NETIF_F_ETHTOOL_BITS	(0x3f3fffff & ~NETIF_F_NEVER_CHANGE)
 
 	/* List of features with software fallbacks. */
 #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
@@ -2501,6 +2502,8 @@ static inline int dev_ethtool_get_settings(struct net_device *dev,
 
 static inline u32 dev_ethtool_get_rx_csum(struct net_device *dev)
 {
+	if (dev->hw_features & NETIF_F_RXCSUM)
+		return !!(dev->features & NETIF_F_RXCSUM);
 	if (!dev->ethtool_ops || !dev->ethtool_ops->get_rx_csum)
 		return 0;
 	return dev->ethtool_ops->get_rx_csum(dev);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6e7c6f2..52e4272 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -34,12 +34,6 @@ u32 ethtool_op_get_link(struct net_device *dev)
 }
 EXPORT_SYMBOL(ethtool_op_get_link);
 
-u32 ethtool_op_get_rx_csum(struct net_device *dev)
-{
-	return (dev->features & NETIF_F_ALL_CSUM) != 0;
-}
-EXPORT_SYMBOL(ethtool_op_get_rx_csum);
-
 u32 ethtool_op_get_tx_csum(struct net_device *dev)
 {
 	return (dev->features & NETIF_F_ALL_CSUM) != 0;
@@ -276,6 +270,9 @@ static u32 ethtool_get_feature_mask(u32 eth_cmd)
 	case ETHTOOL_GTXCSUM:
 	case ETHTOOL_STXCSUM:
 		return NETIF_F_ALL_CSUM | NETIF_F_SCTP_CSUM;
+	case ETHTOOL_GRXCSUM:
+	case ETHTOOL_SRXCSUM:
+		return NETIF_F_RXCSUM;
 	case ETHTOOL_GSG:
 	case ETHTOOL_SSG:
 		return NETIF_F_SG;
@@ -310,6 +307,7 @@ static int ethtool_get_one_feature(struct net_device *dev, char __user *useraddr
 }
 
 static int __ethtool_set_tx_csum(struct net_device *dev, u32 data);
+static int __ethtool_set_rx_csum(struct net_device *dev, u32 data);
 static int __ethtool_set_sg(struct net_device *dev, u32 data);
 static int __ethtool_set_tso(struct net_device *dev, u32 data);
 static int __ethtool_set_ufo(struct net_device *dev, u32 data);
@@ -347,6 +345,8 @@ static int ethtool_set_one_feature(struct net_device *dev,
 	switch (ethcmd) {
 	case ETHTOOL_STXCSUM:
 		return __ethtool_set_tx_csum(dev, edata.data);
+	case ETHTOOL_SRXCSUM:
+		return __ethtool_set_rx_csum(dev, edata.data);
 	case ETHTOOL_SSG:
 		return __ethtool_set_sg(dev, edata.data);
 	case ETHTOOL_STSO:
@@ -391,7 +391,7 @@ static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GS
 	/* NETIF_F_FCOE_MTU */        "fcoe-mtu",
 	/* NETIF_F_NTUPLE */          "rx-ntuple-filter",
 	/* NETIF_F_RXHASH */          "rx-hashing",
-	"",
+	/* NETIF_F_RXCSUM */          "rx-checksum",
 	"",
 	"",
 };
@@ -1369,20 +1369,15 @@ static int __ethtool_set_tx_csum(struct net_device *dev, u32 data)
 	return dev->ethtool_ops->set_tx_csum(dev, data);
 }
 
-static int ethtool_set_rx_csum(struct net_device *dev, char __user *useraddr)
+static int __ethtool_set_rx_csum(struct net_device *dev, u32 data)
 {
-	struct ethtool_value edata;
-
 	if (!dev->ethtool_ops->set_rx_csum)
 		return -EOPNOTSUPP;
 
-	if (copy_from_user(&edata, useraddr, sizeof(edata)))
-		return -EFAULT;
-
-	if (!edata.data && dev->ethtool_ops->set_sg)
+	if (!data)
 		dev->features &= ~NETIF_F_GRO;
 
-	return dev->ethtool_ops->set_rx_csum(dev, edata.data);
+	return dev->ethtool_ops->set_rx_csum(dev, data);
 }
 
 static int __ethtool_set_tso(struct net_device *dev, u32 data)
@@ -1730,15 +1725,6 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_SPAUSEPARAM:
 		rc = ethtool_set_pauseparam(dev, useraddr);
 		break;
-	case ETHTOOL_GRXCSUM:
-		rc = ethtool_get_value(dev, useraddr, ethcmd,
-				       (dev->ethtool_ops->get_rx_csum ?
-					dev->ethtool_ops->get_rx_csum :
-					ethtool_op_get_rx_csum));
-		break;
-	case ETHTOOL_SRXCSUM:
-		rc = ethtool_set_rx_csum(dev, useraddr);
-		break;
 	case ETHTOOL_TEST:
 		rc = ethtool_self_test(dev, useraddr);
 		break;
@@ -1811,6 +1797,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 		rc = ethtool_set_features(dev, useraddr);
 		break;
 	case ETHTOOL_GTXCSUM:
+	case ETHTOOL_GRXCSUM:
 	case ETHTOOL_GSG:
 	case ETHTOOL_GTSO:
 	case ETHTOOL_GUFO:
@@ -1819,6 +1806,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 		rc = ethtool_get_one_feature(dev, useraddr, ethcmd);
 		break;
 	case ETHTOOL_STXCSUM:
+	case ETHTOOL_SRXCSUM:
 	case ETHTOOL_SSG:
 	case ETHTOOL_STSO:
 	case ETHTOOL_SUFO:
-- 
1.7.2.3



* [PATCH v4 0/5] net: Unified offload configuration
From: Michał Mirosław @ 2011-02-03 14:21 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings

Here's a v4 of the ethtool unification patch series.

What's in it?
 1:
	the patch - implement unified ethtool setting ops
 2..3:
	implement interoperation between old and new ethtool ops
 4:
	include RX checksum in features and plug it into new framework
 5:
	convert loopback device to new framework

What is it good for?
 - unifies driver behaviour wrt hardware offloads
 - removes a lot of boilerplate code from drivers
 - allows better fine-grained control over used offloads

I'm testing this on ARM Gemini arch now. Patch to ethtool userspace tool
will follow this series. I'm not fond of the GFEATURES output I implemented -
please throw some suggestions on it if you can.

Driver conversions stay the same as in v2. As with v3, I'll hold off
resending them until after the core code gets accepted.

Patches 2,4,5 are unchanged from v3.

Best Regards,
Michał Mirosław


v1: http://marc.info/?l=linux-netdev&m=129245188832643&w=3

Changes from v3:
 - fixed kernel-doc and other comments
 - added HIGHDMA to never-changeable features
 - changed GFEATURES .size interpretation
 - changed feature strings
 - change __ethtool_set_flags() to reject invalid changes

Changes from v2:
 - rebase to net-next after merging v2 leading patches
 - fix missing comma in feature name table
 - force NETIF_F_SOFT_FEATURES in hw_features for simpler code
   (fixes a bug that disallowed changing GSO and GRO state)

Changes from v1:
 - split structures for GFEATURES/SFEATURES
 - naming of feature bits using GSTRINGS ETH_SS_FEATURES
 - strict checking of bits used in SFEATURES call
 - more comments and kernel-doc
 - rebased to net-next after 2.6.37


---

Michał Mirosław (5):
  net: Introduce new feature setting ops
  net: ethtool: use ndo_fix_features for offload setting
  net: use ndo_fix_features for ethtool_ops->set_flags
  net: introduce NETIF_F_RXCSUM
  loopback: convert to hw_features

 drivers/net/loopback.c    |    9 +-
 include/linux/ethtool.h   |   86 ++++++++-
 include/linux/netdevice.h |   48 +++++-
 net/core/dev.c            |   49 ++++-
 net/core/ethtool.c        |  481 ++++++++++++++++++++++++++++-----------------
 5 files changed, 480 insertions(+), 193 deletions(-)

-- 
1.7.2.3



* [PATCH v4 3/5] net: use ndo_fix_features for ethtool_ops->set_flags
From: Michał Mirosław @ 2011-02-03 14:21 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings
In-Reply-To: <cover.1296741561.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 net/core/ethtool.c |   31 +++++++++++++++++++++++++++++--
 1 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 555accf..6e7c6f2 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -240,6 +240,34 @@ static int ethtool_set_features(struct net_device *dev, void __user *useraddr)
 	return ret;
 }
 
+static int __ethtool_set_flags(struct net_device *dev, u32 data)
+{
+	u32 changed;
+
+	if (data & ~flags_dup_features)
+		return -EINVAL;
+
+	/* legacy set_flags() op */
+	if (dev->ethtool_ops->set_flags) {
+		if (unlikely(dev->hw_features & flags_dup_features))
+			netdev_warn(dev,
+				"driver BUG: mixed hw_features and set_flags()\n");
+		return dev->ethtool_ops->set_flags(dev, data);
+	}
+
+	/* allow changing only bits set in hw_features */
+	changed = (data ^ dev->wanted_features) & flags_dup_features;
+	if (changed & ~dev->hw_features)
+		return -EOPNOTSUPP;
+
+	dev->wanted_features =
+		(dev->wanted_features & ~changed) | data;
+
+	netdev_update_features(dev);
+
+	return 0;
+}
+
 static u32 ethtool_get_feature_mask(u32 eth_cmd)
 {
 	/* feature masks of legacy discrete ethtool ops */
@@ -1733,8 +1761,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 					ethtool_op_get_flags));
 		break;
 	case ETHTOOL_SFLAGS:
-		rc = ethtool_set_value(dev, useraddr,
-				       dev->ethtool_ops->set_flags);
+		rc = ethtool_set_value(dev, useraddr, __ethtool_set_flags);
 		break;
 	case ETHTOOL_GPFLAGS:
 		rc = ethtool_get_value(dev, useraddr, ethcmd,
-- 
1.7.2.3
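
For readers following the new ETHTOOL_SFLAGS path, the bit arithmetic in __ethtool_set_flags() can be tried out in userspace. This is only a sketch: struct fake_dev, set_flags_model() and the FLAGS_DUP_FEATURES value are hypothetical stand-ins, and the legacy set_flags() branch and the netdev_update_features() call are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_device; only the two masks used here. */
struct fake_dev {
	uint32_t hw_features;      /* bits the driver allows toggling */
	uint32_t wanted_features;  /* bits the user has asked for */
};

/* Assumed value for illustration: the bits that ETHTOOL_SFLAGS may carry. */
#define FLAGS_DUP_FEATURES 0x0000000fu

static int set_flags_model(struct fake_dev *dev, uint32_t data)
{
	uint32_t changed;

	assert(dev != NULL);

	/* Reject bits that are not flags at all. */
	if (data & ~FLAGS_DUP_FEATURES)
		return -EINVAL;

	/* Only bits present in hw_features may actually change. */
	changed = (data ^ dev->wanted_features) & FLAGS_DUP_FEATURES;
	if (changed & ~dev->hw_features)
		return -EOPNOTSUPP;

	dev->wanted_features = (dev->wanted_features & ~changed) | data;
	return 0;
}
```

With hw_features = 0x3 and wanted_features = 0x5, requesting 0x6 flips only bits 0 and 1 and succeeds, while a request that would clear bit 2 is refused because that bit is not in hw_features.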


^ permalink raw reply related

* Re: [PATCH] tcp: Increase the initial congestion window to 10.
From: John Heffner @ 2011-02-03 14:17 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: David Miller, netdev, dccp, therbert
In-Reply-To: <47859e9273bb0c1b2c32fc1adfa450ee@localhost>

On Thu, Feb 3, 2011 at 9:00 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>
> On Wed, 02 Feb 2011 17:07:50 -0800 (PST), David Miller wrote:
>
>> +/* TCP initial congestion window */
>> +#define TCP_INIT_CWND                10
>> +
>
> Davem, there is _no_ research how this huge IW will behave in environments
> with a small BDP. Believe it or not, but there are networks out there with a
> BDP similar to ~1980. Why for heaven's sake should this work in these
> environments? There are only _two_ extensive analyses, one from Cherry and
> one from Ilpo. Both analyses focus on current mainstream BDP. I started
> to set up an ns-2 environment but due to lack of time I am a little bit
> behind schedule. Anyway, why not make this a tunable per-route knob so that
> it is easier to fix things, e.g. set IW back to 3.


There's already a per-route tunable, right (RTAX_INITCWND)?

  -John

^ permalink raw reply

* Re: [PATCH] tcp: Increase the initial congestion window to 10.
From: Hagen Paul Pfeifer @ 2011-02-03 14:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, dccp, therbert
In-Reply-To: <20110202.170750.229739784.davem@davemloft.net>


On Wed, 02 Feb 2011 17:07:50 -0800 (PST), David Miller wrote:

> +/* TCP initial congestion window */
> +#define TCP_INIT_CWND		10
> +

Davem, there is _no_ research how this huge IW will behave in environments
with a small BDP. Believe it or not, but there are networks out there with a
BDP similar to ~1980. Why for heaven's sake should this work in these
environments? There are only _two_ extensive analyses, one from Cherry and
one from Ilpo. Both analyses focus on current mainstream BDP. I started
to set up an ns-2 environment but due to lack of time I am a little bit
behind schedule. Anyway, why not make this a tunable per-route knob so that
it is easier to fix things, e.g. set IW back to 3.

HGN
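
To put a number on the small-BDP concern, the initial window can be compared with the bandwidth-delay product of a slow path. The following is a rough sketch with made-up figures (a 64 kbit/s link, 500 ms RTT, 1460-byte MSS); the function names are hypothetical and nothing here is measured data.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes for a link of `bps` bits per second
 * and a round-trip time of `rtt_ms` milliseconds. */
static uint64_t bdp_bytes(uint64_t bps, uint64_t rtt_ms)
{
	return bps * rtt_ms / 8000;  /* 8 bits per byte, 1000 ms per second */
}

/* Bytes sent in the first flight for an initial window of `segments`. */
static uint64_t initial_window_bytes(unsigned int segments, unsigned int mss)
{
	assert(mss > 0);
	return (uint64_t)segments * mss;
}
```

Under these assumed figures the BDP is 4000 bytes, IW=3 sends 4380 bytes and IW=10 sends 14600 bytes, so the larger window overshoots the pipe by more than a factor of three in the first round trip.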

^ permalink raw reply

* Re: [PATCH] NETFILTER module xt_hmark new target for HASH MARK
From: Pablo Neira Ayuso @ 2011-02-03 13:51 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans
In-Reply-To: <1296740050-6311-2-git-send-email-hans.schillstrom@ericsson.com>

On 03/02/11 14:34, Hans Schillstrom wrote:
> +/*
> + * Calc hash value, special care is taken on icmp and fragmented messages
> + * i.e. fragmented messages don't use ports.
> + */
> +static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
> +{
[...]
> +	ip_proto &= info->prmask;
> +	/* get a consistent hash (same value on both flow directions) */
> +	if (addr2 < addr1)
> +		swap(addr1, addr2);

This assumption does not hold when NAT is involved.

If you want consistent hashing with NAT, you'll have to use the original
and reply directions of the conntrack tuples, which makes this stateful.
That may be a problem because some people may want to use this without
enabling connection tracking.

Are you using this for (uplink) load balancing?

Could you also include one realistic example in the patch description on
how this is used?

If this is accepted, I think this has to be merged with the (already
overloaded) MARK target.
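
The NAT concern above can be illustrated in userspace. This is a minimal sketch, not the xt_hmark code: canonical_pair() and same_pair() are hypothetical helpers that mimic the swap in get_hash(), and the addresses are documentation examples. Any hash computed over the ordered pair inherits the behaviour shown.

```c
#include <assert.h>
#include <stdint.h>

struct addr_pair {
	uint32_t lo, hi;
};

/* Order the two addresses so that both flow directions give the same pair. */
static struct addr_pair canonical_pair(uint32_t a, uint32_t b)
{
	struct addr_pair p = { a, b };

	if (b < a) {
		p.lo = b;
		p.hi = a;
	}
	return p;
}

static int same_pair(struct addr_pair x, struct addr_pair y)
{
	return x.lo == y.lo && x.hi == y.hi;
}

/* Without NAT both directions match; with source NAT the reply carries the
 * public address instead of the client's, so the pairs differ. */
static void nat_demo(void)
{
	const uint32_t client = 0x0a000002; /* 10.0.0.2 */
	const uint32_t server = 0xc6336407; /* 198.51.100.7 */
	const uint32_t public_ip = 0xcb007101; /* 203.0.113.1 */

	assert(same_pair(canonical_pair(client, server),
			 canonical_pair(server, client)));
	assert(!same_pair(canonical_pair(client, server),
			  canonical_pair(server, public_ip)));
}
```

The second assertion is the case that only conntrack can repair, since the mapping between the public and the client address is state held by the NAT.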

^ permalink raw reply

* Re: [PATCH v3 3/5] net: use ndo_fix_features for ethtool_ops->set_flags
From: Michał Mirosław @ 2011-02-03 13:45 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1296341043.3477.64.camel@localhost>

On Sun, Jan 30, 2011 at 08:44:03AM +1000, Ben Hutchings wrote:
> On Sat, 2011-01-29 at 19:39 +0100, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  net/core/ethtool.c |   22 ++++++++++++++++++++--
> >  1 files changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> > index 409aebb..17a689f4 100644
> > --- a/net/core/ethtool.c
> > +++ b/net/core/ethtool.c
> > @@ -240,6 +240,25 @@ static int ethtool_set_features(struct net_device *dev, void __user *useraddr)
> >  	return ret;
> >  }
> >  
> > +static int __ethtool_set_flags(struct net_device *dev, u32 data)
> > +{
> > +	if (data & ~flags_dup_features)
> > +		return -EINVAL;
> > +
> > +	if (!(dev->hw_features & flags_dup_features)) {
> > +		if (!dev->ethtool_ops->set_flags)
> > +			return -EOPNOTSUPP;
> > +		return dev->ethtool_ops->set_flags(dev, data);
> > +	}
> > +
> > +	dev->wanted_features =
> > +		(dev->wanted_features & ~flags_dup_features) | data;
> > +
> > +	netdev_update_features(dev);
> > +
> > +	return 0;
> [...]
> 
> I think it would be clearer to write this as:
[cut]

I changed it to check for more invalid states and to reject changing
of bits not in hw_features but present in flags_dup_features.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH v3 1/5] net: Introduce new feature setting ops
From: Michał Mirosław @ 2011-02-03 13:42 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1296339864.3477.52.camel@localhost>

On Sun, Jan 30, 2011 at 08:24:24AM +1000, Ben Hutchings wrote:
> On Sat, 2011-01-29 at 19:39 +0100, Michał Mirosław wrote:
> > This introduces a new framework to handle device features setting.
> > It consists of:
> >   - new fields in struct net_device:
> > 	+ hw_features - features that hw/driver supports toggling
> > 	+ wanted_features - features that user wants enabled, when possible
> >   - new netdev_ops:
> > 	+ feat = ndo_fix_features(dev, feat) - API checking constraints for
> > 		enabling features or their combinations
> > 	+ ndo_set_features(dev) - API updating hardware state to match
> > 		changed dev->features
> >   - new ethtool commands:
> > 	+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
> > 		and trigger device reconfiguration if resulting dev->features
> > 		changed
> > 	+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)
> > 
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  include/linux/ethtool.h   |   86 +++++++++++++++++++++++++
> >  include/linux/netdevice.h |   44 ++++++++++++-
> >  net/core/dev.c            |   47 ++++++++++++--
> >  net/core/ethtool.c        |  154 +++++++++++++++++++++++++++++++++++++++++----
> >  4 files changed, 312 insertions(+), 19 deletions(-)
> > 
> > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> > index 1908929..b832083 100644
> > --- a/include/linux/ethtool.h
> > +++ b/include/linux/ethtool.h
> > @@ -251,6 +251,7 @@ enum ethtool_stringset {
> >  	ETH_SS_STATS,
> >  	ETH_SS_PRIV_FLAGS,
> >  	ETH_SS_NTUPLE_FILTERS,
> > +	ETH_SS_FEATURES,
> >  };
> >  
> >  /* for passing string sets for data tagging */
> > @@ -523,6 +524,88 @@ struct ethtool_flash {
> >  	char	data[ETHTOOL_FLASH_MAX_FILENAME];
> >  };
> >  
> > +/* for returning and changing feature sets */
> > +
> > +/**
> > + * struct ethtool_get_features_block - block with state of 32 features
> > + * @avaliable: mask of changeable features
> Typo, should be @available.

Fixed.

> > + * @requested: mask of features requested to be enabled if possible
> > + * @active: mask of currently enabled features
> > + * @never_changed: mask of never-changeable features
> 
> I don't think the description of never_changed is clear enough.  It
> should be described as something like:
>     Mask of feature flags that are not changeable for any device.

Fixed. I shortened the text a bit.

> > + */
> > +struct ethtool_get_features_block {
> > +	__u32	available;	/* features togglable */
> > +	__u32	requested;	/* features requested to be enabled */
> > +	__u32	active;		/* features currently enabled */
> > +	__u32	never_changed;	/* never-changeable features */
> 
> Don't comment the fields inline as well as in the kernel-doc.

Removed this and other occurrences.

> > +};
> > +
> > +/**
> > + * struct ethtool_gfeatures - command to get state of device's features
> > + * @cmd: command number = %ETHTOOL_GFEATURES
> > + * @size: in: array size of the features[] array
> > + *       out: count of features[] elements filled
> 
> The value on output should be the size required to read all features, so
> that the caller can discover it easily.
> 
> The two lines describing 'size' will be wrapped together when converted
> to another format (manual page, HTML...).  You need to use at least a
> full stop (period) to separate them.

Fixed. Anyway, userspace should use GSSET_INFO/SS_FEATURES for this.

> [...]
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index c7d7074..6490860 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -783,6 +783,16 @@ struct netdev_tc_txq {
> >   *	Set hardware filter for RFS.  rxq_index is the target queue index;
> >   *	flow_id is a flow ID to be passed to rps_may_expire_flow() later.
> >   *	Return the filter ID on success, or a negative error code.
> > + *
> > + * u32 (*ndo_fix_features)(struct net_device *dev, u32 features);
> > + *	Modifies features supported by device depending on device-specific
> > + *	constraints. Should not modify hardware state.
> 
> I don't think this wording is clear enough.  How about:
> 
>     Adjusts the requested feature flags according to device-specific
>     constraints, and returns the resulting flags.  Must not modify
>     the device state.

I like your version better, too. I also updated ndo_set_features() description.

> [...]
> > @@ -954,6 +974,12 @@ struct net_device {
> >  #define NETIF_F_TSO6		(SKB_GSO_TCPV6 << NETIF_F_GSO_SHIFT)
> >  #define NETIF_F_FSO		(SKB_GSO_FCOE << NETIF_F_GSO_SHIFT)
> >  
> > +	/* Features valid for ethtool to change */
> > +	/* = all defined minus driver/device-class-related */
> > +#define NETIF_F_NEVER_CHANGE	(NETIF_F_VLAN_CHALLENGED | \
> > +				  NETIF_F_LLTX | NETIF_F_NETNS_LOCAL)
> 
> Shouldn't NETIF_F_NO_CSUM and NETIF_F_HIGHDMA be included in this?  (And
> excluded from NETIF_F_ALL_TX_OFFLOADS.)

NETIF_F_HIGHDMA added. NO_CSUM can be changed for some software devices
like bridge or bond.
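
For reference, the effect of adding NETIF_F_HIGHDMA to the never-change mask can be checked with plain arithmetic. This is a sketch under an assumption: the bit positions below are taken from the order of the feature-name table quoted later in this message (bit 0 = NETIF_F_SG and so on), and the F_* names are local stand-ins rather than the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, following the order of the quoted name table. */
#define F_SG               (1u << 0)
#define F_HIGHDMA          (1u << 5)
#define F_VLAN_CHALLENGED  (1u << 10)
#define F_LLTX             (1u << 12)
#define F_NETNS_LOCAL      (1u << 13)

#define F_NEVER_CHANGE  (F_VLAN_CHALLENGED | F_LLTX | F_NETNS_LOCAL | \
			 F_HIGHDMA)
#define F_ETHTOOL_BITS  (0x1f3fffffu & ~F_NEVER_CHANGE)

/* Keep only the bits that ethtool would be allowed to change. */
static uint32_t user_changeable(uint32_t requested)
{
	uint32_t allowed = requested & F_ETHTOOL_BITS;

	assert((allowed & F_NEVER_CHANGE) == 0);
	return allowed;
}
```

With these positions the never-change mask is 0x3420 and the ethtool mask becomes 0x1f3fcbdf.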

> > +#define NETIF_F_ETHTOOL_BITS	(0x1f3fffff & ~NETIF_F_NEVER_CHANGE)
> > +
> >  	/* List of features with software fallbacks. */
> >  #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
> >  				 NETIF_F_TSO6 | NETIF_F_UFO)
> > @@ -964,6 +990,12 @@ struct net_device {
> >  #define NETIF_F_V6_CSUM		(NETIF_F_GEN_CSUM | NETIF_F_IPV6_CSUM)
> >  #define NETIF_F_ALL_CSUM	(NETIF_F_V4_CSUM | NETIF_F_V6_CSUM)
> >  
> > +#define NETIF_F_ALL_TSO 	(NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
> > +
> > +#define NETIF_F_ALL_TX_OFFLOADS	(NETIF_F_ALL_CSUM | NETIF_F_SG | \
> > +				 NETIF_F_FRAGLIST | NETIF_F_ALL_TSO | \
> > +				 NETIF_F_SCTP_CSUM | NETIF_F_FCOE_CRC)
> > +
> >  	/*
> >  	 * If one device supports one of these features, then enable them
> >  	 * for all in netdev_increment_features.
> [...]
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index ddd5df2..0ccbee6 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> [...]
> > @@ -5267,6 +5273,35 @@ u32 netdev_fix_features(struct net_device *dev, u32 features)
> >  }
> >  EXPORT_SYMBOL(netdev_fix_features);
> >  
> > +void netdev_update_features(struct net_device *dev)
> > +{
> > +	u32 features;
> > +	int err = 0;
> > +
> > +	features = netdev_get_wanted_features(dev);
> > +
> > +	if (dev->netdev_ops->ndo_fix_features)
> > +		features = dev->netdev_ops->ndo_fix_features(dev, features);
> > +
> > +	/* driver might be less strict about feature dependencies */
> > +	features = netdev_fix_features(dev, features);
> > +
> > +	if (dev->features == features)
> > +		return;
> > +
> > +	netdev_info(dev, "Features changed: 0x%08x -> 0x%08x\n",
> > +		dev->features, features);
> > +
> > +	if (dev->netdev_ops->ndo_set_features)
> > +		err = dev->netdev_ops->ndo_set_features(dev, features);
> > +
> > +	if (!err)
> > +		dev->features = features;
> > +	else if (err < 0)
> > +		netdev_err(dev, "set_features() failed (%d)\n", err);
> 
> The error message should include the feature flags passed, since the
> previous informational message may be filtered out.

Fixed. This will make checkpatch.pl complain about long line, but I don't
want to split the message's text as it'd be hard to grep for it then.
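
The control flow of netdev_update_features() is easy to model outside the kernel. The sketch below is a simplification under stated assumptions: struct model_dev, struct model_ops and example_fix() are hypothetical, the generic netdev_fix_features() pass and the log messages are omitted, and any non-zero driver return is treated as failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_dev;

/* Hypothetical stand-ins for the two new netdev_ops callbacks. */
struct model_ops {
	uint32_t (*fix_features)(struct model_dev *dev, uint32_t features);
	int (*set_features)(struct model_dev *dev, uint32_t features);
};

struct model_dev {
	uint32_t features;         /* currently active */
	uint32_t wanted_features;  /* requested by the user */
	const struct model_ops *ops;
};

/* Returns 0 on success or no change, otherwise the driver's error code. */
static int update_features_model(struct model_dev *dev)
{
	uint32_t features = dev->wanted_features;
	int err = 0;

	assert(dev->ops != NULL);

	/* Let the driver veto or adjust combinations; no hardware access. */
	if (dev->ops->fix_features)
		features = dev->ops->fix_features(dev, features);

	if (features == dev->features)
		return 0;

	if (dev->ops->set_features)
		err = dev->ops->set_features(dev, features);
	if (err)
		return err;  /* leave dev->features untouched on failure */

	dev->features = features;
	return 0;
}

/* Example constraint: bit 1 is only usable when bit 0 is also enabled. */
static uint32_t example_fix(struct model_dev *dev, uint32_t features)
{
	(void)dev;
	if (!(features & 1u))
		features &= ~2u;
	return features;
}
```

The split keeps constraint checking (fix) free of side effects, so it can be re-run whenever wanted_features changes, while the hardware is touched only when the resulting set differs from the active one.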

> > +}
> > +EXPORT_SYMBOL(netdev_update_features);
> > +
> >  /**
> >   *	netif_stacked_transfer_operstate -	transfer operstate
> >   *	@rootdev: the root or lower level device to transfer state from
> [...]
> > diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> > index 5984ee0..1420edd 100644
> > --- a/net/core/ethtool.c
> > +++ b/net/core/ethtool.c
> > @@ -55,6 +55,7 @@ int ethtool_op_set_tx_csum(struct net_device *dev, u32 data)
> >  
> >  	return 0;
> >  }
> > +EXPORT_SYMBOL(ethtool_op_set_tx_csum);
> >  
> >  int ethtool_op_set_tx_hw_csum(struct net_device *dev, u32 data)
> >  {
> > @@ -171,6 +172,136 @@ EXPORT_SYMBOL(ethtool_ntuple_flush);
> >  
> >  /* Handlers for each ethtool command */
> >  
> > +#define ETHTOOL_DEV_FEATURE_WORDS	1
> > +
> > +static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
> > +{
> > +	struct ethtool_gfeatures cmd = {
> > +		.cmd = ETHTOOL_GFEATURES,
> > +		.size = ETHTOOL_DEV_FEATURE_WORDS,
> > +	};
> > +	struct ethtool_get_features_block features[ETHTOOL_DEV_FEATURE_WORDS] = {
> > +		{
> > +			.available = dev->hw_features,
> > +			.requested = dev->wanted_features,
> > +			.active = dev->features,
> > +			.never_changed = NETIF_F_NEVER_CHANGE,
> > +		},
> > +	};
> > +	u32 __user *sizeaddr;
> > +	u32 in_size;
> > +
> > +	sizeaddr = useraddr + offsetof(struct ethtool_gfeatures, size);
> > +	if (get_user(in_size, sizeaddr))
> > +		return -EFAULT;
> > +
> > +	if (in_size < ETHTOOL_DEV_FEATURE_WORDS)
> > +		return -EINVAL;
> I don't think this should be considered invalid.  Instead:
> 
> 	u32 copy_size;
> 	...
> 	copy_size = min_t(u32, in_size, ETHTOOL_DEV_FEATURE_WORDS);

> > +	if (copy_to_user(useraddr, &cmd, sizeof(cmd)))
> > +		return -EFAULT;
> > +	useraddr += sizeof(cmd);
> > +	if (copy_to_user(useraddr, features, sizeof(features)))
> 
> and:
> 	if (copy_to_user(useraddr, features, copy_size * sizeof(features[0]))

Fixed to equivalent code but without additional variable.
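
The clamping behaviour discussed above, copying at most as many blocks as the caller provided room for instead of failing, might look like this in a userspace model. The names are hypothetical, memcpy() stands in for copy_to_user(), and reporting the required size back to the caller is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FEATURE_WORDS 1  /* mirrors ETHTOOL_DEV_FEATURE_WORDS in the patch */

struct features_block {
	uint32_t available, requested, active, never_changed;
};

/* Copy at most `in_size` blocks into the caller's buffer and return how
 * many were copied; a too-small buffer is truncated, not rejected. */
static uint32_t copy_feature_blocks(struct features_block *dst,
				    uint32_t in_size,
				    const struct features_block *src)
{
	uint32_t n = in_size < FEATURE_WORDS ? in_size : FEATURE_WORDS;

	assert(src != NULL);
	if (n)
		memcpy(dst, src, n * sizeof(*src));
	return n;
}
```

A caller passing a size of 0 gets nothing copied and can still learn the required size from the reply header, which is the discovery path Ben asked for.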

> [...]
> > +static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GSTRING_LEN] = {
> > +	/* NETIF_F_SG */              "scatter-gather",
> SG really means TX DMA gather only, as the driver is responsible for
> allocating its own RX buffers.

I changed this and other feature strings to prefix them with "tx-" or "rx-"
as appropriate.

> > +	/* NETIF_F_IP_CSUM */         "tx-checksum-hw-ipv4",
> > +	/* NETIF_F_NO_CSUM */         "tx-checksum-local",
> > +	/* NETIF_F_HW_CSUM */         "tx-checksum-hw-ip-generic",
> > +	/* NETIF_F_IPV6_CSUM */       "tx_checksum-hw-ipv6",
> > +	/* NETIF_F_HIGHDMA */         "highdma",
> > +	/* NETIF_F_FRAGLIST */        "scatter-gather-fraglist",
> > +	/* NETIF_F_HW_VLAN_TX */      "tx-vlan-hw",
> > +
> > +	/* NETIF_F_HW_VLAN_RX */      "rx-vlan-hw",
> > +	/* NETIF_F_HW_VLAN_FILTER */  "rx-vlan-filter",
> > +	/* NETIF_F_VLAN_CHALLENGED */ "*vlan-challenged",
> Don't mark the unchangeable features specially here; that can be done by
> userland.  Actually, I wonder whether they really need descriptions at
> all.

Removed the markings. I think it's still good to have them for debugging
purposes - ethtool can show them then. At least vlan-challenged state is
useful information.

> > +	/* NETIF_F_GSO */             "generic-segmentation-offload",
> > +	/* NETIF_F_LLTX */            "*lockless-tx",
> > +	/* NETIF_F_NETNS_LOCAL */     "*netns-local",
> > +	/* NETIF_F_GRO */             "generic-receive-offload",
> > +	/* NETIF_F_LRO */             "large-receive-offload",
> > +
> > +	/* NETIF_F_TSO */             "tcp-segmentation-offload",
> > +	/* NETIF_F_UFO */             "udp-fragmentation-offload",
> > +	/* NETIF_F_GSO_ROBUST */      "gso-robust",
> > +	/* NETIF_F_TSO_ECN */         "tcp-ecn-segmentation-offload",
> > +	/* NETIF_F_TSO6 */            "ipv6-tcp-segmentation-offload",
> > +	/* NETIF_F_FSO */             "fcoe-segmentation-offload",
> > +	"",
> > +	"",
> > +
> > +	/* NETIF_F_FCOE_CRC */        "tx-checksum-fcoe-crc",
> > +	/* NETIF_F_SCTP_CSUM */       "tx-checksum-sctp",
> > +	/* NETIF_F_FCOE_MTU */        "fcoe-mtu",
> > +	/* NETIF_F_NTUPLE */          "ntuple-filter",
> [...]
> 
> I think this should be named 'rx-ntuple-filter'.  TX filtering may be
> controlled for related devices (VFs) and is completely separate from
> this.

Changed, as above.
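
As an illustration of how userspace could consume the ETH_SS_FEATURES names, here is a small bit-to-name lookup. It is a sketch only: the table holds just the first four strings as they appear in the quoted v3 patch (the renamed strings may differ), and feature_name() and lowest_feature_name() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First four entries only, copied from the quoted v3 table. */
static const char *const feature_names[] = {
	"scatter-gather",             /* bit 0 */
	"tx-checksum-hw-ipv4",        /* bit 1 */
	"tx-checksum-local",          /* bit 2 */
	"tx-checksum-hw-ip-generic",  /* bit 3 */
};

#define N_NAMES (sizeof(feature_names) / sizeof(feature_names[0]))

/* Name for a feature bit, or NULL when this sketch has no string for it. */
static const char *feature_name(unsigned int bit)
{
	assert(bit < 32);
	return bit < N_NAMES ? feature_names[bit] : NULL;
}

/* Name of the lowest set bit in `mask`, or NULL if it is unnamed or unset. */
static const char *lowest_feature_name(uint32_t mask)
{
	unsigned int bit;

	for (bit = 0; bit < 32; bit++)
		if (mask & (1u << bit))
			return feature_name(bit);
	return NULL;
}
```

Because names are indexed by bit position, marking unchangeable features is left to the caller, which can combine the name with the never_changed mask.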

Thanks for the review!

Best Regards,
Michał Mirosław

^ permalink raw reply

* [PATCH] NETFILTER userspace part for target HMARK
From: Hans Schillstrom @ 2011-02-03 13:34 UTC (permalink / raw)
  To: kaber, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom
In-Reply-To: <1296740050-6311-1-git-send-email-hans.schillstrom@ericsson.com>

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32-bit hash value is generated, then it is reduced modulo <limit>,
and finally an offset is added before the result is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behaviour.

The mark match can also be used to match nfmark produced by this module.
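
The final step described above is simple enough to show directly. A minimal sketch, assuming a hash has already been computed; hmark_value() is a hypothetical name, and the modulus 20 and offset 100 match the SCTP example in the man page included in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce the hash into [0, mod) and shift it by offs. */
static uint32_t hmark_value(uint32_t hash, uint32_t mod, uint32_t offs)
{
	assert(mod != 0);  /* a modulus of 0 would divide by zero */
	return hash % mod + offs;
}
```

With mod 20 and offs 100 every result falls in 100..119, whatever the hash.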

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c           |  366 ++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man         |   60 ++++++
 include/linux/netfilter/xt_hmark.h |   28 +++
 3 files changed, 454 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man
 create mode 100644 include/linux/netfilter/xt_hmark.h

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..04ee516
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,366 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet, i.e. the packet that caused the error (if a
+ * sufficient amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <getopt.h>
+
+#include <xtables.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_hmark.h>
+
+/*
+ * Flags must not start at 0, since it's used as none.
+ */
+enum {
+	XT_HMARK_SADR_AND = 1,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+};
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-smask value                Mask source address with value\n"
+"  --hmark-dmask value                Mask Dest. address with value\n"
+"  --hmark-sp-mask value              Mask src port with value\n"
+"  --hmark-dp-mask value              Mask dst port with value\n"
+"  --hmark-spi-mask value             For esp and ah AND spi with value\n"
+"  --hmark-sp-set value               OR src port with value\n"
+"  --hmark-dp-set value               OR dst port with value\n"
+"  --hmark-spi-set value              For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value           Mask Protocol with value\n"
+"  --hmark-rnd value                  Random value for hash calc.\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                  nfmark modulus value\n"
+"  --hmark-offs value                 Last action add value to nfmark\n"
+" In many cases hmark can be omitted i.e. --smask can be used\n");
+}
+
+static const struct option HMARK_opts[] = {
+	{ "hmark-smask", 1, NULL, XT_HMARK_SADR_AND },
+	{ "hmark-dmask", 1, NULL, XT_HMARK_DADR_AND },
+	{ "hmark-sp-mask", 1, NULL, XT_HMARK_SPORT_AND },
+	{ "hmark-dp-mask", 1, NULL, XT_HMARK_DPORT_AND },
+	{ "hmark-spi-mask", 1, NULL, XT_HMARK_SPI_AND },
+	{ "hmark-sp-set", 1, NULL, XT_HMARK_SPORT_OR },
+	{ "hmark-dp-set", 1, NULL, XT_HMARK_DPORT_OR },
+	{ "hmark-spi-set", 1, NULL, XT_HMARK_SPI_OR },
+	{ "hmark-proto-mask", 1, NULL, XT_HMARK_PROTO_AND },
+	{ "hmark-rnd", 1, NULL, XT_HMARK_RND },
+	{ "hmark-mod", 1, NULL, XT_HMARK_MODULUS },
+	{ "hmark-offs", 1, NULL, XT_HMARK_OFFSET },
+	{ "smask", 1, NULL, XT_HMARK_SADR_AND },
+	{ "dmask", 1, NULL, XT_HMARK_DADR_AND },
+	{ "sp-mask", 1, NULL, XT_HMARK_SPORT_AND },
+	{ "dp-mask", 1, NULL, XT_HMARK_DPORT_AND },
+	{ "spi-mask", 1, NULL, XT_HMARK_SPI_AND },
+	{ "sp-set", 1, NULL, XT_HMARK_SPORT_OR },
+	{ "dp-set", 1, NULL, XT_HMARK_DPORT_OR },
+	{ "spi-set", 1, NULL, XT_HMARK_SPI_OR },
+	{ "proto-mask", 1, NULL, XT_HMARK_PROTO_AND },
+	{ "rnd", 1, NULL, XT_HMARK_RND },
+	{ "mod", 1, NULL, XT_HMARK_MODULUS },
+	{ "offs", 1, NULL, XT_HMARK_OFFSET },
+	{ .name = NULL }
+};
+
+static int
+HMARK_parse(int c, char **argv, int invert, unsigned int *flags,
+	    const void *entry, struct xt_entry_target **target)
+{
+	struct xt_hmark_info *hmarkinfo
+		= (struct xt_hmark_info *)(*target)->data;
+	unsigned int value = 0xffffffff;
+	unsigned int maxint = UINT32_MAX;
+
+	if ((c < XT_HMARK_SADR_AND) || (c > XT_HMARK_OFFSET)) {
+		xtables_error(PARAMETER_PROBLEM, "Bad HMARK option \"%s\"",
+			      optarg);
+		return 0;
+	}
+
+	if (c >= XT_HMARK_SPORT_AND && c <= XT_HMARK_DPORT_OR)
+		maxint = UINT16_MAX;
+	else if (c == XT_HMARK_PROTO_AND)
+		maxint = UINT8_MAX;
+
+	if (!xtables_strtoui(optarg, NULL, &value, 0, maxint))
+		xtables_error(PARAMETER_PROBLEM, "Bad HMARK value \"%s\"",
+			      optarg);
+
+	if (*flags == 0) {
+		memset(hmarkinfo, 0xff, sizeof(struct xt_hmark_info));
+		hmarkinfo->pset.v32 = 0;
+		hmarkinfo->flags = 0;
+		hmarkinfo->spiset = 0;
+		hmarkinfo->hoffs = 0;
+		hmarkinfo->hashrnd = DEF_HRAND;
+	}
+	switch (c) {
+	case XT_HMARK_SADR_AND:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-smask' once");
+		}
+		hmarkinfo->smask = htonl(value);
+		if (value == maxint)
+			c = 0;
+		break;
+
+	case XT_HMARK_DADR_AND:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-dmask' once");
+		}
+		hmarkinfo->dmask = htonl(value);
+		if (value == maxint)
+			c = 0;
+		break;
+
+	case XT_HMARK_MODULUS:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-mod' once");
+		}
+		if (value == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Modulus 0 is not allowed, "
+				      "it would divide by 0");
+			value = 0xffffffff;
+		}
+		hmarkinfo->hmod = value;
+		if (value == maxint)
+			c = 0;
+		break;
+
+	case XT_HMARK_OFFSET:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-offs' once");
+		}
+		hmarkinfo->hoffs = value;
+		if (value == 0)
+			c = 0;
+		break;
+
+	case XT_HMARK_SPORT_AND:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-sp-mask' once");
+		}
+		hmarkinfo->pmask.p16.src = htons(value);
+		if (value == maxint)
+			c = 0;
+		break;
+
+	case XT_HMARK_DPORT_AND:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-dp-mask' once");
+		}
+		hmarkinfo->pmask.p16.dst = htons(value);
+		if (value == maxint)
+			c = 0;
+		break;
+
+	case XT_HMARK_SPI_AND:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-spi-mask' once");
+		}
+		hmarkinfo->spimask = htonl(value);
+		if (value == maxint)
+			c = 0;
+		break;
+
+	case XT_HMARK_SPORT_OR:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-sp-set' once");
+		}
+		hmarkinfo->pset.p16.src = htons(value);
+		if (!value)
+			c = 0;
+		break;
+
+	case XT_HMARK_DPORT_OR:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-dp-set' once");
+		}
+		hmarkinfo->pset.p16.dst = htons(value);
+		if (!value)
+			c = 0;
+		break;
+
+	case XT_HMARK_SPI_OR:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-spi-set' once");
+		}
+		hmarkinfo->spiset = htonl(value);
+		if (!value)
+			c = 0;
+		break;
+
+	case XT_HMARK_PROTO_AND:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify "
+				      "`--hmark-proto-mask' once");
+		}
+		hmarkinfo->prmask = value;
+		if (value == maxint)
+			c = 0;
+		break;
+
+	case XT_HMARK_RND:
+		if (*flags & (1 << c)) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "Can only specify `--hmark-rnd' once");
+		}
+		hmarkinfo->hashrnd = value;
+		if (value == DEF_HRAND)
+			c = 0;
+		break;
+
+	default:
+		return 0;
+	}
+	*flags |= 1 << c;
+	hmarkinfo->flags = *flags;
+
+	return 1;
+}
+
+static void HMARK_check(unsigned int flags)
+{
+	if (!(flags & (1 << XT_HMARK_MODULUS)))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-mod "
+			   "is not set, which means the nfmark would be in range"
+			   " 0 - 0xffffffff");
+}
+
+static void HMARK_print(const void *ip, const struct xt_entry_target *target,
+			int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf("HMARK ");
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf("smask 0x%x ", htonl(info->smask));
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf("dmask 0x%x ", htonl(info->dmask));
+	if (info->flags & (1 << XT_HMARK_SPORT_AND))
+		printf("sp-mask 0x%x ", htons(info->pmask.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_AND))
+		printf("dp-mask 0x%x ", htons(info->pmask.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_AND))
+		printf("spi-mask 0x%x ", htonl(info->spimask));
+	if (info->flags & (1 << XT_HMARK_SPORT_OR))
+		printf("sp-set 0x%x ", htons(info->pset.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_OR))
+		printf("dp-set 0x%x ", htons(info->pset.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_OR))
+		printf("spi-set 0x%x ", htonl(info->spiset));
+	if (info->flags & (1 << XT_HMARK_PROTO_AND))
+		printf("proto-mask 0x%x ", info->prmask);
+	if (info->flags & (1 << XT_HMARK_RND))
+		printf("rnd 0x%x ", info->hashrnd);
+	if (info->flags & (1 << XT_HMARK_MODULUS))
+		printf("mark=hv %% 0x%x ", info->hmod);
+	if (info->flags & (1 << XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffs);
+}
+
+static void HMARK_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf("--hmark-smask 0x%x ", htonl(info->smask));
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf("--hmark-dmask 0x%x ", htonl(info->dmask));
+	if (info->flags & (1 << XT_HMARK_SPORT_AND))
+		printf("--hmark-sp-mask 0x%x ", htons(info->pmask.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_AND))
+		printf("--hmark-dp-mask 0x%x ", htons(info->pmask.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_AND))
+		printf("--hmark-spi-mask 0x%x ", htonl(info->spimask));
+	if (info->flags & (1 << XT_HMARK_SPORT_OR))
+		printf("--hmark-sp-set 0x%x ", htons(info->pset.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_OR))
+		printf("--hmark-dp-set 0x%x ", htons(info->pset.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_OR))
+		printf("--hmark-spi-set 0x%x ", htonl(info->spiset));
+	if (info->flags & (1 << XT_HMARK_PROTO_AND))
+		printf("--hmark-proto-mask 0x%x ", info->prmask);
+	if (info->flags & (1 << XT_HMARK_RND))
+		printf("--hmark-rnd 0x%x ", info->hashrnd);
+	if (info->flags & (1 << XT_HMARK_MODULUS))
+		printf("--hmark-mod 0x%x ", info->hmod);
+	if (info->flags & (1 << XT_HMARK_OFFSET))
+		printf("--hmark-offs 0x%x ", info->hoffs);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_UNSPEC,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.parse         = HMARK_parse,
+		.final_check   = HMARK_check,
+		.print         = HMARK_print,
+		.save          = HMARK_save,
+		.extra_opts    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
+
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..f24ac9b
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,60 @@
+This module does the same as MARK, i.e. sets an fwmark, but the mark is based on a hash value.
+The hash is based on saddr, daddr, sport, dport and proto. The same mark will be produced independent of direction if no masks are set or the same masks are used for src and dest.
+The hash mark can be adjusted by a modulus, and finally an offset can be added, i.e. the final mark will be within a range. If state RELATED is used, ICMP will be handled too, i.e. the hash will be calculated on the original message, not on the ICMP message itself.
+Note: None of the parameters affect the packet itself, only the calculated hash value.
+.PP
+Parameters:
+For all masks the default is all ones; to disable a field use mask 0.
+For IPv6 only the last 32 bits are included in the hash.
+.TP
+\fB\-\-hmark\-smask\fP \fIvalue\fP
+The value to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dmask\fP \fIvalue\fP
+The value to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sp\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dp\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue\fP (must be > 0)
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offs\fP \fIvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark\-offs 10000 \-\-hmark\-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING \-p SCTP \-j HMARK \-\-smask 0 \-\-dmask 0
+ \-\-sp\-mask 0 \-\-offs 100 \-\-mod 20
+.PP
+
diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..3f7ecc6
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,28 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+union ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	__u32		smask;		/* Source address mask */
+	__u32		dmask;		/* Dest address mask */
+	union ports	pmask;
+	union ports	pset;
+	__u32		spimask;
+	__u32		spiset;
+	__u16		flags;		/* Print out only */
+	__u16		prmask;		/* L4 Proto mask */
+	__u32		hashrnd;
+	__u32		hmod;		/* Modulus */
+	__u32		hoffs;		/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
-- 
1.7.2.3


^ permalink raw reply related

* [PATCH] NETFILTER module xt_hmark new target for HASH MARK
From: Hans Schillstrom @ 2011-02-03 13:34 UTC (permalink / raw)
  To: kaber, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom
In-Reply-To: <1296740050-6311-1-git-send-email-hans.schillstrom@ericsson.com>

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated, then reduced modulo <limit>, and
finally an offset is added before it is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

man page
   HMARK
       This  module  does  the same as MARK, i.e. set an fwmark,
       but the mark is based on a hash value.  The hash is based on
       saddr, daddr, sport, dport and proto. The same mark will be produced
       independent of direction if no masks are set or the same masks are used
       for src and dest. The hash mark can be adjusted by a modulus and finally
       an offset can be added, i.e. the final mark will be within a range.
       For ICMP errors the hash is calculated on the original message.
       Note: None of the parameters affect the packet itself,
       only the calculated hash value.

       Parameters: For all masks the default is all ones; to disable a field
                   use mask 0. For IPv6 only the last 32 bits of each
                   address are included in the hash.

       --hmark-smask value
              The value to AND the source address with (saddr & value).

       --hmark-dmask value
              The value to AND the dest. address with (daddr & value).

       --hmark-sp-mask value
              A 16 bit value to AND the src port with (sport & value).

       --hmark-dp-mask value
              A 16 bit value to AND the dest port with (dport & value).

       --hmark-sp-set value
              A 16 bit value to OR the src port with (sport | value).

       --hmark-dp-set value
              A 16 bit value to OR the dest port with (dport | value).

       --hmark-spi-mask value
              Value to AND the spi field with (spi & value) valid for proto esp or ah.

       --hmark-spi-set value
              Value to OR the spi field with (spi | value) valid for proto esp or ah.

       --hmark-proto-mask value
              A 16 bit value to AND the L4 proto field with (proto & value).

       --hmark-rnd value
              A 32 bit initial value for hash calc, default is 0xc175a3b8.

       Final processing of the mark in order of execution.

       --hmark-mod value (must be > 0)
              The easiest way to describe this is:  hash = hash mod <value>

       --hmark-offs value (must be > 0)
              The easiest way to describe this is:  hash = hash + <value>

       Examples:

       Default rule handles all TCP, UDP, SCTP, ESP & AH

              iptables -t mangle -A PREROUTING \
               -j HMARK --hmark-offs 10000 --hmark-mod 10

       Handle SCTP and hash dest port only and produce a nfmark between 100-119.

              iptables -t mangle -A PREROUTING -p SCTP -j HMARK --smask 0 --dmask 0
               --sp-mask 0 --offs 100 --mod 20

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_hmark.h |   28 ++++
 net/netfilter/Kconfig              |   18 +++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_hmark.c           |  245 ++++++++++++++++++++++++++++++++++++
 4 files changed, 292 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_hmark.h
 create mode 100644 net/netfilter/xt_hmark.c
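
As a rough illustration of the direction-independent hashing described in
the man page text above (not part of the patch), the following userspace
sketch masks the addresses and ports, orders each pair so the smaller value
comes first, and then hashes. The names mix3(), struct flow and sym_hash()
are hypothetical, and mix3() is only a stand-in mixer, not the kernel's
jhash_3words(); the port packing is also simplified compared to the
network-order word the module reads from the packet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for jhash_3words(); NOT the real jhash. */
static uint32_t mix3(uint32_t a, uint32_t b, uint32_t c, uint32_t seed)
{
	uint32_t h = seed;

	h = (h ^ a) * 0x9e3779b1u;
	h = (h ^ b) * 0x85ebca6bu;
	h = (h ^ c) * 0xc2b2ae35u;
	return h ^ (h >> 16);
}

/* Hypothetical flow description used only by this sketch. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

/* Mask each field, order addresses and ports, then hash. */
static uint32_t sym_hash(const struct flow *f, uint32_t smask, uint32_t dmask,
			 uint16_t spmask, uint16_t dpmask, uint32_t seed)
{
	uint32_t a1 = f->saddr & smask, a2 = f->daddr & dmask;
	uint16_t p1 = f->sport & spmask, p2 = f->dport & dpmask;
	uint32_t h;

	if (a2 < a1) {		/* smaller address first */
		uint32_t t = a1; a1 = a2; a2 = t;
	}
	if (p2 < p1) {		/* smaller port first */
		uint16_t t = p1; p1 = p2; p2 = t;
	}
	h = mix3(a1, a2, ((uint32_t)p1 << 16) | p2, seed) ^ f->proto;
	return h ? h : 1;	/* 0 is reserved for "no hash" */
}

/* Both directions of one TCP flow should produce the same hash. */
static void sym_hash_selfcheck(void)
{
	struct flow fwd = { 0x0a000001u, 0x0a000002u, 1234, 80, 6 };
	struct flow rev = { 0x0a000002u, 0x0a000001u, 80, 1234, 6 };

	assert(sym_hash(&fwd, ~0u, ~0u, 0xffff, 0xffff, 0xc175a3b8u) ==
	       sym_hash(&rev, ~0u, ~0u, 0xffff, 0xffff, 0xc175a3b8u));
}
```

Note that the symmetry only holds when the source and destination masks
are equal, which matches the wording of the man page. Because addresses
and ports are ordered independently of each other, swapping only the ports
of a flow also yields the same hash in this sketch.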

diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..3f7ecc6
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,28 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+union ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	__u32		smask;		/* Source address mask */
+	__u32		dmask;		/* Dest address mask */
+	union ports	pmask;
+	union ports	pset;
+	__u32		spimask;
+	__u32		spiset;
+	__u16		flags;		/* Print out only */
+	__u16		prmask;		/* L4 Proto mask */
+	__u32		hashrnd;
+	__u32		hmod;		/* Modulus */
+	__u32		hoffs;		/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 82a6e0d..2115079 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -471,6 +471,24 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on NETFILTER_ADVANCED
+	select IP6_NF_IPTABLES if IPV6
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which alter the netfilter mark (nfmark) field within a given range.
+	First a 32 bit hash value is generated, then reduced modulo <limit>, and
+	finally an offset is added before it is written to nfmark.
+
+	Prior to routing, the nfmark can influence the routing method (see
+	"Use netfilter MARK value as routing key") and can also be used by
+	other subsystems to change their behavior.
+
+	The mark match can also be used to match nfmark produced by this module.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index d57a890..b24a5e6 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_hmark.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
diff --git a/net/netfilter/xt_hmark.c b/net/netfilter/xt_hmark.c
new file mode 100644
index 0000000..d4b8257
--- /dev/null
+++ b/net/netfilter/xt_hmark.c
@@ -0,0 +1,245 @@
+/*
+ *	xt_hmark - Netfilter module to set mark as hash value
+ *
+ *	(C) 2010 Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *
+ *	Description:
+ *	This module calculates a hash value that can be modified by a modulus
+ *	and an offset. The hash value is based on a direction independent
+ *	five tuple: src & dst addr, src & dst ports and protocol.
+ *	However, src & dst ports can be masked and are not used for fragmented
+ *	packets. ESP and AH don't have ports, so the SPI is used instead.
+ *	For ICMP error messages the hash mark values will be calculated on
+ *	the source packet, i.e. the packet that caused the error (if a
+ *	sufficient amount of data exists).
+ *
+ *	This program is free software; you can redistribute it and/or modify
+ *	it under the terms of the GNU General Public License version 2 as
+ *	published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ip.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/xt_hmark.h>
+#include <linux/netfilter/x_tables.h>
+
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#	define WITH_IPV6 1
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet range mark operations by hash value");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+/*
+ * ICMP, get inner header so calc can be made on the source message
+ *       not the icmp header, i.e. same hash mark must be produced
+ *       on an icmp error message.
+ */
+static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+	struct iphdr *iph = NULL;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL)
+		goto out;
+
+	if (icmph->type > NR_ICMP_TYPES)
+		goto out;
+
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		goto out;
+	/* Checking full IP header plus 8 bytes of protocol to
+	 * avoid additional coding at protocol handlers.
+	 */
+	if (!pskb_may_pull(skb, nhoff + iphsz + sizeof(_ih) + 8))
+		goto out;
+
+	iph = (struct iphdr *)(skb->data + nhoff + iphsz + sizeof(_ih));
+	return nhoff + iphsz + sizeof(_ih);
+out:
+	return nhoff;
+}
+/*
+ * ICMPv6
+ * Input nhoff Offset into network header
+ *       offset where ICMPv6 header starts
+ * Returns offset to icmp embedded msg or nhoff
+ */
+#ifdef WITH_IPV6
+static int get_inner6_hdr(struct sk_buff *skb, int nhoff, int offset)
+{
+	struct icmp6hdr *icmp6h;
+	struct icmp6hdr _ih6;
+
+	icmp6h = skb_header_pointer(skb, offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		goto out;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128)
+		return offset + sizeof(_ih6);
+
+out:
+	return nhoff;
+}
+#endif
+/*
+ * Calc hash value, special care is taken with icmp and fragmented messages
+ * i.e. fragmented messages don't use ports.
+ */
+static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
+{
+	int nhoff, hash = 0, poff, proto;
+	struct iphdr *ip;
+	u8 ip_proto;
+	u32 addr1, addr2, ihl;
+	union {
+		u32 v32;
+		u16 v16[2];
+	} ports;
+
+	nhoff = skb_network_offset(skb);
+	proto = skb->protocol;
+
+	if (!proto && skb->sk) {
+		if (skb->sk->sk_family == AF_INET)
+			proto = __constant_htons(ETH_P_IP);
+		else if (skb->sk->sk_family == AF_INET6)
+			proto = __constant_htons(ETH_P_IPV6);
+	}
+
+	switch (proto) {
+	case __constant_htons(ETH_P_IP):
+		if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
+			goto done;
+
+		ip = (struct iphdr *) (skb->data + nhoff);
+		if (ip->protocol == IPPROTO_ICMP) {
+			/* Switch hash calc to inner header ? */
+			nhoff = get_inner_hdr(skb, ip->ihl * 4, nhoff);
+			ip = (struct iphdr *) (skb->data + nhoff);
+		}
+
+		if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+			ip_proto = 0;
+		else
+			ip_proto = ip->protocol;
+
+		addr1 = (__force u32) ip->saddr & info->smask;
+		addr2 = (__force u32) ip->daddr & info->dmask;
+		ihl = ip->ihl;
+		break;
+#ifdef WITH_IPV6
+	case __constant_htons(ETH_P_IPV6):
+	{
+		struct ipv6hdr *ip6;
+		int ptr;
+
+		if (!pskb_may_pull(skb, sizeof(*ip6) + nhoff))
+			goto done;
+
+		ip6 = (struct ipv6hdr *) (skb->data + nhoff);
+
+		/* if (ip6->nexthdr == IPPROTO_ICMPV6) { */
+		if (ipv6_find_hdr(skb, &ptr, IPPROTO_ICMPV6, NULL) >= 0) {
+			nhoff = get_inner6_hdr(skb, nhoff, ptr);
+			ip6 = (struct ipv6hdr *) (skb->data + nhoff);
+		}
+		if (ipv6_find_hdr(skb, &ptr, NEXTHDR_FRAGMENT, NULL) < 0)
+			ip_proto = ip6->nexthdr;
+		else
+			ip_proto = 0;	/* It's a fragment */
+
+		addr1 = (__force u32) ip6->saddr.s6_addr32[3];
+		addr2 = (__force u32) ip6->daddr.s6_addr32[3];
+		ihl = (40 >> 2);
+		break;
+	}
+#endif
+	default:
+		goto done;
+	}
+
+	ports.v32 = 0;
+	poff = proto_ports_offset(ip_proto);
+	nhoff += ihl * 4 + poff;
+	if (poff >= 0 && pskb_may_pull(skb, nhoff + 4)) {
+		ports.v32 = * (__force u32 *) (skb->data + nhoff);
+		if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH) {
+			ports.v32 = (ports.v32 & info->spimask) | info->spiset;
+		} else {
+			ports.v32 = (ports.v32 & info->pmask.v32) |
+				    info->pset.v32;
+			if (ports.v16[1] < ports.v16[0])
+				swap(ports.v16[0], ports.v16[1]);
+		}
+	}
+	ip_proto &= info->prmask;
+	/* get a consistent hash (same value on both flow directions) */
+	if (addr2 < addr1)
+		swap(addr1, addr2);
+
+	hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ ip_proto;
+	if (!hash)
+		hash = 1;
+
+	return hash;
+
+done:
+	return 0;
+}
+
+static unsigned int
+hmark_tg(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
+	__u32 hash = get_hash(skb, info);
+
+	if (info->hmod && hash)
+		skb->mark = (hash % info->hmod) + info->hoffs;
+	return XT_CONTINUE;
+}
+
+static struct xt_target hmark_tg_reg __read_mostly = {
+	.name           = "HMARK",
+	.revision       = 0,
+	.family         = NFPROTO_UNSPEC,
+	.target         = hmark_tg,
+	.targetsize     = sizeof(struct xt_hmark_info),
+	.me             = THIS_MODULE,
+};
+
+static int __init hmark_mt_init(void)
+{
+	int ret;
+
+	ret = xt_register_target(&hmark_tg_reg);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static void __exit hmark_mt_exit(void)
+{
+	xt_unregister_target(&hmark_tg_reg);
+}
+
+module_init(hmark_mt_init);
+module_exit(hmark_mt_exit);
-- 
1.7.2.3


^ permalink raw reply related

* [PATCH 0/2] NETFILTER new target module, HMARK
From: Hans Schillstrom @ 2011-02-03 13:34 UTC (permalink / raw)
  To: kaber, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated, then reduced modulo <limit>, and
finally an offset is added before it is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behaviour.

The mark match can also be used to match nfmark produced by this module.
See the kernel module for more info.
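
The modulus and offset steps described above can be sketched in userspace
C as follows. This is only an illustration of the arithmetic, assuming the
behaviour of hmark_tg() in the accompanying patch (the mark is left alone
when the modulus or the hash is 0); hmark_apply() and its self-check are
hypothetical names, not part of the module.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reduce the hash modulo hmod, then add hoffs. The existing mark is
 * returned unchanged when hmod or the hash is 0.
 */
static uint32_t hmark_apply(uint32_t old_mark, uint32_t hash,
			    uint32_t hmod, uint32_t hoffs)
{
	if (hmod && hash)
		return (hash % hmod) + hoffs;
	return old_mark;
}

/* Offset 100 with modulus 20 keeps every mark within 100..119. */
static void hmark_apply_selfcheck(void)
{
	uint32_t hash;

	for (hash = 1; hash < 1000; hash++) {
		uint32_t mark = hmark_apply(0, hash, 20, 100);

		assert(mark >= 100 && mark <= 119);
	}
	/* A modulus of 0 leaves the existing mark alone. */
	assert(hmark_apply(7, 12345, 0, 100) == 7);
}
```

With a modulus of N and an offset of K the resulting mark always falls in
the range K .. K+N-1, which is what makes the value usable as a key for
selecting one of N routing tables or queues.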

REVISION

This is version 1

Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* Re: [GIT PULL] IPVS SCTP lock fix
From: Patrick McHardy @ 2011-02-03 12:10 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, linux-next, linux-kernel, lvs-devel, Randy Dunlap
In-Reply-To: <1296733925-23131-1-git-send-email-horms@verge.net.au>

On 03.02.2011 12:52, Simon Horman wrote:
> Hi Patrick,
> 
> please consider pulling
> git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git for-patrick
> which contains a single patch to the IPVS SCTP module so that
> it uses the correct lock.

Pulled, thanks Simon.

^ permalink raw reply

* [GIT PULL] IPVS SCTP lock fix
From: Simon Horman @ 2011-02-03 11:52 UTC (permalink / raw)
  To: netdev, linux-next, linux-kernel, lvs-devel; +Cc: Randy Dunlap

Hi Patrick,

please consider pulling
git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git for-patrick
which contains a single patch to the IPVS SCTP module so that
it uses the correct lock.


^ permalink raw reply

* [PATCH] IPVS: Use correct lock in SCTP module
From: Simon Horman @ 2011-02-03 11:52 UTC (permalink / raw)
  To: netdev, linux-next, linux-kernel, lvs-devel
  Cc: Randy Dunlap, Simon Horman, Hans Schillstrom
In-Reply-To: <1296733925-23131-1-git-send-email-horms@verge.net.au>

Use sctp_app_lock instead of tcp_app_lock in the SCTP protocol module.

This appears to be a typo introduced by the netns changes.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 net/netfilter/ipvs/ip_vs_proto_sctp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index fb2d04a..b027ccc 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -1101,7 +1101,7 @@ static void __ip_vs_sctp_init(struct net *net, struct ip_vs_proto_data *pd)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	ip_vs_init_hash_table(ipvs->sctp_apps, SCTP_APP_TAB_SIZE);
-	spin_lock_init(&ipvs->tcp_app_lock);
+	spin_lock_init(&ipvs->sctp_app_lock);
 	pd->timeout_table = ip_vs_create_timeout_table((int *)sctp_timeouts,
 							sizeof(sctp_timeouts));
 }
-- 
1.7.2.3

^ permalink raw reply related

