Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
From: Hannes Frederic Sowa @ 2015-01-07 11:23 UTC (permalink / raw)
  To: Shrijeet Mukherjee
  Cc: Scott Feldman, Netdev, Jiří Pírko, john fastabend,
	Thomas Graf, Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu
In-Reply-To: <98433f41e1876041471cdb513ca53fe5@mail.gmail.com>

On Di, 2015-01-06 at 18:08 -0800, Shrijeet Mukherjee wrote:
> >For the first idea, I'll try to make an example:
> >
> >Initial setup:
> ># ip rule ls
> >0:	from all lookup local
> >32766:	from all lookup main
> >32767:	from all lookup default
> >
> ># ip rule add pref 100 iif swdev0 table 5 # ip rule ls
> >0:	from all lookup local
> >100:	from all iif swdev0 [detached] lookup 5
> >> maybe we can show which rules are being able to get offloaded here
> >32766:	from all lookup main
> >32767:	from all lookup default
> >
> >table 5 should be the table we can insert routes into which are offloaded
> >to
> >hardware.
> >
> >During table modifications we linearly scan the rules if we find selectors
> >which
> >cannot be represented by hardware.
> >
> >In case we have a iif selector, we simply can use this table and just
> >synthesize it
> >into the particular interface.
> >
> >A ip-rule-from would need all the hardware being capable of matching source
> >addresses, otherwise we cannot offload all routing tables with higher
> >preference,
> >same for a to/tos rule. If we encounter a fwmark rule, we certainly cannot
> >represent it in hardware, so skip it (here we can think about entangling
> >those with
> >ACLs, but it feels hard to do).
> >
> >If rules are inserted or changed we must again validate the complete list
> >of rules
> >and decide if we need to flush all the routes and install a slow path via
> >kernel.
> >
> >What do you think? Does that make sense? I could try to come up with an API
> >for
> >that. ;)
> >
> 
> This sounds really good, but I suspect the real problem is the case where
> the rule evaluation is in the hardware path right. If it is purely IF based
> there is no issue .. but any other policy like missed in table 1, then use
> table 2 will not work with this model .. or did I miss something ?

I could come up with several ways how to model hardware. Depending on
that the integration with rules is easy or nearly impossible:

1) it simply cannot deal with ip rules, so there is no way an ACL can
influence the outcome of a routing table lookup - if the feature should
be used, it has to use the slow-path in the kernel.

2) ACLs can influence which routing table will get queried - this sounds
very much like the ip rule model and it seems not too hard to model
that.

3) Routing implementations in the hardware have a single routing table
and the leafs carry different actions with priorities: making this kind
of model working with the ip rule concept will become very difficult and
it might require lots of algorithmic code by every driver to adapt to a
single API provided by Linux. It might be possible, if the hardware
provides actions like backtrack and retrack and can keep state of
priorities during walking the tree, I really doubt that.

Implementations of type 3) would look naturally to do in hardware (see
different Cisco policy routing configurations or ipv6 subtree feature),
so it seems it won't be possible to find a simple way to fuse rules and
offloading in case of point 3).

Rocker sounds a lot like model 2) and this seems possible and should be
a matter of API design. It should merely be a matter of nicely model the
data structures. ;)

Also, @Scott: if you build drivers with l3 offloading as modules, don't
you need to push the full routing tables to the hw once? Maybe we should
think about the drivers pulling routing information from the kernel, the
kernel only notifying something changed?

Bye,
Hannes

^ permalink raw reply

* [PATCH iproute2 3/3] ip netns: Delete all netns
From: Vadim Kochan @ 2015-01-07 11:04 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1420628662-9930-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Allow delete all namespace names by:

    $ ip netns del all

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 ip/ipnetns.c        | 24 +++++++++++++++---------
 man/man8/ip-netns.8 | 12 ++++++++++--
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index b5a6f57..20707b8 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -274,18 +274,11 @@ static int netns_identify(int argc, char **argv)
 
 }
 
-static int netns_delete(int argc, char **argv)
+static int on_netns_del(char *nsname, void *arg)
 {
-	const char *name;
 	char netns_path[MAXPATHLEN];
 
-	if (argc < 1) {
-		fprintf(stderr, "No netns name specified\n");
-		return -1;
-	}
-
-	name = argv[0];
-	snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);
+	snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, nsname);
 	umount2(netns_path, MNT_DETACH);
 	if (unlink(netns_path) < 0) {
 		fprintf(stderr, "Cannot remove namespace file \"%s\": %s\n",
@@ -295,6 +288,19 @@ static int netns_delete(int argc, char **argv)
 	return 0;
 }
 
+static int netns_delete(int argc, char **argv)
+{
+	if (argc < 1) {
+		fprintf(stderr, "No netns name specified\n");
+		return -1;
+	}
+
+	if (strcmp(argv[0], "all") == 0)
+		return netns_foreach(on_netns_del, NULL);
+
+	return on_netns_del(argv[0], NULL);
+}
+
 static int create_netns_dir(void)
 {
 	/* Create the base netns directory if it doesn't exist */
diff --git a/man/man8/ip-netns.8 b/man/man8/ip-netns.8
index 70ea4f0..e56068e 100644
--- a/man/man8/ip-netns.8
+++ b/man/man8/ip-netns.8
@@ -16,10 +16,14 @@ ip-netns \- process network namespace management
 .BR "ip netns" " { " list " } "
 
 .ti -8
-.BR "ip netns" " { " add " | " delete " } "
+.B ip netns add
 .I NETNSNAME
 
 .ti -8
+.B ip netns del
+.RI "{ " NETNSNAME " | " all " }"
+
+.ti -8
 .BR "ip netns identify"
 .RI "[ " PID " ]"
 
@@ -76,7 +80,7 @@ If NAME is available in /var/run/netns/ this command creates a new
 network namespace and assigns NAME.
 
 .TP
-.B ip netns delete NAME - delete the name of a network namespace
+.B ip netns delete { NAME | all } - delete the name of a network namespace(s)
 .sp
 If NAME is present in /var/run/netns it is umounted and the mount
 point is removed.  If this is the last user of the network namespace the
@@ -84,6 +88,10 @@ network namespace will be freed, otherwise the network namespace
 persists until it has no more users.  ip netns delete may fail if
 the mount point is in use in another mount namespace.
 
+If
+.B all
+was specified then all the network namespace names will be removed.
+
 .TP
 .B ip netns identify [PID] - Report network namespaces names for process
 .sp
-- 
2.1.3

^ permalink raw reply related

* [PATCH iproute2 2/3] ip netns: Allow exec on each netns
From: Vadim Kochan @ 2015-01-07 11:04 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1420628662-9930-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

This change allows to exec some cmd on each
netns (except default) by specifying 'all' as netns name:

    # ip netns exec all ip link

Each command executes synchronously.

Exit status is not considered, so there might be a case
that some CMD can fail on some netns but success on the other.

EXAMPLES:

1) Show link info on all netns:

$ ip netns exec all ip link

netns: test_net
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
    link/ether 1a:19:6f:25:eb:85 brd ff:ff:ff:ff:ff:ff

netns: home0
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
    link/ether ea:1a:59:40:d3:29 brd ff:ff:ff:ff:ff:ff

netns: lan0
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
    link/ether ce:49:d5:46:81:ea brd ff:ff:ff:ff:ff:ff

2) Set UP tap0 device for the all netns:

$ ip netns exec all ip link set dev tap0 up

netns: test_net

netns: home0

netns: lan0

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 ip/ipnetns.c        | 72 ++++++++++++++++++++++++++++++++---------------------
 man/man8/ip-netns.8 | 14 +++++++++--
 2 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 519d518..b5a6f57 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -26,7 +26,7 @@ static int usage(void)
 	fprintf(stderr, "       ip netns delete NAME\n");
 	fprintf(stderr, "       ip netns identify [PID]\n");
 	fprintf(stderr, "       ip netns pids NAME\n");
-	fprintf(stderr, "       ip netns exec NAME cmd ...\n");
+	fprintf(stderr, "       ip netns exec { NAME | all } cmd ...\n");
 	fprintf(stderr, "       ip netns monitor\n");
 	exit(-1);
 }
@@ -66,29 +66,10 @@ static int netns_list(int argc, char **argv)
 	return 0;
 }
 
-static int netns_exec(int argc, char **argv)
+static int cmd_exec(const char *cmd, char **argv, bool do_fork)
 {
-	/* Setup the proper environment for apps that are not netns
-	 * aware, and execute a program in that environment.
-	 */
-	const char *cmd;
-
-	if (argc < 1) {
-		fprintf(stderr, "No netns name specified\n");
-		return -1;
-	}
-	if (argc < 2) {
-		fprintf(stderr, "No command specified\n");
-		return -1;
-	}
-	cmd = argv[1];
-
-	if (netns_switch(argv[0]))
-		return -1;
-
 	fflush(stdout);
-
-	if (batch_mode) {
+	if (do_fork) {
 		int status;
 		pid_t pid;
 
@@ -106,23 +87,56 @@ static int netns_exec(int argc, char **argv)
 			}
 
 			if (WIFEXITED(status)) {
-				/* ip must return the status of the child,
-				 * but do_cmd() will add a minus to this,
-				 * so let's add another one here to cancel it.
-				 */
-				return -WEXITSTATUS(status);
+				return WEXITSTATUS(status);
 			}
 
 			exit(1);
 		}
 	}
 
-	if (execvp(cmd, argv + 1)  < 0)
+	if (execvp(cmd, argv)  < 0)
 		fprintf(stderr, "exec of \"%s\" failed: %s\n",
-			cmd, strerror(errno));
+				cmd, strerror(errno));
 	_exit(1);
 }
 
+static int on_netns_exec(char *nsname, void *arg)
+{
+	char **argv = arg;
+	cmd_exec(argv[1], argv + 1, 1);
+	return 0;
+}
+
+static int netns_exec(int argc, char **argv)
+{
+	/* Setup the proper environment for apps that are not netns
+	 * aware, and execute a program in that environment.
+	 */
+	const char *cmd;
+
+	if (argc < 1) {
+		fprintf(stderr, "No netns name specified\n");
+		return -1;
+	}
+	if (argc < 2) {
+		fprintf(stderr, "No command specified\n");
+		return -1;
+	}
+	cmd = argv[1];
+
+	if (strcmp(argv[0], "all") == 0)
+		return do_each_netns(on_netns_exec, argv, 1);
+
+	if (netns_switch(argv[0]))
+		return -1;
+
+	/* ip must return the status of the child,
+	 * but do_cmd() will add a minus to this,
+	 * so let's add another one here to cancel it.
+	 */
+	return -cmd_exec(cmd, argv + 1, batch_mode);
+}
+
 static int is_pid(const char *str)
 {
 	int ch;
diff --git a/man/man8/ip-netns.8 b/man/man8/ip-netns.8
index 74343ed..70ea4f0 100644
--- a/man/man8/ip-netns.8
+++ b/man/man8/ip-netns.8
@@ -29,7 +29,7 @@ ip-netns \- process network namespace management
 
 .ti -8
 .BR "ip netns exec "
-.I NETNSNAME command ...
+.RI "{ " NETNSNAME " | " all " } " command ...
 
 .ti -8
 .BR "ip netns monitor"
@@ -98,7 +98,7 @@ This command walks through proc and finds all of the process who have
 the named network namespace as their primary network namespace.
 
 .TP
-.B ip netns exec NAME cmd ... - Run cmd in the named network namespace
+.B ip netns exec { NAME | all } cmd ... - Run cmd in the named network namespace
 .sp
 This command allows applications that are network namespace unaware
 to be run in something other than the default network namespace with
@@ -107,6 +107,16 @@ in the customary global locations.  A network namespace and bind mounts
 are used to move files from their network namespace specific location
 to their default locations without affecting other processes.
 
+If
+.B all
+was specified then
+.B cmd
+will be executed synchronously on the each named network namespace even if
+.B cmd
+fails on some of them. Network namespace name is printed on each
+.B cmd
+executing.
+
 .TP
 .B ip netns monitor - Report as network namespace names are added and deleted
 .sp
-- 
2.1.3

^ permalink raw reply related

* [PATCH iproute2 1/3] lib: Exec func on each netns
From: Vadim Kochan @ 2015-01-07 11:04 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1420628662-9930-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Added possibility to run some func on each netns.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 include/namespace.h |  6 ++++++
 include/utils.h     |  5 +++++
 lib/namespace.c     | 22 ++++++++++++++++++++++
 lib/utils.c         | 28 ++++++++++++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/include/namespace.h b/include/namespace.h
index 2f13e65..1a7e066 100644
--- a/include/namespace.h
+++ b/include/namespace.h
@@ -42,5 +42,11 @@ static int setns(int fd, int nstype)
 #endif /* HAVE_SETNS */
 
 extern int netns_switch(char *netns);
+extern int netns_foreach(int (*func)(char *nsname, void *arg), void *arg);
+
+struct netns_func {
+	int (*func)(char *nsname, void *arg);
+	void *arg;
+};
 
 #endif /* __NAMESPACE_H__ */
diff --git a/include/utils.h b/include/utils.h
index eecbc39..3196561 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -5,6 +5,7 @@
 #include <asm/types.h>
 #include <resolv.h>
 #include <stdlib.h>
+#include <stdbool.h>
 
 #include "libnetlink.h"
 #include "ll_map.h"
@@ -160,4 +161,8 @@ struct iplink_req;
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
 		char **name, char **type, char **link, char **dev,
 		int *group, int *index);
+
+extern int do_each_netns(int (*func)(char *nsname, void *arg), void *arg,
+		bool show_label);
+
 #endif /* __UTILS_H__ */
diff --git a/lib/namespace.c b/lib/namespace.c
index 1554ce0..a0f7c6d 100644
--- a/lib/namespace.c
+++ b/lib/namespace.c
@@ -84,3 +84,25 @@ int netns_switch(char *name)
 	bind_etc(name);
 	return 0;
 }
+
+int netns_foreach(int (*func)(char *nsname, void *arg), void *arg)
+{
+	DIR *dir;
+	struct dirent *entry;
+
+	dir = opendir(NETNS_RUN_DIR);
+	if (!dir)
+		return -1;
+
+	while ((entry = readdir(dir)) != NULL) {
+		if (strcmp(entry->d_name, ".") == 0)
+			continue;
+		if (strcmp(entry->d_name, "..") == 0)
+			continue;
+		if (func(entry->d_name, arg))
+			break;
+	}
+
+	closedir(dir);
+	return 0;
+}
diff --git a/lib/utils.c b/lib/utils.c
index 64915f3..e3ca11b 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -31,6 +31,7 @@
 
 
 #include "utils.h"
+#include "namespace.h"
 
 int timestamp_short = 0;
 
@@ -868,3 +869,30 @@ int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6)
 	else
 		return inet_pton(AF_INET, src, dst);
 }
+
+static int on_netns(char *nsname, void *arg)
+{
+	struct netns_func *f = arg;
+
+	if (netns_switch(nsname))
+		return -1;
+
+	return f->func(nsname, f->arg);
+}
+
+static int on_netns_label(char *nsname, void *arg)
+{
+	printf("\nnetns: %s\n", nsname);
+	return on_netns(nsname, arg);
+}
+
+int do_each_netns(int (*func)(char *nsname, void *arg), void *arg,
+		bool show_label)
+{
+	struct netns_func nsf = { .func = func, .arg = arg };
+
+	if (show_label)
+		return netns_foreach(on_netns_label, &nsf);
+
+	return netns_foreach(on_netns, &nsf);
+}
-- 
2.1.3

^ permalink raw reply related

* [PATCH iproute2 0/3] ip netns: Run over all netns
From: Vadim Kochan @ 2015-01-07 11:04 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

From: Vadim Kochan <vadim4j@gmail.com>

Allow 'ip netns del' and 'ip netns exec' run over each network namespace names.

'ip netns exec' executes command forcely on eacn nsname.

Vadim Kochan (3):
  lib: Exec func on each netns
  ip netns: Allow exec on each netns
  ip netns: Delete all netns

 include/namespace.h |  6 ++++
 include/utils.h     |  5 +++
 ip/ipnetns.c        | 96 ++++++++++++++++++++++++++++++++---------------------
 lib/namespace.c     | 22 ++++++++++++
 lib/utils.c         | 28 ++++++++++++++++
 man/man8/ip-netns.8 | 26 ++++++++++++---
 6 files changed, 141 insertions(+), 42 deletions(-)

-- 
2.1.3

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Thomas Graf @ 2015-01-07 11:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	netdev@vger.kernel.org, dev@openvswitch.org
In-Reply-To: <CAADnVQ+EO34qYbK+sji-Da3CMkn+-V-9Uvdar_WgxRQ4_+JFXA@mail.gmail.com>

On 01/06/15 at 07:37pm, Alexei Starovoitov wrote:
> Even it works ok, I think this struct layout is ugly.
> imo would be much easier to read if you replace
> the whole vxlanhdr with vxlanhdr_gbp
> or split vxlanhdr into two 32-bit structs.
> then __packed hacks won't be needed.

The main reason why I merged it into vxlanhdr is for documentation
purposes and to avoid duplicating the generic VXLAN header for every
extension. The RCO and GPE extensions would need to duplicate this
over and over. It gets messy in particular when multiple extensions
can be used in combination (such as GBP and RCO) which then each
have their own conflicting header definitions. This way, it is clear
which extensions are compatible by just looking at the definition
of the structure.

That said, I'm not married to this idea.

^ permalink raw reply

* Re: [PATCH] wireless extensions should default to Y
From: Johannes Berg @ 2015-01-07 11:06 UTC (permalink / raw)
  To: Pavel Machek
  Cc: jikos-AlSwsSmVLrQ, kernel list, Linus Torvalds,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150107110323.GB7368@amd>

On Wed, 2015-01-07 at 12:03 +0100, Pavel Machek wrote:
> On Wed 2015-01-07 11:41:55, Johannes Berg wrote:
> > On Wed, 2015-01-07 at 11:19 +0100, Pavel Machek wrote:
> > > Do we need following patch on top of
> > > 24a0aa212ee2dbe44360288684478d76a8e20a0a revert?
> > > 
> > > I updated kernel today, and (probably because extensions were not
> > > selectable before), the default choice was "N", which is wrong:
> > > oldconfig should result in compatible choices being made, for example
> > > to help bisect.
> > 
> > I don't believe we need this. It has been defaulting to N for a long
> > time, it's just that it got thrown out of your config due to building
> > inbetween the patch and its revert. Had you built only before and after
> > that wouldn't be an issue.
> 
> Well, I clearly hit the issue. If someone had _not_ build in between,
> the "default y" does not change anything for him, as he will not be
> asked thequestion. If someone starts config from scratch, Y is the
> safe answer.

We've long stated that if really starting from scratch it's the wrong
thing to do and had a default N, so I don't really know why we'd revert
that only to make a short period of bisect slightly happier?

I really don't want to merge this patch and encourage more people to
build with wext enabled.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] wireless extensions should default to Y
From: Pavel Machek @ 2015-01-07 11:03 UTC (permalink / raw)
  To: Johannes Berg
  Cc: jikos, kernel list, Linus Torvalds, davem, linux-wireless, netdev
In-Reply-To: <1420627315.3407.2.camel@sipsolutions.net>

On Wed 2015-01-07 11:41:55, Johannes Berg wrote:
> On Wed, 2015-01-07 at 11:19 +0100, Pavel Machek wrote:
> > Do we need following patch on top of
> > 24a0aa212ee2dbe44360288684478d76a8e20a0a revert?
> > 
> > I updated kernel today, and (probably because extensions were not
> > selectable before), the default choice was "N", which is wrong:
> > oldconfig should result in compatible choices being made, for example
> > to help bisect.
> 
> I don't believe we need this. It has been defaulting to N for a long
> time, it's just that it got thrown out of your config due to building
> inbetween the patch and its revert. Had you built only before and after
> that wouldn't be an issue.

Well, I clearly hit the issue. If someone had _not_ build in between,
the "default y" does not change anything for him, as he will not be
asked thequestion. If someone starts config from scratch, Y is the
safe answer.

> If we default to Y now we send the wrong signal. If you really need to
> bisect something wext related you have far bigger issues I'd think, the
> code hasn't changed *forever*.

I am woried about bisecting something unrelated, and then wext coming
and breaking the bisect. But you are right that it will break the
bisect, anyway...

                                                                        Pavel
									
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Thomas Graf @ 2015-01-07 11:01 UTC (permalink / raw)
  To: David Laight
  Cc: 'Alexei Starovoitov', David S. Miller, Jesse Gross,
	Stephen Hemminger, Pravin Shelar, netdev@vger.kernel.org,
	dev@openvswitch.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1CAC2740@AcuExch.aculab.com>

On 01/07/15 at 10:03am, David Laight wrote:
> From: Alexei Starovoitov
> > On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > > +struct vxlan_gbp {
> > > +#ifdef __LITTLE_ENDIAN_BITFIELD
> > > +       __u8    reserved_flags1:3,
> > ...
> > > +       __be16 policy_id;
> > > +} __packed;
> > 
> > are you sure that compiler will be smart enough
> > to do single short load when you pack
> > u8 + struct vxlan_gbp inside struct vxlanhdr ?
> > I suspect compiler will use two byte loads
> > with shifts and ors every time to access policy_id.
> > Even it works ok, I think this struct layout is ugly.
> > imo would be much easier to read if you replace
> > the whole vxlanhdr with vxlanhdr_gbp
> > or split vxlanhdr into two 32-bit structs.
> > then __packed hacks won't be needed.

If I read objdump output correctly, gcc seems fine with it:

        /* For backwards compatibility, only allow reserved fields to be
         * used by VXLAN extensions if explicitly requested.
         */
        if (vs->exts) {
                if (!vxh->vni_present)
    2640:       41 0f b6 55 08          movzbl 0x8(%r13),%edx
    2645:       f6 c2 08                test   $0x8,%dl
    2648:       74 c2                   je     260c <vxlan_udp_encap_recv+0x9c>
    [...]
                        md.gbp = ntohs(vxh->gbp.policy_id);
    2652:       41 0f b7 55 0a          movzwl 0xa(%r13),%edx

Let me know what I have to do/provide to validate this properly.

> Also, if you are writing the values then you need to write
> all the members of the bitfield in order to get a single write.
> 
> Basically you are much better off using explicit masks.

I went back and forth on this and chose to use individual bit fields
and map them to a static bit definition which is exported via Netlink.
That way the user space Netlink interface remains stable should the
wire protocol ever change. Yes, this implies some branching which could
be avoided right now as long as user and wire protocol are identical. I
did not observe any performance differences in benchmarks though.

^ permalink raw reply

* Re: ipv6: oops in datagram.c line 260
From: Hannes Frederic Sowa @ 2015-01-07 10:45 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Chris Ruehl, netdev, davem
In-Reply-To: <20150107072254.GA13046@secunet.com>

On Mi, 2015-01-07 at 08:22 +0100, Steffen Klassert wrote:
> On Tue, Jan 06, 2015 at 05:01:13PM +0100, Hannes Frederic Sowa wrote:
> > 
> > xfrm6_output_finish unconditionally resets skb->protocol so we try to
> > dispatch to the IPv6 handler, even though tcp just sends an IPv4 packet.
> 
> Maybe we better dispatch based on sk->sk_family, this should give
> always the right address family of the socket.

The original problem was dealing with IPv4/v6 mapped traffic. Processing
local errors from unconnected UDP sockets which are emitting both IPv4
and IPv6 frames won't play nicely with sk->sk_family I fear.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Hannes Frederic Sowa @ 2015-01-07 10:43 UTC (permalink / raw)
  To: Rahul Sharma; +Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel
In-Reply-To: <CAFB3abwKpR1VqZXUrsexqveAZrA2LTSmTe8a=-H7z2HjAptjwA@mail.gmail.com>

Hi,

On Mi, 2015-01-07 at 11:11 +0530, Rahul Sharma wrote:
> On Wed, Jan 7, 2015 at 4:17 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Jan 07, 2015 at 03:03:20AM +0530, Rahul Sharma wrote:
> >> ipv6_find_hdr() currently assumes that the next-header field in the
> >> fragment header of the non-first fragment is the "protocol number of
> >> the last header" (here last header excludes any extension header
> >> protocol numbers ) which is incorrect as per RFC2460. The next-header
> >> value is the first header of the fragmentable part of the original
> >> packet (which can be extension header as well).
> >> This can create reassembly problems. For example: Fragmented
> >> authenticated OSPFv3 packets (where AH header is inserted before the
> >> protocol header). For the second fragment, the next header value in
> >> the fragment header will be NEXTHDR_AUTH which is correct but
> >> ipv6_find_hdr will return ENOENT since AH is an extension header
> >> resulting in second fragment getting dropped. This check for the
> >> presence of non-extension header needs to be removed.
> >>
> >> Signed-off-by: Rahul Sharma <rsharma@arista.com>
> >> ---
> >> --- linux-3.18.1/net/ipv6/exthdrs_core.c.orig   2015-01-06
> >> 10:25:36.411419863 -0800
> >> +++ linux-3.18.1/net/ipv6/exthdrs_core.c        2015-01-06
> >> 10:51:45.819364986 -0800
> >> @@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
> >>   * If the first fragment doesn't contain the final protocol header or
> >>   * NEXTHDR_NONE it is considered invalid.
> >>   *
> >> - * Note that non-1st fragment is special case that "the protocol number
> >> - * of last header" is "next header" field in Fragment header. In this case,
> >> - * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
> >> - * isn't NULL.
> >> + * Note that non-1st fragment is special case that "the protocol number of the
> >> + * first header of the fragmentable part of the original packet" is
> >> + * "next header" field in the Fragment header. In this case, *offset is
> >> + * meaningless and fragment offset is stored in *fragoff if fragoff isn't
> >> + * NULL.
> >>   *
> >>   * if flags is not NULL and it's a fragment, then the frag flag
> >>   * IP6_FH_F_FRAG will be set. If it's an AH header, the
> >> @@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *
> >>
> >>                         _frag_off = ntohs(*fp) & ~0x7;
> >>                         if (_frag_off) {
> >> -                               if (target < 0 &&
> >> -                                   ((!ipv6_ext_hdr(hp->nexthdr)) ||
> >
> > This check assumes that the following headers cannot show up in the
> > fragmented part of the IPv6 packet:
> >
> >  12 bool ipv6_ext_hdr(u8 nexthdr)
> >  13 {
> >  14         /*
> >  15          * find out if nexthdr is an extension header or a protocol
> >  16          */
> >  17         return   (nexthdr == NEXTHDR_HOP)       ||
> >  18                  (nexthdr == NEXTHDR_ROUTING)   ||
> >  19                  (nexthdr == NEXTHDR_FRAGMENT)  ||
> >  20                  (nexthdr == NEXTHDR_AUTH)      ||
> >  21                  (nexthdr == NEXTHDR_NONE)      ||
> >  22                  (nexthdr == NEXTHDR_DEST);
> >
> >> -                                    hp->nexthdr == NEXTHDR_NONE)) {
> >> +                               if (target < 0) {
> >>                                         if (fragoff)
> >>                                                 *fragoff = _frag_off;
> >>                                         return hp->nexthdr;
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> I think this is incorrect. Authentication header shows up in the
> fragmentable part of the original IPv6 packet. So, for the non-first
> fragments the next-header field value can be NEXTHDR_AUTH.

Pablo's mail got me thinking again.

In general, IPv6 extension headers can appear in any order and stacks
must be process them. Fragmentation adds a limitation, that some
extension headers do not make sense and don't have any effect if they
appear after a fragmentation header (HbH and ROUTING).

Looking at the rest of the function we don't check for HBHHDR or RTHDR
following a fragmentation header either if we process the first fragment
(core stack only processes HBH if directly following the ipv6 header
anyway).

So, in my opinion, it is safe to completely remove this check and it
would align if the rest of the extension processing logic. The callers
all seem fine with that.

Pablo, what do you think?

Anyway, the patch does not apply cleanly, the patch header is mangled.
Could you check and send again?

Thanks,
Hannes

^ permalink raw reply

* Re: [PATCH] wireless extensions should default to Y
From: Johannes Berg @ 2015-01-07 10:41 UTC (permalink / raw)
  To: Pavel Machek
  Cc: jikos-AlSwsSmVLrQ, kernel list, Linus Torvalds,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150107101903.GA6563@amd>

On Wed, 2015-01-07 at 11:19 +0100, Pavel Machek wrote:
> Do we need following patch on top of
> 24a0aa212ee2dbe44360288684478d76a8e20a0a revert?
> 
> I updated kernel today, and (probably because extensions were not
> selectable before), the default choice was "N", which is wrong:
> oldconfig should result in compatible choices being made, for example
> to help bisect.

I don't believe we need this. It has been defaulting to N for a long
time, it's just that it got thrown out of your config due to building
inbetween the patch and its revert. Had you built only before and after
that wouldn't be an issue.

If we default to Y now we send the wrong signal. If you really need to
bisect something wext related you have far bigger issues I'd think, the
code hasn't changed *forever*.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] rhashtable: initialize atomic nelems variable
From: Thomas Graf @ 2015-01-07 10:36 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <54AD09CD.3010100@windriver.com>

On 01/07/15 at 06:26pm, Ying Xue wrote:
> I am not sure whether we really need to reinitialize atomic variable
> again although we have reset it with memset() or something else. But I
> see many places in kernel where we do this, for example:
> 
> Although we use kmem_cache_zalloc() to allocate "net" structure instance
> in net_alloc(), there are still several places where to reinitialize its
> atomic variables again:
> 
> setup_net()
>   atomic_set(&net->use_count, 0);
> 
> rt_genid_init()
>   atomic_set(&net->ipv4.rt_genid, 0);
>   atomic_set(&net->fnhe_genid, 0);
> 
> Can you please definitely confirm that the reinitialisation is redundant
> for us?

I see examples for both, explicit initialization and dependence on
a previous memset. I'm not sure what is the preferred way.

I'll provide my ACK since this obviously doesn't break anything and
leave it up to Dave.

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* Re: [PATCH iproute2 -next] ip: route: add congestion control metric
From: Vadim Kochan @ 2015-01-07 10:22 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Stephen Hemminger, fw, netdev
In-Reply-To: <54AD08A9.9000205@redhat.com>

On Wed, Jan 07, 2015 at 11:21:29AM +0100, Daniel Borkmann wrote:
> On 01/07/2015 02:09 AM, Stephen Hemminger wrote:
> >On Wed,  7 Jan 2015 00:52:37 +0100
> >Daniel Borkmann <dborkman@redhat.com> wrote:
> >
> >>+		} else if (matches(*argv, "congctl") == 0) {
> >>+			char cc[16];
> >>+			NEXT_ARG();
> >>+			memset(cc, 0, sizeof(cc));
> >>+			if (strcmp(*argv, "lock") == 0) {
> >>+				mxlock |= (1<<RTAX_CC_ALGO);
> >
> >Unneeded paren
> 
 And what about spaces arount "<<" ?

^ permalink raw reply

* Re: IPsec workshop at netdev01?
From: Steffen Klassert @ 2015-01-07 10:31 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Jamal Hadi Salim, Herbert Xu, David Miller
In-Reply-To: <20150106170026.GD11324@breakpoint.cc>

On Tue, Jan 06, 2015 at 06:00:26PM +0100, Florian Westphal wrote:
> Steffen Klassert <steffen.klassert@secunet.com> wrote:
> > - We still lack a 32/64 bit compatibiltiy layer for IPsec, this issue
> >   comes up from time to time. Some solutions were proposed in the past
> >   but all had problems. The current behaviour is broken if someone tries
> >   to configure IPsec with 32 bit tools on a 64 bit machine. Can we get
> >   this right somehow or is it better to just return an error in this case?
> 
> FWIW I think
> http://patchwork.ozlabs.org/patch/49465/
> 
> came closest to achieving full CONFIG_COMPAT support; since netlink is
> no longer async now I'm not sure we'd still need additonal 32-compat syscalls
> to make compat work for all cases.
> 
> So "its ugly as hell" is probably the only problem that is hard to avoid ;-)

Yeah, and it will be no fun to maintain it...

So the question is still, do we really need/want it or should we
tell that this is not supported. We just can't leave it as it is.
We allow to configure with 32 bit tools, but the result is crap.

^ permalink raw reply

* Re: [PATCH net-next v2 5/7] rhashtable: avoid unnecessary wakeup for worker queue
From: Ying Xue @ 2015-01-07 10:28 UTC (permalink / raw)
  To: Thomas Graf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <20150107095013.GJ21820@casper.infradead.org>

On 01/07/2015 05:50 PM, Thomas Graf wrote:
> On 01/07/15 at 01:41pm, Ying Xue wrote:
>> Move condition statements of verifying whether hash table size exceeds
>> its maximum threshold or reaches its minimum threshold from resizing
>> functions to resizing decision functions, avoiding unnecessary wakeup
>> for worker queue thread.
>>
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
> 
> Good optimization, thanks!
> 
> Acked-by: Thomas Graf <tgraf@suug.ch>
> 
> Can you do a follow-up patch and add a note in rhashtable.h to
> indicate the implementation of the grow and shrink decision function
> must enforce min/max shift?
> 
> 

Thanks for the reminder, and I will do this later.

Regards,
Ying


------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net

^ permalink raw reply

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
From: Thomas Graf @ 2015-01-07 10:27 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev@openvswitch.org
In-Reply-To: <CA+mtBx_mp1mf_HPp5u9wDfgeOc8pt26Mf1VHFZTAqDhTdZe7cw@mail.gmail.com>

On 01/06/15 at 07:46pm, Tom Herbert wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > The VXLAN receive code is currently conservative in what it accepts and
> > will reject any frame that uses any of the reserved VXLAN protocol fields.
> > The VXLAN draft specifies that "reserved fields MUST be set to zero on
> > transmit and ignored on receive.".
> >
> IMO it is an unfortunate decision in VXLAN to ignore set reserved
> fields on receive. Had they not done this, then adding a protocol
> field to the header would have been feasible and we wouldn't need yet
> another encapsulation protocol (i.e. VXLAN-GPE). Rejecting frames with
> reserved bits set is the better behavior, but I think the comment
> about this needs to be clear about why this diverges from RFC7348.

Agreed. Do you want to give it a shot? I know you have been involved
on all aspects of this topic for a long time in NVO3 ;-)

> > Retain the current conservative parsing behaviour by default but allows
> > these fields to be used by VXLAN extensions which are explicitly enabled
> > on the VXLAN socket respectively VXLAN net_device.
> >
> Conceptually, VXLAN has both mandatory flags and optional flags for
> extensions. You may want to look at the VXLAN RCO patches that added
> an extension and infrastructure for them.

I've seen your proposed changes. I think they could be easily rebased
on top of this and use the enablement infrastructure. In fact, I see
this as the only feasible option to deal with VXLAN extensions: Leave
it to the user to decide which extensions to run. That way we can
support any combination of extensions, even conflicting ones. All we
have to do is catch incompatible extension configurations on a socket
and reject them at configuration time. Expecting that NVO3/IETF will
find consensus on a list of compatible set of extensions with perfect
forward and backwards compatibility is unrealistic in my eyes. We have
Geneve and GUE to solve it properly in the future.

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] rhashtable: initialize atomic nelems variable
From: Ying Xue @ 2015-01-07 10:26 UTC (permalink / raw)
  To: Thomas Graf
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <20150107095332.GK21820@casper.infradead.org>

On 01/07/2015 05:53 PM, Thomas Graf wrote:
> On 01/07/15 at 01:41pm, Ying Xue wrote:
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
> 
> Is this really needed at all? We initialize the full rhashtable
> struct to 0 in rhashtable_init().
> 
> 

I am not sure whether we really need to reinitialize atomic variable
again although we have reset it with memset() or something else. But I
see many places in kernel where we do this, for example:

Although we use kmem_cache_zalloc() to allocate "net" structure instance
in net_alloc(), there are still several places where to reinitialize its
atomic variables again:

setup_net()
  atomic_set(&net->use_count, 0);

rt_genid_init()
  atomic_set(&net->ipv4.rt_genid, 0);
  atomic_set(&net->fnhe_genid, 0);

Can you please definitely confirm that the reinitialisation is redundant
for us?

Regards,
Ying

^ permalink raw reply

* Re: [PATCH iproute2 -next] ip: route: add congestion control metric
From: Daniel Borkmann @ 2015-01-07 10:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: fw, netdev
In-Reply-To: <20150106170926.5bb2d816@urahara>

On 01/07/2015 02:09 AM, Stephen Hemminger wrote:
> On Wed,  7 Jan 2015 00:52:37 +0100
> Daniel Borkmann <dborkman@redhat.com> wrote:
>
>> +		} else if (matches(*argv, "congctl") == 0) {
>> +			char cc[16];
>> +			NEXT_ARG();
>> +			memset(cc, 0, sizeof(cc));
>> +			if (strcmp(*argv, "lock") == 0) {
>> +				mxlock |= (1<<RTAX_CC_ALGO);
>
> Unneeded paren

Yep, I kept it consistent across all mxlock assignments of this file,
but I can remove it, sure.

>> +				NEXT_ARG();
>> +			}
>> +			strncpy(cc, *argv, sizeof(cc) - 1);
>> +			if (strlen(cc) == 0)
>> +				invarg("\"conctl\" value must be an algorithm name\n", *argv
>
> Silently truncating the string is not odd. Can't we just let kernel impose
> length restrictions.

Sure, will respin, thanks.

^ permalink raw reply

* [PATCH] wireless extensions should default to Y
From: Pavel Machek @ 2015-01-07 10:19 UTC (permalink / raw)
  To: jikos, kernel list, Linus Torvalds
  Cc: johannes, davem, linux-wireless, netdev

Do we need following patch on top of
24a0aa212ee2dbe44360288684478d76a8e20a0a revert?

I updated kernel today, and (probably because extensions were not
selectable before), the default choice was "N", which is wrong:
oldconfig should result in compatible choices being made, for example
to help bisect.

Signed-off-by: Pavel Machek <pavel@ucw.cz>

diff --git a/net/wireless/Kconfig b/net/wireless/Kconfig
index 29c8675..8951dba 100644
--- a/net/wireless/Kconfig
+++ b/net/wireless/Kconfig
@@ -177,6 +177,7 @@ config CFG80211_INTERNAL_REGDB
 config CFG80211_WEXT
 	bool "cfg80211 wireless extensions compatibility"
 	depends on CFG80211
+	default y
 	select WEXT_CORE
 	help
 	  Enable this option if you need old userspace for wireless


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply related

* Re: [PATCH net-next] net: sched: use pinned timers
From: Cosmin GIRADU @ 2015-01-07 10:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <20140926.002710.1595423408687413602.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1786 bytes --]

On 26/09/14 07:27, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 20 Sep 2014 18:01:30 -0700
>
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While using a MQ + NETEM setup, I had confirmation that the default
>> timer migration ( /proc/sys/kernel/timer_migration ) is killing us.
>>
>> Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
>> queues) :
>  ...
>> Current Qdiscs that benefit from this change are :
>>
>> 	netem, cbq, fq, hfsc, tbf, htb.
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Looks great, applied, thanks Eric.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Hi Eric,

    I saw that this patch didn't make it's way into the stable branches.
So I have two questions:
        - Would it be safe to apply to linux-3.12.x stable?
        - If yes, would there be any [noticeable] efects on a [pretty
complex] HTB setup? (I know, test and I'll see,
           but if theory sais I won't, then there would be no point to
the test, would there?)

Thank you!

-- 

Cosmin GIRADU
OSS Engineer
RCS & RDS - Unified Services
Phone:  +40-31-400-6323
Mobile: +40-77-020-0858
http://www.rcs-rds.ro

..........................................................................
Privileged/Confidential Information may be contained in this message. If
you are not the addressee indicated in this message (or responsible for
delivery of the message to such person), you may not copy or deliver
this message to anyone. In such a case, you should destroy this message
and kindly notify the sender by reply e-mail.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply

* RE: [PATCH 1/1] update ip-sysctl.txt documentation
From: David Laight @ 2015-01-07 10:11 UTC (permalink / raw)
  To: 'Ani Sinha', corbet@lwn.net, davem@davemloft.net,
	edumazet@google.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, P@draigbrady.com,
	netdev@vger.kernel.org, fruggeri@arista.com
In-Reply-To: <1420587382-24612-1-git-send-email-ani@arista.com>

From: Ani Sinha
> Update documentation to reflect the fact that
> /proc/sys/net/ipv4/route/max_size is no longer used for ipv4.
> 
> Signed-off-by: Ani Sinha <ani@arista.com>
> ---
>  Documentation/networking/ip-sysctl.txt |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 9bffdfc..2a28261 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
>  	Default: 0
> 
>  route/max_size - INTEGER
> -	Maximum number of routes allowed in the kernel.  Increase
> -	this when using large numbers of interfaces and/or routes.
> +	Post linux kernel 3.6, this is deprecated for ipv4 as route cache is no
> +	longer used. For ipv6, this is used to limit the maximum number of ipv6
> +	routes allowed in the kernel.  Increase this when using large numbers of
> +	interfaces and/or routes.

Now imagine you are reading ip-sysctl.txt, the new text is all wrong.
Something like:
	Limit on the size of the ip route caches.
	Ignored for ipv4 after kernel 3.6 because the ipv4 route cache was removed.
	Increase this when using large numbers of interfaces and/or routes.
would read better.

	David

^ permalink raw reply

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: Or Gerlitz @ 2015-01-07 10:07 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek
In-Reply-To: <54AADEFF.3090306@gmail.com>

On Mon, Jan 5, 2015 at 8:59 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
>>> +                                          struct net_device *dev,
>>> +                                          u32 portid, int seq, u8 cmd)
>>> +{
>>> +       struct genlmsghdr *hdr;
>>> +       struct sk_buff *skb;
>>> +       int err = -ENOBUFS;
>>> +
>>> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>
>>
>> genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>
>
> fixed along with the other cases.

small nit here, net_flow_build_actions_msg can be made static, it's
called only from within this file

few more nits... checkpatch --strict produces bunch of "CHECK: Please
use a blank line after function/struct/union/enum declarations"
comments, I guess worth fixing too.

^ permalink raw reply

* Re: [RFC PATCH] unlock rtnl mutex in ic_open_devs while waiting
From: Maarten Lankhorst @ 2015-01-07 10:06 UTC (permalink / raw)
  To: David Miller
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel,
	Jay Cliburn, Chris Snook, Sabrina Dubroca, Johannes Berg
In-Reply-To: <20150106.172152.2034122593693302134.davem@davemloft.net>

Op 06-01-15 om 23:21 schreef David Miller:
> From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Date: Mon, 05 Jan 2015 14:52:06 +0100
>
>> This fixes a deadlock with alx_link_check, which takes the rtnl_mutex in
>> a work item to check the link.
>>
>> I have no idea whether alx should be fixed or ipconfig.c,
>> but this saves 120 seconds off my boot time. ;-)
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> I genuinely think that alx_link_check() needs to use a smaller hammer
> to do it's locking, there is no reason to use the RTNL mutex.
>
> A driver private mutex will probably work just as well and not have
> this problem.

I guess alx_check_link uses the rtnl_lock for serializing against any possible alx_reset call.
The alternative is stopping check_link work before running anything that changes the device state.
Does the below patch look sane instead?
---
diff --git a/drivers/net/ethernet/atheros/alx/alx.h b/drivers/net/ethernet/atheros/alx/alx.h
index 8fc93c5f6abc..354f155b3144 100644
--- a/drivers/net/ethernet/atheros/alx/alx.h
+++ b/drivers/net/ethernet/atheros/alx/alx.h
@@ -88,6 +88,8 @@ struct alx_priv {
 		unsigned int size;
 	} descmem;
 
+	bool stop_link_check;
+
 	/* protect int_mask updates */
 	spinlock_t irq_lock;
 	u32 int_mask;
diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index e398eda07298..ae93b8052cbf 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -315,7 +316,8 @@ static irqreturn_t alx_intr_handle(struct alx_priv *alx, u32 intr)
 		 */
 		alx->int_mask &= ~ALX_ISR_PHY;
 		write_int_mask = true;
-		alx_schedule_link_check(alx);
+		if (!alx->stop_link_check)
+			alx_schedule_link_check(alx);
 	}
 
 	if (intr & (ALX_ISR_TX_Q0 | ALX_ISR_RX_Q0)) {
@@ -742,6 +744,17 @@ static netdev_features_t alx_fix_features(struct net_device *netdev,
 	return features;
 }
 
+static void alx_disable_link_check(struct alx_priv *alx)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&alx->irq_lock, flags);
+	alx->stop_link_check = true;
+	spin_unlock_irqrestore(&alx->irq_lock, flags);
+
+	cancel_work_sync(&alx->link_check_wk);
+}
+
 static void alx_netif_stop(struct alx_priv *alx)
 {
 	alx->dev->trans_start = jiffies;
@@ -756,6 +769,7 @@ static void alx_halt(struct alx_priv *alx)
 {
 	struct alx_hw *hw = &alx->hw;
 
+	alx_disable_link_check(alx);
 	alx_netif_stop(alx);
 	hw->link_speed = SPEED_UNKNOWN;
 	hw->duplex = DUPLEX_UNKNOWN;
@@ -788,6 +802,7 @@ static void alx_activate(struct alx_priv *alx)
 	/* clear old interrupts */
 	alx_write_mem32(&alx->hw, ALX_ISR, ~(u32)ALX_ISR_DIS);
 
+	alx->stop_link_check = false;
 	alx_irq_enable(alx);
 
 	alx_schedule_link_check(alx);
@@ -850,6 +865,7 @@ static int __alx_open(struct alx_priv *alx, bool resume)
 	/* clear old interrupts */
 	alx_write_mem32(&alx->hw, ALX_ISR, ~(u32)ALX_ISR_DIS);
 
+	alx->stop_link_check = false;
 	alx_irq_enable(alx);
 
 	if (!resume)
@@ -966,9 +982,7 @@ static void alx_link_check(struct work_struct *work)
 
 	alx = container_of(work, struct alx_priv, link_check_wk);
 
-	rtnl_lock();
 	alx_check_link(alx);
-	rtnl_unlock();
 }
 

^ permalink raw reply related

* RE: [PATCH 2/6] vxlan: Group Policy extension
From: David Laight @ 2015-01-07 10:03 UTC (permalink / raw)
  To: 'Alexei Starovoitov', Thomas Graf
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	netdev@vger.kernel.org, dev@openvswitch.org
In-Reply-To: <CAADnVQ+EO34qYbK+sji-Da3CMkn+-V-9Uvdar_WgxRQ4_+JFXA@mail.gmail.com>

From: Alexei Starovoitov
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > +struct vxlan_gbp {
> > +#ifdef __LITTLE_ENDIAN_BITFIELD
> > +       __u8    reserved_flags1:3,
> ...
> > +       __be16 policy_id;
> > +} __packed;
> 
> are you sure that compiler will be smart enough
> to do single short load when you pack
> u8 + struct vxlan_gbp inside struct vxlanhdr ?
> I suspect compiler will use two byte loads
> with shifts and ors every time to access policy_id.
> Even it works ok, I think this struct layout is ugly.
> imo would be much easier to read if you replace
> the whole vxlanhdr with vxlanhdr_gbp
> or split vxlanhdr into two 32-bit structs.
> then __packed hacks won't be needed.

Also, if you are writing the values then you need to write
all the members of the bitfield in order to get a single write.

Basically you are much better off using explicit masks.

	David


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox