Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] Prevent reading uninitialized memory with socket filters
From: Joe Perches @ 2010-11-09 23:03 UTC (permalink / raw)
  To: Dan Rosenberg; +Cc: netdev, stable, security
In-Reply-To: <1289341724.7380.13.camel@dan>

On Tue, 2010-11-09 at 17:28 -0500, Dan Rosenberg wrote:
> The "mem" array used as scratch space for socket filters is not
> initialized, allowing unprivileged users to leak kernel stack bytes.

Hi Dan.

Using
	type var[count] = {};
instead of
	type var[count];
	...
	memset(var, 0, sizeof(var));

at least for gcc 4.4 and 4.5 generally results in smaller code.

$ size net/core/filter.o*
   text	   data	    bss	    dec	    hex	filename
   6751	     56	   1736	   8543	   215f	net/core/filter.o.memset
   6749	     56	   1736	   8541	   215d	net/core/filter.o.init



^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Ben Greear @ 2010-11-09 22:49 UTC (permalink / raw)
  To: Luke Hutchison; +Cc: netdev
In-Reply-To: <AANLkTi=LGx2mbYT5gRMV0izg5=KY7pCKOuXnwPRNQDMK@mail.gmail.com>

On 11/09/2010 02:38 PM, Luke Hutchison wrote:
> On Tue, Nov 9, 2010 at 5:29 PM, Ben Greear<greearb@candelatech.com>  wrote:
>> If you get all names resolved with your caching name-server, can you then
>> open the browser tabs w/out problem?
>
> This is hard to test, because to get all the same domain names
> resolved for all resources on all pages, I have to successfully open
> all the pages once first.  Even opening the pages a few seconds apart
> seems to break things quite frequently.  And there is a period where
> the connection starts acting up but is not hard locked up, and it's
> hard to know at that point if it's the connection or the individual
> website.  The only way I can think of of reliably triggering this 100%
> of the time is to open a bunch of browser tabs all at the same time --
> and that hangs the dns caching server's requests too.
>
>> Have you tried setting all your browser tabs to simple low-bandwidth pages (no ads being
>> served from various hosts, etc) to see if that works?
>
> Not exactly, but I have one browser window with about 20 Wikipedia
> articles open, and not all of them load (some get stalled until they
> time out).  I think this serves the same purpose as your suggested
> test, because Wikipedia doesn't draw from many external domains.
>
>> Maybe you are just flooding the network so hard that responses are being
>> dropped?
>
> Yes, but you pointed out earlier that you routinely test with
> thousands of TCP connections, and we're only talking about 20-30
> browser tabs here, maybe a few thousand HTTP requests at most.  Also,
> this used to work fine on old Fedora kernels and no longer works with
> more recent kernels.

Well, I'm low on ideas.

For our tests though, we are running across 1G Ethernet most of the time,
so bandwidth is not an issue.  Also, we aren't dependent on external DNS for
this type of test.

 From looking at your capture, you are not getting DNS responses back
reliably.  On the great wild internet, there are lots of reasons why
that might be happening, so without a more controlled test case, I'm
not sure anyone can help you.

It wouldn't be quick, but if you were able to do a git-bisect to figure
out which kernel change affected you, then that might be a start.

If there were a way for you to tune your TCP stack to run slower, that
might help too.  Maybe hard limit the max window size to something small like
8k?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Luke Hutchison @ 2010-11-09 22:38 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4CD9CB44.4030206@candelatech.com>

On Tue, Nov 9, 2010 at 5:29 PM, Ben Greear <greearb@candelatech.com> wrote:
> If you get all names resolved with your caching name-server, can you then
> open the browser tabs w/out problem?

This is hard to test, because to get all the same domain names
resolved for all resources on all pages, I have to successfully open
all the pages once first.  Even opening the pages a few seconds apart
seems to break things quite frequently.  And there is a period where
the connection starts acting up but is not hard locked up, and it's
hard to know at that point if it's the connection or the individual
website.  The only way I can think of of reliably triggering this 100%
of the time is to open a bunch of browser tabs all at the same time --
and that hangs the dns caching server's requests too.

> Have you tried setting all your browser tabs to simple low-bandwidth pages (no ads being
> served from various hosts, etc) to see if that works?

Not exactly, but I have one browser window with about 20 Wikipedia
articles open, and not all of them load (some get stalled until they
time out).  I think this serves the same purpose as your suggested
test, because Wikipedia doesn't draw from many external domains.

> Maybe you are just flooding the network so hard that responses are being
> dropped?

Yes, but you pointed out earlier that you routinely test with
thousands of TCP connections, and we're only talking about 20-30
browser tabs here, maybe a few thousand HTTP requests at most.  Also,
this used to work fine on old Fedora kernels and no longer works with
more recent kernels.

Thanks,
Luke

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Ben Greear @ 2010-11-09 22:29 UTC (permalink / raw)
  To: Luke Hutchison; +Cc: netdev
In-Reply-To: <AANLkTi=qgJcBUdhDpV8QdPPUO1o7QjZLUj07_C1B9Ygg@mail.gmail.com>

On 11/09/2010 02:20 PM, Luke Hutchison wrote:
> On Tue, Nov 9, 2010 at 5:14 PM, Ben Greear<greearb@candelatech.com>  wrote:
>> Have you tried using a different DNS server (open-dns?), or maybe a caching
>> one one
>> your local machine?  Maybe some part of your network is throwing away
>> some of your DNS requests since you send so many at once?
>
> I have tried Google's DNS as well as Comcast's, no difference in
> effect.  Also this has been a problem when I've been in the US,
> Portugal, Germany and China, so I have probably used a range of DNS
> servers.  I have tried nscd (it's off by default) and it has the
> expected behavior: if it resolves a name before re-opening the
> browser, then that name can continue to be resolved after the network
> gets flooded.  If I ask it to resolve a domain name after the network
> is flooded, the request times out, and subsequently nscd says the
> domain name doesn't exist, even after the network link has been
> restored to normal.

If you get all names resolved with your caching name-server, can you then
open the browser tabs w/out problem?

Have you tried setting all your browser tabs to simple low-bandwidth pages (no ads being
served from various hosts, etc) to see if that works?

Maybe you are just flooding the network so hard that responses are being
dropped?

Thanks,
Ben

>
> Thanks,
> Luke


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* [PATCH] Prevent reading uninitialized memory with socket filters
From: Dan Rosenberg @ 2010-11-09 22:28 UTC (permalink / raw)
  To: netdev; +Cc: stable, security

The "mem" array used as scratch space for socket filters is not
initialized, allowing unprivileged users to leak kernel stack bytes.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>

---
 net/core/filter.c               |    2 ++
 1 file changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 7beaec3..2749ba0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -121,6 +121,8 @@ unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int
 	int k;
 	int pc;
 
+	memset(mem, 0, sizeof(mem));
+
 	/*
 	 * Process array of filter instructions.
 	 */



^ permalink raw reply related

* Re: [PATCH] iproute2: add VF_PORT support
From: Stephen Hemminger @ 2010-11-09 22:22 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev, chrisw, scofeldm, arnd
In-Reply-To: <20101109222011.23322.24678.stgit@savbu-pc100.cisco.com>

On Tue, 09 Nov 2010 14:20:11 -0800
Roopa Prabhu <roprabhu@cisco.com> wrote:

> 			response == PORT_VDP_RESPONSE_SUCCESS ?
> +				"SUCCESS" :
> +			response == PORT_VDP_RESPONSE_INVALID_FORMAT ?
> +				"INVALID FORMAT" :
> +			response == PORT_VDP_RESPONSE_INSUFFICIENT_RESOURCES ?
> +				"INSUFFICIENT RESOURCES" :
> +			response == PORT_VDP_RESPONSE_UNUSED_VTID ?
> +				"UNUSED VTID" :
> +			response == PORT_VDP_RESPONSE_VTID_VIOLATION ?
> +				"VTID VIOLATION" :
> +			response == PORT_VDP_RESPONSE_VTID_VERSION_VIOALTION ?
> +				"VTID VERSION VIOLATION" :
> +			response == PORT_VDP_RESPONSE_OUT_OF_SYNC ?
> +				"OUT-OF-SYNC" :
> +			response == PORT_PROFILE_RESPONSE_SUCCESS ?
> +				"SUCCESS" :
> +			response == PORT_PROFILE_RESPONSE_INPROGRESS ?
> +				"IN-PROGRESS" :
> +			response == PORT_PROFILE_RESPONSE_INVALID ?
> +				"INVALID" :
> +			response == PORT_PROFILE_RESPONSE_BADSTATE ?
> +				"BAD STATE" :
> +			response == PORT_PROFILE_RESPONSE_INSUFFICIENT_RESOURCES ?
> +				"INSUFFICIENT RESOURCES" :
> +			response == PORT_PROFILE_RESPONSE_ERROR ?
> +				"ERROR" :
> +			"UNKNOWN RESPONSE");

That's an ugly way to do this.
Make it a real function nor array.

-- 

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Luke Hutchison @ 2010-11-09 22:20 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4CD9C7E2.6050909@candelatech.com>

On Tue, Nov 9, 2010 at 5:14 PM, Ben Greear <greearb@candelatech.com> wrote:
> Have you tried using a different DNS server (open-dns?), or maybe a caching
> one one
> your local machine?  Maybe some part of your network is throwing away
> some of your DNS requests since you send so many at once?

I have tried Google's DNS as well as Comcast's, no difference in
effect.  Also this has been a problem when I've been in the US,
Portugal, Germany and China, so I have probably used a range of DNS
servers.  I have tried nscd (it's off by default) and it has the
expected behavior: if it resolves a name before re-opening the
browser, then that name can continue to be resolved after the network
gets flooded.  If I ask it to resolve a domain name after the network
is flooded, the request times out, and subsequently nscd says the
domain name doesn't exist, even after the network link has been
restored to normal.

Thanks,
Luke

^ permalink raw reply

* [PATCH] iproute2: add VF_PORT support
From: Roopa Prabhu @ 2010-11-09 22:20 UTC (permalink / raw)
  To: netdev; +Cc: chrisw, scofeldm, shemminger, arnd

From: Roopa Prabhu <roprabhu@cisco.com>

Resubmitting Scott Feldmans original patch with a minor fix

Changes since last posted version:
- Fix port profile strlen which was off by 1

Add support for IFLA_VF_PORTS.  VF port netlink msg layout is

        [IFLA_NUM_VF]
        [IFLA_VF_PORTS]
                [IFLA_VF_PORT]
                        [IFLA_PORT_*], ...
                [IFLA_VF_PORT]
                        [IFLA_PORT_*], ...
                ...
        [IFLA_PORT_SELF]
                [IFLA_PORT_*], ...

The iproute2 cmd line for link set is now:

Usage: ip link add link DEV [ name ] NAME
                   [ txqueuelen PACKETS ]
                   [ address LLADDR ]
                   [ broadcast LLADDR ]
                   [ mtu MTU ]
                   type TYPE [ ARGS ]
       ip link delete DEV type TYPE [ ARGS ]

       ip link set DEVICE [ { up | down } ]
                          [ arp { on | off } ]
                          [ dynamic { on | off } ]
                          [ multicast { on | off } ]
                          [ allmulticast { on | off } ]
                          [ promisc { on | off } ]
                          [ trailers { on | off } ]
                          [ txqueuelen PACKETS ]
                          [ name NEWNAME ]
                          [ address LLADDR ]
                          [ broadcast LLADDR ]
                          [ mtu MTU ]
                          [ netns PID ]
                          [ alias NAME ]
                          [ port MODE { PROFILE | VSI } ]
                          [ vf NUM [ mac LLADDR ]
                                   [ vlan VLANID [ qos VLAN-QOS ] ]
                                   [ rate TXRATE ]
                                   [ port MODE { PROFILE | VSI } ] ]
       ip link show [ DEVICE ]

TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can }
MODE := { assoc | preassoc | preassocrr | disassoc }
PROFILE := profile PROFILE
           [ instance UUID ]
           [ host UUID ]
VSI := vsi mgr MGRID type VTID ver VER
       [ instance UUID ]

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
---
 ip/ipaddress.c |  115 ++++++++++++++++++++++++++++
 ip/iplink.c    |  227 +++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 304 insertions(+), 38 deletions(-)


diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 19b3d6e..65be741 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -187,6 +187,107 @@ static void print_linktype(FILE *fp, struct rtattr *tb)
 	}
 }
 
+static void print_port(FILE *fp, struct rtattr *port[])
+{
+	struct ifla_port_vsi *vsi;
+#define uuid_fmt "%02X%02X%02X%02X-%02X%02X-%02X%02X-" \
+	"%02X%02X-%02X%02X%02X%02X%02X%02X"
+	unsigned char *uuid;
+	__u8 request;
+	__u16 response;
+
+	if (port[IFLA_PORT_VF])
+		fprintf(fp, "\n    vf %d port",
+			*(__u32 *)RTA_DATA(port[IFLA_PORT_VF]));
+	else
+		fprintf(fp, "\n    port");
+
+	if (port[IFLA_PORT_REQUEST]) {
+		request = *(__u8 *)RTA_DATA(port[IFLA_PORT_REQUEST]);
+		fprintf(fp, " %s",
+			request == PORT_REQUEST_PREASSOCIATE ? "preassoc" :
+			request == PORT_REQUEST_PREASSOCIATE_RR ? "preassocrr" :
+			request == PORT_REQUEST_ASSOCIATE ? "assoc" :
+			request == PORT_REQUEST_DISASSOCIATE ? "disassoc" :
+			"unknown request");
+	}
+
+	if (port[IFLA_PORT_PROFILE])
+		fprintf(fp, " profile \"%s\"",
+			(char *)RTA_DATA(port[IFLA_PORT_PROFILE]));
+
+	if (port[IFLA_PORT_VSI_TYPE]) {
+		vsi = RTA_DATA(port[IFLA_PORT_VSI_TYPE]);
+		fprintf(fp, " vsi mgr %d type 0x%02x%02x%02x ver %d",
+			vsi->vsi_mgr_id, vsi->vsi_type_id[0],
+			vsi->vsi_type_id[1], vsi->vsi_type_id[2],
+			vsi->vsi_type_version);
+	}
+
+	if (port[IFLA_PORT_RESPONSE]) {
+		response = *(__u16 *)RTA_DATA(port[IFLA_PORT_RESPONSE]);
+		fprintf(fp, " status: %s",
+			response == PORT_VDP_RESPONSE_SUCCESS ?
+				"SUCCESS" :
+			response == PORT_VDP_RESPONSE_INVALID_FORMAT ?
+				"INVALID FORMAT" :
+			response == PORT_VDP_RESPONSE_INSUFFICIENT_RESOURCES ?
+				"INSUFFICIENT RESOURCES" :
+			response == PORT_VDP_RESPONSE_UNUSED_VTID ?
+				"UNUSED VTID" :
+			response == PORT_VDP_RESPONSE_VTID_VIOLATION ?
+				"VTID VIOLATION" :
+			response == PORT_VDP_RESPONSE_VTID_VERSION_VIOALTION ?
+				"VTID VERSION VIOLATION" :
+			response == PORT_VDP_RESPONSE_OUT_OF_SYNC ?
+				"OUT-OF-SYNC" :
+			response == PORT_PROFILE_RESPONSE_SUCCESS ?
+				"SUCCESS" :
+			response == PORT_PROFILE_RESPONSE_INPROGRESS ?
+				"IN-PROGRESS" :
+			response == PORT_PROFILE_RESPONSE_INVALID ?
+				"INVALID" :
+			response == PORT_PROFILE_RESPONSE_BADSTATE ?
+				"BAD STATE" :
+			response == PORT_PROFILE_RESPONSE_INSUFFICIENT_RESOURCES ?
+				"INSUFFICIENT RESOURCES" :
+			response == PORT_PROFILE_RESPONSE_ERROR ?
+				"ERROR" :
+			"UNKNOWN RESPONSE");
+	}
+
+	if (port[IFLA_PORT_INSTANCE_UUID]) {
+		uuid = RTA_DATA(port[IFLA_PORT_INSTANCE_UUID]);
+		fprintf(fp, "\n        instance "uuid_fmt,
+			uuid[0],  uuid[1],  uuid[2],  uuid[3],
+			uuid[4],  uuid[5],  uuid[6],  uuid[7],
+			uuid[8],  uuid[9],  uuid[10], uuid[11],
+			uuid[12], uuid[13], uuid[14], uuid[15]);
+	}
+
+	if (port[IFLA_PORT_HOST_UUID]) {
+		uuid = RTA_DATA(port[IFLA_PORT_HOST_UUID]);
+		fprintf(fp, "\n            host "uuid_fmt,
+			uuid[0],  uuid[1],  uuid[2],  uuid[3],
+			uuid[4],  uuid[5],  uuid[6],  uuid[7],
+			uuid[8],  uuid[9],  uuid[10], uuid[11],
+			uuid[12], uuid[13], uuid[14], uuid[15]);
+	}
+}
+
+static void print_vfport(FILE *fp, struct rtattr *vfport)
+{
+	struct rtattr *port[IFLA_PORT_MAX+1];
+
+	if (vfport->rta_type != IFLA_VF_PORT) {
+		fprintf(stderr, "BUG: rta type is %d\n", vfport->rta_type);
+		return;
+	}
+
+	parse_rtattr_nested(port, IFLA_PORT_MAX, vfport);
+	print_port(fp, port);
+}
+
 static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
 {
 	struct ifla_vf_mac *vf_mac;
@@ -421,6 +522,20 @@ int print_linkinfo(const struct sockaddr_nl *who,
 			print_vfinfo(fp, i);
 	}
 
+	if (do_link && tb[IFLA_PORT_SELF]) {
+		struct rtattr *port[IFLA_PORT_MAX+1];
+		parse_rtattr_nested(port, IFLA_PORT_MAX, tb[IFLA_PORT_SELF]);
+		print_port(fp, port);
+	}
+
+	if (do_link && tb[IFLA_VF_PORTS] && tb[IFLA_NUM_VF]) {
+		struct rtattr *i, *vfports = tb[IFLA_VF_PORTS];
+		int rem = RTA_PAYLOAD(vfports);
+		for (i = RTA_DATA(vfports); RTA_OK(i, rem);
+			i = RTA_NEXT(i, rem))
+			print_vfport(fp, i);
+	}
+
 	fprintf(fp, "\n");
 	fflush(fp);
 	return 0;
diff --git a/ip/iplink.c b/ip/iplink.c
index cb2c4f5..961a3ef 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -68,14 +68,22 @@ void iplink_usage(void)
 	fprintf(stderr, "	                  [ mtu MTU ]\n");
 	fprintf(stderr, "	                  [ netns PID ]\n");
 	fprintf(stderr, "			  [ alias NAME ]\n");
+	fprintf(stderr, "			  [ port MODE { PROFILE | VSI } ]\n");
 	fprintf(stderr, "	                  [ vf NUM [ mac LLADDR ]\n");
 	fprintf(stderr, "				   [ vlan VLANID [ qos VLAN-QOS ] ]\n");
-	fprintf(stderr, "				   [ rate TXRATE ] ] \n");
+	fprintf(stderr, "				   [ rate TXRATE ]\n");
+	fprintf(stderr, "				   [ port MODE { PROFILE | VSI } ] ]\n");
 	fprintf(stderr, "       ip link show [ DEVICE ]\n");
 
 	if (iplink_have_newlink()) {
 		fprintf(stderr, "\n");
 		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can }\n");
+		fprintf(stderr, "MODE := { assoc | preassoc | preassocrr | disassoc }\n");
+		fprintf(stderr, "PROFILE := profile PROFILE\n");
+		fprintf(stderr, "           [ instance UUID ]\n");
+		fprintf(stderr, "           [ host UUID ]\n");
+		fprintf(stderr, "VSI := vsi mgr MGRID type VTID ver VER\n");
+		fprintf(stderr, "       [ instance UUID ]\n");
 	}
 	exit(-1);
 }
@@ -176,55 +184,170 @@ struct iplink_req {
 	char			buf[1024];
 };
 
-int iplink_parse_vf(int vf, int *argcp, char ***argvp,
-			   struct iplink_req *req)
+void iplink_parse_port(int vf, int *argcp, char ***argvp,
+		       struct iplink_req *req)
+{
+	int argc = *argcp;
+	char **argv = *argvp;
+	struct rtattr *nest, *nest_inner = NULL;
+	struct ifla_port_vsi port_vsi;
+	char *port_profile = NULL;
+	char *instance_uuid = NULL;
+	char *host_uuid = NULL;
+	unsigned char uuid[16];
+	char *uuid_fmt = "%02X%02X%02X%02X-%02X%02X-%02X%02X-"
+		"%02X%02X-%02X%02X%02X%02X%02X%02X";
+	int parsed;
+	int manager_id = -1;
+	int type_id = -1;
+	int type_id_version = -1;
+	int request = -1;
+	int vsi = 0;
+
+	if (NEXT_ARG_OK()) {
+		NEXT_ARG();
+		if (matches(*argv, "assoc") == 0)
+			request = PORT_REQUEST_ASSOCIATE;
+		else if (matches(*argv, "preassoc") == 0)
+			request = PORT_REQUEST_PREASSOCIATE;
+		else if (matches(*argv, "preassocrr") == 0)
+			request = PORT_REQUEST_PREASSOCIATE_RR;
+		else if (matches(*argv, "disassoc") == 0)
+			request = PORT_REQUEST_DISASSOCIATE;
+	}
+
+	while (NEXT_ARG_OK()) {
+		NEXT_ARG();
+		if (matches(*argv, "vsi") == 0) {
+			vsi = 1;
+		} else if (matches(*argv, "mgr") == 0) {
+			NEXT_ARG();
+			if (get_integer(&manager_id, *argv, 0))
+				invarg("Invalid \"mgr\" value\n", *argv);
+		} else if (matches(*argv, "type") == 0) {
+			NEXT_ARG();
+			if (get_integer(&type_id, *argv, 0))
+				invarg("Invalid \"type\" value\n", *argv);
+		} else if (matches(*argv, "ver") == 0) {
+			NEXT_ARG();
+			if (get_integer(&type_id_version, *argv, 0))
+				invarg("Invalid \"ver\" value\n", *argv);
+		} else if (matches(*argv, "profile") == 0) {
+			NEXT_ARG();
+			port_profile = *argv;
+		} else if (matches(*argv, "instance") == 0) {
+			NEXT_ARG();
+			instance_uuid = *argv;
+		} else if (matches(*argv, "host") == 0) {
+			NEXT_ARG();
+			host_uuid = *argv;
+		} else {
+			/* rewind arg */
+			PREV_ARG();
+			break;
+		}
+	}
+
+	if (argc == *argcp)
+		incomplete_command();
+
+	if (vf == PORT_SELF_VF) {
+		nest = addattr_nest(&req->n, sizeof(*req), IFLA_PORT_SELF);
+	} else {
+		nest = addattr_nest(&req->n, sizeof(*req), IFLA_VF_PORTS);
+		nest_inner = addattr_nest(&req->n, sizeof(*req), IFLA_VF_PORT);
+		addattr_l(&req->n, sizeof(*req), IFLA_PORT_VF,
+			(uint32_t *)&vf, sizeof(uint32_t));
+	}
+
+	if (port_profile)
+		addattr_l(&req->n, sizeof(*req), IFLA_PORT_PROFILE,
+			port_profile, strlen(port_profile) + 1);
+
+	if (instance_uuid) {
+		parsed = sscanf(instance_uuid, uuid_fmt,
+			&uuid[0],  &uuid[1],  &uuid[2],  &uuid[3],
+			&uuid[4],  &uuid[5],  &uuid[6],  &uuid[7],
+			&uuid[8],  &uuid[9],  &uuid[10], &uuid[11],
+			&uuid[12], &uuid[13], &uuid[14], &uuid[15]);
+		if (parsed != sizeof(uuid))
+			invarg("Invalid \"uuid\" value\n", instance_uuid);
+		addattr_l(&req->n, sizeof(*req), IFLA_PORT_INSTANCE_UUID,
+			uuid, sizeof(uuid));
+
+	}
+
+	if (host_uuid) {
+		parsed = sscanf(host_uuid, uuid_fmt,
+			&uuid[0],  &uuid[1],  &uuid[2],  &uuid[3],
+			&uuid[4],  &uuid[5],  &uuid[6],  &uuid[7],
+			&uuid[8],  &uuid[9],  &uuid[10], &uuid[11],
+			&uuid[12], &uuid[13], &uuid[14], &uuid[15]);
+		if (parsed != sizeof(uuid))
+			invarg("Invalid \"uuid\" value\n", host_uuid);
+		addattr_l(&req->n, sizeof(*req), IFLA_PORT_HOST_UUID,
+			uuid, sizeof(uuid));
+
+	}
+
+	if (vsi) {
+		port_vsi.vsi_mgr_id = manager_id;
+		memcpy(&port_vsi.vsi_type_id, &type_id,
+			sizeof(port_vsi.vsi_type_id));
+		port_vsi.vsi_type_version = type_id_version;
+		addattr_l(&req->n, sizeof(*req), IFLA_PORT_VSI_TYPE,
+			&port_vsi, sizeof(port_vsi));
+	}
+
+	addattr_l(&req->n, sizeof(*req), IFLA_PORT_REQUEST,
+		&request, 1);
+
+	if (nest_inner)
+		addattr_nest_end(&req->n, nest_inner);
+	addattr_nest_end(&req->n, nest);
+
+	*argcp = argc;
+	*argvp = argv;
+}
+
+void iplink_parse_vf(int vf, int *argcp, char ***argvp,
+		     struct iplink_req *req)
 {
 	int len, argc = *argcp;
 	char **argv = *argvp;
+	struct rtattr *vflist;
 	struct rtattr *vfinfo;
-
-	vfinfo = addattr_nest(&req->n, sizeof(*req), IFLA_VF_INFO);
+	char *mac = NULL;
+	char *vlan = NULL;
+	char *qos = NULL;
+	char *rate = NULL;
+	struct ifla_vf_mac ivm = { .vf = vf, };
+	struct ifla_vf_vlan ivv = { .vf = vf, .qos = 0, };
+	struct ifla_vf_tx_rate ivt = { .vf = vf, };
 
 	while (NEXT_ARG_OK()) {
 		NEXT_ARG();
-		if (matches(*argv, "mac") == 0) {
-			struct ifla_vf_mac ivm;
+		if (matches(*argv, "port") == 0) {
+			iplink_parse_port(vf, &argc, &argv, req);
+		} else if (matches(*argv, "mac") == 0) {
 			NEXT_ARG();
-			ivm.vf = vf;
-			len = ll_addr_a2n((char *)ivm.mac, 32, *argv);
-			if (len < 0)
-				return -1;
-			addattr_l(&req->n, sizeof(*req), IFLA_VF_MAC, &ivm, sizeof(ivm));
+			mac = *argv;
 		} else if (matches(*argv, "vlan") == 0) {
-			struct ifla_vf_vlan ivv;
 			NEXT_ARG();
-			if (get_unsigned(&ivv.vlan, *argv, 0)) {
-				invarg("Invalid \"vlan\" value\n", *argv);
-			}
-			ivv.vf = vf;
-			ivv.qos = 0;
+			vlan = *argv;
 			if (NEXT_ARG_OK()) {
 				NEXT_ARG();
 				if (matches(*argv, "qos") == 0) {
 					NEXT_ARG();
-					if (get_unsigned(&ivv.qos, *argv, 0)) {
-						invarg("Invalid \"qos\" value\n", *argv);
-					}
+					qos = *argv;
 				} else {
 					/* rewind arg */
 					PREV_ARG();
 				}
 			}
-			addattr_l(&req->n, sizeof(*req), IFLA_VF_VLAN, &ivv, sizeof(ivv));
 		} else if (matches(*argv, "rate") == 0) {
-			struct ifla_vf_tx_rate ivt;
 			NEXT_ARG();
-			if (get_unsigned(&ivt.rate, *argv, 0)) {
-				invarg("Invalid \"rate\" value\n", *argv);
-			}
-			ivt.vf = vf;
-			addattr_l(&req->n, sizeof(*req), IFLA_VF_TX_RATE, &ivt, sizeof(ivt));
-		
+			rate = *argv;
 		} else {
 			/* rewind arg */
 			PREV_ARG();
@@ -235,11 +358,43 @@ int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 	if (argc == *argcp)
 		incomplete_command();
 
-	addattr_nest_end(&req->n, vfinfo);
+	if (mac || vlan || rate) {
+
+		vflist = addattr_nest(&req->n, sizeof(*req), IFLA_VFINFO_LIST);
+		vfinfo = addattr_nest(&req->n, sizeof(*req), IFLA_VF_INFO);
+
+		if (mac) {
+			len = ll_addr_a2n((char *)ivm.mac, 32, mac);
+			if (len < 0)
+				invarg("Invalid \"mac\" value\n", mac);
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_MAC,
+				&ivm, sizeof(ivm));
+		}
+
+		if (vlan) {
+			if (get_unsigned(&ivv.vlan, vlan, 0))
+				invarg("Invalid \"vlan\" value\n", vlan);
+			if (qos) {
+				if (get_unsigned(&ivv.qos, qos, 0))
+					invarg("Invalid \"qos\" value\n", qos);
+			}
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_VLAN,
+				&ivv, sizeof(ivv));
+		}
+
+		if (rate) {
+			if (get_unsigned(&ivt.rate, rate, 0))
+				invarg("Invalid \"rate\" value\n", rate);
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_TX_RATE,
+				&ivt, sizeof(ivt));
+		}
+
+		addattr_nest_end(&req->n, vfinfo);
+		addattr_nest_end(&req->n, vflist);
+	}
 
 	*argcp = argc;
 	*argvp = argv;
-	return 0;
 }
 
 
@@ -349,18 +504,14 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 				req->i.ifi_flags |= IFF_NOARP;
 			} else
 				return on_off("noarp");
+		} else if (strcmp(*argv, "port") == 0) {
+			iplink_parse_port(vf, &argc, &argv, req);
 		} else if (strcmp(*argv, "vf") == 0) {
-			struct rtattr *vflist;
 			NEXT_ARG();
 			if (get_integer(&vf,  *argv, 0)) {
 				invarg("Invalid \"vf\" value\n", *argv);
 			}
-			vflist = addattr_nest(&req->n, sizeof(*req),
-					      IFLA_VFINFO_LIST);
-			len = iplink_parse_vf(vf, &argc, &argv, req);
-			if (len < 0)
-				return -1;
-			addattr_nest_end(&req->n, vflist);
+			iplink_parse_vf(vf, &argc, &argv, req);
 #ifdef IFF_DYNAMIC
 		} else if (matches(*argv, "dynamic") == 0) {
 			NEXT_ARG();


^ permalink raw reply related

* Re: Networking hangs when too many parallel requests are made at once
From: Ben Greear @ 2010-11-09 22:14 UTC (permalink / raw)
  To: Luke Hutchison; +Cc: netdev
In-Reply-To: <AANLkTinu4=Ox_vA_yr7vbdbcqr4P7oS+zMX_yaXhK4E9@mail.gmail.com>

On 11/09/2010 01:17 PM, Luke Hutchison wrote:

> On Tue, Nov 9, 2010 at 3:35 PM, Ben Greear<greearb@candelatech.com>  wrote:
>> And, a network capture of your system going into this state might
>> be useful.  I'd try to disable your wireless NIC entirely and focus
>> on debugging the wired NIC as that is usually easier to debug.
>
> Sure -- a wireshark trace is here: http://web.mit.edu/~luke_h/www/trace.bz2
>
> In this particular trace, I opened about 20 browser tabs at once.
> They all locked up after about 5 seconds.  A few of them loaded some
> more content after a minute or two.  A minute or two later, I killed
> them all.  In this particular example, pinging to a specific domain
> name continued to work (it doesn't always), although I couldn't get
> content from the domains in question: e.g. I could ping google.com,
> but opening a new tab and trying to visit google.com caused the new
> tab to hang too.

Have you tried using a different DNS server (open-dns?), or maybe a caching one one
your local machine?  Maybe some part of your network is throwing away
some of your DNS requests since you send so many at once?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: Netlink limitations
From: Jan Engelhardt @ 2010-11-09 22:02 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Patrick McHardy, David S. Miller, pablo, netdev
In-Reply-To: <20101109214039.GA11005@canuck.infradead.org>


On Tuesday 2010-11-09 22:40, Thomas Graf wrote:
>The addition won't be a revolution but it the increased header size,
>8 vs. 12 bytes isn't a big deal and gives us some additional room to
>work with in the future.
>
>struct nlattr_ext {
>	u16	oldlen;		/* 0 */
>	u16	kind;		/* TCA_* */
>	u8	type;		/* NLA_U32 */
>	u8	flags;		/* NLA_F_* */
>	u16	reserved;
>	u32	length;
>};

And while we're discussing this, surely there are no objections
to bumping NLA_ALIGN to 8 at the same time..

^ permalink raw reply

* [PATCH] gianfar: Do not call device_set_wakeup_enable() under a spinlock
From: Rafael J. Wysocki @ 2010-11-09 21:54 UTC (permalink / raw)
  To: Daniel J Blueman, David S. Miller; +Cc: Francois Romieu, Linux Kernel, netdev
In-Reply-To: <201011090030.42693.rjw@sisk.pl>

On Tuesday, November 09, 2010, Rafael J. Wysocki wrote:
> On Tuesday, November 02, 2010, Daniel J Blueman wrote:
> > Since device_set_wakeup_enable now sleeps, it should not be called
> > from a critical section. Since wol_en is not updated elsewhere, we can
> > omit the locking entirely.
> > 
> > Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
> 
> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>

Having reconsidered that I think it may be better to do something like in the
patch below.

This is a regression fix, so please apply if there are no objections.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: gianfar: Do not call device_set_wakeup_enable() under a spinlock

The gianfar driver calls device_set_wakeup_enable() under a spinlock,
which causes a problem to happen after the recent core power
management changes, because this function can sleep now.  Fix this
by moving the device_set_wakeup_enable() call out of the
spinlock-protected area.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/net/gianfar_ethtool.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/net/gianfar_ethtool.c
===================================================================
--- linux-2.6.orig/drivers/net/gianfar_ethtool.c
+++ linux-2.6/drivers/net/gianfar_ethtool.c
@@ -635,9 +635,10 @@ static int gfar_set_wol(struct net_devic
 	if (wol->wolopts & ~WAKE_MAGIC)
 		return -EINVAL;
 
+	device_set_wakeup_enable(&dev->dev, wol->wolopts & WAKE_MAGIC);
+
 	spin_lock_irqsave(&priv->bflock, flags);
-	priv->wol_en = wol->wolopts & WAKE_MAGIC ? 1 : 0;
-	device_set_wakeup_enable(&dev->dev, priv->wol_en);
+	priv->wol_en =  !!device_may_wakeup(&dev->dev);
 	spin_unlock_irqrestore(&priv->bflock, flags);
 
 	return 0;

^ permalink raw reply

* Re: Netlink limitations
From: Thomas Graf @ 2010-11-09 21:40 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Patrick McHardy, David S. Miller, pablo, netdev
In-Reply-To: <alpine.LNX.2.01.1011092113420.10710@obet.zrqbmnf.qr>

On Tue, Nov 09, 2010 at 09:20:09PM +0100, Jan Engelhardt wrote:
> What's more, there is no way to specify a remote host in sockaddr_nl
> right now, so all communication is necessarily being local - that is,
> unless you add a hidden forwarder in kernel space that transparently
> tunnels it into IPv6 or something.

That's fine. I don't expect the kernel to send netlink message to
other machines directly but rather have a userspace daemon handle
this. 

> I do not believe that encoding the attribute type into the protocol
> itself is going to be such a big win. You still need a local
> authoritative database (struct nla_policy[] or some representation of
> it, nevermind I'm thinking "XML-DTD"-like) to do the verification
> against because some NL messages may be purposely forged. If you have
> an nlattr that says it is a string, how do you know that it is in
> fact a string rather than a blob that happens to have a trailing \0.

True, we will never be able to verify the contents of attributes but
what we can do is give the sender the ability to specify what type of
attribute he was meant to send. This can be a big advantage as it
limits the possibliy of misinterpreting messages which may have been
corrupted because we can match the expected attribute type against
the attribute type supplied by the sender. Of course this doesn't
protect against forged messages at all, we will never be able to do
that.

The addition won't be a revolution but it the increased header size,
8 vs. 12 bytes isn't a big deal and gives us some additional room to
work with in the future.

struct nlattr_ext {
	u16	oldlen;		/* 0 */
	u16	kind;		/* TCA_* */
	u8	type;		/* NLA_U32 */
	u8	flags;		/* NLA_F_* */
	u16	reserved;
	u32	length;
};

There has been more than one debate whether to share nla_policy between
kernel and userspace. There is nothing which prevents people from doing
so. But typically the semantics between kernel->userspace and vice versa
are slightly different and require a different policy to be applied.

^ permalink raw reply

* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Xose Vazquez Perez @ 2010-11-09 21:35 UTC (permalink / raw)
  To: netdev, jdb

Jesper Dangaard Brouer wrote:

> To fix this I added "-q 0" to netcat.  Thus my working commands are:
> 
>  netcat -l -p 9999 >/dev/null &
>  time dd if=/dev/zero bs=1M count=10000 | netcat -q0 127.0.0.1 9999
> 
> Running this on my "big" 10G testlab system, Dual Xeon 5550 2.67GHz,
> kernel version 2.6.32-5-amd64 (which I usually don't use)
> The results are 7.487 sec:

netcat flavor ?

http://nc110.sourceforge.net/
http://nmap.org/ncat/
http://www.dest-unreach.org/socat/
http://cryptcat.sourceforge.net/
http://netcat.sourceforge.net/
http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/nc/

-- 
«Allá muevan feroz guerra, ciegos reyes por un palmo más de tierra;
que yo aquí tengo por mío cuanto abarca el mar bravío, a quien nadie
impuso leyes. Y no hay playa, sea cualquiera, ni bandera de esplendor,
que no sienta mi derecho y dé pecho a mi valor.»

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Neil Horman @ 2010-11-09 21:20 UTC (permalink / raw)
  To: Maciej Żenczykowski; +Cc: netdev, davem, eric.dumazet
In-Reply-To: <AANLkTi=s9zr+yF-nr8MoHY6W6m=usR2fH=dPJx6voC34@mail.gmail.com>

On Tue, Nov 09, 2010 at 01:07:32PM -0800, Maciej Żenczykowski wrote:
> > +#define PGV_FROM_VMALLOC 1
> 
> Why don't we always just use vmalloc, what's the benefit of get_user_pages?
> 
Because of how vmalloc works.  It maps discontiguous pages into contiguous
address space.  But we only have 128MB of that address space to work with by
default, so its quite possible that we won't be able to alloc all the memory.

> > +       /*
> > +        * vmalloc failed, lets dig into swap here
> > +        */
> > +       *flags = 0;
> 
> probably better to *flags &= ~PGV_FROM_VMALLOC;
> (since some flags could have been set before this function was called)
> 
Well, if any other users of this field existed, I'd agree, but since we're the
only one, I think its ok, at least for now.

> > +       gfp_flags &= ~__GFP_NORETRY;
> > +       buffer = (char *)__get_free_pages(gfp_flags, order);
> 
> wouldn't this still cause problems because you're now requiring linear
> memory again?
yes, its a last ditch effort after the other two options have been tried.  Its
all thats left to do.

> Would it be better to just fail at this point?
Why?  If we can dig into swap and get the memory, we may as well try.  It would
be better if we didn't have to, but if the choice is between failing and making
the system slow down....

Neil

> 

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Luke Hutchison @ 2010-11-09 21:17 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4CD9B09B.8040906@candelatech.com>

On Tue, Nov 9, 2010 at 3:35 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 11/09/2010 12:27 PM, Luke Hutchison wrote:
>> No, I haven't been able to reproduce on any other machine.  But it
>> happens on both my wifi NIC and my ethernet NIC in this machine.
>
> Well, let us know what those are, at least.

From my first email:

> I have a Toshiba Satellite Pro S300M-S2142
> laptop with a Core 2 Duo P8600 CPU, Intel GM45 gfx,
> Intel 82567V Gigabit Ethernet and Intel 5100 Wifi,
> running kernel kernel-2.6.36-1.1.fc15.x86_64 on top
> of Fedora 14.

On Tue, Nov 9, 2010 at 3:35 PM, Ben Greear <greearb@candelatech.com> wrote:
> And, a network capture of your system going into this state might
> be useful.  I'd try to disable your wireless NIC entirely and focus
> on debugging the wired NIC as that is usually easier to debug.

Sure -- a wireshark trace is here: http://web.mit.edu/~luke_h/www/trace.bz2

In this particular trace, I opened about 20 browser tabs at once.
They all locked up after about 5 seconds.  A few of them loaded some
more content after a minute or two.  A minute or two later, I killed
them all.  In this particular example, pinging to a specific domain
name continued to work (it doesn't always), although I couldn't get
content from the domains in question: e.g. I could ping google.com,
but opening a new tab and trying to visit google.com caused the new
tab to hang too.

Thanks,
Luke

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Maciej Żenczykowski @ 2010-11-09 21:07 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, davem, eric.dumazet
In-Reply-To: <1289324799-2256-1-git-send-email-nhorman@tuxdriver.com>

> +#define PGV_FROM_VMALLOC 1

Why don't we always just use vmalloc, what's the benefit of get_user_pages?

> +       /*
> +        * vmalloc failed, lets dig into swap here
> +        */
> +       *flags = 0;

probably better to *flags &= ~PGV_FROM_VMALLOC;
(since some flags could have been set before this function was called)

> +       gfp_flags &= ~__GFP_NORETRY;
> +       buffer = (char *)__get_free_pages(gfp_flags, order);

wouldn't this still cause problems because you're now requiring linear
memory again?
Would it be better to just fail at this point?

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Neil Horman @ 2010-11-09 20:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, zenczykowski
In-Reply-To: <1289330438.2774.25.camel@edumazet-laptop>

On Tue, Nov 09, 2010 at 08:20:38PM +0100, Eric Dumazet wrote:
> Le mardi 09 novembre 2010 à 13:38 -0500, Neil Horman a écrit :
> > On Tue, Nov 09, 2010 at 07:02:32PM +0100, Eric Dumazet wrote:
> > > Le mardi 09 novembre 2010 à 12:46 -0500, nhorman@tuxdriver.com a écrit :
> > ic char **alloc_pg_vec(struct tpacket_req *req, int order)
> > > > +static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
> > > >  {
> > > >  	unsigned int block_nr = req->tp_block_nr;
> > > > -	char **pg_vec;
> > > > +	struct pgv *pg_vec;
> > > >  	int i;
> > > >  
> > > > -	pg_vec = kzalloc(block_nr * sizeof(char *), GFP_KERNEL);
> > > > +	pg_vec = kzalloc(block_nr * sizeof(struct pgv), GFP_KERNEL);
> > > 
> > > While we are at it, we could check block_nr being a sane value here ;)
> > > 
> > This is true.  What do you think a reasonable sane value is?  libpcap seems to
> > limit itself to 32 order 5 entries in the ring, but that seems a bit arbitrary.
> > Perhaps we could check and limit allocations to being no more than order 8
> > (1Mb), and a total allocation of no more than perhaps max(32Mb, 1% of all ram)?
> > Just throwing it out there, open to any suggestions here
> 
> I was refering to a malicious/buggy program giving a big tp_block_nr so
> that (block_nr * sizeof(struct pgv)) overflows the u32
> 
> One way to deal with that is to use
> 
> 	kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL);
> 
> I am not sure consistency checks done in packet_set_ring() are enough to
> properly detect such errors.
Ah, I get you, ok.  Yeah, I'll respin this with that taken into account.
Thanks!
Neil

> 
> 
> 
> 
> 
> 

^ permalink raw reply

* [PATCH 2/2] net: Simplify RX queue allocation
From: Tom Herbert @ 2010-11-09 20:47 UTC (permalink / raw)
  To: davem, netdev

This patch move RX queue allocation to alloc_netdev_mq and freeing of
the queues to free_netdev (symmetric to TX queue allocation).  Each
kobject RX queue takes a reference to the queue's device so that the
device can't be freed before all the kobjects have been released-- this
obviates the need for reference counts specific to RX queues.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |    3 +--
 net/core/dev.c            |   15 ++++++++++-----
 net/core/net-sysfs.c      |    7 ++-----
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d8fd2c2..da59595 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -592,8 +592,7 @@ struct netdev_rx_queue {
 	struct rps_map __rcu		*rps_map;
 	struct rps_dev_flow_table __rcu	*rps_flow_table;
 	struct kobject			kobj;
-	struct netdev_rx_queue		*first;
-	atomic_t			count;
+	struct net_device		*dev;
 } ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RPS */
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 8f9c76e..87d89ba 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5034,7 +5034,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 	 * reference count.
 	 */
 	for (i = 0; i < count; i++)
-		rx[i].first = rx;
+		rx[i].dev = dev;
 #endif
 	return 0;
 }
@@ -5110,10 +5110,6 @@ int register_netdevice(struct net_device *dev)
 
 	dev->iflink = -1;
 
-	ret = netif_alloc_rx_queues(dev);
-	if (ret)
-		goto out;
-
 	netdev_init_queues(dev);
 
 	/* Init, if this function is available */
@@ -5579,6 +5575,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 #ifdef CONFIG_RPS
 	dev->num_rx_queues = queue_count;
 	dev->real_num_rx_queues = queue_count;
+	if (netif_alloc_rx_queues(dev))
+		goto free_pcpu;
 #endif
 
 	dev->gso_max_size = GSO_MAX_SIZE;
@@ -5596,6 +5594,10 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 free_pcpu:
 	free_percpu(dev->pcpu_refcnt);
 	kfree(dev->_tx);
+#ifdef CONFIG_RPS
+	kfree(dev->_rx);
+#endif
+
 free_p:
 	kfree(p);
 	return NULL;
@@ -5617,6 +5619,9 @@ void free_netdev(struct net_device *dev)
 	release_net(dev_net(dev));
 
 	kfree(dev->_tx);
+#ifdef CONFIG_RPS
+	kfree(dev->_rx);
+#endif
 
 	kfree(rcu_dereference_raw(dev->ingress_queue));
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index a5ff5a8..3ba526b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -706,7 +706,6 @@ static struct attribute *rx_queue_default_attrs[] = {
 static void rx_queue_release(struct kobject *kobj)
 {
 	struct netdev_rx_queue *queue = to_rx_queue(kobj);
-	struct netdev_rx_queue *first = queue->first;
 	struct rps_map *map;
 	struct rps_dev_flow_table *flow_table;
 
@@ -719,8 +718,7 @@ static void rx_queue_release(struct kobject *kobj)
 	if (flow_table)
 		call_rcu(&flow_table->rcu, rps_dev_flow_table_release);
 
-	if (atomic_dec_and_test(&first->count))
-		kfree(first);
+	dev_put(queue->dev);
 }
 
 static struct kobj_type rx_queue_ktype = {
@@ -732,7 +730,6 @@ static struct kobj_type rx_queue_ktype = {
 static int rx_queue_add_kobject(struct net_device *net, int index)
 {
 	struct netdev_rx_queue *queue = net->_rx + index;
-	struct netdev_rx_queue *first = queue->first;
 	struct kobject *kobj = &queue->kobj;
 	int error = 0;
 
@@ -745,7 +742,7 @@ static int rx_queue_add_kobject(struct net_device *net, int index)
 	}
 
 	kobject_uevent(kobj, KOBJ_ADD);
-	atomic_inc(&first->count);
+	dev_hold(queue->dev);
 
 	return error;
 }
-- 
1.7.3.1


^ permalink raw reply related

* [PATCH 1/2] net: Move TX queue allocation to alloc_netdev_mq
From: Tom Herbert @ 2010-11-09 20:47 UTC (permalink / raw)
  To: davem, netdev

TX queues are now allocated in alloc_netdev_mq and freed in
free_netdev.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 0dd54a6..8f9c76e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5114,10 +5114,6 @@ int register_netdevice(struct net_device *dev)
 	if (ret)
 		goto out;
 
-	ret = netif_alloc_netdev_queues(dev);
-	if (ret)
-		goto out;
-
 	netdev_init_queues(dev);
 
 	/* Init, if this function is available */
@@ -5577,6 +5573,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	dev->num_tx_queues = queue_count;
 	dev->real_num_tx_queues = queue_count;
+	if (netif_alloc_netdev_queues(dev))
+		goto free_pcpu;
 
 #ifdef CONFIG_RPS
 	dev->num_rx_queues = queue_count;
@@ -5597,6 +5595,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 free_pcpu:
 	free_percpu(dev->pcpu_refcnt);
+	kfree(dev->_tx);
 free_p:
 	kfree(p);
 	return NULL;
-- 
1.7.3.1


^ permalink raw reply related

* [PATCH 0/2] net: Changes in queue allocation and freeing
From: Tom Herbert @ 2010-11-09 20:47 UTC (permalink / raw)
  To: davem, netdev

Changes to both RX and TX queue allocation.  In both cases allocate
in alloc_netdev_mq and free in free_netdev.  For RX the reference
couting also changed, the device reference count can now be used.

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Ben Greear @ 2010-11-09 20:35 UTC (permalink / raw)
  To: Luke Hutchison; +Cc: netdev
In-Reply-To: <AANLkTinz6C4s3wX7t6CsE2bzZ2adCoi0RAcqvpEFE2GV@mail.gmail.com>

On 11/09/2010 12:27 PM, Luke Hutchison wrote:
> On Tue, Nov 9, 2010 at 2:16 PM, Ben Greear<greearb@candelatech.com>  wrote:
>> On 11/09/2010 11:04 AM, Luke Hutchison wrote:
>>>
>>> On Tue, Nov 9, 2010 at 1:30 PM, Luke Hutchison<luke.hutch@gmail.com>
>>>   wrote:
>>>>
>>>> Since around Linux kernel 2.6.33 or so (but maybe as early as
>>>> 2.6.31, not sure exactly what version), when restoring a crashed or
>>>> closed browser session of either Firefox or Chrome where lots of tabs
>>>> (say 10-40) open simultaneously, the networking stack is brought to
>>>> its knees -- most or all the tabs eventually time out without data, or
>>>> a few tabs might get some data and then display a partial web page.
>>
>> Have you been able to reproduce this on any other machine?  I suspect
>> it might be an issue with your specific NIC or other hardware.
>>
>> At the least, it's not a general problem with opening lots
>> of TCP connections, as we routinely test with thousands...
>>
>> Thanks,
>> Ben
>
> No, I haven't been able to reproduce on any other machine.  But it
> happens on both my wifi NIC and my ethernet NIC in this machine.

Well, let us know what those are, at least.

And, a network capture of your system going into this state might
be useful.  I'd try to disable your wireless NIC entirely and focus
on debugging the wired NIC as that is usually easier to debug.

Thanks,
Ben

>
> Thanks,
> Luke


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH 3/3] net: tipc: fix information leak to userland
From: Vasiliy Kulikov @ 2010-11-09 20:33 UTC (permalink / raw)
  To: David Miller
  Cc: kernel-janitors, jon.maloy, allan.stephens, tipc-discussion,
	netdev, linux-kernel
In-Reply-To: <20101109.092630.260076036.davem@davemloft.net>

On Tue, Nov 09, 2010 at 09:26 -0800, David Miller wrote:
> From: Vasiliy Kulikov <segooon@gmail.com>
> Date: Sun, 31 Oct 2010 20:10:32 +0300
> 
> > Structure sockaddr_tipc is copied to userland with padding bytes after
> > "id" field in union field "name" unitialized.  It leads to leaking of
> > contents of kernel stack memory.  We have to initialize them to zero.
> > 
> > Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
> 
> Applied.
> 
> Patches #1 and #2 were given feedback which I need you to integrate
> and submit new patches based upon, thanks.

About #2:

I still think that this:

    if (dev)
        strncpy(uaddr->sa_data, dev->name, 14);
    else
        memset(uaddr->sa_data, 0, 14);

is better than this:

    memset(uaddr->sa_data, 0, 14);
    dev = dev_get_by_index_rcu(sock_net(sk), pkt_sk(sk)->ifindex);
    if (dev)
        strlcpy(uaddr->sa_data, dev->name, 15);

Doesn't it?  Explicitly filling with zero on the same "if" level is
slightly easier to read and understand.

-- 
Vasiliy

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Luke Hutchison @ 2010-11-09 20:27 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4CD99E20.4060307@candelatech.com>

On Tue, Nov 9, 2010 at 2:16 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 11/09/2010 11:04 AM, Luke Hutchison wrote:
>>
>> On Tue, Nov 9, 2010 at 1:30 PM, Luke Hutchison<luke.hutch@gmail.com>
>>  wrote:
>>>
>>> Since around Linux kernel 2.6.33 or so (but maybe as early as
>>> 2.6.31, not sure exactly what version), when restoring a crashed or
>>> closed browser session of either Firefox or Chrome where lots of tabs
>>> (say 10-40) open simultaneously, the networking stack is brought to
>>> its knees -- most or all the tabs eventually time out without data, or
>>> a few tabs might get some data and then display a partial web page.
>
> Have you been able to reproduce this on any other machine?  I suspect
> it might be an issue with your specific NIC or other hardware.
>
> At the least, it's not a general problem with opening lots
> of TCP connections, as we routinely test with thousands...
>
> Thanks,
> Ben

No, I haven't been able to reproduce on any other machine.  But it
happens on both my wifi NIC and my ethernet NIC in this machine.

Thanks,
Luke

^ permalink raw reply

* Re: Netlink limitations
From: Jan Engelhardt @ 2010-11-09 20:20 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Patrick McHardy, David S. Miller, pablo, netdev
In-Reply-To: <20101109144941.GA4018@canuck.infradead.org>


On Tuesday 2010-11-09 15:49, Thomas Graf wrote:
>
>We have tried to come up with ways of forwarding netlink messages to
>other machines several times. It always failed due to the fact that
>protocols encode attributes/data differently without having the
>ability to specify the encoding.

What's more, there is no way to specify a remote host in sockaddr_nl
right now, so all communication is necessarily being local - that is,
unless you add a hidden forwarder in kernel space that transparently
tunnels it into IPv6 or something.

>I haven't given up on the idea of self describing netlink protocols
>yet. For example we could encode the attribute type
>(i8|u16|u32|u16|string) in additional to the existing nested attribute
>flag.

I do not believe that encoding the attribute type into the protocol
itself is going to be such a big win. You still need a local
authoritative database (struct nla_policy[] or some representation of
it, nevermind I'm thinking "XML-DTD"-like) to do the verification
against because some NL messages may be purposely forged. If you have
an nlattr that says it is a string, how do you know that it is in
fact a string rather than a blob that happens to have a trailing \0.

^ permalink raw reply

* Re: [PATCH] net/dst: dst_dev_event() called after other notifiers
From: Ben Greear @ 2010-11-09 20:11 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20101109.114853.193732360.davem@davemloft.net>

On 11/09/2010 11:48 AM, David Miller wrote:
> From: Eric Dumazet<eric.dumazet@gmail.com>
> Date: Tue, 09 Nov 2010 20:37:55 +0100
>
>> [PATCH] net/dst: dst_dev_event() called after other notifiers
>
> Nice, applied.
>
> However, I had to apply this by hand:
>
>>   static struct notifier_block dst_dev_notifier = {
>>   	.notifier_call  = dst_dev_event,
>> +	.priority = -10, /* must be called after other network notifiers */
>>   };
>
> The character after ".notifier_call" in my tree is a TAB character but
> in your patch it is a sequence of spaces.  This isn't looking like the
> usual email corruption, because the leading TAB characters on these
> lines are properly there.
>
> Please figure out why this happened so that it doesn't repeat in
> future patches :-)

I manually applied this as well and can confirm that interface deletion
with a global IPv6 address on it is now comparable to any other device
delete (about 30ms).

Tested-by:  Ben Greear <greearb@candelatech.com>

I'd love to test patches that made all interface deletes faster,
btw :)

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox