Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Performance issue : GRE tunneling
From: Badalian Vyacheslav @ 2008-01-18 15:08 UTC (permalink / raw)
  To: Jeba Anandhan; +Cc: netdev, matthew.hattersley
In-Reply-To: <1200664092.22132.3.camel@vglwks010.vgl2.office.vaioni.com>

Jeba Anandhan пишет:
> Hi All,
> When i send the traffic outside of GRE tunnel, The speed is in 3-4Mbps.
> When i use the tunnel for traffic, the speed get reduced huge. It is
> around 100-300Kbps. What are the factors affects the performance when we
> use tunneling [ ex:GRE tunneling ]?
>   
Hello. You may try get slow place. Get pc without CPU load (console mode 
and etc).
Try compile gre support in kernel. Turn on OPROFILE support.
Install oprofile utils.
do
opcontrol --vmlinux=/usr/src/linux/vmlinux
opcontrol -s
#do connect and try do traffic.
opreport -l | vi -
# you see that functions in kernel use cpu...
opcontrol -h #to stop

Maybe its help for you. Thanks!
> Thanks
> Jeba
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>   


^ permalink raw reply

* Re: [PATCH] IPv6 support for NFS server
From: Aurélien Charbon @ 2008-01-18 14:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: netdev ML, Brian Haley, Mailing list NFSv4
In-Reply-To: <20080117203857.GB6416@fieldses.org>

[-- Attachment #1: Type: text/plain, Size: 596 bytes --]

OK Bruce I have added this comment before the patch.
I have also done the changes pointed by Brian.
Please let me know if there is still something to change.

Regards,
Aurélien

J. Bruce Fields wrote:

>On Thu, Jan 17, 2008 at 06:04:35PM +0100, Aurélien Charbon wrote:
>  
>
>>Hi Bruce.
>>
>>Thanks for your comments.
>>Here is the patch with some cleanups.
>>    
>>
-- 

********************************
       Aurelien Charbon
       Linux NFSv4 team
           Bull SAS
     Echirolles - France
http://nfsv4.bullopensource.org/
********************************


[-- Attachment #2: 0001-IPv6-support-for-NFS-server.patch --]
[-- Type: text/x-patch, Size: 12547 bytes --]

Here is some correction for the ip_map caching code part. It allows us to enable IPv6 in exports.
It completely maintains backward compatibility with IPv4 by using mapped address.
Thanks to Bruce Fields, Brian Haley, Neil Brown and Hideaki Joshifuji for comments

>From 51755892e19186cd18230bac3f783b0382bf9ae0 Mon Sep 17 00:00:00 2001
From: Aurelien Charbon <aurelien.charbon@bull.net>
Date: Thu, 17 Jan 2008 14:55:03 +0100
Subject: [PATCH 1/1] IPv6 support for NFS server

---
 fs/nfsd/export.c               |    9 ++-
 fs/nfsd/nfsctl.c               |   15 +++++-
 include/linux/sunrpc/svcauth.h |    4 +-
 include/net/ipv6.h             |    9 +++
 net/sunrpc/svcauth_unix.c      |  118 +++++++++++++++++++++++++++-------------
 5 files changed, 110 insertions(+), 45 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 66d0aeb..208db3a 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -35,6 +35,7 @@
 #include <linux/lockd/bind.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>
 
 #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
 
@@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
 {
 	struct auth_domain	*dom;
 	int			i, err;
+	struct in6_addr addr6;
 
 	/* First, consistency check. */
 	err = -EINVAL;
@@ -1574,9 +1576,10 @@ exp_addclient(struct nfsctl_client *ncp)
 		goto out_unlock;
 
 	/* Insert client into hashtable. */
-	for (i = 0; i < ncp->cl_naddr; i++)
-		auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+	for (i = 0; i < ncp->cl_naddr; i++) {
+		ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &addr6);
+		auth_unix_add_addr(&addr6, dom);
+	}
 	auth_unix_forget_old(dom);
 	auth_domain_put(dom);
 
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 77dc989..13d6b6b 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -37,6 +37,7 @@
 #include <linux/nfsd/syscall.h>
 
 #include <asm/uaccess.h>
+#include <net/ipv6.h>
 
 /*
  *	We have a single directory with 9 nodes in it.
@@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
 	struct auth_domain *clp;
 	int err = 0;
 	struct knfsd_fh *res;
+	struct in6_addr in6;
 
 	if (size < sizeof(*data))
 		return -EINVAL;
@@ -236,7 +238,11 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
 	res = (struct knfsd_fh*)buf;
 
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+	ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6);
+
+	clp = auth_unix_lookup(&in6);
+	if (!clp)
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -257,6 +263,7 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
 	int err = 0;
 	struct knfsd_fh fh;
 	char *res;
+	struct in6_addr in6;
 
 	if (size < sizeof(*data))
 		return -EINVAL;
@@ -271,7 +278,11 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
 	res = buf;
 	sin = (struct sockaddr_in *)&data->gd_addr;
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+	ipv6_addr_set_v4mapped(sin->sin_addr.s_addr,&in6);
+
+	clp = auth_unix_lookup(&in6);
+	if (!clp)
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index 22e1ef8..9e6fb86 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -120,10 +120,10 @@ extern void	svc_auth_unregister(rpc_authflavor_t flavor);
 
 extern struct auth_domain *unix_domain_find(char *name);
 extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
+extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
 extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
 extern struct auth_domain *auth_domain_find(char *name);
-extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
+extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
 extern int auth_unix_forget_old(struct auth_domain *dom);
 extern void svcauth_unix_purge(void);
 extern void svcauth_unix_info_release(void *);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index ae328b6..9394710 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -400,6 +400,15 @@ static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
 		 a->s6_addr32[2] == htonl(0x0000ffff));
 }
 
+static inline void ipv6_addr_set_v4mapped(const __be32 addr,
+					  struct in6_addr *v4mapped)
+{
+	ipv6_addr_set(v4mapped,
+			0, 0,
+			htonl(0x0000FFFF),
+			addr);
+}
+
 /*
  * find the first different bit between two addresses
  * length of address must be a multiple of 32bits
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 4114794..10ba208 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -11,7 +11,8 @@
 #include <linux/hash.h>
 #include <linux/string.h>
 #include <net/sock.h>
-
+#include <net/ipv6.h>
+#include <linux/kernel.h>
 #define RPCDBG_FACILITY	RPCDBG_AUTH
 
 
@@ -84,7 +85,7 @@ static void svcauth_unix_domain_release(struct auth_domain *dom)
 struct ip_map {
 	struct cache_head	h;
 	char			m_class[8]; /* e.g. "nfsd" */
-	struct in_addr		m_addr;
+	struct in6_addr		m_addr;
 	struct unix_domain	*m_client;
 	int			m_add_change;
 };
@@ -112,12 +113,19 @@ static inline int hash_ip(__be32 ip)
 	return (hash ^ (hash>>8)) & 0xff;
 }
 #endif
+static inline int hash_ip6(struct in6_addr ip)
+{
+	return (hash_ip(ip.s6_addr32[0]) ^
+		hash_ip(ip.s6_addr32[1]) ^
+		hash_ip(ip.s6_addr32[2]) ^
+		hash_ip(ip.s6_addr32[3]));
+}
 static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
 {
 	struct ip_map *orig = container_of(corig, struct ip_map, h);
 	struct ip_map *new = container_of(cnew, struct ip_map, h);
 	return strcmp(orig->m_class, new->m_class) == 0
-		&& orig->m_addr.s_addr == new->m_addr.s_addr;
+		&& ipv6_addr_equal(&orig->m_addr, &new->m_addr);
 }
 static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
 {
@@ -125,7 +133,7 @@ static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
 	struct ip_map *item = container_of(citem, struct ip_map, h);
 
 	strcpy(new->m_class, item->m_class);
-	new->m_addr.s_addr = item->m_addr.s_addr;
+	ipv6_addr_copy(&new->m_addr, &item->m_addr);
 }
 static void update(struct cache_head *cnew, struct cache_head *citem)
 {
@@ -149,22 +157,24 @@ static void ip_map_request(struct cache_detail *cd,
 				  struct cache_head *h,
 				  char **bpp, int *blen)
 {
-	char text_addr[20];
+	char text_addr[40];
 	struct ip_map *im = container_of(h, struct ip_map, h);
-	__be32 addr = im->m_addr.s_addr;
-
-	snprintf(text_addr, 20, "%u.%u.%u.%u",
-		 ntohl(addr) >> 24 & 0xff,
-		 ntohl(addr) >> 16 & 0xff,
-		 ntohl(addr) >>  8 & 0xff,
-		 ntohl(addr) >>  0 & 0xff);
 
+	if (ipv6_addr_v4mapped(&(im->m_addr))) {
+		snprintf(text_addr, 20, NIPQUAD_FMT,
+				ntohl(im->m_addr.s6_addr32[3]) >> 24 & 0xff,
+				ntohl(im->m_addr.s6_addr32[3]) >> 16 & 0xff,
+				ntohl(im->m_addr.s6_addr32[3]) >>  8 & 0xff,
+				ntohl(im->m_addr.s6_addr32[3]) >>  0 & 0xff);
+	} else {
+		snprintf(text_addr, 40, NIP6_FMT, NIP6(im->m_addr));
+	}
 	qword_add(bpp, blen, im->m_class);
 	qword_add(bpp, blen, text_addr);
 	(*bpp)[-1] = '\n';
 }
 
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr);
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr);
 static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t expiry);
 
 static int ip_map_parse(struct cache_detail *cd,
@@ -175,10 +185,10 @@ static int ip_map_parse(struct cache_detail *cd,
 	 * for scratch: */
 	char *buf = mesg;
 	int len;
-	int b1,b2,b3,b4;
+	int b1, b2, b3, b4, b5, b6, b7, b8;
 	char c;
 	char class[8];
-	struct in_addr addr;
+	struct in6_addr addr;
 	int err;
 
 	struct ip_map *ipmp;
@@ -197,7 +207,23 @@ static int ip_map_parse(struct cache_detail *cd,
 	len = qword_get(&mesg, buf, mlen);
 	if (len <= 0) return -EINVAL;
 
-	if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
+	if (sscanf(buf, NIPQUAD_FMT "%c", &b1, &b2, &b3, &b4, &c) == 4) {
+		addr.s6_addr32[0] = 0;
+		addr.s6_addr32[1] = 0;
+		addr.s6_addr32[2] = htonl(0xffff);
+		addr.s6_addr32[3] =
+			htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
+       } else if (sscanf(buf, NIP6_FMT "%c",
+			&b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) == 8) {
+		addr.s6_addr16[0] = htons(b1);
+		addr.s6_addr16[1] = htons(b2);
+		addr.s6_addr16[2] = htons(b3);
+		addr.s6_addr16[3] = htons(b4);
+		addr.s6_addr16[4] = htons(b5);
+		addr.s6_addr16[5] = htons(b6);
+		addr.s6_addr16[6] = htons(b7);
+		addr.s6_addr16[7] = htons(b8);
+       } else
 		return -EINVAL;
 
 	expiry = get_expiry(&mesg);
@@ -215,10 +241,7 @@ static int ip_map_parse(struct cache_detail *cd,
 	} else
 		dom = NULL;
 
-	addr.s_addr =
-		htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
-
-	ipmp = ip_map_lookup(class,addr);
+	ipmp = ip_map_lookup(class, &addr);
 	if (ipmp) {
 		err = ip_map_update(ipmp,
 			     container_of(dom, struct unix_domain, h),
@@ -238,7 +261,7 @@ static int ip_map_show(struct seq_file *m,
 		       struct cache_head *h)
 {
 	struct ip_map *im;
-	struct in_addr addr;
+	struct in6_addr addr;
 	char *dom = "-no-domain-";
 
 	if (h == NULL) {
@@ -247,20 +270,24 @@ static int ip_map_show(struct seq_file *m,
 	}
 	im = container_of(h, struct ip_map, h);
 	/* class addr domain */
-	addr = im->m_addr;
+	ipv6_addr_copy(&addr, &im->m_addr);
 
 	if (test_bit(CACHE_VALID, &h->flags) &&
 	    !test_bit(CACHE_NEGATIVE, &h->flags))
 		dom = im->m_client->h.name;
 
-	seq_printf(m, "%s %d.%d.%d.%d %s\n",
-		   im->m_class,
-		   ntohl(addr.s_addr) >> 24 & 0xff,
-		   ntohl(addr.s_addr) >> 16 & 0xff,
-		   ntohl(addr.s_addr) >>  8 & 0xff,
-		   ntohl(addr.s_addr) >>  0 & 0xff,
-		   dom
-		   );
+	if (ipv6_addr_v4mapped(&addr)) {
+		seq_printf(m, "%s" NIPQUAD_FMT "%s\n",
+			im->m_class,
+			ntohl(addr.s6_addr32[3]) >> 24 & 0xff,
+			ntohl(addr.s6_addr32[3]) >> 16 & 0xff,
+			ntohl(addr.s6_addr32[3]) >>  8 & 0xff,
+			ntohl(addr.s6_addr32[3]) >>  0 & 0xff,
+			dom);
+	} else {
+		seq_printf(m, "%s" NIP6_FMT "%s\n",
+			im->m_class, NIP6(addr), dom);
+	}
 	return 0;
 }
 
@@ -280,16 +307,16 @@ struct cache_detail ip_map_cache = {
 	.alloc		= ip_map_alloc,
 };
 
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr)
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr)
 {
 	struct ip_map ip;
 	struct cache_head *ch;
 
 	strcpy(ip.m_class, class);
-	ip.m_addr = addr;
+	ipv6_addr_copy(&ip.m_addr, addr);
 	ch = sunrpc_cache_lookup(&ip_map_cache, &ip.h,
 				 hash_str(class, IP_HASHBITS) ^
-				 hash_ip(addr.s_addr));
+				 hash_ip6(*addr));
 
 	if (ch)
 		return container_of(ch, struct ip_map, h);
@@ -318,14 +345,14 @@ static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t ex
 	ch = sunrpc_cache_update(&ip_map_cache,
 				 &ip.h, &ipm->h,
 				 hash_str(ipm->m_class, IP_HASHBITS) ^
-				 hash_ip(ipm->m_addr.s_addr));
+				 hash_ip6(ipm->m_addr));
 	if (!ch)
 		return -ENOMEM;
 	cache_put(ch, &ip_map_cache);
 	return 0;
 }
 
-int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom)
+int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom)
 {
 	struct unix_domain *udom;
 	struct ip_map *ipmp;
@@ -352,7 +379,7 @@ int auth_unix_forget_old(struct auth_domain *dom)
 	return 0;
 }
 
-struct auth_domain *auth_unix_lookup(struct in_addr addr)
+struct auth_domain *auth_unix_lookup(struct in6_addr *addr)
 {
 	struct ip_map *ipm;
 	struct auth_domain *rv;
@@ -641,9 +668,24 @@ static int unix_gid_find(uid_t uid, struct group_info **gip,
 int
 svcauth_unix_set_client(struct svc_rqst *rqstp)
 {
-	struct sockaddr_in *sin = svc_addr_in(rqstp);
+	struct sockaddr_in *sin;
+	struct sockaddr_in6 *sin6, sin6_storage;
 	struct ip_map *ipm;
 
+	switch (rqstp->rq_addr.ss_family) {
+	case AF_INET:
+		sin = svc_addr_in(rqstp);
+		sin6 = &sin6_storage;
+		ipv6_addr_set(&sin6->sin6_addr, 0, 0,
+				htonl(0x0000FFFF), sin->sin_addr.s_addr);
+		break;
+	case AF_INET6:
+		sin6 = svc_addr_in6(rqstp);
+		break;
+	default:
+		BUG();
+	}
+
 	rqstp->rq_client = NULL;
 	if (rqstp->rq_proc == 0)
 		return SVC_OK;
@@ -651,7 +693,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
 	ipm = ip_map_cached_get(rqstp);
 	if (ipm == NULL)
 		ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
-				    sin->sin_addr);
+				    &sin6->sin6_addr);
 
 	if (ipm == NULL)
 		return SVC_DENIED;
-- 
1.5.3.8


[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
NFSv4 mailing list
NFSv4@linux-nfs.org
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4

^ permalink raw reply related

* [PATCH net-2.6.25][NET_NS][IPV6] fix ip6_frags.ctl oops
From: Daniel Lezcano @ 2008-01-18 14:19 UTC (permalink / raw)
  To: David Miller
  Cc: Alexey Dobriyan, Denis V. Lunev, Linux Netdev List,
	Pavel Emelianov, devel

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]

[-- Attachment #2: fix-ip6frag-sysctl-oops.patch --]
[-- Type: text/x-patch, Size: 1402 bytes --]

Subject: fix ip6_frag ctl
From: Daniel Lezcano <dlezcano@fr.ibm.com>

Alexey Dobriyan reported an oops when unsharing the network
indefinitely inside a loop. This is because the ip6_frag is not per
namespace while the ctls are.

That happens at the fragment timer expiration: inet_frag_secret_rebuild 
function is called and this one restarts the timer using the value stored
inside the sysctl field.

        "mod_timer(&f->secret_timer, now + f->ctl->secret_interval);"

When the network is unshared, ip6_frag.ctl is initialized with the new
sysctl instances, but ip6_frag has only one instance. A race in this case
will appear because f->ctl can be modified during the read access in the 
timer callback.

Until the ip6_frag is not per namespace, I discard the assignation to the 
ctl field of ip6_frags in ip6_frag_sysctl_init when the network namespace
is not the init net.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/ipv6/reassembly.c |    3 +++
 1 file changed, 3 insertions(+)

Index: net-2.6.25-misc/net/ipv6/reassembly.c
===================================================================
--- net-2.6.25-misc.orig/net/ipv6/reassembly.c
+++ net-2.6.25-misc/net/ipv6/reassembly.c
@@ -627,6 +627,9 @@ static struct inet6_protocol frag_protoc

 void ipv6_frag_sysctl_init(struct net *net)
 {
+	if (net != &init_net)
+		return;
+
 	ip6_frags.ctl = &net->ipv6.sysctl.frags;
 }

^ permalink raw reply

* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: jamal @ 2008-01-18 14:08 UTC (permalink / raw)
  To: Timo Teräs; +Cc: Herbert Xu, netdev, David Miller, Shoichi Sakane
In-Reply-To: <47904B13.9060409@iki.fi>

On Fri, 2008-18-01 at 08:45 +0200, Timo Teräs wrote:

> I'll run my patched kernel 
> and try to get ipsec-tools fixed to use netlink...
> eventually.

If you are going to mod racoon to use netlink then thats without a doubt
the _best_ solution (linux distros dilema i had essentially disappears).

I would suggest as looking at racoon2 since the same KAME toolsmiths are
involved (Soichi Sakane CCed) and you can kill two birds with one stone.

In any case Timo - thanks again for your efforts.

cheers,
jamal

^ permalink raw reply

* Performance issue : GRE tunneling
From: Jeba Anandhan @ 2008-01-18 13:48 UTC (permalink / raw)
  To: netdev; +Cc: matthew.hattersley

Hi All,
When i send the traffic outside of GRE tunnel, The speed is in 3-4Mbps.
When i use the tunnel for traffic, the speed get reduced huge. It is
around 100-300Kbps. What are the factors affects the performance when we
use tunneling [ ex:GRE tunneling ]?

Thanks
Jeba

^ permalink raw reply

* Re: [PATCH 3/4] bonding: Fix work rearming
From: Makito SHIOKAWA @ 2008-01-18 13:43 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, Makito SHIOKAWA
In-Reply-To: <20080117111822.GB1710@ff.dom.local>

> Hmm... I'm not sure I understand your point, but it seems both
> bonding_store_arp_interval() and bonding_store_miimon() where this
> field could be changed, currently use cancel_delayed_work() with
> flush_workqueue(), so I presume, there is no rtnl_lock() nor
> write_lock(&bond->lock) held, so cancel_delayed_work_sync() could
> be used, which doesn't require this additional check.
I see. I rewrited the patch as below. How about this?
(But, it may be just a matter to change 'if (new_value < 0)' to 'if (new_value
<= 0)' in bonding_store_miimon() and bonding_store_arp_interval()...)

---
  drivers/net/bonding/bond_sysfs.c |    4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -997,6 +997,8 @@ static ssize_t bonding_store_miimon(stru
  		       ": %s: Setting MII monitoring interval to %d.\n",
  		       bond->dev->name, new_value);
  		bond->params.miimon = new_value;
+		if (bond->params.miimon == 0)
+			cancel_delayed_work_sync(&bond->mii_work);
  		if(bond->params.updelay)
  			printk(KERN_INFO DRV_NAME
  			      ": %s: Note: Updating updelay (to %d) "
@@ -1026,7 +1028,7 @@ static ssize_t bonding_store_miimon(stru
  				cancel_delayed_work_sync(&bond->lb_arp_work);
  		}

-		if (bond->dev->flags & IFF_UP) {
+		if (bond->params.miimon && (bond->dev->flags & IFF_UP)) {
  			/* If the interface is up, we may need to fire off
  			 * the MII timer. If the interface is down, the
  			 * timer will get fired off when the open function


-- 
Makito SHIOKAWA
MIRACLE LINUX CORPORATION

^ permalink raw reply

* Re: [PATCH net-2.6.25] [BRIDGE] Remove unused include of a header file in ebtables.c
From: David Miller @ 2008-01-18 13:38 UTC (permalink / raw)
  To: ramirose; +Cc: netdev
In-Reply-To: <eb3ff54b0801180508k7d24855ft2747224f0fb1725f@mail.gmail.com>

From: "Rami Rosen" <ramirose@gmail.com>
Date: Fri, 18 Jan 2008 15:08:02 +0200

> In net/bridge/netfilter/ebtables.c,
> - remove unused include of a header file (linux/tty.h) and remove the
>   corresponding comment above it.
> 	
> Signed-off-by: Rami Rosen <ramirose@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: David Miller @ 2008-01-18 13:37 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel
In-Reply-To: <18320.41737.651384.808195@robur.slu.se>

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Fri, 18 Jan 2008 14:00:57 +0100

>  I don't understand the idea with semaphore for enabling/disabling 
>  irq's either the overall logic must safer/better without it.  

They must have had code paths where they didn't know if IRQs were
enabled or not already, so they tried to create something which
approximates the:

	local_irq_save(flags);
	local_irq_restore(flags);

constructs we have for CPU interrupts, so they could go:

	e1000_irq_disable();
	/* ... */
	e1000_irq_enable();

and this would work even if the caller was running
with e1000 interrupts disabled already.

Or, something like that... it is indeed confusing.

Anyways, yes it's totally bogus and should be removed.

^ permalink raw reply

* Re: Broken "Make ip6_frags per namespace" patch
From: Daniel Lezcano @ 2008-01-18 13:27 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: davem, den, netdev, devel, Pavel Emelianov
In-Reply-To: <20080118130305.GH6217@localhost.sw.ru>

Alexey Dobriyan wrote:
> On Thu, Jan 17, 2008 at 01:01:11PM +0100, Daniel Lezcano wrote:
>> Alexey Dobriyan wrote:
>>> On Thu, Jan 17, 2008 at 11:40:42AM +0100, Daniel Lezcano wrote:
>>>> Alexey Dobriyan wrote:
>>>>>> commit c064c4811b3e87ff8202f5a966ff4eea0bc54575
>>>>>> Author: Daniel Lezcano <dlezcano@fr.ibm.com>
>>>>>> Date:   Thu Jan 10 02:56:03 2008 -0800
>>>>>>
>>>>>>   [NETNS][IPV6]: Make ip6_frags per namespace.
>>>>>>   
>>>>>>   The ip6_frags is moved to the network namespace structure.  Because
>>>>>>   there can be multiple instances of the network namespaces, and the
>>>>>>   ip6_frags is no longer a global static variable, a helper function 
>>>>>>   has
>>>>>>   been added to facilitate the initialization of the variables.
>>>>>>   
>>>>>>   Until the ipv6 protocol is not per namespace, the variables are
>>>>>>   accessed relatively from the initial network namespace.
>>>>>> --- a/include/net/netns/ipv6.h
>>>>>> +++ b/include/net/netns/ipv6.h
>>>>>> @@ -11,6 +13,7 @@ struct netns_sysctl_ipv6 {
>>>>>> #ifdef CONFIG_SYSCTL
>>>>>> 	struct ctl_table_header *table;
>>>>>> #endif
>>>>>> +	struct inet_frags_ctl frags;
>>>>>> --- a/net/ipv6/reassembly.c
>>>>>> +++ b/net/ipv6/reassembly.c
>>>>>> @@ -632,6 +625,11 @@ static struct inet6_protocol frag_protocol =
>>>>>> 	.flags		=	INET6_PROTO_NOPOLICY,
>>>>>> };
>>>>>>
>>>>>> +void ipv6_frag_sysctl_init(struct net *net)
>>>>>> +{
>>>>>> +	ip6_frags.ctl = &net->ipv6.sysctl.frags;
>>>>>> +}
>>>>> _This_ can't work. ip6frags is only one and ->ctl pointer is flipped
>>>>> onto per-netns data. Changelog is also misleading: ip6_frags_ctl is
>>>>> moved to netns not all ip6_frags.
>>>>>
>>>>> Oopsing place below -- f->ctl dereference in preparation of mod_timer() 
>>>>> call.
>>>>>
>>>>>
>>>>>
>>>>> BUG: unable to handle kernel paging request at virtual address f5da8fc8
>>>>> printing eip: c11d868a *pdpt = 0000000000003001 *pde = 0000000001728067 
>>>>> *pte = 0000000035da8000 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>>>> Modules linked in: ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among 
>>>>> ebtable_nat ip6t_REJECT ip6table_filter ip6_tables ebtable_filter 
>>>>> ebtable_broute ebt_802_3 ebtables des_generic nf_conntrack_netbios_ns 
>>>>> nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT 
>>>>> iptable_filter ip_tables deflate zlib_deflate zlib_inflate cryptomgr 
>>>>> crypto_hash cpufreq_stats cpufreq_ondemand cdrom cbc bridge llc 
>>>>> blkcipher crypto_algapi arpt_mangle arptable_filter arp_tables x_tables 
>>>>> ah6 af_packet ipv6
>>>>>
>>>>> Pid: 0, comm: swapper Not tainted (2.6.24-rc7-net-2.6.25-nf-sysfs-n #30)
>>>>> EIP: 0060:[<c11d868a>] EFLAGS: 00010246 CPU: 1
>>>>> EIP is at inet_frag_secret_rebuild+0xaa/0xd0
>>>>> EAX: f5da8fbc EBX: 00000000 ECX: c1310000 EDX: 00000100
>>>>> ESI: f7cba000 EDI: f898f7a0 EBP: 00000040 ESP: c1310f90
>>>>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>>>>> Process swapper (pid: 0, ti=c1310000 task=f7c9a580 task.ti=f7c9b000)
>>>>> Stack: f898f7a8 f898f8a8 000ddcbd f898f7a0 f7cba000 c1310fc4 00000100 
>>>>> c1026d60 00000002 00000001 c1191183 c4779ddc c11d85e0 f898c860 
>>>>>      f898c860 c12c4a88 00000001 c1308da0 0000000a c1023477 00000001 
>>>>>      c130b640 c130b640 f7c9bf34 Call Trace:
>>>>> [<c1026d60>] run_timer_softirq+0x120/0x190
>>>>> [<c1191183>] net_rx_action+0x53/0x220
>>>>> [<c11d85e0>] inet_frag_secret_rebuild+0x0/0xd0
>>>>> [<c1023477>] __do_softirq+0x87/0x100
>>>>> [<c10059cf>] do_softirq+0xaf/0x110
>>>>> [<c10233e3>] irq_exit+0x83/0x90
>>>>> [<c1010ce7>] smp_apic_timer_interrupt+0x57/0x90
>>>>> [<c10036e1>] apic_timer_interrupt+0x29/0x38
>>>>> [<c10036eb>] apic_timer_interrupt+0x33/0x38
>>>>> [<c1001460>] default_idle+0x0/0x60
>>>>> [<c10014a0>] default_idle+0x40/0x60
>>>>> [<c1000ea3>] cpu_idle+0x73/0xb0
>>>>> =======================
>>>>> Code: 8b 10 85 d2 89 13 74 03 89 5a 04 89 18 89 43 04 85 f6 89 f3 75 bb 
>>>>> 45 83 fd 40 75 a5 8b 44 24 04 e8 4c 3f 01 00 8b 87 50 01 00 00 <8b> 50 
>>>>> 0c 01 54 24 08 8d 87 38 01 00 00 8b 54 24 08 83 c4 0c 5b EIP: 
>>>>> [<c11d868a>] inet_frag_secret_rebuild+0xaa/0xd0 SS:ESP 0068:c1310f90
>>>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>> Hi Alexey,
>>>>
>>>> does it happen after unsharing the network ?
>>> Yep. clone(CLONE_NEWNET) in a loop and sooner or later you'll see this.
>> Thanks.
>>
>> The network namespace is not yet complete, this is normal that you have 
>> not ip6_frag per namespace. Pavel is doing that.
> 
> it would be nice to not oops meanwhile :)

Sure :)

>> Perhaps, I should disable ipv6_frag_sysctl_init when not in the init_net 
>> and enable it again when Pavel send its fragment patchset ?
> 
> Pavel told me (he won't be in office for couple of days) that this
> place was the biggest PITA during netnsization of fragments.
> FWIW, I'm currently running with
> 
> -       ip6_frags.ctl = &net->ipv6.sysctl.frags;
> +       ip6_frags.ctl = &init_net.ipv6.sysctl.frags;
> 

Ok, I will post this patch to fix that until Pavel finishes the 
fragments. Thanks a lot for testing.


^ permalink raw reply

* [PATCH net-2.6.25] [BRIDGE] Remove unused include of a header file in ebtables.c
From: Rami Rosen @ 2008-01-18 13:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 216 bytes --]

Hi,

In net/bridge/netfilter/ebtables.c,
- remove unused include of a header file (linux/tty.h) and remove the
  corresponding comment above it.
	
Regards,
Rami Rosen


Signed-off-by: Rami Rosen <ramirose@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 379 bytes --]

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 817169e..32afff8 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -15,8 +15,6 @@
  *  2 of the License, or (at your option) any later version.
  */
 
-/* used for print_string */
-#include <linux/tty.h>
 
 #include <linux/kmod.h>
 #include <linux/module.h>

^ permalink raw reply related

* Re: Broken "Make ip6_frags per namespace" patch
From: Alexey Dobriyan @ 2008-01-18 13:03 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: davem, den, netdev, devel, Pavel Emelianov
In-Reply-To: <478F4387.9040901@fr.ibm.com>

On Thu, Jan 17, 2008 at 01:01:11PM +0100, Daniel Lezcano wrote:
> Alexey Dobriyan wrote:
> >On Thu, Jan 17, 2008 at 11:40:42AM +0100, Daniel Lezcano wrote:
> >>Alexey Dobriyan wrote:
> >>>>commit c064c4811b3e87ff8202f5a966ff4eea0bc54575
> >>>>Author: Daniel Lezcano <dlezcano@fr.ibm.com>
> >>>>Date:   Thu Jan 10 02:56:03 2008 -0800
> >>>>
> >>>>   [NETNS][IPV6]: Make ip6_frags per namespace.
> >>>>   
> >>>>   The ip6_frags is moved to the network namespace structure.  Because
> >>>>   there can be multiple instances of the network namespaces, and the
> >>>>   ip6_frags is no longer a global static variable, a helper function 
> >>>>   has
> >>>>   been added to facilitate the initialization of the variables.
> >>>>   
> >>>>   Until the ipv6 protocol is not per namespace, the variables are
> >>>>   accessed relatively from the initial network namespace.
> >>>>--- a/include/net/netns/ipv6.h
> >>>>+++ b/include/net/netns/ipv6.h
> >>>>@@ -11,6 +13,7 @@ struct netns_sysctl_ipv6 {
> >>>>#ifdef CONFIG_SYSCTL
> >>>>	struct ctl_table_header *table;
> >>>>#endif
> >>>>+	struct inet_frags_ctl frags;
> >>>>--- a/net/ipv6/reassembly.c
> >>>>+++ b/net/ipv6/reassembly.c
> >>>>@@ -632,6 +625,11 @@ static struct inet6_protocol frag_protocol =
> >>>>	.flags		=	INET6_PROTO_NOPOLICY,
> >>>>};
> >>>>
> >>>>+void ipv6_frag_sysctl_init(struct net *net)
> >>>>+{
> >>>>+	ip6_frags.ctl = &net->ipv6.sysctl.frags;
> >>>>+}
> >>>_This_ can't work. ip6frags is only one and ->ctl pointer is flipped
> >>>onto per-netns data. Changelog is also misleading: ip6_frags_ctl is
> >>>moved to netns not all ip6_frags.
> >>>
> >>>Oopsing place below -- f->ctl dereference in preparation of mod_timer() 
> >>>call.
> >>>
> >>>
> >>>
> >>>BUG: unable to handle kernel paging request at virtual address f5da8fc8
> >>>printing eip: c11d868a *pdpt = 0000000000003001 *pde = 0000000001728067 
> >>>*pte = 0000000035da8000 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >>>Modules linked in: ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among 
> >>>ebtable_nat ip6t_REJECT ip6table_filter ip6_tables ebtable_filter 
> >>>ebtable_broute ebt_802_3 ebtables des_generic nf_conntrack_netbios_ns 
> >>>nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT 
> >>>iptable_filter ip_tables deflate zlib_deflate zlib_inflate cryptomgr 
> >>>crypto_hash cpufreq_stats cpufreq_ondemand cdrom cbc bridge llc 
> >>>blkcipher crypto_algapi arpt_mangle arptable_filter arp_tables x_tables 
> >>>ah6 af_packet ipv6
> >>>
> >>>Pid: 0, comm: swapper Not tainted (2.6.24-rc7-net-2.6.25-nf-sysfs-n #30)
> >>>EIP: 0060:[<c11d868a>] EFLAGS: 00010246 CPU: 1
> >>>EIP is at inet_frag_secret_rebuild+0xaa/0xd0
> >>>EAX: f5da8fbc EBX: 00000000 ECX: c1310000 EDX: 00000100
> >>>ESI: f7cba000 EDI: f898f7a0 EBP: 00000040 ESP: c1310f90
> >>>DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> >>>Process swapper (pid: 0, ti=c1310000 task=f7c9a580 task.ti=f7c9b000)
> >>>Stack: f898f7a8 f898f8a8 000ddcbd f898f7a0 f7cba000 c1310fc4 00000100 
> >>>c1026d60 00000002 00000001 c1191183 c4779ddc c11d85e0 f898c860 
> >>>      f898c860 c12c4a88 00000001 c1308da0 0000000a c1023477 00000001 
> >>>      c130b640 c130b640 f7c9bf34 Call Trace:
> >>>[<c1026d60>] run_timer_softirq+0x120/0x190
> >>>[<c1191183>] net_rx_action+0x53/0x220
> >>>[<c11d85e0>] inet_frag_secret_rebuild+0x0/0xd0
> >>>[<c1023477>] __do_softirq+0x87/0x100
> >>>[<c10059cf>] do_softirq+0xaf/0x110
> >>>[<c10233e3>] irq_exit+0x83/0x90
> >>>[<c1010ce7>] smp_apic_timer_interrupt+0x57/0x90
> >>>[<c10036e1>] apic_timer_interrupt+0x29/0x38
> >>>[<c10036eb>] apic_timer_interrupt+0x33/0x38
> >>>[<c1001460>] default_idle+0x0/0x60
> >>>[<c10014a0>] default_idle+0x40/0x60
> >>>[<c1000ea3>] cpu_idle+0x73/0xb0
> >>>=======================
> >>>Code: 8b 10 85 d2 89 13 74 03 89 5a 04 89 18 89 43 04 85 f6 89 f3 75 bb 
> >>>45 83 fd 40 75 a5 8b 44 24 04 e8 4c 3f 01 00 8b 87 50 01 00 00 <8b> 50 
> >>>0c 01 54 24 08 8d 87 38 01 00 00 8b 54 24 08 83 c4 0c 5b EIP: 
> >>>[<c11d868a>] inet_frag_secret_rebuild+0xaa/0xd0 SS:ESP 0068:c1310f90
> >>>Kernel panic - not syncing: Fatal exception in interrupt
> >>Hi Alexey,
> >>
> >>does it happen after unsharing the network ?
> >
> >Yep. clone(CLONE_NEWNET) in a loop and sooner or later you'll see this.
> 
> Thanks.
> 
> The network namespace is not yet complete, this is normal that you have 
> not ip6_frag per namespace. Pavel is doing that.

it would be nice to not oops meanwhile :)

> Perhaps, I should disable ipv6_frag_sysctl_init when not in the init_net 
> and enable it again when Pavel send its fragment patchset ?

Pavel told me (he won't be in office for couple of days) that this
place was the biggest PITA during netnsization of fragments.
FWIW, I'm currently running with

-       ip6_frags.ctl = &net->ipv6.sysctl.frags;
+       ip6_frags.ctl = &init_net.ipv6.sysctl.frags;


^ permalink raw reply

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: Robert Olsson @ 2008-01-18 13:00 UTC (permalink / raw)
  To: David Miller
  Cc: Robert.Olsson, elendil, jesse.brandeburg, slavon, netdev,
	linux-kernel
In-Reply-To: <20080118.041144.81957249.davem@davemloft.net>


David Miller writes:

 > > eth0 e1000_irq_enable sem = 1    <- ifconfig eth0 down
 > > eth0 e1000_irq_disable sem = 2
 > > 
 > > **e1000_open                     <- ifconfig eth0 up
 > > eth0 e1000_irq_disable sem = 3      Dead. irq's can't be enabled
 > > e1000_irq_enable miss
 > > eth0 e1000_irq_enable sem = 2
 > > e1000_irq_enable miss
 > > eth0 e1000_irq_enable sem = 1
 > > ADDRCONF(NETDEV_UP): eth0: link is not ready
 > 
 > Yes, this semaphore thing is highly problematic.  In the most crucial
 > areas where network driver consistency matters the most for ease of
 > understanding and debugging, the Intel drivers choose to be different

 I don't understand the idea with semaphore for enabling/disabling 
 irq's either the overall logic must safer/better without it.  
 
 > The way the napi_disable() logic breaks out from high packet load in
 > net_rx_action() is it simply returns even leaving interrupts disabled
 > when a pending napi_disable() is pending.
 > 
 > This is what trips up the semaphore logic.
 > 
 > Robert, give this patch a try.
 > 
 > In the long term this semaphore should be completely eliminated,
 > there is no justification for it.

 It's on the testing list...

 Cheers
					--ro


 > 
 > Signed-off-by: David S. Miller <davem@davemloft.net>
 > 
 > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
 > index 0c9a6f7..76c0fa6 100644
 > --- a/drivers/net/e1000/e1000_main.c
 > +++ b/drivers/net/e1000/e1000_main.c
 > @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 >  
 >  #ifdef CONFIG_E1000_NAPI
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  #endif
 >  	e1000_irq_disable(adapter);
 >  
 > diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
 > index 2ab3bfb..9cc5a6b 100644
 > --- a/drivers/net/e1000e/netdev.c
 > +++ b/drivers/net/e1000e/netdev.c
 > @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 >  	msleep(10);
 >  
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  	e1000_irq_disable(adapter);
 >  
 >  	del_timer_sync(&adapter->watchdog_timer);
 > diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
 > index d2fb88d..4f63839 100644
 > --- a/drivers/net/ixgb/ixgb_main.c
 > +++ b/drivers/net/ixgb/ixgb_main.c
 > @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  {
 >  	struct net_device *netdev = adapter->netdev;
 >  
 > +#ifdef CONFIG_IXGB_NAPI
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +#endif
 > +
 >  	ixgb_irq_disable(adapter);
 >  	free_irq(adapter->pdev->irq, netdev);
 >  
 > @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  
 >  	if(kill_watchdog)
 >  		del_timer_sync(&adapter->watchdog_timer);
 > -#ifdef CONFIG_IXGB_NAPI
 > -	napi_disable(&adapter->napi);
 > -#endif
 > +
 >  	adapter->link_speed = 0;
 >  	adapter->link_duplex = 0;
 >  	netif_carrier_off(netdev);
 > diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
 > index de3f45e..a4265bc 100644
 > --- a/drivers/net/ixgbe/ixgbe_main.c
 > +++ b/drivers/net/ixgbe/ixgbe_main.c
 > @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 >  	IXGBE_WRITE_FLUSH(&adapter->hw);
 >  	msleep(10);
 >  
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +
 >  	ixgbe_irq_disable(adapter);
 >  
 > -	napi_disable(&adapter->napi);
 >  	del_timer_sync(&adapter->watchdog_timer);
 >  
 >  	netif_carrier_off(netdev);
 > --
 > To unsubscribe from this list: send the line "unsubscribe netdev" in
 > the body of a message to majordomo@vger.kernel.org
 > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH ] [NETNS 3/4 net-2.6.25] Consolidate kernel netlink socket destruction.
From: Denis V. Lunev @ 2008-01-18 12:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4790A0E3.9080006@sw.ru>

Create a specific helper for netlink kernel socket disposal. This just
let the code look better and provides a ground for proper disposal inside
a namespace.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
---
 drivers/connector/connector.c       |    9 +++------
 drivers/scsi/scsi_netlink.c         |    2 +-
 drivers/scsi/scsi_transport_iscsi.c |    2 +-
 fs/ecryptfs/netlink.c               |    3 +--
 include/linux/netlink.h             |    1 +
 net/bridge/netfilter/ebt_ulog.c     |    4 ++--
 net/core/rtnetlink.c                |    2 +-
 net/decnet/netfilter/dn_rtmsg.c     |    4 ++--
 net/ipv4/fib_frontend.c             |    2 +-
 net/ipv4/inet_diag.c                |    2 +-
 net/ipv4/netfilter/ip_queue.c       |    4 ++--
 net/ipv4/netfilter/ipt_ULOG.c       |    4 ++--
 net/ipv6/netfilter/ip6_queue.c      |    4 ++--
 net/netfilter/nfnetlink.c           |    2 +-
 net/netlink/af_netlink.c            |   11 +++++++++++
 net/xfrm/xfrm_user.c                |    2 +-
 16 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 37976dc..fea2d3e 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -420,8 +420,7 @@ static int __devinit cn_init(void)
 
 	dev->cbdev = cn_queue_alloc_dev("cqueue", dev->nls);
 	if (!dev->cbdev) {
-		if (dev->nls->sk_socket)
-			sock_release(dev->nls->sk_socket);
+		netlink_kernel_release(dev->nls);
 		return -EINVAL;
 	}
 	
@@ -431,8 +430,7 @@ static int __devinit cn_init(void)
 	if (err) {
 		cn_already_initialized = 0;
 		cn_queue_free_dev(dev->cbdev);
-		if (dev->nls->sk_socket)
-			sock_release(dev->nls->sk_socket);
+		netlink_kernel_release(dev->nls);
 		return -EINVAL;
 	}
 
@@ -447,8 +445,7 @@ static void __devexit cn_fini(void)
 
 	cn_del_callback(&dev->id);
 	cn_queue_free_dev(dev->cbdev);
-	if (dev->nls->sk_socket)
-		sock_release(dev->nls->sk_socket);
+	netlink_kernel_release(dev->nls);
 }
 
 subsys_initcall(cn_init);
diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 40579ed..fe48c24 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -169,7 +169,7 @@ void
 scsi_netlink_exit(void)
 {
 	if (scsi_nl_sock) {
-		sock_release(scsi_nl_sock->sk_socket);
+		netlink_kernel_release(scsi_nl_sock);
 		netlink_unregister_notifier(&scsi_netlink_notifier);
 	}
 
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 5428d15..9e463a6 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1533,7 +1533,7 @@ unregister_transport_class:
 
 static void __exit iscsi_transport_exit(void)
 {
-	sock_release(nls->sk_socket);
+	netlink_kernel_release(nls);
 	transport_class_unregister(&iscsi_connection_class);
 	transport_class_unregister(&iscsi_session_class);
 	transport_class_unregister(&iscsi_host_class);
diff --git a/fs/ecryptfs/netlink.c b/fs/ecryptfs/netlink.c
index 9aa3451..f638a69 100644
--- a/fs/ecryptfs/netlink.c
+++ b/fs/ecryptfs/netlink.c
@@ -237,7 +237,6 @@ out:
  */
 void ecryptfs_release_netlink(void)
 {
-	if (ecryptfs_nl_sock && ecryptfs_nl_sock->sk_socket)
-		sock_release(ecryptfs_nl_sock->sk_socket);
+	netlink_kernel_release(ecryptfs_nl_sock);
 	ecryptfs_nl_sock = NULL;
 }
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 2aee0f5..bd13b6f 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -178,6 +178,7 @@ extern struct sock *netlink_kernel_create(struct net *net,
 					  void (*input)(struct sk_buff *skb),
 					  struct mutex *cb_mutex,
 					  struct module *module);
+extern void netlink_kernel_release(struct sock *sk);
 extern int netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern void netlink_clear_multicast_users(struct sock *sk, unsigned int group);
 extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
index b73ba28..8e7b00b 100644
--- a/net/bridge/netfilter/ebt_ulog.c
+++ b/net/bridge/netfilter/ebt_ulog.c
@@ -307,7 +307,7 @@ static int __init ebt_ulog_init(void)
 	if (!ebtulognl)
 		ret = -ENOMEM;
 	else if ((ret = ebt_register_watcher(&ulog)))
-		sock_release(ebtulognl->sk_socket);
+		netlink_kernel_release(ebtulognl);
 
 	if (ret == 0)
 		nf_log_register(PF_BRIDGE, &ebt_ulog_logger);
@@ -333,7 +333,7 @@ static void __exit ebt_ulog_fini(void)
 		}
 		spin_unlock_bh(&ub->lock);
 	}
-	sock_release(ebtulognl->sk_socket);
+	netlink_kernel_release(ebtulognl);
 }
 
 module_init(ebt_ulog_init);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2c1f665..2ef9480 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1381,7 +1381,7 @@ static void rtnetlink_net_exit(struct net *net)
 		 * free.
 		 */
 		sk->sk_net = get_net(&init_net);
-		sock_release(net->rtnl->sk_socket);
+		netlink_kernel_release(net->rtnl);
 		net->rtnl = NULL;
 	}
 }
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 96375f2..6d2bd32 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -137,7 +137,7 @@ static int __init dn_rtmsg_init(void)
 
 	rv = nf_register_hook(&dnrmg_ops);
 	if (rv) {
-		sock_release(dnrmg->sk_socket);
+		netlink_kernel_release(dnrmg);
 	}
 
 	return rv;
@@ -146,7 +146,7 @@ static int __init dn_rtmsg_init(void)
 static void __exit dn_rtmsg_fini(void)
 {
 	nf_unregister_hook(&dnrmg_ops);
-	sock_release(dnrmg->sk_socket);
+	netlink_kernel_release(dnrmg);
 }
 
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4e5216e..e787d21 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -881,7 +881,7 @@ static void nl_fib_lookup_exit(struct net *net)
 	 * initial network namespace. So the socket will  be safe to free.
 	 */
 	net->ipv4.fibnl->sk_net = get_net(&init_net);
-	sock_release(net->ipv4.fibnl->sk_socket);
+	netlink_kernel_release(net->ipv4.fibnl);
 }
 
 static void fib_disable_ip(struct net_device *dev, int force)
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index e468e7a..605ed2c 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -935,7 +935,7 @@ out_free_table:
 
 static void __exit inet_diag_exit(void)
 {
-	sock_release(idiagnl->sk_socket);
+	netlink_kernel_release(idiagnl);
 	kfree(inet_diag_table);
 }
 
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index 7361315..5109839 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -605,7 +605,7 @@ cleanup_sysctl:
 	unregister_netdevice_notifier(&ipq_dev_notifier);
 	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
 cleanup_ipqnl:
-	sock_release(ipqnl->sk_socket);
+	netlink_kernel_release(ipqnl);
 	mutex_lock(&ipqnl_mutex);
 	mutex_unlock(&ipqnl_mutex);
 
@@ -624,7 +624,7 @@ static void __exit ip_queue_fini(void)
 	unregister_netdevice_notifier(&ipq_dev_notifier);
 	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
 
-	sock_release(ipqnl->sk_socket);
+	netlink_kernel_release(ipqnl);
 	mutex_lock(&ipqnl_mutex);
 	mutex_unlock(&ipqnl_mutex);
 
diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c
index fa24efa..b192756 100644
--- a/net/ipv4/netfilter/ipt_ULOG.c
+++ b/net/ipv4/netfilter/ipt_ULOG.c
@@ -415,7 +415,7 @@ static int __init ulog_tg_init(void)
 
 	ret = xt_register_target(&ulog_tg_reg);
 	if (ret < 0) {
-		sock_release(nflognl->sk_socket);
+		netlink_kernel_release(nflognl);
 		return ret;
 	}
 	if (nflog)
@@ -434,7 +434,7 @@ static void __exit ulog_tg_exit(void)
 	if (nflog)
 		nf_log_unregister(&ipt_ulog_logger);
 	xt_unregister_target(&ulog_tg_reg);
-	sock_release(nflognl->sk_socket);
+	netlink_kernel_release(nflognl);
 
 	/* remove pending timers and free allocated skb's */
 	for (i = 0; i < ULOG_MAXNLGROUPS; i++) {
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index a20db0b..56b4ea6 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -609,7 +609,7 @@ cleanup_sysctl:
 	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
 
 cleanup_ipqnl:
-	sock_release(ipqnl->sk_socket);
+	netlink_kernel_release(ipqnl);
 	mutex_lock(&ipqnl_mutex);
 	mutex_unlock(&ipqnl_mutex);
 
@@ -628,7 +628,7 @@ static void __exit ip6_queue_fini(void)
 	unregister_netdevice_notifier(&ipq_dev_notifier);
 	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
 
-	sock_release(ipqnl->sk_socket);
+	netlink_kernel_release(ipqnl);
 	mutex_lock(&ipqnl_mutex);
 	mutex_unlock(&ipqnl_mutex);
 
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 2128542..b75c9c4 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -179,7 +179,7 @@ static void nfnetlink_rcv(struct sk_buff *skb)
 static void __exit nfnetlink_exit(void)
 {
 	printk("Removing netfilter NETLINK layer.\n");
-	sock_release(nfnl->sk_socket);
+	netlink_kernel_release(nfnl);
 	return;
 }
 
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 29fef55..626a582 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1405,6 +1405,17 @@ out_sock_release:
 }
 EXPORT_SYMBOL(netlink_kernel_create);
 
+
+void
+netlink_kernel_release(struct sock *sk)
+{
+	if (sk == NULL || sk->sk_socket == NULL)
+		return;
+	sock_release(sk->sk_socket);
+}
+EXPORT_SYMBOL(netlink_kernel_release);
+
+
 /**
  * netlink_change_ngroups - change number of multicast groups
  *
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 35fc16a..e0ccdf2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2420,7 +2420,7 @@ static void __exit xfrm_user_exit(void)
 	xfrm_unregister_km(&netlink_mgr);
 	rcu_assign_pointer(xfrm_nl, NULL);
 	synchronize_rcu();
-	sock_release(nlsk->sk_socket);
+	netlink_kernel_release(nlsk);
 }
 
 module_init(xfrm_user_init);
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH] [NETNS 4/4 net-2.6.25] Namespace stop vs 'ip r l' race.
From: Denis V. Lunev @ 2008-01-18 12:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4790A0E3.9080006@sw.ru>

During network namespace stop process kernel side netlink sockets belonging
to a namespace should be closed. They should not prevent namespace to stop,
so they do not increment namespace usage counter. Though this counter will
be put during last sock_put.

The raplacement of the correct netns for init_ns solves the problem only
partial as socket to be stoped until proper stop is a valid netlink kernel
socket and can be looked up by the user processes. This is not a problem
until it resides in initial namespace (no processes inside this net), but
this is not true for init_net.

So, hold the referrence for a socket, remove it from lookup tables and only
after that change namespace and perform a last put.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
---
 net/core/rtnetlink.c     |   15 ++-------------
 net/ipv4/fib_frontend.c  |    7 +------
 net/netlink/af_netlink.c |   15 +++++++++++++++
 3 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2ef9480..aafc34d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1365,25 +1365,14 @@ static int rtnetlink_net_init(struct net *net)
 				   rtnetlink_rcv, &rtnl_mutex, THIS_MODULE);
 	if (!sk)
 		return -ENOMEM;
-
-	/* Don't hold an extra reference on the namespace */
-	put_net(sk->sk_net);
 	net->rtnl = sk;
 	return 0;
 }
 
 static void rtnetlink_net_exit(struct net *net)
 {
-	struct sock *sk = net->rtnl;
-	if (sk) {
-		/* At the last minute lie and say this is a socket for the
-		 * initial network namespace.  So the socket will be safe to
-		 * free.
-		 */
-		sk->sk_net = get_net(&init_net);
-		netlink_kernel_release(net->rtnl);
-		net->rtnl = NULL;
-	}
+	netlink_kernel_release(net->rtnl);
+	net->rtnl = NULL;
 }
 
 static struct pernet_operations rtnetlink_net_ops = {
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index e787d21..62bd791 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -869,19 +869,14 @@ static int nl_fib_lookup_init(struct net *net)
 				   nl_fib_input, NULL, THIS_MODULE);
 	if (sk == NULL)
 		return -EAFNOSUPPORT;
-	/* Don't hold an extra reference on the namespace */
-	put_net(sk->sk_net);
 	net->ipv4.fibnl = sk;
 	return 0;
 }
 
 static void nl_fib_lookup_exit(struct net *net)
 {
-	/* At the last minute lie and say this is a socket for the
-	 * initial network namespace. So the socket will  be safe to free.
-	 */
-	net->ipv4.fibnl->sk_net = get_net(&init_net);
 	netlink_kernel_release(net->ipv4.fibnl);
+	net->ipv4.fibnl = NULL;
 }
 
 static void fib_disable_ip(struct net_device *dev, int force)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 626a582..6b178e1 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1396,6 +1396,9 @@ netlink_kernel_create(struct net *net, int unit, unsigned int groups,
 	}
 	netlink_table_ungrab();
 
+	/* Do not hold an extra referrence to a namespace as this socket is
+	 * internal to a namespace and does not prevent it to stop. */
+	put_net(net);
 	return sk;
 
 out_sock_release:
@@ -1411,7 +1414,19 @@ netlink_kernel_release(struct sock *sk)
 {
 	if (sk == NULL || sk->sk_socket == NULL)
 		return;
+
+	/*
+	 * Last sock_put should drop referrence to sk->sk_net. It has already
+	 * been dropped in netlink_kernel_create. Taking referrence to stopping
+	 * namespace is not an option.
+	 * Take referrence to a socket to remove it from netlink lookup table
+	 * _alive_ and after that destroy it in the context of init_net.
+	 */
+	sock_hold(sk);
 	sock_release(sk->sk_socket);
+
+	sk->sk_net = get_net(&init_net);
+	sock_put(sk);
 }
 EXPORT_SYMBOL(netlink_kernel_release);
 
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH] [NETNS 1/4 net-2.6.25] Double free in netlink_release.
From: Denis V. Lunev @ 2008-01-18 12:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4790A0E3.9080006@sw.ru>

Netlink protocol table is global for all namespaces. Some netlink protocols
have been virtualized, i.e. they have per/namespace netlink socket. This
difference can easily lead to double free if more than 1 namespace is
started. Count the number of kernel netlink sockets to track that this
table is not used any more.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
---
 net/netlink/af_netlink.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 21f9e30..29fef55 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -498,9 +498,12 @@ static int netlink_release(struct socket *sock)
 
 	netlink_table_grab();
 	if (netlink_is_kernel(sk)) {
-		kfree(nl_table[sk->sk_protocol].listeners);
-		nl_table[sk->sk_protocol].module = NULL;
-		nl_table[sk->sk_protocol].registered = 0;
+		BUG_ON(nl_table[sk->sk_protocol].registered == 0);
+		if (--nl_table[sk->sk_protocol].registered == 0) {
+			kfree(nl_table[sk->sk_protocol].listeners);
+			nl_table[sk->sk_protocol].module = NULL;
+			nl_table[sk->sk_protocol].registered = 0;
+		}
 	} else if (nlk->subscriptions)
 		netlink_update_listeners(sk);
 	netlink_table_ungrab();
@@ -1389,6 +1392,7 @@ netlink_kernel_create(struct net *net, int unit, unsigned int groups,
 		nl_table[unit].registered = 1;
 	} else {
 		kfree(listeners);
+		nl_table[unit].registered++;
 	}
 	netlink_table_ungrab();
 
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH] [NETNS 2/4 net-2.6.25] Memory leak on network namespace stop.
From: Denis V. Lunev @ 2008-01-18 12:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4790A0E3.9080006@sw.ru>

Network namespace allocates 2 kernel netlink sockets, fibnl & rtnl. These
sockets should be disposed properly, i.e. by sock_release. Plain sock_put
is not enough.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
---
 net/core/rtnetlink.c    |    2 +-
 net/ipv4/fib_frontend.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4a07e83..2c1f665 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1381,7 +1381,7 @@ static void rtnetlink_net_exit(struct net *net)
 		 * free.
 		 */
 		sk->sk_net = get_net(&init_net);
-		sock_put(sk);
+		sock_release(net->rtnl->sk_socket);
 		net->rtnl = NULL;
 	}
 }
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 8ddcd3f..4e5216e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -881,7 +881,7 @@ static void nl_fib_lookup_exit(struct net *net)
 	 * initial network namespace. So the socket will  be safe to free.
 	 */
 	net->ipv4.fibnl->sk_net = get_net(&init_net);
-	sock_put(net->ipv4.fibnl);
+	sock_release(net->ipv4.fibnl->sk_socket);
 }
 
 static void fib_disable_ip(struct net_device *dev, int force)
-- 
1.5.3.rc5


^ permalink raw reply related

* Re: [PATCH] atl1: fix frame length bug
From: Jay Cliburn @ 2008-01-18 12:52 UTC (permalink / raw)
  To: davem; +Cc: Jay Cliburn, jeff, csnook, netdev
In-Reply-To: <20080114200428.1c62e8fb@osprey.hogchain.net>

On Mon, 14 Jan 2008 20:04:28 -0600
Jay Cliburn <jacliburn@bellsouth.net> wrote:

> On Mon, 14 Jan 2008 19:56:41 -0600
> Jay Cliburn <jacliburn@bellsouth.net> wrote:
> 
> > The driver sets up the hardware to accept a frame with max length
> > equal to MTU + Ethernet header + FCS + VLAN tag, but we neglect to
> > add the VLAN tag size to the ingress buffer.  When a VLAN-tagged
> > frame arrives, the hardware passes it, but bad things happen
> > because the buffer is too small.  This patch fixes that.
> > 
> > Thanks to David Harris for reporting the bug and testing the fix.
> > 
> > Tested-by: David Harris <david.harris@cpni-inc.com>
> > Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
> 
> Jeff, Dave,
> 
> This bugfix needs to go in for 2.6.24 if possible.

Dave,

I saw a message you sent awhile ago about Jeff handing off some netdev
stuff to you.  Since I haven't seen anything to rescind that notice,
I'll direct my request to you.

Could you please merge this patch for 2.6.24?
http://article.gmane.org/gmane.linux.network/83368

When the reporter hits the bug, the result is a locked up kernel.
Granted, it's triggered only in the presence of VLAN tagged frames, but
it's a serious bug and the fix is simple.

Jay

^ permalink raw reply

* [PATCH 0/4 net-2.6.25] Proper netlink kernel sockets disposal.
From: Denis V. Lunev @ 2008-01-18 12:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Linux Containers, devel

Alexey Dobriyan found, that virtualized netlink kernel sockets (fibl &
rtnl) are leaked during namespace start/stop loop.

Leaking fix (simple and obvious) reveals that netlink kernel socket
disposal leads to OOPSes:
- nl_table[protocol]->listeners is double freed
- sometimes during namespace stop netlink_sock_destruct
  BUG_TRAP(!atomic_read(&sk->sk_rmem_alloc)); is hit

This set address all these issues.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriayn <adobriyan@openvz.org>

^ permalink raw reply

* Re: sctp use-uninitialized warning in net-2.6.25
From: David Miller @ 2008-01-18 12:49 UTC (permalink / raw)
  To: akpm; +Cc: netdev, vladislav.yasevich
In-Reply-To: <20080116135957.4eb54958.akpm@linux-foundation.org>

From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 16 Jan 2008 13:59:57 -0800

> net/sctp/sm_statefuns.c: In function 'sctp_sf_do_5_1C_ack':
> net/sctp/sm_statefuns.c:484: warning: 'error' may be used uninitialized in this function
> 
> It is not obvious that this is a false positive.

I'll check in the following for now.

Vlad, please take a look.

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index b126751..6e12757 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -481,7 +481,7 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(const struct sctp_endpoint *ep,
 	sctp_init_chunk_t *initchunk;
 	struct sctp_chunk *err_chunk;
 	struct sctp_packet *packet;
-	sctp_error_t error;
+	sctp_error_t error = SCTP_ERROR_NO_ERROR;
 
 	if (!sctp_vtag_verify(chunk, asoc))
 		return sctp_sf_pdiscard(ep, asoc, type, arg, commands);

^ permalink raw reply related

* Re: Please pull 'fixes-davem' branch of wireless-2.6
From: David Miller @ 2008-01-18 12:33 UTC (permalink / raw)
  To: linville-2XuSBdqkA4R54TAoqtyWWQ
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080116212645.GD3164-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Date: Wed, 16 Jan 2008 16:26:45 -0500

> One more fix for rfkill in 2.6.24...note that this branch is based
> on 2.6.24-rc8.

Pulled, thanks John.

^ permalink raw reply

* Re: [IPV4] FIB_HASH : Avoid unecessary loop in fn_hash_dump_zone()
From: David Miller @ 2008-01-18 12:30 UTC (permalink / raw)
  To: dada1; +Cc: netdev, Robert.Olsson
In-Reply-To: <20080116181925.2c5c540f.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 16 Jan 2008 18:19:25 +0100

> I noticed "ip route list" was slower than "cat /proc/net/route" on a machine with a full Internet
> routing table (214392 entries : Special thanks to Robert ;) )
> 
> This is similar to problem reported in commit d8c9283089287341c85a0a69de32c2287a990e71
> 
> Fix is to avoid scanning the begining of fz_hash table, but directly seek to the right offset.
> 
> Before patch :
> 
> time ip route >/tmp/ROUTE
> 
> real    0m1.285s
> user    0m0.712s
> sys     0m0.436s
> 
> After patch
> 
> # time ip route >/tmp/ROUTE
> 
> real    0m0.835s
> user    0m0.692s
> sys     0m0.124s
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks Eric.

^ permalink raw reply

* RE: [PATCH 0/3] UCC TDM driver for MPC83xx platforms
From: Joakim Tjernlund @ 2008-01-18 12:18 UTC (permalink / raw)
  To: Aggrwal Poonam
  Cc: sfr, Phillips Kim, Barkowski Michael, netdev, Suresh PV, rubini,
	linux-kernel, linuxppc-dev, Kalra Ashish, Cutler Richard,
	Andrew Morton
In-Reply-To: <FBA61160C48B8D438F3323FEFB4EF2C279A908@zin33exm24.fsl.freescale.net>


On Fri, 2008-01-18 at 17:28 +0530, Aggrwal Poonam wrote:
> Hello All
> 
> The TDM driver just now does not have a proper framework. Probably the
> interface cannot be generalised as such. Hence we could not decide
> whether it would be right to think of a TDM framework. Infact the
> interface this TDM driver(for MPC8323ERDB) supplies may not be usable
> for some other client as such. Please suggest on this.
> 
> But you are right as far as Freescale PowerPC platforms are concerned
> which have TDM devices. Like, 8315 also has a TDM driver which also
> exposes similar interface as 8323 because the client it is talking to is
> the same.
> 
> Following is the small description of the TDM driver along with
> interface details:
> 
> The dts file keeps a track of the TDM devices present on the board.
> Depending on them the TDM driver initializes those many driver instances
> while coming up.
> 
> The driver on the upper level can plug to more than one tdm clients
> depending on the availablity  of TDM devices. At every new request of
> the TDM client to bind with a TDM device, a free driver  instance is
> allocated to the client.
> 
> The interface can be described as follows.
> 
> tdm_register_client(struct tdm_client *)
> 	This API returns a pointer to the structure tdm_client which is
> of type
> 	struct tdm_client {
>                 u32 driver_handle;
>                 u32 (*tdm_read)(u32 driver_handle, short chn_id, short
> *pcm_buffer, short len);
>                 u32 (*tdm_write)(u32 driver_handle, short chn_id, short
> *pcm_buffer, short len);
>                 wait_queue_head_t *wakeup_event;
>         }
> 
>    It consists of:
>    - driver_handle: It is basically to identify the particular TDM
> device/driver instance.
>    - tdm_read: It is a function pointer returned by the TDM driver to be
> used to read TDM data  form a particular TDM channel.
>    - tdm_write: It is a function pointer returned by the TDM driver to
> be used to write TDM data  to a particular TDM channel.
>    - wakeup_event: It is address of a wait_queue event on which the
> client keeps on sleeping,  and the TDM driver wakes it up periodically.
> The driver is configured to wake up the client  after every 10ms.
> 
> Once the TDM client gets registered to a TDM driver instance and a TDM
> device, it interfaces  with the driver using tdm_read, tdm_write and
> wakeup_event.
> 
> Note: The TDM driver can be used by only kernel level modules. The
> driver does not expose any  file interface for User Applications. Can be
> compared to the spi driver which interfaces with  the SPI clients
> through some APIs.
> 
> 
> I need your feedback on the interface details. Some changes were
> suggested by Andrew for 32 bit tdm handle which I will modify.(Thanks
> Andrew)
> 
> Please give your ideas about a TDM framework in the kernel and the
> interface.
> 
> Waiting for your feedback.
> 
> Thanks and Regards
> Poonam 

Hi Poonam

I may have to write a HDLC over QMC driver for 832x in the near future.
Although I haven't looked at much at the UCCs programming i/f I noticed
that QMC is supposed to run over TDM. Is your TDM driver suitable for
hooking up such a driver on top?

  Jocke

^ permalink raw reply

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: David Miller @ 2008-01-18 12:11 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel
In-Reply-To: <18318.14810.877279.41815@robur.slu.se>

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Wed, 16 Jan 2008 18:07:38 +0100

> 
> eth0 e1000_irq_enable sem = 1    <- High netload
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1    <- ifconfig eth0 down
> eth0 e1000_irq_disable sem = 2
> 
> **e1000_open                     <- ifconfig eth0 up
> eth0 e1000_irq_disable sem = 3      Dead. irq's can't be enabled
> e1000_irq_enable miss
> eth0 e1000_irq_enable sem = 2
> e1000_irq_enable miss
> eth0 e1000_irq_enable sem = 1
> ADDRCONF(NETDEV_UP): eth0: link is not ready

Yes, this semaphore thing is highly problematic.  In the most crucial
areas where network driver consistency matters the most for ease of
understanding and debugging, the Intel drivers choose to be different
:-(

The way the napi_disable() logic breaks out from high packet load in
net_rx_action() is it simply returns even leaving interrupts disabled
when a pending napi_disable() is pending.

This is what trips up the semaphore logic.

Robert, give this patch a try.

In the long term this semaphore should be completely eliminated,
there is no justification for it.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 0c9a6f7..76c0fa6 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 
 #ifdef CONFIG_E1000_NAPI
 	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
 #endif
 	e1000_irq_disable(adapter);
 
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 2ab3bfb..9cc5a6b 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 	msleep(10);
 
 	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
 	e1000_irq_disable(adapter);
 
 	del_timer_sync(&adapter->watchdog_timer);
diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
index d2fb88d..4f63839 100644
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 {
 	struct net_device *netdev = adapter->netdev;
 
+#ifdef CONFIG_IXGB_NAPI
+	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
+#endif
+
 	ixgb_irq_disable(adapter);
 	free_irq(adapter->pdev->irq, netdev);
 
@@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 
 	if(kill_watchdog)
 		del_timer_sync(&adapter->watchdog_timer);
-#ifdef CONFIG_IXGB_NAPI
-	napi_disable(&adapter->napi);
-#endif
+
 	adapter->link_speed = 0;
 	adapter->link_duplex = 0;
 	netif_carrier_off(netdev);
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index de3f45e..a4265bc 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_FLUSH(&adapter->hw);
 	msleep(10);
 
+	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
+
 	ixgbe_irq_disable(adapter);
 
-	napi_disable(&adapter->napi);
 	del_timer_sync(&adapter->watchdog_timer);
 
 	netif_carrier_off(netdev);

^ permalink raw reply related

* RE: [PATCH 0/3] UCC TDM driver for MPC83xx platforms
From: Aggrwal Poonam @ 2008-01-18 11:58 UTC (permalink / raw)
  To: Kumar Gala, Andrew Morton
  Cc: Phillips Kim, sfr, rubini, linuxppc-dev, netdev, linux-kernel,
	Barkowski Michael, Kalra Ashish, Cutler Richard, Suresh PV
In-Reply-To: <5927F7A4-D42A-40F6-AE9C-EDA34738A752@kernel.crashing.org>

Hello All

The TDM driver just now does not have a proper framework. Probably the
interface cannot be generalised as such. Hence we could not decide
whether it would be right to think of a TDM framework. Infact the
interface this TDM driver(for MPC8323ERDB) supplies may not be usable
for some other client as such. Please suggest on this.

But you are right as far as Freescale PowerPC platforms are concerned
which have TDM devices. Like, 8315 also has a TDM driver which also
exposes similar interface as 8323 because the client it is talking to is
the same.

Following is the small description of the TDM driver along with
interface details:

The dts file keeps a track of the TDM devices present on the board.
Depending on them the TDM driver initializes those many driver instances
while coming up.

The driver on the upper level can plug to more than one tdm clients
depending on the availablity  of TDM devices. At every new request of
the TDM client to bind with a TDM device, a free driver  instance is
allocated to the client.

The interface can be described as follows.

tdm_register_client(struct tdm_client *)
	This API returns a pointer to the structure tdm_client which is
of type
	struct tdm_client {
                u32 driver_handle;
                u32 (*tdm_read)(u32 driver_handle, short chn_id, short
*pcm_buffer, short len);
                u32 (*tdm_write)(u32 driver_handle, short chn_id, short
*pcm_buffer, short len);
                wait_queue_head_t *wakeup_event;
        }

   It consists of:
   - driver_handle: It is basically to identify the particular TDM
device/driver instance.
   - tdm_read: It is a function pointer returned by the TDM driver to be
used to read TDM data  form a particular TDM channel.
   - tdm_write: It is a function pointer returned by the TDM driver to
be used to write TDM data  to a particular TDM channel.
   - wakeup_event: It is address of a wait_queue event on which the
client keeps on sleeping,  and the TDM driver wakes it up periodically.
The driver is configured to wake up the client  after every 10ms.

Once the TDM client gets registered to a TDM driver instance and a TDM
device, it interfaces  with the driver using tdm_read, tdm_write and
wakeup_event.

Note: The TDM driver can be used by only kernel level modules. The
driver does not expose any  file interface for User Applications. Can be
compared to the spi driver which interfaces with  the SPI clients
through some APIs.

I need your feedback on the interface details. Some changes were
suggested by Andrew for 32 bit tdm handle which I will modify.(Thanks
Andrew)

Please give your ideas about a TDM framework in the kernel and the
interface.

Waiting for your feedback.

Thanks and Regards
Poonam 

-----Original Message-----
From: Kumar Gala [mailto:galak@kernel.crashing.org] 
Sent: Tuesday, January 15, 2008 9:01 AM
To: Andrew Morton
Cc: Phillips Kim; Aggrwal Poonam; sfr@canb.auug.org.au;
rubini@vision.unipv.it; linux-ppcdev@ozlabs.kernel.org;
netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Barkowski Michael;
Kalra Ashish; Cutler Richard
Subject: Re: [PATCH 0/3] UCC TDM driver for MPC83xx platforms

On Jan 14, 2008, at 3:15 PM, Andrew Morton wrote:

> On Mon, 14 Jan 2008 12:00:51 -0600
> Kim Phillips <kim.phillips@freescale.com> wrote:
>
>> On Thu, 10 Jan 2008 21:41:20 -0700
>> "Aggrwal Poonam" <Poonam.Aggrwal@freescale.com> wrote:
>>
>>> Hello  All
>>>
>>> I am waiting for more feedback on the patches.
>>>
>>> If there are no objections please consider them for 2.6.25.
>>>
>> if this isn't going to go through Alessandro Rubini/misc drivers, can

>> it go through the akpm/mm tree?
>>
>
> That would work.  But it might be more appropriate to go Kumar-
> >paulus->Linus.

I'm ok w/taking the arch/powerpc bits, but I"m a bit concerned about  
the driver itself.  I'm wondering if we need a TDM framework in the  
kernel.

I guess if Poonam could possibly describe how this driver is actually  
used that would be helpful.  I see we have 8315 with a discrete TDM  
block and I'm guessing 82xx/85xx based CPM parts of some form of TDM  
as well.

- k

^ permalink raw reply

* Re: [patch 2/2][NETNS][DST] add the network namespace pointer in dst_ops
From: David Miller @ 2008-01-18 11:58 UTC (permalink / raw)
  To: dlezcano; +Cc: netdev, den, benjamin.thery
In-Reply-To: <20080116150733.443636890@localhost.localdomain>

From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Wed, 16 Jan 2008 15:54:18 +0100

> The network namespace pointer can be stored into the dst_ops structure.
> This is usefull when there are multiple instances of the dst_ops for a
> protocol. When there are no several instances, this field will be never
> used in the protocol. So there is no impact for the protocols which do
> implement the network namespaces.
> 
> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>

Applied, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox