Netdev List

* Re: [PATCH 6/8] [PATCH] Split up rndis_host.c
From: Jussi Kivilinna @ 2008-01-02 20:13 UTC
  To: David Brownell
  Cc: bjd-a1rhEgazXTw, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20071223011707.1798C2360C6-ZcXrCSuhvln6VZ3dlLfH/g4gEjPzgfUyLrfjE7I9kuVHxeISYlDBzl6hYfS7NtTn@public.gmane.org>

Hello,

Bjorge was not comfortable with the double probing that a separate
rndis_wext module requires, since wireless RNDIS devices have the same
device ID as the rest of RNDIS. Our module version checks
OID_GEN_PHYSICAL_MEDIUM in generic_rndis_bind: with rndis_host, bind
fails if the OID is supported and a wireless media type is returned;
with rndis_wext, bind fails if the OID isn't supported or the type isn't
wireless. Would this be OK?
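
The bind rule described above can be sketched roughly as follows. This is a minimal userspace model, not code from the rndis driver: the enum names and bind_accepts() are hypothetical, and the real check would issue an OID_GEN_PHYSICAL_MEDIUM query to the device instead of taking its result as a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the OID_GEN_PHYSICAL_MEDIUM query result. */
enum medium_query {
	MEDIUM_OID_UNSUPPORTED,	/* device does not support the OID */
	MEDIUM_NOT_WIRELESS,	/* OID answered, medium is not wireless */
	MEDIUM_WIRELESS		/* OID answered, medium is wireless */
};

/* Hypothetical: which module is running the shared bind code. */
enum rndis_driver { DRIVER_RNDIS_HOST, DRIVER_RNDIS_WEXT };

/* Returns true if bind should succeed for this driver/device pair. */
static bool bind_accepts(enum rndis_driver drv, enum medium_query q)
{
	bool wireless = (q == MEDIUM_WIRELESS);

	if (drv == DRIVER_RNDIS_HOST)
		return !wireless;	/* fails only when the OID reports wireless */
	return wireless;		/* rndis_wext: fails if unsupported or not wireless */
}

/* With this rule every device is accepted by exactly one of the two drivers. */
static void check_exactly_one_driver_binds(void)
{
	for (int q = MEDIUM_OID_UNSUPPORTED; q <= MEDIUM_WIRELESS; q++)
		assert(bind_accepts(DRIVER_RNDIS_HOST, q) !=
		       bind_accepts(DRIVER_RNDIS_WEXT, q));
}
```

The point of the sketch is that the two conditions are exact complements, so double probing never ends with both modules, or neither, claiming a device.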

Should a separate rndis_wext be located in drivers/net/wireless instead of
drivers/net/usb?

 - Jussi Kivilinna

On Sat, 2007-12-22 at 17:17 -0800, David Brownell wrote:
> > From: Bjorge Dijkstra <bjd-a1rhEgazXTw@public.gmane.org>
> > Subject: [PATCH 6/8] [PATCH] Split up rndis_host.c
> > Date: Sat, 22 Dec 2007 22:51:32 +0100
> >
> > Split up rndis_host.c into rndis_host.h and rndis_base.c and
> > change Makefile accordingly. This is done so we can add extra
> > source files to the rndis_host module later on.
> 
> I'm fine with splitting out a header file and the EXPORT_SYMBOL_GPL.
> But why not just have a separate "rndis_wext" module?
> 
> 
> > ---
> >  drivers/net/usb/Makefile     |    1 +
> >  drivers/net/usb/rndis_base.c |  548 ++++++++++++++++++++++++++++++
> >  drivers/net/usb/rndis_host.c |  763 ------------------------------------------
> >  drivers/net/usb/rndis_host.h |  256 ++++++++++++++
> >  4 files changed, 805 insertions(+), 763 deletions(-)
> >  create mode 100644 drivers/net/usb/rndis_base.c
> >  delete mode 100644 drivers/net/usb/rndis_host.c
> >  create mode 100644 drivers/net/usb/rndis_host.h
> -
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

* Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage
From: David Brownell @ 2008-01-02 18:48 UTC
  To: Alan Stern
  Cc: Greg KH, Andreas Mohr, Ingo Molnar, Alexey Dobriyan,
	Andrew Morton, David Woodhouse, Eric W. Biederman, Linus Torvalds,
	Rafael J. Wysocki, Pavel Machek, kernel list, netdev,
	Pavel Emelyanov, Denis V. Lunev, USB list
In-Reply-To: <Pine.LNX.4.44L0.0801021052470.4861-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>

On Wednesday 02 January 2008, Alan Stern wrote:
> 	 BTW, I don't recall ever seeing Tony's patch announced on
> linux-usb or linux-usb-devel.  Did I simply miss it?

I think he didn't post it.  I got some questions from him at
one point, which I answered, but as I recall he decided for
some reason to stop that work.

* Re: 2.6.24-rc6-mm1
From: Torsten Kaiser @ 2008-01-02 18:29 UTC
  To: Herbert Xu
  Cc: Andrew Morton, linux-kernel, Neil Brown, J. Bruce Fields, netdev,
	Tom Tucker
In-Reply-To: <20080101120406.GA27209@gondor.apana.org.au>

On Jan 1, 2008 1:04 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> In any case, I suspect the cause of your problem is that somebody
> somewhere is doing a double-free on an skb.
>
> Since you're the only person who can reproduce this, we really need
> your help to track this down.  Since bisecting the mm tree is not
> practical, you could start by checking whether the bug is in mm only
> or whether it affects rc6 too.

Vanilla 2.6.24-rc6 seems stable. I did not see any crashes or warnings.

Torsten

* Re: [klibc] [patch] import socket defines
From: H. Peter Anvin @ 2008-01-02 18:09 UTC
  To: Mike Frysinger; +Cc: Netdev List, klibc list
In-Reply-To: <200801020830.43449.vapier@gentoo.org>

[-- Attachment #1: Type: text/plain, Size: 912 bytes --]

Mike Frysinger wrote:
> On Tuesday 01 January 2008, H. Peter Anvin wrote:
>> Mike Frysinger wrote:
>>> The kernel __GLIBC__ hacks were re-added so as to appease klibc people,
>>> but the klibc people didnt actually fix the problem on their side.  This
>>> patch imports the structures/defines that klibc seems to need.  Build
>>> tested on x86_64 (i dont actually use klibc so no idea how to test it).
>> The whole point was to NOT need to replicate all these structures and
>> constants, which are part of the ABI, in klibc...
> 
> then figure out a way that doesnt make the kernel headers blow for everyone 
> else out there.  change the __GLIBC__ crap to __KLIBC__ or something.

Seems the most logical thing to do would be to break out the small 
portion that everyone wants into <linux/sockaddr.h> or somesuch, and 
then remove those ifdefs entirely.

Proposed patch (still being tested) attached...

	-hpa

[-- Attachment #2: 0001-linux-socket.h-break-out-glibc-portions-into-l.patch --]
[-- Type: text/x-patch, Size: 3780 bytes --]

From 727c56ac213bdaedb9247c442375a5979686acf5 Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <hpa@zytor.com>
Date: Wed, 2 Jan 2008 10:08:16 -0800
Subject: [PATCH] <linux/socket.h>: break out "glibc" portions into <linux/sockaddr.h>

Some userspaces (e.g. klibc) want to be able to use the full set of
ABI constants in <linux/socket.h>, others (e.g. glibc) do not, and
rather want just the bare minimum to build a kernel-valid
sockaddr... but apparently they want that much.  This is currently
keyed on the existence of __GLIBC__, which is clearly wrong.

This patch breaks out the "bare minimum" into <linux/sockaddr.h> for
the userspaces who want to do it themselves, and eliminates the
ifdefs completely.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 include/linux/Kbuild     |    1 +
 include/linux/sockaddr.h |   19 +++++++++++++++++++
 include/linux/socket.h   |   21 ++-------------------
 3 files changed, 22 insertions(+), 19 deletions(-)
 create mode 100644 include/linux/sockaddr.h

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index f30fa92..f8bbb31 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -139,6 +139,7 @@ header-y += rose.h
 header-y += serial_reg.h
 header-y += smbno.h
 header-y += snmp.h
+header-y += sockaddr.h
 header-y += sockios.h
 header-y += som.h
 header-y += sound.h
diff --git a/include/linux/sockaddr.h b/include/linux/sockaddr.h
new file mode 100644
index 0000000..f182083
--- /dev/null
+++ b/include/linux/sockaddr.h
@@ -0,0 +1,19 @@
+#ifndef _KERNEL_SOCKADDR_H
+#define _KERNEL_SOCKADDR_H
+
+/*
+ * Desired design of maximum size and alignment (see RFC2553)
+ */
+#define _K_SS_MAXSIZE	128	/* Implementation specific max size */
+#define _K_SS_ALIGNSIZE	(__alignof__ (struct sockaddr *))
+				/* Implementation specific desired alignment */
+
+struct __kernel_sockaddr_storage {
+	unsigned short	ss_family;		/* address family */
+	/* Following field(s) are implementation specific */
+	char		__data[_K_SS_MAXSIZE - sizeof(unsigned short)];
+				/* space to achieve desired size, */
+				/* _SS_MAXSIZE value minus size of ss_family */
+} __attribute__ ((aligned(_K_SS_ALIGNSIZE)));	/* force desired alignment */
+
+#endif /* _KERNEL_SOCKADDR_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index c22ef1c..9cd6edc 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -1,23 +1,7 @@
 #ifndef _LINUX_SOCKET_H
 #define _LINUX_SOCKET_H
 
-/*
- * Desired design of maximum size and alignment (see RFC2553)
- */
-#define _K_SS_MAXSIZE	128	/* Implementation specific max size */
-#define _K_SS_ALIGNSIZE	(__alignof__ (struct sockaddr *))
-				/* Implementation specific desired alignment */
-
-struct __kernel_sockaddr_storage {
-	unsigned short	ss_family;		/* address family */
-	/* Following field(s) are implementation specific */
-	char		__data[_K_SS_MAXSIZE - sizeof(unsigned short)];
-				/* space to achieve desired size, */
-				/* _SS_MAXSIZE value minus size of ss_family */
-} __attribute__ ((aligned(_K_SS_ALIGNSIZE)));	/* force desired alignment */
-
-#if defined(__KERNEL__) || !defined(__GLIBC__) || (__GLIBC__ < 2)
-
+#include <linux/sockaddr.h>
 #include <asm/socket.h>			/* arch-dependent defines	*/
 #include <linux/sockios.h>		/* the SIOCxxx I/O controls	*/
 #include <linux/uio.h>			/* iovec support		*/
@@ -310,7 +294,6 @@ extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
 extern int move_addr_to_user(void *kaddr, int klen, void __user *uaddr, int __user *ulen);
 extern int move_addr_to_kernel(void __user *uaddr, int ulen, void *kaddr);
 extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);
+#endif /* __KERNEL__ */
 
-#endif
-#endif /* not kernel and not glibc */
 #endif /* _LINUX_SOCKET_H */
-- 
1.5.3.6
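
As a quick sanity check of the layout that moves into <linux/sockaddr.h>, the structure can be reproduced in userspace. This is only a sketch: the toy_ and TOY_ names are stand-ins chosen to avoid clashing with real headers, and it assumes a GCC-compatible compiler for __alignof__ and __attribute__.

```c
#include <assert.h>
#include <stddef.h>

struct sockaddr;	/* only a pointer to it is needed for the alignment */

/* Same values as _K_SS_MAXSIZE / _K_SS_ALIGNSIZE in the patch. */
#define TOY_SS_MAXSIZE		128
#define TOY_SS_ALIGNSIZE	(__alignof__ (struct sockaddr *))

/* Mirrors struct __kernel_sockaddr_storage from the patch. */
struct toy_sockaddr_storage {
	unsigned short	ss_family;	/* address family */
	char		data[TOY_SS_MAXSIZE - sizeof(unsigned short)];
					/* padding up to the maximum size */
} __attribute__ ((aligned(TOY_SS_ALIGNSIZE)));

/* Total size of the storage structure in bytes. */
static size_t toy_storage_size(void)
{
	return sizeof(struct toy_sockaddr_storage);
}

/* Alignment the compiler actually gives the structure. */
static size_t toy_storage_align(void)
{
	return __alignof__ (struct toy_sockaddr_storage);
}
```

On the usual 32-bit and 64-bit ABIs the size stays at 128 bytes, because 128 is already a multiple of the pointer alignment, so the aligned attribute adds no tail padding.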


* Re: [patch] add tcp congestion control relevant parts
From: Stephen Hemminger @ 2008-01-02 17:40 UTC
  To: Michael Kerrisk
  Cc: Michael Kerrisk, netdev, linux-net-u79uwXL29TY76Z2rM5mHXA,
	Thomas Egerer, linux-man-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <477B5F89.9000707-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Wed, 02 Jan 2008 10:55:21 +0100
Michael Kerrisk <mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:

> 
> 
> Stephen Hemminger wrote:
> > On Fri, 14 Dec 2007 09:48:32 +0100
> > Michael Kerrisk <mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> > 
> >> Hello Linux networking folk,
> >>
> >> I received the patch below for the tcp.7 man page.  Would anybody here be
> >> prepared to review the new material / double check the details?
> >>
> >> Cheers,
> >>
> >> Michael
> >>
> >> -------- Original Message --------
> >> Subject: [patch] add tcp congestion control relevant parts
> >> Date: Wed, 12 Dec 2007 16:40:23 +0100
> >> From: Thomas Egerer <thomas.Egerer-opNxpl+3fjRBDgjK7y7TUQ@public.gmane.org>
> >> To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
> >> CC: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>
> >> Hello *,
> >>
> >> man-pages version : 2.70 from http://www.kernel.org/pub/linux/docs/man-pages/
> >> All required information were obtained by reading the kernel
> >> code/documentation.
> >> I'm not sure, whether it is completely bullet proof on when the sysctl
> >> variables/socket option first appeared in the kernel, so you might as well
> >> drop this information, but I'm pretty sure about how it works.
> >> Here we go with my patch:
> >>
> >> diff -ru man-pages-2.70/man7/tcp.7 man-pages-2.70.new/man7/tcp.7
> >> --- man-pages-2.70/man7/tcp.7   2007-11-24 14:33:34.000000000 +0100
> >> +++ man-pages-2.70.new/man7/tcp.7       2007-12-12 16:34:52.000000000 +0100
> >> @@ -177,8 +177,6 @@
> >>  .\" FIXME As at Sept 2006, kernel 2.6.18-rc5, the following are
> >>  .\"    not yet documented (shown with default values):
> >>  .\"
> >> -.\"     /proc/sys/net/ipv4/tcp_congestion_control (since 2.6.13)
> >> -.\"     bic
> >>  .\"     /proc/sys/net/ipv4/tcp_moderate_rcvbuf
> >>  .\"     1
> >>  .\"     /proc/sys/net/ipv4/tcp_no_metrics_save
> >> @@ -224,6 +222,20 @@
> >>  are reserved for the application buffer.
> >>  A value of 0
> >>  implies that no amount is reserved.
> >> +.TP
> >> +.BR tcp_allowed_congestion_control \
> >> +" (String; default: cubic reno) (since 2.6.13) "
> >> +Show/set the congestion control choices available to non-privileged
> >> +processes. The list is a subset of those listed in
> >> +.IR tcp_available_congestion_control "."
> >> +Default is "cubic reno" and the default setting
> >> +.RI ( tcp_congestion_control ).
> >> +.TP
> >> +.BR tcp_available_congestion_control \
> >> +" (String; default: cubic reno) (since 2.6.13) "
> >> +Lists the TCP congestion control algorithms available on the system. This value
> >> +can only be changed by loading/unloading modules responsible for congestion
> >> +control.
> >>  .\"
> >>  .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
> >>  .TP
> >> @@ -257,6 +269,17 @@
> >>  Allows two flows sharing the same connection to converge
> >>  more rapidly.
> >>  .TP
> >> +.BR tcp_congestion_control " (String; default: cubic reno) (since 2.6.13) "
> >> +Determines the congestion control algorithm used for newly created TCP
> >> +sockets. By default Linux uses cubic with reno as fallback. If you want
> >> +to have more control over the algorithm used, you must enable the symbol
> >> +CONFIG_TCP_CONG_ADVANCED in your kernel config.
> > 
> > You can choose the default congestion control as well as part of the kernel
> > configuration.
> 
> Hi Stephen,
> 
> Other than this, did the doc patch look okay?  (I'm not sure whether there
> was an implied ACK in your message for the rest of the patch.)
> 
> Cheers,
> 
> Michael
> 

Yes, and having this documented will hopefully help answer people's
questions.

-- 
Stephen Hemminger <stephen.hemminger-ZtmgI6mnKB3QT0dZR+AlfA@public.gmane.org>

* Re: [PATCH] Re: Nested VLAN causes recursive locking error
From: Benny Amorsen @ 2008-01-02 16:40 UTC
  To: netdev
In-Reply-To: <20071220135253.GA10932@ff.dom.local>

Jarek Poplawski <jarkao2@gmail.com> writes:

> Subject: [PATCH] nested VLAN: fix lockdep's recursive locking warning
>
> Allow vlans nesting other vlans without lockdep's warnings (max. 8 levels).
>
> Reported-by: Benny Amorsen
> Tested-by: Benny Amorsen	(?) NEEDS TESTING!
>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

The box that shows the problem is "almost-production", so it doesn't
have a build system. Is there a chance you could get the patch into
Rawhide or some other testing repository? It takes forever and a lot
of disk space to rebuild a kernel RPM, so if I could get someone else
to do the hard work, that would be lovely...

If not, I'll probably find the time to do the RPM rebuild sometime
within the next week or so.

/Benny

* Request to include ESFQ patch
From: Denys Fedoryshchenko @ 2008-01-02 16:10 UTC
  To: netdev; +Cc: bugfood-c

Hi

I took the risk and installed ESFQ on my main backbone QoS. I found it highly
useful, and much needed in setups where more than 128 flows are passing,
especially where NAT is in use.

Here are results with an overloaded class for low-priority P2P traffic customers:

pfifo 128Kbyte, bandwidth 512Kbit/s
... cut ...
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=27 ttl=51 time=228 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=28 ttl=51 time=247 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=29 ttl=51 time=415 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=30 ttl=51 time=198 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=31 ttl=51 time=2274 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=32 ttl=51 time=2237 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=33 ttl=51 time=2235 ms
... cut ...
--- www.nuclearcat.com ping statistics ---
100 packets transmitted, 98 received, 2% packet loss, time 99006ms
rtt min/avg/max/mdev = 155.647/1022.177/2289.229/881.461 ms, pipe 3


Ping is very unstable, and there are drops as well. And I am using almost no
traffic myself.

sfq
... cut ...
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=61 ttl=51 time=1136 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=62 ttl=51 time=930 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=63 ttl=51 time=1057 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=64 ttl=51 time=1055 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=65 ttl=51 time=1012 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=66 ttl=51 time=880 ms
... cut ...
--- www.nuclearcat.com ping statistics ---
100 packets transmitted, 95 received, 5% packet loss, time 98984ms
rtt min/avg/max/mdev = 157.328/479.812/1136.569/331.170 ms, pipe 2

Also not so stable. The buffer in sfq is 128 packets; with an average packet
of 500 bytes that is around 64000 bytes, which is only about 1 second of delay
(while I need 2 seconds for the test). Packet loss is very high.
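
The delay estimate above is just queue size divided by link rate. A small sketch of that arithmetic, assuming 1 Kbit = 1000 bits and a queue that is completely full (drain_time_ms() is a made-up helper, not part of any qdisc):

```c
#include <assert.h>

/*
 * Time in milliseconds needed to drain a full queue at the given rate.
 * rate_bits_per_sec must be non-zero.
 */
static unsigned long drain_time_ms(unsigned long packets,
				   unsigned long avg_packet_bytes,
				   unsigned long rate_bits_per_sec)
{
	/* Total queued data in bits; 64-bit to avoid overflow on 32-bit longs. */
	unsigned long long bits =
		(unsigned long long)packets * avg_packet_bytes * 8;

	return (unsigned long)(bits * 1000 / rate_bits_per_sec);
}
```

For the sfq case, drain_time_ms(128, 500, 512000) gives 1000 ms. Treating the pfifo limit as 128000 bytes, the same formula gives 2000 ms, which is in line with the roughly 2.2 second round trips in the pfifo run.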

esfq: perturb 30 depth 65536 divisor 14 limit 256 hash ctorigdst
... cut ...
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=12 ttl=51 time=185 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=13 ttl=51 time=238 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=14 ttl=51 time=228 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=15 ttl=51 time=377 ms
64 bytes from usa.nuclearcat.com (66.230.167.210): icmp_seq=16 ttl=51 time=177 ms
... cut ...
--- www.nuclearcat.com ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99009ms
rtt min/avg/max/mdev = 154.254/208.048/553.740/58.716 ms

This is the worst jitter; the rest looks fine.
Ping is just great, no packet loss, and even the jitter is acceptable. This
queue is just my dream.

Conclusion:
There is no fair queue qdisc available for a "serious" setup. ESFQ is the only
real candidate I have seen so far for inclusion in the kernel, and it would
probably be a highly useful feature.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


* [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Paul Moore @ 2008-01-02 16:01 UTC
  To: netdev

When sk_buffs are cloned the iif field of the new, cloned packet is neither
zeroed out nor copied from the existing sk_buff.  The result is that the newly
cloned sk_buff has garbage in the iif field which is a Bad Thing.  This patch
fixes this problem by copying the iif field along with the other sk_buff
critical fields in __copy_skb_header().

This patch is needed by some of the labeled networking changes proposed for
2.6.25.  Does anyone have any objections?
---

 net/core/skbuff.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b4ce9b..9cb7bb7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -371,6 +371,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 {
 	new->tstamp		= old->tstamp;
 	new->dev		= old->dev;
+	new->iif		= old->iif;
 	new->transport_header	= old->transport_header;
 	new->network_header	= old->network_header;
 	new->mac_header		= old->mac_header;
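
To illustrate why the missing assignment matters, here is a toy userspace model. It is not kernel code: struct toy_skb, toy_copy_header() and the 0xa5 fill pattern are all made up for illustration, with the fill standing in for whatever stale data the freshly allocated clone happens to contain.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for struct sk_buff, reduced to three of the copied fields. */
struct toy_skb {
	long	tstamp;
	int	dev;	/* stand-in for the struct net_device pointer */
	int	iif;	/* index of the interface the packet arrived on */
};

/* Copies header fields; copy_iif selects behaviour with or without the fix. */
static void toy_copy_header(struct toy_skb *dst, const struct toy_skb *src,
			    int copy_iif)
{
	dst->tstamp = src->tstamp;
	dst->dev = src->dev;
	if (copy_iif)
		dst->iif = src->iif;
}

/* Clones src into stale memory and reports the iif the clone ends up with. */
static int toy_clone_iif(const struct toy_skb *src, int copy_iif)
{
	struct toy_skb clone;

	memset(&clone, 0xa5, sizeof(clone));	/* simulate uninitialised memory */
	toy_copy_header(&clone, src, copy_iif);
	return clone.iif;
}
```

Without the copy the clone keeps whatever the fill pattern left in iif; with it the clone reports the same interface index as the original.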


* Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage
From: Alan Stern @ 2008-01-02 15:56 UTC
  To: Greg KH
  Cc: Andreas Mohr, Ingo Molnar, Alexey Dobriyan, Andrew Morton,
	David Woodhouse, Eric W. Biederman, Linus Torvalds,
	Rafael J. Wysocki, Pavel Machek, kernel list, netdev,
	Pavel Emelyanov, Denis V. Lunev, USB list
In-Reply-To: <20080102060006.GA27693-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>

On Tue, 1 Jan 2008, Greg KH wrote:

> On Mon, Dec 31, 2007 at 11:26:43AM -0800, Greg KH wrote:
> > On Mon, Dec 31, 2007 at 12:49:52PM -0500, Alan Stern wrote:
> > > On Sun, 30 Dec 2007, Greg KH wrote:
> > > 
> > > > > It looks like Greg misused the debugfs API -- which is ironic, because
> > > > > he wrote debugfs in the first place!  :-)

> Ok, no, I didn't write that patch, I'm getting very confused here.
> 
> In 2.6.24-rc6 there is no usage of debugfs in the ohci driver.
> 
> In the -mm tree there is a patch, from Tony Jones, that moves some debug
> code out of sysfs and into debugfs where it belongs.  It does it for
> both the ehci and ohci USB host controller drivers, and this is the code
> that is incorrect if CONFIG_DEBUGFS is not enabled.

My mistake; I got the impression you had written that new code rather
than Tony.  BTW, I don't recall ever seeing Tony's patch announced on
linux-usb or linux-usb-devel.  Did I simply miss it?

> So, for the 2.6.24 release, nothing needs to be changed, all is good,
> and there is no regression.
> 
> Right?  Or am I still confused about this whole thing?

Correct.  The problem exists only in -mm and your development tree.

> I will go fix up Tony's patches in the -mm tree that do not handle the
> error return values from debugfs properly, but that is not such a rush,
> as Tony is on vacation for a few weeks, and those patches are going to
> be going in only after 2.6.24 is out.

The fix I posted earlier in this thread should simply be merged into 
Tony's patches.

Alan Stern

* Re: Section mismatch between 'copy_net_ns' and 'net_enable_timestamp
From: Andy Johnson @ 2008-01-02 15:32 UTC
  To: Daniel Lezcano; +Cc: netdev
In-Reply-To: <477B9F70.5000709@fr.ibm.com>

Hi,

Thanks. Doing that does indeed avoid the error.
When I disabled the sysfs entry, I got the option for network namespace
support, which was not there before.
Regards,
Andy



On Jan 2, 2008 4:28 PM, Daniel Lezcano <dlezcano@fr.ibm.com> wrote:
>
> Andy Johnson wrote:
> > Hello,
> >  I had git-cloned the net-2.6.25.git tree today;
> >
> >  Then I ran "make menuconfig" and accepted the defaults without any change.
> > I want to build this tree with network namespace support.
> > I saw that in 2.6.25/net/core/net_namespace.c
> > we have some #ifdef CONFIG_NET_NS ; I did not CONFIG_NET_NS in "make
> > menuconfig".
> > So I added in net_namespace.c #define CONFIG_NET_NS, before the first
> > #ifdef CONFIG_NET_NS; then I ran make.
> >
> > I saw the following warning:
> >
> > WARNING: vmlinux.o(.text+0x1d47cd): Section mismatch: reference to .init.text:
> > (between 'copy_net_ns' and 'net_enable_timestamp')
> >
> >
> > Is there something wrong here causing this "Section mismatch"?
> > or did I do something wrong?
> >
> > Regards,
> > Andy
> >
>
> Hi Andy,
>
> If you want to activate the network namespace, you should disable the
> sysfs. This is a temporary restriction until Greg-KH integrates the
> sysfs per namespace:
>
> https://lists.linux-foundation.org/pipermail/containers/2007-December/009347.html
>
> Can you try to using the menuconfig to enable the network namespace and
> check if you still have the section mismatch ?
>
> General setup
>   -> Prompt for development and/or incomplete code/drivers => enable
>   -> Configure standard kernel features => enable
>
> File systems
>   -> Pseudo filesystems
>    -> sysfs file system support => disable
>
> Networking
>   -> Networking options
>    -> Network namespace support => enable
>
> Thanks
>    -- Daniel
>


* Re: Section mismatch between 'copy_net_ns' and 'net_enable_timestamp
From: Daniel Lezcano @ 2008-01-02 14:28 UTC
  To: Andy Johnson; +Cc: netdev
In-Reply-To: <147a89290801020616g310fff86t6c8d34d32df3865a@mail.gmail.com>

Andy Johnson wrote:
> Hello,
>  I had git-cloned the net-2.6.25.git tree today;
> 
>  Then I ran "make menuconfig" and accepted the defaults without any change.
> I want to build this tree with network namespace support.
> I saw that in 2.6.25/net/core/net_namespace.c
> we have some #ifdef CONFIG_NET_NS ; I did not CONFIG_NET_NS in "make
> menuconfig".
> So I added in net_namespace.c #define CONFIG_NET_NS, before the first
> #ifdef CONFIG_NET_NS; then I ran make.
> 
> I saw the following warning:
> 
> WARNING: vmlinux.o(.text+0x1d47cd): Section mismatch: reference to .init.text:
> (between 'copy_net_ns' and 'net_enable_timestamp')
> 
> 
> Is there something wrong here causing this "Section mismatch"?
> or did I do something wrong?
> 
> Regards,
> Andy
> 

Hi Andy,

If you want to activate the network namespace, you should disable
sysfs. This is a temporary restriction until Greg-KH integrates
per-namespace sysfs:

https://lists.linux-foundation.org/pipermail/containers/2007-December/009347.html

Can you try using menuconfig to enable the network namespace and
check whether you still have the section mismatch?

General setup
  -> Prompt for development and/or incomplete code/drivers => enable
  -> Configure standard kernel features => enable

File systems
  -> Pseudo filesystems
   -> sysfs file system support => disable

Networking
  -> Networking options
   -> Network namespace support => enable

Thanks
   -- Daniel

* Section mismatch between 'copy_net_ns' and 'net_enable_timestamp
From: Andy Johnson @ 2008-01-02 14:16 UTC
  To: netdev

Hello,
 I git-cloned the net-2.6.25.git tree today.

 Then I ran "make menuconfig" and accepted the defaults without any change.
I want to build this tree with network namespace support.
I saw that 2.6.25/net/core/net_namespace.c has some
#ifdef CONFIG_NET_NS blocks, but I did not see CONFIG_NET_NS in
"make menuconfig".
So I added #define CONFIG_NET_NS in net_namespace.c, before the first
#ifdef CONFIG_NET_NS; then I ran make.

I saw the following warning:

WARNING: vmlinux.o(.text+0x1d47cd): Section mismatch: reference to .init.text:
(between 'copy_net_ns' and 'net_enable_timestamp')

Is there something wrong here causing this "Section mismatch"?
Or did I do something wrong?

Regards,
Andy

* Re: [patch 0/9][NETNS][IPV6] make sysctl per namespace
From: Daniel Lezcano @ 2008-01-02 14:05 UTC
  To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <20080102145434.8bdba06d.dada1@cosmosbay.com>

Eric Dumazet wrote:
> On Wed, 02 Jan 2008 13:25:48 +0100
> Daniel Lezcano <dlezcano@fr.ibm.com> wrote:
> 
>> The following patchset makes the ipv6 sysctl to handle multiple
>> network namespaces. Each instance of a network namespace as its own
>> set of sysctl values, that means the behavior of the ipv6 stack can be
>> different depending on the sysctl values setup in the different
>> network namespaces.
> 
> Hi Daniel
> 
> Did you tested your patches with CONFIG_SYSCTL=n ?
> 
> For example, I had to apply this patch on current git.
> 
> Thank you
> 
> [PATCH] IPV4 : Should build with CONFIG_SYSCTL=n
> 
> Previous NETNS patches broke CONFIG_SYSCTL=n case
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Thanks Eric for fixing that.

I compiled with CONFIG_SYSCTL=n and booted.
I always try to build allmodconfig, allyesconfig and allnoconfig with a
cross-compiler for different architectures (i386, x86_64, s390, ppc, ia64,
sh, sparc, arm and alpha) before sending to netdev@.

  -- Daniel

* Re: [patch 0/9][NETNS][IPV6] make sysctl per namespace
From: Eric Dumazet @ 2008-01-02 13:54 UTC
  To: Daniel Lezcano; +Cc: davem, netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

On Wed, 02 Jan 2008 13:25:48 +0100
Daniel Lezcano <dlezcano@fr.ibm.com> wrote:

> The following patchset makes the ipv6 sysctl to handle multiple
> network namespaces. Each instance of a network namespace as its own
> set of sysctl values, that means the behavior of the ipv6 stack can be
> different depending on the sysctl values setup in the different
> network namespaces.

Hi Daniel

Did you test your patches with CONFIG_SYSCTL=n?

For example, I had to apply this patch on current git.

Thank you

[PATCH] IPV4 : Should build with CONFIG_SYSCTL=n

Previous NETNS patches broke CONFIG_SYSCTL=n case

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index e06d7cf..61a28ff 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -8,7 +8,9 @@ struct ctl_table_header;
 struct ipv4_devconf;
 
 struct netns_ipv4 {
+#ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*forw_hdr;
+#endif
 	struct ipv4_devconf	*devconf_all;
 	struct ipv4_devconf	*devconf_dflt;
 };
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 252ce01..98a0079 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1542,7 +1542,6 @@ static void devinet_sysctl_unregister(struct in_device *idev)
 	__devinet_sysctl_unregister(&idev->cnf);
 	neigh_sysctl_unregister(idev->arp_parms);
 }
-#endif
 
 static struct ctl_table ctl_forward_entry[] = {
 	{
@@ -1565,18 +1564,20 @@ static __net_initdata struct ctl_path net_ipv4_path[] = {
 	{ .procname = "ipv4", .ctl_name = NET_IPV4, },
 	{ },
 };
+#endif
 
 static __net_init int devinet_init_net(struct net *net)
 {
 	int err;
-	struct ctl_table *tbl;
-	struct ipv4_devconf *all, *dflt;
+#ifdef CONFIG_SYSCTL
+	struct ctl_table *tbl = ctl_forward_entry;
 	struct ctl_table_header *forw_hdr;
+#endif
+	struct ipv4_devconf *all, *dflt;
 
 	err = -ENOMEM;
 	all = &ipv4_devconf;
 	dflt = &ipv4_devconf_dflt;
-	tbl = ctl_forward_entry;
 
 	if (net != &init_net) {
 		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
@@ -1587,6 +1588,7 @@ static __net_init int devinet_init_net(struct net *net)
 		if (dflt == NULL)
 			goto err_alloc_dflt;
 
+#ifdef CONFIG_SYSCTL
 		tbl = kmemdup(tbl, sizeof(ctl_forward_entry), GFP_KERNEL);
 		if (tbl == NULL)
 			goto err_alloc_ctl;
@@ -1594,6 +1596,7 @@ static __net_init int devinet_init_net(struct net *net)
 		tbl[0].data = &all->data[NET_IPV4_CONF_FORWARDING - 1];
 		tbl[0].extra1 = all;
 		tbl[0].extra2 = net;
+#endif
 	}
 
 #ifdef CONFIG_SYSCTL
@@ -1611,9 +1614,9 @@ static __net_init int devinet_init_net(struct net *net)
 	forw_hdr = register_net_sysctl_table(net, net_ipv4_path, tbl);
 	if (forw_hdr == NULL)
 		goto err_reg_ctl;
+	net->ipv4.forw_hdr = forw_hdr;
 #endif
 
-	net->ipv4.forw_hdr = forw_hdr;
 	net->ipv4.devconf_all = all;
 	net->ipv4.devconf_dflt = dflt;
 	return 0;
@@ -1626,8 +1629,8 @@ err_reg_dflt:
 err_reg_all:
 	if (tbl != ctl_forward_entry)
 		kfree(tbl);
-#endif
 err_alloc_ctl:
+#endif
 	if (dflt != &ipv4_devconf_dflt)
 		kfree(dflt);
 err_alloc_dflt:
@@ -1639,15 +1642,15 @@ err_alloc_all:
 
 static __net_exit void devinet_exit_net(struct net *net)
 {
+#ifdef CONFIG_SYSCTL
 	struct ctl_table *tbl;
 
 	tbl = net->ipv4.forw_hdr->ctl_table_arg;
-#ifdef CONFIG_SYSCTL
 	unregister_net_sysctl_table(net->ipv4.forw_hdr);
 	__devinet_sysctl_unregister(net->ipv4.devconf_dflt);
 	__devinet_sysctl_unregister(net->ipv4.devconf_all);
-#endif
 	kfree(tbl);
+#endif
 	kfree(net->ipv4.devconf_dflt);
 	kfree(net->ipv4.devconf_all);
 }
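
The pattern the patch relies on is that a structure member and every access to it sit behind the same #ifdef. A stripped-down userspace sketch of that idea follows; the toy_ names are hypothetical, and CONFIG_SYSCTL here is just an ordinary preprocessor macro that may or may not be defined at compile time.

```c
#include <assert.h>

/* Toy stand-in for struct netns_ipv4: forw_hdr exists only with sysctl. */
struct toy_netns_ipv4 {
#ifdef CONFIG_SYSCTL
	void	*forw_hdr;
#endif
	int	devconf_all;
};

/* Initialises the namespace data; compiles with and without CONFIG_SYSCTL. */
static int toy_init_net(struct toy_netns_ipv4 *ns)
{
#ifdef CONFIG_SYSCTL
	static int dummy_header;	/* stands in for a registered table header */

	ns->forw_hdr = &dummy_header;	/* guarded, like the member itself */
#endif
	ns->devconf_all = 1;
	return 0;
}
```

Building this once as is and once with -DCONFIG_SYSCTL exercises both configurations, which is the kind of check the CONFIG_SYSCTL=n build failure calls for.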


* [patch 5/9][NETNS][IPV6] make bindv6only sysctl per namespace
From: Daniel Lezcano @ 2008-01-02 12:25 UTC
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: move-bindv6only-to-netns.patch --]
[-- Type: text/plain, Size: 2833 bytes --]

This patch moves the bindv6only sysctl to the network namespace
structure. Until the ipv6 protocol is per namespace, the sysctl
variable is always taken from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ipv6.h         |    1 -
 include/net/netns/ipv6.h   |    1 +
 net/ipv6/af_inet6.c        |    4 +---
 net/ipv6/sysctl_net_ipv6.c |    6 +++++-
 4 files changed, 7 insertions(+), 5 deletions(-)

Index: net-2.6.25/include/net/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/ipv6.h
+++ net-2.6.25/include/net/ipv6.h
@@ -109,7 +109,6 @@ struct frag_hdr {
 #include <net/sock.h>
 
 /* sysctls */
-extern int sysctl_ipv6_bindv6only;
 extern int sysctl_mld_max_msf;
 
 #define _DEVINC(statname, modifier, idev, field)			\
Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/netns/ipv6.h
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -9,6 +9,7 @@ struct ctl_table_header;
 
 struct netns_sysctl_ipv6 {
 	struct ctl_table_header *table;
+ 	int bindv6only;
 };
 
 struct netns_ipv6 {
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -66,8 +66,6 @@ MODULE_AUTHOR("Cast of dozens");
 MODULE_DESCRIPTION("IPv6 protocol stack for Linux");
 MODULE_LICENSE("GPL");
 
-int sysctl_ipv6_bindv6only __read_mostly;
-
 /* The inetsw6 table contains everything that inet6_create needs to
  * build a new socket.
  */
@@ -193,7 +191,7 @@ lookup_protocol:
 	np->mcast_hops	= -1;
 	np->mc_loop	= 1;
 	np->pmtudisc	= IPV6_PMTUDISC_WANT;
-	np->ipv6only	= sysctl_ipv6_bindv6only;
+	np->ipv6only	= init_net.ipv6.sysctl.bindv6only;
 
 	/* Init the ipv4 part of the socket since we can have sockets
 	 * using v6 API for ipv4.
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -35,7 +35,7 @@ static ctl_table ipv6_table_template[] =
 	{
 		.ctl_name	= NET_IPV6_BINDV6ONLY,
 		.procname	= "bindv6only",
-		.data		= &sysctl_ipv6_bindv6only,
+		.data		= &init_net.ipv6.sysctl.bindv6only,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
@@ -115,6 +115,10 @@ static int ipv6_sysctl_net_init(struct n
    	ipv6_table[0].child = ipv6_route_table;
    	ipv6_table[1].child = ipv6_icmp_table;
 
+  	ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;
+
+	net->ipv6.sysctl.bindv6only = 0;
+
    	net->ipv6.sysctl.table = register_net_sysctl_table(net, ipv6_ctl_path, ipv6_table);
    	if (!net->ipv6.sysctl.table)
    		goto out_ipv6_icmp_table;

-- 

^ permalink raw reply

* [patch 9/9][NETNS][IPV6] make icmpv6_time sysctl per namespace
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: move-sysctl-icmp-to-netns.patch --]
[-- Type: text/plain, Size: 2567 bytes --]

This patch moves the icmpv6_time sysctl to the network namespace
structure. A small initialization helper function has been added.

Because the ipv6 protocol is not yet per namespace, the variable is
accessed relative to the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/netns/ipv6.h   |    1 +
 net/ipv6/icmp.c            |    8 ++++----
 net/ipv6/sysctl_net_ipv6.c |    1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/netns/ipv6.h
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -22,6 +22,7 @@ struct netns_sysctl_ipv6 {
  	int ip6_rt_gc_elasticity;
  	int ip6_rt_mtu_expires;
  	int ip6_rt_min_advmss;
+ 	int icmpv6_time;
 };
 
 struct netns_ipv6 {
Index: net-2.6.25/net/ipv6/icmp.c
===================================================================
--- net-2.6.25.orig/net/ipv6/icmp.c
+++ net-2.6.25/net/ipv6/icmp.c
@@ -154,8 +154,6 @@ static int is_ineligible(struct sk_buff 
 	return 0;
 }
 
-static int sysctl_icmpv6_time __read_mostly = 1*HZ;
-
 /*
  * Check the ICMP output rate limit
  */
@@ -186,7 +184,7 @@ static inline int icmpv6_xrlim_allow(str
 		res = 1;
 	} else {
 		struct rt6_info *rt = (struct rt6_info *)dst;
-		int tmo = sysctl_icmpv6_time;
+		int tmo = init_net.ipv6.sysctl.icmpv6_time;
 
 		/* Give more bandwidth to wider prefixes. */
 		if (rt->rt6i_dst.plen < 128)
@@ -913,7 +911,7 @@ ctl_table ipv6_icmp_table_template[] = {
 	{
 		.ctl_name	= NET_IPV6_ICMP_RATELIMIT,
 		.procname	= "ratelimit",
-		.data		= &sysctl_icmpv6_time,
+		.data		= &init_net.ipv6.sysctl.icmpv6_time,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
@@ -925,6 +923,8 @@ struct ctl_table *ipv6_icmp_sysctl_init(
 {
 	struct ctl_table *table;
 
+	net->ipv6.sysctl.icmpv6_time = 1*HZ;
+
    	table = kmemdup(ipv6_icmp_table_template,
 			sizeof(ipv6_icmp_table_template),
 			GFP_KERNEL);
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -124,6 +124,7 @@ static int ipv6_sysctl_net_init(struct n
    	ipv6_route_table[8].data = &net->ipv6.sysctl.ip6_rt_min_advmss;
    	ipv6_table[0].child = ipv6_route_table;
 
+  	ipv6_icmp_table[0].data = &net->ipv6.sysctl.icmpv6_time;
    	ipv6_table[1].child = ipv6_icmp_table;
 
   	ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;

-- 

^ permalink raw reply

* [patch 3/9][NETNS][IPV6] make ipv6 structure for netns
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: make-ipv6-for-netns.patch --]
[-- Type: text/plain, Size: 1181 bytes --]

Like the ipv4 part, this patch adds an ipv6 structure in the net structure
to aggregate the different resources to make ipv6 per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/net_namespace.h |    2 ++
 include/net/netns/ipv6.h    |   10 ++++++++++
 2 files changed, 12 insertions(+)

Index: net-2.6.25/include/net/net_namespace.h
===================================================================
--- net-2.6.25.orig/include/net/net_namespace.h
+++ net-2.6.25/include/net/net_namespace.h
@@ -11,6 +11,7 @@
 #include <net/netns/unix.h>
 #include <net/netns/packet.h>
 #include <net/netns/ipv4.h>
+#include <net/netns/ipv6.h>
 
 struct proc_dir_entry;
 struct net_device;
@@ -48,6 +49,7 @@ struct net {
 	struct netns_packet	packet;
 	struct netns_unix	unx;
 	struct netns_ipv4	ipv4;
+	struct netns_ipv6	ipv6;
 };
 
 #ifdef CONFIG_NET
Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- /dev/null
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -0,0 +1,10 @@
+/*
+ * ipv6 in net namespaces
+ */
+
+#ifndef __NETNS_IPV6_H__
+#define __NETNS_IPV6_H__
+
+struct netns_ipv6 {
+};
+#endif

-- 

^ permalink raw reply

* [patch 8/9][NETNS][IPV6] make sysctls route per namespace
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: move-sysctl-route-to-netns.patch --]
[-- Type: text/plain, Size: 10976 bytes --]

All the sysctls concerning routes are moved to the network namespace
structure.

Because the ipv6 protocol is not yet per namespace, the variables are
accessed relative to the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ip6_route.h    |    2 -
 include/net/netns/ipv6.h   |    8 ++++++
 net/ipv6/ip6_fib.c         |   14 ++++++----
 net/ipv6/route.c           |   58 ++++++++++++++++++++++-----------------------
 net/ipv6/sysctl_net_ipv6.c |    9 ++++++
 5 files changed, 55 insertions(+), 36 deletions(-)

Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/netns/ipv6.h
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -14,6 +14,14 @@ struct netns_sysctl_ipv6 {
    	struct inet_frags_ctl frags;
  	int bindv6only;
  	int mld_max_msf;
+ 	int flush_delay;
+ 	int ip6_rt_max_size;
+ 	int ip6_rt_gc_min_interval;
+ 	int ip6_rt_gc_timeout;
+ 	int ip6_rt_gc_interval;
+ 	int ip6_rt_gc_elasticity;
+ 	int ip6_rt_mtu_expires;
+ 	int ip6_rt_min_advmss;
 };
 
 struct netns_ipv6 {
Index: net-2.6.25/net/ipv6/route.c
===================================================================
--- net-2.6.25.orig/net/ipv6/route.c
+++ net-2.6.25/net/ipv6/route.c
@@ -73,14 +73,6 @@
 
 #define CLONE_OFFLINK_ROUTE 0
 
-static int ip6_rt_max_size = 4096;
-static int ip6_rt_gc_min_interval = HZ / 2;
-static int ip6_rt_gc_timeout = 60*HZ;
-int ip6_rt_gc_interval = 30*HZ;
-static int ip6_rt_gc_elasticity = 9;
-static int ip6_rt_mtu_expires = 10*60*HZ;
-static int ip6_rt_min_advmss = IPV6_MIN_MTU - 20 - 40;
-
 static struct rt6_info * ip6_rt_copy(struct rt6_info *ort);
 static struct dst_entry	*ip6_dst_check(struct dst_entry *dst, u32 cookie);
 static struct dst_entry *ip6_negative_advice(struct dst_entry *);
@@ -889,8 +881,8 @@ static inline unsigned int ipv6_advmss(u
 {
 	mtu -= sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
 
-	if (mtu < ip6_rt_min_advmss)
-		mtu = ip6_rt_min_advmss;
+	if (mtu < init_net.ipv6.sysctl.ip6_rt_min_advmss)
+		mtu = init_net.ipv6.sysctl.ip6_rt_min_advmss;
 
 	/*
 	 * Maximal non-jumbo IPv6 payload is IPV6_MAXPLEN and
@@ -990,19 +982,19 @@ static int ip6_dst_gc(void)
 	static unsigned long last_gc;
 	unsigned long now = jiffies;
 
-	if (time_after(last_gc + ip6_rt_gc_min_interval, now) &&
-	    atomic_read(&ip6_dst_ops.entries) <= ip6_rt_max_size)
+	if (time_after(last_gc + init_net.ipv6.sysctl.ip6_rt_gc_min_interval, now) &&
+	    atomic_read(&ip6_dst_ops.entries) <= init_net.ipv6.sysctl.ip6_rt_max_size)
 		goto out;
 
 	expire++;
 	fib6_run_gc(expire);
 	last_gc = now;
 	if (atomic_read(&ip6_dst_ops.entries) < ip6_dst_ops.gc_thresh)
-		expire = ip6_rt_gc_timeout>>1;
+		expire = init_net.ipv6.sysctl.ip6_rt_gc_timeout>>1;
 
 out:
-	expire -= expire>>ip6_rt_gc_elasticity;
-	return (atomic_read(&ip6_dst_ops.entries) > ip6_rt_max_size);
+	expire -= expire>>init_net.ipv6.sysctl.ip6_rt_gc_elasticity;
+	return (atomic_read(&ip6_dst_ops.entries) > init_net.ipv6.sysctl.ip6_rt_max_size);
 }
 
 /* Clean host part of a prefix. Not necessary in radix tree,
@@ -1508,7 +1500,7 @@ void rt6_pmtu_discovery(struct in6_addr 
 		rt->u.dst.metrics[RTAX_MTU-1] = pmtu;
 		if (allfrag)
 			rt->u.dst.metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG;
-		dst_set_expires(&rt->u.dst, ip6_rt_mtu_expires);
+		dst_set_expires(&rt->u.dst, init_net.ipv6.sysctl.ip6_rt_mtu_expires);
 		rt->rt6i_flags |= RTF_MODIFIED|RTF_EXPIRES;
 		goto out;
 	}
@@ -1534,7 +1526,7 @@ void rt6_pmtu_discovery(struct in6_addr 
 		 * which is 10 mins. After 10 mins the decreased pmtu is expired
 		 * and detecting PMTU increase will be automatically happened.
 		 */
-		dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires);
+		dst_set_expires(&nrt->u.dst, init_net.ipv6.sysctl.ip6_rt_mtu_expires);
 		nrt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES;
 
 		ip6_ins_rt(nrt);
@@ -2390,15 +2382,14 @@ static inline void ipv6_route_proc_fini(
 
 #ifdef CONFIG_SYSCTL
 
-static int flush_delay;
-
 static
 int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write, struct file * filp,
 			      void __user *buffer, size_t *lenp, loff_t *ppos)
 {
+	int delay = init_net.ipv6.sysctl.flush_delay;
 	if (write) {
 		proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
-		fib6_run_gc(flush_delay <= 0 ? ~0UL : (unsigned long)flush_delay);
+		fib6_run_gc(delay <= 0 ? ~0UL : (unsigned long)delay);
 		return 0;
 	} else
 		return -EINVAL;
@@ -2407,7 +2398,7 @@ int ipv6_sysctl_rtcache_flush(ctl_table 
 ctl_table ipv6_route_table_template[] = {
 	{
 		.procname	=	"flush",
-		.data		=	&flush_delay,
+		.data		=	&init_net.ipv6.sysctl.flush_delay,
 		.maxlen		=	sizeof(int),
 		.mode		=	0200,
 		.proc_handler	=	&ipv6_sysctl_rtcache_flush
@@ -2423,7 +2414,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_MAX_SIZE,
 		.procname	=	"max_size",
-		.data		=	&ip6_rt_max_size,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_max_size,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec,
@@ -2431,7 +2422,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_MIN_INTERVAL,
 		.procname	=	"gc_min_interval",
-		.data		=	&ip6_rt_gc_min_interval,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2440,7 +2431,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_TIMEOUT,
 		.procname	=	"gc_timeout",
-		.data		=	&ip6_rt_gc_timeout,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_timeout,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2449,7 +2440,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_INTERVAL,
 		.procname	=	"gc_interval",
-		.data		=	&ip6_rt_gc_interval,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2458,7 +2449,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_ELASTICITY,
 		.procname	=	"gc_elasticity",
-		.data		=	&ip6_rt_gc_elasticity,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_elasticity,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2467,7 +2458,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_MTU_EXPIRES,
 		.procname	=	"mtu_expires",
-		.data		=	&ip6_rt_mtu_expires,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_mtu_expires,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2476,7 +2467,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_MIN_ADVMSS,
 		.procname	=	"min_adv_mss",
-		.data		=	&ip6_rt_min_advmss,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_min_advmss,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2485,7 +2476,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_MIN_INTERVAL_MS,
 		.procname	=	"gc_min_interval_ms",
-		.data		=	&ip6_rt_gc_min_interval,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_ms_jiffies,
@@ -2498,6 +2489,15 @@ struct ctl_table *ipv6_route_sysctl_init
 {
 	struct ctl_table *table;
 
+	net->ipv6.sysctl.flush_delay = 0;
+	net->ipv6.sysctl.ip6_rt_max_size = 4096;
+	net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
+	net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
+	net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
+	net->ipv6.sysctl.ip6_rt_gc_elasticity = 9;
+	net->ipv6.sysctl.ip6_rt_mtu_expires = 10*60*HZ;
+	net->ipv6.sysctl.ip6_rt_min_advmss = IPV6_MIN_MTU - 20 - 40;
+
    	table = kmemdup(ipv6_route_table_template,
 			sizeof(ipv6_route_table_template),
 			GFP_KERNEL);
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -114,7 +114,16 @@ static int ipv6_sysctl_net_init(struct n
    	if (!ipv6_icmp_table)
    		goto out_ipv6_route_table;
 
+   	ipv6_route_table[0].data = &net->ipv6.sysctl.flush_delay;
+   	ipv6_route_table[2].data = &net->ipv6.sysctl.ip6_rt_max_size;
+   	ipv6_route_table[3].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
+   	ipv6_route_table[4].data = &net->ipv6.sysctl.ip6_rt_gc_timeout;
+   	ipv6_route_table[5].data = &net->ipv6.sysctl.ip6_rt_gc_interval;
+   	ipv6_route_table[6].data = &net->ipv6.sysctl.ip6_rt_gc_elasticity;
+   	ipv6_route_table[7].data = &net->ipv6.sysctl.ip6_rt_mtu_expires;
+   	ipv6_route_table[8].data = &net->ipv6.sysctl.ip6_rt_min_advmss;
    	ipv6_table[0].child = ipv6_route_table;
+
    	ipv6_table[1].child = ipv6_icmp_table;
 
   	ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;
Index: net-2.6.25/include/net/ip6_route.h
===================================================================
--- net-2.6.25.orig/include/net/ip6_route.h
+++ net-2.6.25/include/net/ip6_route.h
@@ -43,8 +43,6 @@ extern struct rt6_info	ip6_prohibit_entr
 extern struct rt6_info	ip6_blk_hole_entry;
 #endif
 
-extern int ip6_rt_gc_interval;
-
 extern void			ip6_route_input(struct sk_buff *skb);
 
 extern struct dst_entry *	ip6_route_output(struct sock *sk,
Index: net-2.6.25/net/ipv6/ip6_fib.c
===================================================================
--- net-2.6.25.orig/net/ipv6/ip6_fib.c
+++ net-2.6.25/net/ipv6/ip6_fib.c
@@ -681,13 +681,15 @@ static __inline__ void fib6_start_gc(str
 {
 	if (ip6_fib_timer.expires == 0 &&
 	    (rt->rt6i_flags & (RTF_EXPIRES|RTF_CACHE)))
-		mod_timer(&ip6_fib_timer, jiffies + ip6_rt_gc_interval);
+		mod_timer(&ip6_fib_timer, jiffies +
+			  init_net.ipv6.sysctl.ip6_rt_gc_interval);
 }
 
 void fib6_force_start_gc(void)
 {
 	if (ip6_fib_timer.expires == 0)
-		mod_timer(&ip6_fib_timer, jiffies + ip6_rt_gc_interval);
+		mod_timer(&ip6_fib_timer, jiffies +
+			  init_net.ipv6.sysctl.ip6_rt_gc_interval);
 }
 
 /*
@@ -1447,7 +1449,8 @@ void fib6_run_gc(unsigned long dummy)
 {
 	if (dummy != ~0UL) {
 		spin_lock_bh(&fib6_gc_lock);
-		gc_args.timeout = dummy ? (int)dummy : ip6_rt_gc_interval;
+		gc_args.timeout = dummy ? (int)dummy :
+			init_net.ipv6.sysctl.ip6_rt_gc_interval;
 	} else {
 		local_bh_disable();
 		if (!spin_trylock(&fib6_gc_lock)) {
@@ -1455,7 +1458,7 @@ void fib6_run_gc(unsigned long dummy)
 			local_bh_enable();
 			return;
 		}
-		gc_args.timeout = ip6_rt_gc_interval;
+		gc_args.timeout = init_net.ipv6.sysctl.ip6_rt_gc_interval;
 	}
 	gc_args.more = 0;
 
@@ -1463,7 +1466,8 @@ void fib6_run_gc(unsigned long dummy)
 	fib6_clean_all(fib6_age, 0, NULL);
 
 	if (gc_args.more)
-		mod_timer(&ip6_fib_timer, jiffies + ip6_rt_gc_interval);
+		mod_timer(&ip6_fib_timer, jiffies +
+			  init_net.ipv6.sysctl.ip6_rt_gc_interval);
 	else {
 		del_timer(&ip6_fib_timer);
 		ip6_fib_timer.expires = 0;

-- 

^ permalink raw reply

* [patch 7/9][NETNS][IPV6] make mld_max_msf per namespace
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: move-mld_max_msf-to-netns.patch --]
[-- Type: text/plain, Size: 4074 bytes --]

The mld_max_msf variable is moved to the network namespace structure.
A helper function has been added to initialize the variable.

Because the ipv6 protocol is not yet per namespace, the variable is
accessed relative to the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ipv6.h         |    3 ---
 include/net/netns/ipv6.h   |    1 +
 net/ipv6/ipv6_sockglue.c   |    3 +--
 net/ipv6/mcast.c           |    9 ++++++---
 net/ipv6/sysctl_net_ipv6.c |    5 ++++-
 5 files changed, 12 insertions(+), 9 deletions(-)

Index: net-2.6.25/include/net/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/ipv6.h
+++ net-2.6.25/include/net/ipv6.h
@@ -108,9 +108,6 @@ struct frag_hdr {
 
 #include <net/sock.h>
 
-/* sysctls */
-extern int sysctl_mld_max_msf;
-
 #define _DEVINC(statname, modifier, idev, field)			\
 ({									\
 	struct inet6_dev *_idev = (idev);				\
Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/netns/ipv6.h
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -13,6 +13,7 @@ struct netns_sysctl_ipv6 {
 	struct ctl_table_header *table;
    	struct inet_frags_ctl frags;
  	int bindv6only;
+ 	int mld_max_msf;
 };
 
 struct netns_ipv6 {
Index: net-2.6.25/net/ipv6/ipv6_sockglue.c
===================================================================
--- net-2.6.25.orig/net/ipv6/ipv6_sockglue.c
+++ net-2.6.25/net/ipv6/ipv6_sockglue.c
@@ -656,7 +656,6 @@ done:
 	}
 	case MCAST_MSFILTER:
 	{
-		extern int sysctl_mld_max_msf;
 		struct group_filter *gsf;
 
 		if (optlen < GROUP_FILTER_SIZE(0))
@@ -677,7 +676,7 @@ done:
 		}
 		/* numsrc >= (4G-140)/128 overflow in 32 bits */
 		if (gsf->gf_numsrc >= 0x1ffffffU ||
-		    gsf->gf_numsrc > sysctl_mld_max_msf) {
+		    gsf->gf_numsrc > init_net.ipv6.sysctl.mld_max_msf) {
 			kfree(gsf);
 			retv = -ENOBUFS;
 			break;
Index: net-2.6.25/net/ipv6/mcast.c
===================================================================
--- net-2.6.25.orig/net/ipv6/mcast.c
+++ net-2.6.25/net/ipv6/mcast.c
@@ -172,8 +172,6 @@ static int ip6_mc_leave_src(struct sock 
 
 #define IPV6_MLD_MAX_MSF	64
 
-int sysctl_mld_max_msf __read_mostly = IPV6_MLD_MAX_MSF;
-
 /*
  *	socket join on multicast group
  */
@@ -441,7 +439,7 @@ int ip6_mc_source(int add, int omode, st
 	}
 	/* else, add a new source to the filter */
 
-	if (psl && psl->sl_count >= sysctl_mld_max_msf) {
+	if (psl && psl->sl_count >= init_net.ipv6.sysctl.mld_max_msf) {
 		err = -ENOBUFS;
 		goto done;
 	}
@@ -2597,6 +2595,11 @@ static const struct file_operations igmp
 };
 #endif
 
+void igmp6_sysctl_init(struct net *net)
+{
+	net->ipv6.sysctl.mld_max_msf = IPV6_MLD_MAX_MSF;
+}
+
 int __init igmp6_init(struct net_proto_family *ops)
 {
 	struct ipv6_pinfo *np;
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -17,6 +17,7 @@
 extern struct ctl_table *ipv6_route_sysctl_init(struct net *net);
 extern struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
 extern void ipv6_frag_sysctl_init(struct net *net);
+extern void igmp6_sysctl_init(struct net *net);
 
 static ctl_table ipv6_table_template[] = {
 	{
@@ -78,7 +79,7 @@ static ctl_table ipv6_table_template[] =
 	{
 		.ctl_name	= NET_IPV6_MLD_MAX_MSF,
 		.procname	= "mld_max_msf",
-		.data		= &sysctl_mld_max_msf,
+		.data		= &init_net.ipv6.sysctl.mld_max_msf,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
@@ -121,8 +122,10 @@ static int ipv6_sysctl_net_init(struct n
     	ipv6_table[4].data = &net->ipv6.sysctl.frags.low_thresh;
     	ipv6_table[5].data = &net->ipv6.sysctl.frags.timeout;
   	ipv6_table[6].data = &net->ipv6.sysctl.frags.secret_interval;
+   	ipv6_table[7].data = &net->ipv6.sysctl.mld_max_msf;
 
 	ipv6_frag_sysctl_init(net);
+	igmp6_sysctl_init(net);
 
 	net->ipv6.sysctl.bindv6only = 0;
 

-- 

^ permalink raw reply

* [patch 0/9][NETNS][IPV6] make sysctl per namespace
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev

The following patchset makes the ipv6 sysctls handle multiple
network namespaces. Each instance of a network namespace has its own
set of sysctl values, which means the behavior of the ipv6 stack can
differ depending on the sysctl values set in each network
namespace.

-- 

^ permalink raw reply

* [patch 1/9][NETNS][IPV6] make ipv6_sysctl_register to return a value
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: ipv6-sysctl-register-return-value.patch --]
[-- Type: text/plain, Size: 1998 bytes --]

This patch makes the function ipv6_sysctl_register() return a
value. The af_inet6 init function is now able to catch and handle
an error from the initialization of the sysctl.
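
The error handling described above can be sketched in plain C as
follows. This is a simplified userspace illustration, not the kernel
code: every demo_* name is hypothetical, and a NULL handle merely
simulates a failed registration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical state flags, used only to observe the unwinding. */
static int demo_sysctl_registered;
static int demo_mibs_ready;

/* Registration step that now reports failure instead of returning void. */
static int demo_sysctl_register(const void *handle)
{
	if (!handle)
		return -ENOMEM;
	demo_sysctl_registered = 1;
	return 0;
}

static void demo_sysctl_unregister(void)
{
	demo_sysctl_registered = 0;
}

/* A later init step that may fail after the sysctl step succeeded. */
static int demo_icmp_init(int fail)
{
	return fail ? -EINVAL : 0;
}

static int demo_init(const void *handle, int icmp_fails)
{
	int err;

	demo_mibs_ready = 1;

	err = demo_sysctl_register(handle);
	if (err)
		goto sysctl_fail;	/* nothing to unregister yet */

	err = demo_icmp_init(icmp_fails);
	if (err)
		goto icmp_fail;

	return 0;

icmp_fail:
	demo_sysctl_unregister();
sysctl_fail:
	demo_mibs_ready = 0;	/* undo the step that preceded registration */
	return err;
}
```

The new label sits after the unregister call, so a failed
registration skips the unregister and only undoes the earlier steps.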

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ipv6.h         |    2 +-
 net/ipv6/af_inet6.c        |    5 ++++-
 net/ipv6/sysctl_net_ipv6.c |    6 +++++-
 3 files changed, 10 insertions(+), 3 deletions(-)

Index: net-2.6.25/include/net/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/ipv6.h
+++ net-2.6.25/include/net/ipv6.h
@@ -620,7 +620,7 @@ static inline int snmp6_unregister_dev(s
 extern ctl_table ipv6_route_table[];
 extern ctl_table ipv6_icmp_table[];
 
-extern void ipv6_sysctl_register(void);
+extern int ipv6_sysctl_register(void);
 extern void ipv6_sysctl_unregister(void);
 #endif
 
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -783,7 +783,9 @@ static int __init inet6_init(void)
 	 */
 
 #ifdef CONFIG_SYSCTL
-	ipv6_sysctl_register();
+	err = ipv6_sysctl_register();
+	if (err)
+		goto sysctl_fail;
 #endif
 	err = icmpv6_init(&inet6_family_ops);
 	if (err)
@@ -897,6 +899,7 @@ ndisc_fail:
 icmp_fail:
 #ifdef CONFIG_SYSCTL
 	ipv6_sysctl_unregister();
+sysctl_fail:
 #endif
 	cleanup_ipv6_mibs();
 out_unregister_sock:
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -90,9 +90,13 @@ static struct ctl_path ipv6_ctl_path[] =
 
 static struct ctl_table_header *ipv6_sysctl_header;
 
-void ipv6_sysctl_register(void)
+int ipv6_sysctl_register(void)
 {
 	ipv6_sysctl_header = register_sysctl_paths(ipv6_ctl_path, ipv6_table);
+	if (!ipv6_sysctl_header)
+		return -ENOMEM;
+
+	return 0;
 }
 
 void ipv6_sysctl_unregister(void)

-- 

^ permalink raw reply

* [patch 6/9][NETNS][IPV6] make ip6_frags per namespace
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: move-ip6-frags-to-netns.patch --]
[-- Type: text/plain, Size: 5417 bytes --]

The ip6_frags control structure is moved to the network namespace
structure. Because there can be multiple instances of the network
namespace, and the structure is no longer a global variable with a
static initializer, a helper function has been added to initialize
its fields.

Until the ipv6 protocol is made per namespace, the variables are
accessed relative to the initial network namespace.
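
A minimal userspace sketch of why such a helper is needed: a static
initializer can fill a single global, but each namespace instance
needs its defaults assigned at runtime. All demo_* names are
hypothetical, and DEMO_HZ and DEMO_FRAG_TIMEOUT are assumed values
chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HZ			250		/* assumed tick rate */
#define DEMO_FRAG_TIMEOUT	(60 * DEMO_HZ)	/* assumed timeout */

/* Hypothetical stand-in for struct inet_frags_ctl. */
struct demo_frags_ctl {
	int high_thresh;
	int low_thresh;
	int timeout;
	int secret_interval;
};

/* Hypothetical stand-in for struct net. */
struct demo_net {
	struct demo_frags_ctl frags;
};

/* Assign the defaults that used to live in a static initializer. */
static void demo_frag_sysctl_init(struct demo_net *net)
{
	assert(net != NULL);

	net->frags.high_thresh = 256 * 1024;
	net->frags.low_thresh = 192 * 1024;
	net->frags.timeout = DEMO_FRAG_TIMEOUT;
	net->frags.secret_interval = 10 * 60 * DEMO_HZ;
}
```

Calling the helper once per instance gives every namespace the same
starting values while keeping later changes private to that instance.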

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ipv6.h         |    3 ---
 include/net/netns/ipv6.h   |    3 +++
 net/ipv6/reassembly.c      |   21 ++++++++++++---------
 net/ipv6/sysctl_net_ipv6.c |   15 +++++++++++----
 4 files changed, 26 insertions(+), 16 deletions(-)

Index: net-2.6.25/include/net/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/ipv6.h
+++ net-2.6.25/include/net/ipv6.h
@@ -570,9 +570,6 @@ extern int inet6_hash_connect(struct ine
 /*
  * reassembly.c
  */
-struct inet_frags_ctl;
-extern struct inet_frags_ctl ip6_frags_ctl;
-
 extern const struct proto_ops inet6_stream_ops;
 extern const struct proto_ops inet6_dgram_ops;
 
Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/netns/ipv6.h
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -2,6 +2,8 @@
  * ipv6 in net namespaces
  */
 
+#include <net/inet_frag.h>
+
 #ifndef __NETNS_IPV6_H__
 #define __NETNS_IPV6_H__
 
@@ -9,6 +11,7 @@ struct ctl_table_header;
 
 struct netns_sysctl_ipv6 {
 	struct ctl_table_header *table;
+   	struct inet_frags_ctl frags;
  	int bindv6only;
 };
 
Index: net-2.6.25/net/ipv6/reassembly.c
===================================================================
--- net-2.6.25.orig/net/ipv6/reassembly.c
+++ net-2.6.25/net/ipv6/reassembly.c
@@ -82,13 +82,6 @@ struct frag_queue
 	__u16			nhoffset;
 };
 
-struct inet_frags_ctl ip6_frags_ctl __read_mostly = {
-	.high_thresh 	 = 256 * 1024,
-	.low_thresh	 = 192 * 1024,
-	.timeout	 = IPV6_FRAG_TIMEOUT,
-	.secret_interval = 10 * 60 * HZ,
-};
-
 static struct inet_frags ip6_frags;
 
 int ip6_frag_nqueues(void)
@@ -605,7 +598,7 @@ static int ipv6_frag_rcv(struct sk_buff 
 		return 1;
 	}
 
-	if (atomic_read(&ip6_frags.mem) > ip6_frags_ctl.high_thresh)
+	if (atomic_read(&ip6_frags.mem) > init_net.ipv6.sysctl.frags.high_thresh)
 		ip6_evictor(ip6_dst_idev(skb->dst));
 
 	if ((fq = fq_find(fhdr->identification, &hdr->saddr, &hdr->daddr,
@@ -632,6 +625,16 @@ static struct inet6_protocol frag_protoc
 	.flags		=	INET6_PROTO_NOPOLICY,
 };
 
+void ipv6_frag_sysctl_init(struct net *net)
+{
+	net->ipv6.sysctl.frags.high_thresh = 256 * 1024;
+	net->ipv6.sysctl.frags.low_thresh = 192 * 1024;
+	net->ipv6.sysctl.frags.timeout = IPV6_FRAG_TIMEOUT;
+	net->ipv6.sysctl.frags.secret_interval = 10 * 60 * HZ;
+
+	ip6_frags.ctl = &net->ipv6.sysctl.frags;
+}
+
 int __init ipv6_frag_init(void)
 {
 	int ret;
@@ -639,7 +642,7 @@ int __init ipv6_frag_init(void)
 	ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
 	if (ret)
 		goto out;
-	ip6_frags.ctl = &ip6_frags_ctl;
+
 	ip6_frags.hashfn = ip6_hashfn;
 	ip6_frags.constructor = ip6_frag_init;
 	ip6_frags.destructor = NULL;
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -16,6 +16,7 @@
 
 extern struct ctl_table *ipv6_route_sysctl_init(struct net *net);
 extern struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
+extern void ipv6_frag_sysctl_init(struct net *net);
 
 static ctl_table ipv6_table_template[] = {
 	{
@@ -43,7 +44,7 @@ static ctl_table ipv6_table_template[] =
 	{
 		.ctl_name	= NET_IPV6_IP6FRAG_HIGH_THRESH,
 		.procname	= "ip6frag_high_thresh",
-		.data		= &ip6_frags_ctl.high_thresh,
+		.data		= &init_net.ipv6.sysctl.frags.high_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
@@ -51,7 +52,7 @@ static ctl_table ipv6_table_template[] =
 	{
 		.ctl_name	= NET_IPV6_IP6FRAG_LOW_THRESH,
 		.procname	= "ip6frag_low_thresh",
-		.data		= &ip6_frags_ctl.low_thresh,
+		.data		= &init_net.ipv6.sysctl.frags.low_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
@@ -59,7 +60,7 @@ static ctl_table ipv6_table_template[] =
 	{
 		.ctl_name	= NET_IPV6_IP6FRAG_TIME,
 		.procname	= "ip6frag_time",
-		.data		= &ip6_frags_ctl.timeout,
+		.data		= &init_net.ipv6.sysctl.frags.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec_jiffies,
@@ -68,7 +69,7 @@ static ctl_table ipv6_table_template[] =
 	{
 		.ctl_name	= NET_IPV6_IP6FRAG_SECRET_INTERVAL,
 		.procname	= "ip6frag_secret_interval",
-		.data		= &ip6_frags_ctl.secret_interval,
+		.data		= &init_net.ipv6.sysctl.frags.secret_interval,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec_jiffies,
@@ -116,6 +117,12 @@ static int ipv6_sysctl_net_init(struct n
    	ipv6_table[1].child = ipv6_icmp_table;
 
   	ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;
+    	ipv6_table[3].data = &net->ipv6.sysctl.frags.high_thresh;
+    	ipv6_table[4].data = &net->ipv6.sysctl.frags.low_thresh;
+    	ipv6_table[5].data = &net->ipv6.sysctl.frags.timeout;
+  	ipv6_table[6].data = &net->ipv6.sysctl.frags.secret_interval;
+
+	ipv6_frag_sysctl_init(net);
 
 	net->ipv6.sysctl.bindv6only = 0;
 

-- 

^ permalink raw reply

* [patch 4/9][NETNS][IPV6] make multiple instance of sysctl tables
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: make-ipv6-sysctl-per-namespace.patch --]
[-- Type: text/plain, Size: 5934 bytes --]

Each network namespace wants its own set of sysctl values, e.g. it should
not be possible to set a sysctl value for another namespace from within a
namespace, especially for the initial network namespace.

This patch duplicates the sysctl tables when we register a new network
namespace for ipv6. The tables that get duplicated are suffixed with the
word "template" to tell the developer that they are cloned.
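
The duplication scheme can be sketched in userspace C as below. This
is an illustration under simplifying assumptions, not the patch
itself: the demo_* types and functions are hypothetical, and
malloc()/memcpy() stand in for the kernel's kmemdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a ctl_table entry. */
struct demo_ctl_table {
	const char *procname;
	int *data;
};

/* Hypothetical stand-in for struct net. */
struct demo_net {
	int bindv6only;
	struct demo_ctl_table *table;
};

/* Hypothetical stand-in for init_net. */
static struct demo_net demo_init_net;

/* The template is never used directly; it is only cloned. */
static const struct demo_ctl_table demo_table_template[] = {
	{ .procname = "bindv6only", .data = &demo_init_net.bindv6only },
	{ .procname = NULL, .data = NULL },	/* terminator */
};

/* Clone the template and repoint entry 0 at this namespace's storage. */
static int demo_sysctl_net_init(struct demo_net *net)
{
	struct demo_ctl_table *table;

	assert(net != NULL);

	table = malloc(sizeof(demo_table_template));
	if (!table)
		return -1;
	memcpy(table, demo_table_template, sizeof(demo_table_template));

	table[0].data = &net->bindv6only;
	net->bindv6only = 0;
	net->table = table;
	return 0;
}

/* Release the per-namespace copy. */
static void demo_sysctl_net_exit(struct demo_net *net)
{
	free(net->table);
	net->table = NULL;
}
```

Because each clone carries its own data pointer, a write through one
namespace's table cannot reach another namespace's value, including
the one held by the initial namespace.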

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ipv6.h         |    4 +-
 include/net/netns/ipv6.h   |    9 ++++++
 net/ipv6/icmp.c            |   12 +++++++-
 net/ipv6/route.c           |   11 ++++++-
 net/ipv6/sysctl_net_ipv6.c |   67 +++++++++++++++++++++++++++++++++++++--------
 5 files changed, 88 insertions(+), 15 deletions(-)

Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -14,20 +14,23 @@
 #include <net/addrconf.h>
 #include <net/inet_frag.h>
 
-static ctl_table ipv6_table[] = {
+extern struct ctl_table *ipv6_route_sysctl_init(struct net *net);
+extern struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
+
+static ctl_table ipv6_table_template[] = {
 	{
 		.ctl_name	= NET_IPV6_ROUTE,
 		.procname	= "route",
 		.maxlen		= 0,
 		.mode		= 0555,
-		.child		= ipv6_route_table
+		.child		= ipv6_route_table_template
 	},
 	{
 		.ctl_name	= NET_IPV6_ICMP,
 		.procname	= "icmp",
 		.maxlen		= 0,
 		.mode		= 0555,
-		.child		= ipv6_icmp_table
+		.child		= ipv6_icmp_table_template
 	},
 	{
 		.ctl_name	= NET_IPV6_BINDV6ONLY,
@@ -88,20 +91,62 @@ static struct ctl_path ipv6_ctl_path[] =
 	{ },
 };
 
-static struct ctl_table_header *ipv6_sysctl_header;
-
 static int ipv6_sysctl_net_init(struct net *net)
 {
-	ipv6_sysctl_header = register_net_sysctl_table(net, ipv6_ctl_path, ipv6_table);
-	if (!ipv6_sysctl_header)
-		return -ENOMEM;
-
-	return 0;
+   	struct ctl_table *ipv6_table;
+   	struct ctl_table *ipv6_route_table;
+   	struct ctl_table *ipv6_icmp_table;
+   	int err;
+
+   	err = -ENOMEM;
+   	ipv6_table = kmemdup(ipv6_table_template, sizeof(ipv6_table_template),
+   			     GFP_KERNEL);
+   	if (!ipv6_table)
+   		goto out;
+
+	ipv6_route_table = ipv6_route_sysctl_init(net);
+	if (!ipv6_route_table)
+		goto out_ipv6_table;
+
+   	ipv6_icmp_table = ipv6_icmp_sysctl_init(net);
+   	if (!ipv6_icmp_table)
+   		goto out_ipv6_route_table;
+
+   	ipv6_table[0].child = ipv6_route_table;
+   	ipv6_table[1].child = ipv6_icmp_table;
+
+   	net->ipv6.sysctl.table = register_net_sysctl_table(net, ipv6_ctl_path, ipv6_table);
+   	if (!net->ipv6.sysctl.table)
+   		goto out_ipv6_icmp_table;
+
+   	err = 0;
+out:
+   	return err;
+
+out_ipv6_icmp_table:
+   	kfree(ipv6_icmp_table);
+out_ipv6_route_table:
+   	kfree(ipv6_route_table);
+out_ipv6_table:
+   	kfree(ipv6_table);
+   	goto out;
 }
 
 static void ipv6_sysctl_net_exit(struct net *net)
 {
-	unregister_net_sysctl_table(ipv6_sysctl_header);
+   	struct ctl_table *ipv6_table;
+   	struct ctl_table *ipv6_route_table;
+   	struct ctl_table *ipv6_icmp_table;
+
+   	ipv6_table = net->ipv6.sysctl.table->ctl_table_arg;
+   	ipv6_route_table = ipv6_table[0].child;
+   	ipv6_icmp_table = ipv6_table[1].child;
+
+   	unregister_net_sysctl_table(net->ipv6.sysctl.table);
+
+   	kfree(ipv6_table);
+   	kfree(ipv6_route_table);
+   	kfree(ipv6_icmp_table);
 }
 
 static struct pernet_operations ipv6_sysctl_net_ops = {
Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/netns/ipv6.h
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -5,6 +5,15 @@
 #ifndef __NETNS_IPV6_H__
 #define __NETNS_IPV6_H__
 
+struct ctl_table_header;
+
+struct netns_sysctl_ipv6 {
+	struct ctl_table_header *table;
+};
+
 struct netns_ipv6 {
+#ifdef CONFIG_SYSCTL
+	struct netns_sysctl_ipv6 sysctl;
+#endif
 };
 #endif
Index: net-2.6.25/include/net/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/ipv6.h
+++ net-2.6.25/include/net/ipv6.h
@@ -617,8 +617,8 @@ static inline int snmp6_unregister_dev(s
 #endif
 
 #ifdef CONFIG_SYSCTL
-extern ctl_table ipv6_route_table[];
-extern ctl_table ipv6_icmp_table[];
+extern ctl_table ipv6_route_table_template[];
+extern ctl_table ipv6_icmp_table_template[];
 
 extern int ipv6_sysctl_register(void);
 extern void ipv6_sysctl_unregister(void);
Index: net-2.6.25/net/ipv6/icmp.c
===================================================================
--- net-2.6.25.orig/net/ipv6/icmp.c
+++ net-2.6.25/net/ipv6/icmp.c
@@ -909,7 +909,7 @@ int icmpv6_err_convert(int type, int cod
 EXPORT_SYMBOL(icmpv6_err_convert);
 
 #ifdef CONFIG_SYSCTL
-ctl_table ipv6_icmp_table[] = {
+ctl_table ipv6_icmp_table_template[] = {
 	{
 		.ctl_name	= NET_IPV6_ICMP_RATELIMIT,
 		.procname	= "ratelimit",
@@ -920,5 +920,15 @@ ctl_table ipv6_icmp_table[] = {
 	},
 	{ .ctl_name = 0 },
 };
+
+struct ctl_table *ipv6_icmp_sysctl_init(struct net *net)
+{
+	struct ctl_table *table;
+
+   	table = kmemdup(ipv6_icmp_table_template,
+			sizeof(ipv6_icmp_table_template),
+			GFP_KERNEL);
+	return table;
+}
 #endif
 
Index: net-2.6.25/net/ipv6/route.c
===================================================================
--- net-2.6.25.orig/net/ipv6/route.c
+++ net-2.6.25/net/ipv6/route.c
@@ -2404,7 +2404,7 @@ int ipv6_sysctl_rtcache_flush(ctl_table 
 		return -EINVAL;
 }
 
-ctl_table ipv6_route_table[] = {
+ctl_table ipv6_route_table_template[] = {
 	{
 		.procname	=	"flush",
 		.data		=	&flush_delay,
@@ -2494,6 +2494,15 @@ ctl_table ipv6_route_table[] = {
 	{ .ctl_name = 0 }
 };
 
+struct ctl_table *ipv6_route_sysctl_init(struct net *net)
+{
+	struct ctl_table *table;
+
+   	table = kmemdup(ipv6_route_table_template,
+			sizeof(ipv6_route_table_template),
+			GFP_KERNEL);
+	return table;
+}
 #endif
 
 int __init ip6_route_init(void)

-- 


* [patch 2/9][NETNS][IPV6] make the ipv6 sysctl to be a netns subsystem
From: Daniel Lezcano @ 2008-01-02 12:25 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080102122548.629622062@localhost.localdomain>

[-- Attachment #1: make-ipv6-sysctl-to-be-a-subsystem.patch --]
[-- Type: text/plain, Size: 1581 bytes --]

The initialization of the sysctl for the ipv6 protocol is changed to be
a network namespace subsystem. That means that when a new network
namespace is created, the initialization function for the sysctl will
be called.

This does not change the behavior of the sysctl when the kernel is built
with network namespaces disabled.
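
The callback pattern described above can be sketched in userspace as
follows. This is only an illustration under stated assumptions: struct
demo_net, struct demo_pernet_ops, demo_register_pernet_subsys() and
demo_net_create() are hypothetical names that mimic the shape of
struct net, struct pernet_operations and register_pernet_subsys(); the
real kernel implementation differs (locking, lists, error handling).

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net. */
struct demo_net {
	int id;
	int sysctl_ready;	/* set by the subsystem's init callback */
};

/* Hypothetical stand-in for struct pernet_operations. */
struct demo_pernet_ops {
	int (*init)(struct demo_net *net);
	void (*exit)(struct demo_net *net);
};

#define DEMO_MAX_NETS 4

static struct demo_net *demo_nets[DEMO_MAX_NETS];
static size_t demo_net_count;
static const struct demo_pernet_ops *demo_ops;

/* Run init for every existing namespace, then remember the ops. */
int demo_register_pernet_subsys(const struct demo_pernet_ops *ops)
{
	size_t i;

	for (i = 0; i < demo_net_count; i++) {
		int err = ops->init(demo_nets[i]);

		if (err) {
			/* Undo the namespaces already initialised. */
			while (i-- > 0)
				ops->exit(demo_nets[i]);
			return err;
		}
	}
	demo_ops = ops;
	return 0;
}

/* A namespace created later gets the registered init callback too. */
int demo_net_create(struct demo_net *net)
{
	if (demo_net_count == DEMO_MAX_NETS)
		return -1;
	if (demo_ops) {
		int err = demo_ops->init(net);

		if (err)
			return err;
	}
	demo_nets[demo_net_count++] = net;
	return 0;
}

static int demo_sysctl_net_init(struct demo_net *net)
{
	net->sysctl_ready = 1;
	return 0;
}

static void demo_sysctl_net_exit(struct demo_net *net)
{
	net->sysctl_ready = 0;
}

const struct demo_pernet_ops demo_sysctl_net_ops = {
	.init = demo_sysctl_net_init,
	.exit = demo_sysctl_net_exit,
};
```

In this sketch, with a single namespace and one registration, the init
callback runs exactly once, which mirrors the claim that behavior is
unchanged when only the initial namespace exists.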

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/ipv6/sysctl_net_ipv6.c |   21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -90,16 +90,31 @@ static struct ctl_path ipv6_ctl_path[] =
 
 static struct ctl_table_header *ipv6_sysctl_header;
 
-int ipv6_sysctl_register(void)
+static int ipv6_sysctl_net_init(struct net *net)
 {
-	ipv6_sysctl_header = register_sysctl_paths(ipv6_ctl_path, ipv6_table);
+	ipv6_sysctl_header = register_net_sysctl_table(net, ipv6_ctl_path, ipv6_table);
 	if (!ipv6_sysctl_header)
 		return -ENOMEM;
 
 	return 0;
 }
 
+static void ipv6_sysctl_net_exit(struct net *net)
+{
+	unregister_net_sysctl_table(ipv6_sysctl_header);
+}
+
+static struct pernet_operations ipv6_sysctl_net_ops = {
+	.init = ipv6_sysctl_net_init,
+	.exit = ipv6_sysctl_net_exit,
+};
+
+int ipv6_sysctl_register(void)
+{
+	return register_pernet_subsys(&ipv6_sysctl_net_ops);
+}
+
 void ipv6_sysctl_unregister(void)
 {
-	unregister_sysctl_table(ipv6_sysctl_header);
+	unregister_pernet_subsys(&ipv6_sysctl_net_ops);
 }

-- 


* Re: [PATCH] Force UNIX domain sockets to be built in
From: Bodo Eggert @ 2008-01-02 12:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Theodore Tso, 7eggert, viro, davem, jengelh, devzero,
	linux-kernel, bunk, netdev
In-Reply-To: <E1JA0mT-0008U2-00@gondolin.me.apana.org.au>

On Wed, 2 Jan 2008, Herbert Xu wrote:
> Theodore Tso <tytso@mit.edu> wrote:

> > The question is whether the size of the Unix domain sockets support is
> > worth the complexity of yet another config option that we expose to
> > the user.  For the embedded world, OK, maybe they want to save 14k of
> > non-swappable memory.  But for the non-embedded world, given the 117k
> > mandatory memory usage of sysfs, or the 124k memory usage of the core
> > networking stack, never mind the 3 megabytes of memory used by objects
> > in the kernel subdirectory, it's not clear that it's worth worrying
> > over 14k of memory, especially when many Unix programs assume
> > that Unix Domain Sockets are present.
> 
> That would make sense if we were proposing to get rid of the CONFIG_UNIX
> question altogether for !CONFIG_EMBEDDED.

This is exactly what my patch does: the question is not displayed
unless EMBEDDED is set, and the default is changed to y.

>  However, the proposal here is
> merely to eliminate the modular option but the CONFIG_UNIX prompt itself
> will remain even without CONFIG_EMBEDDED.
> 
> This I think is quite pointless.

That's what another patch would do. I decided that s/tristate/bool/ is 
something completely different from adding the default and hiding the 
option, and that I'd avoid this discussion by not eliminating UNIX=m.

-- 
Top 100 things you don't want the sysadmin to say:
96. That's SOOOOO bizarre.
