[PATCH 0/5] Make nicer CONFIG_NET

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
@ 2007-10-31 19:19 Pavel Emelyanov
  2007-10-31 18:49 ` Eric Dumazet
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:19 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel

Currently we have the NET_NS config option, but the only change it 
makes is just return ERR_PTR(-EINVAL) inside the cloning call thus
introducing a bunch of a dead code and making the reference counting
unneeded. This is not very good.

So clean the net_namespace.c to fix this.

I have sent a set of patches to Andrew to make similar thing for
other namespaces, which introduces the NAMESPACES option to turn
all the namespaces off at once (to make embedded people suffer
less). So after that stuff is in, there will be some more patches 
to tie all this together.

What is to be done after this set is to make the register_pernet_xxx
stuff smaller. Currently this code weights approximately 500 bytes, 
so it worths reducing it, but I haven't found a good solution yet.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
@ 2007-10-31 18:49 ` Eric Dumazet
  2007-10-31 21:35   ` Daniel Lezcano
  2007-10-31 22:05   ` Eric W. Biederman
  2007-10-31 19:23 ` [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop Pavel Emelyanov
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2007-10-31 18:49 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, Eric W. Biederman, devel

On Wed, 31 Oct 2007 22:19:43 +0300
Pavel Emelyanov <xemul@openvz.org> wrote:

> Currently we have the NET_NS config option, but the only change it 
> makes is just return ERR_PTR(-EINVAL) inside the cloning call thus
> introducing a bunch of a dead code and making the reference counting
> unneeded. This is not very good.
> 
> So clean the net_namespace.c to fix this.
> 
> I have sent a set of patches to Andrew to make similar thing for
> other namespaces, which introduces the NAMESPACES option to turn
> all the namespaces off at once (to make embedded people suffer
> less). So after that stuff is in, there will be some more patches 
> to tie all this together.
> 
> What is to be done after this set is to make the register_pernet_xxx
> stuff smaller. Currently this code weights approximately 500 bytes, 
> so it worths reducing it, but I haven't found a good solution yet.

Definitly wanted here. Thank you.
One more refcounting on each socket creation/deletion was expensive.

Maybe we can add a macro to get nd_net from a "struct net_device"
so that every instance of

if (dev->nd_net != &init_net)
    goto drop;

can also be optimized away if !CONFIG_NET_NS

extern inline netdev_get_ns(struct netdevice *dev)
{
#ifdef CONFIG_NET_NS
	return dev->nd_net;
#else
	return &init_net;
#endif
}

...

if (netdev_get_ns(dev) != &init_net)
	goto drop;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 18:49 ` Eric Dumazet
@ 2007-10-31 21:35   ` Daniel Lezcano
  2007-10-31 22:05   ` Eric W. Biederman
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel Lezcano @ 2007-10-31 21:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pavel Emelyanov, David Miller, Linux Netdev List,
	Eric W. Biederman, devel

Eric Dumazet wrote:
> On Wed, 31 Oct 2007 22:19:43 +0300
> Pavel Emelyanov <xemul@openvz.org> wrote:
> 
>> Currently we have the NET_NS config option, but the only change it 
>> makes is just return ERR_PTR(-EINVAL) inside the cloning call thus
>> introducing a bunch of a dead code and making the reference counting
>> unneeded. This is not very good.
>>
>> So clean the net_namespace.c to fix this.
>>
>> I have sent a set of patches to Andrew to make similar thing for
>> other namespaces, which introduces the NAMESPACES option to turn
>> all the namespaces off at once (to make embedded people suffer
>> less). So after that stuff is in, there will be some more patches 
>> to tie all this together.
>>
>> What is to be done after this set is to make the register_pernet_xxx
>> stuff smaller. Currently this code weights approximately 500 bytes, 
>> so it worths reducing it, but I haven't found a good solution yet.
> 
> Definitly wanted here. Thank you.
> One more refcounting on each socket creation/deletion was expensive.
> 
> Maybe we can add a macro to get nd_net from a "struct net_device"
> so that every instance of
> 
> if (dev->nd_net != &init_net)
>     goto drop;
> 
> can also be optimized away if !CONFIG_NET_NS
> 
> extern inline netdev_get_ns(struct netdevice *dev)
> {
> #ifdef CONFIG_NET_NS
> 	return dev->nd_net;
> #else
> 	return &init_net;
> #endif
> }

Or something like:

#ifdef CONFIG_NET_NS
static inline int init_net_dev(struct net_device *dev)
{
	return dev->nd_net == &init_net;
}
#else
static inline int init_net_dev(struct net_device *dev)
{
	return 1;
}
#endif


By the way, this kind of test will disappear when the network namespace 
will be complete and take into account the differents protocols.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 18:49 ` Eric Dumazet
  2007-10-31 21:35   ` Daniel Lezcano
@ 2007-10-31 22:05   ` Eric W. Biederman
  2007-10-31 22:40     ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2007-10-31 22:05 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, devel

Eric Dumazet <dada1@cosmosbay.com> writes:

> Definitly wanted here. Thank you.
> One more refcounting on each socket creation/deletion was expensive.

Really?  Have you actually measured that?  If the overhead is
measurable and expensive we may want to look at per cpu counters or
something like that.  So far I don't have any numbers that say any
of the network namespace work inherently has any overhead.

> Maybe we can add a macro to get nd_net from a "struct net_device"
> so that every instance of
>
> if (dev->nd_net != &init_net)
>     goto drop;
>
> can also be optimized away if !CONFIG_NET_NS

Well that extra check should be removed once we finish converting
those code paths.  So I'm not too worried.

If this becomes a big issue I can dig up my old code that
replaced struct net * with a net_t typedef and used functions
for all of the comparisons and allowed everything to be compiled
away.

Trouble was it was sufficiently different that it was just enough
different that people could not immediately understand the code.

Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 22:05   ` Eric W. Biederman
@ 2007-10-31 22:40     ` Eric Dumazet
  2007-10-31 23:31       ` David Miller
  2007-11-01  0:51       ` Eric W. Biederman
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2007-10-31 22:40 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Pavel Emelyanov, David Miller, Linux Netdev List, devel

Eric W. Biederman a écrit :
> Eric Dumazet <dada1@cosmosbay.com> writes:
> 
> 
>> Definitly wanted here. Thank you.
>> One more refcounting on each socket creation/deletion was expensive.
> 
> Really?  Have you actually measured that?  If the overhead is
> measurable and expensive we may want to look at per cpu counters or
> something like that.  So far I don't have any numbers that say any
> of the network namespace work inherently has any overhead.

It seems that on some old opterons (two 246 for example),
"if (atomic_dec_and_test(&net->count))" is rather expensive yes :(

I am not sure per cpu counters help : I tried this and got no speedup. (This 
was on net_device refcnt at that time)

(on this machines, the access through fs/gs selector seems expensive too)

Maybe a lazy mode could be done, ie only do a atomic_dec(), as done in dev_put() ?

Also, "count" sits in a cache line that contains mostly read and shared 
fields, you might want to put it in a separate cache line in SMP, to avoid 
cache line ping-pongs.


> 
>> Maybe we can add a macro to get nd_net from a "struct net_device"
>> so that every instance of
>>
>> if (dev->nd_net != &init_net)
>>     goto drop;
>>
>> can also be optimized away if !CONFIG_NET_NS
> 
> Well that extra check should be removed once we finish converting
> those code paths.  So I'm not too worried.

OK. Since the conditional test can be predicted by cpu, it certainly doesnt 
matter.

> 
> If this becomes a big issue I can dig up my old code that
> replaced struct net * with a net_t typedef and used functions
> for all of the comparisons and allowed everything to be compiled
> away.



> 
> Trouble was it was sufficiently different that it was just enough
> different that people could not immediately understand the code.
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 22:40     ` Eric Dumazet
@ 2007-10-31 23:31       ` David Miller
  2007-11-01  0:58         ` Eric W. Biederman
  2007-11-01  0:51       ` Eric W. Biederman
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2007-10-31 23:31 UTC (permalink / raw)
  To: dada1; +Cc: ebiederm, xemul, netdev, devel

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 31 Oct 2007 23:40:59 +0100

> Eric W. Biederman a écrit :
> > Eric Dumazet <dada1@cosmosbay.com> writes:
> > 
> > 
> >> Definitly wanted here. Thank you.
> >> One more refcounting on each socket creation/deletion was expensive.
> > 
> > Really?  Have you actually measured that?  If the overhead is
> > measurable and expensive we may want to look at per cpu counters or
> > something like that.  So far I don't have any numbers that say any
> > of the network namespace work inherently has any overhead.
> 
> It seems that on some old opterons (two 246 for example),
> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(

P4 chips are generally very poor at mispredicted branches and
atomics.  So every atomic you remove from the socket paths
gives a noticable improvement on them.

Network device reference counting is such a stupid problem.  There has
to be a way to get rid of it on the packet side.

I think we could get rid of all of the device refcounting from packets
if we:

1) Formalize "SKB roots".  This is every place a packet
   could sit in the transmit path.

2) On device unregister:

	a) wait for RCU quiesce period
	b) stop_machine_run(skb_walk_roots, netdev, NR_CPUS);

   skb_walk_roots is a function that walks all the places in
   #1, rewriting the packet to point to loopback or whatever
   instead of 'netdev' which we are trying to unregister.

This gives us two things.

First, we no longer would need to rectount net devices
for packet references.

Second, we have a debugging framework for all those dreaded SKB leaks
that keep devices from being unloadable.  As we walk the roots
we'll see where all packets referencing a device actually are.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 23:31       ` David Miller
@ 2007-11-01  0:58         ` Eric W. Biederman
  0 siblings, 0 replies; 21+ messages in thread
From: Eric W. Biederman @ 2007-11-01  0:58 UTC (permalink / raw)
  To: David Miller; +Cc: dada1, xemul, netdev, devel

David Miller <davem@davemloft.net> writes:

> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Wed, 31 Oct 2007 23:40:59 +0100
>
>> Eric W. Biederman a écrit :
>> > Eric Dumazet <dada1@cosmosbay.com> writes:
>> > 
>> > 
>> >> Definitly wanted here. Thank you.
>> >> One more refcounting on each socket creation/deletion was expensive.
>> > 
>> > Really?  Have you actually measured that?  If the overhead is
>> > measurable and expensive we may want to look at per cpu counters or
>> > something like that.  So far I don't have any numbers that say any
>> > of the network namespace work inherently has any overhead.
>> 
>> It seems that on some old opterons (two 246 for example),
>> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(
>
> P4 chips are generally very poor at mispredicted branches and
> atomics.  So every atomic you remove from the socket paths
> gives a noticable improvement on them.

Interesting.

> Network device reference counting is such a stupid problem.  There has
> to be a way to get rid of it on the packet side.
>
> I think we could get rid of all of the device refcounting from packets
> if we:
>
> 1) Formalize "SKB roots".  This is every place a packet
>    could sit in the transmit path.

Yes there are very few of these, and I think they are generally
in interrupt or at least bottom half context aren't they?

I think the OpenVz version of network namespaces may have
already identified all of these.

> 2) On device unregister:
>
> 	a) wait for RCU quiesce period
> 	b) stop_machine_run(skb_walk_roots, netdev, NR_CPUS);

RCU sounds sufficient but possibly overkill to achieve what
we need to do here.

>    skb_walk_roots is a function that walks all the places in
>    #1, rewriting the packet to point to loopback or whatever
>    instead of 'netdev' which we are trying to unregister.
>
> This gives us two things.
>
> First, we no longer would need to rectount net devices
> for packet references.
>
> Second, we have a debugging framework for all those dreaded SKB leaks
> that keep devices from being unloadable.  As we walk the roots
> we'll see where all packets referencing a device actually are.

Sounds quite useful.  Grrr.  The brain cache locality that gets us
to rewrite things while we are refactoring them to have more
functionality....  It just keeps the problem from being straigh
forward ;)

Eric


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 22:40     ` Eric Dumazet
  2007-10-31 23:31       ` David Miller
@ 2007-11-01  0:51       ` Eric W. Biederman
  2007-11-01  6:58         ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2007-11-01  0:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Pavel Emelyanov, David Miller, Linux Netdev List, devel

Eric Dumazet <dada1@cosmosbay.com> writes:

> Eric W. Biederman a écrit :
>> Eric Dumazet <dada1@cosmosbay.com> writes:
>>
>>
>>> Definitly wanted here. Thank you.
>>> One more refcounting on each socket creation/deletion was expensive.
>>
>> Really?  Have you actually measured that?  If the overhead is
>> measurable and expensive we may want to look at per cpu counters or
>> something like that.  So far I don't have any numbers that say any
>> of the network namespace work inherently has any overhead.
>
> It seems that on some old opterons (two 246 for example),
> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(

I won't argue that atomic_dec_and_test is costly.  My gut feel is that
socket creation/destruction is sufficiently rare that such a test
would be lost in the noise.  Doing anything more sophisticated is
likely to be less readable, and unless we can measure some overhead
my preference right now is to keep the code stupid and simple.  Which
usually has a good icache footprint.

> I am not sure per cpu counters help : I tried this and got no speedup. (This was
> on net_device refcnt at that time)
>
> (on this machines, the access through fs/gs selector seems expensive too)
>
> Maybe a lazy mode could be done, ie only do a atomic_dec(), as done in dev_put()
> ?
>
> Also, "count" sits in a cache line that contains mostly read and shared fields,
> you might want to put it in a separate cache line in SMP, to avoid cache line
> ping-pongs.

As for cache lines I could reverse the order 'list' and 'work' which
should split the read-only and the writable fields in practice for
that part of the structure.

Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-11-01  0:51       ` Eric W. Biederman
@ 2007-11-01  6:58         ` Eric Dumazet
  2007-11-01  7:02           ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2007-11-01  6:58 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Pavel Emelyanov, David Miller, Linux Netdev List, devel

Eric W. Biederman a écrit :
> Eric Dumazet <dada1@cosmosbay.com> writes:
> 
>> Eric W. Biederman a écrit :
>>> Eric Dumazet <dada1@cosmosbay.com> writes:
>>>
>>>
>>>> Definitly wanted here. Thank you.
>>>> One more refcounting on each socket creation/deletion was expensive.
>>> Really?  Have you actually measured that?  If the overhead is
>>> measurable and expensive we may want to look at per cpu counters or
>>> something like that.  So far I don't have any numbers that say any
>>> of the network namespace work inherently has any overhead.
>> It seems that on some old opterons (two 246 for example),
>> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(
> 
> I won't argue that atomic_dec_and_test is costly.  My gut feel is that
> socket creation/destruction is sufficiently rare that such a test
> would be lost in the noise.  Doing anything more sophisticated is
> likely to be less readable, and unless we can measure some overhead
> my preference right now is to keep the code stupid and simple.  Which
> usually has a good icache footprint.

I agree with you that with current state, this atomic_inc/atomic_dec_and_test 
wont come in profiles unless a trivial bench is writen

for(;;){close(socket(....));}

If David or another dev can eliminate the atomic inc/dec on device refcount 
cost for each packet traveling, the socket creation/destruction would 
certainly raise.

Other contention points is the mnt_count (yet another refcount) in "struct 
vfsmount", a truly useless refcount as I never had (and nobody had) to 
un-mount sock_mnt :)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-11-01  6:58         ` Eric Dumazet
@ 2007-11-01  7:02           ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01  7:02 UTC (permalink / raw)
  To: dada1; +Cc: ebiederm, xemul, netdev, devel

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 01 Nov 2007 07:58:59 +0100

> I agree with you that with current state, this atomic_inc/atomic_dec_and_test 
> wont come in profiles unless a trivial bench is writen
> 
> for(;;){close(socket(....));}

Just add one packet send and one packet receive in there and you have
a transaction workload.  It's really not that unrealistic.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop
  2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
  2007-10-31 18:49 ` Eric Dumazet
@ 2007-10-31 19:23 ` Pavel Emelyanov
  2007-11-01  7:43   ` David Miller
  2007-10-31 19:25 ` [PATCH 2/5] Relax the reference counting of init_net_ns Pavel Emelyanov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:23 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel

When the new pernet something (subsys, device or operations) is
being registered, the init callback is to be called for each
namespace, that currently exitst in the system. During the
unregister, the same is to be done with the exit callback.

However, not every pernet something has both calls, but the
check for the appropriate pointer to be not NULL is performed
inside the for_each_net() loop.

This is (at least) strange, so tune this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 662e6ea..4e52921 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -187,29 +187,28 @@ static int register_pernet_operations(struct list_head *list,
 	struct net *net, *undo_net;
 	int error;
 
-	error = 0;
 	list_add_tail(&ops->list, list);
-	for_each_net(net) {
-		if (ops->init) {
+	if (ops->init) {
+		for_each_net(net) {
 			error = ops->init(net);
 			if (error)
 				goto out_undo;
 		}
 	}
-out:
-	return error;
+	return 0;
 
 out_undo:
 	/* If I have an error cleanup all namespaces I initialized */
 	list_del(&ops->list);
-	for_each_net(undo_net) {
-		if (undo_net == net)
-			goto undone;
-		if (ops->exit)
+	if (ops->exit) {
+		for_each_net(undo_net) {
+			if (undo_net == net)
+				goto undone;
 			ops->exit(undo_net);
+		}
 	}
 undone:
-	goto out;
+	return error;
 }
 
 static void unregister_pernet_operations(struct pernet_operations *ops)
@@ -217,8 +216,8 @@ static void unregister_pernet_operations(struct pernet_operations *ops)
 	struct net *net;
 
 	list_del(&ops->list);
-	for_each_net(net)
-		if (ops->exit)
+	if (ops->exit)
+		for_each_net(net)
 			ops->exit(net);
 }
 
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop
  2007-10-31 19:23 ` [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop Pavel Emelyanov
@ 2007-11-01  7:43   ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01  7:43 UTC (permalink / raw)
  To: xemul; +Cc: netdev, ebiederm, devel

From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:23:35 +0300

> When the new pernet something (subsys, device or operations) is
> being registered, the init callback is to be called for each
> namespace, that currently exitst in the system. During the
> unregister, the same is to be done with the exit callback.
> 
> However, not every pernet something has both calls, but the
> check for the appropriate pointer to be not NULL is performed
> inside the for_each_net() loop.
> 
> This is (at least) strange, so tune this.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/5] Relax the reference counting of init_net_ns
  2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
  2007-10-31 18:49 ` Eric Dumazet
  2007-10-31 19:23 ` [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop Pavel Emelyanov
@ 2007-10-31 19:25 ` Pavel Emelyanov
  2007-11-01  7:43   ` David Miller
  2007-10-31 19:28 ` [PATCH 3/5] Hide the dead code in the net_namespace.c Pavel Emelyanov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:25 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel

When the CONFIG_NET_NS is n there's no need in refcounting
the initial net namespace. So relax this code by making a
stupid stubs for the "n" case.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5279466..1fd449a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -51,13 +51,12 @@ static inline struct net *copy_net_ns(unsigned long flags, struct net *net_ns)
 }
 #endif
 
+#ifdef CONFIG_NET_NS
 extern void __put_net(struct net *net);
 
 static inline struct net *get_net(struct net *net)
 {
-#ifdef CONFIG_NET
 	atomic_inc(&net->count);
-#endif
 	return net;
 }
 
@@ -75,26 +74,44 @@ static inline struct net *maybe_get_net(struct net *net)
 
 static inline void put_net(struct net *net)
 {
-#ifdef CONFIG_NET
 	if (atomic_dec_and_test(&net->count))
 		__put_net(net);
-#endif
 }
 
 static inline struct net *hold_net(struct net *net)
 {
-#ifdef CONFIG_NET
 	atomic_inc(&net->use_count);
-#endif
 	return net;
 }
 
 static inline void release_net(struct net *net)
 {
-#ifdef CONFIG_NET
 	atomic_dec(&net->use_count);
-#endif
 }
+#else
+static inline struct net *get_net(struct net *net)
+{
+	return net;
+}
+
+static inline void put_net(struct net *net)
+{
+}
+
+static inline struct net *hold_net(struct net *net)
+{
+	return net;
+}
+
+static inline void release_net(struct net *net)
+{
+}
+
+static inline struct net *maybe_get_net(struct net *net)
+{
+	return net;
+}
+#endif
 
 #define for_each_net(VAR)				\
 	list_for_each_entry(VAR, &net_namespace_list, list)
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] Relax the reference counting of init_net_ns
  2007-10-31 19:25 ` [PATCH 2/5] Relax the reference counting of init_net_ns Pavel Emelyanov
@ 2007-11-01  7:43   ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01  7:43 UTC (permalink / raw)
  To: xemul; +Cc: netdev, ebiederm, devel

From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:25:18 +0300

> When the CONFIG_NET_NS is n there's no need in refcounting
> the initial net namespace. So relax this code by making a
> stupid stubs for the "n" case.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/5] Hide the dead code in the net_namespace.c
  2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
                   ` (2 preceding siblings ...)
  2007-10-31 19:25 ` [PATCH 2/5] Relax the reference counting of init_net_ns Pavel Emelyanov
@ 2007-10-31 19:28 ` Pavel Emelyanov
  2007-11-01  7:45   ` David Miller
  2007-10-31 19:31 ` [PATCH 4/5] Mark the setup_net as __net_init Pavel Emelyanov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:28 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel

The namespace creation/destruction code is never called
if the CONFIG_NET_NS is n, so it's OK to move it under
appropriate ifdef.

The copy_net_ns() in the "n" case checks for flags and
returns -EINVAL when new net ns is requested. In a perfect
world this stub must be in net_namespace.h, but this
function need to know the CLONE_NEWNET value and thus
requires sched.h. On the other hand this header is to be
injected into almost every .c file in the networking code,
and making all this code depend on the sched.h is a
suicidal attempt.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 4e52921..d5bf8b2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -22,65 +22,6 @@ static struct kmem_cache *net_cachep;
 struct net init_net;
 EXPORT_SYMBOL_GPL(init_net);
 
-static struct net *net_alloc(void)
-{
-	return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
-}
-
-static void net_free(struct net *net)
-{
-	if (!net)
-		return;
-
-	if (unlikely(atomic_read(&net->use_count) != 0)) {
-		printk(KERN_EMERG "network namespace not free! Usage: %d\n",
-			atomic_read(&net->use_count));
-		return;
-	}
-
-	kmem_cache_free(net_cachep, net);
-}
-
-static void cleanup_net(struct work_struct *work)
-{
-	struct pernet_operations *ops;
-	struct net *net;
-
-	net = container_of(work, struct net, work);
-
-	mutex_lock(&net_mutex);
-
-	/* Don't let anyone else find us. */
-	rtnl_lock();
-	list_del(&net->list);
-	rtnl_unlock();
-
-	/* Run all of the network namespace exit methods */
-	list_for_each_entry_reverse(ops, &pernet_list, list) {
-		if (ops->exit)
-			ops->exit(net);
-	}
-
-	mutex_unlock(&net_mutex);
-
-	/* Ensure there are no outstanding rcu callbacks using this
-	 * network namespace.
-	 */
-	rcu_barrier();
-
-	/* Finally it is safe to free my network namespace structure */
-	net_free(net);
-}
-
-
-void __put_net(struct net *net)
-{
-	/* Cleanup the network namespace in process context */
-	INIT_WORK(&net->work, cleanup_net);
-	schedule_work(&net->work);
-}
-EXPORT_SYMBOL_GPL(__put_net);
-
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -117,6 +58,12 @@ out_undo:
 	goto out;
 }
 
+#ifdef CONFIG_NET_NS
+static struct net *net_alloc(void)
+{
+	return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
+}
+
 struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 {
 	struct net *new_net = NULL;
@@ -127,10 +74,6 @@ struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 	if (!(flags & CLONE_NEWNET))
 		return old_net;
 
-#ifndef CONFIG_NET_NS
-	return ERR_PTR(-EINVAL);
-#endif
-
 	err = -ENOMEM;
 	new_net = net_alloc();
 	if (!new_net)
@@ -157,6 +100,68 @@ out:
 	return new_net;
 }
 
+static void net_free(struct net *net)
+{
+	if (!net)
+		return;
+
+	if (unlikely(atomic_read(&net->use_count) != 0)) {
+		printk(KERN_EMERG "network namespace not free! Usage: %d\n",
+			atomic_read(&net->use_count));
+		return;
+	}
+
+	kmem_cache_free(net_cachep, net);
+}
+
+static void cleanup_net(struct work_struct *work)
+{
+	struct pernet_operations *ops;
+	struct net *net;
+
+	net = container_of(work, struct net, work);
+
+	mutex_lock(&net_mutex);
+
+	/* Don't let anyone else find us. */
+	rtnl_lock();
+	list_del(&net->list);
+	rtnl_unlock();
+
+	/* Run all of the network namespace exit methods */
+	list_for_each_entry_reverse(ops, &pernet_list, list) {
+		if (ops->exit)
+			ops->exit(net);
+	}
+
+	mutex_unlock(&net_mutex);
+
+	/* Ensure there are no outstanding rcu callbacks using this
+	 * network namespace.
+	 */
+	rcu_barrier();
+
+	/* Finally it is safe to free my network namespace structure */
+	net_free(net);
+}
+
+void __put_net(struct net *net)
+{
+	/* Cleanup the network namespace in process context */
+	INIT_WORK(&net->work, cleanup_net);
+	schedule_work(&net->work);
+}
+EXPORT_SYMBOL_GPL(__put_net);
+
+#else
+struct net *copy_net_ns(unsigned long flags, struct net *old_net)
+{
+	if (flags & CLONE_NEWNET)
+		return ERR_PTR(-EINVAL);
+	return old_net;
+}
+#endif
+
 static int __init net_ns_init(void)
 {
 	int err;
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] Hide the dead code in the net_namespace.c
  2007-10-31 19:28 ` [PATCH 3/5] Hide the dead code in the net_namespace.c Pavel Emelyanov
@ 2007-11-01  7:45   ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01  7:45 UTC (permalink / raw)
  To: xemul; +Cc: netdev, ebiederm, devel

From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:28:51 +0300

> The namespace creation/destruction code is never called
> if the CONFIG_NET_NS is n, so it's OK to move it under
> appropriate ifdef.
> 
> The copy_net_ns() in the "n" case checks for flags and
> returns -EINVAL when new net ns is requested. In a perfect
> world this stub must be in net_namespace.h, but this
> function need to know the CLONE_NEWNET value and thus
> requires sched.h. On the other hand this header is to be
> injected into almost every .c file in the networking code,
> and making all this code depend on the sched.h is a
> suicidal attempt.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 4/5] Mark the setup_net as __net_init
  2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
                   ` (3 preceding siblings ...)
  2007-10-31 19:28 ` [PATCH 3/5] Hide the dead code in the net_namespace.c Pavel Emelyanov
@ 2007-10-31 19:31 ` Pavel Emelyanov
  2007-11-01  7:46   ` David Miller
  2007-10-31 19:32 ` [PATCH 5/5] Hide the net_ns kmem cache Pavel Emelyanov
  2007-10-31 21:37 ` [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Daniel Lezcano
  6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:31 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel

The setup_net is called for the init net namespace
only (int the CONFIG_NET_NS=n of course) from the __init
function, so mark it as __net_init to disappear with the
caller after the boot.

Yet again, in the perfect world this has to be under
#ifdef CONFIG_NET_NS, but it isn't guaranteed that every
subsystem is registered *after* the init_net_ns is set
up. After we are sure, that we don't start registering
them before the init net setup, we'll be able to move
this code under the ifdef.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d5bf8b2..a044e2d 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -25,7 +25,7 @@ EXPORT_SYMBOL_GPL(init_net);
 /*
  * setup_net runs the initializers for the network namespace object.
  */
-static int setup_net(struct net *net)
+static __net_init int setup_net(struct net *net)
 {
 	/* Must be called with net_mutex held */
 	struct pernet_operations *ops;
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] Mark the setup_net as __net_init
  2007-10-31 19:31 ` [PATCH 4/5] Mark the setup_net as __net_init Pavel Emelyanov
@ 2007-11-01  7:46   ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01  7:46 UTC (permalink / raw)
  To: xemul; +Cc: netdev, ebiederm, devel

From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:31:46 +0300

> The setup_net is called for the init net namespace
> only (int the CONFIG_NET_NS=n of course) from the __init
> function, so mark it as __net_init to disappear with the
> caller after the boot.
> 
> Yet again, in the perfect world this has to be under
> #ifdef CONFIG_NET_NS, but it isn't guaranteed that every
> subsystem is registered *after* the init_net_ns is set
> up. After we are sure, that we don't start registering
> them before the init net setup, we'll be able to move
> this code under the ifdef.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 5/5] Hide the net_ns kmem cache
  2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
                   ` (4 preceding siblings ...)
  2007-10-31 19:31 ` [PATCH 4/5] Mark the setup_net as __net_init Pavel Emelyanov
@ 2007-10-31 19:32 ` Pavel Emelyanov
  2007-11-01  7:47   ` David Miller
  2007-10-31 21:37 ` [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Daniel Lezcano
  6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:32 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel

This cache is only required to create new namespaces,
but we won't have them in CONFIG_NET_NS=n case.

Hide it under the appropriate ifdef.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index a044e2d..e9f0964 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -17,8 +17,6 @@ static DEFINE_MUTEX(net_mutex);
 
 LIST_HEAD(net_namespace_list);
 
-static struct kmem_cache *net_cachep;
-
 struct net init_net;
 EXPORT_SYMBOL_GPL(init_net);
 
@@ -59,6 +57,8 @@ out_undo:
 }
 
 #ifdef CONFIG_NET_NS
+static struct kmem_cache *net_cachep;
+
 static struct net *net_alloc(void)
 {
 	return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
@@ -167,9 +167,11 @@ static int __init net_ns_init(void)
 	int err;
 
 	printk(KERN_INFO "net_namespace: %zd bytes\n", sizeof(struct net));
+#ifdef CONFIG_NET_NS
 	net_cachep = kmem_cache_create("net_namespace", sizeof(struct net),
 					SMP_CACHE_BYTES,
 					SLAB_PANIC, NULL);
+#endif
 	mutex_lock(&net_mutex);
 	err = setup_net(&init_net);
 
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] Hide the net_ns kmem cache
  2007-10-31 19:32 ` [PATCH 5/5] Hide the net_ns kmem cache Pavel Emelyanov
@ 2007-11-01  7:47   ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01  7:47 UTC (permalink / raw)
  To: xemul; +Cc: netdev, ebiederm, devel

From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:32:55 +0300

> This cache is only required to create new namespaces,
> but we won't have them in CONFIG_NET_NS=n case.
> 
> Hide it under the appropriate ifdef.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
  2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
                   ` (5 preceding siblings ...)
  2007-10-31 19:32 ` [PATCH 5/5] Hide the net_ns kmem cache Pavel Emelyanov
@ 2007-10-31 21:37 ` Daniel Lezcano
  6 siblings, 0 replies; 21+ messages in thread
From: Daniel Lezcano @ 2007-10-31 21:37 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, Eric W. Biederman, devel

Pavel Emelyanov wrote:
> Currently we have the NET_NS config option, but the only change it 
> makes is just return ERR_PTR(-EINVAL) inside the cloning call thus
> introducing a bunch of a dead code and making the reference counting
> unneeded. This is not very good.
> 
> So clean the net_namespace.c to fix this.
> 
> I have sent a set of patches to Andrew to make similar thing for
> other namespaces, which introduces the NAMESPACES option to turn
> all the namespaces off at once (to make embedded people suffer
> less). So after that stuff is in, there will be some more patches 
> to tie all this together.
> 
> What is to be done after this set is to make the register_pernet_xxx
> stuff smaller. Currently this code weights approximately 500 bytes, 
> so it worths reducing it, but I haven't found a good solution yet.

Did you had time to check the impact of your patch with the rest of the 
network namespaces not yet included in mainline, belonging to Eric's git 
tree ?

ps: can you cc' emails concerning the network namespace to the 
containers mailing list too ? thx.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2007-11-01  7:47 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
2007-10-31 18:49 ` Eric Dumazet
2007-10-31 21:35   ` Daniel Lezcano
2007-10-31 22:05   ` Eric W. Biederman
2007-10-31 22:40     ` Eric Dumazet
2007-10-31 23:31       ` David Miller
2007-11-01  0:58         ` Eric W. Biederman
2007-11-01  0:51       ` Eric W. Biederman
2007-11-01  6:58         ` Eric Dumazet
2007-11-01  7:02           ` David Miller
2007-10-31 19:23 ` [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop Pavel Emelyanov
2007-11-01  7:43   ` David Miller
2007-10-31 19:25 ` [PATCH 2/5] Relax the reference counting of init_net_ns Pavel Emelyanov
2007-11-01  7:43   ` David Miller
2007-10-31 19:28 ` [PATCH 3/5] Hide the dead code in the net_namespace.c Pavel Emelyanov
2007-11-01  7:45   ` David Miller
2007-10-31 19:31 ` [PATCH 4/5] Mark the setup_net as __net_init Pavel Emelyanov
2007-11-01  7:46   ` David Miller
2007-10-31 19:32 ` [PATCH 5/5] Hide the net_ns kmem cache Pavel Emelyanov
2007-11-01  7:47   ` David Miller
2007-10-31 21:37 ` [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Daniel Lezcano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).