* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
@ 2007-10-31 18:49 ` Eric Dumazet
2007-10-31 21:35 ` Daniel Lezcano
2007-10-31 22:05 ` Eric W. Biederman
2007-10-31 19:23 ` [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop Pavel Emelyanov
` (5 subsequent siblings)
6 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2007-10-31 18:49 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, Eric W. Biederman, devel
On Wed, 31 Oct 2007 22:19:43 +0300
Pavel Emelyanov <xemul@openvz.org> wrote:
> Currently we have the NET_NS config option, but the only change it
> makes is just return ERR_PTR(-EINVAL) inside the cloning call thus
> introducing a bunch of a dead code and making the reference counting
> unneeded. This is not very good.
>
> So clean the net_namespace.c to fix this.
>
> I have sent a set of patches to Andrew to make similar thing for
> other namespaces, which introduces the NAMESPACES option to turn
> all the namespaces off at once (to make embedded people suffer
> less). So after that stuff is in, there will be some more patches
> to tie all this together.
>
> What is to be done after this set is to make the register_pernet_xxx
> stuff smaller. Currently this code weights approximately 500 bytes,
> so it worths reducing it, but I haven't found a good solution yet.
Definitly wanted here. Thank you.
One more refcounting on each socket creation/deletion was expensive.
Maybe we can add a macro to get nd_net from a "struct net_device"
so that every instance of
if (dev->nd_net != &init_net)
goto drop;
can also be optimized away if !CONFIG_NET_NS
extern inline netdev_get_ns(struct netdevice *dev)
{
#ifdef CONFIG_NET_NS
return dev->nd_net;
#else
return &init_net;
#endif
}
...
if (netdev_get_ns(dev) != &init_net)
goto drop;
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 18:49 ` Eric Dumazet
@ 2007-10-31 21:35 ` Daniel Lezcano
2007-10-31 22:05 ` Eric W. Biederman
1 sibling, 0 replies; 21+ messages in thread
From: Daniel Lezcano @ 2007-10-31 21:35 UTC (permalink / raw)
To: Eric Dumazet
Cc: Pavel Emelyanov, David Miller, Linux Netdev List,
Eric W. Biederman, devel
Eric Dumazet wrote:
> On Wed, 31 Oct 2007 22:19:43 +0300
> Pavel Emelyanov <xemul@openvz.org> wrote:
>
>> Currently we have the NET_NS config option, but the only change it
>> makes is just return ERR_PTR(-EINVAL) inside the cloning call thus
>> introducing a bunch of a dead code and making the reference counting
>> unneeded. This is not very good.
>>
>> So clean the net_namespace.c to fix this.
>>
>> I have sent a set of patches to Andrew to make similar thing for
>> other namespaces, which introduces the NAMESPACES option to turn
>> all the namespaces off at once (to make embedded people suffer
>> less). So after that stuff is in, there will be some more patches
>> to tie all this together.
>>
>> What is to be done after this set is to make the register_pernet_xxx
>> stuff smaller. Currently this code weights approximately 500 bytes,
>> so it worths reducing it, but I haven't found a good solution yet.
>
> Definitly wanted here. Thank you.
> One more refcounting on each socket creation/deletion was expensive.
>
> Maybe we can add a macro to get nd_net from a "struct net_device"
> so that every instance of
>
> if (dev->nd_net != &init_net)
> goto drop;
>
> can also be optimized away if !CONFIG_NET_NS
>
> extern inline netdev_get_ns(struct netdevice *dev)
> {
> #ifdef CONFIG_NET_NS
> return dev->nd_net;
> #else
> return &init_net;
> #endif
> }
Or something like:
#ifdef CONFIG_NET_NS
static inline int init_net_dev(struct net_device *dev)
{
return dev->nd_net == &init_net;
}
#else
static inline int init_net_dev(struct net_device *dev)
{
return 1;
}
#endif
By the way, this kind of test will disappear when the network namespace
will be complete and take into account the differents protocols.
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 18:49 ` Eric Dumazet
2007-10-31 21:35 ` Daniel Lezcano
@ 2007-10-31 22:05 ` Eric W. Biederman
2007-10-31 22:40 ` Eric Dumazet
1 sibling, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2007-10-31 22:05 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, devel
Eric Dumazet <dada1@cosmosbay.com> writes:
> Definitly wanted here. Thank you.
> One more refcounting on each socket creation/deletion was expensive.
Really? Have you actually measured that? If the overhead is
measurable and expensive we may want to look at per cpu counters or
something like that. So far I don't have any numbers that say any
of the network namespace work inherently has any overhead.
> Maybe we can add a macro to get nd_net from a "struct net_device"
> so that every instance of
>
> if (dev->nd_net != &init_net)
> goto drop;
>
> can also be optimized away if !CONFIG_NET_NS
Well that extra check should be removed once we finish converting
those code paths. So I'm not too worried.
If this becomes a big issue I can dig up my old code that
replaced struct net * with a net_t typedef and used functions
for all of the comparisons and allowed everything to be compiled
away.
Trouble was it was sufficiently different that it was just enough
different that people could not immediately understand the code.
Eric
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 22:05 ` Eric W. Biederman
@ 2007-10-31 22:40 ` Eric Dumazet
2007-10-31 23:31 ` David Miller
2007-11-01 0:51 ` Eric W. Biederman
0 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2007-10-31 22:40 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Pavel Emelyanov, David Miller, Linux Netdev List, devel
Eric W. Biederman a écrit :
> Eric Dumazet <dada1@cosmosbay.com> writes:
>
>
>> Definitly wanted here. Thank you.
>> One more refcounting on each socket creation/deletion was expensive.
>
> Really? Have you actually measured that? If the overhead is
> measurable and expensive we may want to look at per cpu counters or
> something like that. So far I don't have any numbers that say any
> of the network namespace work inherently has any overhead.
It seems that on some old opterons (two 246 for example),
"if (atomic_dec_and_test(&net->count))" is rather expensive yes :(
I am not sure per cpu counters help : I tried this and got no speedup. (This
was on net_device refcnt at that time)
(on this machines, the access through fs/gs selector seems expensive too)
Maybe a lazy mode could be done, ie only do a atomic_dec(), as done in dev_put() ?
Also, "count" sits in a cache line that contains mostly read and shared
fields, you might want to put it in a separate cache line in SMP, to avoid
cache line ping-pongs.
>
>> Maybe we can add a macro to get nd_net from a "struct net_device"
>> so that every instance of
>>
>> if (dev->nd_net != &init_net)
>> goto drop;
>>
>> can also be optimized away if !CONFIG_NET_NS
>
> Well that extra check should be removed once we finish converting
> those code paths. So I'm not too worried.
OK. Since the conditional test can be predicted by cpu, it certainly doesnt
matter.
>
> If this becomes a big issue I can dig up my old code that
> replaced struct net * with a net_t typedef and used functions
> for all of the comparisons and allowed everything to be compiled
> away.
>
> Trouble was it was sufficiently different that it was just enough
> different that people could not immediately understand the code.
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 22:40 ` Eric Dumazet
@ 2007-10-31 23:31 ` David Miller
2007-11-01 0:58 ` Eric W. Biederman
2007-11-01 0:51 ` Eric W. Biederman
1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2007-10-31 23:31 UTC (permalink / raw)
To: dada1; +Cc: ebiederm, xemul, netdev, devel
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 31 Oct 2007 23:40:59 +0100
> Eric W. Biederman a écrit :
> > Eric Dumazet <dada1@cosmosbay.com> writes:
> >
> >
> >> Definitly wanted here. Thank you.
> >> One more refcounting on each socket creation/deletion was expensive.
> >
> > Really? Have you actually measured that? If the overhead is
> > measurable and expensive we may want to look at per cpu counters or
> > something like that. So far I don't have any numbers that say any
> > of the network namespace work inherently has any overhead.
>
> It seems that on some old opterons (two 246 for example),
> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(
P4 chips are generally very poor at mispredicted branches and
atomics. So every atomic you remove from the socket paths
gives a noticable improvement on them.
Network device reference counting is such a stupid problem. There has
to be a way to get rid of it on the packet side.
I think we could get rid of all of the device refcounting from packets
if we:
1) Formalize "SKB roots". This is every place a packet
could sit in the transmit path.
2) On device unregister:
a) wait for RCU quiesce period
b) stop_machine_run(skb_walk_roots, netdev, NR_CPUS);
skb_walk_roots is a function that walks all the places in
#1, rewriting the packet to point to loopback or whatever
instead of 'netdev' which we are trying to unregister.
This gives us two things.
First, we no longer would need to rectount net devices
for packet references.
Second, we have a debugging framework for all those dreaded SKB leaks
that keep devices from being unloadable. As we walk the roots
we'll see where all packets referencing a device actually are.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 23:31 ` David Miller
@ 2007-11-01 0:58 ` Eric W. Biederman
0 siblings, 0 replies; 21+ messages in thread
From: Eric W. Biederman @ 2007-11-01 0:58 UTC (permalink / raw)
To: David Miller; +Cc: dada1, xemul, netdev, devel
David Miller <davem@davemloft.net> writes:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Wed, 31 Oct 2007 23:40:59 +0100
>
>> Eric W. Biederman a écrit :
>> > Eric Dumazet <dada1@cosmosbay.com> writes:
>> >
>> >
>> >> Definitly wanted here. Thank you.
>> >> One more refcounting on each socket creation/deletion was expensive.
>> >
>> > Really? Have you actually measured that? If the overhead is
>> > measurable and expensive we may want to look at per cpu counters or
>> > something like that. So far I don't have any numbers that say any
>> > of the network namespace work inherently has any overhead.
>>
>> It seems that on some old opterons (two 246 for example),
>> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(
>
> P4 chips are generally very poor at mispredicted branches and
> atomics. So every atomic you remove from the socket paths
> gives a noticable improvement on them.
Interesting.
> Network device reference counting is such a stupid problem. There has
> to be a way to get rid of it on the packet side.
>
> I think we could get rid of all of the device refcounting from packets
> if we:
>
> 1) Formalize "SKB roots". This is every place a packet
> could sit in the transmit path.
Yes there are very few of these, and I think they are generally
in interrupt or at least bottom half context aren't they?
I think the OpenVz version of network namespaces may have
already identified all of these.
> 2) On device unregister:
>
> a) wait for RCU quiesce period
> b) stop_machine_run(skb_walk_roots, netdev, NR_CPUS);
RCU sounds sufficient but possibly overkill to achieve what
we need to do here.
> skb_walk_roots is a function that walks all the places in
> #1, rewriting the packet to point to loopback or whatever
> instead of 'netdev' which we are trying to unregister.
>
> This gives us two things.
>
> First, we no longer would need to rectount net devices
> for packet references.
>
> Second, we have a debugging framework for all those dreaded SKB leaks
> that keep devices from being unloadable. As we walk the roots
> we'll see where all packets referencing a device actually are.
Sounds quite useful. Grrr. The brain cache locality that gets us
to rewrite things while we are refactoring them to have more
functionality.... It just keeps the problem from being straigh
forward ;)
Eric
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 22:40 ` Eric Dumazet
2007-10-31 23:31 ` David Miller
@ 2007-11-01 0:51 ` Eric W. Biederman
2007-11-01 6:58 ` Eric Dumazet
1 sibling, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2007-11-01 0:51 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Pavel Emelyanov, David Miller, Linux Netdev List, devel
Eric Dumazet <dada1@cosmosbay.com> writes:
> Eric W. Biederman a écrit :
>> Eric Dumazet <dada1@cosmosbay.com> writes:
>>
>>
>>> Definitly wanted here. Thank you.
>>> One more refcounting on each socket creation/deletion was expensive.
>>
>> Really? Have you actually measured that? If the overhead is
>> measurable and expensive we may want to look at per cpu counters or
>> something like that. So far I don't have any numbers that say any
>> of the network namespace work inherently has any overhead.
>
> It seems that on some old opterons (two 246 for example),
> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(
I won't argue that atomic_dec_and_test is costly. My gut feel is that
socket creation/destruction is sufficiently rare that such a test
would be lost in the noise. Doing anything more sophisticated is
likely to be less readable, and unless we can measure some overhead
my preference right now is to keep the code stupid and simple. Which
usually has a good icache footprint.
> I am not sure per cpu counters help : I tried this and got no speedup. (This was
> on net_device refcnt at that time)
>
> (on this machines, the access through fs/gs selector seems expensive too)
>
> Maybe a lazy mode could be done, ie only do a atomic_dec(), as done in dev_put()
> ?
>
> Also, "count" sits in a cache line that contains mostly read and shared fields,
> you might want to put it in a separate cache line in SMP, to avoid cache line
> ping-pongs.
As for cache lines I could reverse the order 'list' and 'work' which
should split the read-only and the writable fields in practice for
that part of the structure.
Eric
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-11-01 0:51 ` Eric W. Biederman
@ 2007-11-01 6:58 ` Eric Dumazet
2007-11-01 7:02 ` David Miller
0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2007-11-01 6:58 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Pavel Emelyanov, David Miller, Linux Netdev List, devel
Eric W. Biederman a écrit :
> Eric Dumazet <dada1@cosmosbay.com> writes:
>
>> Eric W. Biederman a écrit :
>>> Eric Dumazet <dada1@cosmosbay.com> writes:
>>>
>>>
>>>> Definitly wanted here. Thank you.
>>>> One more refcounting on each socket creation/deletion was expensive.
>>> Really? Have you actually measured that? If the overhead is
>>> measurable and expensive we may want to look at per cpu counters or
>>> something like that. So far I don't have any numbers that say any
>>> of the network namespace work inherently has any overhead.
>> It seems that on some old opterons (two 246 for example),
>> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(
>
> I won't argue that atomic_dec_and_test is costly. My gut feel is that
> socket creation/destruction is sufficiently rare that such a test
> would be lost in the noise. Doing anything more sophisticated is
> likely to be less readable, and unless we can measure some overhead
> my preference right now is to keep the code stupid and simple. Which
> usually has a good icache footprint.
I agree with you that with current state, this atomic_inc/atomic_dec_and_test
wont come in profiles unless a trivial bench is writen
for(;;){close(socket(....));}
If David or another dev can eliminate the atomic inc/dec on device refcount
cost for each packet traveling, the socket creation/destruction would
certainly raise.
Other contention points is the mnt_count (yet another refcount) in "struct
vfsmount", a truly useless refcount as I never had (and nobody had) to
un-mount sock_mnt :)
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-11-01 6:58 ` Eric Dumazet
@ 2007-11-01 7:02 ` David Miller
0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01 7:02 UTC (permalink / raw)
To: dada1; +Cc: ebiederm, xemul, netdev, devel
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 01 Nov 2007 07:58:59 +0100
> I agree with you that with current state, this atomic_inc/atomic_dec_and_test
> wont come in profiles unless a trivial bench is writen
>
> for(;;){close(socket(....));}
Just add one packet send and one packet receive in there and you have
a transaction workload. It's really not that unrealistic.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
2007-10-31 18:49 ` Eric Dumazet
@ 2007-10-31 19:23 ` Pavel Emelyanov
2007-11-01 7:43 ` David Miller
2007-10-31 19:25 ` [PATCH 2/5] Relax the reference counting of init_net_ns Pavel Emelyanov
` (4 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:23 UTC (permalink / raw)
To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel
When the new pernet something (subsys, device or operations) is
being registered, the init callback is to be called for each
namespace, that currently exitst in the system. During the
unregister, the same is to be done with the exit callback.
However, not every pernet something has both calls, but the
check for the appropriate pointer to be not NULL is performed
inside the for_each_net() loop.
This is (at least) strange, so tune this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 662e6ea..4e52921 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -187,29 +187,28 @@ static int register_pernet_operations(struct list_head *list,
struct net *net, *undo_net;
int error;
- error = 0;
list_add_tail(&ops->list, list);
- for_each_net(net) {
- if (ops->init) {
+ if (ops->init) {
+ for_each_net(net) {
error = ops->init(net);
if (error)
goto out_undo;
}
}
-out:
- return error;
+ return 0;
out_undo:
/* If I have an error cleanup all namespaces I initialized */
list_del(&ops->list);
- for_each_net(undo_net) {
- if (undo_net == net)
- goto undone;
- if (ops->exit)
+ if (ops->exit) {
+ for_each_net(undo_net) {
+ if (undo_net == net)
+ goto undone;
ops->exit(undo_net);
+ }
}
undone:
- goto out;
+ return error;
}
static void unregister_pernet_operations(struct pernet_operations *ops)
@@ -217,8 +216,8 @@ static void unregister_pernet_operations(struct pernet_operations *ops)
struct net *net;
list_del(&ops->list);
- for_each_net(net)
- if (ops->exit)
+ if (ops->exit)
+ for_each_net(net)
ops->exit(net);
}
--
1.5.3.4
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop
2007-10-31 19:23 ` [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop Pavel Emelyanov
@ 2007-11-01 7:43 ` David Miller
0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01 7:43 UTC (permalink / raw)
To: xemul; +Cc: netdev, ebiederm, devel
From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:23:35 +0300
> When the new pernet something (subsys, device or operations) is
> being registered, the init callback is to be called for each
> namespace, that currently exitst in the system. During the
> unregister, the same is to be done with the exit callback.
>
> However, not every pernet something has both calls, but the
> check for the appropriate pointer to be not NULL is performed
> inside the for_each_net() loop.
>
> This is (at least) strange, so tune this.
>
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Applied.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/5] Relax the reference counting of init_net_ns
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
2007-10-31 18:49 ` Eric Dumazet
2007-10-31 19:23 ` [PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop Pavel Emelyanov
@ 2007-10-31 19:25 ` Pavel Emelyanov
2007-11-01 7:43 ` David Miller
2007-10-31 19:28 ` [PATCH 3/5] Hide the dead code in the net_namespace.c Pavel Emelyanov
` (3 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:25 UTC (permalink / raw)
To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel
When the CONFIG_NET_NS is n there's no need in refcounting
the initial net namespace. So relax this code by making a
stupid stubs for the "n" case.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5279466..1fd449a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -51,13 +51,12 @@ static inline struct net *copy_net_ns(unsigned long flags, struct net *net_ns)
}
#endif
+#ifdef CONFIG_NET_NS
extern void __put_net(struct net *net);
static inline struct net *get_net(struct net *net)
{
-#ifdef CONFIG_NET
atomic_inc(&net->count);
-#endif
return net;
}
@@ -75,26 +74,44 @@ static inline struct net *maybe_get_net(struct net *net)
static inline void put_net(struct net *net)
{
-#ifdef CONFIG_NET
if (atomic_dec_and_test(&net->count))
__put_net(net);
-#endif
}
static inline struct net *hold_net(struct net *net)
{
-#ifdef CONFIG_NET
atomic_inc(&net->use_count);
-#endif
return net;
}
static inline void release_net(struct net *net)
{
-#ifdef CONFIG_NET
atomic_dec(&net->use_count);
-#endif
}
+#else
+static inline struct net *get_net(struct net *net)
+{
+ return net;
+}
+
+static inline void put_net(struct net *net)
+{
+}
+
+static inline struct net *hold_net(struct net *net)
+{
+ return net;
+}
+
+static inline void release_net(struct net *net)
+{
+}
+
+static inline struct net *maybe_get_net(struct net *net)
+{
+ return net;
+}
+#endif
#define for_each_net(VAR) \
list_for_each_entry(VAR, &net_namespace_list, list)
--
1.5.3.4
^ permalink raw reply related [flat|nested] 21+ messages in thread* [PATCH 3/5] Hide the dead code in the net_namespace.c
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
` (2 preceding siblings ...)
2007-10-31 19:25 ` [PATCH 2/5] Relax the reference counting of init_net_ns Pavel Emelyanov
@ 2007-10-31 19:28 ` Pavel Emelyanov
2007-11-01 7:45 ` David Miller
2007-10-31 19:31 ` [PATCH 4/5] Mark the setup_net as __net_init Pavel Emelyanov
` (2 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:28 UTC (permalink / raw)
To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel
The namespace creation/destruction code is never called
if the CONFIG_NET_NS is n, so it's OK to move it under
appropriate ifdef.
The copy_net_ns() in the "n" case checks for flags and
returns -EINVAL when new net ns is requested. In a perfect
world this stub must be in net_namespace.h, but this
function need to know the CLONE_NEWNET value and thus
requires sched.h. On the other hand this header is to be
injected into almost every .c file in the networking code,
and making all this code depend on the sched.h is a
suicidal attempt.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 4e52921..d5bf8b2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -22,65 +22,6 @@ static struct kmem_cache *net_cachep;
struct net init_net;
EXPORT_SYMBOL_GPL(init_net);
-static struct net *net_alloc(void)
-{
- return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
-}
-
-static void net_free(struct net *net)
-{
- if (!net)
- return;
-
- if (unlikely(atomic_read(&net->use_count) != 0)) {
- printk(KERN_EMERG "network namespace not free! Usage: %d\n",
- atomic_read(&net->use_count));
- return;
- }
-
- kmem_cache_free(net_cachep, net);
-}
-
-static void cleanup_net(struct work_struct *work)
-{
- struct pernet_operations *ops;
- struct net *net;
-
- net = container_of(work, struct net, work);
-
- mutex_lock(&net_mutex);
-
- /* Don't let anyone else find us. */
- rtnl_lock();
- list_del(&net->list);
- rtnl_unlock();
-
- /* Run all of the network namespace exit methods */
- list_for_each_entry_reverse(ops, &pernet_list, list) {
- if (ops->exit)
- ops->exit(net);
- }
-
- mutex_unlock(&net_mutex);
-
- /* Ensure there are no outstanding rcu callbacks using this
- * network namespace.
- */
- rcu_barrier();
-
- /* Finally it is safe to free my network namespace structure */
- net_free(net);
-}
-
-
-void __put_net(struct net *net)
-{
- /* Cleanup the network namespace in process context */
- INIT_WORK(&net->work, cleanup_net);
- schedule_work(&net->work);
-}
-EXPORT_SYMBOL_GPL(__put_net);
-
/*
* setup_net runs the initializers for the network namespace object.
*/
@@ -117,6 +58,12 @@ out_undo:
goto out;
}
+#ifdef CONFIG_NET_NS
+static struct net *net_alloc(void)
+{
+ return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
+}
+
struct net *copy_net_ns(unsigned long flags, struct net *old_net)
{
struct net *new_net = NULL;
@@ -127,10 +74,6 @@ struct net *copy_net_ns(unsigned long flags, struct net *old_net)
if (!(flags & CLONE_NEWNET))
return old_net;
-#ifndef CONFIG_NET_NS
- return ERR_PTR(-EINVAL);
-#endif
-
err = -ENOMEM;
new_net = net_alloc();
if (!new_net)
@@ -157,6 +100,68 @@ out:
return new_net;
}
+static void net_free(struct net *net)
+{
+ if (!net)
+ return;
+
+ if (unlikely(atomic_read(&net->use_count) != 0)) {
+ printk(KERN_EMERG "network namespace not free! Usage: %d\n",
+ atomic_read(&net->use_count));
+ return;
+ }
+
+ kmem_cache_free(net_cachep, net);
+}
+
+static void cleanup_net(struct work_struct *work)
+{
+ struct pernet_operations *ops;
+ struct net *net;
+
+ net = container_of(work, struct net, work);
+
+ mutex_lock(&net_mutex);
+
+ /* Don't let anyone else find us. */
+ rtnl_lock();
+ list_del(&net->list);
+ rtnl_unlock();
+
+ /* Run all of the network namespace exit methods */
+ list_for_each_entry_reverse(ops, &pernet_list, list) {
+ if (ops->exit)
+ ops->exit(net);
+ }
+
+ mutex_unlock(&net_mutex);
+
+ /* Ensure there are no outstanding rcu callbacks using this
+ * network namespace.
+ */
+ rcu_barrier();
+
+ /* Finally it is safe to free my network namespace structure */
+ net_free(net);
+}
+
+void __put_net(struct net *net)
+{
+ /* Cleanup the network namespace in process context */
+ INIT_WORK(&net->work, cleanup_net);
+ schedule_work(&net->work);
+}
+EXPORT_SYMBOL_GPL(__put_net);
+
+#else
+struct net *copy_net_ns(unsigned long flags, struct net *old_net)
+{
+ if (flags & CLONE_NEWNET)
+ return ERR_PTR(-EINVAL);
+ return old_net;
+}
+#endif
+
static int __init net_ns_init(void)
{
int err;
--
1.5.3.4
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 3/5] Hide the dead code in the net_namespace.c
2007-10-31 19:28 ` [PATCH 3/5] Hide the dead code in the net_namespace.c Pavel Emelyanov
@ 2007-11-01 7:45 ` David Miller
0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01 7:45 UTC (permalink / raw)
To: xemul; +Cc: netdev, ebiederm, devel
From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:28:51 +0300
> The namespace creation/destruction code is never called
> if the CONFIG_NET_NS is n, so it's OK to move it under
> appropriate ifdef.
>
> The copy_net_ns() in the "n" case checks for flags and
> returns -EINVAL when new net ns is requested. In a perfect
> world this stub must be in net_namespace.h, but this
> function need to know the CLONE_NEWNET value and thus
> requires sched.h. On the other hand this header is to be
> injected into almost every .c file in the networking code,
> and making all this code depend on the sched.h is a
> suicidal attempt.
>
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Applied.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 4/5] Mark the setup_net as __net_init
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
` (3 preceding siblings ...)
2007-10-31 19:28 ` [PATCH 3/5] Hide the dead code in the net_namespace.c Pavel Emelyanov
@ 2007-10-31 19:31 ` Pavel Emelyanov
2007-11-01 7:46 ` David Miller
2007-10-31 19:32 ` [PATCH 5/5] Hide the net_ns kmem cache Pavel Emelyanov
2007-10-31 21:37 ` [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Daniel Lezcano
6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:31 UTC (permalink / raw)
To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel
The setup_net is called for the init net namespace
only (int the CONFIG_NET_NS=n of course) from the __init
function, so mark it as __net_init to disappear with the
caller after the boot.
Yet again, in the perfect world this has to be under
#ifdef CONFIG_NET_NS, but it isn't guaranteed that every
subsystem is registered *after* the init_net_ns is set
up. After we are sure, that we don't start registering
them before the init net setup, we'll be able to move
this code under the ifdef.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d5bf8b2..a044e2d 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -25,7 +25,7 @@ EXPORT_SYMBOL_GPL(init_net);
/*
* setup_net runs the initializers for the network namespace object.
*/
-static int setup_net(struct net *net)
+static __net_init int setup_net(struct net *net)
{
/* Must be called with net_mutex held */
struct pernet_operations *ops;
--
1.5.3.4
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 4/5] Mark the setup_net as __net_init
2007-10-31 19:31 ` [PATCH 4/5] Mark the setup_net as __net_init Pavel Emelyanov
@ 2007-11-01 7:46 ` David Miller
0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2007-11-01 7:46 UTC (permalink / raw)
To: xemul; +Cc: netdev, ebiederm, devel
From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 31 Oct 2007 22:31:46 +0300
> The setup_net is called for the init net namespace
> only (int the CONFIG_NET_NS=n of course) from the __init
> function, so mark it as __net_init to disappear with the
> caller after the boot.
>
> Yet again, in the perfect world this has to be under
> #ifdef CONFIG_NET_NS, but it isn't guaranteed that every
> subsystem is registered *after* the init_net_ns is set
> up. After we are sure, that we don't start registering
> them before the init net setup, we'll be able to move
> this code under the ifdef.
>
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Applied.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 5/5] Hide the net_ns kmem cache
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
` (4 preceding siblings ...)
2007-10-31 19:31 ` [PATCH 4/5] Mark the setup_net as __net_init Pavel Emelyanov
@ 2007-10-31 19:32 ` Pavel Emelyanov
2007-11-01 7:47 ` David Miller
2007-10-31 21:37 ` [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Daniel Lezcano
6 siblings, 1 reply; 21+ messages in thread
From: Pavel Emelyanov @ 2007-10-31 19:32 UTC (permalink / raw)
To: David Miller; +Cc: Linux Netdev List, Eric W. Biederman, devel
This cache is only required to create new namespaces,
but we won't have them in CONFIG_NET_NS=n case.
Hide it under the appropriate ifdef.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index a044e2d..e9f0964 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -17,8 +17,6 @@ static DEFINE_MUTEX(net_mutex);
LIST_HEAD(net_namespace_list);
-static struct kmem_cache *net_cachep;
-
struct net init_net;
EXPORT_SYMBOL_GPL(init_net);
@@ -59,6 +57,8 @@ out_undo:
}
#ifdef CONFIG_NET_NS
+static struct kmem_cache *net_cachep;
+
static struct net *net_alloc(void)
{
return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
@@ -167,9 +167,11 @@ static int __init net_ns_init(void)
int err;
printk(KERN_INFO "net_namespace: %zd bytes\n", sizeof(struct net));
+#ifdef CONFIG_NET_NS
net_cachep = kmem_cache_create("net_namespace", sizeof(struct net),
SMP_CACHE_BYTES,
SLAB_PANIC, NULL);
+#endif
mutex_lock(&net_mutex);
err = setup_net(&init_net);
--
1.5.3.4
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
2007-10-31 19:19 [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code Pavel Emelyanov
` (5 preceding siblings ...)
2007-10-31 19:32 ` [PATCH 5/5] Hide the net_ns kmem cache Pavel Emelyanov
@ 2007-10-31 21:37 ` Daniel Lezcano
6 siblings, 0 replies; 21+ messages in thread
From: Daniel Lezcano @ 2007-10-31 21:37 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, Eric W. Biederman, devel
Pavel Emelyanov wrote:
> Currently we have the NET_NS config option, but the only change it
> makes is just return ERR_PTR(-EINVAL) inside the cloning call thus
> introducing a bunch of a dead code and making the reference counting
> unneeded. This is not very good.
>
> So clean the net_namespace.c to fix this.
>
> I have sent a set of patches to Andrew to make similar thing for
> other namespaces, which introduces the NAMESPACES option to turn
> all the namespaces off at once (to make embedded people suffer
> less). So after that stuff is in, there will be some more patches
> to tie all this together.
>
> What is to be done after this set is to make the register_pernet_xxx
> stuff smaller. Currently this code weights approximately 500 bytes,
> so it worths reducing it, but I haven't found a good solution yet.
Did you had time to check the impact of your patch with the rest of the
network namespaces not yet included in mainline, belonging to Eric's git
tree ?
ps: can you cc' emails concerning the network namespace to the
containers mailing list too ? thx.
^ permalink raw reply [flat|nested] 21+ messages in thread