From: David Miller <davem@davemloft.net>
To: dada1@cosmosbay.com
Cc: ebiederm@xmission.com, xemul@openvz.org, netdev@vger.kernel.org,
devel@openvz.org
Subject: Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code
Date: Wed, 31 Oct 2007 16:31:46 -0700 (PDT)
Message-ID: <20071031.163146.92277507.davem@davemloft.net>
In-Reply-To: <4729047B.3080003@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 31 Oct 2007 23:40:59 +0100

> Eric W. Biederman wrote:
> > Eric Dumazet <dada1@cosmosbay.com> writes:
> >
> >
> >> Definitely wanted here. Thank you.
> >> One more refcounting on each socket creation/deletion was expensive.
> >
> > Really? Have you actually measured that? If the overhead is
> > measurable and expensive we may want to look at per cpu counters or
> > something like that. So far I don't have any numbers that say any
> > of the network namespace work inherently has any overhead.
>
> It seems that on some old opterons (two 246 for example),
> "if (atomic_dec_and_test(&net->count))" is rather expensive yes :(

P4 chips are generally very poor at mispredicted branches and
atomics. So every atomic you remove from the socket paths
gives a noticeable improvement on them.

Network device reference counting is such a stupid problem. There has
to be a way to get rid of it on the packet side.

I think we could get rid of all of the device refcounting from packets
if we:

1) Formalize "SKB roots". This is every place a packet
   could sit in the transmit path.

2) On device unregister:

   a) wait for RCU quiesce period
   b) stop_machine_run(skb_walk_roots, netdev, NR_CPUS);

   skb_walk_roots is a function that walks all the places in
   #1, rewriting the packet to point to loopback or whatever
   instead of 'netdev' which we are trying to unregister.

This gives us two things.

First, we would no longer need to refcount net devices
for packet references.

Second, we have a debugging framework for all those dreaded SKB leaks
that keep devices from being unloadable. As we walk the roots
we'll see where all packets referencing a device actually are.
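
To make the root-walking idea concrete, here is a minimal userspace sketch, not kernel code. Only the name skb_walk_roots comes from the proposal above; its signature, the types struct netdev, struct skb and struct skb_root, and the helpers skb_root_register() and skb_root_enqueue() are hypothetical stand-ins. It does not model the RCU wait or stop_machine_run(); it assumes nothing else touches the roots while the walk runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct net_device. */
struct netdev {
	const char *name;
};

/* Hypothetical stand-in for struct sk_buff: a packet pointing at a device. */
struct skb {
	struct netdev *dev;
	struct skb *next;
};

/* An "SKB root": one registered place where packets may sit. */
struct skb_root {
	const char *name;
	struct skb *head;
	struct skb_root *next;
};

/* Global list of all registered roots (assumption: a simple linked list). */
static struct skb_root *skb_roots;

static void skb_root_register(struct skb_root *root)
{
	assert(root != NULL);
	root->next = skb_roots;
	skb_roots = root;
}

static void skb_root_enqueue(struct skb_root *root, struct skb *skb)
{
	assert(root != NULL && skb != NULL);
	skb->next = root->head;
	root->head = skb;
}

/*
 * Walk every root and retarget packets that point at 'victim' so they
 * point at 'fallback' (e.g. loopback) instead.  Prints, per root, how
 * many packets referenced the victim, and returns the total.
 */
static int skb_walk_roots(struct netdev *victim, struct netdev *fallback)
{
	int total = 0;

	for (struct skb_root *r = skb_roots; r != NULL; r = r->next) {
		int found = 0;

		for (struct skb *s = r->head; s != NULL; s = s->next) {
			if (s->dev == victim) {
				s->dev = fallback;
				found++;
			}
		}
		if (found)
			printf("root %s: %d packet(s) referenced %s\n",
			       r->name, found, victim->name);
		total += found;
	}
	return total;
}
```

The per-root report is the debugging half of the idea: a nonzero count names the place where packets were still holding the device. A second walk for the same device finds nothing, since every packet has already been retargeted.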