From: ebiederm@xmission.com (Eric W. Biederman)
To: Herbert Poetzl <herbert@13thfloor.at>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>,
Andrey Savochkin <saw@swsoft.com>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
serue@us.ibm.com, haveblue@us.ibm.com, clg@fr.ibm.com,
Andrew Morton <akpm@osdl.org>,
dev@sw.ru, devel@openvz.org, sam@vilain.net,
viro@ftp.linux.org.uk, Alexey Kuznetsov <alexey@sw.ru>
Subject: Re: [patch 2/6] [Network namespace] Network device sharing by view
Date: Mon, 26 Jun 2006 14:37:15 -0600 [thread overview]
Message-ID: <m1u067psas.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <20060626200225.GA5330@MAIL.13thfloor.at> (Herbert Poetzl's message of "Mon, 26 Jun 2006 22:02:25 +0200")
Herbert Poetzl <herbert@13thfloor.at> writes:
> On Mon, Jun 26, 2006 at 01:35:15PM -0600, Eric W. Biederman wrote:
>> Herbert Poetzl <herbert@13thfloor.at> writes:
>>
>
> yes, but you will not be able to apply policy on
> the parent, restricting the child networking in a
> proper way without jumping through hoops ...
? I don't understand where you are coming from.
There is no restriction on where you can apply policy.
>> I really do not believe we have a hotpath issue, if this
>> is implemented properly. Benchmarks of course need to be taken,
>> to prove this.
>
> I'm fine with proper testing and good numbers here
> but until then, I can only consider it a prototype
We are taking the first steps to get this all sorted out.
I think what we have is more than a prototype but less then
the final implementation. Call it the very first draft version.
>> There are only two places a sane implementation should show issues.
>> - When the access to a pointer goes through a pointer to find
>> that global variable.
>> - When doing a lookup in a hash table we need to look at an additional
>> field to verify a hash match. Because having a completely separate
>> hash table is likely too expensive.
>>
>> If that can be shown to really slow down packets on the hot path I am
>> willing to consider other possibilities. Until then I think we are on
>> path to the simplest and most powerful version of building a network
>> namespace usable by containers.
>
> keep in mind that you actually have three kinds
> of network traffic on a typical host/guest system:
>
> - traffic between unit and outside
> - host traffic should be quite minimal
> - guest traffic will be quite high
>
> - traffic between host and guest
> probably minimal too (only for shared services)
>
> - traffic between guests
> can be as high (or even higher) than the
> outbound traffic, just think web guest and
> database guest
Interesting.
>> The routing between network namespaces does have the potential to be
>> more expensive than just a packet trivially coming off the wire into a
>> socket.
>
> IMHO the routing between network namespaces should
> not require more than the current local traffic
> does (i.e. you should be able to achieve loopback
> speed within an insignificant tolerance) and not
> nearly the time required for on-wire stuff ...
That assumes on the wire stuff is noticeably slower.
You can achieve over 1GB/s on some networks.
But I agree that the cost should resemble the current
loopback device. I have seen nothing that suggests
it is not.
>> However that is fundamentally from a lack of hardware. If the
>> rest works smarter filters in the drivers should enable to remove the
>> cost.
>>
>> Basically it is just a matter of:
>> if (dest_mac == my_mac1) it is for device 1.
>> If (dest_mac == my_mac2) it is for device 2.
>> etc.
>
> hmm, so you plan on providing a different MAC for
> each guest? how should that be handled from the
> user PoV? you cannot simply make up MACs as you
> go, and, depending on the network card, operation
> in promisc mode might be slower than for a given
> set (maybe only one) MAC, no?
The speed is a factor certainly. As for making up
macs. There is a local assignment bit that you can set.
With that set it is just a matter of using a decent random
number generator. The kernel already does this is some places.
>> At a small count of macs it is trivial to understand it will go
>> fast for a larger count of macs it only works with a good data
>> structure. We don't hit any extra cache lines of the packet,
>> and the above test can be collapsed with other routing lookup tests.
>
> well, I'm absolutely not against flexibility or
> full virtualization, but the proposed 'routing'
> on the host effectively doubles the time the
> packet spends in the network stack(s), so I can
> not believe that this approach would not add
> (significant) overhead to the hot path ...
It might, but I am pretty certain it won't double
the cost, as you don't do 2 full network stack traversals.
And even at a full doubling I doubt it will affect bandwith
or latency very much. If it does we have a lot more to optimize
in the network stack than just this code.
Eric
next prev parent reply other threads:[~2006-06-26 20:38 UTC|newest]
Thread overview: 113+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-09 21:02 [RFC] [patch 0/6] [Network namespace] introduction dlezcano
2006-06-09 21:02 ` [RFC] [patch 1/6] [Network namespace] Network namespace structure dlezcano
2006-06-09 21:02 ` [RFC] [patch 2/6] [Network namespace] Network device sharing by view dlezcano
2006-06-11 10:18 ` Andrew Morton
2006-06-18 18:53 ` Al Viro
2006-06-26 9:47 ` Andrey Savochkin
2006-06-26 13:02 ` Herbert Poetzl
2006-06-26 14:05 ` Eric W. Biederman
2006-06-26 14:08 ` Andrey Savochkin
2006-06-26 18:28 ` Herbert Poetzl
2006-06-26 18:59 ` Eric W. Biederman
2006-06-26 14:56 ` Daniel Lezcano
2006-06-26 15:21 ` Eric W. Biederman
2006-06-26 15:27 ` Andrey Savochkin
2006-06-26 15:49 ` Daniel Lezcano
2006-06-26 16:40 ` Eric W. Biederman
2006-06-26 18:36 ` Herbert Poetzl
2006-06-26 19:35 ` Eric W. Biederman
2006-06-26 20:02 ` Herbert Poetzl
2006-06-26 20:37 ` Eric W. Biederman [this message]
2006-06-26 21:26 ` Herbert Poetzl
2006-06-26 21:59 ` Ben Greear
2006-06-26 22:11 ` Eric W. Biederman
2006-06-27 9:09 ` Andrey Savochkin
2006-06-27 15:48 ` Herbert Poetzl
2006-06-27 16:19 ` Andrey Savochkin
2006-06-27 16:40 ` Eric W. Biederman
2006-06-26 22:13 ` Ben Greear
2006-06-26 22:54 ` Herbert Poetzl
2006-06-26 23:08 ` Ben Greear
2006-06-27 16:07 ` Ben Greear
2006-06-27 22:48 ` Herbert Poetzl
2006-06-27 9:11 ` Andrey Savochkin
2006-06-27 9:34 ` Daniel Lezcano
2006-06-27 9:38 ` Andrey Savochkin
2006-06-27 11:21 ` Daniel Lezcano
2006-06-27 11:52 ` Eric W. Biederman
2006-06-27 16:02 ` Herbert Poetzl
2006-06-27 16:47 ` Eric W. Biederman
2006-06-27 17:19 ` Ben Greear
2006-06-27 22:52 ` Herbert Poetzl
2006-06-27 23:12 ` Dave Hansen
2006-06-27 23:42 ` Alexey Kuznetsov
2006-06-28 3:38 ` Eric W. Biederman
2006-06-28 13:36 ` Herbert Poetzl
2006-06-28 13:53 ` jamal
2006-06-28 14:19 ` Andrey Savochkin
2006-06-28 16:17 ` jamal
2006-06-28 16:58 ` Andrey Savochkin
2006-06-28 17:17 ` Eric W. Biederman
2006-06-28 17:04 ` Herbert Poetzl
2006-06-28 14:39 ` Eric W. Biederman
2006-06-30 1:41 ` Sam Vilain
2006-06-29 21:07 ` Sam Vilain
2006-06-29 22:14 ` strict isolation of net interfaces Cedric Le Goater
2006-06-30 2:39 ` Serge E. Hallyn
2006-06-30 2:49 ` Sam Vilain
2006-07-03 14:53 ` Andrey Savochkin
2006-07-04 3:00 ` Sam Vilain
2006-07-04 12:29 ` Daniel Lezcano
2006-07-04 13:13 ` Sam Vilain
2006-07-04 13:19 ` Daniel Lezcano
2006-06-30 8:56 ` Cedric Le Goater
2006-07-03 13:36 ` Herbert Poetzl
2006-06-30 12:23 ` Daniel Lezcano
2006-06-30 14:20 ` Eric W. Biederman
2006-06-30 15:22 ` Daniel Lezcano
2006-06-30 17:58 ` Eric W. Biederman
2006-06-30 16:14 ` Serge E. Hallyn
2006-06-30 17:41 ` Eric W. Biederman
2006-06-30 18:09 ` Eric W. Biederman
2006-06-30 0:15 ` [patch 2/6] [Network namespace] Network device sharing by view jamal
2006-06-30 3:35 ` Herbert Poetzl
2006-06-30 7:45 ` Andrey Savochkin
2006-06-30 13:50 ` jamal
2006-06-30 15:01 ` Andrey Savochkin
2006-06-30 18:22 ` Eric W. Biederman
2006-06-30 21:51 ` jamal
2006-07-01 0:50 ` Eric W. Biederman
2006-06-28 14:21 ` Eric W. Biederman
2006-06-28 14:51 ` Eric W. Biederman
2006-06-27 16:49 ` Alexey Kuznetsov
2006-06-27 11:55 ` Andrey Savochkin
2006-06-27 9:54 ` Kirill Korotaev
2006-06-27 16:09 ` Herbert Poetzl
2006-06-27 16:29 ` Eric W. Biederman
2006-06-27 23:07 ` Herbert Poetzl
2006-06-28 4:07 ` Eric W. Biederman
2006-06-28 6:31 ` Sam Vilain
2006-06-28 14:15 ` Herbert Poetzl
2006-06-28 15:36 ` Eric W. Biederman
2006-06-28 17:18 ` Herbert Poetzl
2006-06-28 10:14 ` Cedric Le Goater
2006-06-28 14:11 ` Herbert Poetzl
2006-06-28 16:10 ` Eric W. Biederman
2006-07-06 9:45 ` Routing tables (Re: [patch 2/6] [Network namespace] Network device sharing by view) Kari Hurtta
2006-06-09 21:02 ` [RFC] [patch 3/6] [Network namespace] Network devices isolation dlezcano
2006-06-18 18:57 ` Al Viro
2006-06-09 21:02 ` [RFC] [patch 4/6] [Network namespace] Network inet " dlezcano
2006-06-09 21:02 ` [RFC] [patch 5/6] [Network namespace] ipv4 isolation dlezcano
2006-06-10 0:23 ` James Morris
2006-06-10 0:27 ` Rick Jones
2006-06-10 0:47 ` James Morris
2006-06-09 21:02 ` [RFC] [patch 6/6] [Network namespace] Network namespace debugfs dlezcano
2006-06-10 7:16 ` [RFC] [patch 0/6] [Network namespace] introduction Kari Hurtta
2006-06-16 4:23 ` Eric W. Biederman
2006-06-16 9:06 ` Daniel Lezcano
2006-06-16 9:22 ` Eric W. Biederman
2006-06-18 18:47 ` Al Viro
2006-06-20 21:21 ` Daniel Lezcano
2006-06-20 21:25 ` Al Viro
2006-06-20 22:45 ` Daniel Lezcano
2006-06-26 23:38 ` Patrick McHardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m1u067psas.fsf@ebiederm.dsl.xmission.com \
--to=ebiederm@xmission.com \
--cc=akpm@osdl.org \
--cc=alexey@sw.ru \
--cc=clg@fr.ibm.com \
--cc=dev@sw.ru \
--cc=devel@openvz.org \
--cc=dlezcano@fr.ibm.com \
--cc=haveblue@us.ibm.com \
--cc=herbert@13thfloor.at \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sam@vilain.net \
--cc=saw@swsoft.com \
--cc=serue@us.ibm.com \
--cc=viro@ftp.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).