From: "Denis V. Lunev" <den-3ImXcnM4P+0@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Linux Containers
<containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
Pavel Emelianov <xemul-3ImXcnM4P+0@public.gmane.org>
Subject: Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Date: Thu, 11 Oct 2007 10:36:58 +0400 [thread overview]
Message-ID: <470DC48A.30603@sw.ru> (raw)
In-Reply-To: <m13awilq1r.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
Eric W. Biederman wrote:
[..]
>> - introduce an skb_net field in the 'struct sk_buff' making all code
>> uniform, at least when we have an skb
>
> No. That bloats a sk_buff, changes the semantics of moving a skb
> around, and decreases performance (because we have to maintain the
> field on a fast path).
>
> There will not be a skb_net field.
>
> The entire concept of skb_net is a maintenance disaster.
>
>> I think that this is not the last place with such a parameter list and
>> we should make a decision at this point when the code in not mainline yet.
>
> Certainly that is what I have a proof of concept tree for. So we can
> see how these things look before we merge them.
>
>> As far as I understand, netfilters are not touched by the Eric and we
>> can face some non-trivial problems there.
>
> No. In my proof of concept tree I should have working per network
> namespace netfilter code. My intention was to just do enough to see
> what the impact would be so most of the netfilter code (in my tree)
> insists on running in the initial network namespace. But there are
> a few pieces that are fully converted. Please take a look.
>
>> So, if my point about uniformity is valid, this patchset looks wrong and
>> should be re-worked :(
>
> This patchset does need to get rebased on top of net-2.6.25 when it
> opens and hopefully your patchset to remove the unnecessary work in
> rtnl_unlock, and to really process netlink requests in process
> context. I see a need for the more fundamental change you seem to
> be advocating.
I see that current patchset of RTNL code is attached. I'll start the
next piece of work next week after some reaction from people.
[..]
> In the particular case of the netfilter hooks we don't have a
> network namespace parameter laying around before we call NF_HOOK,
> and the idiom "net = (in?in:out)->nd_net" seems perfectly accurate
> so it seems reasonable to me to derive the network namespace that
> way in generic code. Although thinking about this. We know which
> hooks we are being called from so we may in fact actually know
> if which of in or out must be valid when we get to the netfilter
> hook.
I understand your position. But still have some points :)
First. A real-life usecase we have recently fixed in the OpenVZ. I have
described it in the previous post, but (may be) not in great details.
UDP buffers management for outgoing traffic.
The packet carries over a destructor, which is called in skb_orphan in
the real networking device. This destructor allows to queue more
packets. There is no problem in the current implementation. But there is
one after namespace introduction in the following configuration:
[NS1] <-> [NS2] <-> [world]
There are two namespaces on the host. One of them is connected to the
outside world via another and the packet follows usual forwarding path
in NS2.
The problem: if we call skb_orphan in [NS1] outgoing device, there is a
great possibility to have a really huge packet drop in [NS2] on queuing
operation.
In OpenVZ we have added skb_orphan into receive path (tcp_v4_rcv,
__udp4_lib_rcv etc) and removed one from our virtual devices. As
unfortunate side effect we have packet in NS2 with a socket from NS1.
So, we should have an architectural solution for this from a beginning.
It will be too late to hack around and change namespace lookup scheme.
I see that we should adopt a generic approach:
- each concrete packet belongs to a concrete namespace
- if function has a packet as a parameter, we should get namespace from
packet
- if skb->dev is present we should get namespace as skb->dev->nd_net
- otherwise we should get skb->sk->sk_net. If skb->sk is NULL -> this is
a bug
So, for the case of all netfilter calls we'll have a pskb->dev->nd_net
defined correctly.
Regards,
Den
next prev parent reply other threads:[~2007-10-11 6:36 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-09-29 1:00 [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace Eric W. Biederman
2007-09-29 1:00 ` Eric W. Biederman
2007-09-29 1:02 ` [PATCH 2/5] net: Make rtnetlink infrastructure network namespace aware Eric W. Biederman
2007-09-29 1:02 ` Eric W. Biederman
[not found] ` <m1hcle9td8.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-09-29 1:04 ` [PATCH 3/5] net: Make the netlink methods in rtnetlink handle multiple network namespaces Eric W. Biederman
2007-09-29 1:07 ` [PATCH 4/5] net: Make AF_PACKET " Eric W. Biederman
2007-09-29 1:07 ` Eric W. Biederman
2007-09-29 1:08 ` [PATCH 5/5] net: Make AF_UNIX per network namespace safe Eric W. Biederman
2007-09-29 1:08 ` Eric W. Biederman
2007-09-29 15:47 ` Patrick McHardy
2007-09-29 17:03 ` Eric W. Biederman
2007-09-29 17:50 ` Patrick McHardy
2007-09-29 15:44 ` [PATCH 2/5] net: Make rtnetlink infrastructure network namespace aware Patrick McHardy
2007-09-29 16:51 ` Eric W. Biederman
2007-09-29 17:48 ` Patrick McHardy
2007-09-29 21:00 ` Eric W. Biederman
2007-09-30 13:13 ` [Devel] " Denis V. Lunev
2007-09-30 15:39 ` Patrick McHardy
2007-10-01 8:26 ` [Devel] " Denis V. Lunev
2007-10-01 8:45 ` Eric W. Biederman
2007-10-10 12:35 ` [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace Denis V. Lunev
2007-10-10 14:05 ` Daniel Lezcano
2007-10-10 14:29 ` Denis V. Lunev
2007-10-10 19:37 ` Eric W. Biederman
[not found] ` <m13awilq1r.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-10-11 6:36 ` Denis V. Lunev [this message]
[not found] ` <470DC48A.30603-3ImXcnM4P+0@public.gmane.org>
2007-10-11 7:57 ` Eric W. Biederman
[not found] ` <m1r6k2gk2w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-10-11 12:18 ` Denis V. Lunev
[not found] ` <470E149A.3040803-3ImXcnM4P+0@public.gmane.org>
2007-10-11 17:08 ` Eric W. Biederman
[not found] ` <m1przlbmu4.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-10-11 17:28 ` Denis V. Lunev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=470DC48A.30603@sw.ru \
--to=den-3imxcnm4p+0@public.gmane.org \
--cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
--cc=xemul-3ImXcnM4P+0@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.