From: Daniel Lezcano <dlezcano@fr.ibm.com>
To: Pavel Emelianov <xemul@openvz.org>
Cc: Patrick McHardy <kaber@trash.net>,
Linux Containers <containers@lists.osdl.org>,
Linux Netdev List <netdev@vger.kernel.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Kirill Korotaev <dev@openvz.org>
Subject: Re: [PATCH] Virtual ethernet tunnel
Date: Thu, 07 Jun 2007 11:29:50 +0200
Message-ID: <4667D00E.2020605@fr.ibm.com>
In-Reply-To: <4667BD1D.9080905@openvz.org>

Pavel Emelianov wrote:
> Patrick McHardy wrote:
>
>> Pavel Emelianov wrote:
>>
>>> Veth stands for Virtual ETHernet. It is a simple tunnel driver
>>> that works at the link layer and looks like a pair of ethernet
>>> devices interconnected with each other.
>>>
>>> Mainly it allows communication between network namespaces, but
>>> it can be used as is as well.
>>>
>>> Eric recently sent a similar driver called etun. This
>>> implementation uses another interface - the RTM_NEWLINK
>>> message introduced by Patrick. The patch fits today's netdev
>>> tree with Patrick's patches.
>>>
>>> The newlink callback is organized that way to make it easy
>>> to create the peer device in the separate namespace when we
>>> have them in kernel.
>>>
>>>
>>> +struct veth_priv {
>>> + struct net_device *peer;
>>> + struct net_device *dev;
>>> + struct list_head list;
>>> + struct net_device_stats stats;
>>>
>> You can use dev->stats instead.
>>
>
> OK. Actually I planned to use percpu stats to reduce cacheline
> thrashing (Stephen has noticed it also). The reason I didn't do it
> here is that the patch would look more complicated, and I wanted to
> show the netlink interface and get it approved first.
>
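For what it is worth, the per-CPU idea can be illustrated outside the
kernel. The following is only a userspace sketch: the demo_* names, the
fixed CPU count and the 64-byte cache line are assumptions made for
illustration, and a real driver would use the kernel's per-CPU
allocation helpers rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NR_CPUS	4	/* assumption: fixed number of CPUs */
#define DEMO_CACHELINE	64	/* assumption: 64-byte cache lines */

/* One slot per CPU, padded to a full cache line so that two CPUs
 * do not write to the same line (alignment of the array itself is
 * not enforced in this sketch). */
struct demo_cpu_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	char pad[DEMO_CACHELINE - 2 * sizeof(unsigned long)];
};

struct demo_veth_stats {
	struct demo_cpu_stats cpu[DEMO_NR_CPUS];
};

static void demo_stats_init(struct demo_veth_stats *s)
{
	memset(s, 0, sizeof(*s));
}

/* Transmit path: each CPU updates only its own slot. */
static void demo_account_tx(struct demo_veth_stats *s, unsigned int cpu,
			    unsigned long len)
{
	assert(cpu < DEMO_NR_CPUS);
	s->cpu[cpu].tx_packets++;
	s->cpu[cpu].tx_bytes += len;
}

/* Read path: totals are computed by summing all slots. */
static void demo_stats_sum(const struct demo_veth_stats *s,
			   unsigned long *packets, unsigned long *bytes)
{
	unsigned int i;

	*packets = 0;
	*bytes = 0;
	for (i = 0; i < DEMO_NR_CPUS; i++) {
		*packets += s->cpu[i].tx_packets;
		*bytes += s->cpu[i].tx_bytes;
	}
}
```

The trade-off is that the hot path stays local to one slot, while the
rarely used read path pays for walking all of them.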
>
>>> +static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
>>> +{
>>> + struct net_device *rcv = NULL;
>>> + struct veth_priv *priv, *rcv_priv;
>>> + int length;
>>> +
>>> + skb_orphan(skb);
>>> +
>>> + priv = netdev_priv(dev);
>>> + rcv = priv->peer;
>>> + rcv_priv = netdev_priv(rcv);
>>> +
>>> + if (!(rcv->flags & IFF_UP))
>>> + goto outf;
>>> +
>>> + skb->dev = rcv;
>>>
>> eth_type_trans already sets skb->dev.
>>
>
> Ok. Thanks.
>
>
>>> + skb->pkt_type = PACKET_HOST;
>>> + skb->protocol = eth_type_trans(skb, rcv);
>>> + if (dev->features & NETIF_F_NO_CSUM)
>>> + skb->ip_summed = rcv_priv->ip_summed;
>>> +
>>> + dst_release(skb->dst);
>>> + skb->dst = NULL;
>>> +
>>> + secpath_reset(skb);
>>> + nf_reset(skb);
>>>
>> Is skb->mark supposed to survive communication between different
>> namespaces?
>>
>
> I guess it must not. Thanks.
>
>
>>> +static const struct nla_policy veth_policy[VETH_INFO_MAX] = {
>>> + [VETH_INFO_MAC] = { .type = NLA_BINARY, .len = ETH_ALEN },
>>> + [VETH_INFO_PEER] = { .type = NLA_STRING },
>>> + [VETH_INFO_PEER_MAC] = { .type = NLA_BINARY, .len = ETH_ALEN },
>>> +};
>>>
>> The rtnl_link code looks fine. I don't like the VETH_INFO_MAC attribute
>> very much though, we already have a generic device attribute for MAC
>> addresses. Of course that only allows you to supply one MAC address, so
>> I'm wondering what you think of allocating only a single device per
>> newlink operation and binding them in a separate enslave operation?
>>
>
> I did this in the very first version, but Alexey showed me that this
> would be wrong. Look. When we create the second device it must be in
> the other namespace, as it is useless to have both in one namespace.
> But if we have the device in the other namespace, the RTM_NEWLINK
> message from the kernel would come into this namespace, thus confusing
> the ip utility in the init namespace. Creating the device in the init
> ns and moving it into the new one is rather a complex task.
>

Pavel,

moving the netdevice to another namespace is not a complex task. Eric
Biederman did it in his patchset (cf. http://lxc.sf.net/network).

When the device pair is created, both ends are in the init namespace,
and you can choose which namespace to move one end to. When the
network namespace dies, the netdev is moved back to the init
namespace. That simplifies network device management.

Concerning netlink events, these are generated automatically when the
network device is moved between namespaces.

IMHO, we should have network device movement between namespaces so
that a physical network device can be moved too (e.g. you have 4 NICs
and you want to create 3 containers and assign 3 of the NICs, one to
each of them).

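To make the lifecycle concrete, here is a small userspace toy model. It
is a sketch only, not code from Eric's patchset: every toy_* name is
made up for illustration, and the event counting (the old namespace
sees a removal, the new one sees an arrival) is my assumption about how
the notifications would look.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_netns {
	const char *name;
	int link_events;	/* link notifications seen in this namespace */
};

struct toy_netdev {
	const char *name;
	struct toy_netns *ns;		/* namespace the device lives in */
	struct toy_netdev *peer;	/* other end of the pair, or NULL */
};

static struct toy_netns toy_init_ns = { "init", 0 };

/* Create a pair: both ends start out in the init namespace. */
static void toy_pair_create(struct toy_netdev *a, const char *a_name,
			    struct toy_netdev *b, const char *b_name)
{
	a->name = a_name;
	a->ns = &toy_init_ns;
	a->peer = b;
	b->name = b_name;
	b->ns = &toy_init_ns;
	b->peer = a;
	toy_init_ns.link_events += 2;	/* one "new link" per end */
}

/* Move one device. Assumed semantics: the old namespace is told the
 * device went away, the new namespace is told it appeared. */
static void toy_dev_move(struct toy_netdev *dev, struct toy_netns *to)
{
	assert(dev != NULL && to != NULL);
	if (dev->ns == to)
		return;
	dev->ns->link_events++;
	dev->ns = to;
	to->link_events++;
}

/* Namespace teardown: every device still inside returns to init. */
static void toy_ns_destroy(struct toy_netns *ns,
			   struct toy_netdev **devs, size_t n)
{
	size_t i;

	assert(ns != &toy_init_ns);
	for (i = 0; i < n; i++)
		if (devs[i]->ns == ns)
			toy_dev_move(devs[i], &toy_init_ns);
}
```

The same move helper would serve for a physical NIC (a device whose
peer is NULL), which is the point of having one generic mechanism.
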
> But with such an approach the creation looks really logical. We send a
> packet to the kernel and get a single response about the new device's
> appearance. At the same time an RTM_NEWLINK message arrives at
> the destination namespace, informing it that a new device has appeared
> there as well.
>
>
>>> +enum {
>>> + VETH_INFO_UNSPEC,
>>> + VETH_INFO_MAC,
>>> + VETH_INFO_PEER,
>>> + VETH_INFO_PEER_MAC,
>>> +
>>> + VETH_INFO_MAX
>>> +};
>>>
>> Please follow the
>>
>> #define VETH_INFO_MAX (__VETH_INFO_MAX - 1)
>>
>> convention here.
>>
>
> Could you please clarify this point? I saw the lines
> enum {
> ...
> RTM_NEWLINK
> #define RTM_NEWLINK RTM_NEWLINK
> ...
> }
> and my brain exploded imagining what this would mean :(
>
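As far as I understand the two idioms, they can be sketched as below.
This is a standalone illustration, not the veth patch: the demo_* and
DEMO_* names are hypothetical stand-ins (demo_policy replaces struct
nla_policy so that the fragment compiles outside the kernel). The
__VETH_INFO_MAX sentinel makes VETH_INFO_MAX the highest valid
attribute, so tables are sized VETH_INFO_MAX + 1. The same-name #define
inside an enum seems to exist so that other code can check with #ifdef
whether a given enumerator is available.

```c
#include <assert.h>
#include <stddef.h>

/* Attribute numbering with a trailing sentinel. */
enum {
	VETH_INFO_UNSPEC,
	VETH_INFO_MAC,
	VETH_INFO_PEER,
	VETH_INFO_PEER_MAC,

	__VETH_INFO_MAX		/* one past the last real attribute */
};
#define VETH_INFO_MAX (__VETH_INFO_MAX - 1)	/* highest valid attribute */

/* Hypothetical stand-in for struct nla_policy. */
enum { DEMO_BINARY = 1, DEMO_STRING = 2 };
struct demo_policy {
	int type;
	int len;
};

/* Tables indexed by attribute need VETH_INFO_MAX + 1 entries. */
static const struct demo_policy demo_veth_policy[VETH_INFO_MAX + 1] = {
	[VETH_INFO_MAC]		= { .type = DEMO_BINARY, .len = 6 },
	[VETH_INFO_PEER]	= { .type = DEMO_STRING },
	[VETH_INFO_PEER_MAC]	= { .type = DEMO_BINARY, .len = 6 },
};

/* Enumerator plus a macro of the same name: the value stays a real
 * enum constant, and the macro makes it visible to the preprocessor. */
enum {
	DEMO_NEWLINK = 16,
#define DEMO_NEWLINK DEMO_NEWLINK
};

#ifdef DEMO_NEWLINK
static const int demo_newlink_is_detectable = 1;
#else
static const int demo_newlink_is_detectable = 0;
#endif
```

Without the #define, the #ifdef above would take the #else branch,
because the preprocessor knows nothing about enum constants.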
>