From: Pavel Emelyanov <xemul@parallels.com>
To: Mahesh Bandewar <maheshb@google.com>, netdev <netdev@vger.kernel.org>
Cc: Eric Dumazet <edumazet@google.com>,
	Maciej Zenczykowski <maze@google.com>,
	Laurent Chavey <chavey@google.com>,
	Tim Hockin <thockin@google.com>,
	David Miller <davem@davemloft.net>,
	Brandon Philips <brandon.philips@coreos.com>
Subject: Re: [PATCH net-next 1/1] ipvlan: Initial check-in of the IPVLAN driver.
Date: Wed, 12 Nov 2014 20:11:27 +0400
Message-ID: <546386AF.9030300@parallels.com>
In-Reply-To: <1415744984-25802-1-git-send-email-maheshb@google.com>

On 11/12/2014 02:29 AM, Mahesh Bandewar wrote:
> This driver is very similar to the macvlan driver except that it
> uses L3 on the frame to determine the logical interface while
> functioning as packet dispatcher. It inherits L2 of the master
> device hence the packets on wire will have the same L2 for all
> the packets originating from all virtual devices off of the same
> master device.
> 
> This driver was developed keeping the namespace use-case in
> mind. Hence most of the examples given here take that as the
> base setup where main-device belongs to the default-ns and
> virtual devices are assigned to the additional namespaces.
> 
> The device operates in two different modes and the difference
> between these two modes is primarily on the TX side.
> 
> (a) L2 mode : In this mode, the device behaves as a L2 device.
> TX processing up to L2 happens on the stack of the virtual device
> associated with (namespace). Packets are switched after that
> into the main device (default-ns) and queued for xmit.
> 
> RX processing is simple and all multicast, broadcast (if
> applicable), and unicast belonging to the address(es) are
> delivered to the virtual devices.
> 
> (b) L3 mode : In this mode, the device behaves like a L3 device.
> TX processing up to L3 happens on the stack of the virtual device
> associated with (namespace). Packets are switched to the
> main-device (default-ns) for the L2 processing. Hence the routing
> table of the default-ns will be used in this mode.
> 
> RX processing is somewhat similar to the L2 mode except that in
> this mode only Unicast packets are delivered to the virtual device
> while main-dev will handle all other packets.
> 
> The devices can be added using the "ip" command from the iproute2
> package -
> 
> 	ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
> 
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Laurent Chavey <chavey@google.com>
> Cc: Tim Hockin <thockin@google.com>
> Cc: Brandon Philips <brandon.philips@coreos.com>
> Cc: Pavel Emelianov <xemul@parallels.com>

Acked-by: /me on the general idea. We use this type of device heavily at
Parallels for several reasons: to avoid generating too many MACs from one
host, and to "enforce" the IP address for a container. I have a comment
about the latter below.


> +static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> +{
> +	void *lyr3h = NULL;
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_ARP): {
> +		struct arphdr *arph;
> +
> +		if (unlikely(!pskb_may_pull(skb, sizeof(struct arphdr))))
> +			return NULL;
> +
> +		arph = arp_hdr(skb);
> +		*type = IPVL_ARP;
> +		lyr3h = arph;
> +		break;
> +	}
> +
> +	case htons(ETH_P_IP): {
> +		u32 pktlen;
> +		struct iphdr *ip4h;
> +
> +		if (unlikely(!pskb_may_pull(skb, sizeof(struct iphdr))))
> +			return NULL;
> +
> +		ip4h = ip_hdr(skb);
> +		pktlen = ntohs(ip4h->tot_len);
> +		if (ip4h->ihl < 5 || ip4h->version != 4)
> +			return NULL;
> +		if (skb->len < pktlen || pktlen < (ip4h->ihl * 4))
> +			return NULL;
> +
> +		*type = IPVL_IPV4;
> +		lyr3h = ip4h;
> +		break;
> +	}
> +	case htons(ETH_P_IPV6): {
> +		struct ipv6hdr *ip6h;
> +
> +		if (unlikely(!pskb_may_pull(skb, sizeof(struct iphdr))))

Misprint -- should be sizeof(struct ipv6hdr)
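
To illustrate why the size matters, here is a minimal userspace sketch (not
the driver code). It assumes the fixed header sizes of 20 bytes for IPv4
without options and 40 bytes for IPv6; the helper names and constants are
hypothetical. A 30-byte buffer passes a check sized for the IPv4 header but
is too short to hold an IPv6 header, so reading the IPv6 addresses could go
past the data that was validated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fixed header sizes in bytes (IPv4 without options, IPv6 base header). */
enum { IPV4_HDR_LEN = 20, IPV6_HDR_LEN = 40 };

/* Return true if a buffer of buf_len bytes can hold a header of hdr_len bytes. */
static bool hdr_fits(size_t buf_len, size_t hdr_len)
{
	return buf_len >= hdr_len;
}

/* Check an IPv6 packet against the IPv4 header size, as the quoted code does. */
static bool ipv6_check_as_posted(size_t buf_len)
{
	return hdr_fits(buf_len, IPV4_HDR_LEN);
}

/* Check an IPv6 packet against the size of the header that is actually read. */
static bool ipv6_check_fixed(size_t buf_len)
{
	return hdr_fits(buf_len, IPV6_HDR_LEN);
}
```

With a 30-byte buffer, ipv6_check_as_posted() accepts the packet while
ipv6_check_fixed() rejects it.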

> +static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> +			   struct nlattr *tb[], struct nlattr *data[])
> +{
> +	struct ipvl_dev *ipvlan = netdev_priv(dev);
> +	struct ipvl_port *port;
> +	struct net_device *phy_dev;
> +	int err;
> +
> +	ipvlan_dbg(3, "%s[%d]: Entering...\n", __func__, __LINE__);
> +	if (!tb[IFLA_LINK]) {
> +		ipvlan_dbg(3, "%s[%d]: Returning -EINVAL...\n",
> +			   __func__, __LINE__);
> +		return -EINVAL;
> +	}
> +
> +	phy_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
> +	if (phy_dev == NULL) {
> +		ipvlan_dbg(3, "%s[%d]: Returning -ENODEV...\n",
> +			   __func__, __LINE__);
> +		return -ENODEV;
> +	}
> +
> +	/* TODO will someone try creating ipvlan-dev on an ipvlan-virtual dev?*/
> +	if (!ipvlan_dev_master(phy_dev)) {
> +		err = ipvlan_port_create(phy_dev);
> +		if (err < 0) {
> +			ipvlan_dbg(3, "%s[%d]: Returning error (%d)...\n",
> +				   __func__, __LINE__, err);
> +			return err;
> +		}
> +	}
> +
> +	port = ipvlan_port_get_rtnl(phy_dev);
> +	/* Get the mode if specified. */
> +	if (data && data[IFLA_IPVLAN_MODE])
> +		port->mode = nla_get_u16(data[IFLA_IPVLAN_MODE]);

Should an invalid value be checked for here? There are places
where we BUG() on the mode being "unknown".
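
Something along these lines is what I have in mind, as a userspace sketch
only. The enum values, the struct and the function name below are
placeholders of mine, not the names from the patch; the point is just to
reject anything outside the known range before it is stored in the port.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed mode values; placeholders, not the names used by the patch. */
enum ipvlan_mode_sketch {
	MODE_L2 = 0,
	MODE_L3,
	MODE_MAX	/* first invalid value */
};

/* Stand-in for the per-port state that holds the mode. */
struct port_sketch {
	uint16_t mode;
};

/* Store the requested mode only if it is a known one. Returns 0 or -EINVAL. */
static int port_set_mode(struct port_sketch *port, uint16_t mode)
{
	if (mode >= MODE_MAX)
		return -EINVAL;
	port->mode = mode;
	return 0;
}
```

On an unknown value the port keeps its previous mode and the caller gets
-EINVAL, so the later "unknown mode" paths are never reached.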

> +
> +	ipvlan->phy_dev = phy_dev;
> +	ipvlan->dev = dev;
> +	ipvlan->port = port;
> +	ipvlan->sfeatures = IPVLAN_FEATURES;
> +	INIT_LIST_HEAD(&ipvlan->addrs);
> +	ipvlan->ipv4cnt = 0;
> +	ipvlan->ipv6cnt = 0;


> +static int ipvlan_device_event(struct notifier_block *unused,
> +			       unsigned long event, void *ptr)
> +{
> +	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct ipvl_dev *ipvlan, *next;
> +	struct ipvl_port *port;
> +	LIST_HEAD(lst_kill);
> +
> +	if (!ipvlan_dev_master(dev))
> +		return NOTIFY_DONE;
> +
> +	port = ipvlan_port_get_rtnl(dev);
> +
> +	switch (event) {
> +	case NETDEV_CHANGE:
> +		list_for_each_entry(ipvlan, &port->ipvlans, pnode)
> +			netif_stacked_transfer_operstate(ipvlan->phy_dev,
> +							 ipvlan->dev);
> +		break;
> +
> +	case NETDEV_UNREGISTER:
> +		if (dev->reg_state != NETREG_UNREGISTERING)
> +			break;
> +
> +		list_for_each_entry_safe(ipvlan, next, &port->ipvlans,
> +					 pnode)
> +			ipvlan->dev->rtnl_link_ops->dellink(ipvlan->dev,
> +							    &lst_kill);
> +		unregister_netdevice_many(&lst_kill);
> +		list_del(&lst_kill);

This list_del() seems to be redundant.

> +		break;
> +

> +static int ipvlan_addr4_event(struct notifier_block *unused,
> +			      unsigned long event, void *ptr)
> +{
> +	struct in_ifaddr *if4 = (struct in_ifaddr *)ptr;
> +	struct net_device *dev = (struct net_device *)if4->ifa_dev->dev;
> +	struct ipvl_dev *ipvlan = netdev_priv(dev);
> +	struct in_addr ip4_addr;
> +
> +	ipvlan_dbg(3, "%s[%d]: Entering...\n", __func__, __LINE__);
> +	if (!ipvlan_dev_slave(dev))
> +		return NOTIFY_DONE;
> +
> +	if (!ipvlan || !ipvlan->port)
> +		return NOTIFY_DONE;
> +
> +	switch (event) {
> +	case NETDEV_UP:

Can this be restricted somehow (in the future) so that a net namespace
can't assign an arbitrary IP address here? One of the reasons for using
such devices is to force the container to use the IP address given by
the host.
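
One possible shape for such a restriction, purely as a userspace sketch:
the host hands each virtual device a list of granted addresses, and the
address is looked up in that list before it is accepted. The struct and
function names are hypothetical and nothing like this exists in the patch.
In the NETDEV_UP case the handler could then refuse the address (for
example by returning NOTIFY_BAD) when the lookup fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device list of addresses the host has granted. */
struct allowed_addrs {
	const uint32_t *addrs;	/* IPv4 addresses, all in the same byte order */
	size_t count;
};

/* Return true if addr is one of the addresses granted by the host. */
static bool addr_is_allowed(const struct allowed_addrs *al, uint32_t addr)
{
	for (size_t i = 0; i < al->count; i++) {
		if (al->addrs[i] == addr)
			return true;
	}
	return false;
}
```

A linear scan is enough for the sketch; a real implementation would have
to decide where the list lives and how the host populates it.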

> +		ip4_addr.s_addr = if4->ifa_address;
> +		if (ipvlan_add_addr4(ipvlan, &ip4_addr))
> +			return NOTIFY_BAD;
> +		break;
> +
> +	case NETDEV_DOWN:
> +		ip4_addr.s_addr = if4->ifa_address;
> +		ipvlan_del_addr4(ipvlan, &ip4_addr);
> +		break;
> +	}
> +
> +	ipvlan_dbg(3, "%s[%d]: Leaving...\n", __func__, __LINE__);
> +	return NOTIFY_OK;
> +}
