Linux Container Development
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org
Subject: Re: [PATCH 3/3] C/R: Basic support for network namespaces and devices
Date: Wed, 20 Jan 2010 16:26:50 -0500	[thread overview]
Message-ID: <4B57751A.5080506@cs.columbia.edu> (raw)
In-Reply-To: <1263999673-11279-4-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>


Cool - looks good !

Would this compile without CONFIG_NET ?  without CONFIG_NET_NS ?

How can a user ask to not checkpoint the network-ns ?  (e.g. in
a subtree checkpoint)

And a few minor comments inline...


Dan Smith wrote:
> When checkpointing a task tree with network namespaces, we hook into
> do_checkpoint_ns() along with the others.  Any devices in a given namespace
> are checkpointed (including their peer, in the case of veth) sequentially.
> Each network device stores a list of protocol addresses, as well as other
> information, such as hardware address.
> 
> This patch supports veth pairs, as well as the loopback adapter.  The
> loopback support is there to make sure that any additional addresses and
> state (such as up/down) is copied to the loopback adapter that we are
> given in the new network namespace.
> 
> On restart, we instantiate new network namespaces and veth pairs as
> necessary.  Any device we encounter that isn't in a network namespace
> that was checkpointed as part of a task is left in the namespace of the
> restarting process.  This will be the case for a veth half that exists
> in the init netns to provide network access to a container.
> 
> Still to do are:
> 
>   1. Routes
>   2. Netfilter rules
>   3. IPv6 addresses
>   4. Other virtual device types (e.g. bridges)
> 
> Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

[...]

> +/*
> + * Determine if an interface should be checkpointed, skipped, or
> + * if it makes us uncheckpointable.  This needs to be improved
> + * dramatically, but works for the moment.

Maybe be a bit more verbose about what's missing ?

> + *
> + * Return 1 for yes, 0 for skip, -ERRNO for error
> + */

[...]

> +static int count_inet4_addrs(struct in_device *indev)
> +{
> +	int count = 0;
> +	struct in_ifaddr *addr;
> +
> +	for (addr = indev->ifa_list; addr; addr = addr->ifa_next)
> +		count++;
> +
> +	return count;
> +}
> +
> +static int checkpoint_in_addrs(struct ckpt_ctx *ctx, struct in_device *indev)
> +{
> +	struct ckpt_hdr_netdev_addr *h;
> +	struct in_ifaddr *addr = indev->ifa_list;
> +	int ret;
> +	int count = 0;
> +

Is there a reason not to collect all addresses into one buffer (can
there be more than a page worth of them ?) and write in one go ?

> +	while (addr) {
> +		h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_NETDEV_ADDR);
> +		if (!h)
> +			return -ENOMEM;
> +
> +		h->type = CKPT_NETDEV_ADDR_IPV4; /* Only IPv4 right now */
> +
> +		h->inet4_local = addr->ifa_local;
> +		h->inet4_address = addr->ifa_address;
> +		h->inet4_mask = addr->ifa_mask;
> +		h->inet4_broadcast = addr->ifa_broadcast;
> +
> +		ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h);
> +		ckpt_hdr_put(ctx, h);
> +		if (ret < 0)
> +			break;
> +
> +		addr = addr->ifa_next;
> +
> +		count++;
> +	}
> +
> +	return ret < 0 ? ret : count;
> +}

[...]

> +
> +int checkpoint_netdev(struct ckpt_ctx *ctx, void *ptr)
> +{
> +	struct ckpt_hdr_netdev *h;
> +	struct net_device *dev = ptr;
> +	struct net_device *peer = NULL;
> +	struct net *net = dev->nd_net;
> +	int ret = 0;
> +	struct ifreq req;
> +
> +	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_NETDEV);
> +	if (!h)
> +		return -ENOMEM;
> +
> +	if (strcmp(dev->name, "lo") == 0)
> +		h->type = CKPT_NETDEV_LO;
> +	else {

While this is correct, perhaps be more verbose and change to:
	else if (strncmp(dev->name, "veth", 4) == 0)

> +		h->type = CKPT_NETDEV_VETH;
> +		peer = veth_get_peer(dev);
> +	}

and then
	} else {
		/* error */
	}

> +
> +	memcpy(req.ifr_name, dev->name, IFNAMSIZ);
> +	ret = __kern_dev_ioctl(net, SIOCGIFFLAGS, &req);
> +	h->flags = req.ifr_flags;
> +	if (ret < 0)
> +		goto out;
> +

[...]

> +
> +int checkpoint_netns(struct ckpt_ctx *ctx, void *ptr)
> +{
> +	struct net *net = ptr;
> +	struct net_device *dev;
> +	struct ckpt_hdr_netns *h;
> +	int ret;
> +
> +	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_NET_NS);
> +	if (!h)
> +		return -ENOMEM;
> +
> +	h->this_ref = ckpt_obj_lookup(ctx, net, CKPT_OBJ_NET_NS);
> +	if (h->this_ref == 0) {
> +		/* This shouldn't happen because we're called from
> +		 * checkpoint_obj() which should have already put
> +		 * us in the hash
> +		 */

If this can only happen due to a bug, then BUG_ON(). Otherwise,
maybe use ckpt_err() ?

> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h);
> +	if (ret < 0)
> +		goto out;
> +
> +	for_each_netdev(net, dev) {
> +		ret = should_checkpoint_netdev(dev);
> +		if (ret > 0)
> +			ret = checkpoint_obj(ctx, dev, CKPT_OBJ_NETDEV);
> +		if (ret < 0)
> +			break;
> +	}
> + out:
> +	ckpt_hdr_put(ctx, h);
> +
> +	return ret;
> +}
> +

[...]

Thanks,

Oren.

  parent reply	other threads:[~2010-01-20 21:26 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-20 15:01 Network namespace and device support Dan Smith
     [not found] ` <1263999673-11279-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-01-20 15:01   ` [PATCH 1/3] Expose rtnl_link_ops_get() Dan Smith
2010-01-20 15:01   ` [PATCH 2/3] Add a veth_get_peer() and veth_set_peer() functions Dan Smith
2010-01-21  9:24     ` David Miller
2010-01-20 15:01   ` [PATCH 3/3] C/R: Basic support for network namespaces and devices Dan Smith
     [not found]     ` <1263999673-11279-4-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-01-20 17:36       ` Serge E. Hallyn
2010-01-20 21:26       ` Oren Laadan [this message]
2010-01-21 15:38         ` Dan Smith
     [not found]           ` <878wbrpix6.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2010-01-21 16:14             ` Oren Laadan
2010-01-20 22:21       ` Brian Haley
     [not found]         ` <4B5781DF.6050106-VXdhtT5mjnY@public.gmane.org>
2010-01-21 15:37           ` Dan Smith
     [not found]             ` <87fx5zpiy7.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2010-01-21 16:08               ` Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B57751A.5080506@cs.columbia.edu \
    --to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
    --cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
    --cc=danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox