Linux Container Development
From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: [PATCH 2/3] epoll: Add support for checkpointing large numbers of epoll items
Date: Wed, 21 Oct 2009 09:59:50 -0500
Message-ID: <20091021145950.GA13327@us.ibm.com>
In-Reply-To: <d0fd1f3eb4eaa326488f59955e5b4790080f3073.1255971848.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Currently we allocate memory to output all of the epoll items in one
> big chunk. At 20 bytes per item, and since epoll was designed to
> support on the order of 10,000 items, we may find ourselves kmalloc'ing
> 200,000 bytes. That's an order 6 allocation with 4 KiB pages, whereas the
> heuristic for difficult allocations, PAGE_ALLOC_COSTLY_ORDER, is 3.
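
The size argument above can be checked with a small user-space sketch. It
assumes 4 KiB pages; sketch_get_order() and SKETCH_PAGE_SIZE are
hypothetical stand-ins for the kernel's get_order() and PAGE_SIZE, not
kernel code. Under that assumption, 10,000 items at 20 bytes each round up
to 64 pages, i.e. order 6, which is well above the costly-order threshold
of 3.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. Other page sizes give different orders. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical user-space stand-in for the kernel's get_order():
 * returns the smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;	/* double the span: one order higher */
		order++;
	}
	return order;
}
```

Calling sketch_get_order(10000 * 20) shows how far a single-buffer
allocation for a full epoll set sits above the threshold, which is the
motivation for chunking.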
> 
> Instead, output the epoll header and items separately. Chunk the output
> much like the pid array gets chunked. This ensures that even sub-order 0
> allocations will enable checkpoint of large epoll sets. A subsequent
> patch will do something similar for the restore path.
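
The chunking idea described above can be sketched in user space as
follows. This is an illustration under stated assumptions, not the patch
itself: struct sketch_item, sketch_alloc(), struct sketch_sink and
sketch_write() are all hypothetical names, with sketch_alloc() refusing
requests above a limit to imitate a failed large kmalloc and the sink
standing in for the checkpoint image. The buffer size starts at the full
item count and is halved until an allocation succeeds; the items are then
copied and written one chunk at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical item layout, for illustration only. */
struct sketch_item { int fd; unsigned int events; };

/* Hypothetical sink standing in for the checkpoint image. */
struct sketch_sink { unsigned char buf[4096]; size_t len; unsigned int writes; };

/* Hypothetical allocator: fails above 'limit' to imitate a failed big kmalloc. */
static void *sketch_alloc(size_t bytes, size_t limit)
{
	return bytes > limit ? NULL : calloc(1, bytes);
}

static int sketch_write(struct sketch_sink *s, const void *p, size_t n)
{
	if (n > sizeof(s->buf) - s->len)
		return -1;
	memcpy(s->buf + s->len, p, n);
	s->len += n;
	s->writes++;
	return 0;
}

static int sketch_write_chunked(struct sketch_sink *sink,
				const struct sketch_item *src,
				size_t num_items, size_t alloc_limit)
{
	struct sketch_item *chunk = NULL;
	size_t nchunk, done = 0;
	int ret = 0;

	assert(sink && src);
	if (!num_items)
		return 0;

	/* Halve the chunk size until an allocation succeeds. */
	for (nchunk = num_items; nchunk > 0; nchunk >>= 1) {
		chunk = sketch_alloc(nchunk * sizeof(*chunk), alloc_limit);
		if (chunk)
			break;
	}
	if (!chunk)
		return -1;

	/* Copy and write at most nchunk items per pass. */
	while (done < num_items) {
		size_t left = num_items - done;
		size_t n = left < nchunk ? left : nchunk;

		memcpy(chunk, src + done, n * sizeof(*chunk));
		ret = sketch_write(sink, chunk, n * sizeof(*chunk));
		if (ret < 0)
			break;
		done += n;
	}
	free(chunk);
	return ret;
}
```

The trade-off is more write calls in exchange for never depending on a
large contiguous allocation; the output stream is byte-for-byte the same
whichever chunk size ends up being used.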
> 
> Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Feels a bit auto-tune-magic-happy :) but looks good

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> ---
>  fs/eventpoll.c |   71 ++++++++++++++++++++++++++++++++++++-------------------
>  1 files changed, 46 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4706ec5..2506b40 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1480,7 +1480,7 @@ static int ep_items_checkpoint(void *data)
>  	struct rb_node *rbp;
>  	struct eventpoll *ep;
>  	__s32 epfile_objref;
> -	int i, num_items, ret;
> +	int num_items = 0, nchunk, ret;
> 
>  	ctx = dq_entry->ctx;
> 
> @@ -1489,9 +1489,8 @@ static int ep_items_checkpoint(void *data)
> 
>  	ep = dq_entry->epfile->private_data;
>  	mutex_lock(&ep->mtx);
> -	for (i = 0, rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp), i++) {}
> +	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp), num_items++) {}
>  	mutex_unlock(&ep->mtx);
> -	num_items = i;
> 
>  	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_EPOLL_ITEMS);
>  	if (!h)
> @@ -1503,36 +1502,58 @@ static int ep_items_checkpoint(void *data)
>  	if (ret || !num_items)
>  		return ret;
> 
> -	items = kzalloc(sizeof(*items)*num_items, GFP_KERNEL);
> +	ret = ckpt_write_obj_type(ctx, NULL, sizeof(*items)*num_items,
> +				  CKPT_HDR_BUFFER);
> +	if (ret < 0)
> +		return ret;
> +
> +	nchunk = num_items;
> +	do {
> +		items = kzalloc(sizeof(*items)*nchunk, GFP_KERNEL);
> +		if (items)
> +			break;
> +		nchunk = nchunk >> 1;
> +	} while (nchunk > 0);
>  	if (!items)
>  		return -ENOMEM;
> +
> +	/*
> +	 * Walk the rbtree copying items into the chunk of memory and then
> +	 * writing them to the checkpoint image
> +	 */
>  	ret = 0;
> -	i = 0;
>  	mutex_lock(&ep->mtx);
> -	for (rbp = rb_first(&ep->rbr); i < num_items && rbp; rbp = rb_next(rbp),
> -	     i++) {
> -		struct epitem *epi;
> -		int objref;
> -
> -		epi = rb_entry(rbp, struct epitem, rbn);
> -		items[i].fd = epi->ffd.fd;
> -		items[i].events = epi->event.events;
> -		items[i].data = epi->event.data;
> -		objref = ckpt_obj_lookup(ctx, epi->ffd.file, CKPT_OBJ_FILE);
> -		if (objref <= 0) {
> -			ret = -EBUSY; /* missing item -- checkpoint obj leak */
> -			break;
> +	rbp = rb_first(&ep->rbr);
> +	while ((num_items > 0) && rbp) {
> +		int n = min(num_items, nchunk);
> +		int j;
> +
> +		for (j = 0; rbp && j < n; j++, rbp = rb_next(rbp)) {
> +			struct epitem *epi;
> +			int objref;
> +
> +			epi = rb_entry(rbp, struct epitem, rbn);
> +			items[j].fd = epi->ffd.fd;
> +			items[j].events = epi->event.events;
> +			items[j].data = epi->event.data;
> +			objref = ckpt_obj_lookup(ctx, epi->ffd.file,
> +						 CKPT_OBJ_FILE);
> +			if (objref <= 0)
> +				goto unlock;
> +			items[j].file_objref = objref;
>  		}
> -		items[i].file_objref = objref;
> +		ret = ckpt_kwrite(ctx, items, n*sizeof(*items));
> +		if (ret < 0)
> +			break;
> +		num_items -= n;
>  	}
> +unlock:
>  	mutex_unlock(&ep->mtx);
> -	if (i == num_items && rbp)
> -		ret = -EBUSY; /* extra item(s) -- checkpoint obj leak */
> -	if (!ret)
> -		ret = ckpt_write_buffer(ctx, items, sizeof(*items)*num_items);
> -	else
> -		ckpt_write_err(ctx, "E", "checkpoint leak detected.\n", ret);
>  	kfree(items);
> +	if (num_items != 0 || (num_items == 0 && rbp))
> +		ret = -EBUSY; /* extra item(s) -- checkpoint obj leak */
> +	if (ret)
> +		ckpt_write_err(ctx, "E", " checkpointing epoll items.\n", ret);
>  	return ret;
>  }
> 
> -- 
> 1.5.6.3

