From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: [PATCH 2/3] epoll: Add support for checkpointing large numbers of epoll items
Date: Wed, 21 Oct 2009 09:59:50 -0500
Message-ID: <20091021145950.GA13327@us.ibm.com>
In-Reply-To: <d0fd1f3eb4eaa326488f59955e5b4790080f3073.1255971848.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Currently we allocate memory to output all of the epoll items in one
> big chunk. At 20 bytes per item, and since epoll was designed to
> support on the order of 10,000 items, we may find ourselves kmalloc'ing
> 200,000 bytes. With 4 KiB pages that's an order 6 allocation, whereas the
> threshold for costly allocations, PAGE_ALLOC_COSTLY_ORDER, is 3.
>
> Instead, output the epoll header and items separately. Chunk the output
> much like the pid array gets chunked. This ensures that even sub-order 0
> allocations will enable checkpoint of large epoll sets. A subsequent
> patch will do something similar for the restore path.
>
> Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Feels a bit auto-tune-magic-happy :) but looks good.
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> fs/eventpoll.c | 71 ++++++++++++++++++++++++++++++++++++-------------------
> 1 files changed, 46 insertions(+), 25 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4706ec5..2506b40 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1480,7 +1480,7 @@ static int ep_items_checkpoint(void *data)
>  	struct rb_node *rbp;
>  	struct eventpoll *ep;
>  	__s32 epfile_objref;
> -	int i, num_items, ret;
> +	int num_items = 0, nchunk, ret;
>
>  	ctx = dq_entry->ctx;
>
> @@ -1489,9 +1489,8 @@ static int ep_items_checkpoint(void *data)
>
>  	ep = dq_entry->epfile->private_data;
>  	mutex_lock(&ep->mtx);
> -	for (i = 0, rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp), i++) {}
> +	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp), num_items++) {}
>  	mutex_unlock(&ep->mtx);
> -	num_items = i;
>
>  	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_EPOLL_ITEMS);
>  	if (!h)
> @@ -1503,36 +1502,58 @@ static int ep_items_checkpoint(void *data)
>  	if (ret || !num_items)
>  		return ret;
>
> -	items = kzalloc(sizeof(*items)*num_items, GFP_KERNEL);
> +	ret = ckpt_write_obj_type(ctx, NULL, sizeof(*items)*num_items,
> +				  CKPT_HDR_BUFFER);
> +	if (ret < 0)
> +		return ret;
> +
> +	nchunk = num_items;
> +	do {
> +		items = kzalloc(sizeof(*items)*nchunk, GFP_KERNEL);
> +		if (items)
> +			break;
> +		nchunk = nchunk >> 1;
> +	} while (nchunk > 0);
>  	if (!items)
>  		return -ENOMEM;
> +
> +	/*
> +	 * Walk the rbtree copying items into the chunk of memory and then
> +	 * writing them to the checkpoint image
> +	 */
>  	ret = 0;
> -	i = 0;
>  	mutex_lock(&ep->mtx);
> -	for (rbp = rb_first(&ep->rbr); i < num_items && rbp; rbp = rb_next(rbp),
> -	     i++) {
> -		struct epitem *epi;
> -		int objref;
> -
> -		epi = rb_entry(rbp, struct epitem, rbn);
> -		items[i].fd = epi->ffd.fd;
> -		items[i].events = epi->event.events;
> -		items[i].data = epi->event.data;
> -		objref = ckpt_obj_lookup(ctx, epi->ffd.file, CKPT_OBJ_FILE);
> -		if (objref <= 0) {
> -			ret = -EBUSY; /* missing item -- checkpoint obj leak */
> -			break;
> +	rbp = rb_first(&ep->rbr);
> +	while ((num_items > 0) && rbp) {
> +		int n = min(num_items, nchunk);
> +		int j;
> +
> +		for (j = 0; rbp && j < n; j++, rbp = rb_next(rbp)) {
> +			struct epitem *epi;
> +			int objref;
> +
> +			epi = rb_entry(rbp, struct epitem, rbn);
> +			items[j].fd = epi->ffd.fd;
> +			items[j].events = epi->event.events;
> +			items[j].data = epi->event.data;
> +			objref = ckpt_obj_lookup(ctx, epi->ffd.file,
> +						 CKPT_OBJ_FILE);
> +			if (objref <= 0)
> +				goto unlock;
> +			items[j].file_objref = objref;
>  		}
> -		items[i].file_objref = objref;
> +		ret = ckpt_kwrite(ctx, items, n*sizeof(*items));
> +		if (ret < 0)
> +			break;
> +		num_items -= n;
>  	}
> +unlock:
>  	mutex_unlock(&ep->mtx);
> -	if (i == num_items && rbp)
> -		ret = -EBUSY; /* extra item(s) -- checkpoint obj leak */
> -	if (!ret)
> -		ret = ckpt_write_buffer(ctx, items, sizeof(*items)*num_items);
> -	else
> -		ckpt_write_err(ctx, "E", "checkpoint leak detected.\n", ret);
>  	kfree(items);
> +	if (num_items != 0 || (num_items == 0 && rbp))
> +		ret = -EBUSY; /* extra item(s) -- checkpoint obj leak */
> +	if (ret)
> +		ckpt_write_err(ctx, "E", " checkpointing epoll items.\n", ret);
>  	return ret;
>  }
>
> --
> 1.5.6.3
>
>