From: Oren Laadan <orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
To: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Containers
<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [PATCH 2/2] [RFC] Add checkpoint/restart support for epoll files.
Date: Tue, 25 Aug 2009 13:42:20 -0400 [thread overview]
Message-ID: <4A94227C.9050101@librato.com> (raw)
In-Reply-To: <20090825051723.GD8078-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
Matt Helsley wrote:
> On Thu, Aug 20, 2009 at 12:33:38PM -0400, Oren Laadan wrote:
>> Hi Matt,
>>
>> I like the approach of the patch and use of deferqueue.
>> See comments below.
>>
>> Matt Helsley wrote:
>>> Save/restore epoll items during checkpoint/restart respectively.
>>>
>>> Tests for the cr_tests suite to follow. Tests pass on i386.
>>>
>>> TODOs (search the patch for "TODO") that could probably use some
>>> comments:
>>>
[...]
>>> +
>>> +struct epoll_deferq_entry {
>>> + struct ckpt_ctx *ctx;
>>> + struct file *epfile;
>>> +};
>> I suggest that:
>>
>> 1) Add 'int fd' here,
>> 2) in ckpt_eventpoll_items use the fd (instead of file objref)
>>
>> The motivation is security: if, on restart, you rely on a file objref,
>> then malicious user can change the file objref and the restart logic
>> may end up populating some other process's event poll items.
>>
>> Instead, by using an fd, we ensure that the restarting process will
>> only modify a file that it actually owns.
>
> The fd saved with the items is not necessarily matched with the file*
> anymore. For example userspace could do:
>
> fda = open("/dev/null", O_RDONLY);
> epoll_ctl(efd, EPOLL_CTL_ADD, fda, ...);
> fdb = dup(fda);
> dup2(0, fda);
> <checkpoint here>
> epoll_ctl(efd, EPOLL_CTL_MOD, fda, ...);
>
> Saving the fd would associate stdin with the epoll item and not
> /dev/null.
>
> So we can't use the fd in place of the file reference.
You are totally right. I was confused by the fact that epoll data
does save the original fd, and assumed it was "updated" at close()
and dup() time. But not...
... which makes me wonder: is it impossible for a user to delete
the original fd from the epoll using epoll_ctl(.., EPOLL_CTL_DEL...) ?
This is related to my concern during restart: the input can provide
_any_ file objref that is valid at that time during restart, and
plug in files owned by different processes than intended.
I'm unsure to what extent this is a security concern, and what's
the right approach to solve it.
Looking at ep_cmp_ffd(), it uses both the file pointer and the fd
to find a matching entry in the RB tree; So I'd think that in that
case you need to save both the original FD and the objref of the
file, no ?
>>> +static int ep_items_restore(void *data)
>>> +{
>>> + struct ckpt_ctx *ctx = *((struct ckpt_ctx**)data);
>> This is really obfuscated :(
>
> Tell that to the author of deferqueue. ;) Since deferqueue does memcpy()
> there's really not a better way to pass the ctx pointer.
Next time I'll use my right to remain silent :p
Seriously, maybe we can add a wrapper for this (common ?) case of
a passing a single pointer as data. Do you think this interface
may be usedful:
deferqueue_add_ptr(head, ptr, func, dtor)
{
return deferqueue(head, &ptr, sizeof(ptr), func, dtor);
}
and
void *deferqueue_data_ptr(void *data)
{
return *((void **) data);
}
[...]
>>> + struct ckpt_eventpoll_items *h;
>>> + struct eventpoll *ep;
>>> + struct file *epfile = NULL;
>>> + int ret, i = 0, remaining_watches;
>>> +
>>> + /*
>>> + * TODO possible kmalloc failure due to too many watches.
>>> + */
>>> + h = ckpt_read_obj(ctx, 0,
>>> + sizeof(*h) + max_user_watches*sizeof(h->items[0]));
>> Any reason to perfer ckpt_read_obj() over ckpt_read_buf_type() ?
>> (it will rid the check for type/len below).
>
> So I could read it all at once rather than fiddle with reading the
> header and chunking all of the input. I wanted to get something simple
> working first and then make it work with thousands of watches.
Hmm.. ckpt_read_buf_type() calls ckpt_read_obj() like you do, then
checks that the type matches. So you are left with the h->h.len sanity
check only.
[...]
>>> + if (!is_file_epoll(epfile))
>>> + goto out;
>>> +
>>> + ep = epfile->private_data;
>>> +
>>> + ret = -ENOSPC;
>>> + remaining_watches = (max_user_watches -
>>> + atomic_read(&ep->user->epoll_watches));
>>> + if (h->num_items > remaining_watches)
>>> + goto out;
>> This is tested anyway by ep_insert() below. Besides, it's racy here
>> (user may have another restart in parallel, and the check-modify isn't
>> atomic. Unlike with ep_insert(), this check is for a bunch of fd's not
>> a single one.
>
> Yes, it's racy. However, we're restarting so we're primarily adding watches.
> It's also possible we're restarting thousands of watches only to fail
> when we add the last few. This avoids all that useless work when possible.
Is this concern worth taking the code out of its "usual" place (the
common epoll code) and making a special case here ?
>>
>> struct ckpt_hdr_eventpoll_item {
> ^^^^^
> I strongly dislike this name change idea. There's no header for each item and
> it's obviously part of the checkpoint image. If anything, too many
> things have "_hdr_" in their names when they lack a ckpt_hdr or don't
> need one: ckpt_hdr_const, ckpt_hdr_pids, ckpt_hdr_socket, ckpt_hdr_sigset,
> ckpt_hdr_sigaction, ckpt_hdr_siginfo, ckpt_hdr_rlimit, ckpt_hdr_ipc_perms*)
I have no strong opinion in either way, and what you say makes sense.
I do strongly prefer a single common convention to all "ckpt_hdr"-less
structures (and one to all "ckpt_hdr"-full ones). When the patches for
this show up, I'll pull them :)
>
> (*We shouldn't need two adjacent headers for ckpt_hdr_ipc_{shm,msg,sem}.)
Right. I'll remove the ckpt-hdr from ckpt_{hdr_}ipc_perms.
Oren.
next prev parent reply other threads:[~2009-08-25 17:42 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-20 5:17 [PATCH 1/2] Check for valid destructor pointer before calling it Matt Helsley
[not found] ` <6d896e12f316e5ff5f44bd13ae1482dab7f9253a.1250745409.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-08-20 5:17 ` [PATCH 2/2] [RFC] Add checkpoint/restart support for epoll files Matt Helsley
[not found] ` <e820d207a3b1a3228ded29bce9b29f2179ab598b.1250745409.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-08-20 16:33 ` Oren Laadan
[not found] ` <4A8D7AE2.1030705-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-08-25 5:17 ` Matt Helsley
[not found] ` <20090825051723.GD8078-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2009-08-25 17:42 ` Oren Laadan [this message]
[not found] ` <4A94227C.9050101-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-08-26 9:00 ` Matt Helsley
2009-08-24 21:27 ` Serge E. Hallyn
[not found] ` <20090824212725.GA25030-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-08-25 2:01 ` Matt Helsley
[not found] ` <20090825020146.GC8078-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2009-08-25 4:17 ` Oren Laadan
[not found] ` <4A9365C4.3010305-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-08-25 12:09 ` Matt Helsley
2009-08-20 12:28 ` [PATCH 1/2] Check for valid destructor pointer before calling it Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A94227C.9050101@librato.com \
--to=orenl-rdfvbdnroixbdgjk7y7tuq@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.