From: Oren Laadan <orenl@cs.columbia.edu>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: containers@lists.linux-foundation.org, jeremy@goop.org,
linux-kernel@vger.kernel.org, arnd@arndb.de
Subject: Re: [RFC v3][PATCH 5/9] Memory managemnet (restore)
Date: Thu, 11 Sep 2008 03:37:42 -0400 [thread overview]
Message-ID: <48C8CAC6.3090209@cs.columbia.edu> (raw)
In-Reply-To: <1221082922.6781.62.camel@nimitz>
Dave Hansen wrote:
> On Tue, 2008-09-09 at 02:01 -0400, Oren Laadan wrote:
>>> Have you looked at mprotect_fixup()? It deals with two things:
>>> 1. altering the commit charge against RSS if the mapping is actually
>>> writable.
>>> 2. Merging the VMA with an adjacent one if possible
>>>
>>> We don't want to do either of these two things. Even if we do merge the
>>> VMA, it will be a waste of time and energy since we'll just re-split it
>>> when we mprotect() again.
>> Your observation is correct; I chose this interface because it's really
>> simple and handy. I'm not worried about the performance because such VMAs
>> (read only but modified) are really rare, and the code can be optimized
>> later on.
>
> The worry is that it will never get cleaned up, and it is basically
> cruft as it stands. People may think that it is here protecting or
> fixing something that it is not.
Let me start with the bottom line - since this creates too much confusion,
I'll just switch to the alternative: will use get_user_pages() to bring
pages in and copy the data directly. Hopefully this will end the discussion.
(Note, there there is a performance penalty in the form of extra data copy:
instead of reading data directly to the page, we instead read into a buffer,
kmap_atomic the page and copy into the page).
>
>>>>>> + /* restore original protection for this vma */
>>>>>> + if (!(cr_vma->vm_flags & VM_WRITE))
>>>>>> + ret = cr_vma_writable(mm, cr_vma->vm_start, cr_vma->vm_end, 0);
>>>>>> +
>>>>>> + out:
>>>>>> + return ret;
>>>>>> +}
>>>>> Ugh. Is this a security hole? What if the user was not allowed to
>>>>> write to the file being mmap()'d by this VMA? Is this a window where
>>>>> someone could come in and (using ptrace or something similar) write to
>>>>> the file?
>>>> Not a security hole: this is only for private memory, so it never
>>>> modifies the underlying file. This is related to what I explained before
>>>> about read-only VMAs that have modified pages.
>>> OK, so a shared, read-only mmap() should never get into this code path.
>>> What if an attacker modified the checkpoint file to pretend to have
>>> pages for a read-only, but shared mmap(). Would this code be tricked?
>> VMAs of shared maps (IPC, anonymous shared) will be treated differently.
>>
>> VMAs of shared files (mapped shared) are saved without their contents,
>> as the contents remains available on the file system ! (yes, for that
>> we will eventually need file system snapshots).
>>
>> As for an attack that provides an altered checkpoint image: since we
>> (currently) don't escalate privileges, the attacker will not be able
>> to modify something that it doesn't have access to in the first place.
>
> I bugged Serge about this. He said that this, at least, bypasses the SE
> Linux checks that are normally done with an mprotect() system call.
> That's a larger design problem that we need to keep in mind: we need to
> be careful to keep existing checks in place.
I also discussed this with Serge, and I got the impression that he
agreed that there was no security issue because it was all and only
about private memory.
>
>>>> The process is restarting, inside a container that is restarting. All
>>>> tasks inside should be calling sys_restart() (by design) and no other
>>>> process from outside should be allowed to ptrace them at this point.
>>> Are there plans to implement this, or is it already in here somehow?
>> Once we get positive responses about the current patchset, the next
>> step is to handle multiple processes: I plan to extend the freezer
>> with two more state for this purpose (dumping, restarting).
>
> OK, but I just asked you why a ptrace() of a process during this
> elevated privilege operation couldn't potentially do something bad. You
> responded that, by design, we can't ptrace things. The design is all
> well and good, but the patch isn't, because it doesn't implement that
> design. :( Before we get these merged, that needs to get resolved.
If a task is ptraced, then the tracer can easily arrange for the tracee
to call mprotect(), or to call sys_restart() with a tampered checkpoint
file, or do other tricks. The call to mprotect_fix(), on a private vma,
does not make this any worse. That is why I didn't bother implementing
that bit.
>
>>>> (In any case, if some other tasks ptraces this task, it can make it do
>>>> anything anyhow).
>>> No. I'm suggesting that since this lets you effectively write to
>>> something that is not writable, it may be a hole with which to bypass
>>> permissions which were set up at an earlier time.
>> That's a good comment, but here all we are doing here is to modify a
>> privately mapped/anonymous memory.
>>
>>>>> We copy into the process address space all the time when not in its
>>>>> context explicitly.
>>>> Huh ?
>>> I'm just saying that you don't need to be in a process's context in
>>> order to copy contents into its virtual address space. Check out
>>> access_process_vm().
>>>
>> That would be the other way to implement the restart. But, since restart
>> executes in task's context, it's simpler and more efficient to leverage
>> copy-to-user().
>> In terms of security, both methods brings about the same end results: the
>> memory is modified (perhaps bypassing the read-only property of the VMA)
>
> But copy_to_user() is fundamentally different. It writes *over*
> contents and in to files. Simulating a fault fills in those pages, but
> it never writes over things or in to files. Faulting is fundamentally
> safer.
copy_to_user() does not write into a file with private VMAs.
copy_to_user() in our case will always trigger a page fault.
copy_to_user() is faster as it does not require an extra copy.
>
> Faulting today can also handle populating a memory area with pages that
> appear read-only via userspace. That's exactly what we're doing here as
> well.
>
> Anyway, I don't expect that you'll agree with this. I'll prototype
> doing it the other way at some point and we can compare how both look.
Back to bottom line - whether or not I agree - I already changed the code
to use get_user_pages() and got rid of this controversy.
Oren.
next prev parent reply other threads:[~2008-09-11 7:39 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-04 7:57 [RFC v3][PATCH 0/9] Kernel based checkpoint/restart Oren Laadan
2008-09-04 8:02 ` [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-09-04 8:37 ` Cedric Le Goater
[not found] ` <Pine.LNX.4.64.0809040401320.5982-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-09-04 8:37 ` Cedric Le Goater
2008-09-04 14:42 ` Serge E. Hallyn
2008-09-04 14:42 ` Serge E. Hallyn
2008-09-04 17:32 ` Oren Laadan
[not found] ` <48C01B92.60900-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-04 20:37 ` Serge E. Hallyn
2008-09-04 20:37 ` Serge E. Hallyn
[not found] ` <20080904203730.GA28313-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-09-04 21:05 ` Oren Laadan
2008-09-04 21:05 ` Oren Laadan
2008-09-04 22:03 ` Serge E. Hallyn
[not found] ` <48C04D7C.6020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-04 22:03 ` Serge E. Hallyn
[not found] ` <20080904144223.GA19364-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-09-04 17:32 ` Oren Laadan
2008-09-08 15:02 ` [Devel] " Andrey Mirkin
2008-09-08 15:02 ` Andrey Mirkin
[not found] ` <200809081902.33709.amirkin-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2008-09-08 16:07 ` Cedric Le Goater
2008-09-08 16:07 ` Cedric Le Goater
2008-09-04 8:02 ` [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart Oren Laadan
[not found] ` <Pine.LNX.4.64.0809040402170.5982-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-09-04 9:12 ` Louis Rilling
2008-09-04 9:12 ` Louis Rilling
[not found] ` <20080904091230.GW14473-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-09-04 16:00 ` Serge E. Hallyn
2008-09-04 16:00 ` Serge E. Hallyn
2008-09-04 16:03 ` Serge E. Hallyn
2008-09-04 16:03 ` Serge E. Hallyn
[not found] ` <20080904160311.GC19364-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-09-04 16:09 ` Dave Hansen
2008-09-04 16:09 ` Dave Hansen
2008-09-04 8:03 ` [RFC v3][PATCH 3/9] x86 support for checkpoint/restart Oren Laadan
2008-09-04 8:03 ` [RFC v3][PATCH 4/9] Memory management (dump) Oren Laadan
[not found] ` <Pine.LNX.4.64.0809040403120.5982-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-09-04 18:25 ` Dave Hansen
2008-09-04 18:25 ` Dave Hansen
2008-09-07 1:54 ` Oren Laadan
2008-09-07 1:54 ` Oren Laadan
2008-09-08 15:55 ` Dave Hansen
[not found] ` <48C3343D.9000407-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-08 15:55 ` Dave Hansen
2008-09-04 8:04 ` [RFC v3][PATCH 5/9] Memory managemnet (restore) Oren Laadan
2008-09-04 18:08 ` Dave Hansen
2008-09-07 3:09 ` Oren Laadan
[not found] ` <48C345D2.1020603-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-08 16:49 ` Dave Hansen
2008-09-08 16:49 ` Dave Hansen
2008-09-09 6:01 ` Oren Laadan
2008-09-09 6:01 ` Oren Laadan
2008-09-10 21:42 ` Dave Hansen
2008-09-10 22:00 ` Cleanups for: [PATCH " Dave Hansen
2008-09-10 22:00 ` Dave Hansen
2008-09-11 7:37 ` [RFC v3][PATCH " Oren Laadan
2008-09-11 7:37 ` Oren Laadan [this message]
2008-09-11 15:38 ` Serge E. Hallyn
2008-09-12 16:34 ` Dave Hansen
[not found] ` <48C8CAC6.3090209-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-11 15:38 ` Serge E. Hallyn
2008-09-12 16:34 ` Dave Hansen
[not found] ` <48C6113A.3080804-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-10 21:42 ` Dave Hansen
2008-09-07 3:09 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.0809040404060.5982-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-09-04 18:08 ` Dave Hansen
2008-09-04 8:04 ` [RFC v3][PATCH 6/9] Checkpoint/restart: initial documentation Oren Laadan
2008-09-04 8:05 ` [RFC v3][PATCH 7/9] Infrastructure for shared objects Oren Laadan
2008-09-04 9:38 ` Louis Rilling
2008-09-04 14:23 ` Oren Laadan
[not found] ` <20080904093803.GX14473-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-09-04 14:23 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.0809040404550.5982-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-09-04 9:38 ` Louis Rilling
2008-09-04 18:14 ` Dave Hansen
2008-09-04 18:14 ` Dave Hansen
2008-09-04 8:05 ` [RFC v3][PATCH 8/9] File descriprtors (dump) Oren Laadan
2008-09-04 9:47 ` Louis Rilling
[not found] ` <20080904094740.GY14473-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-09-04 14:43 ` Oren Laadan
2008-09-04 14:43 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.0809040405250.5982-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-09-04 9:47 ` Louis Rilling
2008-09-04 15:01 ` Dave Hansen
2008-09-04 18:41 ` Dave Hansen
2008-09-04 18:41 ` Dave Hansen
2008-09-07 4:52 ` Oren Laadan
[not found] ` <48C35DFC.9080903-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-09-08 16:57 ` Dave Hansen
2008-09-08 16:57 ` Dave Hansen
2008-09-07 4:52 ` Oren Laadan
2008-09-04 15:01 ` Dave Hansen
2008-09-04 8:06 ` [RFC v3][PATCH 9/9] File descriprtors (restore) Oren Laadan
[not found] ` <Pine.LNX.4.64.0809040354440.460-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-09-04 8:02 ` [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-09-04 8:02 ` [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart Oren Laadan
2008-09-04 8:03 ` [RFC v3][PATCH 3/9] x86 support for checkpoint/restart Oren Laadan
2008-09-04 8:03 ` [RFC v3][PATCH 4/9] Memory management (dump) Oren Laadan
2008-09-04 8:04 ` [RFC v3][PATCH 5/9] Memory managemnet (restore) Oren Laadan
2008-09-04 8:04 ` [RFC v3][PATCH 6/9] Checkpoint/restart: initial documentation Oren Laadan
2008-09-04 8:05 ` [RFC v3][PATCH 7/9] Infrastructure for shared objects Oren Laadan
2008-09-04 8:05 ` [RFC v3][PATCH 8/9] File descriprtors (dump) Oren Laadan
2008-09-04 8:06 ` [RFC v3][PATCH 9/9] File descriprtors (restore) Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48C8CAC6.3090209@cs.columbia.edu \
--to=orenl@cs.columbia.edu \
--cc=arnd@arndb.de \
--cc=containers@lists.linux-foundation.org \
--cc=dave@linux.vnet.ibm.com \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.