From: Oren Laadan <orenl@cs.columbia.edu>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: containers@lists.linux-foundation.org, jeremy@goop.org,
linux-kernel@vger.kernel.org, arnd@arndb.de
Subject: Re: [RFC v3][PATCH 5/9] Memory managemnet (restore)
Date: Thu, 11 Sep 2008 03:37:42 -0400
Message-ID: <48C8CAC6.3090209@cs.columbia.edu>
In-Reply-To: <1221082922.6781.62.camel@nimitz>
Dave Hansen wrote:
> On Tue, 2008-09-09 at 02:01 -0400, Oren Laadan wrote:
>>> Have you looked at mprotect_fixup()? It deals with two things:
>>> 1. altering the commit charge against RSS if the mapping is actually
>>> writable.
>>> 2. Merging the VMA with an adjacent one if possible
>>>
>>> We don't want to do either of these two things. Even if we do merge the
>>> VMA, it will be a waste of time and energy since we'll just re-split it
>>> when we mprotect() again.
>> Your observation is correct; I chose this interface because it's really
>> simple and handy. I'm not worried about the performance because such VMAs
>> (read only but modified) are really rare, and the code can be optimized
>> later on.
>
> The worry is that it will never get cleaned up, and it is basically
> cruft as it stands. People may think that it is here protecting or
> fixing something that it is not.
Let me start with the bottom line: since this creates too much confusion,
I'll just switch to the alternative and use get_user_pages() to bring the
pages in and copy the data directly. Hopefully this will end the discussion.
(Note that there is a performance penalty in the form of an extra data copy:
instead of reading data directly into the page, we read into a buffer,
kmap_atomic() the page, and copy from the buffer into the page.)
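To make the extra copy concrete, here is a rough userspace analogy of the
two data paths. It is only a sketch, not the kernel code from the patch:
the helper names (read_full, restore_page_direct, restore_page_staged) and
the caller-supplied bounce buffer are made up for illustration, and plain
read()/memcpy() stand in for the kernel-side read and kmap_atomic() copy.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from fd into dst; 0 on success, -1 otherwise. */
static int read_full(int fd, void *dst, size_t len)
{
	char *p = dst;

	assert(dst != NULL);
	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n <= 0)	/* error or premature end of image */
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Direct path: image data lands in the destination page in one step. */
int restore_page_direct(int fd, void *page, size_t len)
{
	return read_full(fd, page, len);
}

/* Staged path: data goes to a bounce buffer first, then is copied over. */
int restore_page_staged(int fd, void *page, size_t len, void *bounce)
{
	if (read_full(fd, bounce, len) != 0)
		return -1;
	memcpy(page, bounce, len);	/* this is the extra copy */
	return 0;
}
```

Both helpers leave the same bytes in the destination; the staged variant
simply touches every byte twice.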
>
>>>>>> + /* restore original protection for this vma */
>>>>>> + if (!(cr_vma->vm_flags & VM_WRITE))
>>>>>> + ret = cr_vma_writable(mm, cr_vma->vm_start, cr_vma->vm_end, 0);
>>>>>> +
>>>>>> + out:
>>>>>> + return ret;
>>>>>> +}
>>>>> Ugh. Is this a security hole? What if the user was not allowed to
>>>>> write to the file being mmap()'d by this VMA? Is this a window where
>>>>> someone could come in and (using ptrace or something similar) write to
>>>>> the file?
>>>> Not a security hole: this is only for private memory, so it never
>>>> modifies the underlying file. This is related to what I explained before
>>>> about read-only VMAs that have modified pages.
>>> OK, so a shared, read-only mmap() should never get into this code path.
>>> What if an attacker modified the checkpoint file to pretend to have
>>> pages for a read-only, but shared mmap(). Would this code be tricked?
>> VMAs of shared maps (IPC, anonymous shared) will be treated differently.
>>
>> VMAs of shared files (mapped shared) are saved without their contents,
>> as the contents remains available on the file system ! (yes, for that
>> we will eventually need file system snapshots).
>>
>> As for an attack that provides an altered checkpoint image: since we
>> (currently) don't escalate privileges, the attacker will not be able
>> to modify something that it doesn't have access to in the first place.
>
> I bugged Serge about this. He said that this, at least, bypasses the SE
> Linux checks that are normally done with an mprotect() system call.
> That's a larger design problem that we need to keep in mind: we need to
> be careful to keep existing checks in place.
I also discussed this with Serge, and I got the impression that he
agreed that there was no security issue because it was all and only
about private memory.
>
>>>> The process is restarting, inside a container that is restarting. All
>>>> tasks inside should be calling sys_restart() (by design) and no other
>>>> process from outside should be allowed to ptrace them at this point.
>>> Are there plans to implement this, or is it already in here somehow?
>> Once we get positive responses about the current patchset, the next
>> step is to handle multiple processes: I plan to extend the freezer
>> with two more state for this purpose (dumping, restarting).
>
> OK, but I just asked you why a ptrace() of a process during this
> elevated privilege operation couldn't potentially do something bad. You
> responded that, by design, we can't ptrace things. The design is all
> well and good, but the patch isn't, because it doesn't implement that
> design. :( Before we get these merged, that needs to get resolved.
If a task is ptraced, then the tracer can easily arrange for the tracee
to call mprotect(), or to call sys_restart() with a tampered checkpoint
file, or do other tricks. The call to mprotect_fixup(), on a private vma,
does not make this any worse. That is why I didn't bother implementing
that bit.
>
>>>> (In any case, if some other tasks ptraces this task, it can make it do
>>>> anything anyhow).
>>> No. I'm suggesting that since this lets you effectively write to
>>> something that is not writable, it may be a hole with which to bypass
>>> permissions which were set up at an earlier time.
>> That's a good comment, but here all we are doing here is to modify a
>> privately mapped/anonymous memory.
>>
>>>>> We copy into the process address space all the time when not in its
>>>>> context explicitly.
>>>> Huh ?
>>> I'm just saying that you don't need to be in a process's context in
>>> order to copy contents into its virtual address space. Check out
>>> access_process_vm().
>>>
>> That would be the other way to implement the restart. But, since restart
>> executes in task's context, it's simpler and more efficient to leverage
>> copy-to-user().
>> In terms of security, both methods brings about the same end results: the
>> memory is modified (perhaps bypassing the read-only property of the VMA)
>
> But copy_to_user() is fundamentally different. It writes *over*
> contents and in to files. Simulating a fault fills in those pages, but
> it never writes over things or in to files. Faulting is fundamentally
> safer.
copy_to_user() does not write into a file with private VMAs.
copy_to_user() in our case will always trigger a page fault.
copy_to_user() is faster as it does not require an extra copy.
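The first point can be checked from userspace. The sketch below is an
illustration only, not code from the patch: the function name and the
temporary file path are made up. It maps a file MAP_PRIVATE and read-only,
makes the mapping writable for a moment, writes into it (which takes a
copy-on-write fault), restores the read-only protection, and then confirms
that the file still holds its original contents while the mapping holds the
new data, i.e. a read-only VMA with modified pages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the file is unchanged and the private mapping keeps the
 * new data after being made read-only again, 0 if either check fails,
 * and -1 if the setup itself fails.
 */
int private_write_leaves_file_intact(void)
{
	char path[] = "/tmp/cr_private_XXXXXX";	/* hypothetical temp file */
	const char original[] = "original";
	const char restored[] = "restored";
	char from_file[sizeof(original)] = { 0 };
	long page = sysconf(_SC_PAGESIZE);
	char *p = MAP_FAILED;
	int result = -1;
	int fd;

	assert(page > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* file lives on until fd is closed */

	if (write(fd, original, sizeof(original)) != (ssize_t)sizeof(original))
		goto out_close;

	/* Private, read-only mapping of the file. */
	p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out_close;

	/* Make it writable, fill in the contents, restore the protection. */
	if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) != 0)
		goto out_unmap;
	memcpy(p, restored, sizeof(restored));	/* copy-on-write fault */
	if (mprotect(p, (size_t)page, PROT_READ) != 0)
		goto out_unmap;

	if (pread(fd, from_file, sizeof(original), 0) !=
	    (ssize_t)sizeof(original))
		goto out_unmap;

	result = memcmp(from_file, original, sizeof(original)) == 0 &&
		 memcmp(p, restored, sizeof(restored)) == 0;
out_unmap:
	munmap(p, (size_t)page);
out_close:
	close(fd);
	return result;
}
```

On Linux I would expect this to return 1: the write lands in a private
copy of the page and never reaches the file.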
>
> Faulting today can also handle populating a memory area with pages that
> appear read-only via userspace. That's exactly what we're doing here as
> well.
>
> Anyway, I don't expect that you'll agree with this. I'll prototype
> doing it the other way at some point and we can compare how both look.
Back to bottom line - whether or not I agree - I already changed the code
to use get_user_pages() and got rid of this controversy.
Oren.