Re: [RFC v3][PATCH 5/9] Memory managemnet (restore)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Oren Laadan <orenl@cs.columbia.edu>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: containers@lists.linux-foundation.org, jeremy@goop.org,
	linux-kernel@vger.kernel.org, arnd@arndb.de
Subject: Re: [RFC v3][PATCH 5/9] Memory managemnet (restore)
Date: Thu, 11 Sep 2008 03:37:42 -0400	[thread overview]
Message-ID: <48C8CAC6.3090209@cs.columbia.edu> (raw)
In-Reply-To: <1221082922.6781.62.camel@nimitz>



Dave Hansen wrote:
> On Tue, 2008-09-09 at 02:01 -0400, Oren Laadan wrote: 
>>> Have you looked at mprotect_fixup()?  It deals with two things:
>>> 1. altering the commit charge against RSS if the mapping is actually
>>>    writable.
>>> 2. Merging the VMA with an adjacent one if possible
>>>
>>> We don't want to do either of these two things.  Even if we do merge the
>>> VMA, it will be a waste of time and energy since we'll just re-split it
>>> when we mprotect() again.
>> Your observation is correct; I chose this interface because it's really
>> simple and handy. I'm not worried about the performance because such VMAs
>> (read only but modified) are really rare, and the code can be optimized
>> later on.
> 
> The worry is that it will never get cleaned up, and it is basically
> cruft as it stands.  People may think that it is here protecting or
> fixing something that it is not.

Let me start with the bottom line - since this creates too much confusion,
I'll just switch to the alternative: will use get_user_pages() to bring
pages in and copy the data directly. Hopefully this will end the discussion.

(Note, there there is a performance penalty in the form of extra data copy:
instead of reading data directly to the page, we instead read into a buffer,
kmap_atomic the page and copy into the page).

> 
>>>>>> +	/* restore original protection for this vma */
>>>>>> +	if (!(cr_vma->vm_flags & VM_WRITE))
>>>>>> +		ret = cr_vma_writable(mm, cr_vma->vm_start, cr_vma->vm_end, 0);
>>>>>> +
>>>>>> + out:
>>>>>> +	return ret;
>>>>>> +}
>>>>> Ugh.  Is this a security hole?  What if the user was not allowed to
>>>>> write to the file being mmap()'d by this VMA?  Is this a window where
>>>>> someone could come in and (using ptrace or something similar) write to
>>>>> the file?
>>>> Not a security hole: this is only for private memory, so it never
>>>> modifies the underlying file. This is related to what I explained before
>>>> about read-only VMAs that have modified pages.
>>> OK, so a shared, read-only mmap() should never get into this code path.
>>> What if an attacker modified the checkpoint file to pretend to have
>>> pages for a read-only, but shared mmap().  Would this code be tricked?
>> VMAs of shared maps (IPC, anonymous shared) will be treated differently.
>>
>> VMAs of shared files (mapped shared) are saved without their contents,
>> as the contents remains available on the file system !  (yes, for that
>> we will eventually need file system snapshots).
>>
>> As for an attack that provides an altered checkpoint image: since we
>> (currently) don't escalate privileges, the attacker will not be able
>> to modify something that it doesn't have access to in the first place.
> 
> I bugged Serge about this.  He said that this, at least, bypasses the SE
> Linux checks that are normally done with an mprotect() system call.
> That's a larger design problem that we need to keep in mind: we need to
> be careful to keep existing checks in place.

I also discussed this with Serge, and I got the impression that he
agreed that there was no security issue because it was all and only
about private memory.

> 
>>>> The process is restarting, inside a container that is restarting. All
>>>> tasks inside should be calling sys_restart() (by design) and no other
>>>> process from outside should be allowed to ptrace them at this point.
>>> Are there plans to implement this, or is it already in here somehow?
>> Once we get positive responses about the current patchset, the next
>> step is to handle multiple processes: I plan to extend the freezer
>> with two more state for this purpose (dumping, restarting).
> 
> OK, but I just asked you why a ptrace() of a process during this
> elevated privilege operation couldn't potentially do something bad.  You
> responded that, by design, we can't ptrace things.  The design is all
> well and good, but the patch isn't, because it doesn't implement that
> design. :(  Before we get these merged, that needs to get resolved.

If a task is ptraced, then the tracer can easily arrange for the tracee
to call mprotect(), or to call sys_restart() with a tampered checkpoint
file, or do other tricks. The call to mprotect_fix(), on a private vma,
does not make this any worse. That is why I didn't bother implementing
that bit.

> 
>>>> (In any case, if some other tasks ptraces this task, it can make it do
>>>> anything anyhow).
>>> No.  I'm suggesting that since this lets you effectively write to
>>> something that is not writable, it may be a hole with which to bypass
>>> permissions which were set up at an earlier time.
>> That's a good comment, but here all we are doing here is to modify a
>> privately mapped/anonymous memory.
>>
>>>>> We copy into the process address space all the time when not in its
>>>>> context explicitly.  
>>>> Huh ?
>>> I'm just saying that you don't need to be in a process's context in
>>> order to copy contents into its virtual address space.  Check out
>>> access_process_vm().
>>>
>> That would be the other way to implement the restart. But, since restart
>> executes in task's context, it's simpler and more efficient to leverage
>> copy-to-user().
>> In terms of security, both methods brings about the same end results: the
>> memory is modified (perhaps bypassing the read-only property of the VMA)
> 
> But copy_to_user() is fundamentally different.  It writes *over*
> contents and in to files.  Simulating a fault fills in those pages, but
> it never writes over things or in to files.   Faulting is fundamentally
> safer.

copy_to_user() does not write into a file with private VMAs.
copy_to_user() in our case will always trigger a page fault.
copy_to_user() is faster as it does not require an extra copy.

> 
> Faulting today can also handle populating a memory area with pages that
> appear read-only via userspace.  That's exactly what we're doing here as
> well.
> 
> Anyway, I don't expect that you'll agree with this.  I'll prototype
> doing it the other way at some point and we can compare how both look.

Back to bottom line - whether or not I agree - I already changed the code
to use get_user_pages() and got rid of this controversy.

Oren.

next prev parent reply	other threads:[~2008-09-11  7:39 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-04  7:57 [RFC v3][PATCH 0/9] Kernel based checkpoint/restart Oren Laadan
2008-09-04  8:02 ` [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-09-04  8:37   ` Cedric Le Goater
2008-09-04 14:42   ` Serge E. Hallyn
2008-09-04 17:32     ` Oren Laadan
2008-09-04 20:37       ` Serge E. Hallyn
2008-09-04 21:05         ` Oren Laadan
2008-09-04 22:03           ` Serge E. Hallyn
2008-09-08 15:02     ` [Devel] " Andrey Mirkin
2008-09-08 16:07       ` Cedric Le Goater
2008-09-04  8:02 ` [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart Oren Laadan
2008-09-04  9:12   ` Louis Rilling
2008-09-04 16:00     ` Serge E. Hallyn
2008-09-04 16:03   ` Serge E. Hallyn
2008-09-04 16:09     ` Dave Hansen
2008-09-04  8:03 ` [RFC v3][PATCH 3/9] x86 support for checkpoint/restart Oren Laadan
2008-09-04  8:03 ` [RFC v3][PATCH 4/9] Memory management (dump) Oren Laadan
2008-09-04 18:25   ` Dave Hansen
2008-09-07  1:54     ` Oren Laadan
2008-09-08 15:55       ` Dave Hansen
2008-09-04  8:04 ` [RFC v3][PATCH 5/9] Memory managemnet (restore) Oren Laadan
2008-09-04 18:08   ` Dave Hansen
2008-09-07  3:09     ` Oren Laadan
2008-09-08 16:49       ` Dave Hansen
2008-09-09  6:01         ` Oren Laadan
2008-09-10 21:42           ` Dave Hansen
2008-09-10 22:00             ` Cleanups for: [PATCH " Dave Hansen
2008-09-11  7:37             ` Oren Laadan [this message]
2008-09-11 15:38               ` [RFC v3][PATCH " Serge E. Hallyn
2008-09-12 16:34               ` Dave Hansen
2008-09-04  8:04 ` [RFC v3][PATCH 6/9] Checkpoint/restart: initial documentation Oren Laadan
2008-09-04  8:05 ` [RFC v3][PATCH 7/9] Infrastructure for shared objects Oren Laadan
2008-09-04  9:38   ` Louis Rilling
2008-09-04 14:23     ` Oren Laadan
2008-09-04 18:14   ` Dave Hansen
2008-09-04  8:05 ` [RFC v3][PATCH 8/9] File descriprtors (dump) Oren Laadan
2008-09-04  9:47   ` Louis Rilling
2008-09-04 14:43     ` Oren Laadan
2008-09-04 15:01   ` Dave Hansen
2008-09-04 18:41   ` Dave Hansen
2008-09-07  4:52     ` Oren Laadan
2008-09-08 16:57       ` Dave Hansen
2008-09-04  8:06 ` [RFC v3][PATCH 9/9] File descriprtors (restore) Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48C8CAC6.3090209@cs.columbia.edu \
    --to=orenl@cs.columbia.edu \
    --cc=arnd@arndb.de \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox