[patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
@ 2007-06-28 18:49 Davide Libenzi
  2007-06-29  2:57 ` Kyle Moffett
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2007-06-28 18:49 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Rik van Riel, Andy Isaacson

I was using oprofile to sample some userspace code I am working on,
and I was continuosly noticing  clear_page  in the top three entries
of the oprofile logs.
Also, a simple kernel build, in my Dual Opteron with 8GB of RAM,
shows  clear_page  as the first kernel entry, second only to the
userspace the  cc1  and  as.
Most of the userspace code uses malloc() (and anonymous mappings) in
such a way that the memory returned via kernel->glibc is immediately
written soon after. The POSIX malloc() definition itself also, does
not require the returned memory to be zeroed (as calloc() does).
So I implemented a rather quick hack that introduces a new mmap() flag
MAP_NOZERO (only valid for anonymous mappings) and the  vma  counter-part
VM_NOZERO. Also, a new sys_brk2() has been introduced to accept a new
flags  parameter. A brief description of the patches follows in the next
emails.
I first hacked Val's ebizzy to accept a new '-N' flag to make use of
MAP_NOZERO:

http://infohost.nmt.edu/~val/patches/ebizzy.tar.gz
http://www.xmailserver.org/ebizzy-nzmmap-0.3.diff

On my box,  ebizzy  performance jumped up from 10% to 15%.
The userspace code I am working on (uses malloc() quite heavily), saw
a performance jump of around 14%.
In both cases,  clear_page  dropped way down in the oprofile logs.
I then coded quick (and rather ugly) hacks for  glibc  and  gcc  to
make them use the new features (MAP_NOZERO and sys_brk2()):

http://www.xmailserver.org/glibc-nzmalloc-tweaks
http://www.xmailserver.org/gcc-nozero-hack

I then tried a 2.6.22-rc5 kernel build using the newly built  glibc
and  gcc  (with and w/out no-zero enabling options/env-vars), and
when using the no-zero mode,  clear_page  went way down in the oprofile
logs and build time dropped of about 2.5% to 3%.
I did not have time (and will) to tweak  as  and  ld  also.
These are some test utilities to verify the no-zero behaviour of MAP_NOZERO
(and sys_brk2()):

http://www.xmailserver.org/nzmmap-test.c
http://www.xmailserver.org/nzmalloc-test.c
http://www.xmailserver.org/smiffy.c

To run  nzmalloc-test  you need a patched glibc (using  glibc-nzmalloc-tweaks).
The  smiffy  one, should be run under a user that has no other processes
running and that owns no files on the system, and it verifies that all the
pages it gets from the kernel are zeroed (otherwise "Houston, we have a problem ...").
It is running on my system w/out barfing by more than two days.
How crazy is that?

- Davide

ChangeLog:

* Version 2

o Reusing _mapcount instead of adding a new field in the page struct
o Added a fix for a setuid+exec/ptrace race (Andy spotted)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-28 18:49 [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness Davide Libenzi
@ 2007-06-29  2:57 ` Kyle Moffett
  2007-06-29  3:04   ` Rik van Riel
                     ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Kyle Moffett @ 2007-06-29  2:57 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List, Rik van Riel, Andy Isaacson

On Jun 28, 2007, at 14:49:24, Davide Libenzi wrote:
> I was using oprofile to sample some userspace code I am working on,  
> and I was continuosly noticing  clear_page  in the top three  
> entries of the oprofile logs.
>
> Also, a simple kernel build, in my Dual Opteron with 8GB of RAM,  
> shows  clear_page  as the first kernel entry, second only to the  
> userspace the  cc1  and  as. Most of the userspace code uses malloc 
> () (and anonymous  mappings) in such a way that the memory returned  
> via kernel->glibc is immediately written soon after. The POSIX  
> malloc() definition itself also, does not require the returned  
> memory to be zeroed (as calloc() does).
>
> So I implemented a rather quick hack that introduces a new mmap()  
> flag MAP_NOZERO (only valid for anonymous mappings) and the  vma   
> counter-part VM_NOZERO. Also, a new sys_brk2() has been introduced  
> to accept a new flags  parameter. A brief description of the  
> patches follows in the next emails.

Hmm, sounds like this would also need a "MAP_NOREUSE" flag of some  
kind for security sensitive applications.  Basically, I wouldn't want  
my ssh-agent pages holding private SSH keys to be reused by my web  
browser which then gets exploited :-D.  It would also be a massive  
information leak under SELinux.  To fix it properly according to the  
SELinux model you would need to tag each page with a label  
immediately after it's freed and then do an access-vector-check  
against the old page and the new process before allowing reuse.  On  
the other hand, that would probably be at least as expensive as  
zeroing the page.

Cheers,
Kyle Moffett




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-29  2:57 ` Kyle Moffett
@ 2007-06-29  3:04   ` Rik van Riel
  2007-06-29  5:09     ` Ulrich Drepper
  2007-06-29  5:20   ` Davide Libenzi
  2007-06-29 19:39   ` Andy Isaacson
  2 siblings, 1 reply; 23+ messages in thread
From: Rik van Riel @ 2007-06-29  3:04 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Davide Libenzi, Linux Kernel Mailing List, Andy Isaacson

Kyle Moffett wrote:
> On Jun 28, 2007, at 14:49:24, Davide Libenzi wrote:
>> I was using oprofile to sample some userspace code I am working on, 
>> and I was continuosly noticing  clear_page  in the top three entries 
>> of the oprofile logs.
>>
>> Also, a simple kernel build, in my Dual Opteron with 8GB of RAM, 
>> shows  clear_page  as the first kernel entry, second only to the 
>> userspace the  cc1  and  as. Most of the userspace code uses malloc() 
>> (and anonymous  mappings) in such a way that the memory returned via 
>> kernel->glibc is immediately written soon after. The POSIX malloc() 
>> definition itself also, does not require the returned memory to be 
>> zeroed (as calloc() does).
>>
>> So I implemented a rather quick hack that introduces a new mmap() flag 
>> MAP_NOZERO (only valid for anonymous mappings) and the  vma  
>> counter-part VM_NOZERO. Also, a new sys_brk2() has been introduced to 
>> accept a new flags  parameter. A brief description of the patches 
>> follows in the next emails.
> 
> Hmm, sounds like this would also need a "MAP_NOREUSE" flag of some kind 
> for security sensitive applications.

That wants MAP_PRIVATE so that the kernel can also decide to not
swap these pages out to an unencrypted swap area.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-29  3:04   ` Rik van Riel
@ 2007-06-29  5:09     ` Ulrich Drepper
  0 siblings, 0 replies; 23+ messages in thread
From: Ulrich Drepper @ 2007-06-29  5:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Kyle Moffett, Davide Libenzi, Linux Kernel Mailing List,
	Andy Isaacson

On 6/28/07, Rik van Riel <riel@redhat.com> wrote:
> That wants MAP_PRIVATE so that the kernel can also decide to not
> swap these pages out to an unencrypted swap area.

That's not what MAP_PRIVATE means.  MAP_PRIVATE is the opposite of
MAP_SHARED.  It's meaningless for anonymous memory (which is what
ssh-agent etc would use) and for file-backed data it definitely allows
swapping unless you use mlock().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-29  2:57 ` Kyle Moffett
  2007-06-29  3:04   ` Rik van Riel
@ 2007-06-29  5:20   ` Davide Libenzi
  2007-06-29 19:39   ` Andy Isaacson
  2 siblings, 0 replies; 23+ messages in thread
From: Davide Libenzi @ 2007-06-29  5:20 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Linux Kernel Mailing List, Rik van Riel, Andy Isaacson

On Thu, 28 Jun 2007, Kyle Moffett wrote:

> On Jun 28, 2007, at 14:49:24, Davide Libenzi wrote:
> > I was using oprofile to sample some userspace code I am working on, and I
> > was continuosly noticing  clear_page  in the top three entries of the
> > oprofile logs.
> > 
> > Also, a simple kernel build, in my Dual Opteron with 8GB of RAM, shows
> > clear_page  as the first kernel entry, second only to the userspace the  cc1
> > and  as. Most of the userspace code uses malloc() (and anonymous  mappings)
> > in such a way that the memory returned via kernel->glibc is immediately
> > written soon after. The POSIX malloc() definition itself also, does not
> > require the returned memory to be zeroed (as calloc() does).
> > 
> > So I implemented a rather quick hack that introduces a new mmap() flag
> > MAP_NOZERO (only valid for anonymous mappings) and the  vma  counter-part
> > VM_NOZERO. Also, a new sys_brk2() has been introduced to accept a new flags
> > parameter. A brief description of the patches follows in the next emails.
> 
> Hmm, sounds like this would also need a "MAP_NOREUSE" flag of some kind for
> security sensitive applications.  Basically, I wouldn't want my ssh-agent
> pages holding private SSH keys to be reused by my web browser which then gets
> exploited :-D

Well, if your web browser and your ssh session are running under the same 
user, and your web browser gets hacked, someone is basically logged in in 
your system under your user. It can ptrace-attach your ssh-agent and take 
it from there. It can also read your .ssh directory for what it matter and 
more simply ;)



> It would also be a massive information leak under SELinux.  To
> fix it properly according to the SELinux model you would need to tag each page
> with a label immediately after it's freed and then do an access-vector-check
> against the old page and the new process before allowing reuse.  On the other
> hand, that would probably be at least as expensive as zeroing the page.

SeLinux could use a simple hook and disable the feature per-task and 
globally. Just assign an invalid UID to mm->owner_uid and pages will never 
be used. It could also hook mmap and clear off MAP_NOZERO.



- Davide



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-29  2:57 ` Kyle Moffett
  2007-06-29  3:04   ` Rik van Riel
  2007-06-29  5:20   ` Davide Libenzi
@ 2007-06-29 19:39   ` Andy Isaacson
  2007-06-29 20:12     ` Davide Libenzi
  2 siblings, 1 reply; 23+ messages in thread
From: Andy Isaacson @ 2007-06-29 19:39 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Davide Libenzi, Linux Kernel Mailing List, Rik van Riel

On Thu, Jun 28, 2007 at 10:57:00PM -0400, Kyle Moffett wrote:
> On Jun 28, 2007, at 14:49:24, Davide Libenzi wrote:
> >So I implemented a rather quick hack that introduces a new mmap()  
> >flag MAP_NOZERO (only valid for anonymous mappings) and the  vma   
> >counter-part VM_NOZERO. Also, a new sys_brk2() has been introduced  
> >to accept a new flags  parameter. A brief description of the  
> >patches follows in the next emails.
> 
> Hmm, sounds like this would also need a "MAP_NOREUSE" flag of some  
> kind for security sensitive applications.  Basically, I wouldn't want  
> my ssh-agent pages holding private SSH keys to be reused by my web  
> browser which then gets exploited :-D.

PGP at least (and I think GPG still) did overwrite keys before calling
free(), and attempted to use mlock().  Looks like ssh-agent doesn't use
mlock -- at least it hasn't in this case:
% grep Lck /proc/`pidof ssh-agent`/status
VmLck:         0 kB
% ulimit -a | grep lock
file size (blocks)         unlimited
core file size (blocks)    0
locked-in-memory size (kb) 32
file locks                 unlimited

Requiring security-sensitive apps to use a new flag to get safe behavior
is dangerous.  Better to be safe by default and turn on the
less-safe-but-faster behavior for the cases that benefit from it.

> It would also be a massive  
> information leak under SELinux.  To fix it properly according to the  
> SELinux model you would need to tag each page with a label  
> immediately after it's freed and then do an access-vector-check  
> against the old page and the new process before allowing reuse.  On  
> the other hand, that would probably be at least as expensive as  
> zeroing the page.

I still think that using uid in mm_struct is wrong, and some kind of
abstraction is required.  I called this "free pool" in
<20070628061911.GA16986@hexapodia.org>, but I think that name is
misleading -- I am not proposing that this should be part of the
management of free pages, but should be a label which abstracts "safe to
share freed pages among" groups.  Then different SELinux protection
domains would simply have different labels.

-andy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-29 19:39   ` Andy Isaacson
@ 2007-06-29 20:12     ` Davide Libenzi
  2007-06-29 23:48       ` Kyle Moffett
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2007-06-29 20:12 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Kyle Moffett, Linux Kernel Mailing List, Rik van Riel

On Fri, 29 Jun 2007, Andy Isaacson wrote:

> On Thu, Jun 28, 2007 at 10:57:00PM -0400, Kyle Moffett wrote:
> > On Jun 28, 2007, at 14:49:24, Davide Libenzi wrote:
> > >So I implemented a rather quick hack that introduces a new mmap()  
> > >flag MAP_NOZERO (only valid for anonymous mappings) and the  vma   
> > >counter-part VM_NOZERO. Also, a new sys_brk2() has been introduced  
> > >to accept a new flags  parameter. A brief description of the  
> > >patches follows in the next emails.
> > 
> > Hmm, sounds like this would also need a "MAP_NOREUSE" flag of some  
> > kind for security sensitive applications.  Basically, I wouldn't want  
> > my ssh-agent pages holding private SSH keys to be reused by my web  
> > browser which then gets exploited :-D.
> 
> PGP at least (and I think GPG still) did overwrite keys before calling
> free(), and attempted to use mlock().  Looks like ssh-agent doesn't use
> mlock -- at least it hasn't in this case:
> % grep Lck /proc/`pidof ssh-agent`/status
> VmLck:         0 kB
> % ulimit -a | grep lock
> file size (blocks)         unlimited
> core file size (blocks)    0
> locked-in-memory size (kb) 32
> file locks                 unlimited
> 
> Requiring security-sensitive apps to use a new flag to get safe behavior
> is dangerous.  Better to be safe by default and turn on the
> less-safe-but-faster behavior for the cases that benefit from it.

Can you better explain what MAP_NOZERO would alter in such case?



> I still think that using uid in mm_struct is wrong, and some kind of
> abstraction is required.  I called this "free pool" in
> <20070628061911.GA16986@hexapodia.org>, but I think that name is
> misleading -- I am not proposing that this should be part of the
> management of free pages, but should be a label which abstracts "safe to
> share freed pages among" groups.  Then different SELinux protection
> domains would simply have different labels.

I think I answered this one at least a couple of times, but anyawy. First, 
that can be whatever cookie we choose. At the moment UID is used because 
it makes easier a fit into _mapcount. Second, SeLinux will be able to 
disable the feature on a per-process base, or globally.
Anything else?



- Davide



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-29 20:12     ` Davide Libenzi
@ 2007-06-29 23:48       ` Kyle Moffett
  2007-06-30 19:03         ` Davide Libenzi
  0 siblings, 1 reply; 23+ messages in thread
From: Kyle Moffett @ 2007-06-29 23:48 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andy Isaacson, Linux Kernel Mailing List, Rik van Riel

On Jun 29, 2007, at 16:12:58, Davide Libenzi wrote:
> On Fri, 29 Jun 2007, Andy Isaacson wrote:
>> I still think that using uid in mm_struct is wrong, and some kind  
>> of abstraction is required.  I called this "free pool" in  
>> <20070628061911.GA16986@hexapodia.org>, but I think that name is  
>> misleading -- I am not proposing that this should be part of the  
>> management of free pages, but should be a label which abstracts  
>> "safe to share freed pages among" groups.  Then different SELinux  
>> protection domains would simply have different labels.
>
> I think I answered this one at least a couple of times, but anyawy.  
> First, that can be whatever cookie we choose. At the moment UID is  
> used because it makes easier a fit into _mapcount. Second, SeLinux  
> will be able to  disable the feature on a per-process base, or  
> globally.
>
> Anything else?

Well I would be very interested in actually being able to use this  
feature under SELinux, I think that just the underlying "can-I-use- 
this-page" logic needs modification.  Maybe "MAP_REUSABLE"?  That  
would both imply that we can accept reused (IE: nonzeroed) memory  
*AND* that the current page may be reused (IE: remapped without  
zeroing), although you could conceivably have one flag for each  
case.  The userspace allocator should be able to (when prompted by  
MAP_REUSABLE) look in a page "pool" of sorts before falling back to a  
zeroed page.  The pool would be created for a given "key" the first  
time it unmaps MAP_REUSABLE pages, possibly using the memory freed by  
said unmap.

The real trick is how to define the "key".  The default, without  
LSMs, should be something like the UID.  SELinux, on the other hand,  
would probably want to use some kind of hash of the label as the  
"key", (and store the label in each pool, as well).  That way SELinux  
could have a simple access-vector check for process:reusepage, as  
well as an access-vector check and type transition for  
"freereusablepage".  Then a policy could allow most user processes to  
unconditionally reuse pages (which would end up in the access-vector- 
cache and therefore be fast), while security-sensitive processes like  
ssh-agent could neither reuse pages nor have their pages reused, even  
if they request it.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-29 23:48       ` Kyle Moffett
@ 2007-06-30 19:03         ` Davide Libenzi
  2007-06-30 23:46           ` Kyle Moffett
  2007-07-02 18:38           ` Andy Isaacson
  0 siblings, 2 replies; 23+ messages in thread
From: Davide Libenzi @ 2007-06-30 19:03 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Andy Isaacson, Linux Kernel Mailing List, Rik van Riel

On Fri, 29 Jun 2007, Kyle Moffett wrote:

> Well I would be very interested in actually being able to use this feature
> under SELinux, I think that just the underlying "can-I-use-this-page" logic
> needs modification.  Maybe "MAP_REUSABLE"?  That would both imply that we can
> accept reused (IE: nonzeroed) memory *AND* that the current page may be reused
> (IE: remapped without zeroing), although you could conceivably have one flag
> for each case.  The userspace allocator should be able to (when prompted by
> MAP_REUSABLE) look in a page "pool" of sorts before falling back to a zeroed
> page.  The pool would be created for a given "key" the first time it unmaps
> MAP_REUSABLE pages, possibly using the memory freed by said unmap.

Hmm, why would you need MAP_REUSABLE? If a page is visible at any time for 
a given UID, and you have a login under such UID, you can fetch the content 
of the page at any time (ie, ptrace_attach, gdb, ...).   And if you are 
not under a UID login, you'll never get to see that page. ATM not even the 
classical "root can see everything" rule is applied.
I think the focus should be to find a case where under the currently 
implemented policy for MAP_NOZERO, MAP_NOZERO represent a loss of security 
WRT no MAP_NOZERO.   I have not been able to find one yet, although Andy 
found a potential one in the setuid+exec/ptrace race (fixed by a patch 
that should IMO go in in any case).
The more ppl think about breaking it, the better it is.

> The real trick is how to define the "key".  The default, without LSMs, should
> be something like the UID.  SELinux, on the other hand, would probably want to
> use some kind of hash of the label as the "key", (and store the label in each
> pool, as well).  That way SELinux could have a simple access-vector check for
> process:reusepage, as well as an access-vector check and type transition for
> "freereusablepage".  Then a policy could allow most user processes to
> unconditionally reuse pages (which would end up in the access-vector-cache and
> therefore be fast), while security-sensitive processes like ssh-agent could
> neither reuse pages nor have their pages reused, even if they request it.

It is very easy, assuming a simple unsigned long cookie is enough for 
SeLinux, to fit in the current MAP_NOZERO.   Well, we have to change 
something in the current struct page _mapcount reuse, but that doable.
There is one line to change, that is the line where the cookie is assigned 
to the mm_struct.   From there on, it's all handled the same way.
If the hash is any longer than unsigned long, I don't really think is ever 
gonna fly, being it stored inside a struct page.
Also, if you start introducing "keys" whose domain is wider than a single 
user, then you'll run for sure in some sort of problem.   This is why the 
current code does not even try to go into the "group" policies.
IMO all this may have some sense if 1) it is very simple 2) limits code 
and data structures bloat to very little or nothing 3) stays all the way 
to the safe side, at the cost of losing some possible valid pages to be 
recycled.   After all, MAP_NOZERO is an hint and not a requirement.
I think that the current method (either UID or KEY based) is simple, does 
not add extra management pools and, *so far*, is not showing security 
leaks.

- Davide

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-30 19:03         ` Davide Libenzi
@ 2007-06-30 23:46           ` Kyle Moffett
  2007-06-30 23:57             ` Davide Libenzi
  2007-07-02 18:38           ` Andy Isaacson
  1 sibling, 1 reply; 23+ messages in thread
From: Kyle Moffett @ 2007-06-30 23:46 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andy Isaacson, Linux Kernel Mailing List, Rik van Riel

On Jun 30, 2007, at 15:03:07, Davide Libenzi wrote:
> Hmm, why would you need MAP_REUSABLE? If a page is visible at any  
> time for a given UID, and you have a login under such UID, you can  
> fetch the content of the page at any time (ie, ptrace_attach,  
> gdb, ...).

Not under SELinux or other LSMs.  I suppose those could live without  
a 15% performance improvement in some workloads, but it would be nice  
if we could avoid it.  Essentially, UID is a really poor way to  
define process-security-equivalence classes in some systems.  If you  
really want to define such classes then you need to add LSM hooks to  
manage the equivalence classes.

> I think the focus should be to find a case where under the  
> currently implemented policy for MAP_NOZERO, MAP_NOZERO represent a  
> loss of security WRT no MAP_NOZERO.

Very simple case:
SELinux is turned on, an s9 (IE: TOP_SECRET) process calls free(),  
and an s3 (IE: UNCLASSIFIED) process calls malloc(), getting the data  
from the TOP_SECRET process.

>> The real trick is how to define the "key".  The default, without  
>> LSMs, should be something like the UID.  SELinux, on the other  
>> hand, would probably want to use some kind of hash of the label as  
>> the "key", (and store the label in each pool, as well).  That way  
>> SELinux could have a simple access-vector check for  
>> process:reusepage, as well as an access-vector check and type  
>> transition for "freereusablepage".  Then a policy could allow most  
>> user processes to unconditionally reuse pages (which would end up  
>> in the access-vector-cache and therefore be fast), while security- 
>> sensitive processes like ssh-agent could neither reuse pages nor  
>> have their pages reused, even if they request it.
>
> It is very easy, assuming a simple unsigned long cookie is enough  
> for SeLinux, to fit in the current MAP_NOZERO.   Well, we have to  
> change something in the current struct page _mapcount reuse, but  
> that doable. There is one line to change, that is the line where  
> the cookie is assigned  to the mm_struct.

I think if you create the concept of a "process equivalence class"  
and add an LSM hook for it, then the unsigned long could just store  
which equivalence class the page is in.  The default without LSMs  
would be to use mutual-ptraceability as the equivalence class (IE:  
the UID with a proviso for SUID binaries).  LSMs should be able to  
create a process_equivalence_class hook which when called returns an  
unsigned long identifying the "equivalence class" (IE: pool) into  
which the page is placed when freed (or ((unsigned long)-1) to  
forcibly zero the page).  When a process requests a maybe-not-zeroed  
page, the LSM hook would be called again to determine what  
equivalence class should be used, (or ((unsigned long)-1) for dont- 
use-any-class).

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-30 23:46           ` Kyle Moffett
@ 2007-06-30 23:57             ` Davide Libenzi
  2007-07-01  0:21               ` Kyle Moffett
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2007-06-30 23:57 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Andy Isaacson, Linux Kernel Mailing List, Rik van Riel

On Sat, 30 Jun 2007, Kyle Moffett wrote:

> On Jun 30, 2007, at 15:03:07, Davide Libenzi wrote:
> > Hmm, why would you need MAP_REUSABLE? If a page is visible at any time for a
> > given UID, and you have a login under such UID, you can fetch the content of
> > the page at any time (ie, ptrace_attach, gdb, ...).
> 
> Not under SELinux or other LSMs.  I suppose those could live without a 15%
> performance improvement in some workloads, but it would be nice if we could
> avoid it.  Essentially, UID is a really poor way to define
> process-security-equivalence classes in some systems.  If you really want to
> define such classes then you need to add LSM hooks to manage the equivalence
> classes.
> 
> > I think the focus should be to find a case where under the currently
> > implemented policy for MAP_NOZERO, MAP_NOZERO represent a loss of security
> > WRT no MAP_NOZERO.
> 
> Very simple case:
> SELinux is turned on, an s9 (IE: TOP_SECRET) process calls free(), and an s3
> (IE: UNCLASSIFIED) process calls malloc(), getting the data from the
> TOP_SECRET process.

Note that you use *s3* and *s9*. Those will be two different context cookies. 
SeLinux will have its own way to set the cookie in the mm_struct, to *s3* 
in one case, and to *s9* in the other case. This will make things so that 
they'll never see each other pages.




> > It is very easy, assuming a simple unsigned long cookie is enough for
> > SeLinux, to fit in the current MAP_NOZERO.   Well, we have to change
> > something in the current struct page _mapcount reuse, but that doable. There
> > is one line to change, that is the line where the cookie is assigned  to the
> > mm_struct.
> 
> I think if you create the concept of a "process equivalence class" and add an
> LSM hook for it, then the unsigned long could just store which equivalence
> class the page is in.  The default without LSMs would be to use
> mutual-ptraceability as the equivalence class (IE: the UID with a proviso for
> SUID binaries).  LSMs should be able to create a process_equivalence_class
> hook which when called returns an unsigned long identifying the "equivalence
> class" (IE: pool) into which the page is placed when freed (or ((unsigned
> long)-1) to forcibly zero the page).  When a process requests a
> maybe-not-zeroed page, the LSM hook would be called again to determine what
> equivalence class should be used, (or ((unsigned long)-1) for
> dont-use-any-class).

I'd rather prefer to not create anything, and to not add anything. SeLinux 
can set the cookie as it likes, and it can also disable the feature. Since 
SeLinux is definitely not the common case, I'd prefer to keep the simple 
unsigned long cookie abstraction, and let them handle however they like it.



- Davide



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-30 23:57             ` Davide Libenzi
@ 2007-07-01  0:21               ` Kyle Moffett
  2007-07-01  4:25                 ` Davide Libenzi
  2007-07-02 19:00                 ` Andy Isaacson
  0 siblings, 2 replies; 23+ messages in thread
From: Kyle Moffett @ 2007-07-01  0:21 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andy Isaacson, Linux Kernel Mailing List, Rik van Riel

On Jun 30, 2007, at 19:57:18, Davide Libenzi wrote:
> On Sat, 30 Jun 2007, Kyle Moffett wrote:
>> Very simple case:  SELinux is turned on, an s9 (IE: TOP_SECRET)  
>> process calls free(), and an s3 (IE: UNCLASSIFIED) process calls  
>> malloc(), getting the data from the TOP_SECRET process.
>
> Note that you use *s3* and *s9*. Those will be two different  
> context cookies.  SeLinux will have its own way to set the cookie  
> in the mm_struct, to *s3* in one case, and to *s9* in the other  
> case. This will make things so that they'll never see each other  
> pages.

Except s3 and s9 aren't complete cookies.  A complete label might be:  
"system_u:system_r:apache2_t:s3" for an unclassified apache web- 
server process, or "kmoffett_u:secadmin_r:usershell_t:s9" for me  
logged in with a top-secret label in my security-administrator role.   
That's why you'd need to call an LSM hook to get a unique identifier,  
as the LSM would actually need to allocate identifiers for  
equivalence classes.  Secondly, processes may change labels as they  
run, so you couldn't just call it once and cache the result, you  
would need to call it for every freed page (or every re-use of a page).

>>> It is very easy, assuming a simple unsigned long cookie is enough  
>>> for
>>> SeLinux, to fit in the current MAP_NOZERO.   Well, we have to change
>>> something in the current struct page _mapcount reuse, but that  
>>> doable. There
>>> is one line to change, that is the line where the cookie is  
>>> assigned  to the
>>> mm_struct.

I do think a single unsigned long cookie would work, as long as you  
have an LSM hook:

int process_equivalence_class(struct task_struct *task, unsigned long  
*result);

If it returns 0 then you can use the result as a page cookie,  
otherwise you can't reuse pages for this process at all.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-01  0:21               ` Kyle Moffett
@ 2007-07-01  4:25                 ` Davide Libenzi
  2007-07-02 19:00                 ` Andy Isaacson
  1 sibling, 0 replies; 23+ messages in thread
From: Davide Libenzi @ 2007-07-01  4:25 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Andy Isaacson, Linux Kernel Mailing List, Rik van Riel

On Sat, 30 Jun 2007, Kyle Moffett wrote:

> On Jun 30, 2007, at 19:57:18, Davide Libenzi wrote:
> > On Sat, 30 Jun 2007, Kyle Moffett wrote:
> > > Very simple case:  SELinux is turned on, an s9 (IE: TOP_SECRET) process
> > > calls free(), and an s3 (IE: UNCLASSIFIED) process calls malloc(), getting
> > > the data from the TOP_SECRET process.
> > 
> > Note that you use *s3* and *s9*. Those will be two different context
> > cookies.  SeLinux will have its own way to set the cookie in the mm_struct,
> > to *s3* in one case, and to *s9* in the other case. This will make things so
> > that they'll never see each other pages.
> 
> Except s3 and s9 aren't complete cookies.  A complete label might be:
> "system_u:system_r:apache2_t:s3" for an unclassified apache web-server
> process, or "kmoffett_u:secadmin_r:usershell_t:s9" for me logged in with a
> top-secret label in my security-administrator role.  That's why you'd need to
> call an LSM hook to get a unique identifier, as the LSM would actually need to
> allocate identifiers for equivalence classes.  Secondly, processes may change
> labels as they run, so you couldn't just call it once and cache the result,
> you would need to call it for every freed page (or every re-use of a page).

It does not matter what the UID contain and how it is generated. The focus 
now should be to verify that different UIDs do not see other UIDs pages. 
Once that is verified to be true, and if SeLinux actually wants to use 
it, we'll find a way so that the other 99% of Linux users won't pay the 
price of the abstraction needed to accomplish what they need for this to 
work for them. Ok?



- Davide



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-01  0:21               ` Kyle Moffett
  2007-07-01  4:25                 ` Davide Libenzi
@ 2007-07-02 19:00                 ` Andy Isaacson
  2007-07-02 19:03                   ` Rik van Riel
  1 sibling, 1 reply; 23+ messages in thread
From: Andy Isaacson @ 2007-07-02 19:00 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Davide Libenzi, Linux Kernel Mailing List, Rik van Riel

On Sat, Jun 30, 2007 at 08:21:52PM -0400, Kyle Moffett wrote:
> That's why you'd need to call an LSM hook to get a unique identifier,  
> as the LSM would actually need to allocate identifiers for  
> equivalence classes.  Secondly, processes may change labels as they  
> run, so you couldn't just call it once and cache the result, you  
> would need to call it for every freed page (or every re-use of a page).

Davide's patch adds a owner_uid field to mm_struct.  Assuming that turns
into a "mm security equivalence class identifier", the LSM can simply
update it when a label-change-event occurs.  No need to call out to
(potentially heavyweight!) LSM code in page allocation critical paths.

I'm a bit concerned that tracking the equivalence classes will get
expensive.  I think you can end up with quadratic explosion in the worst
case (every user using every permutation of LSM bits).

-andy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-02 19:00                 ` Andy Isaacson
@ 2007-07-02 19:03                   ` Rik van Riel
  2007-07-02 19:06                     ` Ulrich Drepper
  0 siblings, 1 reply; 23+ messages in thread
From: Rik van Riel @ 2007-07-02 19:03 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Kyle Moffett, Davide Libenzi, Linux Kernel Mailing List

Andy Isaacson wrote:
> On Sat, Jun 30, 2007 at 08:21:52PM -0400, Kyle Moffett wrote:
>> That's why you'd need to call an LSM hook to get a unique identifier,  
>> as the LSM would actually need to allocate identifiers for  
>> equivalence classes.  Secondly, processes may change labels as they  
>> run, so you couldn't just call it once and cache the result, you  
>> would need to call it for every freed page (or every re-use of a page).
> 
> Davide's patch adds a owner_uid field to mm_struct.  Assuming that turns
> into a "mm security equivalence class identifier", the LSM can simply
> update it when a label-change-event occurs.  No need to call out to
> (potentially heavyweight!) LSM code in page allocation critical paths.
> 
> I'm a bit concerned that tracking the equivalence classes will get
> expensive.  I think you can end up with quadratic explosion in the worst
> case (every user using every permutation of LSM bits).

That should not happen.  The default SELinux configuration
in Fedora (and Debian?) runs a few daemons in their own
restricted modes and has most of the system running in
unconfined_t, including the majority of user programs.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-02 19:03                   ` Rik van Riel
@ 2007-07-02 19:06                     ` Ulrich Drepper
  2007-07-02 22:46                       ` Davide Libenzi
  0 siblings, 1 reply; 23+ messages in thread
From: Ulrich Drepper @ 2007-07-02 19:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andy Isaacson, Kyle Moffett, Davide Libenzi,
	Linux Kernel Mailing List

On 7/2/07, Rik van Riel <riel@redhat.com> wrote:
> That should not happen.  The default SELinux configuration
> in Fedora (and Debian?) runs a few daemons in their own
> restricted modes and has most of the system running in
> unconfined_t, including the majority of user programs.

This is the state as of F7.  This will change hopefully soon.
Programs like firefox run by normal users must be confined, to. Any
tests using security must be fast, it's not something which is done
only for a few apps.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-02 19:06                     ` Ulrich Drepper
@ 2007-07-02 22:46                       ` Davide Libenzi
  2007-07-02 22:55                         ` Rik van Riel
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2007-07-02 22:46 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Rik van Riel, Andy Isaacson, Kyle Moffett,
	Linux Kernel Mailing List

On Mon, 2 Jul 2007, Ulrich Drepper wrote:

> On 7/2/07, Rik van Riel <riel@redhat.com> wrote:
> > That should not happen.  The default SELinux configuration
> > in Fedora (and Debian?) runs a few daemons in their own
> > restricted modes and has most of the system running in
> > unconfined_t, including the majority of user programs.
> 
> This is the state as of F7.  This will change hopefully soon.
> Programs like firefox run by normal users must be confined, to. Any
> tests using security must be fast, it's not something which is done
> only for a few apps.

The strong requirement would be that the cookie is not a bit longer than 
sizeof(unsigned long).
For the "equality" check, this better be "==", although it could be 
abstracted in a function/macro that SeLinux implements in a different way.
But this "equality" check is done a page-fault time, so it better be 
pretty quick (otherwise if they bloat that path, they probably be better 
of not using MAP_NOZERO at all).

- Davide

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-02 22:46                       ` Davide Libenzi
@ 2007-07-02 22:55                         ` Rik van Riel
  2007-07-02 23:46                           ` Davide Libenzi
  2007-07-04 21:53                           ` Andy Isaacson
  0 siblings, 2 replies; 23+ messages in thread
From: Rik van Riel @ 2007-07-02 22:55 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ulrich Drepper, Andy Isaacson, Kyle Moffett,
	Linux Kernel Mailing List

Davide Libenzi wrote:
> On Mon, 2 Jul 2007, Ulrich Drepper wrote:
> 
>> On 7/2/07, Rik van Riel <riel@redhat.com> wrote:
>>> That should not happen.  The default SELinux configuration
>>> in Fedora (and Debian?) runs a few daemons in their own
>>> restricted modes and has most of the system running in
>>> unconfined_t, including the majority of user programs.
>> This is the state as of F7.  This will change hopefully soon.
>> Programs like firefox run by normal users must be confined, to. Any
>> tests using security must be fast, it's not something which is done
>> only for a few apps.
> 
> The strong requirement would be that the cookie is not a bit longer than 
> sizeof(unsigned long).

You could easily replace the cookie with a pointer to a free
page pool.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-02 22:55                         ` Rik van Riel
@ 2007-07-02 23:46                           ` Davide Libenzi
  2007-07-04 21:53                           ` Andy Isaacson
  1 sibling, 0 replies; 23+ messages in thread
From: Davide Libenzi @ 2007-07-02 23:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ulrich Drepper, Andy Isaacson, Kyle Moffett,
	Linux Kernel Mailing List

On Mon, 2 Jul 2007, Rik van Riel wrote:

> Davide Libenzi wrote:
> > On Mon, 2 Jul 2007, Ulrich Drepper wrote:
> > 
> > > On 7/2/07, Rik van Riel <riel@redhat.com> wrote:
> > > > That should not happen.  The default SELinux configuration
> > > > in Fedora (and Debian?) runs a few daemons in their own
> > > > restricted modes and has most of the system running in
> > > > unconfined_t, including the majority of user programs.
> > > This is the state as of F7.  This will change hopefully soon.
> > > Programs like firefox run by normal users must be confined, to. Any
> > > tests using security must be fast, it's not something which is done
> > > only for a few apps.
> > 
> > The strong requirement would be that the cookie is not a bit longer than
> > sizeof(unsigned long).
> 
> You could easily replace the cookie with a pointer to a free
> page pool.

You talked about this pool even before, but honestly I don't see how it 
can help.
Pools requires management (mainly flush under pressure), and it wouldn't 
really do any benefit. Don't we already have enough pools in the 
allocation path?
The pool has to be UID (or cookie/whatever) based [1], so on top of the 
pool lookup/management code, you'll have another lock to be taken (the 
pool one). So using the UID (cookie) is simpler, faster and requires no 
extra code.
A simple numeric cookie virtually creates "pools" w/out requiring extra 
code. Yes, having really separate pool helps the recycle efficency, but 
I'm not sure that's worth the extra code.

[1] An mm_struct based pool would be clearly useless

- Davide

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-02 22:55                         ` Rik van Riel
  2007-07-02 23:46                           ` Davide Libenzi
@ 2007-07-04 21:53                           ` Andy Isaacson
  2007-07-04 23:42                             ` Davide Libenzi
  1 sibling, 1 reply; 23+ messages in thread
From: Andy Isaacson @ 2007-07-04 21:53 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Rik van Riel, Ulrich Drepper, Kyle Moffett,
	Linux Kernel Mailing List

On Mon, Jul 02, 2007 at 06:55:40PM -0400, Rik van Riel wrote:
> You could easily replace the cookie with a pointer to a free
> page pool.

It just occurred to me that something like this is *required* to get the
performance benefit from MAP_NOZERO on a busy system.  With Davide's
current proposal, if there are N jobs with different "nozero cookies"
busy allocating and deallocating pages on a single-NUMA-node system,
then there's only a 1/N chance that the page returned by __alloc_pages
will have the correct cookie, so (N-1)/N percent of the time MAP_NOZERO
will have no positive effect -- 90% of the time for the case of N=10.

(Of course on NUMA systems node affinity will probably make the
situation a bit better.)

So I'm afraid that it sounds like additional complexity would be
required so that __alloc_pages could track and preferentially return
pages with the correct "nozero cookie".

In a mostly unrelated complaint, I note that having a function named
"alloc_zeroed_page_vma" which returns a potentially nonzero page is, um,
unintuitive.  It needs a name which does not claim it returns zeroed
pages.  I'm no good at names, but perhaps "alloc_available_page_vma"?

-andy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-04 21:53                           ` Andy Isaacson
@ 2007-07-04 23:42                             ` Davide Libenzi
  0 siblings, 0 replies; 23+ messages in thread
From: Davide Libenzi @ 2007-07-04 23:42 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Rik van Riel, Ulrich Drepper, Kyle Moffett,
	Linux Kernel Mailing List

On Wed, 4 Jul 2007, Andy Isaacson wrote:

> On Mon, Jul 02, 2007 at 06:55:40PM -0400, Rik van Riel wrote:
> > You could easily replace the cookie with a pointer to a free
> > page pool.
> 
> It just occurred to me that something like this is *required* to get the
> performance benefit from MAP_NOZERO on a busy system.  With Davide's
> current proposal, if there are N jobs with different "nozero cookies"
> busy allocating and deallocating pages on a single-NUMA-node system,
> then there's only a 1/N chance that the page returned by __alloc_pages
> will have the correct cookie, so (N-1)/N percent of the time MAP_NOZERO
> will have no positive effect -- 90% of the time for the case of N=10.

That can indeed happen. If you have a system busy with pressure coming 
from many different users, the recycle efficency goes down proportionally 
with it.
As I said before, I prefer a smaller and less intrusive code that gives 
good recycle efficency for the mejority of systems, WRT the idea of 
dedicated pools.
But maybe I'm wrong, and Rik or you can show up with a patch implementing 
pools that does not hurt code, structure sizes and performances.

> In a mostly unrelated complaint, I note that having a function named
> "alloc_zeroed_page_vma" which returns a potentially nonzero page is, um,
> unintuitive.  It needs a name which does not claim it returns zeroed
> pages.  I'm no good at names, but perhaps "alloc_available_page_vma"?

Yeah, the name "zeroed" is not perfect, but neither it is "available". The 
name should give the idea that the page is either from the same ID/cookie 
or it is zeroed. That naming should be applied back to 
alloc_zeroed_user_highpage(), __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE, ...

- Davide

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-06-30 19:03         ` Davide Libenzi
  2007-06-30 23:46           ` Kyle Moffett
@ 2007-07-02 18:38           ` Andy Isaacson
  2007-07-02 22:38             ` Davide Libenzi
  1 sibling, 1 reply; 23+ messages in thread
From: Andy Isaacson @ 2007-07-02 18:38 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Kyle Moffett, Linux Kernel Mailing List, Rik van Riel

On Sat, Jun 30, 2007 at 12:03:07PM -0700, Davide Libenzi wrote:
> I think the focus should be to find a case where under the currently 
> implemented policy for MAP_NOZERO, MAP_NOZERO represent a loss of security 
> WRT no MAP_NOZERO.   I have not been able to find one yet, although Andy 
> found a potential one in the setuid+exec/ptrace race (fixed by a patch 
> that should IMO go in in any case).

BTW, the ptrace variant of this issue is not a problem -- PTRACE_ATTACH
running as newuid gets EPERM when trying to attach at /* here */ below.

    setuid(newuid);
    /* here */
    exec(...);
    exit(1);

sys_setuid sets current->mm->dumpable = suid_dumpable, so unless the
admin asked for it, there is no risk WRT PTRACE_ATTACH.  However, this
risk vector does need to be considered when implementing MAP_NOZERO.

-andy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness
  2007-07-02 18:38           ` Andy Isaacson
@ 2007-07-02 22:38             ` Davide Libenzi
  0 siblings, 0 replies; 23+ messages in thread
From: Davide Libenzi @ 2007-07-02 22:38 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Kyle Moffett, Linux Kernel Mailing List, Rik van Riel

On Mon, 2 Jul 2007, Andy Isaacson wrote:

> On Sat, Jun 30, 2007 at 12:03:07PM -0700, Davide Libenzi wrote:
> > I think the focus should be to find a case where under the currently 
> > implemented policy for MAP_NOZERO, MAP_NOZERO represent a loss of security 
> > WRT no MAP_NOZERO.   I have not been able to find one yet, although Andy 
> > found a potential one in the setuid+exec/ptrace race (fixed by a patch 
> > that should IMO go in in any case).
> 
> BTW, the ptrace variant of this issue is not a problem -- PTRACE_ATTACH
> running as newuid gets EPERM when trying to attach at /* here */ below.
> 
>     setuid(newuid);
>     /* here */
>     exec(...);
>     exit(1);
> 
> sys_setuid sets current->mm->dumpable = suid_dumpable, so unless the
> admin asked for it, there is no risk WRT PTRACE_ATTACH.  However, this
> risk vector does need to be considered when implementing MAP_NOZERO.

Yes, I missed that. Ptrace is fine there. The 3 lines patch is still 
needed for MAP_NOZERO though.



- Davide



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2007-07-04 23:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-28 18:49 [patch 0/4] MAP_NOZERO v2 - VM_NOZERO/MAP_NOZERO early summer madness Davide Libenzi
2007-06-29  2:57 ` Kyle Moffett
2007-06-29  3:04   ` Rik van Riel
2007-06-29  5:09     ` Ulrich Drepper
2007-06-29  5:20   ` Davide Libenzi
2007-06-29 19:39   ` Andy Isaacson
2007-06-29 20:12     ` Davide Libenzi
2007-06-29 23:48       ` Kyle Moffett
2007-06-30 19:03         ` Davide Libenzi
2007-06-30 23:46           ` Kyle Moffett
2007-06-30 23:57             ` Davide Libenzi
2007-07-01  0:21               ` Kyle Moffett
2007-07-01  4:25                 ` Davide Libenzi
2007-07-02 19:00                 ` Andy Isaacson
2007-07-02 19:03                   ` Rik van Riel
2007-07-02 19:06                     ` Ulrich Drepper
2007-07-02 22:46                       ` Davide Libenzi
2007-07-02 22:55                         ` Rik van Riel
2007-07-02 23:46                           ` Davide Libenzi
2007-07-04 21:53                           ` Andy Isaacson
2007-07-04 23:42                             ` Davide Libenzi
2007-07-02 18:38           ` Andy Isaacson
2007-07-02 22:38             ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox