* Re: [PATCH] kernel: reduce required permission for prctl_set_mm
From: Kees Cook @ 2014-02-12 21:50 UTC
To: Andrew Morton
Cc: Andrey Vagin, LKML, criu, Oleg Nesterov, Robin Holt, Al Viro,
Eric W. Biederman, Chen Gang, Stephen Rothwell, Pavel Emelyanov,
Aditya Kali, Michael Kerrisk
On Wed, Feb 12, 2014 at 1:32 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 12 Feb 2014 19:40:11 +0400 Andrey Vagin <avagin@openvz.org> wrote:
>
>> Currently prctl_set_mm requires the global CAP_SYS_RESOURCE;
>> this patch reduces the requirement to CAP_SYS_RESOURCE in the current
>> user namespace.
>>
>> When we restore a task we need to set up text, data and data heap sizes
>> from userspace to the values a task had at checkpoint time.
>>
>> Currently we cannot restore these parameters if a task lives in
>> a non-root user namespace, because it has no capabilities in the
>> parent namespace.
>>
>> prctl_set_mm() changes parameters of the current task and doesn't affect
>> other tasks.
>>
>> This patch affects the RLIMIT_DATA limit, because consumption is
>> calculated relative to mm->end_data, mm->start_data, mm->start_brk.
>
> I can't for the life of me work out what you were trying to say here.
> Please fix and resend this paragraph?
>
>> rlim = rlimit(RLIMIT_DATA);
>> if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
>> (mm->end_data - mm->start_data) > rlim)
>> goto out;
>>
>> This limit affects calls to brk() and sbrk(), but it doesn't affect
>> mmap. So I think requiring CAP_SYS_RESOURCE in the current
>> namespace is enough for this limit.
>>
>> ...
>>
>> Cc: security@kernel.org
>
> That list is for reporting kernel security bugs.
>
>>
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -1701,7 +1701,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
>> if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
>> return -EINVAL;
>>
>> - if (!capable(CAP_SYS_RESOURCE))
>> + if (!ns_capable(current_user_ns(), CAP_SYS_RESOURCE))
>> return -EPERM;
>>
>> if (opt == PR_SET_MM_EXE_FILE)
>
> This looks harmless.

I want to be convinced of this, but weakening this cap check seems
like an easy way for a process to hide itself trivially from the real
root user. It can change its exe file link, and dodge RLIMIT_DATA by
changing the brk addresses. The whole reason this cap check was there
was to stop that kind of thing. Limiting it to a namespace isn't great,
since USER_NS means unprivileged processes can enter a new NS as the
NS root user.
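A minimal sketch of what "hiding" means here, assuming a Linux /proc: an observer such as the real root user finds the binary behind a process by reading its /proc/<pid>/exe symlink, and that link is what PR_SET_MM_EXE_FILE replaces. The helper name read_exe_link() is hypothetical and not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L /* declare readlink() in strict modes */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the /proc/<pid>/exe symlink that tools use to
 * see which binary a process runs (the link PR_SET_MM_EXE_FILE rewrites).
 * Returns the link length, or -1 on error; buf is always NUL-terminated.
 */
static ssize_t read_exe_link(pid_t pid, char *buf, size_t len)
{
	char path[64];
	ssize_t n;

	if (len == 0)
		return -1;
	snprintf(path, sizeof path, "/proc/%d/exe", (int)pid);
	n = readlink(path, buf, len - 1); /* readlink() does not NUL-terminate */
	buf[n < 0 ? 0 : n] = '\0';
	return n;
}
```

A process that has used PR_SET_MM_EXE_FILE would report the new target here, which is why Kees treats the cap check as protecting what others can observe.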
-Kees
--
Kees Cook
Chrome OS Security
* Re: [PATCH] kernel: reduce required permission for prctl_set_mm
From: Andrew Vagin @ 2014-02-12 23:08 UTC
To: Kees Cook
Cc: Andrew Morton, Andrey Vagin, LKML, criu, Oleg Nesterov,
Robin Holt, Al Viro, Eric W. Biederman, Chen Gang,
Stephen Rothwell, Pavel Emelyanov, Aditya Kali, Michael Kerrisk
On Wed, Feb 12, 2014 at 01:50:35PM -0800, Kees Cook wrote:
> On Wed, Feb 12, 2014 at 1:32 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Wed, 12 Feb 2014 19:40:11 +0400 Andrey Vagin <avagin@openvz.org> wrote:
> >
> >> Currently prctl_set_mm requires the global CAP_SYS_RESOURCE;
> >> this patch reduces the requirement to CAP_SYS_RESOURCE in the current
> >> user namespace.
> >>
> >> When we restore a task we need to set up text, data and data heap sizes
> >> from userspace to the values a task had at checkpoint time.
> >>
> >> Currently we cannot restore these parameters if a task lives in
> >> a non-root user namespace, because it has no capabilities in the
> >> parent namespace.
> >>
> >> prctl_set_mm() changes parameters of the current task and doesn't affect
> >> other tasks.
> >>
> >> This patch affects the RLIMIT_DATA limit, because consumption is
> >> calculated relative to mm->end_data, mm->start_data, mm->start_brk.
> >
> > I can't for the life of me work out what you were trying to say here.
> > Please fix and resend this paragraph?
> >
> >> rlim = rlimit(RLIMIT_DATA);
> >> if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
> >> (mm->end_data - mm->start_data) > rlim)
> >> goto out;
> >>
> >> This limit affects calls to brk() and sbrk(), but it doesn't affect
> >> mmap. So I think requiring CAP_SYS_RESOURCE in the current
> >> namespace is enough for this limit.
> >>
> >> ...
> >>
> >> Cc: security@kernel.org
> >
> > That list is for reporting kernel security bugs.
> >
> >>
> >> --- a/kernel/sys.c
> >> +++ b/kernel/sys.c
> >> @@ -1701,7 +1701,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> >> if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
> >> return -EINVAL;
> >>
> >> - if (!capable(CAP_SYS_RESOURCE))
> >> + if (!ns_capable(current_user_ns(), CAP_SYS_RESOURCE))
> >> return -EPERM;
> >>
> >> if (opt == PR_SET_MM_EXE_FILE)
> >
> > This looks harmless.
>
> I want to be convinced of this, but weakening this cap check seems
> like an easy way for a process to hide itself trivially from the real
> root user. It can change its exe file link, and dodge RLIMIT_DATA by
> changing the brk addresses. The whole reason this cap check was there
> was to stop that kind of thing. Limiting it to a namespace isn't great
> since USER_NS means unprivileged processes can enter a new NS as the
> NS root user.
Everything you describe here is exactly what we do when restoring tasks.
We need a way to restore these parameters. One of our goals is to be
able to dump and restore Linux containers, and all processes of a
container live in a separate set of namespaces.
I considered restoring these parameters before entering the userns,
but this idea failed, because a process can't enter a pidns itself,
and the pidns must be created inside the userns...
> It can change its exe file link
We can change memory contents with the help of ptrace. So if we want to
hide a process, we can execute another binary and inject our code into
it. That is effectively equivalent to changing the exe file link. Yes,
it's a bit harder, but we can do it even without this patch.
> dodge RLIMIT_DATA
This limit affects calls to brk(2) and sbrk(2), but a task can use
mmap() to allocate memory. How is this limit used in practice?
Sorry if I'm missing something.
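A rough userspace probe of the brk-versus-mmap point, assuming glibc on Linux: it lowers the soft RLIMIT_DATA to one page, tries sbrk() and an anonymous private mmap(), then restores the limit. probe_rlimit_data() and struct alloc_result are hypothetical names. On a 2014-era kernel only sbrk() is expected to fail; per the setrlimit(2) man page, since Linux 4.7 RLIMIT_DATA also covers mmap(2), so mmap_ok will likely be false on newer kernels too.

```c
#define _DEFAULT_SOURCE /* expose sbrk() and MAP_ANONYMOUS in glibc */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical result type: did each allocation path succeed? */
struct alloc_result {
	bool sbrk_ok;
	bool mmap_ok;
};

/*
 * Temporarily lower the soft RLIMIT_DATA to one page, try to grow the heap
 * by len bytes with sbrk() and to map len bytes with mmap(), then restore
 * the original limit. Returns false if the limit could not be changed.
 */
static bool probe_rlimit_data(size_t len, struct alloc_result *res)
{
	struct rlimit old, tiny;

	if (getrlimit(RLIMIT_DATA, &old) != 0)
		return false;
	tiny = old;
	tiny.rlim_cur = 4096; /* far below any real data segment */
	if (setrlimit(RLIMIT_DATA, &tiny) != 0)
		return false;

	/* brk()/sbrk() go through the RLIMIT_DATA check quoted in the patch. */
	void *old_brk = sbrk((intptr_t)len);
	res->sbrk_ok = (old_brk != (void *)-1);
	if (res->sbrk_ok)
		sbrk(-(intptr_t)len); /* give the memory back */

	/* Anonymous private mappings count against RLIMIT_DATA only on newer kernels. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	res->mmap_ok = (p != MAP_FAILED);
	if (res->mmap_ok)
		munmap(p, len);

	/* Raising the soft limit back up to the unchanged hard limit is allowed. */
	return setrlimit(RLIMIT_DATA, &old) == 0;
}
```

Calling probe_rlimit_data(1 << 20, &r) is enough to see the difference between the two allocation paths on a given kernel.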
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security
* Re: [CRIU] [PATCH] kernel: reduce required permission for prctl_set_mm
From: Cyrill Gorcunov @ 2014-02-12 21:55 UTC
To: Andrew Morton
Cc: Andrey Vagin, Aditya Kali, Stephen Rothwell, Eric W. Biederman,
Kees Cook, Pavel Emelyanov, Chen Gang, linux-kernel,
Oleg Nesterov, criu, Robin Holt, Michael Kerrisk, Al Viro
On Wed, Feb 12, 2014 at 01:32:28PM -0800, Andrew Morton wrote:
> On Wed, 12 Feb 2014 19:40:11 +0400 Andrey Vagin <avagin@openvz.org> wrote:
>
> > Currently prctl_set_mm requires the global CAP_SYS_RESOURCE;
> > this patch reduces the requirement to CAP_SYS_RESOURCE in the current
> > user namespace.
> >
> > When we restore a task we need to set up text, data and data heap sizes
> > from userspace to the values a task had at checkpoint time.
> >
> > Currently we cannot restore these parameters if a task lives in
> > a non-root user namespace, because it has no capabilities in the
> > parent namespace.
> >
> > prctl_set_mm() changes parameters of the current task and doesn't affect
> > other tasks.
> >
> > This patch affects the RLIMIT_DATA limit, because consumption is
> > calculated relative to mm->end_data, mm->start_data, mm->start_brk.
>
> I can't for the life of me work out what you were trying to say here.
> Please fix and resend this paragraph?
I guess Andrey wanted to say that with this prctl call we rely on the
user to provide sane data for the mm members. We do some basic checks
here, but it is still possible to write complete crap into these fields
if you have enough privileges. That is not too scary, because in the
worst case the only thing one can achieve is "weird" output in task
statistics (it won't harm the kernel itself in any way).
Still, the fields start_brk, end_data and start_data are involved in
the address computation inside the sys_brk syscall. So if someone sets
completely random values in the mm members listed above, he can only
break his own sys_brk calls. But again, it won't affect the kernel
itself; only the "current" task is involved. Thus harmless.
>
> > rlim = rlimit(RLIMIT_DATA);
> > if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
> > (mm->end_data - mm->start_data) > rlim)
> > goto out;
> >
> > This limit affects calls to brk() and sbrk(), but it doesn't affect
> > mmap. So I think requiring CAP_SYS_RESOURCE in the current
> > namespace is enough for this limit.
>
> This looks harmless.
>
> My relatively-up-to-date manpages don't mention prctl(PR_SET_MM). I
> see from http://marc.info/?l=linux-man&m=133132612704130&w=2 that
> manpage additions were prepared nearly three years ago. Michael, did
> this fall through a crack?
Your manpages must be too old ;) On my Fedora 19, PR_SET_MM
is documented.
[cyrill@moon ~] yum info man-pages
Loaded plugins: auto-update-debuginfo, langpacks, refresh-packagekit
Installed Packages
Name : man-pages
Arch : noarch
Version : 3.51
As for me, the patch looks good.
* Re: [PATCH] kernel: reduce required permission for prctl_set_mm
From: Andrew Vagin @ 2014-02-12 22:11 UTC
To: Andrew Morton
Cc: Andrey Vagin, linux-kernel, criu, Oleg Nesterov, Al Viro,
Kees Cook, Eric W. Biederman, Stephen Rothwell, Pavel Emelyanov,
Aditya Kali, Michael Kerrisk
On Wed, Feb 12, 2014 at 01:32:28PM -0800, Andrew Morton wrote:
> On Wed, 12 Feb 2014 19:40:11 +0400 Andrey Vagin <avagin@openvz.org> wrote:
>
> > Currently prctl_set_mm requires the global CAP_SYS_RESOURCE;
> > this patch reduces the requirement to CAP_SYS_RESOURCE in the current
> > user namespace.
> >
> > When we restore a task we need to set up text, data and data heap sizes
> > from userspace to the values a task had at checkpoint time.
> >
> > Currently we cannot restore these parameters if a task lives in
> > a non-root user namespace, because it has no capabilities in the
> > parent namespace.
> >
> > prctl_set_mm() changes parameters of the current task and doesn't affect
> > other tasks.
> >
> > This patch affects the RLIMIT_DATA limit, because consumption is
> > calculated relative to mm->end_data, mm->start_data, mm->start_brk.
>
> I can't for the life of me work out what you were trying to say here.
> Please fix and resend this paragraph?
A task can exceed the RLIMIT_DATA limit by changing mm->start_brk,
so this patch effectively also reduces the permission required to
exceed RLIMIT_DATA.
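To make the arithmetic concrete, here is a userspace model of the RLIMIT_DATA test quoted from sys_brk(). struct mm_model, brk_exceeds_rlimit() and MODEL_RLIM_INFINITY are hypothetical stand-ins for the kernel's mm_struct fields and RLIM_INFINITY; the comparison itself follows the quoted snippet, using unsigned long arithmetic as the kernel does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for RLIM_INFINITY (~0UL in asm-generic). */
#define MODEL_RLIM_INFINITY (~0UL)

/* Hypothetical stand-in for the mm_struct fields the check reads. */
struct mm_model {
	unsigned long start_data;
	unsigned long end_data;
	unsigned long start_brk;
};

/*
 * Mirrors the quoted check: returns true when moving the program break to
 * new_brk would be refused (the "goto out" path). Unsigned arithmetic, as in
 * the kernel, so a start_brk above new_brk wraps to a huge value.
 */
static bool brk_exceeds_rlimit(const struct mm_model *mm,
			       unsigned long new_brk, unsigned long rlim)
{
	return rlim < MODEL_RLIM_INFINITY &&
	       (new_brk - mm->start_brk) + (mm->end_data - mm->start_data) > rlim;
}
```

For example, with 4 KiB of data and a 16 KiB limit, moving the break 64 KiB above start_brk is refused, but raising start_brk to 4 KiB below the new break (what PR_SET_MM_START_BRK allows) makes the same request pass. Setting start_brk above the break makes the subtraction wrap, so bogus values only make the task's own brk() calls fail, which matches Cyrill's point.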
>
> > rlim = rlimit(RLIMIT_DATA);
> > if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
> > (mm->end_data - mm->start_data) > rlim)
> > goto out;
> >
> > This limit affects calls to brk() and sbrk(), but it doesn't affect
> > mmap. So I think requiring CAP_SYS_RESOURCE in the current
> > namespace is enough for this limit.
> >
> > ...
> >
> > Cc: security@kernel.org
>
> That list is for reporting kernel security bugs.
>
> >
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -1701,7 +1701,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> > if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
> > return -EINVAL;
> >
> > - if (!capable(CAP_SYS_RESOURCE))
> > + if (!ns_capable(current_user_ns(), CAP_SYS_RESOURCE))
> > return -EPERM;
> >
> > if (opt == PR_SET_MM_EXE_FILE)
>
> This looks harmless.
>
> My relatively-up-to-date manpages don't mention prctl(PR_SET_MM). I
> see from http://marc.info/?l=linux-man&m=133132612704130&w=2 that
> manpage additions were prepared nearly three years ago. Michael, did
> this fall through a crack?
>