[PATCH v6] fs: clear file privilege bits when mmap writing

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v6] fs: clear file privilege bits when mmap writing
@ 2016-01-08 23:27 Kees Cook
  2016-01-09  4:28 ` Al Viro
  2016-01-10 15:48 ` Konstantin Khlebnikov
  0 siblings, 2 replies; 12+ messages in thread
From: Kees Cook @ 2016-01-08 23:27 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andy Lutomirski, Jan Kara, yalin wang, Willy Tarreau,
	Andrew Morton, linux-fsdevel, linux-arch, linux-api, linux-mm

Normally, when a user can modify a file that has setuid or setgid bits,
those bits are cleared when they are not the file owner or a member
of the group. This is enforced when using write and truncate but not
when writing to a shared mmap on the file. This could allow the file
writer to gain privileges by changing a binary without losing the
setuid/setgid/caps bits.

Changing the bits requires holding inode->i_mutex, so it cannot be done
during the page fault (due to mmap_sem being held during the fault). We
could do this during vm_mmap_pgoff, but that would need coverage in
mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
again. We could clear at open() time, but it's possible things are
accidentally opening with O_RDWR and only reading. Better to clear on
close and error failures (i.e. an improvement over now, which is not
clearing at all).

Instead, detect the need to clear the bits during the page fault, and
actually remove the bits during final fput. Since the file was open for
writing, it wouldn't have been possible to execute it yet (ETXTBSY).

Signed-off-by: Kees Cook <keescook@chromium.org>
---
v6:
- clarify ETXTBSY situation in comments, luto
v5:
- add to f_flags instead, viro
- add i_mutex during __fput, jack
v4:
- delay removal instead of still needing mmap_sem for mprotect, yalin
v3:
- move outside of mmap_sem for real now, fengguang
- check return code of file_remove_privs, akpm
v2:
- move to mmap from fault handler, jack
---
 fs/file_table.c                  | 15 +++++++++++++++
 fs/open.c                        |  2 +-
 include/uapi/asm-generic/fcntl.h |  4 ++++
 mm/memory.c                      |  5 +++++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..ca11b86613cf 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -191,6 +191,21 @@ static void __fput(struct file *file)
 
 	might_sleep();
 
+	/*
+	 * XXX: This is a delayed removal of privs (we've already been
+	 * written to), since we must avoid mmap_sem. But a race shouldn't
+	 * be possible since when open for writing, execve() will fail
+	 * with ETXTBSY (via deny_write_access()). A remaining problem
+	 * is that since we've already been written to, we must ignore the
+	 * return value of file_remove_privs(), since we can't reject the
+	 * writes of the past.
+	 */
+	if (unlikely(file->f_flags & O_REMOVEPRIV)) {
+		mutex_lock(&inode->i_mutex);
+		file_remove_privs(file);
+		mutex_unlock(&inode->i_mutex);
+	}
+
 	fsnotify_close(file);
 	/*
 	 * The function eventpoll_release() should be the first called
diff --git a/fs/open.c b/fs/open.c
index b6f1e96a7c0b..89069d16ca80 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -895,7 +895,7 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
 		op->mode = 0;
 
 	/* Must never be set by userspace */
-	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
+	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC & ~O_REMOVEPRIV;
 
 	/*
 	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index e063effe0cc1..096c4b3afe6a 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -88,6 +88,10 @@
 #define __O_TMPFILE	020000000
 #endif
 
+#ifndef O_REMOVEPRIV
+#define O_REMOVEPRIV	040000000
+#endif
+
 /* a horrid kludge trying to make sure that this will fail on old kernels */
 #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
 #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)      
diff --git a/mm/memory.c b/mm/memory.c
index c387430f06c3..ad4188a8f279 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2036,6 +2036,11 @@ static inline int wp_page_reuse(struct mm_struct *mm,
 
 		if (!page_mkwrite)
 			file_update_time(vma->vm_file);
+		if (unlikely((vma->vm_file->f_flags & O_REMOVEPRIV) == 0)) {
+			spin_lock(&vma->vm_file->f_lock);
+			vma->vm_file->f_flags |= O_REMOVEPRIV;
+			spin_unlock(&vma->vm_file->f_lock);
+		}
 	}
 
 	return VM_FAULT_WRITE;
-- 
2.6.3


-- 
Kees Cook
Chrome OS & Brillo Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-08 23:27 [PATCH v6] fs: clear file privilege bits when mmap writing Kees Cook
@ 2016-01-09  4:28 ` Al Viro
  2016-01-10 15:48 ` Konstantin Khlebnikov
  1 sibling, 0 replies; 12+ messages in thread
From: Al Viro @ 2016-01-09  4:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Jan Kara, yalin wang, Willy Tarreau,
	Andrew Morton, linux-fsdevel, linux-arch, linux-api, linux-mm

On Fri, Jan 08, 2016 at 03:27:27PM -0800, Kees Cook wrote:

> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index e063effe0cc1..096c4b3afe6a 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -88,6 +88,10 @@
>  #define __O_TMPFILE	020000000
>  #endif
>  
> +#ifndef O_REMOVEPRIV
> +#define O_REMOVEPRIV	040000000
> +#endif

Hmm...  Is that value always available?  AFAICS, parisc has already grabbed
it (for __O_TMPFILE).  On sparc it's taken by __O_SYNC, on alpha - O_PATH...
There's a reason why those definitions are not unconditional; some targets
have ABI shared with a preexisting Unix variant on the architecture in
question.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-08 23:27 [PATCH v6] fs: clear file privilege bits when mmap writing Kees Cook
  2016-01-09  4:28 ` Al Viro
@ 2016-01-10 15:48 ` Konstantin Khlebnikov
  2016-01-10 19:30   ` Al Viro
  2016-01-11 19:38   ` Kees Cook
  1 sibling, 2 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-10 15:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Viro, Andy Lutomirski, Jan Kara, yalin wang,
	Willy Tarreau, Andrew Morton, linux-fsdevel, linux-arch,
	Linux API, linux-mm@kvack.org

On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@chromium.org> wrote:
> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member
> of the group. This is enforced when using write and truncate but not
> when writing to a shared mmap on the file. This could allow the file
> writer to gain privileges by changing a binary without losing the
> setuid/setgid/caps bits.
>
> Changing the bits requires holding inode->i_mutex, so it cannot be done
> during the page fault (due to mmap_sem being held during the fault). We
> could do this during vm_mmap_pgoff, but that would need coverage in
> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
> again. We could clear at open() time, but it's possible things are
> accidentally opening with O_RDWR and only reading. Better to clear on
> close and error failures (i.e. an improvement over now, which is not
> clearing at all).

I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.

In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
under mmap_sem, then if needed grab reference to struct file from vma and
clear suid after unlocking mmap_sem.

I haven't seen previous iterations, probably this approach has known flaws.

>
> Instead, detect the need to clear the bits during the page fault, and
> actually remove the bits during final fput. Since the file was open for
> writing, it wouldn't have been possible to execute it yet (ETXTBSY).
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v6:
> - clarify ETXTBSY situation in comments, luto
> v5:
> - add to f_flags instead, viro
> - add i_mutex during __fput, jack
> v4:
> - delay removal instead of still needing mmap_sem for mprotect, yalin
> v3:
> - move outside of mmap_sem for real now, fengguang
> - check return code of file_remove_privs, akpm
> v2:
> - move to mmap from fault handler, jack
> ---
>  fs/file_table.c                  | 15 +++++++++++++++
>  fs/open.c                        |  2 +-
>  include/uapi/asm-generic/fcntl.h |  4 ++++
>  mm/memory.c                      |  5 +++++
>  4 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index ad17e05ebf95..ca11b86613cf 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -191,6 +191,21 @@ static void __fput(struct file *file)
>
>         might_sleep();
>
> +       /*
> +        * XXX: This is a delayed removal of privs (we've already been
> +        * written to), since we must avoid mmap_sem. But a race shouldn't
> +        * be possible since when open for writing, execve() will fail
> +        * with ETXTBSY (via deny_write_access()). A remaining problem
> +        * is that since we've already been written to, we must ignore the
> +        * return value of file_remove_privs(), since we can't reject the
> +        * writes of the past.
> +        */
> +       if (unlikely(file->f_flags & O_REMOVEPRIV)) {
> +               mutex_lock(&inode->i_mutex);
> +               file_remove_privs(file);
> +               mutex_unlock(&inode->i_mutex);
> +       }
> +
>         fsnotify_close(file);
>         /*
>          * The function eventpoll_release() should be the first called
> diff --git a/fs/open.c b/fs/open.c
> index b6f1e96a7c0b..89069d16ca80 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -895,7 +895,7 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
>                 op->mode = 0;
>
>         /* Must never be set by userspace */
> -       flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
> +       flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC & ~O_REMOVEPRIV;
>
>         /*
>          * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index e063effe0cc1..096c4b3afe6a 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -88,6 +88,10 @@
>  #define __O_TMPFILE    020000000
>  #endif
>
> +#ifndef O_REMOVEPRIV
> +#define O_REMOVEPRIV   040000000
> +#endif
> +
>  /* a horrid kludge trying to make sure that this will fail on old kernels */
>  #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
>  #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
> diff --git a/mm/memory.c b/mm/memory.c
> index c387430f06c3..ad4188a8f279 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2036,6 +2036,11 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>
>                 if (!page_mkwrite)
>                         file_update_time(vma->vm_file);
> +               if (unlikely((vma->vm_file->f_flags & O_REMOVEPRIV) == 0)) {
> +                       spin_lock(&vma->vm_file->f_lock);
> +                       vma->vm_file->f_flags |= O_REMOVEPRIV;
> +                       spin_unlock(&vma->vm_file->f_lock);
> +               }
>         }
>
>         return VM_FAULT_WRITE;
> --
> 2.6.3
>
>
> --
> Kees Cook
> Chrome OS & Brillo Security
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-10 15:48 ` Konstantin Khlebnikov
@ 2016-01-10 19:30   ` Al Viro
  2016-01-10 19:51     ` Konstantin Khlebnikov
  2016-01-11 19:38   ` Kees Cook
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-01-10 19:30 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Kees Cook, Andy Lutomirski, Jan Kara, yalin wang, Willy Tarreau,
	Andrew Morton, linux-fsdevel, linux-arch, Linux API,
	linux-mm@kvack.org

On Sun, Jan 10, 2016 at 06:48:32PM +0300, Konstantin Khlebnikov wrote:
> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
> 
> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
> under mmap_sem, then if needed grab reference to struct file from vma and
> clear suid after unlocking mmap_sem.

Which vma?  mprotect(2) can cover more than one mapping...  You'd have to
play interesting games to collect the set of affected struct file; it
_might_ be doable (e.g. by using task_work_add() to have the damn thing
trigger on the way to userland), but it would require some care to avoid
hitting the same file more than once - it might, after all, be mmapped
in more than one process, so racing mprotect() would need to be taken
into account.  Hell knows - might be doable, but I'm not sure it'll be
any prettier.

->f_u.fu_rcuhead.func would need to be zeroed on struct file allocation,
and that code would need to
	* check ->f_u.fu_rcuhead.func; if non-NULL - sod off, nothing to do
	* lock ->f_lock
	* recheck (and unlock if we'd lost a race and need to sod off)
	* get_file()
	* task_work_add() on ->f_u.fu_rcuhead
	* drop ->f_lock
with task_work_add() callback removing SUID, zero ->fu.fu_rcuhead.func (under
->f_lock) and finally fput().

In principle, that would work; the primitive would be along the lines of
"make sure that SUID is removed before return to userland" and both mmap
and mprotect would use it.  The primitive itself would be in fs/file_table.c,
encapsulating the messy details in there.  All existing users of ->f_u don't
touch it until ->f_count drops to 0, so we are OK to use it here.  It obviously
should _not_ be used for kernel threads (task_work_add() won't do us any
good in those, but then we are not going to do mmap or mprotect there either).
Regular writes should *not* use that - they ought to strip SUID directly.

Might be worth trying...  Any takers?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-10 19:30   ` Al Viro
@ 2016-01-10 19:51     ` Konstantin Khlebnikov
  2016-01-10 21:10       ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-10 19:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Kees Cook, Andy Lutomirski, Jan Kara, yalin wang, Willy Tarreau,
	Andrew Morton, linux-fsdevel, linux-arch, Linux API,
	linux-mm@kvack.org

On Sun, Jan 10, 2016 at 10:30 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Jan 10, 2016 at 06:48:32PM +0300, Konstantin Khlebnikov wrote:
>> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>>
>> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
>> under mmap_sem, then if needed grab reference to struct file from vma and
>> clear suid after unlocking mmap_sem.
>
> Which vma?  mprotect(2) can cover more than one mapping...  You'd have to
> play interesting games to collect the set of affected struct file; it
> _might_ be doable (e.g. by using task_work_add() to have the damn thing
> trigger on the way to userland), but it would require some care to avoid
> hitting the same file more than once - it might, after all, be mmapped
> in more than one process, so racing mprotect() would need to be taken
> into account.  Hell knows - might be doable, but I'm not sure it'll be
> any prettier.

Ok, I didn't thought about that. mprotect don't have to be atomic for whole
range -- we could drop mmap_sem, clear suid from one file and restart it
for next vma and so on.

>
> ->f_u.fu_rcuhead.func would need to be zeroed on struct file allocation,
> and that code would need to
>         * check ->f_u.fu_rcuhead.func; if non-NULL - sod off, nothing to do
>         * lock ->f_lock
>         * recheck (and unlock if we'd lost a race and need to sod off)
>         * get_file()
>         * task_work_add() on ->f_u.fu_rcuhead
>         * drop ->f_lock
> with task_work_add() callback removing SUID, zero ->fu.fu_rcuhead.func (under
> ->f_lock) and finally fput().
>
> In principle, that would work; the primitive would be along the lines of
> "make sure that SUID is removed before return to userland" and both mmap
> and mprotect would use it.  The primitive itself would be in fs/file_table.c,
> encapsulating the messy details in there.  All existing users of ->f_u don't
> touch it until ->f_count drops to 0, so we are OK to use it here.  It obviously
> should _not_ be used for kernel threads (task_work_add() won't do us any
> good in those, but then we are not going to do mmap or mprotect there either).
> Regular writes should *not* use that - they ought to strip SUID directly.
>
> Might be worth trying...  Any takers?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-10 19:51     ` Konstantin Khlebnikov
@ 2016-01-10 21:10       ` Al Viro
  2016-01-10 22:30         ` Konstantin Khlebnikov
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-01-10 21:10 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Kees Cook, Andy Lutomirski, Jan Kara, yalin wang, Willy Tarreau,
	Andrew Morton, linux-fsdevel, linux-arch, Linux API,
	linux-mm@kvack.org

On Sun, Jan 10, 2016 at 10:51:52PM +0300, Konstantin Khlebnikov wrote:
> On Sun, Jan 10, 2016 at 10:30 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Sun, Jan 10, 2016 at 06:48:32PM +0300, Konstantin Khlebnikov wrote:
> >> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
> >>
> >> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
> >> under mmap_sem, then if needed grab reference to struct file from vma and
> >> clear suid after unlocking mmap_sem.
> >
> > Which vma?  mprotect(2) can cover more than one mapping...  You'd have to
> > play interesting games to collect the set of affected struct file; it
> > _might_ be doable (e.g. by using task_work_add() to have the damn thing
> > trigger on the way to userland), but it would require some care to avoid
> > hitting the same file more than once - it might, after all, be mmapped
> > in more than one process, so racing mprotect() would need to be taken
> > into account.  Hell knows - might be doable, but I'm not sure it'll be
> > any prettier.
> 
> Ok, I didn't thought about that. mprotect don't have to be atomic for whole
> range -- we could drop mmap_sem, clear suid from one file and restart it
> for next vma and so on.

Won't be fun.  Even aside of the user-visible behaviour changes, you'll have
a lot of new corner cases, starting with the fact that you can't hold onto
vma - virtual address is the best you can do and vma you find after regaining
mmap_sem might start at lower address than one where you are restarting;
getting the splitting-related logics right will be interesting, to put it
mildly.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-10 21:10       ` Al Viro
@ 2016-01-10 22:30         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-10 22:30 UTC (permalink / raw)
  To: Al Viro
  Cc: Kees Cook, Andy Lutomirski, Jan Kara, yalin wang, Willy Tarreau,
	Andrew Morton, linux-fsdevel, linux-arch, Linux API,
	linux-mm@kvack.org

On Mon, Jan 11, 2016 at 12:10 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Jan 10, 2016 at 10:51:52PM +0300, Konstantin Khlebnikov wrote:
>> On Sun, Jan 10, 2016 at 10:30 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > On Sun, Jan 10, 2016 at 06:48:32PM +0300, Konstantin Khlebnikov wrote:
>> >> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>> >>
>> >> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
>> >> under mmap_sem, then if needed grab reference to struct file from vma and
>> >> clear suid after unlocking mmap_sem.
>> >
>> > Which vma?  mprotect(2) can cover more than one mapping...  You'd have to
>> > play interesting games to collect the set of affected struct file; it
>> > _might_ be doable (e.g. by using task_work_add() to have the damn thing
>> > trigger on the way to userland), but it would require some care to avoid
>> > hitting the same file more than once - it might, after all, be mmapped
>> > in more than one process, so racing mprotect() would need to be taken
>> > into account.  Hell knows - might be doable, but I'm not sure it'll be
>> > any prettier.
>>
>> Ok, I didn't thought about that. mprotect don't have to be atomic for whole
>> range -- we could drop mmap_sem, clear suid from one file and restart it
>> for next vma and so on.
>
> Won't be fun.  Even aside of the user-visible behaviour changes, you'll have
> a lot of new corner cases, starting with the fact that you can't hold onto
> vma - virtual address is the best you can do and vma you find after regaining
> mmap_sem might start at lower address than one where you are restarting;
> getting the splitting-related logics right will be interesting, to put it
> mildly.

I don't see any problems here -- in this case mprotect virtually turns
into series of
indendent mprotect calls. Yes, we have to find vma again. Not a big deal.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-10 15:48 ` Konstantin Khlebnikov
  2016-01-10 19:30   ` Al Viro
@ 2016-01-11 19:38   ` Kees Cook
  2016-01-11 22:39     ` Konstantin Khlebnikov
  1 sibling, 1 reply; 12+ messages in thread
From: Kees Cook @ 2016-01-11 19:38 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Alexander Viro, Andy Lutomirski, Jan Kara, yalin wang,
	Willy Tarreau, Andrew Morton, linux-fsdevel, linux-arch,
	Linux API, linux-mm@kvack.org

On Sun, Jan 10, 2016 at 7:48 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@chromium.org> wrote:
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member
>> of the group. This is enforced when using write and truncate but not
>> when writing to a shared mmap on the file. This could allow the file
>> writer to gain privileges by changing a binary without losing the
>> setuid/setgid/caps bits.
>>
>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> during the page fault (due to mmap_sem being held during the fault). We
>> could do this during vm_mmap_pgoff, but that would need coverage in
>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
>> again. We could clear at open() time, but it's possible things are
>> accidentally opening with O_RDWR and only reading. Better to clear on
>> close and error failures (i.e. an improvement over now, which is not
>> clearing at all).
>
> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>
> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
> under mmap_sem, then if needed grab reference to struct file from vma and
> clear suid after unlocking mmap_sem.
>
> I haven't seen previous iterations, probably this approach has known flaws.

mmap_sem is still needed in mprotect (to find and hold the vma), so
it's not possible. I'd love to be proven wrong, but I didn't see a
way.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-11 19:38   ` Kees Cook
@ 2016-01-11 22:39     ` Konstantin Khlebnikov
  2016-01-11 22:45       ` Kees Cook
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-11 22:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Viro, Andy Lutomirski, Jan Kara, yalin wang,
	Willy Tarreau, Andrew Morton, linux-fsdevel, linux-arch,
	Linux API, linux-mm@kvack.org

On Mon, Jan 11, 2016 at 10:38 PM, Kees Cook <keescook@chromium.org> wrote:
> On Sun, Jan 10, 2016 at 7:48 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>> On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@chromium.org> wrote:
>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>> those bits are cleared when they are not the file owner or a member
>>> of the group. This is enforced when using write and truncate but not
>>> when writing to a shared mmap on the file. This could allow the file
>>> writer to gain privileges by changing a binary without losing the
>>> setuid/setgid/caps bits.
>>>
>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>> during the page fault (due to mmap_sem being held during the fault). We
>>> could do this during vm_mmap_pgoff, but that would need coverage in
>>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
>>> again. We could clear at open() time, but it's possible things are
>>> accidentally opening with O_RDWR and only reading. Better to clear on
>>> close and error failures (i.e. an improvement over now, which is not
>>> clearing at all).
>>
>> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>>
>> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
>> under mmap_sem, then if needed grab reference to struct file from vma and
>> clear suid after unlocking mmap_sem.
>>
>> I haven't seen previous iterations, probably this approach has known flaws.
>
> mmap_sem is still needed in mprotect (to find and hold the vma), so
> it's not possible. I'd love to be proven wrong, but I didn't see a
> way.

something like this

@@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,

        vm_flags = calc_vm_prot_bits(prot);

+restart:
        down_write(&current->mm->mmap_sem);

        vma = find_vma(current->mm, start);
@@ -416,6 +418,21 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start,
size_t, len,
                        goto out;
                }

+               if ((newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
+                   vma->vm_file && file_needs_remove_privs(vma->vm_file)) {
+                       struct file *file = get_file(vma->vm_file);
+
+                       start = vma->vm_start;
+                       up_write(&current->mm->mmap_sem);
+                       mutex_lock(&file_inode(file)->i_mutex);
+                       error = file_remove_privs(file);
+                       mutex_unlock(&file_inode(file)->i_mutex);
+                       fput(file);
+                       if (error)
+                               return error;
+                       goto restart;
+               }
+


>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-11 22:39     ` Konstantin Khlebnikov
@ 2016-01-11 22:45       ` Kees Cook
  2016-01-11 23:16         ` Konstantin Khlebnikov
  0 siblings, 1 reply; 12+ messages in thread
From: Kees Cook @ 2016-01-11 22:45 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Alexander Viro, Andy Lutomirski, Jan Kara, yalin wang,
	Willy Tarreau, Andrew Morton, linux-fsdevel, linux-arch,
	Linux API, linux-mm@kvack.org

On Mon, Jan 11, 2016 at 2:39 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Mon, Jan 11, 2016 at 10:38 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Sun, Jan 10, 2016 at 7:48 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>>> On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@chromium.org> wrote:
>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>> those bits are cleared when they are not the file owner or a member
>>>> of the group. This is enforced when using write and truncate but not
>>>> when writing to a shared mmap on the file. This could allow the file
>>>> writer to gain privileges by changing a binary without losing the
>>>> setuid/setgid/caps bits.
>>>>
>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>> during the page fault (due to mmap_sem being held during the fault). We
>>>> could do this during vm_mmap_pgoff, but that would need coverage in
>>>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
>>>> again. We could clear at open() time, but it's possible things are
>>>> accidentally opening with O_RDWR and only reading. Better to clear on
>>>> close and error failures (i.e. an improvement over now, which is not
>>>> clearing at all).
>>>
>>> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>>>
>>> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
>>> under mmap_sem, then if needed grab reference to struct file from vma and
>>> clear suid after unlocking mmap_sem.
>>>
>>> I haven't seen previous iterations, probably this approach has known flaws.
>>
>> mmap_sem is still needed in mprotect (to find and hold the vma), so
>> it's not possible. I'd love to be proven wrong, but I didn't see a
>> way.
>
> something like this
>
> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>
>         vm_flags = calc_vm_prot_bits(prot);
>
> +restart:
>         down_write(&current->mm->mmap_sem);
>
>         vma = find_vma(current->mm, start);
> @@ -416,6 +418,21 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start,
> size_t, len,
>                         goto out;
>                 }
>
> +               if ((newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
> +                   vma->vm_file && file_needs_remove_privs(vma->vm_file)) {
> +                       struct file *file = get_file(vma->vm_file);
> +
> +                       start = vma->vm_start;
> +                       up_write(&current->mm->mmap_sem);
> +                       mutex_lock(&file_inode(file)->i_mutex);
> +                       error = file_remove_privs(file);
> +                       mutex_unlock(&file_inode(file)->i_mutex);
> +                       fput(file);
> +                       if (error)
> +                               return error;
> +                       goto restart;
> +               }
> +

Is this safe against the things Al mentioned? I still don't like the
mmap/mprotect approach because it makes the change before anything was
actually written...

-Kees

>
>
>>
>> -Kees
>>
>> --
>> Kees Cook
>> Chrome OS & Brillo Security



-- 
Kees Cook
Chrome OS & Brillo Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-11 22:45       ` Kees Cook
@ 2016-01-11 23:16         ` Konstantin Khlebnikov
  2016-01-11 23:19           ` Kees Cook
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-11 23:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Viro, Andy Lutomirski, Jan Kara, yalin wang,
	Willy Tarreau, Andrew Morton, linux-fsdevel, linux-arch,
	Linux API, linux-mm@kvack.org

On Tue, Jan 12, 2016 at 1:45 AM, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Jan 11, 2016 at 2:39 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>> On Mon, Jan 11, 2016 at 10:38 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Sun, Jan 10, 2016 at 7:48 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>>>> On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>>> those bits are cleared when they are not the file owner or a member
>>>>> of the group. This is enforced when using write and truncate but not
>>>>> when writing to a shared mmap on the file. This could allow the file
>>>>> writer to gain privileges by changing a binary without losing the
>>>>> setuid/setgid/caps bits.
>>>>>
>>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>>> during the page fault (due to mmap_sem being held during the fault). We
>>>>> could do this during vm_mmap_pgoff, but that would need coverage in
>>>>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
>>>>> again. We could clear at open() time, but it's possible things are
>>>>> accidentally opening with O_RDWR and only reading. Better to clear on
>>>>> close and error failures (i.e. an improvement over now, which is not
>>>>> clearing at all).
>>>>
>>>> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>>>>
>>>> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
>>>> under mmap_sem, then if needed grab reference to struct file from vma and
>>>> clear suid after unlocking mmap_sem.
>>>>
>>>> I haven't seen previous iterations, probably this approach has known flaws.
>>>
>>> mmap_sem is still needed in mprotect (to find and hold the vma), so
>>> it's not possible. I'd love to be proven wrong, but I didn't see a
>>> way.
>>
>> something like this
>>
>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>
>>         vm_flags = calc_vm_prot_bits(prot);
>>
>> +restart:
>>         down_write(&current->mm->mmap_sem);
>>
>>         vma = find_vma(current->mm, start);
>> @@ -416,6 +418,21 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start,
>> size_t, len,
>>                         goto out;
>>                 }
>>
>> +               if ((newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>> +                   vma->vm_file && file_needs_remove_privs(vma->vm_file)) {
>> +                       struct file *file = get_file(vma->vm_file);
>> +
>> +                       start = vma->vm_start;
>> +                       up_write(&current->mm->mmap_sem);
>> +                       mutex_lock(&file_inode(file)->i_mutex);
>> +                       error = file_remove_privs(file);
>> +                       mutex_unlock(&file_inode(file)->i_mutex);
>> +                       fput(file);
>> +                       if (error)
>> +                               return error;
>> +                       goto restart;
>> +               }
>> +
>
> Is this safe against the things Al mentioned? I still don't like the
> mmap/mprotect approach because it makes the change before anything was
> actually written...

(I forgot to check VM_SHARED)

Yep, this should be safe.

I think suid should be cleared before any possible change of data.
New content could hit the disk but suid never be cleared,
for example if system suddenly crashed or rebooted.

>
> -Kees
>
>>
>>
>>>
>>> -Kees
>>>
>>> --
>>> Kees Cook
>>> Chrome OS & Brillo Security
>
>
>
> --
> Kees Cook
> Chrome OS & Brillo Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6] fs: clear file privilege bits when mmap writing
  2016-01-11 23:16         ` Konstantin Khlebnikov
@ 2016-01-11 23:19           ` Kees Cook
  0 siblings, 0 replies; 12+ messages in thread
From: Kees Cook @ 2016-01-11 23:19 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Alexander Viro, Andy Lutomirski, Jan Kara, yalin wang,
	Willy Tarreau, Andrew Morton, linux-fsdevel, linux-arch,
	Linux API, linux-mm@kvack.org

On Mon, Jan 11, 2016 at 3:16 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Tue, Jan 12, 2016 at 1:45 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Mon, Jan 11, 2016 at 2:39 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>>> On Mon, Jan 11, 2016 at 10:38 PM, Kees Cook <keescook@chromium.org> wrote:
>>>> On Sun, Jan 10, 2016 at 7:48 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>>>>> On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>>>> those bits are cleared when they are not the file owner or a member
>>>>>> of the group. This is enforced when using write and truncate but not
>>>>>> when writing to a shared mmap on the file. This could allow the file
>>>>>> writer to gain privileges by changing a binary without losing the
>>>>>> setuid/setgid/caps bits.
>>>>>>
>>>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>>>> during the page fault (due to mmap_sem being held during the fault). We
>>>>>> could do this during vm_mmap_pgoff, but that would need coverage in
>>>>>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
>>>>>> again. We could clear at open() time, but it's possible things are
>>>>>> accidentally opening with O_RDWR and only reading. Better to clear on
>>>>>> close and error failures (i.e. an improvement over now, which is not
>>>>>> clearing at all).
>>>>>
>>>>> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>>>>>
>>>>> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
>>>>> under mmap_sem, then if needed grab reference to struct file from vma and
>>>>> clear suid after unlocking mmap_sem.
>>>>>
>>>>> I haven't seen previous iterations, probably this approach has known flaws.
>>>>
>>>> mmap_sem is still needed in mprotect (to find and hold the vma), so
>>>> it's not possible. I'd love to be proven wrong, but I didn't see a
>>>> way.
>>>
>>> something like this
>>>
>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>
>>>         vm_flags = calc_vm_prot_bits(prot);
>>>
>>> +restart:
>>>         down_write(&current->mm->mmap_sem);
>>>
>>>         vma = find_vma(current->mm, start);
>>> @@ -416,6 +418,21 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start,
>>> size_t, len,
>>>                         goto out;
>>>                 }
>>>
>>> +               if ((newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>> +                   vma->vm_file && file_needs_remove_privs(vma->vm_file)) {
>>> +                       struct file *file = get_file(vma->vm_file);
>>> +
>>> +                       start = vma->vm_start;
>>> +                       up_write(&current->mm->mmap_sem);
>>> +                       mutex_lock(&file_inode(file)->i_mutex);
>>> +                       error = file_remove_privs(file);
>>> +                       mutex_unlock(&file_inode(file)->i_mutex);
>>> +                       fput(file);
>>> +                       if (error)
>>> +                               return error;
>>> +                       goto restart;
>>> +               }
>>> +
>>
>> Is this safe against the things Al mentioned? I still don't like the
>> mmap/mprotect approach because it makes the change before anything was
>> actually written...
>
> (I forgot to check VM_SHARED)
>
> Yep, this should be safe.
>
> I think suid should be cleared before any possible change of data.
> New content could hit the disk but suid never be cleared,
> for example if system suddenly crashed or rebooted.

Oooh, very good point. Yeah, that's enough to convince me. :) Ignore my v7...

Al, are you okay with this semantic change?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-01-11 23:19 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-08 23:27 [PATCH v6] fs: clear file privilege bits when mmap writing Kees Cook
2016-01-09  4:28 ` Al Viro
2016-01-10 15:48 ` Konstantin Khlebnikov
2016-01-10 19:30   ` Al Viro
2016-01-10 19:51     ` Konstantin Khlebnikov
2016-01-10 21:10       ` Al Viro
2016-01-10 22:30         ` Konstantin Khlebnikov
2016-01-11 19:38   ` Kees Cook
2016-01-11 22:39     ` Konstantin Khlebnikov
2016-01-11 22:45       ` Kees Cook
2016-01-11 23:16         ` Konstantin Khlebnikov
2016-01-11 23:19           ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).