public inbox for linux-kernel@vger.kernel.org
* [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
@ 2012-03-08 16:51 Cyrill Gorcunov
  2012-03-08 18:26 ` Oleg Nesterov
  2012-03-08 19:31 ` Kees Cook
  0 siblings, 2 replies; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 16:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: KOSAKI Motohiro, Pavel Emelyanov, Kees Cook, Tejun Heo,
	Andrew Morton, LKML

Hi Oleg, could you please take a look once you get a minute (no urgency).

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: c/r: prctl: Add ability to set new mm_struct::exe_file v3

When restoring a process we would like to have a way to set up
the former mm_struct::exe_file so that /proc/pid/exe
points to the original executable file the process had at
checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.
This option takes a file descriptor which will be
set as the new /proc/$pid/exe symlink target.

This feature is available only if CONFIG_CHECKPOINT_RESTORE is set.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Kees Cook <keescook@chromium.org>
CC: Tejun Heo <tj@kernel.org>
---
 include/linux/prctl.h |    1 +
 kernel/sys.c          |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -118,5 +118,6 @@
 # define PR_SET_MM_ENV_START		10
 # define PR_SET_MM_ENV_END		11
 # define PR_SET_MM_AUXV			12
+# define PR_SET_MM_EXE_FILE		13
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -36,6 +36,8 @@
 #include <linux/personality.h>
 #include <linux/ptrace.h>
 #include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/gfp.h>
 #include <linux/syscore_ops.h>
 #include <linux/version.h>
@@ -1701,6 +1703,50 @@ static bool vma_flags_mismatch(struct vm
 		(vma->vm_flags & banned);
 }
 
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
+{
+	struct file *exe_file;
+	struct dentry *dentry;
+	int err;
+
+	exe_file = fget(fd);
+	if (!exe_file)
+		return -EBADF;
+
+	dentry = exe_file->f_path.dentry;
+
+	/*
+	 * The original mm->exe_file points
+	 * to an executable file, so make sure
+	 * the new one is executable as well.
+	 * This keeps the "big" picture intact:
+	 * the /proc/pid/exe symlink still points
+	 * to an executable file.
+	 */
+	err = -EACCES;
+	if (!S_ISREG(dentry->d_inode->i_mode)	||
+	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
+		goto exit;
+
+	err = inode_permission(dentry->d_inode, MAY_EXEC);
+	if (err)
+		goto exit;
+
+	down_write(&mm->mmap_sem);
+	if (mm->num_exe_file_vmas) {
+		fput(mm->exe_file);
+		mm->exe_file = exe_file;
+		exe_file = NULL;
+	} else
+		set_mm_exe_file(mm, exe_file);
+	up_write(&mm->mmap_sem);
+
+exit:
+	if (exe_file)
+		fput(exe_file);
+	return err;
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
@@ -1715,6 +1761,9 @@ static int prctl_set_mm(int opt, unsigne
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
+
 	if (addr >= TASK_SIZE)
 		return -EINVAL;
 
@@ -1837,6 +1886,7 @@ static int prctl_set_mm(int opt, unsigne
 
 		return 0;
 	}
+
 	default:
 		error = -EINVAL;
 		goto out;

^ permalink raw reply	[flat|nested] 48+ messages in thread
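
For reference, a restorer could drive the new option from userspace
roughly as follows. This is only a sketch: restore_exe_link() is a
hypothetical helper name, PR_SET_MM_EXE_FILE (13) is the value from the
patch above, and PR_SET_MM (35) is assumed from the earlier prctl
series. The call needs CAP_SYS_RESOURCE and a kernel built with
CONFIG_CHECKPOINT_RESTORE, so it is expected to fail elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for old headers; newer <linux/prctl.h> already has both. */
#ifndef PR_SET_MM
#define PR_SET_MM		35	/* assumed from the earlier series */
#endif
#ifndef PR_SET_MM_EXE_FILE
#define PR_SET_MM_EXE_FILE	13	/* value from the patch above */
#endif

/*
 * Hypothetical restorer helper: ask the kernel to make /proc/self/exe
 * point at @path. Returns 0 on success or a negative errno value.
 */
static int restore_exe_link(const char *path)
{
	int fd, ret = 0;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The descriptor travels in the "addr" slot of prctl_set_mm(). */
	if (prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, (unsigned long)fd, 0UL, 0UL) < 0)
		ret = -errno;

	/* The kernel holds its own reference, so the fd can be closed. */
	close(fd);
	return ret;
}
```

A caller would typically invoke restore_exe_link() late in the restore
sequence and treat any negative return value as a hard failure.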

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 16:51 [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3 Cyrill Gorcunov
@ 2012-03-08 18:26 ` Oleg Nesterov
  2012-03-08 19:03   ` Cyrill Gorcunov
  2012-03-08 19:31 ` Kees Cook
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-08 18:26 UTC (permalink / raw)
  To: Cyrill Gorcunov, Matt Helsley
  Cc: KOSAKI Motohiro, Pavel Emelyanov, Kees Cook, Tejun Heo,
	Andrew Morton, LKML

On 03/08, Cyrill Gorcunov wrote:
>
> Hi Oleg, could you please take a look once you get a minute (no urgency).

Adding Matt. I won't touch the text below, to keep the patch intact.

With this change

	down_write(&mm->mmap_sem);
	if (mm->num_exe_file_vmas) {
		fput(mm->exe_file);
		mm->exe_file = exe_file;
		exe_file = NULL;
	} else
		set_mm_exe_file(mm, exe_file);
	up_write(&mm->mmap_sem);

I simply do not understand what mm->num_exe_file_vmas means after
PR_SET_MM_EXE_FILE.

I think that you should do

	down_write(&mm->mmap_sem);
	if (mm->num_exe_file_vmas) {
		fput(mm->exe_file);
		mm->exe_file = exe_file;
		exe_file = NULL;
	}
	up_write(&mm->mmap_sem);

to keep the current "mm->exe_file goes away after the final
unmap(MAP_EXECUTABLE)" logic.

OK, maybe this doesn't work in the c/r case because you are actually
going to remove the old mappings? But in this case the new exe_file
will go away anyway, afaics PR_SET_MM_EXE_FILE is called when you
still have the old mappings.

And I don't think the unconditional

	down_write(&mm->mmap_sem);
	set_mm_exe_file(mm, exe_file);
	up_write(&mm->mmap_sem);

is 100% right, this clears ->num_exe_file_vmas. This means that
(if you still have the old mapping) the new exe_file can go away
after added_exe_file_vma() + removed_exe_file_vma(). Normally this
shouldn't happen, but afaics this is possible. Note that even, say,
mprotect() can trigger added_exe_file_vma().

Maybe we can do something like

	down_write(&mm->mmap_sem);
	set_mm_exe_file(mm, exe_file);
	// we are cheating anyway, make sure it can never == 0
	// if we have the "old" VM_EXECUTABLE vmas.
	mm->num_exe_file_vmas = LONG_MAX;
	up_write(&mm->mmap_sem);

I dunno. Matt, could you help?

> From: Cyrill Gorcunov <gorcunov@openvz.org>
> Subject: c/r: prctl: Add ability to set new mm_struct::exe_file v3
>
> When we do restore we would like to have a way to setup
> a former mm_struct::exe_file so that /proc/pid/exe would
> point to the original executable file a process had at
> checkpoint time.
>
> For this the PR_SET_MM_EXE_FILE code is introduced.
> This option takes a file descriptor which will be
> set as new /proc/$pid/exe symlink.
>
> This feature is available only if CONFIG_CHECKPOINT_RESTORE is set.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Tejun Heo <tj@kernel.org>
> ---
>  include/linux/prctl.h |    1 +
>  kernel/sys.c          |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+)
>
> Index: linux-2.6.git/include/linux/prctl.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/prctl.h
> +++ linux-2.6.git/include/linux/prctl.h
> @@ -118,5 +118,6 @@
>  # define PR_SET_MM_ENV_START		10
>  # define PR_SET_MM_ENV_END		11
>  # define PR_SET_MM_AUXV			12
> +# define PR_SET_MM_EXE_FILE		13
>
>  #endif /* _LINUX_PRCTL_H */
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> @@ -36,6 +36,8 @@
>  #include <linux/personality.h>
>  #include <linux/ptrace.h>
>  #include <linux/fs_struct.h>
> +#include <linux/file.h>
> +#include <linux/mount.h>
>  #include <linux/gfp.h>
>  #include <linux/syscore_ops.h>
>  #include <linux/version.h>
> @@ -1701,6 +1703,50 @@ static bool vma_flags_mismatch(struct vm
>  		(vma->vm_flags & banned);
>  }
>
> +static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> +{
> +	struct file *exe_file;
> +	struct dentry *dentry;
> +	int err;
> +
> +	exe_file = fget(fd);
> +	if (!exe_file)
> +		return -EBADF;
> +
> +	dentry = exe_file->f_path.dentry;
> +
> +	/*
> +	 * Because the original mm->exe_file
> +	 * points to executable file, make sure
> +	 * this one is executable as well to not
> +	 * break "big" picture and proc/pid/exe
> +	 * symlink will be still pointing to
> +	 * executable one.
> +	 */
> +	err = -EACCES;
> +	if (!S_ISREG(dentry->d_inode->i_mode)	||
> +	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> +		goto exit;
> +
> +	err = inode_permission(dentry->d_inode, MAY_EXEC);
> +	if (err)
> +		goto exit;
> +
> +	down_write(&mm->mmap_sem);
> +	if (mm->num_exe_file_vmas) {
> +		fput(mm->exe_file);
> +		mm->exe_file = exe_file;
> +		exe_file = NULL;
> +	} else
> +		set_mm_exe_file(mm, exe_file);
> +	up_write(&mm->mmap_sem);
> +
> +exit:
> +	if (exe_file)
> +		fput(exe_file);
> +	return err;
> +}
> +
>  static int prctl_set_mm(int opt, unsigned long addr,
>  			unsigned long arg4, unsigned long arg5)
>  {
> @@ -1715,6 +1761,9 @@ static int prctl_set_mm(int opt, unsigne
>  	if (!capable(CAP_SYS_RESOURCE))
>  		return -EPERM;
>
> +	if (opt == PR_SET_MM_EXE_FILE)
> +		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
> +
>  	if (addr >= TASK_SIZE)
>  		return -EINVAL;
>
> @@ -1837,6 +1886,7 @@ static int prctl_set_mm(int opt, unsigne
>
>  		return 0;
>  	}
> +
>  	default:
>  		error = -EINVAL;
>  		goto out;



* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 18:26 ` Oleg Nesterov
@ 2012-03-08 19:03   ` Cyrill Gorcunov
  2012-03-08 19:05     ` Oleg Nesterov
  2012-03-09 21:46     ` Matt Helsley
  0 siblings, 2 replies; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 19:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Thu, Mar 08, 2012 at 07:26:23PM +0100, Oleg Nesterov wrote:
> On 03/08, Cyrill Gorcunov wrote:
> >
> > Hi Oleg, could you please take a look once you get a minute (no urgency).
> 
> Add Matt. I won't touch the text below to keep the patch intact.

Thanks for CC'ing Matt, Oleg (I forgot, sorry).

> 
> With this change
> 
> 	down_write(&mm->mmap_sem);
> 	if (mm->num_exe_file_vmas) {
> 		fput(mm->exe_file);
> 		mm->exe_file = exe_file;
> 		exe_file = NULL;
> 	} else
> 		set_mm_exe_file(mm, exe_file);
> 	up_write(&mm->mmap_sem);
> 
> I simply do not understand what mm->num_exe_file_vmas means after
> PR_SET_MM_EXE_FILE.
> 
> I think that you should do
> 
> 	down_write(&mm->mmap_sem);
> 	if (mm->num_exe_file_vmas) {
> 		fput(mm->exe_file);
> 		mm->exe_file = exe_file;
> 		exe_file = NULL;
> 	}
> 	up_write(&mm->mmap_sem);
> 
> to keep the current "mm->exe_file goes away after the final
> unmap(MAP_EXECUTABLE)" logic.
> 
> OK, may be this doesn't work in c/r case because you are actually
> going to remove the old mappings? But in this case the new exe_file
> will go away anyway, afaics PR_SET_MM_EXE_FILE is called when you
> still have the old mappings.

Yes, exactly, I need to remove the old mappings first (because the VMAs
we're about to restore may intersect with the current map the host
program has). And yes, once they are all removed I don't have
/proc/pid/exe anymore. That's why I need the num_exe_file_vmas == 0
case.

When I set up a new exe_file with num_exe_file_vmas = 0, this reference
to a file brings /proc/pid/exe back to life (and when the process exits
it'll call set_mm_exe_file(mm, NULL) and the new exe_file will be dropped,
so no leak here).

> 
> And I don't think the unconditional
> 
> 	down_write(&mm->mmap_sem);
> 	set_mm_exe_file(mm, exe_file);
> 	up_write(&mm->mmap_sem);
> 
> is 100% right, this clears ->num_exe_file_vmas. This means that
> (if you still have the old mapping) the new exe_file can go away
> after added_exe_file_vma() + removed_exe_file_vma(). Normally this
> should happen, but afaics this is possible. Note that even, say,
> mprotect() can trigger added_exe_file_vma().
> 

Wait, Oleg, I'm confused: if there *are* existing VM_EXECUTABLEs
then we jump into the first branch and simply replace the old exe_file.
If there are no VM_EXECUTABLEs, then we simply set up the new exe_file
and num_exe_file_vmas remains zero.

Or am I missing something obvious and we can somehow cause the kernel
to map VM_EXECUTABLEs outside of the binfmt-elf loader?

> May be we can do something like
> 
> 	down_write(&mm->mmap_sem);
> 	set_mm_exe_file(mm, exe_file);
> 	// we are cheating anyway, make sure it can never == 0
> 	// if we have the "old" VM_EXECUTABLE vmas.
> 	mm->num_exe_file_vmas = LONG_MAX;
> 	up_write(&mm->mmap_sem);
> 
> I dunno. Matt, could you help?

	Cyrill


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:03   ` Cyrill Gorcunov
@ 2012-03-08 19:05     ` Oleg Nesterov
  2012-03-08 19:25       ` Cyrill Gorcunov
  2012-03-09 21:46     ` Matt Helsley
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-08 19:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/08, Cyrill Gorcunov wrote:
>
> On Thu, Mar 08, 2012 at 07:26:23PM +0100, Oleg Nesterov wrote:
> > I think that you should do
> >
> > 	down_write(&mm->mmap_sem);
> > 	if (mm->num_exe_file_vmas) {
> > 		fput(mm->exe_file);
> > 		mm->exe_file = exe_file;
> > 		exe_file = NULL;
> > 	}
> > 	up_write(&mm->mmap_sem);
> >
> > to keep the current "mm->exe_file goes away after the final
> > unmap(MAP_EXECUTABLE)" logic.
> >
> > OK, may be this doesn't work in c/r case because you are actually
> > going to remove the old mappings? But in this case the new exe_file
> > will go away anyway, afaics PR_SET_MM_EXE_FILE is called when you
> > still have the old mappings.
>
> Yes, exactly, I need to remove old mappings first (because VMAs
> we're about to restore may intersect with current map the host
> program has). And yes, once they all are removed I don't have
> /proc/pid/exe anymore. That's why I need num_exe_file_vmas == 0
> case.

OK, in this case PR_SET_MM_EXE_FILE should probably fail if
mm->num_exe_file_vmas != 0 ? This way it would be more or less
consistent or at least understandable. Just we add the new
special case: num_exe_file_vmas == 0 but exe_file != NULL
because c/r people are crazy.

> > And I don't think the unconditional
> >
> > 	down_write(&mm->mmap_sem);
> > 	set_mm_exe_file(mm, exe_file);
> > 	up_write(&mm->mmap_sem);
> >
> > is 100% right, this clears ->num_exe_file_vmas. This means that
> > (if you still have the old mapping) the new exe_file can go away
> > after added_exe_file_vma() + removed_exe_file_vma(). Normally this
> > should happen, but afaics this is possible. Note that even, say,
> > mprotect() can trigger added_exe_file_vma().
> >
>
> Wait, Oleg, I'm confused: if there *are* existing VM_EXECUTABLEs
> then we jump into the first branch and simply replace the old exe_file.

Yes. And then later you remove the old mappings (which do not match
the new file anyway) and the new exe_file goes away. You are unlikely
to want this.

Oleg.
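
The scenario described above can be replayed with a small userspace
model. This is a sketch only: the toy_* names are made up, the file
refcount is a plain int, and toy_removed_exe_file_vma() follows what
removed_exe_file_vma() is understood to do in kernels of that time
(an assumption, since that function is not quoted in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct file and struct mm_struct (userspace only). */
struct toy_file {
	int refs;
};

struct toy_mm {
	struct toy_file *exe_file;
	unsigned long num_exe_file_vmas;
};

static void toy_fput(struct toy_file *f)
{
	assert(f->refs > 0);
	f->refs--;
}

/* Modeled on removed_exe_file_vma(): the last unmap drops exe_file. */
static void toy_removed_exe_file_vma(struct toy_mm *mm)
{
	mm->num_exe_file_vmas--;
	if (mm->num_exe_file_vmas == 0 && mm->exe_file) {
		toy_fput(mm->exe_file);
		mm->exe_file = NULL;
	}
}

/* First branch of the v3 patch: swap the file, keep the counter. */
static void toy_replace_exe_file(struct toy_mm *mm, struct toy_file *new_file)
{
	toy_fput(mm->exe_file);
	mm->exe_file = new_file;	/* takes over the caller's reference */
}

/*
 * Replace exe_file while one old VM_EXECUTABLE vma is still mapped,
 * then unmap that vma. Returns 1 if exe_file survived, 0 otherwise.
 */
static int toy_exe_survives_unmap(void)
{
	struct toy_file old_exe = { .refs = 1 };
	struct toy_file new_exe = { .refs = 1 };
	struct toy_mm mm = { .exe_file = &old_exe, .num_exe_file_vmas = 1 };

	toy_replace_exe_file(&mm, &new_exe);
	toy_removed_exe_file_vma(&mm);	/* restorer unmaps the old text */

	return mm.exe_file != NULL;
}
```

In this model the counter still describes the old mappings, so the
final unmap drops the freshly installed file and the function returns 0.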



* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:05     ` Oleg Nesterov
@ 2012-03-08 19:25       ` Cyrill Gorcunov
  2012-03-08 19:25         ` Oleg Nesterov
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 19:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Thu, Mar 08, 2012 at 08:05:34PM +0100, Oleg Nesterov wrote:
...
> >
> > Yes, exactly, I need to remove old mappings first (because VMAs
> > we're about to restore may intersect with current map the host
> > program has). And yes, once they all are removed I don't have
> > /proc/pid/exe anymore. That's why I need num_exe_file_vmas == 0
> > case.
> 
> OK, in this case PR_SET_MM_EXE_FILE should probably fail if
> mm->num_exe_file_vmas != 0 ? This way it would be more or less
> consistent or at least understandable. Just we add the new
> special case: num_exe_file_vmas == 0 but exe_file != NULL
> because c/r people are crazy.
> 

Sure, I can drop the num_exe_file_vmas != 0 case and refuse to
set up a new exe symlink if some VM_EXECUTABLE mappings remain
mapped. Sounds good?

> > > And I don't think the unconditional
> > >
> > > 	down_write(&mm->mmap_sem);
> > > 	set_mm_exe_file(mm, exe_file);
> > > 	up_write(&mm->mmap_sem);
> > >
> > > is 100% right, this clears ->num_exe_file_vmas. This means that
> > > (if you still have the old mapping) the new exe_file can go away
> > > after added_exe_file_vma() + removed_exe_file_vma(). Normally this
> > > should happen, but afaics this is possible. Note that even, say,
> > > mprotect() can trigger added_exe_file_vma().
> > >
> >
> > Wait, Oleg, I'm confused: if there *are* existing VM_EXECUTABLEs
> > then we jump into the first branch and simply replace the old exe_file.
> 
> Yes. And then later you remove the old mapping (which do not match
> the new file anyway) and the new exe_file goes away. Unlikely you
> want this.

Yes, unlikely ;)

	Cyrill


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:25       ` Cyrill Gorcunov
@ 2012-03-08 19:25         ` Oleg Nesterov
  2012-03-08 19:36           ` Cyrill Gorcunov
  2012-03-08 21:48           ` Cyrill Gorcunov
  0 siblings, 2 replies; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-08 19:25 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/08, Cyrill Gorcunov wrote:
>
> On Thu, Mar 08, 2012 at 08:05:34PM +0100, Oleg Nesterov wrote:
> ...
> > >
> > > Yes, exactly, I need to remove old mappings first (because VMAs
> > > we're about to restore may intersect with current map the host
> > > program has). And yes, once they all are removed I don't have
> > > /proc/pid/exe anymore. That's why I need num_exe_file_vmas == 0
> > > case.
> >
> > OK, in this case PR_SET_MM_EXE_FILE should probably fail if
> > mm->num_exe_file_vmas != 0 ? This way it would be more or less
> > consistent or at least understandable. Just we add the new
> > special case: num_exe_file_vmas == 0 but exe_file != NULL
> > because c/r people are crazy.
> >
>
> Sure, I can drop num_exe_file_vmas != 0 case and refuse to
> setup new exe symlink if there some VM_EXECUTABLE remains
> unmapped. Sounds good?

Personally I like this. This is simple and _understandable_, even
if ->num_exe_file_vmas has no meaning after PR_SET_MM_EXE_FILE.

But please-please document the new special case in the changelog.

Oleg.
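
The rule agreed on here (refuse while VM_EXECUTABLE mappings exist,
otherwise install the file and live with num_exe_file_vmas == 0 but
exe_file != NULL) can be sketched as a userspace model. Everything
named model_* is hypothetical, the refcount is a plain int, and -EBUSY
is only an assumed error code since the thread does not name one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for struct file and struct mm_struct (userspace only). */
struct model_file {
	int refs;
};

struct model_mm {
	struct model_file *exe_file;
	unsigned long num_exe_file_vmas;
};

/* Modeled on set_mm_exe_file(): grab new, drop old, reset the counter. */
static void model_set_mm_exe_file(struct model_mm *mm,
				  struct model_file *new_file)
{
	if (new_file)
		new_file->refs++;
	if (mm->exe_file) {
		assert(mm->exe_file->refs > 0);
		mm->exe_file->refs--;
	}
	mm->exe_file = new_file;
	mm->num_exe_file_vmas = 0;
}

/*
 * Proposed behaviour: fail while any VM_EXECUTABLE vma is still mapped,
 * so the counter never refers to mappings of a different file.
 */
static int model_prctl_set_exe_file(struct model_mm *mm,
				    struct model_file *new_file)
{
	if (mm->num_exe_file_vmas)
		return -EBUSY;	/* assumed error code */

	model_set_mm_exe_file(mm, new_file);
	return 0;
}
```

After a successful call the model is in the new special state: the
counter is zero while exe_file is set, and only the exit-time
set_mm_exe_file(mm, NULL) would drop the file again.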



* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 16:51 [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3 Cyrill Gorcunov
  2012-03-08 18:26 ` Oleg Nesterov
@ 2012-03-08 19:31 ` Kees Cook
  2012-03-08 19:40   ` Cyrill Gorcunov
  1 sibling, 1 reply; 48+ messages in thread
From: Kees Cook @ 2012-03-08 19:31 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Tejun Heo,
	Andrew Morton, LKML, Andy Lutomirski, Will Drewry

On Thu, Mar 8, 2012 at 8:51 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> When we do restore we would like to have a way to setup
> a former mm_struct::exe_file so that /proc/pid/exe would
> point to the original executable file a process had at
> checkpoint time.
>
> For this the PR_SET_MM_EXE_FILE code is introduced.
> This option takes a file descriptor which will be
> set as new /proc/$pid/exe symlink.
>
> This feature is available only if CONFIG_CHECKPOINT_RESTORE is set.
> [...]
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> [...]
> +       exe_file = fget(fd);
> +       if (!exe_file)
> +               return -EBADF;
> +
> +       dentry = exe_file->f_path.dentry;
> +
> +       /*
> +        * Because the original mm->exe_file
> +        * points to executable file, make sure
> +        * this one is executable as well to not
> +        * break "big" picture and proc/pid/exe
> +        * symlink will be still pointing to
> +        * executable one.
> +        */
> +       err = -EACCES;
> +       if (!S_ISREG(dentry->d_inode->i_mode)   ||
> +           exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> +               goto exit;

I'm starting to notice that this pattern (testing ISREG and
MNT_NOEXEC) is getting repeated a few times in the kernel, and at
least the no-new-privs patch (not yet in -mm but hopefully soon given
the seccomp_filter work) updates this pattern everywhere. Perhaps this
should be extracted into a helper first, and then this patch can call
that helper here? (And then nnp can just update the single helper.)

-Kees

-- 
Kees Cook
ChromeOS Security
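
To illustrate the "one helper instead of open-coded checks" idea, here
is a userspace analogue. It is not the kernel helper being proposed:
fd_may_be_exe() and path_may_be_exe() are hypothetical names, and the
checks use fstat()/fstatvfs() where the kernel would look at the inode
mode and MNT_NOEXEC. The ST_NOEXEC fallback value is an assumption for
Linux libcs that hide the constant without _GNU_SOURCE.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <unistd.h>

#ifndef ST_NOEXEC
#define ST_NOEXEC 8	/* assumed Linux value, see lead-in */
#endif

/*
 * Single place for the "regular file on an exec-capable mount" test.
 * Returns 0 if @fd passes, -EACCES if it does not, or another negative
 * errno value if the file cannot be examined.
 */
static int fd_may_be_exe(int fd)
{
	struct stat st;
	struct statvfs vfs;

	if (fstat(fd, &st) < 0 || fstatvfs(fd, &vfs) < 0)
		return -errno;

	if (!S_ISREG(st.st_mode) || (vfs.f_flag & ST_NOEXEC))
		return -EACCES;

	return 0;
}

/* Convenience wrapper so callers do not repeat the open/close dance. */
static int path_may_be_exe(const char *path)
{
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	ret = fd_may_be_exe(fd);
	close(fd);
	return ret;
}
```

With the test in one function, a later policy change only has to touch
fd_may_be_exe() instead of every caller.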


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:25         ` Oleg Nesterov
@ 2012-03-08 19:36           ` Cyrill Gorcunov
  2012-03-08 21:48           ` Cyrill Gorcunov
  1 sibling, 0 replies; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 19:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Thu, Mar 08, 2012 at 08:25:59PM +0100, Oleg Nesterov wrote:
> >
> > Sure, I can drop num_exe_file_vmas != 0 case and refuse to
> > setup new exe symlink if there some VM_EXECUTABLE remains
> > unmapped. Sounds good?
> 
> Personally I like this. This is simple and _understandable_, even
> if ->num_exe_file_vmas has no meaning after PR_SET_MM_EXE_FILE.
> 
> But please-please document the new special case in the changelog.
> 

Sure, will update, thanks.

	Cyrill


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:31 ` Kees Cook
@ 2012-03-08 19:40   ` Cyrill Gorcunov
  2012-03-08 20:02     ` Andy Lutomirski
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 19:40 UTC (permalink / raw)
  To: Kees Cook
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Tejun Heo,
	Andrew Morton, LKML, Andy Lutomirski, Will Drewry

On Thu, Mar 08, 2012 at 11:31:58AM -0800, Kees Cook wrote:
...
> > +       err = -EACCES;
> > +       if (!S_ISREG(dentry->d_inode->i_mode)   ||
> > +           exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> > +               goto exit;
> 
> I'm starting to notice that this pattern (testing ISREG and
> MNT_NOEXEC) is getting repeated a few times in the kernel, and at
> least the no-new-privs patch (not yet in -mm but hopefully soon given
> the seccomp_filter work) updates this pattern everywhere. Perhaps this
> should be extracted into a helper first, and then this patch can call
> that helper here? (And then nnp can just update the single helper.)
> 

I can do that if Andrew agrees.

	Cyrill


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:40   ` Cyrill Gorcunov
@ 2012-03-08 20:02     ` Andy Lutomirski
  2012-03-08 20:06       ` Kees Cook
  2012-03-08 20:07       ` Cyrill Gorcunov
  0 siblings, 2 replies; 48+ messages in thread
From: Andy Lutomirski @ 2012-03-08 20:02 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Kees Cook, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 8, 2012 at 11:40 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Thu, Mar 08, 2012 at 11:31:58AM -0800, Kees Cook wrote:
> ...
>> > +       err = -EACCES;
>> > +       if (!S_ISREG(dentry->d_inode->i_mode)   ||
>> > +           exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
>> > +               goto exit;
>>
>> I'm starting to notice that this pattern (testing ISREG and
>> MNT_NOEXEC) is getting repeated a few times in the kernel, and at
>> least the no-new-privs patch (not yet in -mm but hopefully soon given
>> the seccomp_filter work) updates this pattern everywhere. Perhaps this
>> should be extracted into a helper first, and then this patch can call
>> that helper here? (And then nnp can just update the single helper.)
>>
>
> I can do that if Andrew agree.

I'm a bit lost.  nnp updates the MNT_NOSUID checks, not the MNT_NOEXEC
checks.  (And the effects of the two flags are different in selinux for
historical reasons.)  I'm sure I'm missing something.

--Andy

>
>        Cyrill


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 20:02     ` Andy Lutomirski
@ 2012-03-08 20:06       ` Kees Cook
  2012-03-08 20:07       ` Cyrill Gorcunov
  1 sibling, 0 replies; 48+ messages in thread
From: Kees Cook @ 2012-03-08 20:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Cyrill Gorcunov, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 8, 2012 at 12:02 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Mar 8, 2012 at 11:40 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>> On Thu, Mar 08, 2012 at 11:31:58AM -0800, Kees Cook wrote:
>> ...
>>> > +       err = -EACCES;
>>> > +       if (!S_ISREG(dentry->d_inode->i_mode)   ||
>>> > +           exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
>>> > +               goto exit;
>>>
>>> I'm starting to notice that this pattern (testing ISREG and
>>> MNT_NOEXEC) is getting repeated a few times in the kernel, and at
>>> least the no-new-privs patch (not yet in -mm but hopefully soon given
>>> the seccomp_filter work) updates this pattern everywhere. Perhaps this
>>> should be extracted into a helper first, and then this patch can call
>>> that helper here? (And then nnp can just update the single helper.)
>>>
>>
>> I can do that if Andrew agree.
>
> I'm a bit lost.  nnp updates the MNT_NOSUID checks, not the MNT_NOEXEC
> checks.  (And the effects of the two flags is different in selinux for
> historical reasons.)  I'm sure I'm missing something.

Oops, you're right. Regardless, we might want helpers anyway. Better
to have single places to do these tests.

-Kees

-- 
Kees Cook
ChromeOS Security


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 20:02     ` Andy Lutomirski
  2012-03-08 20:06       ` Kees Cook
@ 2012-03-08 20:07       ` Cyrill Gorcunov
  2012-03-08 20:15         ` Andy Lutomirski
  1 sibling, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 20:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 08, 2012 at 12:02:50PM -0800, Andy Lutomirski wrote:
> >
> > I can do that if Andrew agree.
> 
> I'm a bit lost.  nnp updates the MNT_NOSUID checks, not the MNT_NOEXEC
> checks.  (And the effects of the two flags is different in selinux for
> historical reasons.)  I'm sure I'm missing something.
> 

Andy, I've no idea what nnp is ;) I was only going to gather those
ISREG/MNT_NOEXEC checks into one helper since we indeed have a few places
in the kernel which do the same thing in an open-coded manner.

	Cyrill


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 20:07       ` Cyrill Gorcunov
@ 2012-03-08 20:15         ` Andy Lutomirski
  2012-03-08 20:21           ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2012-03-08 20:15 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Kees Cook, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 8, 2012 at 12:07 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Thu, Mar 08, 2012 at 12:02:50PM -0800, Andy Lutomirski wrote:
>> >
>> > I can do that if Andrew agree.
>>
>> I'm a bit lost.  nnp updates the MNT_NOSUID checks, not the MNT_NOEXEC
>> checks.  (And the effects of the two flags is different in selinux for
>> historical reasons.)  I'm sure I'm missing something.
>>
>
> Andy, I've no idea what nnp is ;) I was only about to gather those
> ISREG/MNT_NOEXEC to one helper since we indeed have a few places in
> kernel which do same thing in open-coded manner.

Am I not the Andrew you were referring to?

--Andy


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 20:15         ` Andy Lutomirski
@ 2012-03-08 20:21           ` Cyrill Gorcunov
  2012-03-08 20:24             ` Andy Lutomirski
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 20:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 08, 2012 at 12:15:55PM -0800, Andy Lutomirski wrote:
> On Thu, Mar 8, 2012 at 12:07 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> > On Thu, Mar 08, 2012 at 12:02:50PM -0800, Andy Lutomirski wrote:
> >> >
> >> > I can do that if Andrew agree.
> >>
> >> I'm a bit lost.  nnp updates the MNT_NOSUID checks, not the MNT_NOEXEC
> >> checks.  (And the effects of the two flags is different in selinux for
> >> historical reasons.)  I'm sure I'm missing something.
> >>
> >
> > Andy, I've no idea what nnp is ;) I was only about to gather those
> > ISREG/MNT_NOEXEC to one helper since we indeed have a few places in
> > kernel which do same thing in open-coded manner.
> 
> Am I not the Andrew you were referring to?
> 

Nope, I meant Andrew Morton /because this patch is for -mm/ ;)

I was in the To: field, that's why I replied to you about nnp
(and, btw, what is nnp? Not "Net national product" I suppose,
 which is the hint Wikipedia gave me)

	Cyrill


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 20:21           ` Cyrill Gorcunov
@ 2012-03-08 20:24             ` Andy Lutomirski
  2012-03-08 20:28               ` Cyrill Gorcunov
  2012-03-08 21:57               ` Cyrill Gorcunov
  0 siblings, 2 replies; 48+ messages in thread
From: Andy Lutomirski @ 2012-03-08 20:24 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Kees Cook, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 8, 2012 at 12:21 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Thu, Mar 08, 2012 at 12:15:55PM -0800, Andy Lutomirski wrote:
>> On Thu, Mar 8, 2012 at 12:07 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>> > On Thu, Mar 08, 2012 at 12:02:50PM -0800, Andy Lutomirski wrote:
>> >> >
>> >> > I can do that if Andrew agree.
>> >>
>> >> I'm a bit lost.  nnp updates the MNT_NOSUID checks, not the MNT_NOEXEC
>> >> checks.  (And the effects of the two flags is different in selinux for
>> >> historical reasons.)  I'm sure I'm missing something.
>> >>
>> >
>> > Andy, I've no idea what nnp is ;) I was only about to gather those
>> > ISREG/MNT_NOEXEC to one helper since we indeed have a few places in
>> > kernel which do same thing in open-coded manner.
>>
>> Am I not the Andrew you were referring to?
>>
>
> Nope, I meant Andrew Morton /because this patch is for -mm/ ;)
>
> I've been in To: field that's why I replied you about nnp
> (and, btw, what nnp is? not "Net national product" I suppose,
>  this hint wikipedia gave me)

nnp is no_new_privs, which is my patch and is almost, but not quite,
very relevant to this discussion.  Hence my confusion ;)

FWIW, since I've touched this code recently, the cleanup you're
suggesting sounds good.

--Andy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 20:24             ` Andy Lutomirski
@ 2012-03-08 20:28               ` Cyrill Gorcunov
  2012-03-08 21:57               ` Cyrill Gorcunov
  1 sibling, 0 replies; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 20:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 08, 2012 at 12:24:38PM -0800, Andy Lutomirski wrote:
...
> >
> > I've been in To: field that's why I replied you about nnp
> > (and, btw, what nnp is? not "Net national product" I suppose,
> >  this hint wikipedia gave me)
> 
> nnp is no_new_privs, which is my patch and is almost, but not quite,
> very relevant to this discussion.  Hence my confusion ;)

Ah, good to know. I'm on 3.3-rc6 now, so I've not yet noticed
this nnp helper :)

> 
> FWIW, since I've touched this code recently, the cleanup you're
> suggesting sounds good.

OK, once I prepare the helper we will see how it fits with
other code.

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:25         ` Oleg Nesterov
  2012-03-08 19:36           ` Cyrill Gorcunov
@ 2012-03-08 21:48           ` Cyrill Gorcunov
  2012-03-09 12:48             ` Oleg Nesterov
  1 sibling, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 21:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Thu, Mar 08, 2012 at 08:25:59PM +0100, Oleg Nesterov wrote:
...
> 
> But please-please document the new special case in the changelog.
> 

Oleg, does the following changelog sound more or less fine?

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: c/r: prctl: Add ability to set new mm_struct::exe_file v4

When we do a restore we would like to have a way to set up
the former mm_struct::exe_file so that /proc/pid/exe
points to the original executable file the process had at
checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.
This option takes a file descriptor which will be
set as the source for the new /proc/$pid/exe symlink.

Note it allows changing /proc/$pid/exe iff there
are no VM_EXECUTABLE vmas present for the current process,
simply because this feature is special to C/R
and mm::num_exe_file_vmas becomes meaningless after
that.

This feature is available iff CONFIG_CHECKPOINT_RESTORE is set.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Kees Cook <keescook@chromium.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
---
 include/linux/prctl.h |    1 
 kernel/sys.c          |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -118,5 +118,6 @@
 # define PR_SET_MM_ENV_START		10
 # define PR_SET_MM_ENV_END		11
 # define PR_SET_MM_AUXV			12
+# define PR_SET_MM_EXE_FILE		13
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -36,6 +36,8 @@
 #include <linux/personality.h>
 #include <linux/ptrace.h>
 #include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/gfp.h>
 #include <linux/syscore_ops.h>
 #include <linux/version.h>
@@ -1701,6 +1703,55 @@ static bool vma_flags_mismatch(struct vm
 		(vma->vm_flags & banned);
 }
 
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
+{
+	struct file *exe_file;
+	struct dentry *dentry;
+	int err;
+
+	exe_file = fget(fd);
+	if (!exe_file)
+		return -EBADF;
+
+	dentry = exe_file->f_path.dentry;
+
+	/*
+	 * Because the original mm->exe_file
+	 * points to executable file, make sure
+	 * this one is executable as well to not
+	 * break an overall picture.
+	 */
+	err = -EACCES;
+	if (!S_ISREG(dentry->d_inode->i_mode)	||
+	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
+		goto exit;
+
+	err = inode_permission(dentry->d_inode, MAY_EXEC);
+	if (err)
+		goto exit;
+
+	/*
+	 * Setting new mm::exe_file is only allowed
+	 * when no executable VMAs left. This is
+	 * special C/R case when a restored program
+	 * need to change own /proc/$pid/exe symlink.
+	 * After this call mm::num_exe_file_vmas become
+	 * meaningless.
+	 */
+	down_write(&mm->mmap_sem);
+	if (mm->num_exe_file_vmas == 0) {
+		set_mm_exe_file(mm, exe_file);
+		exe_file = NULL;
+	} else
+		err = -EBUSY;
+	up_write(&mm->mmap_sem);
+
+exit:
+	if (exe_file)
+		fput(exe_file);
+	return err;
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
@@ -1715,6 +1766,9 @@ static int prctl_set_mm(int opt, unsigne
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
+
 	if (addr >= TASK_SIZE)
 		return -EINVAL;
 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 20:24             ` Andy Lutomirski
  2012-03-08 20:28               ` Cyrill Gorcunov
@ 2012-03-08 21:57               ` Cyrill Gorcunov
  2012-03-08 22:03                 ` Kees Cook
  1 sibling, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 21:57 UTC (permalink / raw)
  To: Andy Lutomirski, Kees Cook
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Tejun Heo,
	Andrew Morton, LKML, Will Drewry

On Thu, Mar 08, 2012 at 12:24:38PM -0800, Andy Lutomirski wrote:
> 
> nnp is no_new_privs, which is my patch and is almost, but not quite,
> very relevant to this discussion.  Hence my confusion ;)
> 
> FWIW, since I've touched this code recently, the cleanup you're
> suggesting sounds good.
> 

Andy, Kees, I guess the patch below might be the helper we need,
though I'm not sure about the naming. Hm?

	Cyrill
---
 include/linux/fs.h |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6.git/include/linux/fs.h
===================================================================
--- linux-2.6.git.orig/include/linux/fs.h
+++ linux-2.6.git/include/linux/fs.h
@@ -2669,5 +2669,11 @@ static inline void inode_has_no_xattr(st
 		inode->i_flags |= S_NOSEC;
 }
 
+static inline bool file_may_exec(struct file *f)
+{
+	return S_ISREG(f->f_path.dentry->d_inode->i_mode) &&
+		!(f->f_path.mnt->mnt_flags & MNT_NOEXEC);
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_FS_H */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 21:57               ` Cyrill Gorcunov
@ 2012-03-08 22:03                 ` Kees Cook
  2012-03-08 22:12                   ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Kees Cook @ 2012-03-08 22:03 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andy Lutomirski, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 8, 2012 at 1:57 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Thu, Mar 08, 2012 at 12:24:38PM -0800, Andy Lutomirski wrote:
>>
>> nnp is no_new_privs, which is my patch and is almost, but not quite,
>> very relevant to this discussion.  Hence my confusion ;)
>>
>> FWIW, since I've touched this code recently, the cleanup you're
>> suggesting sounds good.
>>
>
> Andy, Kees, I guess the patch below might be a helper we need,
> while I'm not sure on naming. hm?
>
>        Cyrill
> ---
>  include/linux/fs.h |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> Index: linux-2.6.git/include/linux/fs.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/fs.h
> +++ linux-2.6.git/include/linux/fs.h
> @@ -2669,5 +2669,11 @@ static inline void inode_has_no_xattr(st
>                inode->i_flags |= S_NOSEC;
>  }
>
> +static inline bool file_may_exec(struct file *f)
> +{
> +       return S_ISREG(f->f_path.dentry->d_inode->i_mode) &&
> +               !(f->f_path.mnt->mnt_flags & MNT_NOEXEC);
> +}
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_FS_H */

How about "file_is_exec" instead, since it doesn't (and likely
shouldn't) include the inode_permission(..., EXEC)? I'd like other
people's thoughts on this since maybe it's not needed and I instead
have accidentally derailed this patch with useless bike shedding.

-Kees

-- 
Kees Cook
ChromeOS Security

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 22:03                 ` Kees Cook
@ 2012-03-08 22:12                   ` Cyrill Gorcunov
  2012-03-08 22:14                     ` Kees Cook
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-08 22:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 08, 2012 at 02:03:13PM -0800, Kees Cook wrote:
> 
> How about "file_is_exec" instead, since it doesn't (and likely
> shouldn't) include the inode_permission(..., EXEC)? I'd like other
> people's thoughts on this since maybe it's not needed and I instead
> have accidentally derailed this patch with useless bike shedding.
> 

Yup. Anyway, I'm leaving the former mm_struct::exe_file patch with
the EXEC test open-coded, so we can do the cleanup on top later.

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 22:12                   ` Cyrill Gorcunov
@ 2012-03-08 22:14                     ` Kees Cook
  0 siblings, 0 replies; 48+ messages in thread
From: Kees Cook @ 2012-03-08 22:14 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andy Lutomirski, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Tejun Heo, Andrew Morton, LKML, Will Drewry

On Thu, Mar 8, 2012 at 2:12 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Thu, Mar 08, 2012 at 02:03:13PM -0800, Kees Cook wrote:
>>
>> How about "file_is_exec" instead, since it doesn't (and likely
>> shouldn't) include the inode_permission(..., EXEC)? I'd like other
>> people's thoughts on this since maybe it's not needed and I instead
>> have accidentally derailed this patch with useless bike shedding.
>>
>
> Yup. Anyway, I'm leaving the former mm_struct::exe_file patch with
> EXEC test opencoded so we can do everything on top then.

Sounds good to me. Thanks for putting up with my distraction! :)

-Kees

-- 
Kees Cook
ChromeOS Security

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 21:48           ` Cyrill Gorcunov
@ 2012-03-09 12:48             ` Oleg Nesterov
  2012-03-09 12:57               ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-09 12:48 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/09, Cyrill Gorcunov wrote:
>
> +static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> +{
> +	struct file *exe_file;
> +	struct dentry *dentry;
> +	int err;
> +
> +	exe_file = fget(fd);
> +	if (!exe_file)
> +		return -EBADF;
> +
> +	dentry = exe_file->f_path.dentry;
> +
> +	/*
> +	 * Because the original mm->exe_file
> +	 * points to executable file, make sure
> +	 * this one is executable as well to not
> +	 * break an overall picture.
> +	 */
> +	err = -EACCES;
> +	if (!S_ISREG(dentry->d_inode->i_mode)	||
> +	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> +		goto exit;
> +
> +	err = inode_permission(dentry->d_inode, MAY_EXEC);
> +	if (err)
> +		goto exit;
> +
> +	/*
> +	 * Setting new mm::exe_file is only allowed
> +	 * when no executable VMAs left. This is
                   ^^^^^^^^^^
Perhaps this is just me, but imho "executable" is not clear enough.
I'd suggest VM_EXECUTABLE to avoid the confusion with VM_EXEC.

> +	 * special C/R case when a restored program
> +	 * need to change own /proc/$pid/exe symlink.
> +	 * After this call mm::num_exe_file_vmas become
> +	 * meaningless.
> +	 */
> +	down_write(&mm->mmap_sem);
> +	if (mm->num_exe_file_vmas == 0) {

You can check this at the very start lockless and simplify the code.
Once it is zero, it can never grow (or we have a bug anyway).

> +		set_mm_exe_file(mm, exe_file);
> +		exe_file = NULL;
> +	} else
> +		err = -EBUSY;
> +	up_write(&mm->mmap_sem);
> +
> +exit:
> +	if (exe_file)
> +		fput(exe_file);

This doesn't look correct, you need fput() in any case.
set_mm_exe_file() does another get_file().

Oleg.
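
The reference counting point above can be illustrated with a toy
user-space model. This is a sketch under stated assumptions, not kernel
code: the toy_* names and types are hypothetical stand-ins, and
toy_set_mm_exe_file() only mirrors the behaviour described in the
review (the setter takes its own reference on the new file and drops
the old one).

```c
#include <stddef.h>

/* Toy stand-ins for struct file and mm_struct; not the kernel types. */
struct toy_file { int count; };
struct toy_mm   { struct toy_file *exe_file; };

static struct toy_file *toy_fget(struct toy_file *f)
{
	f->count++;
	return f;
}

static void toy_fput(struct toy_file *f)
{
	f->count--;
}

/* The setter takes its own reference and releases the previous file. */
static void toy_set_mm_exe_file(struct toy_mm *mm, struct toy_file *new_file)
{
	if (new_file)
		toy_fget(new_file);
	if (mm->exe_file)
		toy_fput(mm->exe_file);
	mm->exe_file = new_file;
}

/* The caller's own reference must be dropped on every path. */
static int toy_prctl_set_exe(struct toy_mm *mm, struct toy_file *fd_file)
{
	struct toy_file *f = toy_fget(fd_file);	/* +1: caller's reference */

	toy_set_mm_exe_file(mm, f);		/* +1: mm's own reference */
	toy_fput(f);				/* -1: unconditional */
	return 0;
}
```

Starting from a count of 1 (the descriptor table's reference), the
count ends at 2: one for the descriptor and one for the mm. Skipping
the final toy_fput() on the success path would leave it at 3, which is
the leak being pointed out.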


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 12:48             ` Oleg Nesterov
@ 2012-03-09 12:57               ` Cyrill Gorcunov
  2012-03-09 13:35                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-09 12:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 01:48:11PM +0100, Oleg Nesterov wrote:
...
> > +
> > +	/*
> > +	 * Setting new mm::exe_file is only allowed
> > +	 * when no executable VMAs left. This is
>                    ^^^^^^^^^^
> Perhaps this is just me, but imho "executable" is not clear enough.
> I'd suggest VM_EXECUTABLE to avoid the confusion with VM_EXEC.

OK

> 
> > +	 * special C/R case when a restored program
> > +	 * need to change own /proc/$pid/exe symlink.
> > +	 * After this call mm::num_exe_file_vmas become
> > +	 * meaningless.
> > +	 */
> > +	down_write(&mm->mmap_sem);
> > +	if (mm->num_exe_file_vmas == 0) {
> 
> You can check this at the very start lockless and simplify the code.
> Once it is zero, it can never grow (or we have a bug anyway).

sure

> 
> > +		set_mm_exe_file(mm, exe_file);
> > +		exe_file = NULL;
> > +	} else
> > +		err = -EBUSY;
> > +	up_write(&mm->mmap_sem);
> > +
> > +exit:
> > +	if (exe_file)
> > +		fput(exe_file);
> 
> This doesn't look correct, you need fput() in any case.
> set_mm_exe_file() does another get_file().

yeah, thanks, will update.

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 12:57               ` Cyrill Gorcunov
@ 2012-03-09 13:35                 ` Cyrill Gorcunov
  2012-03-09 13:47                   ` Oleg Nesterov
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-09 13:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 04:57:35PM +0400, Cyrill Gorcunov wrote:
> 
> yeah, thanks, will update.
> 

This one should fit all requirements I hope.

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: c/r: prctl: Add ability to set new mm_struct::exe_file v5

When we do restore we would like to have a way to setup
a former mm_struct::exe_file so that /proc/pid/exe would
point to the original executable file a process had at
checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.
This option takes a file descriptor which will be
set as a source for new /proc/$pid/exe symlink.

Note it allows to change /proc/$pid/exe iif there
are no VM_EXECUTABLE vmas present for current process,
simply because this feature is a special to C/R
and mm::num_exe_file_vmas become meaningless after
that.

This feature is available iif CONFIG_CHECKPOINT_RESTORE is set.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Kees Cook <keescook@chromium.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
---
 include/linux/prctl.h |    1 
 kernel/sys.c          |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -118,5 +118,6 @@
 # define PR_SET_MM_ENV_START		10
 # define PR_SET_MM_ENV_END		11
 # define PR_SET_MM_AUXV			12
+# define PR_SET_MM_EXE_FILE		13
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -36,6 +36,8 @@
 #include <linux/personality.h>
 #include <linux/ptrace.h>
 #include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/gfp.h>
 #include <linux/syscore_ops.h>
 #include <linux/version.h>
@@ -1701,6 +1703,57 @@ static bool vma_flags_mismatch(struct vm
 		(vma->vm_flags & banned);
 }
 
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
+{
+	struct file *exe_file;
+	struct dentry *dentry;
+	int err;
+
+	if (mm->num_exe_file_vmas)
+		return -EBUSY;
+
+	exe_file = fget(fd);
+	if (!exe_file)
+		return -EBADF;
+
+	dentry = exe_file->f_path.dentry;
+
+	/*
+	 * Because the original mm->exe_file
+	 * points to executable file, make sure
+	 * this one is executable as well to not
+	 * break an overall picture.
+	 */
+	err = -EACCES;
+	if (!S_ISREG(dentry->d_inode->i_mode)	||
+	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
+		goto exit;
+
+	err = inode_permission(dentry->d_inode, MAY_EXEC);
+	if (err)
+		goto exit;
+
+	/*
+	 * Setting new mm::exe_file is only allowed
+	 * when no VM_EXECUTABLE vma's left. This is
+	 * a special C/R case when a restored program
+	 * need to change own /proc/$pid/exe symlink.
+	 * After this call mm::num_exe_file_vmas become
+	 * meaningless. If mm::num_exe_file_vmas will
+	 * ever increase back from zero -- this code
+	 * needs to be revised, thus WARN_ here, just
+	 * to be sure.
+	 */
+	down_write(&mm->mmap_sem);
+	WARN_ON_ONCE(mm->num_exe_file_vmas);
+	set_mm_exe_file(mm, exe_file);
+	up_write(&mm->mmap_sem);
+
+exit:
+	fput(exe_file);
+	return err;
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
@@ -1715,6 +1768,9 @@ static int prctl_set_mm(int opt, unsigne
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
+
 	if (addr >= TASK_SIZE)
 		return -EINVAL;
 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 13:35                 ` Cyrill Gorcunov
@ 2012-03-09 13:47                   ` Oleg Nesterov
  2012-03-09 14:13                     ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-09 13:47 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/09, Cyrill Gorcunov wrote:
>
> On Fri, Mar 09, 2012 at 04:57:35PM +0400, Cyrill Gorcunov wrote:
> >
> > yeah, thanks, will update.
> >
>
> This one should fit all requirements I hope.

Oh, sorry Cyrill, I simply can't resist...

> +static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> +{
> +	struct file *exe_file;
> +	struct dentry *dentry;
> +	int err;
> +
> +	if (mm->num_exe_file_vmas)
> +		return -EBUSY;
> +
> +	exe_file = fget(fd);
> +	if (!exe_file)
> +		return -EBADF;
> +
> +	dentry = exe_file->f_path.dentry;
> +
> +	/*
> +	 * Because the original mm->exe_file
> +	 * points to executable file, make sure
> +	 * this one is executable as well to not
> +	 * break an overall picture.
> +	 */
> +	err = -EACCES;
> +	if (!S_ISREG(dentry->d_inode->i_mode)	||
> +	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> +		goto exit;
> +
> +	err = inode_permission(dentry->d_inode, MAY_EXEC);
> +	if (err)
> +		goto exit;
> +
> +	/*
> +	 * Setting new mm::exe_file is only allowed
> +	 * when no VM_EXECUTABLE vma's left. This is
> +	 * a special C/R case when a restored program
> +	 * need to change own /proc/$pid/exe symlink.
> +	 * After this call mm::num_exe_file_vmas become
> +	 * meaningless. If mm::num_exe_file_vmas will
> +	 * ever increase back from zero -- this code
> +	 * needs to be revised, thus WARN_ here, just
> +	 * to be sure.

To be sure of what??

> +	 */
> +	down_write(&mm->mmap_sem);
> +	WARN_ON_ONCE(mm->num_exe_file_vmas);

We already checked it is zero. Yes, it shouldn't grow. But why
do we need another check here?

If it can grow, it can grow after we drop mmap_sem as well and
this would be wrong. So maybe we need another WARN_ON() at the
end?

I'd understand if you add something like

	WARN_ON(!mm->num_exe_file_vmas && !current->in_exec);

into added_exe_file_vma().

Or

	WARN_ON(mm->num_exe_file_vmas <= 0);

into removed_exe_file_vma().

But imho your WARN looks like "OK, I checked it lockless but I
am not sure this is correct".

Oleg.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 13:47                   ` Oleg Nesterov
@ 2012-03-09 14:13                     ` Cyrill Gorcunov
  2012-03-09 14:26                       ` Oleg Nesterov
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-09 14:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 02:47:32PM +0100, Oleg Nesterov wrote:
> On 03/09, Cyrill Gorcunov wrote:
> >
> > On Fri, Mar 09, 2012 at 04:57:35PM +0400, Cyrill Gorcunov wrote:
> > >
> > > yeah, thanks, will update.
> > >
> >
> > This one should fit all requirements I hope.
> 
> Oh, sorry Cyrill, I simply can't resist...

Hehe ;) No problem, please continue complaining,
I don't wanna miss something and try to merge a
patch with nit/error/or-whatever.

> > +	/*
> > +	 * Setting new mm::exe_file is only allowed
> > +	 * when no VM_EXECUTABLE vma's left. This is
> > +	 * a special C/R case when a restored program
> > +	 * need to change own /proc/$pid/exe symlink.
> > +	 * After this call mm::num_exe_file_vmas become
> > +	 * meaningless. If mm::num_exe_file_vmas will
> > +	 * ever increase back from zero -- this code
> > +	 * needs to be revised, thus WARN_ here, just
> > +	 * to be sure.
> 
> > To be sure of what??

To be sure it's not increased somewhere else before
down_write taken.

> 
> > +	 */
> > +	down_write(&mm->mmap_sem);
> > +	WARN_ON_ONCE(mm->num_exe_file_vmas);
> 
> We already checked it is zero. Yes, it shouldn't grow. But why
> do we need another check here?
> 
> If it can grow, it can grow after we drop mmap_sem as well and
> this would be wrong. So may be we need another WARN_ON() at the
> end?
> 
> I'd understand if you add something like
> 
> 	WARN_ON(!mm->num_exe_file_vmas && !current->in_exec);
> 
> into added_exe_file_vma().
> 
> Or
> 	WARN_ON(mm->num_exe_file_vmas <= 0);
> 
> into removed_exe_file_vma().

This one looks like a good idea for me -- it's cheap and
not a hot path.

> 
> But imho your WARN looks like "OK, I checked it lockless but I
> am not sure this is correct".

Oleg, I bet that if someone changes the overall num_exe_file_vmas
idea, this prctl code will be fixed at the last moment (if ever),
simply because it's very specific. So I wanted not to miss such a
moment and to add a check that the rest of the kernel is in a good
state. This test is cheap but may prevent a potential problem if one
day the mm::exe_file concept is reworked.

Sure I can simply drop this WARN_ON ;)

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 14:13                     ` Cyrill Gorcunov
@ 2012-03-09 14:26                       ` Oleg Nesterov
  2012-03-09 14:42                         ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-09 14:26 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/09, Cyrill Gorcunov wrote:
>
> > > +	/*
> > > +	 * Setting new mm::exe_file is only allowed
> > > +	 * when no VM_EXECUTABLE vma's left. This is
> > > +	 * a special C/R case when a restored program
> > > +	 * need to change own /proc/$pid/exe symlink.
> > > +	 * After this call mm::num_exe_file_vmas become
> > > +	 * meaningless. If mm::num_exe_file_vmas will
> > > +	 * ever increase back from zero -- this code
> > > +	 * needs to be revised, thus WARN_ here, just
> > > +	 * to be sure.
> >
> > > To be sure of what??
>
> To be sure it's not increased somewhere else before
> down_write taken.

Who can do this? Only another CLONE_VM thread. And _only_ if we
already have the bug in mm_exe accounting logic. And only if that
thread does something to trigger the bug in the small window
between.

> > I'd understand if you add something like
> >
> > 	WARN_ON(!mm->num_exe_file_vmas && !current->in_exec);
> >
> > into added_exe_file_vma().
> >
> > Or
> > 	WARN_ON(mm->num_exe_file_vmas <= 0);
> >
> > into removed_exe_file_vma().
>
> This one looks like a good idea for me -- it's cheap and
> not a hot path.

But not in this patch, please.

> > But imho your WARN looks like "OK, I checked it lockless but I
> > am not sure this is correct".
>
> Oleg, I bet if someone will be changing num_exe_file_vmas overall
> idea -- this prctl code will be fixed at last moment (if ever) only
> because it's very specific, so I wanted to not miss such moment
> and add some check that the rest of the kernel is in a good state.
> This test is cheap but may prevent potential problem if one day
> mm::exe_file concept will be reworked.

The test is cheap indeed. If you mean performance-wise.

But it looks confusing, imho. I do not care about a couple of CPU
cycles. The code should be optimized for the reading in the first
place, not for executing ;) Imho, of course.

And once again. Following your logic you need another WARN_ON()
right after we drop mmap_sem. Why? To be sure it's not increased
somewhere else _after_ down_write taken. And another one after
fput.

Sure, bugs are possible. And yes, in theory this WARN_ON() can
catch some problem. But there is tradeoff. Given that you need
another thread to trigger the (potential) bug and the window is
tiny, how high do you estimate the probability it can help?

> Sure I can simply drop this WARN_ON ;)

Oh, keep it if you like it ;)

Yes I hate it, but you are the author and this is almost cosmetic.

Oleg.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 14:26                       ` Oleg Nesterov
@ 2012-03-09 14:42                         ` Cyrill Gorcunov
  2012-03-09 15:21                           ` Oleg Nesterov
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-09 14:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 03:26:20PM +0100, Oleg Nesterov wrote:
> >
> > To be sure it's not increased somewhere else before
> > down_write taken.
> 
> Who can do this? Only another CLONE_VM thread. And _only_ if we
> already have the bug in mm_exe accounting logic. And only if that
> thread does something to trigger the bug in the small window
> between.

ok, agreed.

> > > into removed_exe_file_vma().
> >
> > This one looks like a good idea for me -- it's cheap and
> > not a hot path.
> 
> But not in this patch, please.
> 

Sure.

> > > But imho your WARN looks like "OK, I checked it lockless but I
> > > am not sure this is correct".
> >
> > Oleg, I bet if someone will be changing num_exe_file_vmas overall
> > idea -- this prctl code will be fixed at last moment (if ever) only
> > because it's very specific, so I wanted to not miss such moment
> > and add some check that the rest of the kernel is in a good state.
> > This test is cheap but may prevent potential problem if one day
> > mm::exe_file concept will be reworked.
> 
> The test is cheap indeed. If you mean performance-wise.
> 
> But it looks confusing, imho. I do not care about a couple of CPU
> cycles. The code should be optimized for the reading in the first
> place, not for executing ;) Imho, of course.
> 
> And once again. Following your logic you need another WARN_ON()
> right after we drop mmap_sem. Why? To be sure it's not increased
> somewhere else _after_ down_write taken. And another one after
> fput.
> 
> Sure, bugs are possible. And yes, in theory this WARN_ON() can
> catch some problem. But there is tradeoff. Given that you need
> another thread to trigger the (potential) bug and the window is
> tiny, how high do you estimate the probability it can help?
> 
> > Sure I can simply drop this WARN_ON ;)
> 
> Oh, keep it if you like it ;)
> 
> Yes I hate it, but you are the author and this is almost cosmetic.

OK, Oleg, can't argue, you've convinced me ;) I'll drop this WARN_ON.
Would it be enough for your Reviewed-by tag? /me hides

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 14:42                         ` Cyrill Gorcunov
@ 2012-03-09 15:21                           ` Oleg Nesterov
  2012-03-09 15:42                             ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-09 15:21 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/09, Cyrill Gorcunov wrote:
>
> OK, Oleg, can't argue, you've convinced me ;) I'll drop this WARN_ON.

Yes, yes, I am a troll!

> Would it be enough for your Reviewed-by tag? /me hides

Yes.

But, when you send the patch to Andrew, please remind him that he
should remove the evidence of my ignorance from -mm first.


Just one note for the record, prctl_set_mm_exe_file() does

	if (mm->num_exe_file_vmas)
		return -EBUSY;

We could do

	if (mm->exe_file)
		return -EBUSY;

This way "because this feature is a special to C/R" becomes
really true. IOW, you can't do PR_SET_MM_EXE_FILE twice.

I am fine either way, I just want to make sure you really want
the current version.

Oleg.
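
For reference, from user space the call under discussion would look
roughly like the sketch below. It assumes a Linux kernel built with
CONFIG_CHECKPOINT_RESTORE and a caller holding CAP_SYS_RESOURCE, as the
patch requires. The wrapper name restore_exe_link() is hypothetical,
and the fallback constants are the values from the patch (13) and from
the existing PR_SET_MM definition (35), used only when the libc headers
do not provide them.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values as in include/linux/prctl.h. */
#ifndef PR_SET_MM
# define PR_SET_MM		35
#endif
#ifndef PR_SET_MM_EXE_FILE
# define PR_SET_MM_EXE_FILE	13
#endif

/*
 * Hypothetical wrapper: ask the kernel to make /proc/self/exe point at
 * the file behind fd. Returns 0 on success or a negative errno value.
 */
static int restore_exe_link(int fd)
{
	if (prctl(PR_SET_MM, PR_SET_MM_EXE_FILE,
		  (unsigned long)fd, 0UL, 0UL) != 0)
		return -errno;
	return 0;
}
```

Going by the patch in this thread, a caller should expect -EPERM
without the capability, -EBADF for a bad descriptor, -EACCES for a
file that is not regular or sits on a noexec mount, and -EBUSY when
the change is refused. Whether a second call is refused depends on
which of the two checks above ends up in the final version.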


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 15:21                           ` Oleg Nesterov
@ 2012-03-09 15:42                             ` Cyrill Gorcunov
  2012-03-09 22:02                               ` Matt Helsley
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-09 15:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 04:21:22PM +0100, Oleg Nesterov wrote:
> 
> But, when you send the patch to Andrew, please remind that he
> should remove the evidence of my ignorance from -mm first.
> 

Ah, will ping him, sure. I'll send two patches in a series:
this patch, and a second one,

"From: Andrew Vagin <avagin@openvz.org>"
"Subject: c/r: prctl: Add ability to get clear_tid_address"

for which I've got no feedback at all :( Is there some
fundamental problem with this patch? Or does everyone simply
agree with it? ;) Anyway, I hope with the next series I'll get
some feedback from anyone in the CC list :)

> 
> Just one note for the record, prctl_set_mm_exe_file() does
> 
> 	if (mm->num_exe_file_vmas)
> 		return -EBUSY;
> 
> We could do
> 
> 	if (mm->exe_file)
> 		return -EBUSY;
> 
> This way "because this feature is a special to C/R" becomes
> really true. IOW, you can't do PR_SET_MM_EXE_FILE twice.
> 

Sure, I'll make it this way. Thanks a lot, Oleg!!!

> I am fine either way, just I want to ensure you really want
> the current version.
> 

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-08 19:03   ` Cyrill Gorcunov
  2012-03-08 19:05     ` Oleg Nesterov
@ 2012-03-09 21:46     ` Matt Helsley
  2012-03-09 21:52       ` Cyrill Gorcunov
  1 sibling, 1 reply; 48+ messages in thread
From: Matt Helsley @ 2012-03-09 21:46 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Oleg Nesterov, Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov,
	Kees Cook, Tejun Heo, Andrew Morton, LKML

On Thu, Mar 08, 2012 at 11:03:03PM +0400, Cyrill Gorcunov wrote:
> On Thu, Mar 08, 2012 at 07:26:23PM +0100, Oleg Nesterov wrote:
> > On 03/08, Cyrill Gorcunov wrote:
> > >
> > > Hi Oleg, could you please take a look once you get a minute (no urgency).
> > 
> > Add Matt. I won't touch the text below to keep the patch intact.
> 
> Thanks for CC'ing Matt, Oleg (I forgot, sorry).
> 
> > 
> > With this change
> > 
> > 	down_write(&mm->mmap_sem);
> > 	if (mm->num_exe_file_vmas) {
> > 		fput(mm->exe_file);
> > 		mm->exe_file = exe_file;
> > 		exe_file = NULL;
> > 	} else
> > 		set_mm_exe_file(mm, exe_file);
> > 	up_write(&mm->mmap_sem);
> > 
> > I simply do not understand what mm->num_exe_file_vmas means after
> > PR_SET_MM_EXE_FILE.

I think it should fail if the num_exe_file_vmas is not 0 when
PR_SET_MM_EXE_FILE is used. It's simple, keeps things clear, might
catch userspace bugs (harder to accidentally leave a mapping of the original
executable), and could avoid kernel bugs too.

> > 
> > I think that you should do
> > 
> > 	down_write(&mm->mmap_sem);
> > 	if (mm->num_exe_file_vmas) {
> > 		fput(mm->exe_file);
> > 		mm->exe_file = exe_file;
> > 		exe_file = NULL;
> > 	}
> > 	up_write(&mm->mmap_sem);
> > 
> > to keep the current "mm->exe_file goes away after the final
> > unmap(MAP_EXECUTABLE)" logic.
> > 
> > OK, may be this doesn't work in c/r case because you are actually
> > going to remove the old mappings? But in this case the new exe_file
> > will go away anyway, afaics PR_SET_MM_EXE_FILE is called when you
> > still have the old mappings.
> 
> Yes, exactly, I need to remove old mappings first (because VMAs
> we're about to restore may intersect with current map the host
> program has). And yes, once they all are removed I don't have
> /proc/pid/exe anymore. That's why I need num_exe_file_vmas == 0
> case.
> 
> When I setup new exe_file with num_exe_file_vmas = 0, this reference
> to a file brings /proc/pid/exe back to life (and when the process exits
> it'll call set_mm_exe_file(mm, NULL) and the new exe_file will be dropped,
> so no leak here).

Makes sense, I think.

> > And I don't think the unconditional
> > 
> > 	down_write(&mm->mmap_sem);
> > 	set_mm_exe_file(mm, exe_file);
> > 	up_write(&mm->mmap_sem);
> > 
> > is 100% right, this clears ->num_exe_file_vmas. This means that
> > (if you still have the old mapping) the new exe_file can go away
> > after added_exe_file_vma() + removed_exe_file_vma(). Normally this
> > shouldn't happen, but afaics this is possible. Note that even, say,
> > mprotect() can trigger added_exe_file_vma().
> > 
> 
> Wait, Oleg, I'm confused, in case if there *is* existing VM_EXECUTABLEs
> then we jump into first branch and simply replace old exe_file.

What happens if multiple prctl calls are made? We'll have a mix of N
executable files that've been mapped n_i times. I think we're better off
just returning an error in that case -- -EBUSY or something.

> If there is no VM_EXECUTABLEs, then we simply setup new exe_file
> and num_exe_file_vmas remains zero.

Which is fine.

Cheers,
	-Matt


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 21:46     ` Matt Helsley
@ 2012-03-09 21:52       ` Cyrill Gorcunov
  0 siblings, 0 replies; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-09 21:52 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 01:46:37PM -0800, Matt Helsley wrote:
> > > 
> > > I simply do not understand what mm->num_exe_file_vmas means after
> > > PR_SET_MM_EXE_FILE.
> 
> I think it should fail if the num_exe_file_vmas is not 0 when
> PR_SET_MM_EXE_FILE is used. It's simple, keeps things clear, might
> catch userspace bugs (harder to accidentally leave a mapping of the original
> executable), and could avoid kernel bugs too.

Yes, and in the last version (which I just sent out) we have at the very beginning
of the function

+	if (mm->num_exe_file_vmas)
+		return -EBUSY;

> > 
> > Wait, Oleg, I'm confused, in case if there *is* existing VM_EXECUTABLEs
> > then we jump into first branch and simply replace old exe_file.
> 
> What happens if multiple prctl calls are made? We'll have a mix of N
> executable files that've been mapped n_i times. I think we're better off
> just returning an error in that case -- -EBUSY or something.
> 
> > If there is no VM_EXECUTABLEs, then we simply setup new exe_file
> > and num_exe_file_vmas remains zero.
> 
> Which is fine.
> 

Matt, please check the last version, and tell me if it's fine for you.

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 15:42                             ` Cyrill Gorcunov
@ 2012-03-09 22:02                               ` Matt Helsley
  2012-03-09 22:39                                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Helsley @ 2012-03-09 22:02 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Oleg Nesterov, Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov,
	Kees Cook, Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 07:42:24PM +0400, Cyrill Gorcunov wrote:
> On Fri, Mar 09, 2012 at 04:21:22PM +0100, Oleg Nesterov wrote:

<snip>

> > Just one note for the record, prctl_set_mm_exe_file() does
> > 
> > 	if (mm->num_exe_file_vmas)
> > 		return -EBUSY;
> > 
> > We could do
> > 
> > 	if (mm->exe_file)
> > 		return -EBUSY;
> > 
> > This way "because this feature is a special to C/R" becomes
> > really true. IOW, you can't do PR_SET_MM_EXE_FILE twice.
> > 
> 
> Sure, I'll make it this way. Thanks a lot, Oleg!!!

Sorry about the other email -- hadn't fully caught up on this thread.
This is even better, yes.

Of course I'd prefer it if there was a way to keep num_exe_file_vmas
correct and not special-case c/r. The first approximation of a solution
might be to increment the count whenever a new mmap filp == mm->exe_file
and decrement on unmap. I think there are a bunch of details needed to
make that work but my feeling is it's do-able. Have you investigated this
already and rejected it for some reason (did I miss that discussion
somehow?)?

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 22:02                               ` Matt Helsley
@ 2012-03-09 22:39                                 ` Cyrill Gorcunov
  2012-03-09 23:59                                   ` Matt Helsley
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-09 22:39 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 02:02:44PM -0800, Matt Helsley wrote:
...
> 
> Sorry about the other email -- hadn't fully caught up on this thread.

No problem.

> This is even better, yes.
>

Well, in the final version I switched back to

+	if (mm->num_exe_file_vmas)
+		return -EBUSY;

simply because it's more flexible than mm->exe_file.

With mm->exe_file this prctl option becomes one-shot
only, and while at the moment our user-space tool can perfectly
live with that, I thought that there is no strict need to
limit the option this way from the very beginning.

> Of course I'd prefer it if there was a way to keep num_exe_file_vmas
> correct and not special-case c/r. The first approximation of a solution

It remains correct actually. There is no way to map new VM_EXECUTABLE
from user-space after we've unmapped previous ones, so num_exe_file_vmas
will remain 0.

> might be to increment the count whenever a new mmap filp == mm->exe_file
> and decrement on unmap. I think there are a bunch of details needed to
> make that work but my feeling is it's do-able. Have you investigated this
> already and rejected it for some reason (did I miss that discussion
> somehow?)?

As far as I understand the overall num_exe_file_vmas concept, we track
the number of VM_EXECUTABLE vmas with it, so setting a new exe_file should not
change num_exe_file_vmas, I think.

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 22:39                                 ` Cyrill Gorcunov
@ 2012-03-09 23:59                                   ` Matt Helsley
  2012-03-10  7:48                                     ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Helsley @ 2012-03-09 23:59 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Kees Cook, Tejun Heo, Andrew Morton, LKML

On Sat, Mar 10, 2012 at 02:39:32AM +0400, Cyrill Gorcunov wrote:
> On Fri, Mar 09, 2012 at 02:02:44PM -0800, Matt Helsley wrote:
> ...
> > 
> > Sorry about the other email -- hadn't fully caught up on this thread.
> 
> No problem.
> 
> > This is even better, yes.
> >
> 
> Well, in final version I switched back to
> 
> +	if (mm->num_exe_file_vmas)
> +		return -EBUSY;
> 
> simply because it's more flexible than mm->exe_file.
> 
> With mm->exe_file this prctl option become a one-shot
> only, and while at moment our user-space tool can perfectly
> live with that I thought that there is no strict need to
> limit the option this way from the very beginning.

As far as backward compatibility, isn't it better to lift that restriction
later rather than add it? I think the latter would very likely "break"
things whereas the former would not.

I also prefer that restriction because it establishes a bound on how
frequently the symlink can change. Keeping it a one-shot deal makes the
values that show up in tools like top more reliable for admins.

> 
> > Of course I'd prefer it if there was a way to keep num_exe_file_vmas
> > correct and not special-case c/r. The first approximation of a solution
> 
> It remains correct actually. There is no way to map new VM_EXECUTABLE
> from user-space after we've unmapped previous ones, so num_exe_file_vmas
> will remain 0.
> 
> > might be to increment the count whenever a new mmap filp == mm->exe_file
> > and decrement on unmap. I think there are a bunch of details needed to
> > make that work but my feeling is it's do-able. Have you investigated this
> > already and rejected it for some reason (did I miss that discussion
> > somehow?)?
> 
> As far as I understand overall num_exe_file_vmas concept -- we track
> a number of VM_EXECUTABLE with it, so setting new exe_file should not
> change num_exe_file_vmas I think.

True, it's literally correct. However the whole reason for having it
is to turn the mm->exe_file reference into a sort of weak reference
which happens to coincide with counting the number of VM_EXECUTABLE vmas
until you do c/r (really just the restart side of c/r).

Cheers,
	-Matt


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-09 23:59                                   ` Matt Helsley
@ 2012-03-10  7:48                                     ` Cyrill Gorcunov
  2012-03-13  2:45                                       ` Matt Helsley
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-10  7:48 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Fri, Mar 09, 2012 at 03:59:01PM -0800, Matt Helsley wrote:
> > Well, in final version I switched back to
> > 
> > +	if (mm->num_exe_file_vmas)
> > +		return -EBUSY;
> > 
> > simply because it's more flexible than mm->exe_file.
> > 
> > With mm->exe_file this prctl option become a one-shot
> > only, and while at moment our user-space tool can perfectly
> > live with that I thought that there is no strict need to
> > limit the option this way from the very beginning.
> 
> As far as backward compatibility, isn't it better to lift that restriction
> later rather than add it? I think the latter would very likely "break"
> things whereas the former would not.

Indeed. But I think any change will break compatibility: programs
may rely on either one-shot or multi-shot behaviour. So I personally vote
for the more flexible approach here.

> 
> I also prefer that restriction because it establishes a bound on how
> frequently the symlink can change. Keeping it a one-shot deal makes the
> values that show up in tools like top more reliable for admins.

How? Once exe_file is changed we are already cheating the kernel; it's not
bad, not good, it just works this way ;) I mean, imagine an admin who
runs top and sees some program in the 'top' output (btw, I'm not sure, but
does top really parse /proc/pid/exe?). Say he sees some program
names -- how would he know whether a program changed its own /proc/pid/exe
at all? Note that it's not that important how many times the symlink was
changed: there is simply no way to find out if it was changed at all,
and actually from my POV that's a win for transparent c/r, which was
the whole idea.

> > 
> > As far as I understand overall num_exe_file_vmas concept -- we track
> > a number of VM_EXECUTABLE with it, so setting new exe_file should not
> > change num_exe_file_vmas I think.
> 
> True, it's literally correct. However the whole reason for having it
> is to turn the mm->exe_file reference into a sort of weak reference
> which happens to coincide with counting the number of VM_EXECUTABLE vmas
> until you do c/r (really just the restart side of c/r).
> 

Look. We actually have a period of time where exe_file is set but
num_exe_file_vmas = 0: when we start program execution, before the ELF
map gets parsed. So I dare say such a state is legitimate (yes, userspace
doesn't see this state and yes, we start mapping ELF sections pretty
much immediately after exe_file is assigned). So I don't see much problem in
extending this "state" (exe_file != 0, num_exe_file_vmas = 0).

Thanks for comments, Matt!

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-10  7:48                                     ` Cyrill Gorcunov
@ 2012-03-13  2:45                                       ` Matt Helsley
  2012-03-13  6:26                                         ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Helsley @ 2012-03-13  2:45 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Kees Cook, Tejun Heo, Andrew Morton, LKML

On Sat, Mar 10, 2012 at 11:48:54AM +0400, Cyrill Gorcunov wrote:
> On Fri, Mar 09, 2012 at 03:59:01PM -0800, Matt Helsley wrote:
> > > Well, in final version I switched back to
> > > 
> > > +	if (mm->num_exe_file_vmas)
> > > +		return -EBUSY;
> > > 
> > > simply because it's more flexible than mm->exe_file.
> > > 
> > > With mm->exe_file this prctl option become a one-shot
> > > only, and while at moment our user-space tool can perfectly
> > > live with that I thought that there is no strict need to
> > > limit the option this way from the very beginning.
> > 
> > As far as backward compatibility, isn't it better to lift that restriction
> > later rather than add it? I think the latter would very likely "break"
> > things whereas the former would not.
> 
> Indeed. But I think any change will mean compatibility broken, programs
> may rely on one-shot or multi-shot behaviour. So I personally vote
> for more flexible approach here.

Very true. In fact thinking about this prctl a bit more makes me more certain
that one-shot is better and it ought to stay that way forever. The
flexibility to change the /proc/pid/exe symlink could be yet another
way for malicious code to obscure a compromised program and
masquerade as a benign process. That's a problem inherent in this prctl
whether it's one-shot or multi-shot. However, if you use the one-shot
approach then a security-conscious program can use this prctl once
during its early initialization to ensure the prctl cannot later be abused
for this purpose.

> 
> > 
> > I also prefer that restriction because it establishes a bound on how
> > frequently the symlink can change. Keeping it a one-shot deal makes the
> > values that show up in tools like top more reliable for admins.
> 
> How? Once exe_file changed -- we already cheating the kernel, it's not
> bad, not good, just work this way ;) I mean imagine an admin which
> > runs top and sees some program in 'top' output (btw, I'm not sure but
> does top really parse /proc/pid/exe?) so say he sees some programs

Sorry, my phrase "tools like top" was ambiguous. I was trying to succinctly
refer to top as one type of program which might rely on this information.
I don't know if top itself uses this file but it would be reasonable for
tools of that kind to use/display it to an administrator.

> names -- how would he know if a program did change own /proc/pid/exe
> at all? Note it's not that important how many times the symlink was
> changed there is simply no way to find out if it was changed at all,
> and actually from my POV it's a win for transparent c/r, that was
> all the idea.

I am quite aware of the c/r use for this prctl :). However I also
wonder if there aren't serious malicious uses of it. I'm not saying the
symlink has to be perfectly accurate at all times,  but it's easy and
reasonable to make it much harder to abuse this particular prctl for
malicious purposes by making it one-shot.

> 
> > > 
> > > As far as I understand overall num_exe_file_vmas concept -- we track
> > > a number of VM_EXECUTABLE with it, so setting new exe_file should not
> > > change num_exe_file_vmas I think.
> > 
> > True, it's literally correct. However the whole reason for having it
> > is to turn the mm->exe_file reference into a sort of weak reference
> > which happens to coincide with counting the number of VM_EXECUTABLE vmas
> > until you do c/r (really just the restart side of c/r).
> > 
> 
> Look. We actually have a period of time where exe_file is set but
> num_exe_file_vmas = 0 when we start program execution before elf
> > map get parsed, so I dare to say such state is legitimate (yes, userspace
> doesn't see this state and yes, we start mapping elf sections pretty
> immediately after exe_file assigned). So I don't see much problem in
> extending this "state" (exe_file!=0,num_exe_file_vmas = 0).

True, there is a time when that combination of values occurs. However
who sets those values and when is the important difference.

Before this patch that state was rather ephemeral and almost entirely
under the control of the kernel. The only way userspace could change it
was by unmapping the region(s) mapped during exec*(). At that point it
could not "lie" and insert some other symlink there and the admin would
be better able to determine what had happened.

With this patch -- especially the multi-shot form -- the symlink will
be entirely under the control of (potentially untrusted) userspace code
and the admin is totally at the mercy of the userspace code. In
single-shot form programs could use the prctl() to ensure the symlink
could not be changed later -- the restart tool would be the only program
that would need to ensure that prctl() had not been used since the last
exec*().

If we're going to let userspace do arbitrary things to the symlink I can't
help but wonder why we can't skip the prctl() altogether and just enable
MAP_EXECUTABLE in mmap().

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13  2:45                                       ` Matt Helsley
@ 2012-03-13  6:26                                         ` Cyrill Gorcunov
  2012-03-13  7:18                                           ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-13  6:26 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Mon, Mar 12, 2012 at 07:45:11PM -0700, Matt Helsley wrote:
> > 
> > Indeed. But I think any change will mean compatibility broken, programs
> > may rely on one-shot or multi-shot behaviour. So I personally vote
> > for more flexible approach here.
> 
> Very true. In fact thinking about this prctl a bit more makes me more certain
> that one-shot is better and it ought to stay that way forever. The
> flexibility to change the /proc/pid/exe symlink could be yet another
> way for malicious code to obscure a compromised program and
> masquerade as a benign process. That's a problem inherent in this prctl
> whether it's one-shot or multi-shot. However, if you use the one-shot
> approach then a security-conscious program can use this prctl once
> during its early initialization to ensure the prctl cannot later be abused
> for this purpose.
> 

Hi Matt,

well, sure, our tool can live with the one-shot approach (and I'll update it),
but note that only a program with CAP_SYS_RESOURCE granted can do that, i.e. it's
not any arbitrary program in the system.

> > names -- how would he know if a program did change own /proc/pid/exe
> > at all? Note it's not that important how many times the symlink was
> > changed there is simply no way to find out if it was changed at all,
> > and actually from my POV it's a win for transparent c/r, that was
> > all the idea.
> 
> I am quite aware of the c/r use for this prctl :). However I also
> wonder if there aren't serious malicious uses of it. I'm not saying the
> symlink has to be perfectly accurate at all times,  but it's easy and
> reasonable to make it much harder to abuse this particular prctl for
> malicious purposes by making it one-shot.
> 

ok, convinced, I'll update the patch ;)

> With this patch -- especially the multi-shot form -- the symlink will
> be entirely under the control of (potentially untrusted) userspace code
> and the admin is totally at the mercy of the userspace code. In
> single-shot form programs could use the prctl() to ensure the symlink
> could not be changed later -- the restart tool would be the only program
> that would need to ensure that prctl() had not been used since the last
> exec*().
> 
> If we're going to let userspace do arbitrary things to the symlink I can't
> help but wonder why we can't skip the prctl() altogether and just enable
> MAP_EXECUTABLE in mmap().

Well, hard to tell from my side. At the moment I don't see a problem in allowing
MAP_EXECUTABLE in mmap, but help from the -mm guys is needed. I'm sure there was a reason
why it's not allowed.

	Cyrill

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13  6:26                                         ` Cyrill Gorcunov
@ 2012-03-13  7:18                                           ` Cyrill Gorcunov
  2012-03-13 15:43                                             ` Oleg Nesterov
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-13  7:18 UTC (permalink / raw)
  To: Matt Helsley, Oleg Nesterov
  Cc: KOSAKI Motohiro, Pavel Emelyanov, Kees Cook, Tejun Heo,
	Andrew Morton, LKML

On Tue, Mar 13, 2012 at 10:26:25AM +0400, Cyrill Gorcunov wrote:
> > I am quite aware of the c/r use for this prctl :). However I also
> > wonder if there aren't serious malicious uses of it. I'm not saying the
> > symlink has to be perfectly accurate at all times,  but it's easy and
> > reasonable to make it much harder to abuse this particular prctl for
> > malicious purposes by making it one-shot.
> > 
> 
> ok, convinced, I'll update the patch ;)

Matt, Oleg, here is the final version, I hope,
which should suit everyone.

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: c/r: prctl: Add ability to set new mm_struct::exe_file

When we do restore we would like to have a way to setup
a former mm_struct::exe_file so that /proc/pid/exe would
point to the original executable file a process had at
checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.
This option takes a file descriptor which will be
set as a source for new /proc/$pid/exe symlink.

Note it allows changing /proc/$pid/exe only if there
are no VM_EXECUTABLE vmas present for the current process,
simply because this feature is specific to C/R
and mm::num_exe_file_vmas becomes meaningless after
that.

Also this action is one-shot only. For security reasons
we don't allow changing the symlink several times.

This feature is available iff CONFIG_CHECKPOINT_RESTORE is set.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Kees Cook <keescook@chromium.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
---
 include/linux/prctl.h |    1 
 kernel/sys.c          |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -118,5 +118,6 @@
 # define PR_SET_MM_ENV_START		10
 # define PR_SET_MM_ENV_END		11
 # define PR_SET_MM_AUXV			12
+# define PR_SET_MM_EXE_FILE		13
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -36,6 +36,8 @@
 #include <linux/personality.h>
 #include <linux/ptrace.h>
 #include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/gfp.h>
 #include <linux/syscore_ops.h>
 #include <linux/version.h>
@@ -1701,6 +1703,60 @@ static bool vma_flags_mismatch(struct vm
 		(vma->vm_flags & banned);
 }
 
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
+{
+	struct file *exe_file;
+	struct dentry *dentry;
+	int err;
+
+	/*
+	 * Setting a new mm::exe_file is only allowed
+	 * when no VM_EXECUTABLE vmas are left. This is
+	 * a special C/R case where a restored program
+	 * needs to change its own /proc/$pid/exe symlink.
+	 * After this call mm::num_exe_file_vmas becomes
+	 * meaningless.
+	 */
+	if (mm->num_exe_file_vmas)
+		return -EBUSY;
+
+	exe_file = fget(fd);
+	if (!exe_file)
+		return -EBADF;
+
+	dentry = exe_file->f_path.dentry;
+
+	/*
+	 * Because the original mm->exe_file
+	 * points to executable file, make sure
+	 * this one is executable as well to not
+	 * break an overall picture.
+	 */
+	err = -EACCES;
+	if (!S_ISREG(dentry->d_inode->i_mode)	||
+	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
+		goto exit;
+
+	err = inode_permission(dentry->d_inode, MAY_EXEC);
+	if (err)
+		goto exit;
+
+	/*
+	 * For security reason changing mm->exe_file
+	 * is one-shot action.
+	 */
+	down_write(&mm->mmap_sem);
+	if (likely(!mm->exe_file))
+		set_mm_exe_file(mm, exe_file);
+	else
+		err = -EBUSY;
+	up_write(&mm->mmap_sem);
+
+exit:
+	fput(exe_file);
+	return err;
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
@@ -1715,6 +1771,9 @@ static int prctl_set_mm(int opt, unsigne
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
+
 	if (addr >= TASK_SIZE)
 		return -EINVAL;
 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13  7:18                                           ` Cyrill Gorcunov
@ 2012-03-13 15:43                                             ` Oleg Nesterov
  2012-03-13 16:00                                               ` Cyrill Gorcunov
  2012-03-14  0:36                                               ` Matt Helsley
  0 siblings, 2 replies; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-13 15:43 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/13, Cyrill Gorcunov wrote:
>
> Matt, Oleg, there is a final version I hope,
> which should fit everyone.

Well, this version looks correct, but you are checking the same
condition twice, with different comments. This is confusing.

> +static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> +{
> +	struct file *exe_file;
> +	struct dentry *dentry;
> +	int err;
> +
> +	/*
> +	 * Setting new mm::exe_file is only allowed
> +	 * when no VM_EXECUTABLE vma's left. This is
> +	 * a special C/R case when a restored program
> +	 * need to change own /proc/$pid/exe symlink.
> +	 * After this call mm::num_exe_file_vmas become
> +	 * meaningless.
> +	 */
> +	if (mm->num_exe_file_vmas)
> +		return -EBUSY;
> +
> +	exe_file = fget(fd);
> +	if (!exe_file)
> +		return -EBADF;
> +
> +	dentry = exe_file->f_path.dentry;
> +
> +	/*
> +	 * Because the original mm->exe_file
> +	 * points to executable file, make sure
> +	 * this one is executable as well to not
> +	 * break an overall picture.
> +	 */
> +	err = -EACCES;
> +	if (!S_ISREG(dentry->d_inode->i_mode)	||
> +	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> +		goto exit;
> +
> +	err = inode_permission(dentry->d_inode, MAY_EXEC);
> +	if (err)
> +		goto exit;
> +
> +	/*
> +	 * For security reason changing mm->exe_file
> +	 * is one-shot action.
> +	 */
> +	down_write(&mm->mmap_sem);
> +	if (likely(!mm->exe_file))

This means that the num_exe_file_vmas check at the start is not needed.
If you want it as a "fast-path" check, please fix the comment. Or just
remove it. Otherwise the code looks as if we have to check them both.

Matt, is it really possible to hit mm->exe_file = NULL in
removed_exe_file_vma ? Unless I missed something, this check just
hides the potential problem, no?

IOW, shouldn't it do

	void removed_exe_file_vma(struct mm_struct *mm)
	{
		WARN_ON(!mm->exe_file);
		WARN_ON(mm->num_exe_file_vmas <= 0);

		if (!--mm->num_exe_file_vmas) {
			fput(mm->exe_file);
			mm->exe_file = NULL;
		}
	}

?

Oleg.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13 15:43                                             ` Oleg Nesterov
@ 2012-03-13 16:00                                               ` Cyrill Gorcunov
  2012-03-13 16:04                                                 ` Cyrill Gorcunov
  2012-03-14  0:36                                               ` Matt Helsley
  1 sibling, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-13 16:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Tue, Mar 13, 2012 at 04:43:37PM +0100, Oleg Nesterov wrote:
> > +
> > +	/*
> > +	 * For security reason changing mm->exe_file
> > +	 * is one-shot action.
> > +	 */
> > +	down_write(&mm->mmap_sem);
> > +	if (likely(!mm->exe_file))
> 
> This means that the num_exe_file_vmas check at the start is not needed.
> If you want it as a "fast-path" check, please fix the comment. Or just
> remove it. Otherwise the code looks as if we have to check them both.

Yes, I wanted a fast test first, while the second test enforces the
one-shot condition, so a second attempt to set up a new exe_file
will fail. OK, I'll update the comment block.
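
To illustrate the ordering, here is a userspace sketch only, not part of the
patch. The struct mm_model type, the function name, and the pthread mutex
standing in for mmap_sem are all assumptions made for the example:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the relevant mm_struct fields. */
struct mm_model {
	pthread_mutex_t lock;		/* plays the role of mm->mmap_sem */
	int num_exe_file_vmas;
	const char *exe_file;		/* NULL until the one-shot set succeeds */
};

static int model_set_exe_file(struct mm_model *mm, const char *new_exe)
{
	int err = 0;

	/* Fast test without the lock: bail out while exe vmas remain. */
	if (mm->num_exe_file_vmas)
		return -EBUSY;

	/* One-shot test under the lock: only the first caller wins. */
	pthread_mutex_lock(&mm->lock);
	if (!mm->exe_file)
		mm->exe_file = new_exe;
	else
		err = -EBUSY;
	pthread_mutex_unlock(&mm->lock);

	return err;
}
```

With a model whose counter is zero, the first call returns 0 and every later
call returns -EBUSY; with a non-zero counter the lock is never taken.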

> 
> Matt, is it really possible to hit mm->exe_file = NULL in
> removed_exe_file_vma ? Unless I missed something, this check just
> hides the potential problem, no?
> 
> IOW, shouldn't it do
> 
> 	void removed_exe_file_vma(struct mm_struct *mm)
> 	{
> 		WARN_ON(!mm->exe_file);
> 		WARN_ON(mm->num_exe_file_vmas <= 0);
> 

I guess if num_exe_file_vmas < 1 here we've a bug somewhere
and should not decrement the counter at all.

	Cyrill

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13 16:00                                               ` Cyrill Gorcunov
@ 2012-03-13 16:04                                                 ` Cyrill Gorcunov
  2012-03-13 16:44                                                   ` Oleg Nesterov
  2012-03-14  1:41                                                   ` Matt Helsley
  0 siblings, 2 replies; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-13 16:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Tue, Mar 13, 2012 at 08:00:44PM +0400, Cyrill Gorcunov wrote:
> > 
> > This means that the num_exe_file_vmas check at the start is not needed.
> > If you want it as a "fast-path" check, please fix the comment. Or just
> > remove it. Otherwise the code looks as if we have to check them both.
> 
> Yes, I wanted a fast test first, while the second test enforces the
> one-shot condition, so a second attempt to set up a new exe_file
> will fail. OK, I'll update the comment block.
> 

Something like below?

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: c/r: prctl: Add ability to set new mm_struct::exe_file

When we do restore we would like to have a way to setup
a former mm_struct::exe_file so that /proc/pid/exe would
point to the original executable file a process had at
checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.
This option takes a file descriptor which will be
set as a source for new /proc/$pid/exe symlink.

Note that /proc/$pid/exe can be changed only if there
are no VM_EXECUTABLE vmas present for the current process,
simply because this feature is specific to C/R
and mm::num_exe_file_vmas becomes meaningless after
that.

Also this action is one-shot only. For security reasons
we don't allow changing the symlink several times.

This feature is available iff CONFIG_CHECKPOINT_RESTORE is set.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Kees Cook <keescook@chromium.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
---
 include/linux/prctl.h |    1 
 kernel/sys.c          |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
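
For testing, the new option could be exercised from userspace roughly as
below. This is a minimal sketch and not part of the patch: the helper name
set_exe_link() is hypothetical, and the fallback defines assume PR_SET_MM is
35 as in the existing prctl.h and PR_SET_MM_EXE_FILE is 13 as proposed here.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for headers that predate this option (values assumed above). */
#ifndef PR_SET_MM
# define PR_SET_MM		35
#endif
#ifndef PR_SET_MM_EXE_FILE
# define PR_SET_MM_EXE_FILE	13
#endif

/* Hypothetical helper: returns 0 on success or a negative errno value. */
static int set_exe_link(const char *path)
{
	int fd, ret = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The file descriptor is passed where other opts take an address. */
	if (prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, (unsigned long)fd, 0UL, 0UL) < 0)
		ret = -errno;

	/* The kernel takes its own reference, so the fd can be closed. */
	close(fd);
	return ret;
}
```

The caller should expect -EPERM without CAP_SYS_RESOURCE and -EBUSY while
VM_EXECUTABLE vmas are still mapped or after the link was already set once.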

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -118,5 +118,6 @@
 # define PR_SET_MM_ENV_START		10
 # define PR_SET_MM_ENV_END		11
 # define PR_SET_MM_AUXV			12
+# define PR_SET_MM_EXE_FILE		13
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -36,6 +36,8 @@
 #include <linux/personality.h>
 #include <linux/ptrace.h>
 #include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/gfp.h>
 #include <linux/syscore_ops.h>
 #include <linux/version.h>
@@ -1701,6 +1703,57 @@ static bool vma_flags_mismatch(struct vm
 		(vma->vm_flags & banned);
 }
 
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
+{
+	struct file *exe_file;
+	struct dentry *dentry;
+	int err;
+
+	/*
+	 * Setting new mm::exe_file is only allowed
+	 * when no VM_EXECUTABLE vma's left. So make
+	 * a fast test first.
+	 */
+	if (mm->num_exe_file_vmas)
+		return -EBUSY;
+
+	exe_file = fget(fd);
+	if (!exe_file)
+		return -EBADF;
+
+	dentry = exe_file->f_path.dentry;
+
+	/*
+	 * Because the original mm->exe_file
+	 * points to executable file, make sure
+	 * this one is executable as well to not
+	 * break an overall picture.
+	 */
+	err = -EACCES;
+	if (!S_ISREG(dentry->d_inode->i_mode)	||
+	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
+		goto exit;
+
+	err = inode_permission(dentry->d_inode, MAY_EXEC);
+	if (err)
+		goto exit;
+
+	/*
+	 * For security reason changing mm->exe_file
+	 * is one-shot action.
+	 */
+	down_write(&mm->mmap_sem);
+	if (likely(!mm->exe_file))
+		set_mm_exe_file(mm, exe_file);
+	else
+		err = -EBUSY;
+	up_write(&mm->mmap_sem);
+
+exit:
+	fput(exe_file);
+	return err;
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
@@ -1715,6 +1768,9 @@ static int prctl_set_mm(int opt, unsigne
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
+
 	if (addr >= TASK_SIZE)
 		return -EINVAL;
 

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13 16:04                                                 ` Cyrill Gorcunov
@ 2012-03-13 16:44                                                   ` Oleg Nesterov
  2012-03-14  1:41                                                   ` Matt Helsley
  1 sibling, 0 replies; 48+ messages in thread
From: Oleg Nesterov @ 2012-03-13 16:44 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On 03/13, Cyrill Gorcunov wrote:
>
> On Tue, Mar 13, 2012 at 08:00:44PM +0400, Cyrill Gorcunov wrote:
> > >
> > > This means that the num_exe_file_vmas check at the start is not needed.
> > > If you want it as a "fast-path" check, please fix the comment. Or just
> > > remove it. Otherwise the code looks as if we have to check them both.
> >
> > Yes, I wanted a fast test first, while the second test enforces the
> > one-shot condition, so a second attempt to set up a new exe_file
> > will fail. OK, I'll update the comment block.
> >
>
> Something like below?

Yes, looks good to me.

Oleg.


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13 15:43                                             ` Oleg Nesterov
  2012-03-13 16:00                                               ` Cyrill Gorcunov
@ 2012-03-14  0:36                                               ` Matt Helsley
  1 sibling, 0 replies; 48+ messages in thread
From: Matt Helsley @ 2012-03-14  0:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Cyrill Gorcunov, Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov,
	Kees Cook, Tejun Heo, Andrew Morton, LKML

On Tue, Mar 13, 2012 at 04:43:37PM +0100, Oleg Nesterov wrote:
> 
> Matt, is it really possible to hit mm->exe_file = NULL in
> removed_exe_file_vma ? Unless I missed something, this check just
> hides the potential problem, no?
> 
> IOW, shouldn't it do
> 
> 	void removed_exe_file_vma(struct mm_struct *mm)
> 	{
> 		WARN_ON(!mm->exe_file);
> 		WARN_ON(mm->num_exe_file_vmas <= 0);
> 
> 		if (!--mm->num_exe_file_vmas) {
> 			fput(mm->exe_file);
> 			mm->exe_file = NULL;
> 		}
> 	}
> 
> ?

I think you're spot-on about hiding the problem. I'm not sure the
WARN_ON() would be welcome in the mm's VMA paths though.
Also, it's a nit but I'd keep the decrement out of the condition like in
the original.
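
One way to read that, as a userspace sketch rather than a patch: the
struct mm_counts type and the function name are hypothetical, the warnings
are plain fprintf() calls instead of WARN_ON(), and the fput() is only
indicated by a comment.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the two mm_struct fields involved. */
struct mm_counts {
	int num_exe_file_vmas;
	const char *exe_file;
};

/* Returns how many inconsistencies were reported for this call. */
static int model_removed_exe_file_vma(struct mm_counts *mm)
{
	int warnings = 0;

	if (!mm->exe_file) {
		fprintf(stderr, "warning: exe_file already NULL\n");
		warnings++;
	}
	if (mm->num_exe_file_vmas <= 0) {
		fprintf(stderr, "warning: counter is %d\n",
			mm->num_exe_file_vmas);
		warnings++;
	}

	/* Decrement as its own statement, then test the result. */
	mm->num_exe_file_vmas--;
	if (mm->num_exe_file_vmas == 0)
		mm->exe_file = NULL;	/* the kernel would fput() first */

	return warnings;
}
```

Starting from a counter of 2, two calls drop the file without complaint and
a third call reports both inconsistencies.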

Cheers,
	-Matt


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-13 16:04                                                 ` Cyrill Gorcunov
  2012-03-13 16:44                                                   ` Oleg Nesterov
@ 2012-03-14  1:41                                                   ` Matt Helsley
  2012-03-14  5:47                                                     ` Cyrill Gorcunov
  1 sibling, 1 reply; 48+ messages in thread
From: Matt Helsley @ 2012-03-14  1:41 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Oleg Nesterov, Matt Helsley, KOSAKI Motohiro, Pavel Emelyanov,
	Kees Cook, Tejun Heo, Andrew Morton, LKML

On Tue, Mar 13, 2012 at 08:04:20PM +0400, Cyrill Gorcunov wrote:
> On Tue, Mar 13, 2012 at 08:00:44PM +0400, Cyrill Gorcunov wrote:
> > > 
> > > This means that the num_exe_file_vmas check at the start is not needed.
> > > If you want it as a "fast-path" check, please fix the comment. Or just
> > > remove it. Otherwise the code looks as if we have to check them both.
> > 
> > Yes, I wanted a fast test first, while the second test enforces the
> > one-shot condition, so a second attempt to set up a new exe_file
> > will fail. OK, I'll update the comment block.
> > 
> 
> Something like below?
> 
> 	Cyrill
> ---
> From: Cyrill Gorcunov <gorcunov@openvz.org>
> Subject: c/r: prctl: Add ability to set new mm_struct::exe_file
> 
> When we do restore we would like to have a way to setup
> a former mm_struct::exe_file so that /proc/pid/exe would
> point to the original executable file a process had at
> checkpoint time.
> 
> For this the PR_SET_MM_EXE_FILE code is introduced.
> This option takes a file descriptor which will be
> set as a source for new /proc/$pid/exe symlink.
> 
> Note that /proc/$pid/exe can be changed only if there
> are no VM_EXECUTABLE vmas present for the current process,
> simply because this feature is specific to C/R
> and mm::num_exe_file_vmas becomes meaningless after
> that.
> 
> Also this action is one-shot only. For security reasons
> we don't allow changing the symlink several times.
> 
> This feature is available iff CONFIG_CHECKPOINT_RESTORE is set.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Tejun Heo <tj@kernel.org>
> CC: Matt Helsley <matthltc@us.ibm.com>
> ---
>  include/linux/prctl.h |    1 
>  kernel/sys.c          |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> Index: linux-2.6.git/include/linux/prctl.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/prctl.h
> +++ linux-2.6.git/include/linux/prctl.h
> @@ -118,5 +118,6 @@
>  # define PR_SET_MM_ENV_START		10
>  # define PR_SET_MM_ENV_END		11
>  # define PR_SET_MM_AUXV			12
> +# define PR_SET_MM_EXE_FILE		13
> 
>  #endif /* _LINUX_PRCTL_H */
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> @@ -36,6 +36,8 @@
>  #include <linux/personality.h>
>  #include <linux/ptrace.h>
>  #include <linux/fs_struct.h>
> +#include <linux/file.h>
> +#include <linux/mount.h>
>  #include <linux/gfp.h>
>  #include <linux/syscore_ops.h>
>  #include <linux/version.h>
> @@ -1701,6 +1703,57 @@ static bool vma_flags_mismatch(struct vm
>  		(vma->vm_flags & banned);
>  }
> 
> +static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> +{
> +	struct file *exe_file;
> +	struct dentry *dentry;
> +	int err;
> +
> +	/*
> +	 * Setting new mm::exe_file is only allowed
> +	 * when no VM_EXECUTABLE vma's left. So make
> +	 * a fast test first.
> +	 */
> +	if (mm->num_exe_file_vmas)
> +		return -EBUSY;
> +
> +	exe_file = fget(fd);
> +	if (!exe_file)
> +		return -EBADF;
> +
> +	dentry = exe_file->f_path.dentry;
> +
> +	/*
> +	 * Because the original mm->exe_file
> +	 * points to executable file, make sure
> +	 * this one is executable as well to not
> +	 * break an overall picture.
> +	 */
> +	err = -EACCES;
> +	if (!S_ISREG(dentry->d_inode->i_mode)	||
> +	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> +		goto exit;

You could factor out this portion of the access checking from open_exec()
after the do_filp_open() in open_exec() and re-use it here. I know it's
a tiny helper but tying these two together might be good for
maintenance later.

Should it check for some of the flags open_exec() uses? open_exec()
passes:

	O_LARGEFILE|O_RDONLY|__FMODE_EXEC

to do_filp_open(). I think an O_RDONLY check might be good. I don't
think __FMODE_EXEC is something userspace can set so could be ignored.
O_LARGEFILE might be important though.

Cheers,
	-Matt Helsley


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-14  1:41                                                   ` Matt Helsley
@ 2012-03-14  5:47                                                     ` Cyrill Gorcunov
  2012-03-14 22:21                                                       ` Matt Helsley
  0 siblings, 1 reply; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-14  5:47 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Tue, Mar 13, 2012 at 06:41:06PM -0700, Matt Helsley wrote:
...
> > +
> > +	exe_file = fget(fd);
> > +	if (!exe_file)
> > +		return -EBADF;
> > +
> > +	dentry = exe_file->f_path.dentry;
> > +
> > +	/*
> > +	 * Because the original mm->exe_file
> > +	 * points to executable file, make sure
> > +	 * this one is executable as well to not
> > +	 * break an overall picture.
> > +	 */
> > +	err = -EACCES;
> > +	if (!S_ISREG(dentry->d_inode->i_mode)	||
> > +	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> > +		goto exit;
> 
> You could factor out this portion of the access checking from open_exec()
> after the do_filp_open() in open_exec() and re-use it here. I know it's
> a tiny helper but tying these two together might be good for
> maintenance later.
> 

Matt, I really don't want to touch code outside of prctl, and this function
in particular, at least in this patch; i.e. I can clean up and factor it
out on top of the patch, as a separate task.

> Should it check for some of the flags open_exec() uses? open_exec()
> passes:
> 
> 	O_LARGEFILE|O_RDONLY|__FMODE_EXEC
> 
> to do_filp_open(). I think an O_RDONLY check might be good. I don't
> think __FMODE_EXEC is something userspace can set so could be ignored.
> O_LARGEFILE might be important though.

Well, we're not going to read from this file, so it is not that important
at the moment. Previously I had

> +     if ((exe_file->f_flags & O_ACCMODE) != O_RDONLY)
> +             goto exit;

and Oleg pointed out

 | But the O_RDONLY check looks strange. We are not going to write
 | to this file, we only set the name (and that is why I think it
 | should be mm->exe_path). What is the point to check that the file
 | was opened without FMODE_WRITE? Even if there were any security
 | risk the application can open this file again with the different
 | flags.

so I dropped it. And I think the same applies to O_LARGEFILE. Sure
it's not a problem to bring it back but should we?
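
For comparison, the userspace analogue of the dropped f_flags test looks like
this. It is only a sketch to show what the check amounts to; the helper name
fd_is_rdonly() is made up for the example.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: 1 if fd was opened read-only, 0 if it was opened
 * for writing or read-write, -1 if fcntl() failed (errno is then set).
 */
static int fd_is_rdonly(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;

	/* O_ACCMODE masks out everything but the access mode bits. */
	return (flags & O_ACCMODE) == O_RDONLY;
}
```

As Oleg noted, such a test says nothing about the file itself, since the
caller can simply open the same path again with other flags.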

	Cyrill

* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-14  5:47                                                     ` Cyrill Gorcunov
@ 2012-03-14 22:21                                                       ` Matt Helsley
  2012-03-14 22:48                                                         ` Cyrill Gorcunov
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Helsley @ 2012-03-14 22:21 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matt Helsley, Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov,
	Kees Cook, Tejun Heo, Andrew Morton, LKML

On Wed, Mar 14, 2012 at 09:47:28AM +0400, Cyrill Gorcunov wrote:
> On Tue, Mar 13, 2012 at 06:41:06PM -0700, Matt Helsley wrote:
> ...
> > > +
> > > +	exe_file = fget(fd);
> > > +	if (!exe_file)
> > > +		return -EBADF;
> > > +
> > > +	dentry = exe_file->f_path.dentry;
> > > +
> > > +	/*
> > > +	 * Because the original mm->exe_file
> > > +	 * points to executable file, make sure
> > > +	 * this one is executable as well to not
> > > +	 * break an overall picture.
> > > +	 */
> > > +	err = -EACCES;
> > > +	if (!S_ISREG(dentry->d_inode->i_mode)	||
> > > +	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> > > +		goto exit;
> > 
> > You could factor out this portion of the access checking from open_exec()
> > after the do_filp_open() in open_exec() and re-use it here. I know it's
> > a tiny helper but tying these two together might be good for
> > maintenance later.
> > 
> 
> Matt, I really don't want to touch code outside of prctl, and this function
> in particular, at least in this patch; i.e. I can clean up and factor it
> out on top of the patch, as a separate task.

OK, sounds fine.

> 
> > Should it check for some of the flags open_exec() uses? open_exec()
> > passes:
> > 
> > 	O_LARGEFILE|O_RDONLY|__FMODE_EXEC
> > 
> > to do_filp_open(). I think an O_RDONLY check might be good. I don't
> > think __FMODE_EXEC is something userspace can set so could be ignored.
> > O_LARGEFILE might be important though.
> 
> Well, we're not going to read from this file, so it is not that important
> at the moment. Previously I had
> 
> > +     if ((exe_file->f_flags & O_ACCMODE) != O_RDONLY)
> > +             goto exit;
> 
> and Oleg pointed out
> 
>  | But the O_RDONLY check looks strange. We are not going to write
>  | to this file, we only set the name (and that is why I think it
>  | should be mm->exe_path). What is the point to check that the file
>  | was opened without FMODE_WRITE? Even if there were any security
>  | risk the application can open this file again with the different
>  | flags.
> 
> so I dropped it. And I think the same applies to O_LARGEFILE. Sure
> it's not a problem to bring it back but should we?

OK, sorry I must have missed that portion of the discussion. It all looks
good to me.

Cheers,
	-Matt


* Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3
  2012-03-14 22:21                                                       ` Matt Helsley
@ 2012-03-14 22:48                                                         ` Cyrill Gorcunov
  0 siblings, 0 replies; 48+ messages in thread
From: Cyrill Gorcunov @ 2012-03-14 22:48 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Oleg Nesterov, KOSAKI Motohiro, Pavel Emelyanov, Kees Cook,
	Tejun Heo, Andrew Morton, LKML

On Wed, Mar 14, 2012 at 03:21:08PM -0700, Matt Helsley wrote:
...
> 
> OK, sorry I must have missed that portion of the discussion. It all looks
> good to me.
> 

Thanks a lot for all your comments, Matt!
Can I add your Reviewed-by or something
on the last version of the patch?

(this one which looks good to Oleg
 https://lkml.org/lkml/2012/3/13/437)

so I would gather these two patches
together and will resend them out
again (you'll be CC'ed so you'll see).

	Cyrill

end of thread, other threads:[~2012-03-14 22:48 UTC | newest]

Thread overview: 48+ messages
2012-03-08 16:51 [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file v3 Cyrill Gorcunov
2012-03-08 18:26 ` Oleg Nesterov
2012-03-08 19:03   ` Cyrill Gorcunov
2012-03-08 19:05     ` Oleg Nesterov
2012-03-08 19:25       ` Cyrill Gorcunov
2012-03-08 19:25         ` Oleg Nesterov
2012-03-08 19:36           ` Cyrill Gorcunov
2012-03-08 21:48           ` Cyrill Gorcunov
2012-03-09 12:48             ` Oleg Nesterov
2012-03-09 12:57               ` Cyrill Gorcunov
2012-03-09 13:35                 ` Cyrill Gorcunov
2012-03-09 13:47                   ` Oleg Nesterov
2012-03-09 14:13                     ` Cyrill Gorcunov
2012-03-09 14:26                       ` Oleg Nesterov
2012-03-09 14:42                         ` Cyrill Gorcunov
2012-03-09 15:21                           ` Oleg Nesterov
2012-03-09 15:42                             ` Cyrill Gorcunov
2012-03-09 22:02                               ` Matt Helsley
2012-03-09 22:39                                 ` Cyrill Gorcunov
2012-03-09 23:59                                   ` Matt Helsley
2012-03-10  7:48                                     ` Cyrill Gorcunov
2012-03-13  2:45                                       ` Matt Helsley
2012-03-13  6:26                                         ` Cyrill Gorcunov
2012-03-13  7:18                                           ` Cyrill Gorcunov
2012-03-13 15:43                                             ` Oleg Nesterov
2012-03-13 16:00                                               ` Cyrill Gorcunov
2012-03-13 16:04                                                 ` Cyrill Gorcunov
2012-03-13 16:44                                                   ` Oleg Nesterov
2012-03-14  1:41                                                   ` Matt Helsley
2012-03-14  5:47                                                     ` Cyrill Gorcunov
2012-03-14 22:21                                                       ` Matt Helsley
2012-03-14 22:48                                                         ` Cyrill Gorcunov
2012-03-14  0:36                                               ` Matt Helsley
2012-03-09 21:46     ` Matt Helsley
2012-03-09 21:52       ` Cyrill Gorcunov
2012-03-08 19:31 ` Kees Cook
2012-03-08 19:40   ` Cyrill Gorcunov
2012-03-08 20:02     ` Andy Lutomirski
2012-03-08 20:06       ` Kees Cook
2012-03-08 20:07       ` Cyrill Gorcunov
2012-03-08 20:15         ` Andy Lutomirski
2012-03-08 20:21           ` Cyrill Gorcunov
2012-03-08 20:24             ` Andy Lutomirski
2012-03-08 20:28               ` Cyrill Gorcunov
2012-03-08 21:57               ` Cyrill Gorcunov
2012-03-08 22:03                 ` Kees Cook
2012-03-08 22:12                   ` Cyrill Gorcunov
2012-03-08 22:14                     ` Kees Cook
