public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
@ 2014-02-04 17:05 Steven Rostedt
  2014-02-04 19:00 ` Oleg Nesterov
  2014-02-04 20:18 ` Linus Torvalds
  0 siblings, 2 replies; 14+ messages in thread
From: Steven Rostedt @ 2014-02-04 17:05 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Andrew Morton, Oleg Nesterov, Al Viro,
	David Smith, Peter Zijlstra, Igor Zhbanov, Christoph Hellwig,
	Paul Menage

Igor reported garbage in a trace that looked like this:

           login-1718  [000] ...1  1340.195118: sched_process_exec: filename=/bin/login pid=1718 old_pid=1718
 systemd-cgroups-2061  [006] ...1  1341.672207: sched_process_exec: filename=/usr/lib/systemd/systemd-cgroups-agent pid=2061 old_pid=2061
 systemd-cgroups-2063  [004] ...1  1341.673437: sched_process_exec: filename=ÀÈûÉ pid=2063 old_pid=2063

See the junk in the last line for filename.

Investigating this, I found that the problem is where the tracepoint
lies. In exec.c exec_binprm() we have:

	ret = search_binary_handler(bprm);
	if (ret >= 0) {
		audit_bprm(bprm);
		trace_sched_process_exec(current, old_pid, bprm);

The tracepoint uses bprm->filename to record the process that was
executed. The problem is that the handlers called in
search_binary_handler() can release the process' old memory for the
new memory. The bprm->filename is not in this memory, but the issue is
when this happens, if the parent used a vfork() it will wake up at
that moment. To explain exactly what happens, we have this:

cgroup_release_agent()
 agentbuf = kstrdup();
 argv[0] = argentbuf;
 call_usermodehelper(argv[0],...,UMH_WAIT_EXEC);
  call_usermodehelper_setup()
    sub_info->path = path; (argv[0] from parameter)
    sub_info->work = __call_usermodehelper;
  call_usermodehelper_exec()
    queue_work(subinfo->work);
    wait_for_completion();

Now the work gets called:

__call_usermodehelper
  kernel_thread(call_helper, CLONE_VFORK);

Notice the VFORK flag. This means that the parent will sleep till the
child exits or execs.

call_helper()
  ____call_usermodehelper()
    do_execve(subinfo->path)
      filename = subinfo->path; (filename is allocated agentbuf)
      do_execve_common(filename)
        bprm->filename = filename;
        exec_bprm(bprm)
          search_binary_handler(bprm)
            fmt->load_binary() (load_elf_binary);
              load_elf_binary()
                flush_old_exec()
                  exec_mmap()
                    mm_release()
                      if (tsk->vfork_done)
                         complete_vfork_done(tsk);

Here we wake up the parent. The one that called
kthread_thread(call_helper). Now that one continues to run:

 	case UMH_WAIT_EXEC:
		if (pid < 0)
			sub_info->retval = pid;
		umh_complete(sub_info);

Now this wakes up the process that's waiting for completion. Which
continues to run and does:

		kfree(agentbuf);

This is what bprm->filename points to.

When the search_binary_handler() returns, (with a possibility that
filename has been freed), when it calls the tracepoint, the tracepoint
will record bogus data. Luckily it didn't crash the kernel.

Adding a few strategic trace_printks, we can see this happening:

     kworker/3:1-51    [003] ....    59.700772: cgroup_release_agent: agentbuf = ffff880114fc28c0 /usr/lib/systemd/systemd-cgroups-agent
 systemd-cgroups-1925  [005] ....    59.700866: do_execve_common.isra.25: calling search_binary_handler (bprm->filename=ffff880114fc28c0 /usr/lib/systemd/systemd-cgroups-agent
 systemd-cgroups-1925  [005] ....    59.700867: search_binary_handler: fmt->load_binary = load_misc_binary+0x0/0x302
 systemd-cgroups-1925  [005] ....    59.700868: search_binary_handler: fmt->load_binary = load_script+0x0/0x1e8
 systemd-cgroups-1925  [005] ....    59.700868: search_binary_handler: fmt->load_binary = load_elf_binary+0x0/0x179b
 systemd-cgroups-1925  [005] ....    59.700876: flush_old_exec: calling exec_mmap bprm->filename=ffff880114fc28c0 /usr/lib/systemd/systemd-cgroups-agent
 systemd-cgroups-1925  [005] ....    59.700876: flush_old_exec: calling mm_release
 systemd-cgroups-1925  [005] ....    59.700876: mm_release: tsk->vfork_done = ffff880000091d68
     kworker/3:1-51    [003] ....    59.700904: cgroup_release_agent: freeing agentbuf
 systemd-cgroups-1925  [005] ....    59.700911: do_execve_common.isra.25: back from binary (bprm->filename=ffff880114fc28c0 @$ü\x14\x01ˆÿÿ/systemd/systemd-cgroups-agent
 systemd-cgroups-1925  [005] ...1    59.700912: sched_process_exec: filename=@$ü\x14\x01ˆÿÿ/systemd/systemd-cgroups-agent pid=1925 old_pid=1925


The junk starts after "freeing agentbuf"

Now to fix this we need to save the filename before calling
search_binary_handler(). But we don't want to save it if we are not
tracing. Why slow everyone else down?  To prevent this, I added
a static key branch (nop or unconditional jump) that gets enabled when
the tracepoint sched_process_exec is registered, and disabled when the
tracepoint is unregistered.

When the tracepoint is registered, the filename will be allocated by a
kstrdup and freed after the call. If filename fails to allocate, we
simply fall back to brpm->tcomm (just the name, not the path).

This works, but is rather ugly. Looking for any other suggestions here.

Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
[ I'm assuming this is the right person. All I have is a nick "IZh" on IRC]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/fs/exec.c b/fs/exec.c
index e1529b4..9bb1b0f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1407,9 +1407,22 @@ int search_binary_handler(struct linux_binprm *bprm)
 }
 EXPORT_SYMBOL(search_binary_handler);
 
+static struct static_key sched_process_exec_key = STATIC_KEY_INIT_FALSE;
+
+void sched_process_exec_key_reg(void)
+{
+	static_key_slow_inc(&sched_process_exec_key);
+}
+
+void sched_process_exec_key_unreg(void)
+{
+	static_key_slow_dec(&sched_process_exec_key);
+}
+
 static int exec_binprm(struct linux_binprm *bprm)
 {
 	pid_t old_pid, old_vpid;
+	char *filename;
 	int ret;
 
 	/* Need to fetch pid before load_binary changes it */
@@ -1418,10 +1431,34 @@ static int exec_binprm(struct linux_binprm *bprm)
 	old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
 	rcu_read_unlock();
 
-	ret = search_binary_handler(bprm);
+	/*
+	 * When the sched_process_exec tracepoint is activated, it
+	 * enables the static key sched_process_exec_key to do some more
+	 * work. The issue is that the tracepoint wants to record the
+	 * filename of the bprm but only if search_binary_handler() returns
+	 * without an error. The problem is that the search_binary_handler()
+	 * may call mm_release() which can wake up the parent that did a vfork
+	 * (see cgroup call_usermodehelper()) and the parent may free bprm->filename.
+	 *
+	 * The filename must be saved before calling search_binary_handler().
+	 * But as this is only needed for tracing, only do this work if
+	 * the tracepoint is enabled.
+	 *
+	 * The static key usage removes conditional branches as well.
+	 */ 
+	if (static_key_false(&sched_process_exec_key)) {
+		/* If filename fails to alloc, just use tcomm */
+		filename = kstrdup(bprm->filename, GFP_KERNEL);
+		ret = search_binary_handler(bprm);
+		if (ret >= 0)
+			trace_sched_process_exec(current, old_pid,
+						 filename ? filename : bprm->tcomm);
+		kfree(filename);
+	} else
+		ret = search_binary_handler(bprm);
+
 	if (ret >= 0) {
 		audit_bprm(bprm);
-		trace_sched_process_exec(current, old_pid, bprm);
 		ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
 		proc_exec_connector(current);
 	}
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 67e1bbf..a7eaa06 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -276,30 +276,35 @@ TRACE_EVENT(sched_process_fork,
 		__entry->child_comm, __entry->child_pid)
 );
 
+extern void sched_process_exec_key_reg(void);
+extern void sched_process_exec_key_unreg(void);
+
 /*
  * Tracepoint for exec:
  */
-TRACE_EVENT(sched_process_exec,
+TRACE_EVENT_FN(sched_process_exec,
 
-	TP_PROTO(struct task_struct *p, pid_t old_pid,
-		 struct linux_binprm *bprm),
+	TP_PROTO(struct task_struct *p, pid_t old_pid, char *filename),
 
-	TP_ARGS(p, old_pid, bprm),
+	TP_ARGS(p, old_pid, filename),
 
 	TP_STRUCT__entry(
-		__string(	filename,	bprm->filename	)
+		__string(	filename,	filename	)
 		__field(	pid_t,		pid		)
 		__field(	pid_t,		old_pid		)
 	),
 
 	TP_fast_assign(
-		__assign_str(filename, bprm->filename);
+		__assign_str(filename, filename);
 		__entry->pid		= p->pid;
 		__entry->old_pid	= old_pid;
 	),
 
 	TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename),
-		  __entry->pid, __entry->old_pid)
+		  __entry->pid, __entry->old_pid),
+
+	sched_process_exec_key_reg,
+	sched_process_exec_key_unreg
 );
 
 /*

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-04 17:05 [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec Steven Rostedt
@ 2014-02-04 19:00 ` Oleg Nesterov
  2014-02-04 20:10   ` Steven Rostedt
  2014-02-04 20:18 ` Linus Torvalds
  1 sibling, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2014-02-04 19:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linus Torvalds, Andrew Morton, Al Viro, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig, Paul Menage

On 02/04, Steven Rostedt wrote:
>
> Now to fix this we need to save the filename before calling
> search_binary_handler(). But we don't want to save it if we are not
> tracing. Why slow everyone else down?

Yes, but it would be much simpler to dup filename unconditionally.

Note also that in this case we can kill linux_binprm->tcomm[] and
simplify filename_to_taskname().

> This works, but is rather ugly.

Yes ;)

> Looking for any other suggestions here.

Perhaps we can change flush_old_exec() to do

	if (!current->mm) {
		bprm->filename = kstrdup(bprm->filename);
		if (bprm->filename)
			bprm->filename_was_dupped = true; // for free_bprm()
		else
			bprm->filename = "//enomem";
	}

This won't penalize the normal exec, and this should fix the problem
afaics.

Perhaps, instead of "//enomem" flush_old_exec() should simply fail,
in this case we can kill bprm->tcomm[] too.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-04 19:00 ` Oleg Nesterov
@ 2014-02-04 20:10   ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2014-02-04 20:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Linus Torvalds, Andrew Morton, Al Viro, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig, Paul Menage,
	David Howells

On Tue, 4 Feb 2014 20:00:07 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> On 02/04, Steven Rostedt wrote:
> >
> > Now to fix this we need to save the filename before calling
> > search_binary_handler(). But we don't want to save it if we are not
> > tracing. Why slow everyone else down?
> 
> Yes, but it would be much simpler to dup filename unconditionally.

I was hoping not to add more work to exec, for the purpose if tracing
is enabled.

> 
> Note also that in this case we can kill linux_binprm->tcomm[] and
> simplify filename_to_taskname().
> 
> > This works, but is rather ugly.
> 
> Yes ;)
> 
> > Looking for any other suggestions here.
> 
> Perhaps we can change flush_old_exec() to do
> 
> 	if (!current->mm) {
> 		bprm->filename = kstrdup(bprm->filename);
> 		if (bprm->filename)
> 			bprm->filename_was_dupped = true; // for free_bprm()
> 		else
> 			bprm->filename = "//enomem";
> 	}
> 
> This won't penalize the normal exec, and this should fix the problem
> afaics.

Don't forget to add the free. And in any case, we added a branch
conditional. Yes, I'm that anal.

> 
> Perhaps, instead of "//enomem" flush_old_exec() should simply fail,
> in this case we can kill bprm->tcomm[] too.

We could just add "//enomem" to the tracepoint too, and not use tcomm.

Also, I'm working on a patch that gets rid of all the setup of the
static_key and reuses the tracepoint static key instead (suggested by
Peter Zijlstra).

Thus, the change in exec.c would just look like this:


	if (tracepoint_enabled(sched_process_exec)) {
		[...]
	} else
		ret = search_binary_handler(bprm);


That is, we can get rid of the sched_process_exec_key_reg and unreg
functions, and just use this and it would become the same thing.

This will be helpful in other places too (David, I'm talking to you ;-)


-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-04 17:05 [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec Steven Rostedt
  2014-02-04 19:00 ` Oleg Nesterov
@ 2014-02-04 20:18 ` Linus Torvalds
  2014-02-04 20:31   ` Steven Rostedt
  2014-02-04 23:28   ` Steven Rostedt
  1 sibling, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2014-02-04 20:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Andrew Morton, Oleg Nesterov, Al Viro, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig, Paul Menage

On Tue, Feb 4, 2014 at 9:05 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> This works, but is rather ugly.

Oh, please, that's a British-level understatement. It's like calling
WWII "a small bother".

That's too ugly to live.

> Looking for any other suggestions here.

Do we actually have to use "filename" at all?

We do have bprm->file, and we could get a path from that. It would be
more expensive, but for tracing execve that might be fine. Yes/no?

Or maybe we could just push the "putname(path)" into free_bprm() and
remove it from the callers. That's where we free bprm->interp anyway,
so it would kind of match.

          Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-04 20:18 ` Linus Torvalds
@ 2014-02-04 20:31   ` Steven Rostedt
  2014-02-04 23:28   ` Steven Rostedt
  1 sibling, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2014-02-04 20:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Andrew Morton, Oleg Nesterov, Al Viro, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig, Paul Menage

On Tue, 4 Feb 2014 12:18:53 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:


> We do have bprm->file, and we could get a path from that. It would be
> more expensive, but for tracing execve that might be fine. Yes/no?

I actually looked at that. But it may duplicate a bit of code. I was
thinking perhaps of making helper functions to extract the path string
from the file pointer (if there isn't one already that I'm unaware of)
and using that. Also it might be used for places like proc.

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-04 20:18 ` Linus Torvalds
  2014-02-04 20:31   ` Steven Rostedt
@ 2014-02-04 23:28   ` Steven Rostedt
  2014-02-04 23:42     ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-02-04 23:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Andrew Morton, Oleg Nesterov, Al Viro, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

On Tue, 4 Feb 2014 12:18:53 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> That's too ugly to live.

New patch. Not as ugly. Well, I think this one lacks ugly enough to be
worth living for.

It's dependent on another patch that adds a helper function for
tracepoints, that allows users to implicitly use static_key of a
tracepoint to see if it is enabled or not. Basically, it's:

#define tracepoint_enabled(name) \
	static_key_false(&__tracepoint_##name.key)

This uses the same key that the tracepoint has when it is enabled. It
may be enabled before or after the tracepoint is, but in cases like
these, it doesn't really matter.

At least this patch keeps the ugliness with the code. I could even
encapsulate that in a static inline function to remove the ugliness out
of exec_binprm().

Note, this still requires some comments added to the code.

Butt-ugly-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/fs/exec.c b/fs/exec.c
index e1529b4..f7902aef 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1421,7 +1421,22 @@ static int exec_binprm(struct linux_binprm *bprm)
 	ret = search_binary_handler(bprm);
 	if (ret >= 0) {
 		audit_bprm(bprm);
-		trace_sched_process_exec(current, old_pid, bprm);
+		if (tracepoint_enabled(sched_process_exec)) {
+			char *tmp = (char*)__get_free_page(GFP_TEMPORARY);
+			char *pathname;
+
+			if (tmp)
+				pathname = dentry_path_raw(bprm->file->f_dentry,
+							   tmp, PAGE_SIZE);
+			else
+				pathname = ERR_PTR(-ENOMEM);
+			if (IS_ERR(pathname))
+				trace_sched_process_exec(current, old_pid,
+							 "//error-no-mem");
+			else
+				trace_sched_process_exec(current, old_pid, pathname);
+			free_page((unsigned long)tmp);
+		}
 		ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
 		proc_exec_connector(current);
 	}
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 67e1bbf..520ba9a 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -282,18 +282,18 @@ TRACE_EVENT(sched_process_fork,
 TRACE_EVENT(sched_process_exec,
 
 	TP_PROTO(struct task_struct *p, pid_t old_pid,
-		 struct linux_binprm *bprm),
+		 const char *pathname),
 
-	TP_ARGS(p, old_pid, bprm),
+	TP_ARGS(p, old_pid, pathname),
 
 	TP_STRUCT__entry(
-		__string(	filename,	bprm->filename	)
+		__string(	filename,	pathname	)
 		__field(	pid_t,		pid		)
 		__field(	pid_t,		old_pid		)
 	),
 
 	TP_fast_assign(
-		__assign_str(filename, bprm->filename);
+		__assign_str(filename, pathname);
 		__entry->pid		= p->pid;
 		__entry->old_pid	= old_pid;
 	),

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-04 23:28   ` Steven Rostedt
@ 2014-02-04 23:42     ` Steven Rostedt
  2014-02-05  0:57       ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-02-04 23:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, LKML, Andrew Morton, Oleg Nesterov, Al Viro,
	David Smith, Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

On Tue, 4 Feb 2014 18:28:52 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:


> At least this patch keeps the ugliness with the code. I could even
> encapsulate that in a static inline function to remove the ugliness out
> of exec_binprm().

New version that moves all the ugliness into a static inline helper
function.

I'll start testing this and the added helper, and if there's no
problems or arguments from you, I'll post them tomorrow.

-- Steve

diff --git a/fs/exec.c b/fs/exec.c
index e1529b4..06101e9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1407,6 +1407,36 @@ int search_binary_handler(struct linux_binprm *bprm)
 }
 EXPORT_SYMBOL(search_binary_handler);
 
+static __always_inline void
+do_trace_sched_process_exec(pid_t old_pid, struct linux_binprm *bprm)
+{
+	char *pathname;
+	char *tmp;
+
+	/*
+	 * We can't use bprm->filename here because it can be freed
+	 * within the search_binary_handler() call (by mm_release()
+	 * waking up the cgroup call_usermodehelper parent, that
+	 * will free what bprm->filename points to). Instead, if
+	 * the sched_process_exec tracepoint is enabled, we need
+	 * to create the pathname from the file dentry pointer.
+	 */
+	if (tracepoint_enabled(sched_process_exec)) {
+		tmp = (char*)__get_free_page(GFP_TEMPORARY);
+		if (tmp)
+			pathname = dentry_path_raw(bprm->file->f_dentry,
+						   tmp, PAGE_SIZE);
+		else
+			pathname = ERR_PTR(-ENOMEM);
+		if (IS_ERR(pathname))
+			trace_sched_process_exec(current, old_pid,
+						 "//error-no-mem");
+		else
+			trace_sched_process_exec(current, old_pid, pathname);
+		free_page((unsigned long)tmp);
+	}
+}
+
 static int exec_binprm(struct linux_binprm *bprm)
 {
 	pid_t old_pid, old_vpid;
@@ -1421,7 +1451,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 	ret = search_binary_handler(bprm);
 	if (ret >= 0) {
 		audit_bprm(bprm);
-		trace_sched_process_exec(current, old_pid, bprm);
+		do_trace_sched_process_exec(old_pid, bprm);
 		ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
 		proc_exec_connector(current);
 	}
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 67e1bbf..520ba9a 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -282,18 +282,18 @@ TRACE_EVENT(sched_process_fork,
 TRACE_EVENT(sched_process_exec,
 
 	TP_PROTO(struct task_struct *p, pid_t old_pid,
-		 struct linux_binprm *bprm),
+		 const char *pathname),
 
-	TP_ARGS(p, old_pid, bprm),
+	TP_ARGS(p, old_pid, pathname),
 
 	TP_STRUCT__entry(
-		__string(	filename,	bprm->filename	)
+		__string(	filename,	pathname	)
 		__field(	pid_t,		pid		)
 		__field(	pid_t,		old_pid		)
 	),
 
 	TP_fast_assign(
-		__assign_str(filename, bprm->filename);
+		__assign_str(filename, pathname);
 		__entry->pid		= p->pid;
 		__entry->old_pid	= old_pid;
 	),

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-04 23:42     ` Steven Rostedt
@ 2014-02-05  0:57       ` Linus Torvalds
  2014-02-05  1:10         ` Al Viro
  2014-02-05  2:31         ` Steven Rostedt
  0 siblings, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2014-02-05  0:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Andrew Morton, Oleg Nesterov, Al Viro, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 1634 bytes --]

On Tue, Feb 4, 2014 at 3:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> New version that moves all the ugliness into a static inline helper
> function.

Ok, that's better, but I really think we should just use "getname()"
and "putname()".

That's what the path that *matters* already does (ie the normal
execve() system call), so let's just make all the random cases do the
same thing.

That requires a "getname_kernel()" to create a "struct filename *"
from a kernel string, but hey, that's simple enough.

NOTE! This means that "bprm->filename" is no longer a string: it's a
"struct filename *", so if you want the string, you do
"filename->name".

This actually cleans the normal paths up - look how "open_exec()" used
to create a dummy

        struct filename tmp = { .name = name };

on the stack because do_filp_open() wants a 'struct filename' pointer.
I leave that for the external callers (that use it for the
interpreter), but for the main path that actually just goes away,
because now we have that "struct filename *" natively.

It does add code to the special kernel-execve paths, but moving the
"handle errors from getname()" code into do_execve(), even that is
really trivial, eg:

  -       return do_execve(init_filename,
  +       return do_execve(getname_kernel(init_filename),

NOTE NOTE NOTE. This is untested, but it looks fine. If I missed
something, the compiler should warn about bad types. I didn't my the
bprm_flat changes, for example, because that's a no-MMU-only file. And
I might have missed something that didn't match my grep patterns, for
example.

How does this look?

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 16836 bytes --]

 arch/parisc/hpux/fs.c             |  3 +-
 arch/tile/mm/elf.c                |  2 +-
 fs/binfmt_em86.c                  |  2 +-
 fs/binfmt_flat.c                  | 15 ++++++----
 fs/exec.c                         | 58 ++++++++++++++++++++-------------------
 fs/namei.c                        | 27 ++++++++++++++++++
 include/linux/binfmts.h           |  3 +-
 include/linux/fs.h                |  1 +
 include/linux/sched.h             |  3 +-
 include/trace/events/sched.h      |  4 +--
 init/main.c                       |  2 +-
 kernel/kmod.c                     |  2 +-
 security/apparmor/domain.c        |  2 +-
 security/commoncap.c              |  6 ++--
 security/integrity/ima/ima_main.c |  4 +--
 security/tomoyo/domain.c          |  2 +-
 security/tomoyo/tomoyo.c          |  2 +-
 17 files changed, 86 insertions(+), 52 deletions(-)

diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index 88d0962de65a..23ebcd678da3 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -41,11 +41,10 @@ int hpux_execve(struct pt_regs *regs)
 	if (IS_ERR(filename))
 		goto out;
 
-	error = do_execve(filename->name,
+	error = do_execve(filename,
 			  (const char __user *const __user *) regs->gr[25],
 			  (const char __user *const __user *) regs->gr[24]);
 
-	putname(filename);
 
 out:
 	return error;
diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 23f044e8a7ab..3ad54a7a927c 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -117,7 +117,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 	 * whatever was passed as the linux_binprm filename.
 	 */
 	if (!notify_exec(mm))
-		sim_notify_exec(bprm->filename);
+		sim_notify_exec(bprm->filename->name);
 
 	retval = setup_vdso_pages();
 
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index f37b08cea1f7..077a76fb6347 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -62,7 +62,7 @@ static int load_em86(struct linux_binprm *bprm)
 	 * user environment and arguments are stored.
 	 */
 	remove_arg_zero(bprm);
-	retval = copy_strings_kernel(1, &bprm->filename, bprm);
+	retval = copy_strings_kernel(1, &bprm->filename->name, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
 	if (i_arg) {
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index d50bbe59da1e..f1377a5d260b 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -469,7 +469,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 	}
 
 	if (flags & FLAT_FLAG_KTRACE)
-		printk("BINFMT_FLAT: Loading file: %s\n", bprm->filename);
+		printk("BINFMT_FLAT: Loading file: %s\n", bprm->filename->name);
 
 	if (rev != FLAT_VERSION && rev != OLD_FLAT_VERSION) {
 		printk("BINFMT_FLAT: bad flat file version 0x%x (supported "
@@ -686,7 +686,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 
 	if (flags & FLAT_FLAG_KTRACE)
 		printk("%s %s: TEXT=%x-%x DATA=%x-%x BSS=%x-%x\n",
-			id ? "Lib" : "Load", bprm->filename,
+			id ? "Lib" : "Load", bprm->filename->name,
 			(int) start_code, (int) end_code,
 			(int) datapos,
 			(int) (datapos + data_len),
@@ -818,11 +818,14 @@ static int load_flat_shared_library(int id, struct lib_info *libs)
 	sprintf(buf, "/lib/lib%d.so", id);
 
 	/* Open the file up */
-	bprm.filename = buf;
-	bprm.file = open_exec(bprm.filename);
+	bprm.filename = getname_kernel(buf);
+	if (IS_ERR(bprm.filename))
+		return res;
+
+	bprm.file = open_exec(buf);
 	res = PTR_ERR(bprm.file);
 	if (IS_ERR(bprm.file))
-		return res;
+		goto out_putname;
 
 	bprm.cred = prepare_exec_creds();
 	res = -ENOMEM;
@@ -845,6 +848,8 @@ static int load_flat_shared_library(int id, struct lib_info *libs)
 out:
 	allow_write_access(bprm.file);
 	fput(bprm.file);
+out_putname:
+	putname(bprm.filename);
 
 	return(res);
 }
diff --git a/fs/exec.c b/fs/exec.c
index e1529b4c79b1..538be413b26b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -748,11 +748,10 @@ EXPORT_SYMBOL(setup_arg_pages);
 
 #endif /* CONFIG_MMU */
 
-struct file *open_exec(const char *name)
+static struct file *do_open_exec(struct filename *name)
 {
 	struct file *file;
 	int err;
-	struct filename tmp = { .name = name };
 	static const struct open_flags open_exec_flags = {
 		.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
 		.acc_mode = MAY_EXEC | MAY_OPEN,
@@ -760,7 +759,7 @@ struct file *open_exec(const char *name)
 		.lookup_flags = LOOKUP_FOLLOW,
 	};
 
-	file = do_filp_open(AT_FDCWD, &tmp, &open_exec_flags);
+	file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
 	if (IS_ERR(file))
 		goto out;
 
@@ -784,6 +783,12 @@ exit:
 	fput(file);
 	return ERR_PTR(err);
 }
+
+struct file *open_exec(const char *name)
+{
+	struct filename tmp = { .name = name };
+	return do_open_exec(&tmp);
+}
 EXPORT_SYMBOL(open_exec);
 
 int kernel_read(struct file *file, loff_t offset,
@@ -1074,7 +1079,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 
 	set_mm_exe_file(bprm->mm, bprm->file);
 
-	filename_to_taskname(bprm->tcomm, bprm->filename, sizeof(bprm->tcomm));
+	filename_to_taskname(bprm->tcomm, bprm->filename->name, sizeof(bprm->tcomm));
 	/*
 	 * Release all of the old mmap stuff
 	 */
@@ -1162,7 +1167,7 @@ int prepare_bprm_creds(struct linux_binprm *bprm)
 	return -ENOMEM;
 }
 
-void free_bprm(struct linux_binprm *bprm)
+static void free_bprm(struct linux_binprm *bprm)
 {
 	free_arg_pages(bprm);
 	if (bprm->cred) {
@@ -1174,15 +1179,17 @@ void free_bprm(struct linux_binprm *bprm)
 		fput(bprm->file);
 	}
 	/* If a binfmt changed the interp, free it. */
-	if (bprm->interp != bprm->filename)
+	if (bprm->interp != bprm->filename->name)
 		kfree(bprm->interp);
+	if (bprm->filename)
+		putname(bprm->filename);
 	kfree(bprm);
 }
 
 int bprm_change_interp(char *interp, struct linux_binprm *bprm)
 {
 	/* If a binfmt changed the interp, free it first. */
-	if (bprm->interp != bprm->filename)
+	if (bprm->interp != bprm->filename->name)
 		kfree(bprm->interp);
 	bprm->interp = kstrdup(interp, GFP_KERNEL);
 	if (!bprm->interp)
@@ -1432,7 +1439,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int do_execve_common(const char *filename,
+static int do_execve_common(struct filename *filename,
 				struct user_arg_ptr argv,
 				struct user_arg_ptr envp)
 {
@@ -1441,6 +1448,9 @@ static int do_execve_common(const char *filename,
 	struct files_struct *displaced;
 	int retval;
 
+	if (IS_ERR(filename))
+		return PTR_ERR(filename);
+
 	/*
 	 * We move the actual failure in case of RLIMIT_NPROC excess from
 	 * set*uid() to execve() because too many poorly written programs
@@ -1466,6 +1476,9 @@ static int do_execve_common(const char *filename,
 	if (!bprm)
 		goto out_files;
 
+	bprm->filename = filename;
+	bprm->interp = filename->name;
+
 	retval = prepare_bprm_creds(bprm);
 	if (retval)
 		goto out_free;
@@ -1473,7 +1486,7 @@ static int do_execve_common(const char *filename,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
 
-	file = open_exec(filename);
+	file = do_open_exec(filename);
 	retval = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out_unmark;
@@ -1481,8 +1494,6 @@ static int do_execve_common(const char *filename,
 	sched_exec();
 
 	bprm->file = file;
-	bprm->filename = filename;
-	bprm->interp = filename;
 
 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1500,7 +1511,7 @@ static int do_execve_common(const char *filename,
 	if (retval < 0)
 		goto out;
 
-	retval = copy_strings_kernel(1, &bprm->filename, bprm);
+	retval = copy_strings_kernel(1, &bprm->filename->name, bprm);
 	if (retval < 0)
 		goto out;
 
@@ -1539,15 +1550,18 @@ out_unmark:
 
 out_free:
 	free_bprm(bprm);
+	filename = NULL;
 
 out_files:
 	if (displaced)
 		reset_files_struct(displaced);
 out_ret:
+	if (filename)
+		putname(filename);
 	return retval;
 }
 
-int do_execve(const char *filename,
+int do_execve(struct filename *filename,
 	const char __user *const __user *__argv,
 	const char __user *const __user *__envp)
 {
@@ -1557,7 +1571,7 @@ int do_execve(const char *filename,
 }
 
 #ifdef CONFIG_COMPAT
-static int compat_do_execve(const char *filename,
+static int compat_do_execve(struct filename *filename,
 	const compat_uptr_t __user *__argv,
 	const compat_uptr_t __user *__envp)
 {
@@ -1607,25 +1621,13 @@ SYSCALL_DEFINE3(execve,
 		const char __user *const __user *, argv,
 		const char __user *const __user *, envp)
 {
-	struct filename *path = getname(filename);
-	int error = PTR_ERR(path);
-	if (!IS_ERR(path)) {
-		error = do_execve(path->name, argv, envp);
-		putname(path);
-	}
-	return error;
+	return do_execve(getname(filename), argv, envp);
 }
 #ifdef CONFIG_COMPAT
 asmlinkage long compat_sys_execve(const char __user * filename,
 	const compat_uptr_t __user * argv,
 	const compat_uptr_t __user * envp)
 {
-	struct filename *path = getname(filename);
-	int error = PTR_ERR(path);
-	if (!IS_ERR(path)) {
-		error = compat_do_execve(path->name, argv, envp);
-		putname(path);
-	}
-	return error;
+	return compat_do_execve(getname(filename), argv, envp);
 }
 #endif
diff --git a/fs/namei.c b/fs/namei.c
index d580df2e6804..593d70a75ebd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -210,6 +210,33 @@ getname(const char __user * filename)
 	return getname_flags(filename, 0, NULL);
 }
 
+/*
+ * The "getname_kernel()" interface doesn't do pathnames longer
+ * than EMBEDDED_NAME_MAX. Deal with it - you're a kernel user.
+ */
+struct filename *
+getname_kernel(const char * filename)
+{
+	struct filename *result;
+	char *kname;
+	int len;
+
+	len = strlen(filename);
+	if (len >= EMBEDDED_NAME_MAX)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	result = __getname();
+	if (unlikely(!result))
+		return ERR_PTR(-ENOMEM);
+
+	kname = (char *)result + sizeof(*result);
+	result->name = kname;
+	result->separate = false;
+
+	strlcpy(kname, filename, EMBEDDED_NAME_MAX);
+	return result;
+}
+
 #ifdef CONFIG_AUDITSYSCALL
 void putname(struct filename *name)
 {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index fd8bf3219ef7..4c4a85accbb1 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -37,7 +37,7 @@ struct linux_binprm {
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
 	unsigned int per_clear;	/* bits to clear in current->personality */
 	int argc, envc;
-	const char * filename;	/* Name of binary as seen by procps */
+	struct filename *filename;	/* Name of binary as seen by procps */
 	const char * interp;	/* Name of the binary really executed. Most
 				   of the time same as filename, but could be
 				   different for binfmt_{misc,script} */
@@ -115,7 +115,6 @@ extern int copy_strings_kernel(int argc, const char *const *argv,
 extern int prepare_bprm_creds(struct linux_binprm *bprm);
 extern void install_exec_creds(struct linux_binprm *bprm);
 extern void set_binfmt(struct linux_binfmt *new);
-extern void free_bprm(struct linux_binprm *);
 extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
 
 #endif /* _LINUX_BINFMTS_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 09f553c59813..d79678c188ad 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2079,6 +2079,7 @@ extern struct file * dentry_open(const struct path *, int, const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
 
 extern struct filename *getname(const char __user *);
+extern struct filename *getname_kernel(const char *);
 
 enum {
 	FILE_CREATED = 1,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 68a0e84463a0..a781dec1cd0b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -128,6 +128,7 @@ struct bio_list;
 struct fs_struct;
 struct perf_event_context;
 struct blk_plug;
+struct filename;
 
 /*
  * List of flags we want to share for kernel threads,
@@ -2311,7 +2312,7 @@ extern void do_group_exit(int);
 extern int allow_signal(int);
 extern int disallow_signal(int);
 
-extern int do_execve(const char *,
+extern int do_execve(struct filename *,
 		     const char __user * const __user *,
 		     const char __user * const __user *);
 extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 67e1bbf83695..c1e611e29cb3 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -287,13 +287,13 @@ TRACE_EVENT(sched_process_exec,
 	TP_ARGS(p, old_pid, bprm),
 
 	TP_STRUCT__entry(
-		__string(	filename,	bprm->filename	)
+		__string(	filename,	bprm->filename->name	)
 		__field(	pid_t,		pid		)
 		__field(	pid_t,		old_pid		)
 	),
 
 	TP_fast_assign(
-		__assign_str(filename, bprm->filename);
+		__assign_str(filename, bprm->filename->name);
 		__entry->pid		= p->pid;
 		__entry->old_pid	= old_pid;
 	),
diff --git a/init/main.c b/init/main.c
index 2fd9cef70ee8..eb03090cdced 100644
--- a/init/main.c
+++ b/init/main.c
@@ -812,7 +812,7 @@ void __init load_default_modules(void)
 static int run_init_process(const char *init_filename)
 {
 	argv_init[0] = init_filename;
-	return do_execve(init_filename,
+	return do_execve(getname_kernel(init_filename),
 		(const char __user *const __user *)argv_init,
 		(const char __user *const __user *)envp_init);
 }
diff --git a/kernel/kmod.c b/kernel/kmod.c
index b086006c59e7..6b375af4958d 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -239,7 +239,7 @@ static int ____call_usermodehelper(void *data)
 
 	commit_creds(new);
 
-	retval = do_execve(sub_info->path,
+	retval = do_execve(getname_kernel(sub_info->path),
 			   (const char __user *const __user *)sub_info->argv,
 			   (const char __user *const __user *)sub_info->envp);
 	if (!retval)
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 452567d3a08e..0358166d0652 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -372,7 +372,7 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 		if (unconfined(profile) ||
 		    (profile->flags & PFLAG_IX_ON_NAME_ERROR))
 			error = 0;
-		name = bprm->filename;
+		name = bprm->filename->name;
 		goto audit;
 	}
 
diff --git a/security/commoncap.c b/security/commoncap.c
index b9d613e0ef14..43b39e520697 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -449,7 +449,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	if (rc < 0) {
 		if (rc == -EINVAL)
 			printk(KERN_NOTICE "%s: get_vfs_caps_from_disk returned %d for %s\n",
-				__func__, rc, bprm->filename);
+				__func__, rc, bprm->filename->name);
 		else if (rc == -ENODATA)
 			rc = 0;
 		goto out;
@@ -458,7 +458,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	rc = bprm_caps_from_vfs_caps(&vcaps, bprm, effective, has_cap);
 	if (rc == -EINVAL)
 		printk(KERN_NOTICE "%s: cap_from_disk returned %d for %s\n",
-		       __func__, rc, bprm->filename);
+		       __func__, rc, bprm->filename->name);
 
 out:
 	dput(dentry);
@@ -498,7 +498,7 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 		 * for a root user just to cause least surprise to an admin.
 		 */
 		if (has_cap && !uid_eq(new->uid, root_uid) && uid_eq(new->euid, root_uid)) {
-			warn_setuid_and_fcaps_mixed(bprm->filename);
+			warn_setuid_and_fcaps_mixed(bprm->filename->name);
 			goto skip;
 		}
 		/*
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 149ee1119f87..330b706a570e 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -278,8 +278,8 @@ int ima_file_mmap(struct file *file, unsigned long prot)
 int ima_bprm_check(struct linux_binprm *bprm)
 {
 	return process_measurement(bprm->file,
-				   (strcmp(bprm->filename, bprm->interp) == 0) ?
-				   bprm->filename : bprm->interp,
+				   (strcmp(bprm->filename->name, bprm->interp) == 0) ?
+				   bprm->filename->name : bprm->interp,
 				   MAY_EXEC, BPRM_CHECK);
 }
 
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 38651454ed08..eb3230cbf158 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -677,7 +677,7 @@ int tomoyo_find_next_domain(struct linux_binprm *bprm)
 {
 	struct tomoyo_domain_info *old_domain = tomoyo_domain();
 	struct tomoyo_domain_info *domain = NULL;
-	const char *original_name = bprm->filename;
+	const char *original_name = bprm->filename->name;
 	int retval = -ENOMEM;
 	bool reject_on_transition_failure = false;
 	const struct tomoyo_path_info *candidate;
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index f0b756e27fed..96757c3b1bd8 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -90,7 +90,7 @@ static int tomoyo_bprm_set_creds(struct linux_binprm *bprm)
 	 * for the first time.
 	 */
 	if (!tomoyo_policy_loaded)
-		tomoyo_load_policy(bprm->filename);
+		tomoyo_load_policy(bprm->filename->name);
 #endif
 	/*
 	 * Release reference to "struct tomoyo_domain_info" stored inside

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-05  0:57       ` Linus Torvalds
@ 2014-02-05  1:10         ` Al Viro
  2014-02-05  3:37           ` Linus Torvalds
  2014-02-05  2:31         ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-02-05  1:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, LKML, Andrew Morton, Oleg Nesterov, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

On Tue, Feb 04, 2014 at 04:57:31PM -0800, Linus Torvalds wrote:
> On Tue, Feb 4, 2014 at 3:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > New version that moves all the ugliness into a static inline helper
> > function.
> 
> Ok, that's better, but I really think we should just use "getname()"
> and "putname()".

Umm...  Interactions with aushit might be interesting.  It hooks into
getname() and putname(); I'm not up to doing analysis right now (13 hours
of nearly non-stop {R,Grep}TFS today already and I'm ears-deep in
looking through locking implications of Miklos' stuff at the moment ;-/),
but it's worth looking into; there may be dragons.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-05  0:57       ` Linus Torvalds
  2014-02-05  1:10         ` Al Viro
@ 2014-02-05  2:31         ` Steven Rostedt
  2014-02-05  2:51           ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-02-05  2:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Andrew Morton, Oleg Nesterov, Al Viro, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

On Tue, 4 Feb 2014 16:57:31 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:


> How does this look?

Well, it is lacking a bit of the "ugly factor" but other than that,
I ran it through some minor tests (basically just logged in and enabled
the sched_process_exec tracepoint and ran a few programs and looked at
what would cause the previous bug). It passed.

I can run it through some more vigorous tests, but I think Al still
needs to really take a look at it.

Tested-by: Steven Rostedt <rostedt@goodmis.org>

(probably should keep)

Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-05  2:31         ` Steven Rostedt
@ 2014-02-05  2:51           ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2014-02-05  2:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, LKML, Andrew Morton, Oleg Nesterov, Al Viro,
	David Smith, Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

On Tue, 4 Feb 2014 21:31:09 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
 
> (probably should keep)
> 
> Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>

I should probably state that "IZh" confirmed on IRC that this is indeed
the person that reported the issue in the first place.

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-05  1:10         ` Al Viro
@ 2014-02-05  3:37           ` Linus Torvalds
  2014-02-05 13:52             ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2014-02-05  3:37 UTC (permalink / raw)
  To: Al Viro, Eric Paris
  Cc: Steven Rostedt, LKML, Andrew Morton, Oleg Nesterov, David Smith,
	Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 1618 bytes --]

On Tue, Feb 4, 2014 at 5:10 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  Interactions with aushit might be interesting.

Freudian slip or intentional? :-)

> It hooks into getname() and putname(); I'm not up to doing analysis
> right now [...]

Right you are. I was actually aware of that, but grepping for things
it all looked fine. But I got confused by all the insane audit
wrappers, and you're right, it needs some massaging for audit
handling.

And that audit code really is aushit. I think I found a bug in it
while just scanning it: if audit_alloc_name() fails, the filename will
never be added to the audit lists, and name_count will never be
incremented. But then when we call audit_putname it won't actually put
the name, so it all just leaks - and if you have AUDIT_DEBUG enabled
you'd eventually see an error.

I wonder if we could get rid of some of that crap, and make the audit
code use dentry_path() instead of trying to save off pathnames like
that. But I don't know what the audit code actually *uses* the
pathnames for, so what do I know.

Eric? Can you please explain?

Also, here's a slightly updated patch. The change is that:
 - getname_kernel() will now clear 'filename->aname'
 - cleared 'aname' for regular getname too before calling
audit_getname(), so that if that one fails, it will be NULL.
 - audit_putname() will consider a NULL aname to be the same as not
being in audit context, and just do a final_putname() on it.

That should fix the audit filename leak too, afaik.

Eric, please take a look. As well as explain the audit name thing if possible.

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 17553 bytes --]

 arch/parisc/hpux/fs.c             |  3 +-
 arch/tile/mm/elf.c                |  2 +-
 fs/binfmt_em86.c                  |  2 +-
 fs/binfmt_flat.c                  | 15 ++++++----
 fs/exec.c                         | 58 ++++++++++++++++++++-------------------
 fs/namei.c                        | 30 ++++++++++++++++++++
 include/linux/binfmts.h           |  3 +-
 include/linux/fs.h                |  1 +
 include/linux/sched.h             |  3 +-
 include/trace/events/sched.h      |  4 +--
 init/main.c                       |  2 +-
 kernel/auditsc.c                  |  2 +-
 kernel/kmod.c                     |  2 +-
 security/apparmor/domain.c        |  2 +-
 security/commoncap.c              |  6 ++--
 security/integrity/ima/ima_main.c |  4 +--
 security/tomoyo/domain.c          |  2 +-
 security/tomoyo/tomoyo.c          |  2 +-
 18 files changed, 90 insertions(+), 53 deletions(-)

diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index 88d0962de65a..23ebcd678da3 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -41,11 +41,10 @@ int hpux_execve(struct pt_regs *regs)
 	if (IS_ERR(filename))
 		goto out;
 
-	error = do_execve(filename->name,
+	error = do_execve(filename,
 			  (const char __user *const __user *) regs->gr[25],
 			  (const char __user *const __user *) regs->gr[24]);
 
-	putname(filename);
 
 out:
 	return error;
diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 23f044e8a7ab..3ad54a7a927c 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -117,7 +117,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 	 * whatever was passed as the linux_binprm filename.
 	 */
 	if (!notify_exec(mm))
-		sim_notify_exec(bprm->filename);
+		sim_notify_exec(bprm->filename->name);
 
 	retval = setup_vdso_pages();
 
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index f37b08cea1f7..077a76fb6347 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -62,7 +62,7 @@ static int load_em86(struct linux_binprm *bprm)
 	 * user environment and arguments are stored.
 	 */
 	remove_arg_zero(bprm);
-	retval = copy_strings_kernel(1, &bprm->filename, bprm);
+	retval = copy_strings_kernel(1, &bprm->filename->name, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
 	if (i_arg) {
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index d50bbe59da1e..f1377a5d260b 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -469,7 +469,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 	}
 
 	if (flags & FLAT_FLAG_KTRACE)
-		printk("BINFMT_FLAT: Loading file: %s\n", bprm->filename);
+		printk("BINFMT_FLAT: Loading file: %s\n", bprm->filename->name);
 
 	if (rev != FLAT_VERSION && rev != OLD_FLAT_VERSION) {
 		printk("BINFMT_FLAT: bad flat file version 0x%x (supported "
@@ -686,7 +686,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 
 	if (flags & FLAT_FLAG_KTRACE)
 		printk("%s %s: TEXT=%x-%x DATA=%x-%x BSS=%x-%x\n",
-			id ? "Lib" : "Load", bprm->filename,
+			id ? "Lib" : "Load", bprm->filename->name,
 			(int) start_code, (int) end_code,
 			(int) datapos,
 			(int) (datapos + data_len),
@@ -818,11 +818,14 @@ static int load_flat_shared_library(int id, struct lib_info *libs)
 	sprintf(buf, "/lib/lib%d.so", id);
 
 	/* Open the file up */
-	bprm.filename = buf;
-	bprm.file = open_exec(bprm.filename);
+	bprm.filename = getname_kernel(buf);
+	if (IS_ERR(bprm.filename))
+		return res;
+
+	bprm.file = open_exec(buf);
 	res = PTR_ERR(bprm.file);
 	if (IS_ERR(bprm.file))
-		return res;
+		goto out_putname;
 
 	bprm.cred = prepare_exec_creds();
 	res = -ENOMEM;
@@ -845,6 +848,8 @@ static int load_flat_shared_library(int id, struct lib_info *libs)
 out:
 	allow_write_access(bprm.file);
 	fput(bprm.file);
+out_putname:
+	putname(bprm.filename);
 
 	return(res);
 }
diff --git a/fs/exec.c b/fs/exec.c
index e1529b4c79b1..538be413b26b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -748,11 +748,10 @@ EXPORT_SYMBOL(setup_arg_pages);
 
 #endif /* CONFIG_MMU */
 
-struct file *open_exec(const char *name)
+static struct file *do_open_exec(struct filename *name)
 {
 	struct file *file;
 	int err;
-	struct filename tmp = { .name = name };
 	static const struct open_flags open_exec_flags = {
 		.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
 		.acc_mode = MAY_EXEC | MAY_OPEN,
@@ -760,7 +759,7 @@ struct file *open_exec(const char *name)
 		.lookup_flags = LOOKUP_FOLLOW,
 	};
 
-	file = do_filp_open(AT_FDCWD, &tmp, &open_exec_flags);
+	file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
 	if (IS_ERR(file))
 		goto out;
 
@@ -784,6 +783,12 @@ exit:
 	fput(file);
 	return ERR_PTR(err);
 }
+
+struct file *open_exec(const char *name)
+{
+	struct filename tmp = { .name = name };
+	return do_open_exec(&tmp);
+}
 EXPORT_SYMBOL(open_exec);
 
 int kernel_read(struct file *file, loff_t offset,
@@ -1074,7 +1079,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 
 	set_mm_exe_file(bprm->mm, bprm->file);
 
-	filename_to_taskname(bprm->tcomm, bprm->filename, sizeof(bprm->tcomm));
+	filename_to_taskname(bprm->tcomm, bprm->filename->name, sizeof(bprm->tcomm));
 	/*
 	 * Release all of the old mmap stuff
 	 */
@@ -1162,7 +1167,7 @@ int prepare_bprm_creds(struct linux_binprm *bprm)
 	return -ENOMEM;
 }
 
-void free_bprm(struct linux_binprm *bprm)
+static void free_bprm(struct linux_binprm *bprm)
 {
 	free_arg_pages(bprm);
 	if (bprm->cred) {
@@ -1174,15 +1179,17 @@ void free_bprm(struct linux_binprm *bprm)
 		fput(bprm->file);
 	}
 	/* If a binfmt changed the interp, free it. */
-	if (bprm->interp != bprm->filename)
+	if (bprm->interp != bprm->filename->name)
 		kfree(bprm->interp);
+	if (bprm->filename)
+		putname(bprm->filename);
 	kfree(bprm);
 }
 
 int bprm_change_interp(char *interp, struct linux_binprm *bprm)
 {
 	/* If a binfmt changed the interp, free it first. */
-	if (bprm->interp != bprm->filename)
+	if (bprm->interp != bprm->filename->name)
 		kfree(bprm->interp);
 	bprm->interp = kstrdup(interp, GFP_KERNEL);
 	if (!bprm->interp)
@@ -1432,7 +1439,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int do_execve_common(const char *filename,
+static int do_execve_common(struct filename *filename,
 				struct user_arg_ptr argv,
 				struct user_arg_ptr envp)
 {
@@ -1441,6 +1448,9 @@ static int do_execve_common(const char *filename,
 	struct files_struct *displaced;
 	int retval;
 
+	if (IS_ERR(filename))
+		return PTR_ERR(filename);
+
 	/*
 	 * We move the actual failure in case of RLIMIT_NPROC excess from
 	 * set*uid() to execve() because too many poorly written programs
@@ -1466,6 +1476,9 @@ static int do_execve_common(const char *filename,
 	if (!bprm)
 		goto out_files;
 
+	bprm->filename = filename;
+	bprm->interp = filename->name;
+
 	retval = prepare_bprm_creds(bprm);
 	if (retval)
 		goto out_free;
@@ -1473,7 +1486,7 @@ static int do_execve_common(const char *filename,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
 
-	file = open_exec(filename);
+	file = do_open_exec(filename);
 	retval = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out_unmark;
@@ -1481,8 +1494,6 @@ static int do_execve_common(const char *filename,
 	sched_exec();
 
 	bprm->file = file;
-	bprm->filename = filename;
-	bprm->interp = filename;
 
 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1500,7 +1511,7 @@ static int do_execve_common(const char *filename,
 	if (retval < 0)
 		goto out;
 
-	retval = copy_strings_kernel(1, &bprm->filename, bprm);
+	retval = copy_strings_kernel(1, &bprm->filename->name, bprm);
 	if (retval < 0)
 		goto out;
 
@@ -1539,15 +1550,18 @@ out_unmark:
 
 out_free:
 	free_bprm(bprm);
+	filename = NULL;
 
 out_files:
 	if (displaced)
 		reset_files_struct(displaced);
 out_ret:
+	if (filename)
+		putname(filename);
 	return retval;
 }
 
-int do_execve(const char *filename,
+int do_execve(struct filename *filename,
 	const char __user *const __user *__argv,
 	const char __user *const __user *__envp)
 {
@@ -1557,7 +1571,7 @@ int do_execve(const char *filename,
 }
 
 #ifdef CONFIG_COMPAT
-static int compat_do_execve(const char *filename,
+static int compat_do_execve(struct filename *filename,
 	const compat_uptr_t __user *__argv,
 	const compat_uptr_t __user *__envp)
 {
@@ -1607,25 +1621,13 @@ SYSCALL_DEFINE3(execve,
 		const char __user *const __user *, argv,
 		const char __user *const __user *, envp)
 {
-	struct filename *path = getname(filename);
-	int error = PTR_ERR(path);
-	if (!IS_ERR(path)) {
-		error = do_execve(path->name, argv, envp);
-		putname(path);
-	}
-	return error;
+	return do_execve(getname(filename), argv, envp);
 }
 #ifdef CONFIG_COMPAT
 asmlinkage long compat_sys_execve(const char __user * filename,
 	const compat_uptr_t __user * argv,
 	const compat_uptr_t __user * envp)
 {
-	struct filename *path = getname(filename);
-	int error = PTR_ERR(path);
-	if (!IS_ERR(path)) {
-		error = compat_do_execve(path->name, argv, envp);
-		putname(path);
-	}
-	return error;
+	return compat_do_execve(getname(filename), argv, envp);
 }
 #endif
diff --git a/fs/namei.c b/fs/namei.c
index d580df2e6804..385f7817bfcc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -196,6 +196,7 @@ recopy:
 		goto error;
 
 	result->uptr = filename;
+	result->aname = NULL;
 	audit_getname(result);
 	return result;
 
@@ -210,6 +211,35 @@ getname(const char __user * filename)
 	return getname_flags(filename, 0, NULL);
 }
 
+/*
+ * The "getname_kernel()" interface doesn't do pathnames longer
+ * than EMBEDDED_NAME_MAX. Deal with it - you're a kernel user.
+ */
+struct filename *
+getname_kernel(const char * filename)
+{
+	struct filename *result;
+	char *kname;
+	int len;
+
+	len = strlen(filename);
+	if (len >= EMBEDDED_NAME_MAX)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	result = __getname();
+	if (unlikely(!result))
+		return ERR_PTR(-ENOMEM);
+
+	kname = (char *)result + sizeof(*result);
+	result->name = kname;
+	result->uptr = NULL;
+	result->aname = NULL;
+	result->separate = false;
+
+	strlcpy(kname, filename, EMBEDDED_NAME_MAX);
+	return result;
+}
+
 #ifdef CONFIG_AUDITSYSCALL
 void putname(struct filename *name)
 {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index fd8bf3219ef7..4c4a85accbb1 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -37,7 +37,7 @@ struct linux_binprm {
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
 	unsigned int per_clear;	/* bits to clear in current->personality */
 	int argc, envc;
-	const char * filename;	/* Name of binary as seen by procps */
+	struct filename *filename;	/* Name of binary as seen by procps */
 	const char * interp;	/* Name of the binary really executed. Most
 				   of the time same as filename, but could be
 				   different for binfmt_{misc,script} */
@@ -115,7 +115,6 @@ extern int copy_strings_kernel(int argc, const char *const *argv,
 extern int prepare_bprm_creds(struct linux_binprm *bprm);
 extern void install_exec_creds(struct linux_binprm *bprm);
 extern void set_binfmt(struct linux_binfmt *new);
-extern void free_bprm(struct linux_binprm *);
 extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
 
 #endif /* _LINUX_BINFMTS_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 09f553c59813..d79678c188ad 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2079,6 +2079,7 @@ extern struct file * dentry_open(const struct path *, int, const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
 
 extern struct filename *getname(const char __user *);
+extern struct filename *getname_kernel(const char *);
 
 enum {
 	FILE_CREATED = 1,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 68a0e84463a0..a781dec1cd0b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -128,6 +128,7 @@ struct bio_list;
 struct fs_struct;
 struct perf_event_context;
 struct blk_plug;
+struct filename;
 
 /*
  * List of flags we want to share for kernel threads,
@@ -2311,7 +2312,7 @@ extern void do_group_exit(int);
 extern int allow_signal(int);
 extern int disallow_signal(int);
 
-extern int do_execve(const char *,
+extern int do_execve(struct filename *,
 		     const char __user * const __user *,
 		     const char __user * const __user *);
 extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 67e1bbf83695..c1e611e29cb3 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -287,13 +287,13 @@ TRACE_EVENT(sched_process_exec,
 	TP_ARGS(p, old_pid, bprm),
 
 	TP_STRUCT__entry(
-		__string(	filename,	bprm->filename	)
+		__string(	filename,	bprm->filename->name	)
 		__field(	pid_t,		pid		)
 		__field(	pid_t,		old_pid		)
 	),
 
 	TP_fast_assign(
-		__assign_str(filename, bprm->filename);
+		__assign_str(filename, bprm->filename->name);
 		__entry->pid		= p->pid;
 		__entry->old_pid	= old_pid;
 	),
diff --git a/init/main.c b/init/main.c
index 2fd9cef70ee8..eb03090cdced 100644
--- a/init/main.c
+++ b/init/main.c
@@ -812,7 +812,7 @@ void __init load_default_modules(void)
 static int run_init_process(const char *init_filename)
 {
 	argv_init[0] = init_filename;
-	return do_execve(init_filename,
+	return do_execve(getname_kernel(init_filename),
 		(const char __user *const __user *)argv_init,
 		(const char __user *const __user *)envp_init);
 }
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 10176cd5956a..7aef2f4b6c64 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1719,7 +1719,7 @@ void audit_putname(struct filename *name)
 	struct audit_context *context = current->audit_context;
 
 	BUG_ON(!context);
-	if (!context->in_syscall) {
+	if (!name->aname || !context->in_syscall) {
 #if AUDIT_DEBUG == 2
 		printk(KERN_ERR "%s:%d(:%d): final_putname(%p)\n",
 		       __FILE__, __LINE__, context->serial, name);
diff --git a/kernel/kmod.c b/kernel/kmod.c
index b086006c59e7..6b375af4958d 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -239,7 +239,7 @@ static int ____call_usermodehelper(void *data)
 
 	commit_creds(new);
 
-	retval = do_execve(sub_info->path,
+	retval = do_execve(getname_kernel(sub_info->path),
 			   (const char __user *const __user *)sub_info->argv,
 			   (const char __user *const __user *)sub_info->envp);
 	if (!retval)
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 452567d3a08e..0358166d0652 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -372,7 +372,7 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 		if (unconfined(profile) ||
 		    (profile->flags & PFLAG_IX_ON_NAME_ERROR))
 			error = 0;
-		name = bprm->filename;
+		name = bprm->filename->name;
 		goto audit;
 	}
 
diff --git a/security/commoncap.c b/security/commoncap.c
index b9d613e0ef14..43b39e520697 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -449,7 +449,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	if (rc < 0) {
 		if (rc == -EINVAL)
 			printk(KERN_NOTICE "%s: get_vfs_caps_from_disk returned %d for %s\n",
-				__func__, rc, bprm->filename);
+				__func__, rc, bprm->filename->name);
 		else if (rc == -ENODATA)
 			rc = 0;
 		goto out;
@@ -458,7 +458,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	rc = bprm_caps_from_vfs_caps(&vcaps, bprm, effective, has_cap);
 	if (rc == -EINVAL)
 		printk(KERN_NOTICE "%s: cap_from_disk returned %d for %s\n",
-		       __func__, rc, bprm->filename);
+		       __func__, rc, bprm->filename->name);
 
 out:
 	dput(dentry);
@@ -498,7 +498,7 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 		 * for a root user just to cause least surprise to an admin.
 		 */
 		if (has_cap && !uid_eq(new->uid, root_uid) && uid_eq(new->euid, root_uid)) {
-			warn_setuid_and_fcaps_mixed(bprm->filename);
+			warn_setuid_and_fcaps_mixed(bprm->filename->name);
 			goto skip;
 		}
 		/*
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 149ee1119f87..330b706a570e 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -278,8 +278,8 @@ int ima_file_mmap(struct file *file, unsigned long prot)
 int ima_bprm_check(struct linux_binprm *bprm)
 {
 	return process_measurement(bprm->file,
-				   (strcmp(bprm->filename, bprm->interp) == 0) ?
-				   bprm->filename : bprm->interp,
+				   (strcmp(bprm->filename->name, bprm->interp) == 0) ?
+				   bprm->filename->name : bprm->interp,
 				   MAY_EXEC, BPRM_CHECK);
 }
 
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 38651454ed08..eb3230cbf158 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -677,7 +677,7 @@ int tomoyo_find_next_domain(struct linux_binprm *bprm)
 {
 	struct tomoyo_domain_info *old_domain = tomoyo_domain();
 	struct tomoyo_domain_info *domain = NULL;
-	const char *original_name = bprm->filename;
+	const char *original_name = bprm->filename->name;
 	int retval = -ENOMEM;
 	bool reject_on_transition_failure = false;
 	const struct tomoyo_path_info *candidate;
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index f0b756e27fed..96757c3b1bd8 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -90,7 +90,7 @@ static int tomoyo_bprm_set_creds(struct linux_binprm *bprm)
 	 * for the first time.
 	 */
 	if (!tomoyo_policy_loaded)
-		tomoyo_load_policy(bprm->filename);
+		tomoyo_load_policy(bprm->filename->name);
 #endif
 	/*
 	 * Release reference to "struct tomoyo_domain_info" stored inside

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-05  3:37           ` Linus Torvalds
@ 2014-02-05 13:52             ` Oleg Nesterov
  2014-02-05 16:52               ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2014-02-05 13:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Eric Paris, Steven Rostedt, LKML, Andrew Morton,
	David Smith, Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

On 02/04, Linus Torvalds wrote:
>
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -239,7 +239,7 @@ static int ____call_usermodehelper(void *data)
>
>  	commit_creds(new);
>
> -	retval = do_execve(sub_info->path,
> +	retval = do_execve(getname_kernel(sub_info->path),
>  			   (const char __user *const __user *)sub_info->argv,
>  			   (const char __user *const __user *)sub_info->envp);

Great, this naturally duplicates filename unconditionally, and we can
kill bprm->tcomm[].

But,

> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -37,7 +37,7 @@ struct linux_binprm {
>  	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
>  	unsigned int per_clear;	/* bits to clear in current->personality */
>  	int argc, envc;
> -	const char * filename;	/* Name of binary as seen by procps */
> +	struct filename *filename;	/* Name of binary as seen by procps */

Do we really need this change? If not (afaics), the patch can be
much simpler, see below...



> -void free_bprm(struct linux_binprm *bprm)
> +static void free_bprm(struct linux_binprm *bprm)
>  {
>  	free_arg_pages(bprm);
>  	if (bprm->cred) {
> @@ -1174,15 +1179,17 @@ void free_bprm(struct linux_binprm *bprm)
>  		fput(bprm->file);
>  	}
>  	/* If a binfmt changed the interp, free it. */
> -	if (bprm->interp != bprm->filename)
> +	if (bprm->interp != bprm->filename->name)
>  		kfree(bprm->interp);
> +	if (bprm->filename)
> +		putname(bprm->filename);

Even if we actually need to turn bprm->filename into "struct filename"
this free_bprm()->putname() only complicates the code, unless I missed
something. The caller, do_execve(), can do putname() unconditionally and
avoid if/NULL games.

IOW, doesn't the change below (on top of your patch) obviously makes
sense or I am totally confused?

Oleg.

--- x/fs/exec.c
+++ x/fs/exec.c
@@ -1183,8 +1183,6 @@ static void free_bprm(struct linux_binpr
 	/* If a binfmt changed the interp, free it. */
 	if (bprm->interp != bprm->filename->name)
 		kfree(bprm->interp);
-	if (bprm->filename)
-		putname(bprm->filename);
 	kfree(bprm);
 }
 
@@ -1478,9 +1476,6 @@ static int do_execve_common(struct filen
 	if (!bprm)
 		goto out_files;
 
-	bprm->filename = filename;
-	bprm->interp = filename->name;
-
 	retval = prepare_bprm_creds(bprm);
 	if (retval)
 		goto out_free;
@@ -1496,6 +1491,8 @@ static int do_execve_common(struct filen
 	sched_exec();
 
 	bprm->file = file;
+	bprm->filename = filename;
+	bprm->interp = filename->name;
 
 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1538,7 +1535,7 @@ static int do_execve_common(struct filen
 	free_bprm(bprm);
 	if (displaced)
 		put_files_struct(displaced);
-	return retval;
+	goto out_ret;
 
 out:
 	if (bprm->mm) {
@@ -1552,14 +1549,12 @@ out_unmark:
 
 out_free:
 	free_bprm(bprm);
-	filename = NULL;
 
 out_files:
 	if (displaced)
 		reset_files_struct(displaced);
 out_ret:
-	if (filename)
-		putname(filename);
+	putname(filename);
 	return retval;
 }
 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec
  2014-02-05 13:52             ` Oleg Nesterov
@ 2014-02-05 16:52               ` Linus Torvalds
  0 siblings, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2014-02-05 16:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Eric Paris, Steven Rostedt, LKML, Andrew Morton,
	David Smith, Peter Zijlstra, Igor Zhbanov, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

On Wed, Feb 5, 2014 at 5:52 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Do we really need this change? If not (afaics), the patch can be
> much simpler, see below...

Right you are.

I started out with free_bprm() being non-static, and thought I had to
handle other callers. Which is why I made the bprm->filename be the
"struct filename" so that brpm_free() could free it.

And then I only later noticed that free_bprm() was really only used by
fs/exec.c, and turned it static - and you're right, if we always just
free it in do_execve_common(), that simplifies the patch a lot.

Good call. New patch attached. More comments?

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 9080 bytes --]

 arch/parisc/hpux/fs.c   |  3 +--
 fs/exec.c               | 45 +++++++++++++++++++++------------------------
 fs/namei.c              | 30 ++++++++++++++++++++++++++++++
 include/linux/binfmts.h |  1 -
 include/linux/fs.h      |  1 +
 include/linux/sched.h   |  3 ++-
 init/main.c             |  2 +-
 kernel/auditsc.c        |  2 +-
 kernel/kmod.c           |  2 +-
 9 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index 88d0962de65a..23ebcd678da3 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -41,11 +41,10 @@ int hpux_execve(struct pt_regs *regs)
 	if (IS_ERR(filename))
 		goto out;
 
-	error = do_execve(filename->name,
+	error = do_execve(filename,
 			  (const char __user *const __user *) regs->gr[25],
 			  (const char __user *const __user *) regs->gr[24]);
 
-	putname(filename);
 
 out:
 	return error;
diff --git a/fs/exec.c b/fs/exec.c
index e1529b4c79b1..3d78fccdd723 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -748,11 +748,10 @@ EXPORT_SYMBOL(setup_arg_pages);
 
 #endif /* CONFIG_MMU */
 
-struct file *open_exec(const char *name)
+static struct file *do_open_exec(struct filename *name)
 {
 	struct file *file;
 	int err;
-	struct filename tmp = { .name = name };
 	static const struct open_flags open_exec_flags = {
 		.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
 		.acc_mode = MAY_EXEC | MAY_OPEN,
@@ -760,7 +759,7 @@ struct file *open_exec(const char *name)
 		.lookup_flags = LOOKUP_FOLLOW,
 	};
 
-	file = do_filp_open(AT_FDCWD, &tmp, &open_exec_flags);
+	file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
 	if (IS_ERR(file))
 		goto out;
 
@@ -784,6 +783,12 @@ exit:
 	fput(file);
 	return ERR_PTR(err);
 }
+
+struct file *open_exec(const char *name)
+{
+	struct filename tmp = { .name = name };
+	return do_open_exec(&tmp);
+}
 EXPORT_SYMBOL(open_exec);
 
 int kernel_read(struct file *file, loff_t offset,
@@ -1162,7 +1167,7 @@ int prepare_bprm_creds(struct linux_binprm *bprm)
 	return -ENOMEM;
 }
 
-void free_bprm(struct linux_binprm *bprm)
+static void free_bprm(struct linux_binprm *bprm)
 {
 	free_arg_pages(bprm);
 	if (bprm->cred) {
@@ -1432,7 +1437,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int do_execve_common(const char *filename,
+static int do_execve_common(struct filename *filename,
 				struct user_arg_ptr argv,
 				struct user_arg_ptr envp)
 {
@@ -1441,6 +1446,9 @@ static int do_execve_common(const char *filename,
 	struct files_struct *displaced;
 	int retval;
 
+	if (IS_ERR(filename))
+		return PTR_ERR(filename);
+
 	/*
 	 * We move the actual failure in case of RLIMIT_NPROC excess from
 	 * set*uid() to execve() because too many poorly written programs
@@ -1473,7 +1481,7 @@ static int do_execve_common(const char *filename,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
 
-	file = open_exec(filename);
+	file = do_open_exec(filename);
 	retval = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out_unmark;
@@ -1481,8 +1489,7 @@ static int do_execve_common(const char *filename,
 	sched_exec();
 
 	bprm->file = file;
-	bprm->filename = filename;
-	bprm->interp = filename;
+	bprm->filename = bprm->interp = filename->name;
 
 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1523,6 +1530,7 @@ static int do_execve_common(const char *filename,
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
+	putname(filename);
 	if (displaced)
 		put_files_struct(displaced);
 	return retval;
@@ -1544,10 +1552,11 @@ out_files:
 	if (displaced)
 		reset_files_struct(displaced);
 out_ret:
+	putname(filename);
 	return retval;
 }
 
-int do_execve(const char *filename,
+int do_execve(struct filename *filename,
 	const char __user *const __user *__argv,
 	const char __user *const __user *__envp)
 {
@@ -1557,7 +1566,7 @@ int do_execve(const char *filename,
 }
 
 #ifdef CONFIG_COMPAT
-static int compat_do_execve(const char *filename,
+static int compat_do_execve(struct filename *filename,
 	const compat_uptr_t __user *__argv,
 	const compat_uptr_t __user *__envp)
 {
@@ -1607,25 +1616,13 @@ SYSCALL_DEFINE3(execve,
 		const char __user *const __user *, argv,
 		const char __user *const __user *, envp)
 {
-	struct filename *path = getname(filename);
-	int error = PTR_ERR(path);
-	if (!IS_ERR(path)) {
-		error = do_execve(path->name, argv, envp);
-		putname(path);
-	}
-	return error;
+	return do_execve(getname(filename), argv, envp);
 }
 #ifdef CONFIG_COMPAT
 asmlinkage long compat_sys_execve(const char __user * filename,
 	const compat_uptr_t __user * argv,
 	const compat_uptr_t __user * envp)
 {
-	struct filename *path = getname(filename);
-	int error = PTR_ERR(path);
-	if (!IS_ERR(path)) {
-		error = compat_do_execve(path->name, argv, envp);
-		putname(path);
-	}
-	return error;
+	return compat_do_execve(getname(filename), argv, envp);
 }
 #endif
diff --git a/fs/namei.c b/fs/namei.c
index d580df2e6804..385f7817bfcc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -196,6 +196,7 @@ recopy:
 		goto error;
 
 	result->uptr = filename;
+	result->aname = NULL;
 	audit_getname(result);
 	return result;
 
@@ -210,6 +211,35 @@ getname(const char __user * filename)
 	return getname_flags(filename, 0, NULL);
 }
 
+/*
+ * The "getname_kernel()" interface doesn't do pathnames longer
+ * than EMBEDDED_NAME_MAX. Deal with it - you're a kernel user.
+ */
+struct filename *
+getname_kernel(const char * filename)
+{
+	struct filename *result;
+	char *kname;
+	int len;
+
+	len = strlen(filename);
+	if (len >= EMBEDDED_NAME_MAX)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	result = __getname();
+	if (unlikely(!result))
+		return ERR_PTR(-ENOMEM);
+
+	kname = (char *)result + sizeof(*result);
+	result->name = kname;
+	result->uptr = NULL;
+	result->aname = NULL;
+	result->separate = false;
+
+	strlcpy(kname, filename, EMBEDDED_NAME_MAX);
+	return result;
+}
+
 #ifdef CONFIG_AUDITSYSCALL
 void putname(struct filename *name)
 {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index fd8bf3219ef7..b4a745d7d9a9 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -115,7 +115,6 @@ extern int copy_strings_kernel(int argc, const char *const *argv,
 extern int prepare_bprm_creds(struct linux_binprm *bprm);
 extern void install_exec_creds(struct linux_binprm *bprm);
 extern void set_binfmt(struct linux_binfmt *new);
-extern void free_bprm(struct linux_binprm *);
 extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
 
 #endif /* _LINUX_BINFMTS_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 09f553c59813..d79678c188ad 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2079,6 +2079,7 @@ extern struct file * dentry_open(const struct path *, int, const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
 
 extern struct filename *getname(const char __user *);
+extern struct filename *getname_kernel(const char *);
 
 enum {
 	FILE_CREATED = 1,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 68a0e84463a0..a781dec1cd0b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -128,6 +128,7 @@ struct bio_list;
 struct fs_struct;
 struct perf_event_context;
 struct blk_plug;
+struct filename;
 
 /*
  * List of flags we want to share for kernel threads,
@@ -2311,7 +2312,7 @@ extern void do_group_exit(int);
 extern int allow_signal(int);
 extern int disallow_signal(int);
 
-extern int do_execve(const char *,
+extern int do_execve(struct filename *,
 		     const char __user * const __user *,
 		     const char __user * const __user *);
 extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
diff --git a/init/main.c b/init/main.c
index 2fd9cef70ee8..eb03090cdced 100644
--- a/init/main.c
+++ b/init/main.c
@@ -812,7 +812,7 @@ void __init load_default_modules(void)
 static int run_init_process(const char *init_filename)
 {
 	argv_init[0] = init_filename;
-	return do_execve(init_filename,
+	return do_execve(getname_kernel(init_filename),
 		(const char __user *const __user *)argv_init,
 		(const char __user *const __user *)envp_init);
 }
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 10176cd5956a..7aef2f4b6c64 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1719,7 +1719,7 @@ void audit_putname(struct filename *name)
 	struct audit_context *context = current->audit_context;
 
 	BUG_ON(!context);
-	if (!context->in_syscall) {
+	if (!name->aname || !context->in_syscall) {
 #if AUDIT_DEBUG == 2
 		printk(KERN_ERR "%s:%d(:%d): final_putname(%p)\n",
 		       __FILE__, __LINE__, context->serial, name);
diff --git a/kernel/kmod.c b/kernel/kmod.c
index b086006c59e7..6b375af4958d 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -239,7 +239,7 @@ static int ____call_usermodehelper(void *data)
 
 	commit_creds(new);
 
-	retval = do_execve(sub_info->path,
+	retval = do_execve(getname_kernel(sub_info->path),
 			   (const char __user *const __user *)sub_info->argv,
 			   (const char __user *const __user *)sub_info->envp);
 	if (!retval)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-02-05 16:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-04 17:05 [RFC][PATCH] exec: Fix use after free of tracepoint trace_sched_process_exec Steven Rostedt
2014-02-04 19:00 ` Oleg Nesterov
2014-02-04 20:10   ` Steven Rostedt
2014-02-04 20:18 ` Linus Torvalds
2014-02-04 20:31   ` Steven Rostedt
2014-02-04 23:28   ` Steven Rostedt
2014-02-04 23:42     ` Steven Rostedt
2014-02-05  0:57       ` Linus Torvalds
2014-02-05  1:10         ` Al Viro
2014-02-05  3:37           ` Linus Torvalds
2014-02-05 13:52             ` Oleg Nesterov
2014-02-05 16:52               ` Linus Torvalds
2014-02-05  2:31         ` Steven Rostedt
2014-02-05  2:51           ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox