Re: possible deadlock in proc_pid_syscall (2)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: ebiederm@xmission.com (Eric W. Biederman)
To: peterz@infradead.org
Cc: syzbot <syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com>,
	adobriyan@gmail.com, akpm@linux-foundation.org, avagin@gmail.com,
	christian@brauner.io, gladkov.alexey@gmail.com,
	keescook@chromium.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	walken@google.com, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	jannh@google.com
Subject: Re: possible deadlock in proc_pid_syscall (2)
Date: Sun, 30 Aug 2020 07:31:39 -0500	[thread overview]
Message-ID: <87v9h0gvro.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <20200828123720.GZ1362448@hirez.programming.kicks-ass.net> (peterz's message of "Fri, 28 Aug 2020 14:37:20 +0200")

peterz@infradead.org writes:

> On Fri, Aug 28, 2020 at 07:01:17AM -0500, Eric W. Biederman wrote:
>> This feels like an issue where perf can just do too much under
>> exec_update_mutex.  In particular calling kern_path from
>> create_local_trace_uprobe.  Calling into the vfs at the very least
>> makes it impossible to know exactly which locks will be taken.
>> 
>> Thoughts?
>
>> > -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
>> >        down_read+0x96/0x420 kernel/locking/rwsem.c:1492
>> >        inode_lock_shared include/linux/fs.h:789 [inline]
>> >        lookup_slow fs/namei.c:1560 [inline]
>> >        walk_component+0x409/0x6a0 fs/namei.c:1860
>> >        lookup_last fs/namei.c:2309 [inline]
>> >        path_lookupat+0x1ba/0x830 fs/namei.c:2333
>> >        filename_lookup+0x19f/0x560 fs/namei.c:2366
>> >        create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574
>> >        perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323
>> >        perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580
>> >        perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899
>> >        perf_init_event kernel/events/core.c:10951 [inline]
>> >        perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229
>> >        perf_event_alloc kernel/events/core.c:11608 [inline]
>> >        __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724
>> >        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>> >        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Right, so we hold the mutex fairly long there, supposedly to ensure
> privs don't change out from under us.
>
> We do the permission checks early -- no point in doing anything else if
> we're not allowed, but we then have to hold this mutex until the event
> is actually installed according to that comment.
>
> /me goes look at git history:
>
>   6914303824bb5 - changed cred_guard_mutex into exec_update_mutex
>   79c9ce57eb2d5 - introduces cred_guard_mutex
>
> So that latter commit explains the race we're guarding against. Without
> this we can install the event after execve() which might have changed
> privs on us.
>
> I'm open to suggestions on how to do this differently.
>
> Could we check privs twice instead?
>
> Something like the completely untested below..

That might work.

I am thinking that for cases where we want to do significant work it
might be better to ask the process to pause at someplace safe (probably
get_signal) and then do all of the work when we know nothing is changing
in the process.

I don't really like the idea of checking and then checking again.  We
might have to do it but it feels like the model is wrong somewhere.

Given that this is tricky to hit in practice, and given that I am
already working the general problem of how to sort out the locking I am
going to work this with the rest of the thorny issues of in exec.  This
feels like a case where the proper solution is that we simply need
something better than a mutex.


I had not realized before this how much setting up tracing in
perf_even_open looks like attaching a debugger in ptrace_attach.


I need to look at this some more but I suspect exec should be
treating a tracer like exec currently treats a ptracer for
purposes of permission checks.  So I think we may have more issues
than simply the possibility of deadlock on exec_update_mutex.

Eric


> ---
> diff --git a/include/linux/freelist.h b/include/linux/freelist.h
> new file mode 100644
> index 000000000000..e69de29bb2d1
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5bfe8e3c6e44..14e6c9bbfcda 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11701,21 +11701,9 @@ SYSCALL_DEFINE5(perf_event_open,
>  	}
>  
>  	if (task) {
> -		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
> -		if (err)
> -			goto err_task;
> -
> -		/*
> -		 * Preserve ptrace permission check for backwards compatibility.
> -		 *
> -		 * We must hold exec_update_mutex across this and any potential
> -		 * perf_install_in_context() call for this new event to
> -		 * serialize against exec() altering our credentials (and the
> -		 * perf_event_exit_task() that could imply).
> -		 */
>  		err = -EACCES;
>  		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> -			goto err_cred;
> +			goto err_task;
>  	}
>  
>  	if (flags & PERF_FLAG_PID_CGROUP)
> @@ -11844,6 +11832,24 @@ SYSCALL_DEFINE5(perf_event_open,
>  		goto err_context;
>  	}
>  
> +	if (task) {
> +		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
> +		if (err)
> +			goto err_file;
> +
> +		/*
> +		 * Preserve ptrace permission check for backwards compatibility.
> +		 *
> +		 * We must hold exec_update_mutex across this and any potential
> +		 * perf_install_in_context() call for this new event to
> +		 * serialize against exec() altering our credentials (and the
> +		 * perf_event_exit_task() that could imply).
> +		 */
> +		err = -EACCES;
> +		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> +			goto err_cred;
> +	}
> +
>  	if (move_group) {
>  		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
>  
> @@ -12019,7 +12025,10 @@ SYSCALL_DEFINE5(perf_event_open,
>  	if (move_group)
>  		perf_event_ctx_unlock(group_leader, gctx);
>  	mutex_unlock(&ctx->mutex);
> -/* err_file: */
> +err_cred:
> +	if (task)
> +		mutex_unlock(&task->signal->exec_update_mutex);
> +err_file:
>  	fput(event_file);
>  err_context:
>  	perf_unpin_context(ctx);
> @@ -12031,9 +12040,6 @@ SYSCALL_DEFINE5(perf_event_open,
>  	 */
>  	if (!event_file)
>  		free_event(event);
> -err_cred:
> -	if (task)
> -		mutex_unlock(&task->signal->exec_update_mutex);
>  err_task:
>  	if (task)
>  		put_task_struct(task);

next prev parent reply	other threads:[~2020-08-30 12:32 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-28  4:57 possible deadlock in proc_pid_syscall (2) syzbot
2020-08-28 12:01 ` Eric W. Biederman
2020-08-28 12:37   ` peterz
2020-08-30 12:31     ` Eric W. Biederman [this message]
2020-08-31  7:43       ` peterz
2020-08-31 13:52         ` Eric W. Biederman
2020-09-02  9:57       ` peterz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9h0gvro.fsf@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=acme@kernel.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=avagin@gmail.com \
    --cc=christian@brauner.io \
    --cc=gladkov.alexey@gmail.com \
    --cc=jannh@google.com \
    --cc=jolsa@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=miklos@szeredi.hu \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox