From: ebiederm@xmission.com (Eric W. Biederman)
To: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Qian Cai <cai@redhat.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths
Date: Tue, 03 Nov 2020 08:45:21 -0600 [thread overview]
Message-ID: <87mtzy7b3y.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <3abc1742-733e-c682-5476-c6337a630e05@kernel.dk> (Jens Axboe's message of "Mon, 2 Nov 2020 14:39:32 -0700")
Jens Axboe <axboe@kernel.dk> writes:
> On 11/2/20 1:31 PM, Jens Axboe wrote:
>> On 11/2/20 1:12 PM, Eric W. Biederman wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>>>>> Jens Axboe <axboe@kernel.dk> writes:
>>>>>
>>>>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>>>>
>>>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>>>>> and it wasn't ready.
>>>>>>>>>>
>>>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>>>>> usable from io_uring.
>>>>>>>>>
>>>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>>>>> threads, I hope...
>>>>>>>>
>>>>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>>>>> it from working?
>>>>>>>
>>>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>>>>> *do* pass through that, /dev/stdin included)
>>>>>>
>>>>>> Don't we just need ->thread_pid for that to work?
>>>>>
>>>>> No. You need ->signal.
>>>>>
>>>>> You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self
>>>>> that ->thread_pid is needed.
>>>>>
>>>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>>>>> does not change.
>>>>
>>>> I don't care about the pid itself, my suggestion was to assign ->thread_pid
>>>> over the lookup operation to ensure that /proc/self/ worked the way that
>>>> you'd expect.
>>>
>>> I understand that.
>>>
>>> However /proc/self/ refers to the current process not to the current
>>> thread. So ->thread_pid is not what you need to assign to make that
>>> happen. What the code looks at is: ->signal->pids[PIDTYPE_TGID].
>>>
>>> It will definitely break invariants to assign to ->signal.
>>>
>>> Currently only exchange_tids assigns ->thread_pid and it is nasty. It
>>> results in code that potentially results in infinite loops in
>>> kernel/signal.c
>>>
>>> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID]. At best
>>> it might work but I expect the it would completely confuse something in
>>> the pid to task or pid to process mappings. Which is to say even if it
>>> does work it would be an extremely fragile solution.
>>
>> Thanks Eric, that's useful. Sounds to me like we're better off, at least
>> for now, to just expressly forbid async lookup of /proc/self/. Which
>> isn't really the end of the world as far as I'm concerned.
>
> Alternatively, we just teach task_pid_ptr() where to look for an
> alternate, if current->flags & PF_IO_WORKER is true. Then we don't have
> to assign anything that's visible in task_struct, and in fact the async
> worker can retain this stuff on the stack. As all requests are killed
> before a task is allowed to exit, that should be safe.
That seems assumes task_pid_ptr is always called on current.
When you are looking at the task through the proc filesystem you want
things like /proc/<pid>/stat and /proc/<pid>/status to be able to
display the pids without problem. More than that it is desirable that
readdir does not get the view for the PF_IO_WORKER.
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 74ddbff1a6ba..5fd421a4864c 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -42,6 +42,7 @@
> #include <linux/sched/signal.h>
> #include <linux/sched/task.h>
> #include <linux/idr.h>
> +#include <linux/io_uring.h>
> #include <net/sock.h>
> #include <uapi/linux/pidfd.h>
>
> @@ -320,6 +321,12 @@ EXPORT_SYMBOL_GPL(find_vpid);
>
> static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
> {
> + if ((task->flags & PF_IO_WORKER) && task->io_uring) {
> + return (type == PIDTYPE_PID) ?
> + &task->io_uring->thread_pid :
> + &task->io_uring->pids[type];
> + }
> +
> return (type == PIDTYPE_PID) ?
> &task->thread_pid :
> &task->signal->pids[type];
The only thing I can think of that might work convincingly is to split
get_current() into two functions get_context() and get_task(). Maybe
accessed as current_context and current_task.
With get_context() returning just a pointer to the fields that are safe
to use in io_uring, and get_task returning the other fields.
With exit and exec invaliding the pending work on the contexts it should
be safe to just return a pointer to the context that invoked io_uring.
Data in the context would either need to be read-only or be modified
and read in a multi-thread safe way.
The rest of the data in the task_struct by default could be assume it is
only modified by the task.
That would give type-safety and something avoids playing whack-a-mole
with every new piece of context that userspace accesses.
Eric
prev parent reply other threads:[~2020-11-03 14:47 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-30 15:24 [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths Qian Cai
2020-10-30 15:27 ` Jens Axboe
2020-10-30 15:52 ` Qian Cai
2020-10-30 16:49 ` Jens Axboe
2020-10-30 18:42 ` Al Viro
2020-10-30 18:46 ` Jens Axboe
2020-10-30 18:49 ` Al Viro
2020-10-30 20:33 ` Jens Axboe
2020-10-30 22:22 ` Al Viro
2020-10-30 23:21 ` Jens Axboe
2020-11-02 18:43 ` Eric W. Biederman
2020-11-02 19:27 ` Eric W. Biederman
2020-11-02 19:54 ` Jens Axboe
2020-11-02 20:12 ` Eric W. Biederman
2020-11-02 20:31 ` Jens Axboe
2020-11-02 21:39 ` Jens Axboe
2020-11-03 14:45 ` Eric W. Biederman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mtzy7b3y.fsf@x220.int.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=axboe@kernel.dk \
--cc=cai@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.