linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCHSET v3 0/4] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK (Insufficiently faking current?)
Date: Tue, 16 Feb 2021 18:26:20 -0700	[thread overview]
Message-ID: <86cd3801-dfb4-833a-b7e6-e643186030e7@kernel.dk> (raw)
In-Reply-To: <9d9b5143-6b26-49d4-a11a-b21c020d5886@kernel.dk>

On 2/16/21 6:18 PM, Jens Axboe wrote:
> On 2/15/21 7:41 PM, Jens Axboe wrote:
>> On 2/15/21 3:41 PM, Eric W. Biederman wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> On 2/15/21 11:24 AM, Jens Axboe wrote:
>>>>> On 2/15/21 11:07 AM, Eric W. Biederman wrote:
>>>>>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>>>>>
>>>>>>> On Sun, Feb 14, 2021 at 8:38 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>>
>>>>>>>>> Similarly it looks like opening of "/dev/tty" fails to
>>>>>>>>> return the tty of the caller but instead fails because
>>>>>>>>> io-wq threads don't have a tty.
>>>>>>>>
>>>>>>>> I've got a patch queued up for 5.12 that clears ->fs and ->files for the
>>>>>>>> thread if not explicitly inherited, and I'm working on similarly
>>>>>>>> proactively catching these cases that could potentially be problematic.
>>>>>>>
>>>>>>> Well, the /dev/tty case still needs fixing somehow.
>>>>>>>
>>>>>>> Opening /dev/tty actually depends on current->signal, and if it is
>>>>>>> NULL it will fall back on the first VT console instead (I think).
>>>>>>>
>>>>>>> I wonder if it should do the same thing /proc/self does..
>>>>>>
>>>>>> Would there be any downside of making the io-wq kernel threads be per
>>>>>> process instead of per user?
>>>>>>
>>>>>> I can see a lower probability of a thread already existing.  Are there
>>>>>> other downsides I am missing?
>>>>>>
>>>>>> The upside would be that all of the issues of have we copied enough
>>>>>> should go away, as the io-wq thread would then behave like another user
>>>>>> space thread.  To handle posix setresuid() and friends it looks like
>>>>>> current_cred would need to be copied but I can't think of anything else.
>>>>>
>>>>> I really like that idea. Do we currently have a way of creating a thread
>>>>> internally, akin to what would happen if the same task did pthread_create?
>>>>> That'd ensure that we have everything we need, without actively needing to
>>>>> map the request types, or find future issues of "we also need this bit".
>>>>> It'd work fine for the 'need new worker' case too, if one goes to sleep.
>>>>> We'd just 'fork' off that child.
>>>>>
>>>>> Would require some restructuring of io-wq, but at the end of it, it'd
>>>>> be a simpler solution.
>>>>
>>>> I was intrigued enough that I tried to wire this up. If we can pull this
>>>> off, then it would take a great weight off my shoulders as there would
>>>> be no more worries on identity.
>>>>
>>>> Here's a branch that's got a set of patches that actually work, though
>>>> it's a bit of a hack in spots. Notes:
>>>>
>>>> - Forked worker initially crashed, since it's an actual user thread and
>>>>   bombed on deref of kernel structures. Expectedly. That's what the
>>>>   horrible kernel_clone_args->io_wq hack is working around for now.
>>>>   Obviously not the final solution, but helped move things along so
>>>>   I could actually test this.
>>>>
>>>> - Shared io-wq helpers need indexing for task, right now this isn't
>>>>   done. But that's not hard to do.
>>>>
>>>> - Idle thread reaping isn't done yet, so they persist until the
>>>>   context goes away.
>>>>
>>>> - task_work fallback needs a bit of love. Currently we fallback to
>>>>   the io-wq manager thread for handling that, but a) manager is gone,
>>>>   and b) the new workers are now threads and go away as well when
>>>>   the original task goes away. None of the three fallback sites need
>>>>   task context, so likely solution here is just punt it to system_wq.
>>>>   Not the hot path, obviously, we're exiting.
>>>>
>>>> - Personality registration is broken, it's just Good Enough to compile.
>>>>
>>>> Probably a few more items that escape me right now. As long as you
>>>> don't hit the fallback cases, it appears to work fine for me. And
>>>> the diffstat is pretty good to:
>>>>
>>>>  fs/io-wq.c                 | 418 +++++++++++--------------------------
>>>>  fs/io-wq.h                 |  10 +-
>>>>  fs/io_uring.c              | 314 +++-------------------------
>>>>  fs/proc/self.c             |   7 -
>>>>  fs/proc/thread_self.c      |   7 -
>>>>  include/linux/io_uring.h   |  19 --
>>>>  include/linux/sched.h      |   3 +
>>>>  include/linux/sched/task.h |   1 +
>>>>  kernel/fork.c              |   2 +
>>>>  9 files changed, 161 insertions(+), 620 deletions(-)
>>>>
>>>> as it gets rid of _all_ the 'grab this or that piece' that we're
>>>> tracking.
>>>>
>>>> WIP series here:
>>>>
>>>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-worker
>>>
>>> I took a quick look through the code and in general it seems reasonable.
>>
>> Great, thanks for checking.
> 
> Cleaner series here:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-worker.v2
> 
> One question, since I'm a bit stumped. The very top most debug patch:
> 
> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-worker.v2&id=8a422f030b9630d16d5ec1ff97842a265f88485e
> 
> any idea what is going on here? For some reason, it only happens for
> the 'manager' thread. That one doesn't do any work by itself, it's just
> tasked with forking a new worker, if we need one.

Seems to trigger for all cases with a pthread in the app. This reproduces
it:


#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <liburing.h>

static void *fn(void *data)
{
	struct io_uring ring;

	io_uring_queue_init(1, &ring, 0);
	sleep(1);
	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t t;
	void *ret;

	pthread_create(&t, NULL, fn, NULL);
	pthread_join(t, &ret);

	return 0;
}

-- 
Jens Axboe


  reply	other threads:[~2021-02-17  1:27 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-14 19:13 [PATCHSET v3 0/4] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK Jens Axboe
2020-12-14 19:13 ` [PATCH 1/4] fs: make unlazy_walk() error handling consistent Jens Axboe
2020-12-14 19:13 ` [PATCH 2/4] fs: add support for LOOKUP_NONBLOCK Jens Axboe
2020-12-15 12:24   ` Matthew Wilcox
2020-12-15 15:29     ` Jens Axboe
2020-12-15 15:33       ` Matthew Wilcox
2020-12-15 15:37         ` Jens Axboe
2020-12-15 16:08           ` Jens Axboe
2020-12-15 16:14             ` Jens Axboe
2020-12-15 18:29             ` Linus Torvalds
2020-12-15 18:44               ` Jens Axboe
2020-12-15 18:47                 ` Linus Torvalds
2020-12-15 19:03                   ` Jens Axboe
2020-12-15 19:32                     ` Linus Torvalds
2020-12-15 19:38                       ` Jens Axboe
2020-12-16  2:36   ` Al Viro
2020-12-16  3:30     ` Jens Axboe
2020-12-16  2:43   ` Al Viro
2020-12-16  3:32     ` Jens Axboe
2020-12-14 19:13 ` [PATCH 3/4] fs: expose LOOKUP_NONBLOCK through openat2() RESOLVE_NONBLOCK Jens Axboe
2020-12-15 22:25   ` Dave Chinner
2020-12-15 22:31     ` Linus Torvalds
2020-12-15 23:25       ` Jens Axboe
2020-12-16  2:37   ` Al Viro
2020-12-16  3:39     ` Linus Torvalds
2020-12-14 19:13 ` [PATCH 4/4] io_uring: enable LOOKUP_NONBLOCK path resolution for filename lookups Jens Axboe
2020-12-15  3:06 ` [PATCHSET v3 0/4] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK Linus Torvalds
2020-12-15  3:18   ` Jens Axboe
2020-12-15  6:11 ` Al Viro
2020-12-15 15:29   ` Jens Axboe
2021-01-04  5:31 ` Al Viro
2021-01-04 14:43   ` Jens Axboe
2021-01-04 16:54     ` Al Viro
2021-01-04 17:03       ` Jens Axboe
     [not found] ` <m1lfbrwrgq.fsf@fess.ebiederm.org>
2021-02-14 16:38   ` [PATCHSET v3 0/4] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK (Insufficiently faking current?) Jens Axboe
2021-02-14 20:30     ` Linus Torvalds
2021-02-14 21:24       ` Al Viro
2021-02-15 18:07       ` Eric W. Biederman
2021-02-15 18:24         ` Jens Axboe
2021-02-15 21:09           ` Jens Axboe
2021-02-15 22:41             ` Eric W. Biederman
2021-02-16  2:41               ` Jens Axboe
2021-02-17  1:18                 ` Jens Axboe
2021-02-17  1:26                   ` Jens Axboe [this message]
2021-02-17  3:11                     ` Jens Axboe
2021-02-15 17:56     ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86cd3801-dfb4-833a-b7e6-e643186030e7@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=ebiederm@xmission.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).