linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, torvalds@linux-foundation.org
Subject: Re: [PATCH 1/2] fs: add support for LOOKUP_NONBLOCK
Date: Fri, 11 Dec 2020 08:57:26 -0700	[thread overview]
Message-ID: <bef3f905-f6b7-1134-7ca9-ff9385d6bf86@kernel.dk> (raw)
In-Reply-To: <20201211023555.GV3579531@ZenIV.linux.org.uk>

On 12/10/20 7:35 PM, Al Viro wrote:
> On Thu, Dec 10, 2020 at 01:01:13PM -0700, Jens Axboe wrote:
>> io_uring always punts opens to async context, since there's no control
>> over whether the lookup blocks or not. Add LOOKUP_NONBLOCK to support
>> just doing the fast RCU based lookups, which we know will not block. If
>> we can do a cached path resolution of the filename, then we don't have
>> to always punt lookups for a worker.
>>
>> During path resolution, we always do LOOKUP_RCU first. If that fails and
>> we terminate LOOKUP_RCU, then fail a LOOKUP_NONBLOCK attempt as well.
> 
> In effect you are adding a mode where
> 	* unlazy would fail, except when done from complete_walk()
> 	* ->d_revalidate() wouldn't be attempted at all (not even with LOOKUP_RCU)
> 	* ... but ->get_link() in RCU mode would
> 	* ... and so would everything done after complete_walk() in
> do_open(), very much including the joys like mnt_want_write() (i.e. waiting for
> frozen fs to thaw), handling O_TRUNC, calling ->open() itself...
> 
> So this "not punting lookups for a worker" looks fishy as hell - if you care
> about blocking operations, you haven't really won anything.
> 
> And why exactly is the RCU case of ->d_revalidate() worth buggering off (it
> really can't block - it's called under rcu_read_lock() and it does *not*
> drop it)?
> 
> _IF_ for some theoretical exercise you want to do "lookup without dropping
> out of RCU", just add a flag that has unlazy_walk() fail.  With -ECHILD.
> Strip it away in complete_walk() and have path_init() with that flag
> and without LOOKUP_RCU fail with -EAGAIN.  All there is to it.

Thanks Al, that makes for an easier implementation. I like that suggestion,
boils it down to just three hunks (see below).

For io_uring, the concept is just to perform the fast path inline. The
RCU lookup serves that purpose nicely - if we fail that, then it's expected
to take the latency hit of going async.

> It still leaves you with fuckloads of blocking operations (and that's
> "blocking" with "until admin thaws the damn filesystem several hours
> down the road") after complete_walk(), though.

But that's true (and expected) for any open that isn't non-blocking.


diff --git a/fs/namei.c b/fs/namei.c
index d7952f863e79..d49c72e34c6e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -686,6 +686,8 @@ static bool unlazy_walk(struct nameidata *nd)
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
 	nd->flags &= ~LOOKUP_RCU;
+	if (nd->flags & LOOKUP_NONBLOCK)
+		goto out1;
 	if (unlikely(!legitimize_links(nd)))
 		goto out1;
 	if (unlikely(!legitimize_path(nd, &nd->path, nd->seq)))
@@ -792,6 +794,7 @@ static int complete_walk(struct nameidata *nd)
 		 */
 		if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
 			nd->root.mnt = NULL;
+		nd->flags &= ~LOOKUP_NONBLOCK;
 		if (unlikely(unlazy_walk(nd)))
 			return -ECHILD;
 	}
@@ -2209,6 +2212,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 	if (!*s)
 		flags &= ~LOOKUP_RCU;
+	/* LOOKUP_NONBLOCK requires RCU, ask caller to retry */
+	if ((flags & (LOOKUP_RCU | LOOKUP_NONBLOCK)) == LOOKUP_NONBLOCK)
+		return ERR_PTR(-EAGAIN);
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
 

-- 
Jens Axboe


  reply	other threads:[~2020-12-11 17:07 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-10 20:01 [PATCHSET 0/2] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK Jens Axboe
2020-12-10 20:01 ` [PATCH 1/2] fs: add support for LOOKUP_NONBLOCK Jens Axboe
2020-12-10 20:53   ` Linus Torvalds
2020-12-10 21:06     ` Jens Axboe
2020-12-11  2:45       ` Al Viro
2020-12-11 16:05         ` Jens Axboe
2020-12-11 17:20           ` Al Viro
2020-12-11 17:35             ` Linus Torvalds
2020-12-11 18:50             ` Jens Axboe
2020-12-11 21:51               ` Al Viro
2020-12-11 23:47                 ` Jens Axboe
2020-12-11 17:33           ` Matthew Wilcox
2020-12-11 18:55             ` Jens Axboe
2020-12-11  2:35   ` Al Viro
2020-12-11 15:57     ` Jens Axboe [this message]
2020-12-11 17:21       ` Linus Torvalds
2020-12-11 17:29         ` Al Viro
2020-12-11 17:38           ` Al Viro
2020-12-11 17:44           ` Linus Torvalds
2020-12-11 21:46           ` Jens Axboe
2020-12-10 20:01 ` [PATCH 2/2] fs: expose LOOKUP_NONBLOCK through openat2() RESOLVE_NONBLOCK Jens Axboe
2020-12-10 22:29   ` Dave Chinner
2020-12-10 23:12     ` Jens Axboe
2020-12-10 23:29     ` Linus Torvalds
2020-12-11  0:58       ` Dave Chinner
2020-12-11  1:01         ` Linus Torvalds
2020-12-11  3:45           ` Dave Chinner
2020-12-11 18:07             ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bef3f905-f6b7-1134-7ca9-ff9385d6bf86@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).