From: Dave Chinner <david@fromorbit.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 4/5] fs: honor LOOKUP_NONBLOCK for the last part of file open
Date: Mon, 14 Dec 2020 09:50:22 +1100 [thread overview]
Message-ID: <20201213225022.GF3913616@dread.disaster.area> (raw)
In-Reply-To: <8c4e7013-2929-82ed-06f6-020a19b4fb3d@kernel.dk>
On Sat, Dec 12, 2020 at 02:25:00PM -0700, Jens Axboe wrote:
> On 12/12/20 11:57 AM, Linus Torvalds wrote:
> > On Sat, Dec 12, 2020 at 8:51 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> We handle it for the path resolution itself, but we should also factor
> >> it in for open_last_lookups() and tmpfile open.
> >
> > So I think this one is fundamentally wrong, for two reasons.
> >
> > One is that "nonblock" shouldn't necessarily mean "take no locks at
> > all". That directory inode lock is very very different from "go down
> > to the filesystem to do IO". No other NONBLOCK thing has ever been "no
> > locks at all", they have all been about possibly long-term blocking.
>
> Do we ever do long term IO _while_ holding the direcoty inode lock? If
> we don't, then we can probably just ignore that side alltogether.
Yes - "all the time" is the simple answer.
Readdir is a simple example, but that is just extent mapping tree
and directory block IO you'd have to wait for, just like regular
file IO.
The big problem is that modifications to directories are atomic and
transactional in most filesystems, which means we might block a
create/unlink/attr/etc in a transaction start for an indefinite
amount of time while we wait for metadata writeback to free up
journal/reservation space. And while we are doing this, nothing else
can access the directory because the VFS holds the directory inode
lock....
We also have metadata IO within transactions, but in most
journalling filesystems once we've started the transaction we can't
back out and return -EAGAIN. So once we are in a transaction
context, the filesystem will block as necessary to run the operation
to completion.
So, really, at the filesystem level I don't see much value in trying
to push non-blocking directory modifications down to the filesystem.
The commonly used filesystems will mostly have to return -EAGAIN
immediately without being able to do anything at all because they
simply aren't architected with the modification rollback
capabilities needed to run fully non-blocking transactional
modification operations.
> > Why does that code care about O_WRONLY | O_RDWR? That has *nothing* to
> > do with the open() wanting to write to the filesystem. We don't even
> > hold that lock after the open - we'll always drop it even for a
> > successful open.
> >
> > Only O_CREAT | O_TRUNC should matter, since those are the ones that
> > cause writes as part of the *open*.
And __O_TMPFILE, which is the same as O_CREAT.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2020-12-13 22:51 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-12 16:51 [PATCHSET RFC v2 0/5] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK Jens Axboe
2020-12-12 16:51 ` [PATCH 1/5] fs: make unlazy_walk() error handling consistent Jens Axboe
2020-12-12 16:51 ` [PATCH 2/5] fs: add support for LOOKUP_NONBLOCK Jens Axboe
2020-12-12 16:51 ` [PATCH 3/5] fs: add mnt_want_write_trylock() Jens Axboe
2020-12-12 16:51 ` [PATCH 4/5] fs: honor LOOKUP_NONBLOCK for the last part of file open Jens Axboe
2020-12-12 17:25 ` Al Viro
2020-12-12 17:47 ` Jens Axboe
2020-12-12 18:57 ` Linus Torvalds
2020-12-12 21:25 ` Jens Axboe
2020-12-12 22:03 ` Linus Torvalds
2020-12-13 22:50 ` Dave Chinner [this message]
2020-12-14 0:45 ` Linus Torvalds
2020-12-14 1:52 ` Dave Chinner
2020-12-14 18:06 ` Linus Torvalds
2020-12-14 17:43 ` Jens Axboe
2020-12-12 16:51 ` [PATCH 5/5] fs: expose LOOKUP_NONBLOCK through openat2() RESOLVE_NONBLOCK Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201213225022.GF3913616@dread.disaster.area \
--to=david@fromorbit.com \
--cc=axboe@kernel.dk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.