From: "Darrick J. Wong" <djwong@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>, Hao Xu <hao.xu@linux.dev>,
Jens Axboe <axboe@kernel.dk>,
io-uring@vger.kernel.org,
Dominique Martinet <asmadeus@codewreck.org>,
Pavel Begunkov <asml.silence@gmail.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Stefan Roesch <shr@fb.com>, Clay Harris <bugs@claycon.org>,
linux-fsdevel@vger.kernel.org, Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCH 3/5] io_uring: add support for getdents
Date: Mon, 31 Jul 2023 08:26:23 -0700 [thread overview]
Message-ID: <20230731152623.GC11336@frogsfrogsfrogs> (raw)
In-Reply-To: <20230731-gezeugt-tierwelt-f3d6a900c262@brauner>
On Mon, Jul 31, 2023 at 10:13:21AM +0200, Christian Brauner wrote:
> On Mon, Jul 31, 2023 at 11:33:05AM +1000, Dave Chinner wrote:
> > On Thu, Jul 27, 2023 at 04:27:30PM +0200, Christian Brauner wrote:
> > > On Thu, Jul 27, 2023 at 07:51:19PM +0800, Hao Xu wrote:
> > > > I actually saw this semaphore, and there is another xfs lock in
> > > > file_accessed
> > > > --> touch_atime
> > > > --> inode_update_time
> > > > --> inode->i_op->update_time == xfs_vn_update_time
> > > >
> > > > Forgot to point them out in the cover-letter..., I didn't modify them
> > > > since I'm not very sure about if we should do so, and I saw Stefan's
> > > > patchset didn't modify them too.
> > > >
> > > > My personnal thinking is we should apply trylock logic for this
> > > > inode->i_rwsem. For xfs lock in touch_atime, we should do that since it
> > > > doesn't make sense to rollback all the stuff while we are almost at the
> > > > end of getdents because of a lock.
> > >
> > > That manoeuvres around the problem. Which I'm slightly more sensitive
> > > too as this review is a rather expensive one.
> > >
> > > Plus, it seems fixable in at least two ways:
> > >
> > > For both we need to be able to tell the filesystem that a nowait atime
> > > update is requested. Simple thing seems to me to add a S_NOWAIT flag to
> > > file_time_flags and passing that via i_op->update_time() which already
> > > has a flag argument. That would likely also help kiocb_modified().
> >
> > Wait - didn't we already fix this for mtime updates on IOCB_NOWAIT
> > modification operations? Yeah, we did:
> >
> > kiocb_modified(iocb)
> > file_modified_flags(iocb->ki_file, iocb->ki_flags)
> > ....
> > ret = inode_needs_update_time()
> > if (ret <= 0)
> > return ret;
> > if (flags & IOCB_NOWAIT)
> > return -EAGAIN;
> > <does timestamp update>
> >
> > > file_accessed()
> > > -> touch_atime()
> > > -> inode_update_time()
> > > -> i_op->update_time == xfs_vn_update_time()
> >
> > Yeah, so this needs the same treatment as file_modified_flags() -
> > touch_atime() needs a flag variant that passes IOCB_NOWAIT, and
> > after atime_needs_update() returns trues we should check IOCB_NOWAIT
> > and return EAGAIN if it is set. That will punt the operation that
> > needs to the update to a worker thread that can block....
>
> As I tried to explain, I would prefer if we could inform the filesystem
> through i_op->update_time() itself that this is async and give the
> filesystem the ability to try and acquire the locks it needs and return
> EAGAIN from i_op->update_time() itself if it can't acquire them.
>
> >
> > > Then we have two options afaict:
> > >
> > > (1) best-effort atime update
> > >
> > > file_accessed() already has the builtin assumption that updating atime
> > > might fail for other reasons - see the comment in there. So it is
> > > somewhat best-effort already.
> > >
> > > (2) move atime update before calling into filesystem
> > >
> > > If we want to be sure that access time is updated when a readdir request
> > > is issued through io_uring then we need to have file_accessed() give a
> > > return value and expose a new helper for io_uring or modify
> > > vfs_getdents() to do something like:
> > >
> > > vfs_getdents()
> > > {
> > > if (nowait)
> > > down_read_trylock()
> > >
> > > if (!IS_DEADDIR(inode)) {
> > > ret = file_accessed(file);
> > > if (ret == -EAGAIN)
> > > goto out_unlock;
> > >
> > > f_op->iterate_shared()
> > > }
> > > }
> >
> > Yup, that's the sort of thing that needs to be done.
> >
> > But as I said in the "llseek for io-uring" thread, we need to stop
> > the game of whack-a-mole passing random nowait boolean flags to VFS
> > operations before it starts in earnest. We really need a common
> > context structure (like we have a kiocb for IO operations) that
> > holds per operation control state so we have consistency across all
> > the operations that we need different behaviours for.
>
> Yes, I tend to agree and thought about the same. But right now we don't
> have a lot of context. So I would lean towards a flag argument at most.
>
> But I also wouldn't consider it necessarily wrong to start with booleans
> or a flag first and in a couple of months if the need for more context
> arises we know what kind of struct we want or need.
I'm probably missing a ton of context (because at the end of the day I
don't care all that much about NOWAIT and still have never installed
liburing) but AFAICT the goal seems to be that for a given io request,
uring tries to execute it with trylocks in the originating process
context. If that attempt fails, it'll punt the io to a workqueue and
rerun the request with blocking locks. Right?
I've watched quite a bit of NOWAIT whackamole going on over the past few
years (i_rwsem, the ILOCK, the IO layer, memory allocations...). IIRC
these filesystem ios all have to run in process context, right? If so,
why don't we capture the NOWAIT state in a PF flag? We already do that
for NOFS/NOIO memory allocations to make sure that /all/ reclaim
attempts cannot recurse into the fs/io stacks.
"I prefer EAGAIN errors to this process blocking" doesn't seem all that
much different. But, what do I know...
--D
next prev parent reply other threads:[~2023-07-31 15:26 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-18 13:21 [PATCH v4 0/5] io_uring getdents Hao Xu
2023-07-18 13:21 ` [PATCH 1/5] fs: split off vfs_getdents function of getdents64 syscall Hao Xu
2023-07-18 13:21 ` [PATCH 2/5] vfs_getdents/struct dir_context: add flags field Hao Xu
2023-07-18 13:21 ` [PATCH 3/5] io_uring: add support for getdents Hao Xu
2023-07-19 8:56 ` Hao Xu
2023-07-26 15:00 ` Christian Brauner
2023-07-27 11:51 ` Hao Xu
2023-07-27 14:27 ` Christian Brauner
2023-07-27 15:12 ` Pavel Begunkov
2023-07-27 15:52 ` Christian Brauner
2023-07-27 16:17 ` Pavel Begunkov
2023-07-27 16:28 ` Christian Brauner
2023-07-31 1:58 ` Dave Chinner
2023-07-31 7:34 ` Hao Xu
2023-07-31 7:50 ` Christian Brauner
2023-07-31 7:40 ` Christian Brauner
2023-07-30 18:02 ` Hao Xu
2023-07-31 8:18 ` Christian Brauner
2023-07-31 9:31 ` Hao Xu
2023-07-31 1:33 ` Dave Chinner
2023-07-31 8:13 ` Christian Brauner
2023-07-31 15:26 ` Darrick J. Wong [this message]
2023-07-31 22:18 ` Dave Chinner
2023-08-01 0:28 ` Jens Axboe
2023-08-01 0:47 ` Matthew Wilcox
2023-08-01 0:49 ` Jens Axboe
2023-08-01 1:01 ` Matthew Wilcox
2023-08-01 7:00 ` Christian Brauner
2023-08-01 6:59 ` Christian Brauner
2023-08-01 7:17 ` Christian Brauner
2023-08-08 4:34 ` Hao Xu
2023-08-08 5:18 ` Hao Xu
2023-08-08 9:33 ` Hao Xu
2023-08-08 22:55 ` Jens Axboe
2023-08-01 18:39 ` Hao Xu
2023-07-18 13:21 ` [PATCH 4/5] xfs: add NOWAIT semantics for readdir Hao Xu
2023-07-19 2:35 ` kernel test robot
2023-07-19 6:22 ` Fwd: " Hao Xu
2023-07-18 13:21 ` [PATCH RFC 5/5] disable fixed file for io_uring getdents for now Hao Xu
2023-07-26 14:23 ` Christian Brauner
2023-07-27 12:09 ` Hao Xu
2023-07-19 6:04 ` [PATCH v4 0/5] io_uring getdents Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230731152623.GC11336@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=asmadeus@codewreck.org \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=bugs@claycon.org \
--cc=david@fromorbit.com \
--cc=hao.xu@linux.dev \
--cc=io-uring@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=shr@fb.com \
--cc=viro@zeniv.linux.org.uk \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.