All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Hao Xu <hao.xu@linux.dev>,
	djwong@kernel.org, Dave Chinner <david@fromorbit.com>,
	Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org,
	Dominique Martinet <asmadeus@codewreck.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Stefan Roesch <shr@fb.com>, Clay Harris <bugs@claycon.org>,
	linux-fsdevel@vger.kernel.org, Wanpeng Li <wanpengli@tencent.com>,
	josef@toxicpanda.com
Subject: Re: [PATCH 3/5] io_uring: add support for getdents
Date: Thu, 27 Jul 2023 18:28:52 +0200	[thread overview]
Message-ID: <20230727-daran-abtun-4bc755f668ad@brauner> (raw)
In-Reply-To: <77feb96e-adf7-56f2-dac5-ca5b075afa83@gmail.com>

On Thu, Jul 27, 2023 at 05:17:30PM +0100, Pavel Begunkov wrote:
> On 7/27/23 16:52, Christian Brauner wrote:
> > On Thu, Jul 27, 2023 at 04:12:12PM +0100, Pavel Begunkov wrote:
> > > On 7/27/23 15:27, Christian Brauner wrote:
> > > > On Thu, Jul 27, 2023 at 07:51:19PM +0800, Hao Xu wrote:
> > > > > On 7/26/23 23:00, Christian Brauner wrote:
> > > > > > On Tue, Jul 18, 2023 at 09:21:10PM +0800, Hao Xu wrote:
> > > > > > > From: Hao Xu <howeyxu@tencent.com>
> > > > > > > 
> > > > > > > This add support for getdents64 to io_uring, acting exactly like the
> > > > > > > syscall: the directory is iterated from it's current's position as
> > > > > > > stored in the file struct, and the file's position is updated exactly as
> > > > > > > if getdents64 had been called.
> > > > > > > 
> > > > > > > For filesystems that support NOWAIT in iterate_shared(), try to use it
> > > > > > > first; if a user already knows the filesystem they use do not support
> > > > > > > nowait they can force async through IOSQE_ASYNC in the sqe flags,
> > > > > > > avoiding the need to bounce back through a useless EAGAIN return.
> > > > > > > 
> > > > > > > Co-developed-by: Dominique Martinet <asmadeus@codewreck.org>
> > > > > > > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
> > > > > > > Signed-off-by: Hao Xu <howeyxu@tencent.com>
> > > > > > > ---
> > > [...]
> > > > > I actually saw this semaphore, and there is another xfs lock in
> > > > > file_accessed
> > > > >     --> touch_atime
> > > > >       --> inode_update_time
> > > > >         --> inode->i_op->update_time == xfs_vn_update_time
> > > > > 
> > > > > Forgot to point them out in the cover-letter..., I didn't modify them
> > > > > since I'm not very sure about if we should do so, and I saw Stefan's
> > > > > patchset didn't modify them too.
> > > > > 
> > > > > My personnal thinking is we should apply trylock logic for this
> > > > > inode->i_rwsem. For xfs lock in touch_atime, we should do that since it
> > > > > doesn't make sense to rollback all the stuff while we are almost at the
> > > > > end of getdents because of a lock.
> > > > 
> > > > That manoeuvres around the problem. Which I'm slightly more sensitive
> > > > too as this review is a rather expensive one.
> > > > 
> > > > Plus, it seems fixable in at least two ways:
> > > > 
> > > > For both we need to be able to tell the filesystem that a nowait atime
> > > > update is requested. Simple thing seems to me to add a S_NOWAIT flag to
> > > > file_time_flags and passing that via i_op->update_time() which already
> > > > has a flag argument. That would likely also help kiocb_modified().
> > > 
> > > fwiw, we've just recently had similar problems with io_uring read/write
> > > and atime/mtime in prod environment, so we're interested in solving that
> > > regardless of this patchset. I.e. io_uring issues rw with NOWAIT, {a,m}time
> > > touch ignores that, that stalls other submissions and completely screws
> > > latency.
> > > 
> > > > file_accessed()
> > > > -> touch_atime()
> > > >      -> inode_update_time()
> > > >         -> i_op->update_time == xfs_vn_update_time()
> > > > 
> > > > Then we have two options afaict:
> > > > 
> > > > (1) best-effort atime update
> > > > 
> > > > file_accessed() already has the builtin assumption that updating atime
> > > > might fail for other reasons - see the comment in there. So it is
> > > > somewhat best-effort already.
> > > > 
> > > > (2) move atime update before calling into filesystem
> > > > 
> > > > If we want to be sure that access time is updated when a readdir request
> > > > is issued through io_uring then we need to have file_accessed() give a
> > > > return value and expose a new helper for io_uring or modify
> > > > vfs_getdents() to do something like:
> > > > 
> > > > vfs_getdents()
> > > > {
> > > > 	if (nowait)
> > > > 		down_read_trylock()
> > > > 
> > > > 	if (!IS_DEADDIR(inode)) {
> > > > 		ret = file_accessed(file);
> > > > 		if (ret == -EAGAIN)
> > > > 			goto out_unlock;
> > > > 
> > > > 		f_op->iterate_shared()
> > > > 	}
> > > > }
> > > > 
> > > > It's not unprecedented to do update atime before the actual operation
> > > > has been done afaict. That's already the case in xfs_file_write_checks()
> > > > which is called before anything is written. So that seems ok.
> > > > 
> > > > Does any of these two options work for the xfs maintainers and Jens?
> > > 
> > > It doesn't look (2) will solve it for reads/writes, at least without
> > 
> > It would also solve it for writes which is what my kiocb_modified()
> > comment was about. So right now you have:
> 
> Great, I assumed there are stricter requirements for mtime not
> transiently failing.

But I mean then wouldn't this already be a problem today?
kiocb_modified() can error out with EAGAIN today:

          ret = inode_needs_update_time(inode, &now);
          if (ret <= 0)
                  return ret;
          if (flags & IOCB_NOWAIT)
                  return -EAGAIN;

          return __file_update_time(file, &now, ret);

the thing is that it doesn't matter for ->write_iter() - for xfs at
least - because xfs does it as part of preparatory checks before
actually doing any real work. The problem happens when you do actual
work and afterwards call kiocb_modified(). That's why I think (2) is
preferable.

> 
> 
> > kiocb_modified(IOCB_NOWAI)
> > -> file_modified_flags(IOCB_NOWAI)
> >     -> file_remove_privs(IOCB_NOWAIT) // already fully non-blocking
> >     -> file_accessed(IOCB_NOWAIT)
> >        -> i_op->update_time(S_ATIME | S_NOWAIT)
> > 
> > and since xfs_file_write_iter() calls xfs_file_write_checks() before
> > doing any actual work you'd now be fine.
> > 
> > For reads xfs_file_read_iter() would need to be changed to a similar
> > logic but that's for xfs to decide ultimately.
> > 
> > > the pain of changing the {write,read}_iter callbacks. 1) sounds good
> > > to me from the io_uring perspective, but I guess it won't work
> > > for mtime?
> > 
> > I would prefer 2) which seems cleaner to me. But I might miss why this
> > won't work. So input needed/wanted.
> 
> Maybe I didn't fully grasp the (2) idea
> 
> 2.1: all read_iter, write_iter, etc. callbacks should do file_accessed()
> before doing IO, which sounds like a good option if everyone agrees with
> that. Taking a look at direct block io, it's already like this.

Yes, that's what I'm talking about. I'm asking whether that's ok for xfs
maintainers basically. i_op->write_iter() already works like that since
the dawn of time but i_op->read_iter doesn't and I'm proposing to make
it work like that and wondering if there's any issues I'm unaware of.

> 
> 2.2: Having io_uring doing file_accessed(). Since it's all currently
> hidden behind {read,write}_iter() callbacks and not easily extractable,
> it doesn't like a good option, unless I missed sth.

I think that would be the wrong approach and is definitely not what I
meant.

  reply	other threads:[~2023-07-27 16:29 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-18 13:21 [PATCH v4 0/5] io_uring getdents Hao Xu
2023-07-18 13:21 ` [PATCH 1/5] fs: split off vfs_getdents function of getdents64 syscall Hao Xu
2023-07-18 13:21 ` [PATCH 2/5] vfs_getdents/struct dir_context: add flags field Hao Xu
2023-07-18 13:21 ` [PATCH 3/5] io_uring: add support for getdents Hao Xu
2023-07-19  8:56   ` Hao Xu
2023-07-26 15:00   ` Christian Brauner
2023-07-27 11:51     ` Hao Xu
2023-07-27 14:27       ` Christian Brauner
2023-07-27 15:12         ` Pavel Begunkov
2023-07-27 15:52           ` Christian Brauner
2023-07-27 16:17             ` Pavel Begunkov
2023-07-27 16:28               ` Christian Brauner [this message]
2023-07-31  1:58                 ` Dave Chinner
2023-07-31  7:34                   ` Hao Xu
2023-07-31  7:50                     ` Christian Brauner
2023-07-31  7:40                   ` Christian Brauner
2023-07-30 18:02         ` Hao Xu
2023-07-31  8:18           ` Christian Brauner
2023-07-31  9:31             ` Hao Xu
2023-07-31  1:33         ` Dave Chinner
2023-07-31  8:13           ` Christian Brauner
2023-07-31 15:26             ` Darrick J. Wong
2023-07-31 22:18               ` Dave Chinner
2023-08-01  0:28               ` Jens Axboe
2023-08-01  0:47                 ` Matthew Wilcox
2023-08-01  0:49                   ` Jens Axboe
2023-08-01  1:01                     ` Matthew Wilcox
2023-08-01  7:00                       ` Christian Brauner
2023-08-01  6:59                     ` Christian Brauner
2023-08-01  7:17                 ` Christian Brauner
2023-08-08  4:34                 ` Hao Xu
2023-08-08  5:18                   ` Hao Xu
2023-08-08  9:33                 ` Hao Xu
2023-08-08 22:55                   ` Jens Axboe
2023-08-01 18:39             ` Hao Xu
2023-07-18 13:21 ` [PATCH 4/5] xfs: add NOWAIT semantics for readdir Hao Xu
2023-07-19  2:35   ` kernel test robot
2023-07-19  6:22   ` Fwd: " Hao Xu
2023-07-18 13:21 ` [PATCH RFC 5/5] disable fixed file for io_uring getdents for now Hao Xu
2023-07-26 14:23   ` Christian Brauner
2023-07-27 12:09     ` Hao Xu
2023-07-19  6:04 ` [PATCH v4 0/5] io_uring getdents Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230727-daran-abtun-4bc755f668ad@brauner \
    --to=brauner@kernel.org \
    --cc=asmadeus@codewreck.org \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bugs@claycon.org \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hao.xu@linux.dev \
    --cc=io-uring@vger.kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=shr@fb.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.