From: Andy Lutomirski <luto@amacapital.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Pavel Emelyanov <xemul@scylladb.com>,
	linux-fsdevel@vger.kernel.org,
	 "Raphael S . Carvalho" <raphaelsc@scylladb.com>,
	linux-api@vger.kernel.org,  linux-xfs@vger.kernel.org
Subject: Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
Date: Wed, 8 Oct 2025 14:51:14 -0700
Message-ID: <CALCETrX-cs5MH3k369q2Fk5Q-pYQfEV6CW3va-4E9vD1CoCaGA@mail.gmail.com>
In-Reply-To: <aObXUBCtp4p83QzS@dread.disaster.area>

On Wed, Oct 8, 2025 at 2:27 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Oct 08, 2025 at 08:22:35AM -0700, Andy Lutomirski wrote:
> > On Mon, Oct 6, 2025 at 10:08 PM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Sat, Oct 04, 2025 at 09:08:05AM -0700, Andy Lutomirski wrote:
>
> You are conflating "synchronous update" with "blocking".
>
> Avoiding the need for synchronous timestamp updates is exactly what
> the lazytime mount option provides. i.e. lazytime degrades immediate
> consistency requirements to eventual consistency similar to how the
> default relatime behaviour defers atime updates for eventual
> writeback.
>
> IOWs, we've already largely addressed the synchronous c/mtime update
> problem but what we haven't done is made timestamp updates
> fully support non-blocking caller semantics. That's a separate
> problem...

I'm probably missing something, but is this really different?  Either
the mtime update can block or it can't block.  I haven't dug all the
way into exactly what happens in __mark_inode_dirty(), but there is a
lot going on in there even in the I_DIRTY_TIME path.  And Pavel is
saying that AIO and mtime updates don't play well together.

>
> > and it does so before updating the file contents
> > (although the window during which the timestamp is updated and the
> > contents are not is not as absurdly long as it is in the mmap case).
> >
> > Now my series does not change any of this, but I'm thinking more of
> > the concept: instead of doing file/inode_update_time when a file is
> > logically written (in write_iter, page_mkwrite, etc), set a flag so
> > that the writeback code knows that the timestamp needs updating.
>
> This is exactly what lazytime implements with the I_DIRTY_TIME flag.
>
> During writeback, if the filesystem has to modify other metadata in
> the inode (e.g. block allocation), the filesystem will piggyback the
> persistent update of the dirty timestamps on that modification and
> clear the I_DIRTY_TIME flag.
>
> However, if the writeback operation is a pure overwrite, then there
> is no metadata modification occurring and so we leave the inode
> I_DIRTY_TIME dirty for a future metadata persistence operation to
> clean them.
>
> IOWs, with lazytime, writeback already persists timestamp updates
> when appropriate for best performance.

I'm probably doing a bad job explaining myself.

In my series, I move (for page_mkwrite only) the mtime update,
*including dirtying the inode*, to the writeback path, which makes it
fully non-blocking / asynchronous / whatever you want to call it at
the time that page_mkwrite is called.  More concretely, my suggestion
is to be a bit lazier than current lazytime and not dirty the inode
*at all* in write_iter, or at least not dirty it for the purpose of
timestamp updates.  Instead, set a flag somewhere where it cannot be
forgotten about -- in my series, it's this patch:
https://lore.kernel.org/all/f2ac22142b4634b55ff6858d159b45dac96f81b6.1377193658.git.luto@amacapital.net/
and it's a single atomic bit in struct address_space.  The idea is
that there is approximately no additional overhead at the time that
the page cache is dirtied for cmtime-related inode dirtying and that
all such overhead is deferred to the writeback path when it's as
asynchronous as possible from the perspective of whatever user code
dirtied the page cache.  My page_set_cmtime() is completely lockless.

My series is far from perfect, but I did test it with real workloads
12-ish years ago, on overworked HDDs, with latencytop, and it worked.
Performance was vastly improved (using mmap, not write(), obviously).
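
The deferred-flag idea above can be sketched in a few lines of
userspace C11.  This is only an illustration under stated assumptions,
not the code from the linked series and not kernel code:
struct mapping_sketch, mark_cmtime_pending() and
writeback_flush_cmtime() are hypothetical names, and the counter is a
stand-in for the real timestamp update plus inode dirtying.

```c
/* Userspace sketch only; every name here is hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for per-mapping state (struct address_space in the kernel). */
struct mapping_sketch {
	atomic_bool cmtime_pending;	/* "timestamps need updating" */
	unsigned long time_updates;	/* counts the deferred, expensive updates */
};

static void mapping_init(struct mapping_sketch *m)
{
	assert(m != NULL);
	atomic_init(&m->cmtime_pending, false);
	m->time_updates = 0;
}

/* Dirtying path: a single atomic store, no locks, never blocks. */
static void mark_cmtime_pending(struct mapping_sketch *m)
{
	assert(m != NULL);
	atomic_store_explicit(&m->cmtime_pending, true, memory_order_release);
}

/*
 * Writeback path: test-and-clear the flag.  Any number of dirtying
 * events since the last flush collapse into one update here.
 * Returns true if an update was performed.
 */
static bool writeback_flush_cmtime(struct mapping_sketch *m)
{
	assert(m != NULL);
	if (!atomic_exchange_explicit(&m->cmtime_pending, false,
				      memory_order_acq_rel))
		return false;
	m->time_updates++;	/* placeholder for update_time + dirty inode */
	return true;
}
```

The point of the sketch is where the cost lands: the caller that
dirties the page cache pays for one atomic store, and everything that
could block happens in the flush function, which would be called from
writeback context.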
>
> > Thinking out loud, to handle both write_iter and mmap, there might
> > need to be two bits: one saying "the timestamp needs to be updated"
> > and another saying "the timestamp has been updated in the in-memory
> > inode, but the inode hasn't been dirtied yet".
>
> The flag that implements the latter is called I_DIRTY_TIME. We have
> not implemented the former as that's a userspace visible change of
> behaviour.

Maybe that change should be done?  Or not -- it wouldn't be terribly
hard to have a pair of atomic timestamps in struct inode indicating
what timestamps we want to write the next time we get around to it.
(Concretely, page_set_cmtime() would get some new parameters to
specify actual times, and an atomic compare-exchange would be used to
update the underlying data structure, so it would remain lock-free but
not be wait-free.)
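
A rough userspace C11 sketch of that variant follows.  It is an
assumption-laden illustration, not a proposed patch: the struct and
function names are hypothetical, each timestamp is squeezed into a
single int64_t nanosecond count (a real struct timespec64 would not fit
in one word), and the two slots are updated independently rather than
as one atomic pair.

```c
/* Userspace sketch only; every name here is hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NO_PENDING_TIME INT64_MIN	/* sentinel: nothing to write back */

struct pending_times_sketch {
	_Atomic int64_t mtime_ns;
	_Atomic int64_t ctime_ns;
};

static void pending_times_init(struct pending_times_sketch *p)
{
	atomic_init(&p->mtime_ns, NO_PENDING_TIME);
	atomic_init(&p->ctime_ns, NO_PENDING_TIME);
}

/*
 * Record a timestamp, keeping the newest value seen.  The loop retries
 * only when another thread changed the slot underneath us, so some
 * thread always makes progress (lock-free), but a given thread may
 * retry an unbounded number of times (not wait-free).
 */
static void record_time(_Atomic int64_t *slot, int64_t now_ns)
{
	int64_t old = atomic_load_explicit(slot, memory_order_relaxed);

	assert(now_ns != NO_PENDING_TIME);
	while (old < now_ns) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(slot, &old, now_ns,
							  memory_order_release,
							  memory_order_relaxed))
			break;
	}
}

/* Dirtying path: hypothetical analogue of page_set_cmtime() with times. */
static void set_cmtime_sketch(struct pending_times_sketch *p, int64_t now_ns)
{
	record_time(&p->mtime_ns, now_ns);
	record_time(&p->ctime_ns, now_ns);
}

/* Writeback path: fetch and clear; NO_PENDING_TIME means nothing to do. */
static int64_t take_pending(_Atomic int64_t *slot)
{
	return atomic_exchange_explicit(slot, NO_PENDING_TIME,
					memory_order_acq_rel);
}
```

Keeping the maximum means a late-arriving older timestamp cannot move
the pending value backwards, which is one plausible policy; the thread
above does not specify one.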