From: Jeff Layton <jlayton@kernel.org>
To: Arnd Bergmann <arnd@kernel.org>, John Stultz <jstultz@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
    Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
    Thomas Gleixner <tglx@linutronix.de>,
    Stephen Boyd <sboyd@kernel.org>,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] timekeeping: move multigrain ctime floor handling into timekeeper
Date: Thu, 12 Sep 2024 07:34:42 -0400
Message-ID: <c9ed7670a0b35b212991b7ce4735cb3dfaae1fda.camel@kernel.org>
In-Reply-To: <1484d32b-ab0f-48ff-998a-62feada58300@app.fastmail.com>

On Thu, 2024-09-12 at 10:01 +0000, Arnd Bergmann wrote:
> On Wed, Sep 11, 2024, at 20:43, Jeff Layton wrote:
> >
> > I think we'd have to track this delta as an atomic value and cmpxchg
> > new values into place. The zeroing seems quite tricky to make race-
> > free.
> >
> > Currently, we fetch the floor value early in the process and if it
> > changes before we can swap a new one into place, we just take whatever
> > the new value is (since it's just as good). Since these are monotonic
> > values, any new value is still newer than the original one, so it's
> > fine. I'm not sure that still works if we're dealing with a delta that
> > is sliding upward and downward.
> >
> > Maybe it does though. I'll take a stab at this tomorrow and see how it
> > looks.
>
> Right, the only idea I had for this would be to atomically
> update a 64-bit tuple of the 32-bit sequence count and the
> 32-bit delta value in the timekeeper. That way I think the
> "coarse" reader would still get a correct value when running
> concurrently with both a fine-grained reader updating the count
> and the timer tick setting a new count.
>
> There are still a couple of problems:
>
> - this extends the timekeeper logic beyond what the seqlock
> semantics normally allow, and I can't prove that this actually
> works in all corner cases.
>
> - if the delta doesn't fit in a 32-bit value, there has to
> be another fallback mechanism.
>

That could be a problem. I was hoping the delta couldn't grow that
large between timer ticks, but I suppose it can. The fallback could be
to just grab a new fine-grained timestamp on each call until the timer
ticks.

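To make the packed-word idea and the fallback concrete, here is a rough
userspace sketch using C11 atomics in place of the kernel's atomic64_t.
Every name in it (floor_word, pack_floor, try_publish_delta, the
floor_result values) is hypothetical and not taken from the patch, and it
does not settle the open question of whether this is safe alongside the
timekeeper seqlock in all corner cases.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: high 32 bits = tick sequence, low 32 bits = delta in ns. */
static _Atomic uint64_t floor_word;

enum floor_result { FLOOR_OK, FLOOR_STALE, FLOOR_OVERFLOW };

static uint64_t pack_floor(uint32_t seq, uint64_t delta_ns)
{
	assert(delta_ns <= UINT32_MAX);	/* caller must have range-checked */
	return ((uint64_t)seq << 32) | delta_ns;
}

static uint32_t floor_seq(uint64_t w)   { return (uint32_t)(w >> 32); }
static uint32_t floor_delta(uint64_t w) { return (uint32_t)w; }

/* Timer tick: start a new sequence with a zero delta. */
static void tick_reset(uint32_t new_seq)
{
	atomic_store(&floor_word, pack_floor(new_seq, 0));
}

/*
 * Fine-grained reader: try to raise the delta for tick 'seq'.
 * FLOOR_OVERFLOW means the delta does not fit in 32 bits, so the caller
 * falls back to a fresh fine-grained timestamp on every call until the
 * next tick. FLOOR_STALE means the tick advanced and the delta was
 * computed against an old base.
 */
static enum floor_result try_publish_delta(uint32_t seq, uint64_t delta_ns)
{
	uint64_t old, new;

	if (delta_ns > UINT32_MAX)
		return FLOOR_OVERFLOW;

	old = atomic_load(&floor_word);
	for (;;) {
		if (floor_seq(old) != seq)
			return FLOOR_STALE;
		if (floor_delta(old) >= delta_ns)
			return FLOOR_OK;	/* an equal or later delta is already published */
		new = pack_floor(seq, delta_ns);
		if (atomic_compare_exchange_weak(&floor_word, &old, new))
			return FLOOR_OK;
		/* CAS failed: 'old' now holds the current word, so re-check it. */
	}
}
```

Because the sequence and the delta live in one word, a single
compare-and-swap either publishes a delta for the tick it was computed
against or fails visibly; it can never attach an old delta to a new tick.
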
> - This still requires an atomic64_cmpxchg() in the
> fine-grained ktime_get_real_ts64() replacement, which
> I think is what inode_set_ctime_current() needs today
> as well to ensure that the next coarse value is the
> highest one that has been read so far.
>

Yes. We really don't want to take the seqlock for write just to update
timestamps. I'd prefer to keep the floor-handling lock-free if
possible.

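For reference, the lock-free floor swap described earlier in the thread
(sample the floor, try to swap in a newer value, and accept whatever is
there if someone else got in first) can be sketched in userspace C11 as
below. This is only an illustration under the assumption that the floor
is a monotonic nanosecond count; ctime_floor_ns and floor_swap are
made-up names, not the patch's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global floor in nanoseconds; it only ever moves forward. */
static _Atomic int64_t ctime_floor_ns;

/*
 * Try to install fine_ns (a freshly read fine-grained time) as the new
 * floor, given the floor value the caller sampled earlier. If another
 * task swapped first, return its value instead: it was installed after
 * old_floor was sampled, so it is at least as new.
 */
static int64_t floor_swap(int64_t old_floor, int64_t fine_ns)
{
	int64_t expected = old_floor;

	assert(fine_ns >= old_floor);	/* monotonic clock assumed */
	if (atomic_compare_exchange_strong(&ctime_floor_ns, &expected, fine_ns))
		return fine_ns;
	return expected;	/* the winner's floor */
}
```

No retry loop is needed here: losing the race still yields a usable
timestamp, which is what keeps this path cheap.
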
> There is another idea that would completely replace
> your design with something /much/ simpler:
>
> - add a variant of ktime_get_real_ts64() that just
> sets a flag in the timekeeper to signify that a
> fine-grained time has been read since the last
> timer tick
> - add a variant of ktime_get_coarse_real_ts64()
> that returns either tk_xtime() if the flag is
> clear or calls ktime_get_real_ts64() if it's set
> - reset the flag in timekeeping_advance() and any other
> place that updates tk_xtime
>
> That way you avoid the atomic64_try_cmpxchg() in
> inode_set_ctime_current(), making that case faster,
> and avoid all overhead in coarse_ctime() unless you
> use both types during the same tick.
>

With the current code we only get a fine-grained timestamp if both:

1/ the timestamps have been queried (a la I_CTIME_QUERIED)
2/ the current coarse-grained or floor time would not show a change in
   the ctime

If we do what you're suggesting above, as soon as one task sets the
flag, anyone calling current_time() will end up getting a brand new
fine-grained timestamp, even when the current floor time would have
been fine.

That means a lot more calls into ktime_get_real_ts64(), at least until
the timer ticks, and would probably mean a lot of extra journal
transactions, since those timestamps would all be distinct from one
another and would need to go to disk more often.

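The difference between the two schemes can be shown with a toy model.
The sketch below is userspace C with invented names (fake_inode,
needs_fine_current, needs_fine_flag, count_fine_reads). It assumes that
"would not show a change" means the floor is not later than the stored
ctime, and it simply counts how many ctime updates within one tick would
need a fine-grained clock read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pieces of inode state that matter here. */
struct fake_inode {
	int64_t ctime_ns;	/* currently stored ctime */
	bool ctime_queried;	/* ctime has been looked at (a la I_CTIME_QUERIED) */
};

/* Current scheme: both conditions must hold before paying for a fine-grained read. */
static bool needs_fine_current(const struct fake_inode *inode, int64_t floor_ns)
{
	return inode->ctime_queried && floor_ns <= inode->ctime_ns;
}

/* Flag scheme: once any task has read a fine-grained time this tick, everyone does. */
static bool needs_fine_flag(bool fine_read_since_tick)
{
	return fine_read_since_tick;
}

/* Count fine-grained reads for one ctime update per inode within a single tick. */
static int count_fine_reads(const struct fake_inode *inodes, size_t n,
			    int64_t floor_ns, bool use_flag_scheme, bool flag)
{
	int count = 0;

	assert(inodes != NULL);
	for (size_t i = 0; i < n; i++) {
		bool fine = use_flag_scheme ? needs_fine_flag(flag)
					    : needs_fine_current(&inodes[i], floor_ns);
		if (fine)
			count++;
	}
	return count;
}
```

With three inodes where only one is both queried and already at the
floor value, the current scheme takes one fine-grained read; with the
flag set, the flag scheme takes three, and each of those produces a
distinct timestamp.
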
--
Jeff Layton <jlayton@kernel.org>