public inbox for linux-fsdevel@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: Arnd Bergmann <arnd@kernel.org>, John Stultz <jstultz@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] timekeeping: move multigrain ctime floor handling into timekeeper
Date: Thu, 12 Sep 2024 07:34:42 -0400
Message-ID: <c9ed7670a0b35b212991b7ce4735cb3dfaae1fda.camel@kernel.org>
In-Reply-To: <1484d32b-ab0f-48ff-998a-62feada58300@app.fastmail.com>

On Thu, 2024-09-12 at 10:01 +0000, Arnd Bergmann wrote:
> On Wed, Sep 11, 2024, at 20:43, Jeff Layton wrote:
> > 
> > I think we'd have to track this delta as an atomic value and cmpxchg
> > new values into place. The zeroing seems quite tricky to make race-
> > free.
> > 
> > Currently, we fetch the floor value early in the process and if it
> > changes before we can swap a new one into place, we just take whatever
> > the new value is (since it's just as good). Since these are monotonic
> > values, any new value is still newer than the original one, so it's
> > fine. I'm not sure that still works if we're dealing with a delta that
> > is sliding upward and downward.
> > 
> > Maybe it does though. I'll take a stab at this tomorrow and see how it
> > looks.
> 
> Right, the only idea I had for this would be to atomically
> update a 64-bit tuple of the 32-bit sequence count and the
> 32-bit delta value in the timekeeper. That way I think the
> "coarse" reader would still get a correct value when running
> concurrently with both a fine-grained reader updating the count
> and the timer tick setting a new count.
> 
> There are still a couple of problems:
> 
> - this extends the timekeeper logic beyond what the seqlock
>   semantics normally allow, and I can't prove that this actually
>   works in all corner cases.
>
> - if the delta doesn't fit in a 32-bit value, there has to 
>   be another fallback mechanism.
> 

That could be a problem. I was hoping the delta couldn't grow that
large between timer ticks, but apparently it can. The fallback could be
to just grab a new fine-grained timestamp on each call until the next
timer tick.

> - This still requires an atomic64_cmpxchg() in the
>   fine-grained ktime_get_real_ts64() replacement, which
>   I think is what inode_set_ctime_current() needs today
>   as well to ensure that the next coarse value is the
>   highest one that has been read so far.
> 

Yes. We really don't want to take the seqlock for write just to update
timestamps. I'd prefer to keep the floor-handling lock-free if
possible.
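
For reference, the lock-free pattern I mean looks roughly like the
sketch below. It is a user-space approximation with C11 atomics;
ctime_floor_ns and floor_swap() are made-up names, and the kernel side
would use something like atomic64_try_cmpxchg() instead. It assumes the
clock feeding it is monotonic, which is what makes the loser of the
race free to adopt the winner's value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global floor: the latest fine-grained time handed out (ns). */
static _Atomic int64_t ctime_floor_ns;

/*
 * 'old_floor' is the floor fetched early in the process; 'now_ns' is a
 * fine-grained timestamp taken after that fetch. Returns the value the
 * caller should stamp with.
 */
static int64_t floor_swap(int64_t old_floor, int64_t now_ns)
{
	int64_t expected = old_floor;

	/* Assumption: monotonic source, so a later read is never older. */
	assert(now_ns >= old_floor);

	if (atomic_compare_exchange_strong(&ctime_floor_ns, &expected, now_ns))
		return now_ns;

	/*
	 * Lost the race. 'expected' now holds whatever another task swapped
	 * in. It is newer than 'old_floor', so it is just as good: no retry
	 * loop and no lock.
	 */
	return expected;
}
```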

> There is another idea that would completely replace
> your design with something /much/ simpler:
> 
>  - add a variant of ktime_get_real_ts64() that just
>    sets a flag in the timekeeper to signify that a
>    fine-grained time has been read since the last
>    timer tick
>  - add a variant of ktime_get_coarse_real_ts64()
>    that returns either tk_xtime() if the flag is
>    clear or calls ktime_get_real_ts64() if it's set
>  - reset the flag in timekeeping_advance() and any other
>    place that updates tk_xtime
> 
> That way you avoid the atomic64_try_cmpxchg() in
> inode_set_ctime_current(), making that case faster,
> and avoid all overhead in coarse_ctime() unless you
> use both types during the same tick.
> 

With the current code, we get a fine-grained timestamp iff:

1/ the timestamps have been queried (a la I_CTIME_QUERIED)
2/ the current coarse-grained or floor time would not show a change in
the ctime

If we do what you're suggesting above, as soon as one task sets the
flag, anyone calling current_time() will end up getting a brand new
fine-grained timestamp, even when the current floor time would have
been fine.

That means a lot more calls into ktime_get_real_ts64(), at least until
the timer ticks, and would probably mean a lot of extra journal
transactions, since those timestamps would all be distinct from one
another and would need to go to disk more often.
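
To make the concern concrete, here is a toy user-space model of the
flag scheme. Everything in it is a stand-in: fake_clock_ns plays the
hardware clock, fake_xtime_ns plays tk_xtime(), and none of the
function names exist in the kernel. It only counts how often the
expensive clock read happens.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t fake_clock_ns;		/* pretend hardware clock */
static int64_t fake_xtime_ns;		/* pretend tk_xtime(): time at last tick */
static atomic_bool fine_read_flag;	/* fine-grained read since last tick? */
static unsigned int fine_reads;		/* number of expensive clock reads */

static int64_t read_fine(void)
{
	fine_reads++;
	return fake_clock_ns;
}

/* Flag-setting variant of the fine-grained getter. */
static int64_t get_fine_and_flag(void)
{
	atomic_store(&fine_read_flag, true);
	return read_fine();
}

/* Coarse variant: cheap while the flag is clear, fine-grained once set. */
static int64_t get_coarse_or_fine(void)
{
	if (atomic_load(&fine_read_flag))
		return read_fine();
	return fake_xtime_ns;
}

/* Timer tick: refresh the coarse time and clear the flag. */
static void fake_tick(void)
{
	assert(fake_clock_ns >= fake_xtime_ns);	/* model assumes no backward jumps */
	fake_xtime_ns = fake_clock_ns;
	atomic_store(&fine_read_flag, false);
}
```

In this model, once a single caller goes through get_fine_and_flag(),
every later get_coarse_or_fine() call before the next fake_tick()
returns a distinct value and bumps fine_reads, which is the behavior
described above.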
-- 
Jeff Layton <jlayton@kernel.org>


Thread overview: 14+ messages
2024-09-11 12:56 [PATCH] timekeeping: move multigrain ctime floor handling into timekeeper Jeff Layton
2024-09-11 19:55 ` John Stultz
2024-09-11 20:19   ` Arnd Bergmann
2024-09-11 20:43     ` Jeff Layton
2024-09-12 10:01       ` Arnd Bergmann
2024-09-12 11:34         ` Jeff Layton [this message]
2024-09-12 13:17           ` Arnd Bergmann
2024-09-12 13:26             ` Jeff Layton
2024-09-12 14:37               ` Jeff Layton
2024-09-12 16:51                 ` Arnd Bergmann
2024-09-11 20:19   ` Jeff Layton
2024-09-12 12:31 ` Christian Brauner
2024-09-12 12:39   ` Jeff Layton
2024-09-12 12:43     ` Christian Brauner
