linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 1/9] fs: add infrastructure for multigrain timestamps
Date: Thu, 08 Aug 2024 20:29:18 -0400	[thread overview]
Message-ID: <165c8f15eec4412cf76f46fcff794ae1792ac8db.camel@kernel.org> (raw)
In-Reply-To: <gcn5kkrc2eeger6uzwqe5iinxtevhrgi3qz6ru3th3bkt4nrfd@ldkkwu4ndpnn>

On Fri, 2024-08-09 at 01:43 +0200, Mateusz Guzik wrote:
> On Mon, Jul 15, 2024 at 08:48:52AM -0400, Jeff Layton wrote:
> >  /**
> >   * inode_set_ctime_current - set the ctime to current_time
> >   * @inode: inode
> >   *
> > - * Set the inode->i_ctime to the current value for the inode. Returns
> > - * the current value that was assigned to i_ctime.
> > + * Set the inode's ctime to the current value for the inode. Returns the
> > + * current value that was assigned. If this is not a multigrain inode, then we
> > + * just set it to whatever the coarse_ctime is.
> > + *
> > + * If it is multigrain, then we first see if the coarse-grained timestamp is
> > + * distinct from what we have. If so, then we'll just use that. If we have to
> > + * get a fine-grained timestamp, then do so, and try to swap it into the floor.
> > + * We accept the new floor value regardless of the outcome of the cmpxchg.
> > + * After that, we try to swap the new value into i_ctime_nsec. Again, we take
> > + * the resulting ctime, regardless of the outcome of the swap.
> >   */
> >  struct timespec64 inode_set_ctime_current(struct inode *inode)
> >  {
> > -	struct timespec64 now = current_time(inode);
> > +	ktime_t now, floor = atomic64_read(&ctime_floor);
> > +	struct timespec64 now_ts;
> > +	u32 cns, cur;
> > +
> > +	now = coarse_ctime(floor);
> > +
> > +	/* Just return that if this is not a multigrain fs */
> > +	if (!is_mgtime(inode)) {
> > +		now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
> > +		inode_set_ctime_to_ts(inode, now_ts);
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * We only need a fine-grained time if someone has queried it,
> > +	 * and the current coarse grained time isn't later than what's
> > +	 * already there.
> > +	 */
> > +	cns = smp_load_acquire(&inode->i_ctime_nsec);
> > +	if (cns & I_CTIME_QUERIED) {
> > +		ktime_t ctime = ktime_set(inode->i_ctime_sec, cns & ~I_CTIME_QUERIED);
> > +
> > +		if (!ktime_after(now, ctime)) {
> > +			ktime_t old, fine;
> > +
> > +			/* Get a fine-grained time */
> > +			fine = ktime_get();
> >  
> > -	inode_set_ctime_to_ts(inode, now);
> > -	return now;
> > +			/*
> > +			 * If the cmpxchg works, we take the new floor value. If
> > +			 * not, then that means that someone else changed it after we
> > +			 * fetched it but before we got here. That value is just
> > +			 * as good, so keep it.
> > +			 */
> > +			old = floor;
> > +			if (!atomic64_try_cmpxchg(&ctime_floor, &old, fine))
> > +				fine = old;
> > +			now = ktime_mono_to_real(fine);
> > +		}
> > +	}
> > +	now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
> > +	cur = cns;
> > +
> > +	/* No need to cmpxchg if it's exactly the same */
> > +	if (cns == now_ts.tv_nsec && inode->i_ctime_sec == now_ts.tv_sec)
> > +		goto out;
> > +retry:
> > +	/* Try to swap the nsec value into place. */
> > +	if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now_ts.tv_nsec)) {
> > +		/* If swap occurred, then we're (mostly) done */
> > +		inode->i_ctime_sec = now_ts.tv_sec;
> 
> 
> Linux always had rather lax approach to consistency of getattr results
> and I wonder if with this patchset it is no longer viable.
> 
> Ignoring the flag, suppose ctime on the inode is { nsec = 12, sec = 1 },
> while the new timestamp is { nsec = 1, sec = 2 }
> 
> The current update method results in a transient state where { nsec = 1,
> sec = 1 }. But this represents an earlier point in time.
> 
> Thus a thread which observed the first state and spotted the transient
> value in the second one is going to conclude time went backwards. Is
> this considered fine given what the multigrain stuff is trying to
> accomplish?
> 

Yes, I think so.

> As for fixing this, off hand I note there is a 4-byte hole in struct
> inode, just enough to store a sequence counter which fill_mg_cmtime
> could use to safely read the sec/nsec pair. The write side would take
> the inode spinlock.
> 

Note that this is also a problem today with always-coarse timestamps.
We track timestamps in two separate words, and "torn reads" are always
a possibility. I suspect it happens occasionally and we just never
notice.

The main goal with the multigrain series was to make sure that
measuring the ctime of the same inode on both sides of a change always
shows a change in value. I think we'll still achieve that here, even
with a torn read (unless we're just exceptionally unlucky).

As far as the ordering of timestamps between two different files; this
patchset doesn't make that any worse.

I did have an earlier version of this patchset that converted the
i_ctime fields to a ktime_t. That would have solved this problem too,
but it had other drawbacks. We could reconsider that though.

In any case, I think that's really a separate project from the
multigrain work. Given that no one has complained about torn reads so
far, I wouldn't bother at this point.

> > +	} else {
> > +		/*
> > +		 * Was the change due to someone marking the old ctime QUERIED?
> > +		 * If so then retry the swap. This can only happen once since
> > +		 * the only way to clear I_CTIME_QUERIED is to stamp the inode
> > +		 * with a new ctime.
> > +		 */
> > +		if (!(cns & I_CTIME_QUERIED) && (cns | I_CTIME_QUERIED) == cur) {
> > +			cns = cur;
> > +			goto retry;
> > +		}
> > +		/* Otherwise, keep the existing ctime */
> > +		now_ts.tv_sec = inode->i_ctime_sec;
> > +		now_ts.tv_nsec = cur & ~I_CTIME_QUERIED;
> > +	}
> > +out:
> > +	return now_ts;
> >  }
> >  EXPORT_SYMBOL(inode_set_ctime_current);

-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2024-08-09  0:29 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-15 12:48 [PATCH v6 0/9] fs: multigrain timestamp redux Jeff Layton
2024-07-15 12:48 ` [PATCH v6 1/9] fs: add infrastructure for multigrain timestamps Jeff Layton
2024-07-17 10:40   ` Jan Kara
2024-08-08 22:09   ` Mateusz Guzik
2024-08-09  0:00     ` Jeff Layton
2024-08-08 23:43   ` Mateusz Guzik
2024-08-09  0:29     ` Jeff Layton [this message]
2024-07-15 12:48 ` [PATCH v6 2/9] fs: tracepoints around multigrain timestamp events Jeff Layton
2024-07-15 18:29   ` Darrick J. Wong
2024-07-17 10:41   ` Jan Kara
2024-07-15 12:48 ` [PATCH v6 3/9] fs: add percpu counters for significant " Jeff Layton
2024-07-15 18:32   ` Darrick J. Wong
2024-07-15 19:53     ` Jeff Layton
2024-07-15 20:03       ` Darrick J. Wong
2024-07-17 10:45       ` Jan Kara
2024-07-15 12:48 ` [PATCH v6 4/9] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2024-07-17 11:24   ` Jan Kara
2024-07-15 12:48 ` [PATCH v6 5/9] Documentation: add a new file documenting multigrain timestamps Jeff Layton
2024-07-15 18:45   ` Darrick J. Wong
2024-07-17  6:00   ` Randy Dunlap
2024-07-17 11:31   ` Jan Kara
2024-07-17 12:02     ` Jeff Layton
2024-07-15 12:48 ` [PATCH v6 6/9] xfs: switch to " Jeff Layton
2024-07-15 18:47   ` Darrick J. Wong
2024-07-15 12:48 ` [PATCH v6 7/9] ext4: " Jeff Layton
2024-07-17 11:32   ` Jan Kara
2024-07-15 12:48 ` [PATCH v6 8/9] btrfs: convert " Jeff Layton
2024-07-15 12:49 ` [PATCH v6 9/9] tmpfs: add support for " Jeff Layton
2024-07-17 11:34   ` Jan Kara
2024-07-16  7:37 ` [PATCH v6 0/9] fs: multigrain timestamp redux Christian Brauner
2024-07-16 12:45   ` Jeff Layton
2024-07-22 15:30     ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=165c8f15eec4412cf76f46fcff794ae1792ac8db.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).