From: Jeff Layton <jlayton@kernel.org>
To: Jan Kara <jack@suse.cz>, Christian Brauner <brauner@kernel.org>
Cc: Latchesar Ionkov <lucho@ionkov.net>,
Martin Brandenburg <martin@omnibond.com>,
Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
linux-xfs@vger.kernel.org, "Darrick J. Wong" <djwong@kernel.org>,
Dominique Martinet <asmadeus@codewreck.org>,
Christian Schoenebeck <linux_oss@crudebyte.com>,
linux-unionfs@vger.kernel.org,
David Howells <dhowells@redhat.com>, Chris Mason <clm@fb.com>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Hans de Goede <hdegoede@redhat.com>,
Marc Dionne <marc.dionne@auristor.com>,
codalist@coda.cs.cmu.edu, linux-afs@lists.infradead.org,
linux-mtd@lists.infradead.org,
Mike Marshall <hubcap@omnibond.com>,
Paulo Alcantara <pc@manguebit.com>, Amir Goldstein <l@gmail.com>,
Eric Van Hensbergen <ericvh@kernel.org>,
bug-gnulib@gnu.org, Andreas Gruenbacher <agruenba@redhat.com>,
Miklos Szeredi <miklos@szeredi.hu>,
Richard Weinberger <richard@nod.at>,
Mark Fasheh <mark@fasheh.com>, Hugh Dickins <hughd@google.com>,
Benjamin Coddington <bcodding@redhat.com>,
Tyler Hicks <code@tyhicks.com>,
cluster-devel@redhat.com, coda@cs.cmu.edu, linux-mm@kvack.org,
Gao Xiang <xiang@kernel.org>, Iurii Zaikin <yzaikin@google.com>,
Namjae Jeon <linkinjeon@kernel.org>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Xi Ruoyao <xry111@linuxfromscratch.org>,
Shyam Prasad N <sprasad@microsoft.com>,
ecryptfs@vger.kernel.org, Kees Cook <keescook@chromium.org>,
ocfs2-devel@lists.linux.dev, linux-cifs@vger.kernel.org,
linux-erofs@lists.ozlabs.org, Josef Bacik <josef@toxicpanda.com>,
Tom Talpey <tom@talpey.com>, Tejun Heo <tj@kernel.org>,
Yue Hu <huyue2@coolpad.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Ronnie Sahlberg <ronniesahlberg@gmail.com>,
David Sterba <dsterba@suse.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
ceph-devel@vger.kernel.org, Xiubo Li <xiubli@redhat.com>,
Ilya Dryomov <idryomov@gmail.com>,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
Jan Harkes <jaharkes@cs.cmu.edu>,
linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
Theodore Ts'o <tytso@mit.edu>,
Joseph Qi <joseph.qi@linux.alibaba.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
v9fs@lists.linux.dev, ntfs3@lists.linux.dev,
samba-technical@lists.samba.org, linux-kernel@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net,
Steve French <sfrench@samba.org>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Jeffle Xu <jefflexu@linux.alibaba.com>,
devel@lists.orangefs.org, Anna Schumaker <anna@kernel.org>,
Jan Kara <jack@suse.com>, Bob Peterson <rpeterso@redhat.com>,
linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Sungjong Seo <sj1557.seo@samsung.com>,
Bruno Haible <bruno@clisp.org>,
linux-btrfs@vger.kernel.org, Joel Becker <jlbec@evilplan.org>
Subject: Re: [f2fs-dev] [PATCH v7 12/13] ext4: switch to multigrain timestamps
Date: Wed, 20 Sep 2023 06:35:18 -0400
Message-ID: <317d84b1b909b6c6519a2406fcb302ce22dafa41.camel@kernel.org>
In-Reply-To: <20230920101731.ym6pahcvkl57guto@quack3>

On Wed, 2023-09-20 at 12:17 +0200, Jan Kara wrote:
> On Wed 20-09-23 10:41:30, Christian Brauner wrote:
> > > > f1 was last written to *after* f2 was last written to. If the timestamp of f1
> > > > is then lower than the timestamp of f2, timestamps are fundamentally broken.
> > > >
> > > > Many things in user-space depend on timestamps, such as build systems
> > > > centered around 'make', but also 'find ... -newer ...'.
> > > >
> > >
> > >
> > > What does breakage with make look like in this situation? The "fuzz"
> > > here is going to be on the order of a jiffy. The typical case for make
> > > timestamp comparisons is comparing source files vs. a build target. If
> > > those are being written nearly simultaneously, then that could be an
> > > issue, but is that a typical behavior? It seems like it would be hard to
> > > rely on that anyway, esp. given filesystems like NFS that can do lazy
> > > writeback.
> > >
> > > One of the operating principles with this series is that timestamps can
> > > be of varying granularity between different files. Note that Linux
> > > already violates this assumption when you're working across filesystems
> > > of different types.
> > >
> > > As to potential fixes if this is a real problem:
> > >
> > > I don't really want to put this behind a mount or mkfs option (à la
> > > relatime, etc.), but that is one possibility.
> > >
> > > I wonder if it would be feasible to just advance the coarse-grained
> > > current_time whenever we end up updating a ctime with a fine-grained
> > > timestamp? It might produce some inode write amplification. Files that
> >
> > Less than ideal imho.
> >
> > If this risks breaking existing workloads by enabling it unconditionally
> > and there isn't a clear way to detect and handle these situations
> > without risk of regression then we should move this behind a mount
> > option.
> >
> > So how about the following:
> >
> > From cb14add421967f6e374eb77c36cc4a0526b10d17 Mon Sep 17 00:00:00 2001
> > From: Christian Brauner <brauner@kernel.org>
> > Date: Wed, 20 Sep 2023 10:00:08 +0200
> > Subject: [PATCH] vfs: move multi-grain timestamps behind a mount option
> >
> > While we initially thought we could do this unconditionally, it turns
> > out that this might break existing workloads that rely on timestamps in
> > very specific ways, and we always knew this was a possibility. Move
> > multi-grain timestamps behind a vfs mount option.
> >
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
>
> Surely this is a safe choice, as it moves the responsibility to the
> sysadmin and to the cases where fine-grained timestamps are required. But
> I kind of wonder how the sysadmin is going to decide whether mgtime is
> safe for his system or not, because the possible breakage needn't be
> obvious at first sight...
>

That's the main reason I really didn't want to go with a mount option:
documenting when it is safe to enable would be difficult. While there is
some pessimism around it, I may still take a stab at just advancing the
coarse clock whenever we fetch a fine-grained timestamp. It'd be nice to
remove this option in the future if that turns out to be feasible.

> If I were a sysadmin, I'd rather opt for something like fine-grained
> timestamps + lazytime (if I needed the fine-grained timestamps
> functionality). That should avoid the IO overhead of fine-grained
> timestamps as well, and I'd know I can have problems with timestamps only
> after a system crash.
>
> I've just got another idea for how we could solve the problem: couldn't we
> always just report coarse-grained timestamps to userspace and provide
> access to the fine-grained value only to NFS, which should know what it's
> doing?
>

I think that'd be hard. First of all, where would we store the second
timestamp? We can't just truncate the fine-grained value to come up with
a coarse-grained one. It might also be confusing to have nfsd and local
filesystems present different attributes for the same file.

> > ---
> > fs/fs_context.c | 18 ++++++++++++++++++
> > fs/inode.c | 4 ++--
> > fs/proc_namespace.c | 1 +
> > fs/stat.c | 2 +-
> > include/linux/fs.h | 4 +++-
> > 5 files changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/fs_context.c b/fs/fs_context.c
> > index a0ad7a0c4680..dd4dade0bb9e 100644
> > --- a/fs/fs_context.c
> > +++ b/fs/fs_context.c
> > @@ -44,6 +44,7 @@ static const struct constant_table common_set_sb_flag[] = {
> >  	{ "mand", SB_MANDLOCK },
> >  	{ "ro", SB_RDONLY },
> >  	{ "sync", SB_SYNCHRONOUS },
> > +	{ "mgtime", SB_MGTIME },
> >  	{ },
> >  };
> >
> > @@ -52,18 +53,32 @@ static const struct constant_table common_clear_sb_flag[] = {
> >  	{ "nolazytime", SB_LAZYTIME },
> >  	{ "nomand", SB_MANDLOCK },
> >  	{ "rw", SB_RDONLY },
> > +	{ "nomgtime", SB_MGTIME },
> >  	{ },
> >  };
> >
> > +static inline int check_mgtime(unsigned int token, const struct fs_context *fc)
> > +{
> > +	if (token != SB_MGTIME)
> > +		return 0;
> > +	if (!(fc->fs_type->fs_flags & FS_MGTIME))
> > +		return invalf(fc, "Filesystem doesn't support multi-grain timestamps");
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Check for a common mount option that manipulates s_flags.
> >   */
> >  static int vfs_parse_sb_flag(struct fs_context *fc, const char *key)
> >  {
> >  	unsigned int token;
> > +	int ret;
> >
> >  	token = lookup_constant(common_set_sb_flag, key, 0);
> >  	if (token) {
> > +		ret = check_mgtime(token, fc);
> > +		if (ret)
> > +			return ret;
> >  		fc->sb_flags |= token;
> >  		fc->sb_flags_mask |= token;
> >  		return 0;
> > @@ -71,6 +86,9 @@ static int vfs_parse_sb_flag(struct fs_context *fc, const char *key)
> >
> >  	token = lookup_constant(common_clear_sb_flag, key, 0);
> >  	if (token) {
> > +		ret = check_mgtime(token, fc);
> > +		if (ret)
> > +			return ret;
> >  		fc->sb_flags &= ~token;
> >  		fc->sb_flags_mask |= token;
> >  		return 0;
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 54237f4242ff..fd1a2390aaa3 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -2141,7 +2141,7 @@ EXPORT_SYMBOL(current_mgtime);
> >
> >  static struct timespec64 current_ctime(struct inode *inode)
> >  {
> > -	if (is_mgtime(inode))
> > +	if (IS_MGTIME(inode))
> >  		return current_mgtime(inode);
> >  	return current_time(inode);
> >  }
> > @@ -2588,7 +2588,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
> >  	now = current_time(inode);
> >
> >  	/* Just copy it into place if it's not multigrain */
> > -	if (!is_mgtime(inode)) {
> > +	if (!IS_MGTIME(inode)) {
> >  		inode_set_ctime_to_ts(inode, now);
> >  		return now;
> >  	}
> > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> > index 250eb5bf7b52..08f5bf4d2c6c 100644
> > --- a/fs/proc_namespace.c
> > +++ b/fs/proc_namespace.c
> > @@ -49,6 +49,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
> >  		{ SB_DIRSYNC, ",dirsync" },
> >  		{ SB_MANDLOCK, ",mand" },
> >  		{ SB_LAZYTIME, ",lazytime" },
> > +		{ SB_MGTIME, ",mgtime" },
> >  		{ 0, NULL }
> >  	};
> >  	const struct proc_fs_opts *fs_infop;
> > diff --git a/fs/stat.c b/fs/stat.c
> > index 6e60389d6a15..2f18dd5de18b 100644
> > --- a/fs/stat.c
> > +++ b/fs/stat.c
> > @@ -90,7 +90,7 @@ void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask,
> >  	stat->size = i_size_read(inode);
> >  	stat->atime = inode->i_atime;
> >
> > -	if (is_mgtime(inode)) {
> > +	if (IS_MGTIME(inode)) {
> >  		fill_mg_cmtime(stat, request_mask, inode);
> >  	} else {
> >  		stat->mtime = inode->i_mtime;
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 4aeb3fa11927..03e415fb3a7c 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1114,6 +1114,7 @@ extern int send_sigurg(struct fown_struct *fown);
> >  #define SB_NODEV BIT(2) /* Disallow access to device special files */
> >  #define SB_NOEXEC BIT(3) /* Disallow program execution */
> >  #define SB_SYNCHRONOUS BIT(4) /* Writes are synced at once */
> > +#define SB_MGTIME BIT(5) /* Use multi-grain timestamps */
> >  #define SB_MANDLOCK BIT(6) /* Allow mandatory locks on an FS */
> >  #define SB_DIRSYNC BIT(7) /* Directory modifications are synchronous */
> >  #define SB_NOATIME BIT(10) /* Do not update access times. */
> > @@ -2105,6 +2106,7 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
> >  					((inode)->i_flags & (S_SYNC|S_DIRSYNC)))
> >  #define IS_MANDLOCK(inode) __IS_FLG(inode, SB_MANDLOCK)
> >  #define IS_NOATIME(inode) __IS_FLG(inode, SB_RDONLY|SB_NOATIME)
> > +#define IS_MGTIME(inode) __IS_FLG(inode, SB_MGTIME)
> >  #define IS_I_VERSION(inode) __IS_FLG(inode, SB_I_VERSION)
> >
> >  #define IS_NOQUOTA(inode) ((inode)->i_flags & S_NOQUOTA)
> > @@ -2366,7 +2368,7 @@ struct file_system_type {
> >   */
> >  static inline bool is_mgtime(const struct inode *inode)
> >  {
> > -	return inode->i_sb->s_type->fs_flags & FS_MGTIME;
> > +	return inode->i_sb->s_flags & SB_MGTIME;
> >  }
> >
> > extern struct dentry *mount_bdev(struct file_system_type *fs_type,
> > --
> > 2.34.1
> >
--
Jeff Layton <jlayton@kernel.org>
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel