Subject: Re: [PATCH v7 12/13] ext4: switch to multigrain timestamps
From: Jeff Layton
To: Jan Kara
Cc: Christian Brauner, Bruno Haible, Xi Ruoyao, bug-gnulib@gnu.org,
 Alexander Viro, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
 Christian Schoenebeck, David Howells, Marc Dionne, Chris Mason, Josef Bacik,
 David Sterba, Xiubo Li, Ilya Dryomov, Jan Harkes, coda@cs.cmu.edu,
 Tyler Hicks, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu, Namjae Jeon,
 Sungjong Seo, Jan Kara, Theodore Ts'o, Andreas Dilger, Jaegeuk Kim,
 OGAWA Hirofumi, Miklos Szeredi, Bob Peterson, Andreas Gruenbacher,
 Greg Kroah-Hartman, Tejun Heo, Trond Myklebust, Anna Schumaker,
 Konstantin Komarov, Mark Fasheh, Joel Becker, Joseph Qi, Mike Marshall,
 Martin Brandenburg, Luis Chamberlain, Kees Cook, Iurii Zaikin, Steve French,
 Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N, Tom Talpey,
 Sergey Senozhatsky, Richard Weinberger, Hans de Goede, Hugh Dickins,
 Andrew Morton, Amir Goldstein, "Darrick J. Wong", Benjamin Coddington,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 v9fs@lists.linux.dev, linux-afs@lists.infradead.org,
 linux-btrfs@vger.kernel.org, ceph-devel@vger.kernel.org,
 codalist@coda.cs.cmu.edu, ecryptfs@vger.kernel.org,
 linux-erofs@lists.ozlabs.org, linux-ext4@vger.kernel.org,
 linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com,
 linux-nfs@vger.kernel.org, ntfs3@lists.linux.dev, ocfs2-devel@lists.linux.dev,
 devel@lists.orangefs.org, linux-cifs@vger.kernel.org,
 samba-technical@lists.samba.org, linux-mtd@lists.infradead.org,
 linux-mm@kvack.org, linux-unionfs@vger.kernel.org, linux-xfs@vger.kernel.org
Date: Wed, 20 Sep 2023 10:12:03 -0400

On Wed, 2023-09-20 at 14:48 +0200, Jan Kara wrote:
> On
> Wed 20-09-23 06:35:18, Jeff Layton wrote:
> > On Wed, 2023-09-20 at 12:17 +0200, Jan Kara wrote:
> > > If I were a sysadmin, I'd rather opt for something like
> > > finegrained timestamps + lazytime (if I needed the finegrained timestamps
> > > functionality). That should avoid the IO overhead of finegrained timestamps
> > > as well and I'd know I can have problems with timestamps only after a
> > > system crash.
> >
> > > I've just got another idea how we could solve the problem: Couldn't we
> > > always just report coarsegrained timestamp to userspace and provide access
> > > to finegrained value only to NFS which should know what it's doing?
> > >
> >
> > I think that'd be hard. First of all, where would we store the second
> > timestamp? We can't just truncate the fine-grained ones to come up with
> > a coarse-grained one. It might also be confusing having nfsd and local
> > filesystems present different attributes.
>
> So what I had in mind (and I definitely miss all the NFS intricacies so the
> idea may be bogus) was that inode->i_ctime would be maintained exactly as
> is now. There will be new (kernel internal at least for now) STATX flag
> STATX_MULTIGRAIN_TS. fill_mg_cmtime() will return timestamp truncated to
> sb->s_time_gran unless STATX_MULTIGRAIN_TS is set. Hence unless you set
> STATX_MULTIGRAIN_TS, there is no difference in the returned timestamps
> compared to the state before multigrain timestamps were introduced. With
> STATX_MULTIGRAIN_TS we return full precision timestamp as stored in the
> inode. Then NFS in fh_fill_pre_attrs() and fh_fill_post_attrs() needs to
> make sure STATX_MULTIGRAIN_TS is set when calling vfs_getattr() to get
> multigrain time.
> I agree nfsd may now be presenting slightly different timestamps than user
> is able to see with stat(2) directly on the filesystem. But is that a
> problem?
> Essentially it is a similar solution as the mgtime mount option
> but now sysadmin doesn't have to decide on filesystem mount how to report
> timestamps but the stat caller knowingly opts into possibly inconsistent
> (among files) but high precision timestamps. And in the particular NFS
> usecase where stat is called all the time anyway, timestamps will likely
> even be consistent among files.
>

I like this idea...

Would we also need to raise sb->s_time_gran to something corresponding
to HZ on these filesystems? If we truncate the timestamps at a
granularity corresponding to HZ before presenting them via statx and the
like, then that should work around the problem with programs that
compare timestamps between inodes.

With NFSv4, when a filesystem doesn't report a STATX_CHANGE_COOKIE, nfsd
will fake one up using the ctime. It's fine for that to use a full
fine-grained timestamp since we don't expect to be able to compare that
value with one of a different inode.

I think we'd want nfsd to present the mtime/ctime values as truncated,
just like we would with a local fs. We could hit the same problem of an
earlier-looking timestamp with NFS if we try to present the actual
fine-grained values to the clients. IOW, I'm convinced that we need to
avoid this behavior in most situations.

If we do this, then we technically don't need the mount option either.
We could still add it though, and have it govern whether fill_mg_cmtime
truncates the timestamps before storing them in the kstat.

-- 
Jeff Layton
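Jan's proposal in the quoted text (keep the full-precision ctime in the inode, truncate it to sb->s_time_gran when reporting, and skip the truncation only for callers that pass an opt-in flag) can be modeled in a few lines of userspace C. This is a hedged sketch, not kernel code: struct sketch_ts, sketch_truncate(), sketch_fill_ctime() and the SKETCH_STATX_MULTIGRAIN_TS bit are hypothetical names, and the real fill_mg_cmtime() and struct kstat are not reproduced. Note that the granularity is a parameter: with a 1 ns granularity the truncation is a no-op, which is why the reply asks whether s_time_gran would need to be raised to a HZ-sized value.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for the proposed kernel-internal STATX_MULTIGRAIN_TS
 * flag; the bit value is made up for this sketch. */
#define SKETCH_STATX_MULTIGRAIN_TS (1u << 31)

struct sketch_ts {
    int64_t sec;
    int64_t nsec; /* invariant: 0 <= nsec < NSEC_PER_SEC */
};

/* Round a timestamp down to a multiple of gran_ns nanoseconds.
 * Assumes 1 <= gran_ns <= NSEC_PER_SEC, the range a per-superblock
 * time granularity is expected to use. */
static struct sketch_ts sketch_truncate(struct sketch_ts t, int64_t gran_ns)
{
    assert(gran_ns >= 1 && gran_ns <= NSEC_PER_SEC);
    t.nsec -= t.nsec % gran_ns;
    return t;
}

/* Model of the proposed reporting rule: the stored ctime is always full
 * precision; only callers that set the opt-in flag (e.g. nfsd) see it
 * untruncated, everyone else gets it truncated to s_time_gran. */
static struct sketch_ts sketch_fill_ctime(struct sketch_ts stored_ctime,
                                          int64_t s_time_gran,
                                          uint32_t request_mask)
{
    if (request_mask & SKETCH_STATX_MULTIGRAIN_TS)
        return stored_ctime;
    return sketch_truncate(stored_ctime, s_time_gran);
}
```

For instance, assuming a 4 ms granularity (HZ=250), a stored ctime of 100.123456789 s would be reported as 100.120000000 s to a plain stat() caller, but passed through unchanged when the opt-in flag is present.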
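The "earlier-looking timestamp" problem and why truncating to the tick avoids it can be illustrated with a small C sketch. Assumptions, stated loudly: a 4 ms coarse tick (HZ=250), and the multigrain behavior as I read the series, where an inode whose timestamps were queried since its last change gets a fine-grained ctime on the next change, while other inodes get the coarse value from the last tick. All names are hypothetical, and the change-cookie packing (seconds shifted left by 30 bits plus nanoseconds) is an illustrative choice, not a claim about how nfsd actually encodes it.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Assumption for illustration: HZ=250, so the coarse clock ticks every 4 ms. */
#define TICK_NS 4000000LL

struct mg_ts {
    int64_t sec;
    int64_t nsec; /* invariant: 0 <= nsec < NSEC_PER_SEC */
};

/* Three-way comparison: <0 if a is earlier, 0 if equal, >0 if later. */
static int mg_ts_cmp(struct mg_ts a, struct mg_ts b)
{
    if (a.sec != b.sec)
        return a.sec < b.sec ? -1 : 1;
    if (a.nsec != b.nsec)
        return a.nsec < b.nsec ? -1 : 1;
    return 0;
}

/* What stat() would report if mtime/ctime were truncated to the tick. */
static struct mg_ts mg_ts_to_tick(struct mg_ts t)
{
    assert(t.nsec >= 0 && t.nsec < NSEC_PER_SEC);
    t.nsec -= t.nsec % TICK_NS;
    return t;
}

/* Hypothetical per-inode change cookie built from the full-precision ctime.
 * 2^30 > 10^9, so the nanosecond field cannot spill into the seconds bits. */
static uint64_t mg_change_cookie(struct mg_ts ctime)
{
    return ((uint64_t)ctime.sec << 30) + (uint64_t)ctime.nsec;
}
```

In the hypothetical scenario, file A's ctime was recorded fine-grained at 10.0035 s, and file B, changed slightly later without an intervening query, got the coarse value 10.0000 s. Comparing the raw values with mg_ts_cmp() makes B look older than A; after mg_ts_to_tick() both read 10.0000 s and compare equal, so the order is no longer inverted. Meanwhile mg_change_cookie() still distinguishes two changes to A within the same tick, which is why a full-precision value is fine for a cookie that is only ever compared against the same inode.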