From: Jeff Layton
To: Jan Kara
Cc: Latchesar Ionkov, Martin Brandenburg, Konstantin Komarov,
 linux-xfs@vger.kernel.org, "Darrick J. Wong", Dominique Martinet,
 Christian Schoenebeck, linux-unionfs@vger.kernel.org, David Howells,
 Chris Mason, Andreas Dilger, Hans de Goede, Marc Dionne,
 codalist@coda.cs.cmu.edu, linux-afs@lists.infradead.org,
 linux-mtd@lists.infradead.org, Mike Marshall, Paulo Alcantara,
 Amir Goldstein, Eric Van Hensbergen, bug-gnulib@gnu.org, Miklos Szeredi,
 Richard Weinberger, Mark Fasheh, Hugh Dickins, Tyler Hicks,
 cluster-devel@redhat.com, coda@cs.cmu.edu, linux-mm@kvack.org, Gao Xiang,
 Iurii Zaikin, Namjae Jeon, Trond Myklebust, Xi Ruoyao, Shyam Prasad N,
 ecryptfs@vger.kernel.org, Kees Cook, ocfs2-devel@lists.linux.dev,
 linux-cifs@vger.kernel.org, Chao Yu, linux-erofs@lists.ozlabs.org,
 Josef Bacik, Tom Talpey, Tejun Heo, Yue Hu, Alexander Viro,
 Ronnie Sahlberg, David Sterba, Jaegeuk Kim, ceph-devel@vger.kernel.org,
 Xiubo Li, Ilya Dryomov, OGAWA Hirofumi, Jan Harkes, Christian Brauner,
 linux-ext4@vger.kernel.org, Theodore Ts'o, Joseph Qi, Greg Kroah-Hartman,
 v9fs@lists.linux.dev, ntfs3@lists.linux.dev,
 samba-technical@lists.samba.org, linux-kernel@vger.kernel.org,
 linux-f2fs-devel@lists.sourceforge.net, Steve French, Sergey Senozhatsky,
 Luis Chamberlain, Jeffle Xu, devel@lists.orangefs.org, Anna Schumaker,
 Jan Kara, linux-fsdevel@vger.kernel.org, Andrew Morton, Sungjong Seo,
 Bruno Haible, linux-nfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
 Joel Becker
Subject: Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps
Date: Wed, 20 Sep 2023 10:12:03 -0400
In-Reply-To: <20230920124823.ghl6crb5sh4x2pmt@quack3>

On Wed, 2023-09-20 at 14:48 +0200, Jan Kara wrote:
> On Wed 20-09-23 06:35:18, Jeff Layton wrote:
> > On Wed, 2023-09-20 at 12:17 +0200, Jan Kara wrote:
> > > If I were a sysadmin, I'd rather opt for something like
> > > finegrained timestamps + lazytime (if I needed the finegrained timestamps
> > > functionality).
> > > That should avoid the IO overhead of finegrained timestamps
> > > as well and I'd know I can have problems with timestamps only after a
> > > system crash.
> >
> > > I've just got another idea how we could solve the problem: Couldn't we
> > > always just report coarsegrained timestamp to userspace and provide access
> > > to finegrained value only to NFS which should know what it's doing?
> > >
> >
> > I think that'd be hard. First of all, where would we store the second
> > timestamp? We can't just truncate the fine-grained ones to come up with
> > a coarse-grained one. It might also be confusing having nfsd and local
> > filesystems present different attributes.
>
> So what I had in mind (and I definitely miss all the NFS intricacies so the
> idea may be bogus) was that inode->i_ctime would be maintained exactly as
> is now. There will be new (kernel internal at least for now) STATX flag
> STATX_MULTIGRAIN_TS. fill_mg_cmtime() will return timestamp truncated to
> sb->s_time_gran unless STATX_MULTIGRAIN_TS is set. Hence unless you set
> STATX_MULTIGRAIN_TS, there is no difference in the returned timestamps
> compared to the state before multigrain timestamps were introduced. With
> STATX_MULTIGRAIN_TS we return full precision timestamp as stored in the
> inode. Then NFS in fh_fill_pre_attrs() and fh_fill_post_attrs() needs to
> make sure STATX_MULTIGRAIN_TS is set when calling vfs_getattr() to get
> multigrain time.
> I agree nfsd may now be presenting slightly different timestamps than user
> is able to see with stat(2) directly on the filesystem. But is that a
> problem? Essentially it is a similar solution as the mgtime mount option
> but now sysadmin doesn't have to decide on filesystem mount how to report
> timestamps but the stat caller knowingly opts into possibly inconsistent
> (among files) but high precision timestamps.
> And in the particular NFS
> usecase where stat is called all the time anyway, timestamps will likely
> even be consistent among files.
>

I like this idea...

Would we also need to raise sb->s_time_gran to something corresponding to
HZ on these filesystems?

If we truncate the timestamps at a granularity corresponding to HZ before
presenting them via statx and the like, then that should work around the
problem with programs that compare timestamps between inodes.

With NFSv4, when a filesystem doesn't report a STATX_CHANGE_COOKIE, nfsd
will fake one up using the ctime. It's fine for that to use a full
fine-grained timestamp since we don't expect to be able to compare that
value with one of a different inode.

I think we'd want nfsd to present the mtime/ctime values as truncated,
just like we would with a local fs. We could hit the same problem of an
earlier-looking timestamp with NFS if we try to present the actual
fine-grained values to the clients. IOW, I'm convinced that we need to
avoid this behavior in most situations.

If we do this, then we technically don't need the mount option either.
We could still add it though, and have it govern whether fill_mg_cmtime
truncates the timestamps before storing them in the kstat.
-- 
Jeff Layton
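
For illustration only, here is a rough userspace sketch of the truncation
idea discussed in this message. It is not kernel code: every sketch_* and
SKETCH_* name is made up for the example, SKETCH_MULTIGRAIN_TS merely
stands in for the proposed STATX_MULTIGRAIN_TS (its value is arbitrary),
and the tick rate of 250 is an assumption. It only shows how rounding the
reported value down to one tick hides a fine-grained timestamp from
callers that did not opt in.

```c
#include <assert.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for the proposed STATX_MULTIGRAIN_TS; value is arbitrary. */
#define SKETCH_MULTIGRAIN_TS 0x1u

/* Assumed tick rate for the example; HZ is a kernel build-time choice. */
#define SKETCH_HZ 250

/* Round ts down to a multiple of gran_ns nanoseconds. */
static struct timespec sketch_truncate(struct timespec ts, long gran_ns)
{
	assert(gran_ns >= 1 && gran_ns <= SKETCH_NSEC_PER_SEC);
	ts.tv_nsec -= ts.tv_nsec % gran_ns;
	return ts;
}

/*
 * Report a stored ctime/mtime: full precision only when the caller
 * opts in, otherwise truncated to one tick.
 */
static struct timespec sketch_report(struct timespec stored, unsigned int flags)
{
	if (flags & SKETCH_MULTIGRAIN_TS)
		return stored;
	return sketch_truncate(stored, SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}

/* Return <0, 0 or >0 as a is earlier than, equal to or later than b. */
static int sketch_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec ? -1 : 1;
	if (a.tv_nsec != b.tv_nsec)
		return a.tv_nsec < b.tv_nsec ? -1 : 1;
	return 0;
}
```

Suppose one inode holds a fine-grained 100.004123456 and another inode,
modified later, holds the coarse 100.004000000 taken from the tick. The
stored values compare in the wrong order, while sketch_report() with no
flag returns 100.004000000 for both, so a program comparing them no longer
sees the later change as older. A caller passing SKETCH_MULTIGRAIN_TS
still gets the stored value unchanged.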