Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
From: Jeff Layton
To: Linus Torvalds, Jan Kara
Cc: Dave Chinner, Amir Goldstein, Kent Overstreet, Christian Brauner,
 Alexander Viro, John Stultz, Thomas Gleixner, Stephen Boyd,
 Chandan Babu R, "Darrick J. Wong", Theodore Ts'o, Andreas Dilger,
 Chris Mason, Josef Bacik, David Sterba, Hugh Dickins, Andrew Morton,
 Jan Kara, David Howells, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
 linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
 linux-mm@kvack.org, linux-nfs@vger.kernel.org
Date: Thu, 02 Nov 2023 06:15:11 -0400

On Wed, 2023-11-01 at 10:10 -1000, Linus Torvalds wrote:
> On Wed, 1 Nov 2023 at 00:16, Jan Kara wrote:
> >
> > OK, but is this compatible with the current XFS behavior? AFAICS currently
> > XFS sets sb->s_time_gran to 1 so timestamps currently stored on disk will
> > have some mostly random garbage in low bits of the ctime.
>
> I really *really* don't think we can use ctime as a "i_version"
> replacement. The whole fine-granularity patches were well-intentioned,
> but I do think they were broken.
>

I have to take some issue here. I still think the basic concept is sound.
The original implementation was flawed but I think I have a scheme that
could address the problems with the multigrain series.

That said, everyone seems to be haring off after other solutions. I
don't much care which one we end up with, as long as the problem gets
fixed.

> Note that we can't use ctime as a "i_version" replacement for other
> reasons too - you have filesystems like FAT - which people do want to
> export - that have a single-second (or is it 2s?) granularity in
> reality, even though they report a 1ns value in s_time_gran.
>
> But here's a suggestion that people may hate, but that might just work
> in practice:
>
>  - get rid of i_version entirely
>
>  - use the "known good" part of ctime as the upper bits of the change
> counter (and by "known good" I mean tv_sec - or possibly even "tv_sec
> / 2" if that dim FAT memory of mine is right)
>
>  - make the rule be that ctime is *never* updated for atime updates
> (maybe that's already true, I didn't check - maybe it needs a new
> mount flag for nfsd)
>
>  - have a per-inode in-memory and vfs-internal (entirely invisible to
> filesystems) "ctime modification counter" that is *NOT* a timestamp,
> and is *NOT* i_version
>
>  - make the rule be that the "ctime modification counter" is always
> zero, *EXCEPT* if
>     (a) I_VERSION_QUERIED is set
>    AND
>     (b) the ctime modification doesn't modify the "known good" part of ctime
>
> so how the "statx change cookie" ends up being "high bits tv_sec of
> ctime, low bits ctime modification cookie", and the end result of that
> is:
>
>  - if all the reads happen after the last write (common case), then
> the low bits will be zero, because I_VERSION_QUERIED wasn't set when
> ctime was modified
>
>  - if you do a write *after* a modification, the ctime cookie is
> guaranteed to change, because either the known good (sec/2sec) part of
> ctime is new, *or* the counter gets updated
>
>  - if the nfs server reboots, the in-memory counter will be cleared
> again, and so the change cookie will cause client cache invalidations,
> but *only* for those "ctime changed in the same second _after_
> somebody did a read".
>
>  - any long-time caches of files that don't get modified are all fine,
> because they will have those low bits zero and depend on just the
> stable part of ctime that works across filesystems. So there should be
> no nasty thundering herd issues on long-lived caches on lots of
> clients if the server reboots, or atime updates every 24 hours or
> anything like that.
>
> and note that *NONE* of this requires any filesystem involvement
> (except for the rule of "no atime changes ever impact ctime", which
> may or may not already be true).
>
> The filesystem does *not* know about that modification counter,
> there's no new on-disk stable information.
>
> It's entirely possible that I'm missing something obvious, but the
> above sounds to me like the only time you'd have stale invalidations
> is really the (unusual) case of having writes after cached reads, and
> then a reboot.
>
> We'd get rid of "inode_maybe_inc_iversion()" entirely, and instead
> replace it with logic in inode_set_ctime_current() that basically does
>
>  - if the stable part of ctime changes, clear the new 32-bit counter
>
>  - if I_VERSION_QUERIED isn't set, clear the new 32-bit counter
>
>  - otherwise, increment the new 32-bit counter
>
> and then the STATX_CHANGE_COOKIE code basically just returns
>
>     (stable part of ctime << 32) + new 32-bit counter
>
> (and again, the "stable part of ctime" is either just tv_sec, or it's
> "tv_sec >> 1" or whatever).
>
> The above does not expose *any* changes to timestamps to users, and
> should work across a wide variety of filesystems, without requiring
> any special code from the filesystem itself.
>
> And now please all jump on me and say "No, Linus, that won't work, because XYZ".
>
> Because it is *entirely* possible that I missed something truly
> fundamental, and the above is completely broken for some obvious
> reason that I just didn't think of.
>

Yeah, I think this scheme is problematic for the reasons Trond pointed
out. I also don't quite see the advantage of this over what Dave Chinner
is proposing (using low-order bits of the ctime nsec field to hold a
change counter).

-- 
Jeff Layton
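
For reference, the three counter rules and the cookie formula quoted
above can be sketched in plain userspace C. This is an illustration
only, not kernel code: struct demo_inode, demo_set_ctime(),
demo_change_cookie() and demo_query() are hypothetical names, the stable
part of ctime is assumed to be plain tv_sec, and clearing the queried
flag when the stable part changes is an assumption that the quoted text
does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-inode state; not a kernel structure. */
struct demo_inode {
	int64_t  ctime_sec;    /* stable part of ctime (assumed: tv_sec) */
	uint32_t mod_counter;  /* in-memory only, never stored on disk */
	bool     queried;      /* models I_VERSION_QUERIED */
};

/* Model a ctime update at time now_sec using the three quoted rules. */
void demo_set_ctime(struct demo_inode *ino, int64_t now_sec)
{
	assert(ino != NULL);
	if (now_sec != ino->ctime_sec) {
		/* Stable part changed: clear the counter. */
		ino->mod_counter = 0;
		/* Assumption: nobody has seen the new cookie yet. */
		ino->queried = false;
	} else if (!ino->queried) {
		/* Nobody looked since the last change: clear the counter. */
		ino->mod_counter = 0;
	} else {
		/* Same second and somebody looked: bump the counter. */
		ino->mod_counter++;
	}
	ino->ctime_sec = now_sec;
}

/* (stable part of ctime << 32) + 32-bit counter */
uint64_t demo_change_cookie(const struct demo_inode *ino)
{
	assert(ino != NULL);
	return ((uint64_t)ino->ctime_sec << 32) + ino->mod_counter;
}

/* Model a reader fetching the cookie, which marks the inode as queried. */
uint64_t demo_query(struct demo_inode *ino)
{
	assert(ino != NULL);
	ino->queried = true;
	return demo_change_cookie(ino);
}
```

In this model, a write that lands in the same second as an earlier
query bumps the low 32 bits, while writes that nobody has queried leave
them at zero. Only the former would see its cookie change if the
in-memory counter were lost, which matches the reboot case described in
the quoted text.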