Date: Fri, 29 Dec 2023 10:06:36 -0600
From: Eric Van Hensbergen
To: Christian Schoenebeck
Cc: asmadeus@codewreck.org, v9fs@lists.linux.dev, ericvh@gmail.com
Subject: Re: cache fixes (redux)
In-Reply-To: <2850709.f7ZBceQk0b@silver>

On Fri, Dec 29, 2023 at 01:22:45PM +0100, Christian Schoenebeck wrote:
> On Thursday, December 28, 2023 10:48:23 PM CET asmadeus@codewreck.org wrote:
> > Eric Van Hensbergen wrote on Thu, Dec 28, 2023 at 09:07:46AM -0600:
>
> BTW even if the qid.path length is increased as part of a 9p protocol
> change, it would still be tricky on the client implementation side, as
> that number is too big to simply be exposed as the inode number by the
> client. IIRC virtiofsd handles this by automatically creating separate
> devices when needed. I haven't checked how exactly, i.e. whether there
> is some easy way to create "subdevices" with Linux.
>

I'll have a look at this and see if there is something similar we can do
with different mount points.

> > (iiuc, from a networked filesystem's point of view (e.g. nfs), we ought
> > to check i_ino + i_generation, as an inode number can be reused after
> > the previous owner of the number was deleted, but we're guaranteed the
> > couple change...

The current iget validation logic checks i_generation from getattr. This
was one of the things I wasn't sure we needed, and it would definitely
mess with qid lookup on our side. This is why I was thinking that if we
marked unlinked inodes in some way on the client side we'd be less likely
to hit a collision; but if the inode isn't really released on the server
side (because it is open), maybe this doesn't happen and we only have to
worry about stale inodes in the client cache that haven't been properly
released. But is that ever a valid case? (It may be happening because the
cache code is broken, but that seems like a different case -- maybe if we
see an inode lose its last link and it's not currently open we just
garbage collect it? Don't we do that already? I can see where this might
screw up if we have multiple instances of inode structs around, which the
current code seems to gesture towards, but I think that's a bug. I may
write up some auditing checks to look for dup inode structs and other
things as part of debug.)
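To make the generation check concrete, here is roughly what I mean -- just
a sketch, not the current fs/9p code; the key struct and callback names
are made up for illustration:

	#include <linux/fs.h>
	#include <linux/types.h>

	/*
	 * Sketch only: an iget5_locked() test callback that matches on
	 * both qid.path and the generation reported by getattr, so a
	 * recycled inode number on the server can't alias a stale
	 * in-core inode on the client.
	 */
	struct v9fs_iget_key {		/* hypothetical helper type */
		u64 qid_path;		/* qid.path from the server */
		u32 generation;		/* generation from getattr */
	};

	static int v9fs_test_inode_sketch(struct inode *inode, void *data)
	{
		struct v9fs_iget_key *key = data;

		if (inode->i_ino != (unsigned long)key->qid_path)
			return 0;
		/* same number, different generation: recycled on server */
		if (inode->i_generation != key->generation)
			return 0;
		return 1;
	}

The lookup side would then pass this as the test callback to
iget5_locked(), so a generation mismatch forces allocation of a fresh
inode instead of reusing the stale one.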
> > In cacheless modes we should drop dentries as soon as no file/process
> > uses them, and drop inodes as soon as there are no dentries left.
> > In cached mode we should have an extra ref to dentries (droppable on
> > memory pressure somehow -- we're not doing that correctly right now
> > afaik); that'll keep the inode alive, so we should try very hard to
> > always reuse the same inodes/dentries to avoid needless lookups.
>
> Yes, the cache currently just grows indefinitely. The question is what
> kind of metric should be used to decide when to start dropping old
> things from the cache.
>

Some of this is to do with the fid/dentry handling, though. Fids stay
around in loose cache mode, which means dentries stay, which means inodes
stay. I think we can be a bit more aggressive with transient fid recovery
and decouple fids from dentries by attaching them to inodes, so we can
aggressively clean up the dcache if we want to without impacting things.
This will mean fewer interstitial fids; transient fids for cwd-like
references will still exist, but we'll have one instead of one per path
element (rough sketch of what I mean in the P.S. below). I'll probably
keep aggressive dentry recovery for a secondary change set, since it'll
break a bunch of existing assumptions, and I think that was what was
giving me trouble in my rewrite branch.

> > Well, there's reconnect (connection troubles), and there's reconnect
> > (server restart, or migration) -- in nfsv4 the server is expected to
...
> Right, QEMU live-migration doesn't work with 9p ATM, and probably won't
> in the near future.

Yeah, I think there's a whole kettle of fish here. I'm interested in
tilting at this windmill to support simulation checkpoint/restart cases,
but I agree it's a hard problem, so I will hold off until this other
stuff is sorted.

	-eric
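P.S. Here is roughly what I have in mind for hanging a transient fid off
the inode instead of off each dentry -- again just a sketch; the struct
layout and function names are invented for illustration:

	#include <linux/spinlock.h>
	#include <net/9p/client.h>	/* struct p9_fid */

	/*
	 * Sketch only: cache one reusable transient fid per inode instead
	 * of cloning a fid per dentry/path element.  A consumer takes the
	 * cached fid if present and walks a fresh one otherwise, so the
	 * dcache can be pruned without orphaning fids.
	 */
	struct v9fs_inode_sketch {		/* hypothetical inode extension */
		spinlock_t fid_lock;
		struct p9_fid *transient_fid;	/* one cached fid, or NULL */
		/* ... the rest of the per-inode state ... */
	};

	static struct p9_fid *v9fs_inode_fid_take(struct v9fs_inode_sketch *v9inode)
	{
		struct p9_fid *fid;

		spin_lock(&v9inode->fid_lock);
		fid = v9inode->transient_fid;
		v9inode->transient_fid = NULL;	/* caller owns it now */
		spin_unlock(&v9inode->fid_lock);

		return fid;	/* NULL means: walk a new fid from the root */
	}

Dentry teardown would then just drop any fid it holds back onto the inode
(or clunk it), and pruning the dcache would no longer pin fids or inodes.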