Date: Wed, 4 Mar 2026 14:26:11 +0100
From: Christian Brauner
To: Amir Goldstein
Cc: jack@suse.cz, Tejun Heo, "T.J. Mercier", gregkh@linuxfoundation.org,
    driver-core@lists.linux.dev, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org, shuah@kernel.org,
    linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v4 2/3] kernfs: Send IN_DELETE_SELF and IN_IGNORED
Message-ID: <20260304-glasig-amtieren-5010757246ae@brauner>
References: <20260220055449.3073-3-tjmercier@google.com>
    <20260224-hetzen-zeitnah-a3e1e08367cc@brauner>
X-Mailing-List: cgroups@vger.kernel.org

On Tue, Mar 03, 2026 at 03:27:52PM +0100, Amir Goldstein wrote:
> On Tue, Feb 24, 2026 at 12:03 PM Christian Brauner wrote:
> >
> > On Mon, Feb 23, 2026 at 06:27:31AM -1000, Tejun Heo wrote:
> > > (cc'ing Christian Brauner)
> > >
> > > On Sat, Feb 21, 2026 at 06:11:28PM +0200, Amir Goldstein wrote:
> > > > On Sat, Feb 21, 2026 at 12:32 AM Tejun Heo wrote:
> > > > >
> > > > > Hello, Amir.
> > > > >
> > > > > On Fri, Feb 20, 2026 at 10:11:15PM +0200, Amir Goldstein wrote:
> > > > > > > Yeah, that can be useful. For cgroupfs, there would probably need to be a
> > > > > > > way to scope it so that it can be used on delegation boundaries too (which
> > > > > > > we can require to coincide with cgroup NS boundaries).
> > > > > >
> > > > > > I have no idea what the above means.
> > > > > > I could ask Gemini or you and I prefer the latter ;)
> > > > >
> > > > > Ah, you chose wrong. :)
> > > > >
> > > > > > What are delegation boundaries and NFS boundaries in this context?
> > > > >
> > > > > cgroup delegation is giving control of a subtree to someone else:
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git/tree/Documentation/admin-guide/cgroup-v2.rst#n537
> > > > >
> > > > > There's an old way of doing it by changing perms on some files and new way
> > > > > using cgroup namespace.
> > > > >
> > > > > > > Would it be possible to make FAN_MNT_ATTACH work for that?
> > > > > >
> > > > > > FAN_MNT_ATTACH is an event generated on a mntns object.
> > > > > > If "cgroup NS boundaries" is referring to a mntns object and if
> > > > > > this object is available in the context of cgroup create/destroy
> > > > > > then it should be possible.
> > > > >
> > > > > Great, yes, cgroup namespace way should work then.
> > > > >
> > > > > > But FAN_MNT_ATTACH reports a mountid. Is there a mountid
> > > > > > to report on cgroup create? Probably not?
> > > > >
> > > > > Sorry, I thought that was per-mount recursive file event monitoring.
> > > > > FAN_MARK_MOUNT looks like the right thing if we want to allow monitoring
> > > > > cgroup creations / destructions in a subtree without recursively watching
> > > > > each cgroup.
> > > >
> > > > The problem sounds very similar to subtree monitoring for mkdir/rmdir on
> > > > a filesystem, which is a problem that we have not yet solved.
> > > >
> > > > The problem with FAN_MARK_MOUNT is that it does not support the
> > > > events CREATE/DELETE, because those events are currently
> > >
> > > Ah, bummer.
> > >
> > > > monitored in context where the mount is not available and anyway
> > > > what users want is to get notified on a deleted file/dir in a subtree
> > > > regardless of the mount through which the create/delete was done.
> > > >
> > > > Since commit 58f5fbeb367ff ("fanotify: support watching filesystems
> > > > and mounts inside userns"), fanotify groups can be associated
> > > > with a userns.
> > > >
> > > > I was thinking that we can have a model where events are delivered
> > > > to a listener based on whether or not the uid/gid of the object are
> > > > mappable to the userns of the group.
> > >
> > > Given how different NSes can be used independently of each other, it'd
> > > probably be cleaner if it doesn't have to depend on another NS.
> > >
> > > > In a filesystem, this criterion cannot guarantee the subtree isolation.
> > > > I imagine that for delegated cgroups this criterion could match what
> > > > you need, but I am basing this on pure speculation.
> > >
> > > There's a lot of flexibility in the mechanism, so it's difficult to tell.
> > > e.g. There's nothing preventing somebody from creating two separate subtrees
> > > delegated to the same user.
> >
> > Delegation is based on inode ownership; I'm not sure how well this will
> > fit into the fanotify model. Maybe the group logic for userns that
> > fanotify added works. I'm not super sure.
> >
> > > Christian was mentioning allowing separate super for different cgroup mounts
> > > in another thread. cc'ing him for context.
> >
> > If cgroupfs changes to tmpfs semantics where each mount gives you a new
> > superblock then it's possible to give each container its own superblock.
> > That in turn would make it possible to place fanotify watches on the
> > superblock itself. I think you'd roughly need something like the
> > following permission model:
> >
>
> It's hard for me to estimate the effort of changing to multi sb model,
> but judging by the length of the email I trimmed below, it does not
> sound trivial...
>
> How do you guys feel about something like this patch which associates
> an owner userns to every cgroup?
>
> I have this POC branch from a long time ago [1] to filter all events
> on sb by in_userns() criteria. The semantics for real filesystems
> were a bit difficult, but perhaps this model can work well for these
> pseudo singleton fs.
>
> I am trying to work on a model that could be useful for both cgroupfs
> and nsfs:
>
> If user is capable in userns, user will be able to set an sb
> watch for all events (say DELETE_SELF) on the sb, for objects
> whose owner_userns is in_userns() of the fanotify listener.
>
> This will enable watching for torn down cgroups and namespaces
> which are visible to said user via delegated cgroups mount
> or via listns().
>
> I would like to allow calling fsnotify_obj_remove() hook with
> encoded object fid (e.g. nsfs_file_handle) instead of the vfs inode,
> so that cgroupfs/nsfs could report dying objects without needing
> to associate a vfs inode with them.
>
> WDYT? Is this an interesting direction to pursue?

I'd need to see the patches. I barely remember the details tbh. It
doesn't sound crazy though.
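
For reference, the filtering rule quoted above (deliver an sb event only
for objects whose owner_userns is in_userns() of the listener) can be
modelled in userspace roughly as below. This is a toy sketch, not kernel
code and not the POC branch: the UserNS and Obj classes, the deliver()
helper and the namespace names are all hypothetical, and in_userns() here
only imitates the "same namespace or a descendant" check by walking
parent links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNS:
    """Toy stand-in for a user namespace: a name plus a parent link."""
    name: str
    parent: Optional["UserNS"] = None

    @property
    def level(self) -> int:
        # Depth in the hierarchy; the initial namespace is level 0.
        return 0 if self.parent is None else self.parent.level + 1


def in_userns(ancestor: UserNS, child: UserNS) -> bool:
    """True if `child` is `ancestor` itself or one of its descendants."""
    while child.level > ancestor.level:
        child = child.parent
    return child is ancestor


@dataclass
class Obj:
    """Hypothetical cgroup or namespace object carrying an owner userns."""
    name: str
    owner_userns: UserNS


def deliver(listener_userns: UserNS, obj: Obj) -> bool:
    # Assumed filter: report the event only if the object's owner
    # userns is inside the listener's userns.
    return in_userns(listener_userns, obj.owner_userns)


init_ns = UserNS("init")
container_a = UserNS("container-a", parent=init_ns)
container_b = UserNS("container-b", parent=init_ns)
nested = UserNS("nested", parent=container_a)

dying = Obj("cgroup-in-nested", owner_userns=nested)
print(deliver(container_a, dying))  # listener is an ancestor of the owner
print(deliver(container_b, dying))  # listener is a sibling of that ancestor
print(deliver(init_ns, dying))      # initial namespace contains everything
```

In this model a listener in container-a would see the teardown of an
object owned by a namespace nested below it, while a listener in the
sibling container-b would not. Whether that matches the isolation
delegated cgroups actually need is the open question in the thread.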