From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp-8fab.mail.infomaniak.ch (smtp-8fab.mail.infomaniak.ch [83.166.143.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 71B5617741 for ; Fri, 30 May 2025 12:20:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=83.166.143.171 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748607654; cv=none; b=jKGiyogdq6Q92hiE2DSUtKW9GAA6yeGD4zzPoQymuwqDxwnV5RDqAHAJP/nWxLlcnKEi+HDj5S+M/Bv4ga07tk5TAasEqcYJs2vsrGQTCSyTbsQCfqSnMGZFZ919W28HzBQXjL3lGwx+IjmrfEMtO6Y0FbWUjUZU3wBqcphAzWg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748607654; c=relaxed/simple; bh=usNAqfNU05yWDD1nnvMBPsO4jB44WecabSLtQa+JE+Y=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ioRTDG8v1qKLZpNwBmK0g5r8EDTfVIEyO4gZf/ZG4kjuU7KuUESDg8BjY5IeIdXmj7fpjmN8NLiUB9bicXdRS8mKRJWmq7T6Of5G41vInPTj6usA65SRWPUHGBoYg3FRdrVZLBtYc2C2aTAHHJXi0bXDR5hTPdIKAvS6hFp+o/k= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net; spf=pass smtp.mailfrom=digikod.net; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b=Iixtc6xD; arc=none smtp.client-ip=83.166.143.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=digikod.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="Iixtc6xD" Received: from smtp-4-0000.mail.infomaniak.ch (unknown [IPv6:2001:1600:7:10::a6b]) by smtp-4-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4b82Q12q1xz8g8; Fri, 30 May 2025 14:20:41 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=digikod.net; s=20191114; t=1748607641; bh=XnOAINnFb0Z0ZcHabmc3qHltopvQ1m/OtdV1pvd9//U=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Iixtc6xDq8SMxPtPVz3eEfOFPJq6OSNq9mg0j6DPkcvi1r3iE8Ae4vp8qmodYpgAd HXk+DJ0PXp2s2uT0p/i1fexLNBhuGQ3aVgdkzut+dav0tI6ZVZNtYRVNlUE0Z/wEnU JtxL5Y7j3xXmCtz0YpVzCWEiXLWncOjsEQ2BghOM= Received: from unknown by smtp-4-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4b82Q01gBszjtL; Fri, 30 May 2025 14:20:40 +0200 (CEST) Date: Fri, 30 May 2025 14:20:39 +0200 From: =?utf-8?Q?Micka=C3=ABl_Sala=C3=BCn?= To: Song Liu Cc: Al Viro , Jan Kara , bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, kernel-team@meta.com, andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org, daniel@iogearbox.net, martin.lau@linux.dev, brauner@kernel.org, kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com, repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com, gnoack@google.com, Tingmao Wang Subject: Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator Message-ID: <20250530.euz5beesaSha@digikod.net> References: <20250529173810.GJ2023217@ZenIV> <20250529183536.GL2023217@ZenIV> <20250529201551.GN2023217@ZenIV> <20250529214544.GO2023217@ZenIV> <20250529231018.GP2023217@ZenIV> Precedence: bulk X-Mailing-List: linux-security-module@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Infomaniak-Routing: alpha On Thu, May 29, 2025 at 05:42:16PM -0700, Song Liu wrote: > On Thu, May 29, 2025 at 4:10 PM Al Viro wrote: > > > > On Thu, May 29, 2025 at 03:13:10PM -0700, Song Liu wrote: > > > > > Is it an issue if we only hold a reference to a MNT_LOCKED mount for > > > short period of time? "Short period" means it may get interrupted, page > > > faults, or wait for an IO (read xattr), but it won't hold a reference to the > > > mount and sleep indefinitely. > > > > MNT_LOCKED mount itself is not a problem. What shouldn't be done is > > looking around in the mountpoint it covers. It depends upon the things > > you are going to do with that, but it's very easy to get an infoleak > > that way. > > > > > > OTOH, there's a good cause for moving some of the flags, MNT_LOCKED > > > > included, out of ->mnt_flags and into a separate field in struct mount. > > > > However, that would conflict with any code using that to deal with > > > > your iterator safely. > > > > > > > > What's more, AFAICS in case of a stack of mounts each covering the root > > > > of parent mount, you stop in each of those. The trouble is, umount(2) > > > > propagation logics assumes that intermediate mounts can be pulled out of > > > > such stack without causing trouble. For pathname resolution that is > > > > true; it goes through the entire stack atomically wrt that stuff. > > > > For your API that's not the case; somebody who has no idea about an > > > > intermediate mount being there might get caught on it while it's getting > > > > pulled from the stack. > > > > > > > > What exactly do you need around the mountpoint crossing? > > > > > > I thought about skipping intermediate mounts (that are hidden by > > > other mounts). AFAICT, not skipping them will not cause any issue. > > > > It can. Suppose e.g. that /mnt gets propagation from another namespace, > > but not the other way round and you mount something on /mnt. > > > > Later, in that another namespace, somebody mounts something on wherever > > your /mnt gets propagation to. A copy will be propagated _between_ > > your /mnt and whatever you've mounted on top of it; it will be entirely > > invisible until you umount your /mnt. At that point the propagated > > copy will show up there, same as if it had appeared just after your > > umount. Prior to that it's entirely invisible. If its original > > counterpart in another namespace gets unmounted first, the copy will > > be quietly pulled out. > > Thanks for sharing this information! > > > Note that choose_mountpoint_rcu() callers (including choose_mountpoint()) > > will have mount_lock seqcount sampled before the traversal _and_ recheck > > it after having reached the bottom of stack. IOW, if you traverse .. > > on the way to root, you won't get caught on the sucker being pulled out. > > In some of our internal discussions, we talked about using > choose_mountpoint() instead of follow_up(). I didn't go that direction in this > version because it requires holding "root". But if it makes more sense > to use, choose_mountpoint(), we sure can hold "root". > > Alternatively, I think it is also OK to pass a zero'ed root to > choose_mountpoint(). > > > Your iterator, OTOH, would stop in that intermediate mount - and get > > an unpleasant surprise when it comes back to do the next step (towards > > /mnt on root filesystem, that is) and finds that path->mnt points > > to something that is detached from everything - no way to get from > > it any further. That - despite the fact that location you've started > > from is still mounted, still has the same pathname, etc. and nothing > > had been disrupted for it. > > > > And yes, landlock has a narrow race in the matching place. Needs to > > be fixed. At least it does ignore those as far as any decisions are > > concerned... Thanks for pointing this out. In the case of Landlock, walking to a disconnected mount point (because of this umount race condition) would deny the requested access whereas it may be allowed otherwise. This is not a security issue but still an issue because an event unrelated to the request (umount) can abort a path resolution, which should not be the case. Without access to mount_lock, what would be the best way to fix this Landlock issue while making it backportable? > > If we update path_parent in this patchset with choose_mountpoint(), > and use it in Landlock, we will close this race condition, right? choose_mountpoint() is currently private, but if we add a new filesystem helper, I think the right approach would be to expose follow_dotdot(), updating its arguments with public types. This way the intermediates mount points will not be exposed, RCU optimization will be leveraged, and usage of this new helper will be simplified. > > > > > Note, BTW, that it might be better off by doing that similar to > > d_path.c - without arseloads of dget_parent/dput et.al.; not sure > > how feasible it is, but if everything in it can be done under > > rcu_read_lock(), that's something to look into. > > I don't think we can do everything here inside rcu_read_lock(). > But d_path.c does have some code we can probably reuse or > learn from. Also, we probably need two variations of iterators, > one walk until absolute root, while the other walk until root of > current->fs, just like d_path() vs. d_absolute_path(). Does this > sound reasonable? Passing the root to a public follow_dotdot() helper should do the job. > > Thanks, > Song >