From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Dec 2024 11:34:05 +0100
From: Christian Brauner
To: Chuck Lever
Cc: Jeff Layton, Amir Goldstein, Christoph Hellwig, "Darrick J. Wong", Erin Shepherd, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, stable, Greg KH, Jens Axboe, Shaohua Li
Subject: Re: [PATCH 0/4] exportfs: add flag to allow marking export operations as only supporting file handles
Message-ID: <20241210-holunder-caravan-578662919f10@brauner>
References: <20241206160358.GC7820@frogsfrogsfrogs> <15628525-629f-49a4-a821-92092e2fa8cb@oracle.com> <337ca572-2bfb-4bb5-b71c-daf7ac5e9d56@oracle.com> <20241210-gekonnt-pigmente-6d44d768469f@brauner>
In-Reply-To: <20241210-gekonnt-pigmente-6d44d768469f@brauner>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, Dec 10, 2024 at 11:13:16AM +0100, Christian Brauner wrote:
> On Mon, Dec 09, 2024 at 12:20:10PM -0500, Chuck Lever wrote:
> > On 12/9/24 12:15 PM, Jeff Layton wrote:
> > > On Mon, 2024-12-09 at 11:35 -0500, Chuck Lever wrote:
> > > > On 12/9/24 11:30 AM, Amir Goldstein wrote:
> > > > > On Mon, Dec 9, 2024 at 2:46 PM Christoph Hellwig wrote:
> > > > > >
> > > > > > On Mon, Dec 09, 2024 at 09:58:58AM +0100, Amir Goldstein wrote:
> > > > > > > To be clear, exporting pidfs or internal shmem via an anonymous fd is
> > > > > > > probably not possible with existing userspace tools, but with all the new
> > > > > > > mount_fd and magic link apis, I can never be sure what can be made possible
> > > > > > > to achieve when the user holds an anonymous fd.
> > > > > > >
> > > > > > > The thinking behind adding the EXPORT_OP_LOCAL_FILE_HANDLE flag
> > > > > > > was that when kernfs/cgroups was added exportfs support with commit
> > > > > > > aa8188253474 ("kernfs: add exportfs operations"), there was no intention
> > > > > > > to export cgroupfs over nfs, only local to uses, but that was never enforced,
> > > > > > > so we thought it would be good to add this restriction and backport it to
> > > > > > > stable kernels.
> > > > > >
> > > > > > Can you please explain what the problem with exporting these file
> > > > > > systems over NFS is? Yes, it's not going to be very useful. But what
> > > > > > is actually problematic about it? And why is it not problematic with
> > > > > > a userland nfs server? We really need to settle that argument before
> > > > > > deciding a flag name or polarity.
> > > > > >
> > > > >
> > > > > I agree that it is not the end of the world and users do have to explicitly
> > > > > use fsid= argument to be able to export cgroupfs via nfsd.
> > > > >
> > > > > The idea for this patch started from the claim that Jeff wrote that cgroups
> > > > > is not allowed for nfsd export, but I couldn't find where it is not allowed.
> > > > >
> > >
> > > I think that must have been a wrong assumption on my part. I don't see
> > > anything that specifically prevents that either. If cgroupfs is mounted
> > > and you tell mountd to export it, I don't see what would prevent that.
> > >
> > > To be clear, I don't see how you would trick bog-standard mountd into
> > > exporting a filesystem that isn't mounted into its namespace, however.
> > > Writing a replacement for mountd is always a possibility.
> > >
> > > > > I have no issue personally with leaving cgroupfs exportable via nfsd
> > > > > and restricting only SB_NOUSER and SB_KERNMOUNT fs.
> > > > >
> > > > > Jeff, Chuck, what is your opinion w.r.t exportability of cgroupfs via nfsd?
> > > >
> > > > We all seem to be hard-pressed to find a usage scenario where exporting
> > > > pseudo-filesystems via NFS is valuable. But maybe someone has done it
> > > > and has a good reason for it.
> > > >
> > > > The issue is whether such export should be consistently and actively
> > > > prevented.
> > > >
> > > > I'm not aware of any specific security issues with it.
> > > >
> > > >
> > >
> > > I'm not either, but we are in new territory here. nfsd is a network
> > > service, so it does present more of an attack surface vs. local access.
> > >
> > > In general, you do have to take active steps to export a filesystem,
> > > but if someone exports / with "crossmnt", everything mounted is
> > > potentially accessible. That's obviously a dumb thing to do, but people
> > > make mistakes, and it's possible that doing this could be part of a
> > > wider exploit.
> > >
> > > I tend to think it safest to make exporting via nfsd an opt-in thing on
> > > a per-fs basis (along the lines of this patchset). If someone wants to
> > > allow access to more "exotic" filesystems, let them argue their use-
> > > case on the list first.
> >
> > If we were starting from scratch, 100% agree.
> >
> > The current situation is that these file systems appear to be exportable
> > (and not only via NFS). The proposal is that this facility is to be
> > taken away. This can easily turn into a behavior regression for someone
> > if we're not careful.
>
> So I'm happy to drop the exportfs preliminary we have now preventing
> kernfs from being exported but then Christoph and you should figure out
> what the security implications of allowing kernfs instances to be
> exported are because I'm not an NFS export expert.
>
> Filesystems that fall under kernfs that are exportable by NFS as I
> currently understand it are at least:
>
> (1) sysfs
> (2) cgroupfs
>
> Has anyone ever actually tried to export the two and tested what
> happens? Because I wouldn't be surprised if this ended in tears but
> maybe I'm overly pessimistic.
>
> Both (1) and (2) are rather special and don't have standard filesystem
> semantics in a few places.
>
> - cgroupfs isn't actually namespace aware. Whereas most filesystems like
>   tmpfs and ramfs that are mountable inside unprivileged containers are
>   multi-instance filesystems, aka allocate a new superblock per
>   container, cgroupfs is single-instance with a nasty implementation to
>   virtualize the per-container view via cgroup namespaces. I wouldn't be
>   surprised if that ends up being problematic.
>
> - Cgroupfs has write-time permission checks as the process that is moved
>   into a cgroup isn't known at open time. That has been exploitable
>   before this was fixed.
>
> - Even though it's legacy, cgroup has a v1 and v2 mode where v1 is even
>   more messed up than v2 including the release-agent logic which ends up
>   issuing a usermode helper to call a binary when a cgroup is released.
>
> - sysfs potentially exposes all kinds of extremely low-level information
>   to a remote machine.
>
> None of this gives me the warm and fuzzy. But that's just me.
>
> Otherwise, I don't understand what it means that a userspace NFS server
> can export kernfs instances. I don't know what that means and what the
> contrast to in-kernel NFS server export is and whether that has the same
> security implications. If so it's even scary that some random userspace
> NFS server can just expose guts like kernfs.
>
> But if both of you feel that this is safe to do and there aren't any
> security issues lurking that have gone unnoticed simply because no one
> has really ever exported sysfs or cgroupfs then by all means continue
> allowing that. I'm rather skeptical.

Amir pointed out that sysfs can't be exported as it opts out of kernfs export_operations being set.