From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CDC4125D550; Wed, 14 May 2025 15:42:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747237345; cv=none; b=OhkUilWjhYW2rOSL0hi3zxFra4KRZUe4gM2OMc8ahjZp3/N7vEwLRP69M6YkgunzIw9VJKpndXakcuJ+c+4WPoCTtjj8klHnoswsl3J2+F9M7O0QGWW6DmjfCo2B2rpEkyQmxGJ+aR7z/SBVNgSy545Eazjn5b2OVspdqny7L3c= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747237345; c=relaxed/simple; bh=Q/a6cQKz1u3FABKvdOLs1BTFjrBS9xo3VS6yf8DUyaU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=G0fX/hPEt1L06g9fGtn2JzUBugZGgdg15rCKfwns7+JVRrEZI1MyHoug0DXPO1WsMoA7RDrAteDHNJG6/cURIuqC7lKICzyCnBk/GjA2tov5fgzhEWEFkzduvTh9bwpApha5prpdEw/TqaaZxwZmu3j0v5wN5DVKTR4Fh6M+eNk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=C7JHteJB; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="C7JHteJB" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 99FEEC4CEE3; Wed, 14 May 2025 15:42:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747237345; bh=Q/a6cQKz1u3FABKvdOLs1BTFjrBS9xo3VS6yf8DUyaU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=C7JHteJBR7doFhyrzMuEXW2wmLeyQXbDHsOzSSFmidGMsT2DcWkptVtQbsxDjHHYs Kjw600rDwGpLwIeRA5v2pRV27VJu7SPCqSk1RyTO0TEVlzlAGHMfuWUFZFdAZI0uQ3 TL8zZAjJ9pnksAZRWg7REJS3gBLFihieHfwrJraklAeAUsTXhh95FH4ApDm9Nd260P ml0QvWdSmKwesrXwvMNNAVIzAIpPee0/8P2pwfwfbIK0yhiWLYjqvjDopOQ5dUvQMP 3mnedQJDeqjw+XaNeWbKhoUUdmD0MPQjly5S3hf96VOYuEZy40BDq82gaeX85sF6zK KGXlreeK/N0ng== Date: Wed, 14 May 2025 08:42:22 -0700 From: Kees Cook To: Mateusz Guzik Cc: "Eric W. Biederman" , Jann Horn , Christian Brauner , Jorge Merlino , Alexander Viro , Thomas Gleixner , Andy Lutomirski , Sebastian Andrzej Siewior , Andrew Morton , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, John Johansen , Paul Moore , James Morris , "Serge E. Hallyn" , Stephen Smalley , Eric Paris , Richard Haines , Casey Schaufler , Xin Long , "David S. Miller" , Todd Kjos , Ondrej Mosnacek , Prashanth Prahlad , Micah Morton , Fenghua Yu , Andrei Vagin , linux-kernel@vger.kernel.org, apparmor@lists.ubuntu.com, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-hardening@vger.kernel.org, oleg@redhat.com Subject: Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec Message-ID: <202505140822.6AB755B6@keescook> References: <20221006082735.1321612-1-keescook@chromium.org> <20221006082735.1321612-2-keescook@chromium.org> <20221006090506.paqjf537cox7lqrq@wittgenstein> <86CE201B-5632-4BB7-BCF6-7CB2C2895409@chromium.org> <871pss17hq.fsf@email.froward.int.ebiederm.org> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Wed, May 14, 2025 at 02:03:31AM +0200, Mateusz Guzik wrote: > On Wed, May 14, 2025 at 12:17 AM Eric W. Biederman > wrote: > > > > Jann Horn writes: > > > > > On Tue, May 13, 2025 at 10:57 PM Kees Cook wrote: > > >> On May 13, 2025 6:05:45 AM PDT, Mateusz Guzik wrote: > > >> >Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct is > > >> >shared. This will have to be checked for after the execing proc becomes > > >> >single-threaded ofc. > > >> > > >> Unfortunately the above Chrome helper is setuid and uses CLONE_FS. > > > > > > Chrome first launches a setuid helper, and then the setuid helper does > > > CLONE_FS. Mateusz's proposal would not impact this usecase. > > > > > > Mateusz is proposing to block the case where a process first does > > > CLONE_FS, and *then* one of the processes sharing the fs_struct does a > > > setuid execve(). Linux already downgrades such an execve() to be > > > non-setuid, which probably means anyone trying to do this will get > > > hard-to-understand problems. Mateusz' proposal would just turn this > > > hard-to-debug edgecase, which already doesn't really work, into a > > > clean error; I think that is a nice improvement even just from the > > > UAPI standpoint. > > > > > > If this change makes it possible to clean up the kernel code a bit, even better. > > > > What has brought this to everyone's attention just now? This is > > the second mention of this code path I have seen this week. > > > > There is a syzkaller report concerning ->in_exec handling, for example: > https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@google.com/#t > > [...] > > It looks like most of the lsm's also test bprm->unsafe. > > > > So I imagine someone could very carefully separate the non-ptrace case > > from the ptrace case but *shrug*. > > > > Perhaps: > > > > if ((is_setid || __cap_gained(permitted, new_old)) && > > ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) || > > !ptracer_capable(current, new->user_ns))) { > > + if (!(bprm->unsafe & LSM_UNSAFE_PTRACE)) { > > + return -EPERM; > > + } > > /* downgrade; they get no more than they had, and maybe less */ > > if (!ns_capable(new->user_ns, CAP_SETUID) || > > (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) { > > new->euid = new->uid; > > new->egid = new->gid; > > } > > new->cap_permitted = cap_intersect(new->cap_permitted, > > old->cap_permitted); > > } > > > > If that is what you want that doesn't look to scary. I don't think > > it simplifies anything about fs->in_exec. As fs->in_exec is set when > > the processing calling exec is the only process that owns the fs_struct. > > With fs->in_exec just being a flag that doesn't allow another thread > > to call fork and start sharing the fs_struct during exec. > > > > *Shrug* > > > > I don't see why anyone would care. It is just a very silly corner case. > > Well I don't see how ptrace factors into any of this, apart from being > a different case of ignoring suid/sgid. I actually think we might want to expand the above bit of logic to use an explicit tests of each LSM_UNSAFE case -- the merged logic is very difficult to read currently. Totally untested expansion, if I'm reading everything correctly: if (bprm->unsafe && (is_setid || __cap_gained(permitted, new_old))) { bool limit_caps = false; bool strip_eid = false; unsigned int unsafe = bprm->unsafe; /* Check each bit */ if (unsafe & LSM_UNSAFE_PTRACE) { if (!ptracer_capable(current, new->user_ns)) limit_caps = true; unsafe &= ~LSM_UNSAFE_PTRACE; } if (unsafe & LSM_UNSAFE_SHARE) { limit_caps = true; if (!ns_capable(new->user_ns, CAP_SETUID)) strip_eid = true; unsafe &= ~LSM_UNSAFE_SHARE; } if (unsafe & LSM_UNSAFE_NO_NEW_PRIVS) { limit_caps = true; if (!ns_capable(new->user_ns, CAP_SETUID)) strip_eid = true; unsafe &= ~LSM_UNSAFE_NO_NEW_PRIVS; } if (WARN(unsafe, "Unhandled LSM_UNSAFE flag: %u?!\n", unsafe)) return -EINVAL; if (limit_caps) { new->cap_permitted = cap_intersect(new->cap_permitted, old->cap_permitted); } if (strip_eid) { new->euid = new->uid; new->egid = new->gid; } } > I can agree the suid/sgid situation vs CLONE_FS is a silly corner > case, but one which needs to be handled for security reasons and which > currently has weirdly convoluted code to do it. > > The intent behind my proposal is very much to get the crapper out of > the way in a future-proof and simple manner. > > In check_unsafe_exec() you can find a nasty loop over threads in the > group to find out if the fs struct is used by anyone outside of said > group. Since fs struct users are not explicitly tracked and any of > them can have different creds than the current thread, the kernel opts > to ignore suid/sgid if there are extra users found (for security > reasons). The loop depends on no new threads showing up as the list is > being walked, to that end copy_fs() can transiently return an error if > it spots ->in_exec. > > The >in_exec field is used as a boolean/flag, but parallel execs using > the same fs struct from different thread groups don't look serialized. > This is supposed to be fine as in this case ->in_exec is not getting > set to begin with, but it gets unconditionally unset on all execs. > > And so on. It's all weird af. 100% :) -- Kees Cook