From: Jeff Xu
Date: Tue, 15 Aug 2023 22:08:54 -0700
To: Aleksa Sarai
Cc: Andrew Morton, Shuah Khan, Kees Cook, Daniel Verkamp,
    Christian Brauner, Dominique Martinet, stable@vger.kernel.org,
    linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v2 0/5] memfd: cleanups for vm.memfd_noexec
In-Reply-To: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com>

On Mon, Aug 14, 2023 at 1:41 AM Aleksa Sarai wrote:
>
> The most critical issue with vm.memfd_noexec=2 (the fact that passing
> MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's
> tree[2], but there are still some outstanding issues that need to be
> addressed:
>
>  * vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
>    because it will make it far too difficult to ever migrate. Instead it
>    should imply MFD_EXEC.
>
>  * The dmesg warnings are pr_warn_once(), which on most systems means
>    that they will be used up by systemd or some other boot process and
>    userspace developers will never see them.
>
>    - For the !(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)) case, outputting a
>      rate-limited message to the kernel log is necessary to tell
>      userspace that they should add the new flags.
>
>      Arguably the most ideal way to deal with the spam concern[3,4]
>      while still prompting userspace to switch to the new flags would be
>      to only log the warning once per task or something similar.
>      However, adding something to task_struct for tracking this would be
>      needless bloat for a single pr_warn_ratelimited().
>
>      So just switch to pr_info_ratelimited() to avoid spamming the log
>      with something that isn't a real warning. There's lots of
>      info-level stuff in dmesg, so it seems really unlikely that this
>      should be an actual problem. Most programs are already switching to
>      the new flags anyway.
>
>    - For the vm.memfd_noexec=2 case, we need to log a warning for every
>      failure because otherwise userspace will have no idea why their
>      previously working program started returning -EACCES (previously
>      -EINVAL) from memfd_create(2). pr_warn_once() is simply wrong here.
>
>  * The ratcheting mechanism for vm.memfd_noexec makes it incredibly
>    unappealing for most users to enable the sysctl because enabling it
>    on &init_pid_ns means you need a system reboot to unset it. Given the
>    actual security threat being protected against, CAP_SYS_ADMIN users
>    being restricted in this way makes little sense.
>
>    The argument for this ratcheting by the original author was that it
>    allows you to have a hierarchical setting that cannot be unset by
>    child pidnses, but this is not accurate -- changing the parent
>    pidns's vm.memfd_noexec setting to be more restrictive didn't affect
>    children.
>
That is not exactly what I said, though.

From ChromeOS's position, allowing a downgrade is less secure, and this
setting was designed from the very beginning to be set at startup/reboot
time, e.g. on the kernel command line or as part of the container runtime
environment (passed to the sandboxed container).

I understand your viewpoint: from another distribution's point of view,
the original design might be too restrictive, so if the kernel wants to
give more weight to ease of administration, I'm OK with your approach.

It is less secure for ChromeOS, though: we try to prevent arbitrary code
execution as much as possible, even for CAP_SYS_ADMIN, and with this
change there is one more possibility for us to consider.

>    Instead, switch the vm.memfd_noexec sysctl to be properly
>    hierarchical and allow CAP_SYS_ADMIN users (in the pidns's owning
>    userns) to lower the setting as long as it is not lower than the
>    parent's effective setting. This change also makes it so that
>    changing a parent pidns's vm.memfd_noexec will affect all
>    descendants, providing a properly hierarchical setting. The
>    performance impact of this is incredibly minimal since the maximum
>    depth of pidns is 32 and it is only checked during memfd_create(2)
>    and unshare(CLONE_NEWPID).
>
>  * The memfd selftests would not exit with a non-zero error code when
>    certain tests that ran in a forked process (specifically the ones
>    related to MFD_EXEC and MFD_NOEXEC_SEAL) failed.
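The hierarchical rule described in the cover letter (the effective value is the most restrictive setting along the pidns ancestor chain, and a privileged writer may lower a namespace's value but not below its parent's effective value) can be modelled in a few lines. This is a minimal userspace sketch under stated assumptions, not the patch's code: `struct pidns_sketch`, `effective_scope()` and `can_set_scope()` are hypothetical names, CAP_SYS_ADMIN in the owning userns is reduced to a bool, and 0/1/2 stand for the three vm.memfd_noexec values.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a pid namespace; this is NOT the kernel's
 * struct pid_namespace, just enough state to illustrate the rule.
 */
struct pidns_sketch {
	int memfd_noexec_scope;            /* this namespace's own value: 0, 1 or 2 */
	const struct pidns_sketch *parent; /* NULL for the initial namespace */
};

/* Effective value: the most restrictive setting on the path to the root. */
static int effective_scope(const struct pidns_sketch *ns)
{
	int scope = 0;

	for (; ns != NULL; ns = ns->parent)
		if (ns->memfd_noexec_scope > scope)
			scope = ns->memfd_noexec_scope;
	return scope;
}

/*
 * Write check: a caller with CAP_SYS_ADMIN (modelled as a bool) may raise
 * or lower this namespace's value, but not below the parent's effective one.
 */
static bool can_set_scope(const struct pidns_sketch *ns, int value,
			  bool has_cap_sys_admin)
{
	if (!has_cap_sys_admin || value < 0 || value > 2)
		return false;
	return value >= effective_scope(ns->parent);
}
```

Because the effective value is recomputed on every lookup, raising a parent's setting immediately applies to all descendants, and the walk is bounded by the 32-level pidns depth limit mentioned above. The flip side, which is the ChromeOS concern, is that the initial namespace has no parent, so a privileged writer there can always lower it back to 0.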
>
> [1]: https://lore.kernel.org/all/ZJwcsU0vI-nzgOB_@codewreck.org/
> [2]: https://lore.kernel.org/all/20230705063315.3680666-1-jeffxu@google.com/
> [3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
> [4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/
>
> Signed-off-by: Aleksa Sarai
> ---
> Changes in v2:
> - Make vm.memfd_noexec restrictions properly hierarchical.
> - Allow vm.memfd_noexec setting to be lowered by CAP_SYS_ADMIN as long
>   as it is not lower than the parent's effective setting.
> - Fix the logging behaviour related to the new flags and
>   vm.memfd_noexec=2.
> - Add more thorough tests for vm.memfd_noexec in selftests.
> - v1:
>
> ---
> Aleksa Sarai (5):
>       selftests: memfd: error out test process when child test fails
>       memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2
>       memfd: improve userspace warnings for missing exec-related flags
>       memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
>       selftests: improve vm.memfd_noexec sysctl tests
>
>  include/linux/pid_namespace.h              |  39 ++--
>  kernel/pid.c                               |   3 +
>  kernel/pid_namespace.c                     |   6 +-
>  kernel/pid_sysctl.h                        |  28 ++-
>  mm/memfd.c                                 |  33 ++-
>  tools/testing/selftests/memfd/memfd_test.c | 332 +++++++++++++++++++++++------
>  6 files changed, 322 insertions(+), 119 deletions(-)
> ---
> base-commit: 3ff995246e801ea4de0a30860a1d8da4aeb538e7
> change-id: 20230803-memfd-vm-noexec-uapi-fixes-ace725c67b0f
>
> Best regards,
> --
> Aleksa Sarai
>
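On the userspace side, the migration the cover letter asks for amounts to passing one of the new flags explicitly instead of relying on the old-style default. The following is a minimal sketch under stated assumptions: `create_noexec_memfd()` is a hypothetical helper, the fallback value for MFD_NOEXEC_SEAL is taken from include/uapi/linux/memfd.h for libc headers that predate it, and treating EINVAL as "this kernel does not know the flag" is an assumption rather than a documented guarantee.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack this; value from include/uapi/linux/memfd.h. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif

/*
 * Hypothetical helper: explicitly request a non-executable, exec-sealed
 * memfd. Kernels that predate MFD_NOEXEC_SEAL reject unknown flags with
 * EINVAL, so retry without it (assumption: EINVAL here means "unknown flag").
 */
static int create_noexec_memfd(const char *name)
{
	int fd = memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);

	if (fd < 0 && errno == EINVAL)
		fd = memfd_create(name, MFD_CLOEXEC);
	return fd;
}
```

A program that genuinely needs an executable memfd would pass MFD_EXEC instead; as described above, that request fails with -EACCES once vm.memfd_noexec=2 is in effect, so such programs should handle that error explicitly.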