From: Jeff Xu
Date: Tue, 17 Oct 2023 01:34:35 -0700
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
To: Matthew Wilcox
Cc: jeffxu@chromium.org, akpm@linux-foundation.org, keescook@chromium.org,
 sroettger@google.com, jorgelo@chromium.org, groeck@chromium.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, jannh@google.com, surenb@google.com,
 alex.sierra@amd.com, apopple@nvidia.com, aneesh.kumar@linux.ibm.com,
 axelrasmussen@google.com, ben@decadent.org.uk, catalin.marinas@arm.com,
 david@redhat.com, dwmw@amazon.co.uk, ying.huang@intel.com,
 hughd@google.com, joey.gouly@arm.com, corbet@lwn.net,
 wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com,
 torvalds@linux-foundation.org, lstoakes@gmail.com, mawupeng1@huawei.com,
 linmiaohe@huawei.com, namit@vmware.com, peterx@redhat.com,
 peterz@infradead.org, ryan.roberts@arm.com, shr@devkernel.io,
 vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com,
 zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org,
 linux-hardening@vger.kernel.org
References: <20231016143828.647848-1-jeffxu@chromium.org>

Hi Matthew.
Thanks for your comments and for taking the time to review the patchset.

On Mon, Oct 16, 2023 at 8:18 AM Matthew Wilcox wrote:
>
> On Mon, Oct 16, 2023 at 02:38:19PM +0000, jeffxu@chromium.org wrote:
> > Modern CPUs support memory permissions such as RW and NX bits. Linux has
> > supported NX since the release of kernel version 2.6.8 in August 2004 [1].
>
> This seems like a confusing way to introduce the subject. Here, you're
> talking about page permissions, whereas (as far as I can tell), mseal() is
> about making _virtual_ addresses immutable, for some value of immutable.
>
> > Memory sealing additionally protects the mapping itself against
> > modifications. This is useful to mitigate memory corruption issues where
> > a corrupted pointer is passed to a memory management syscall. For example,
> > such an attacker primitive can break control-flow integrity guarantees
> > since read-only memory that is supposed to be trusted can become writable
> > or .text pages can get remapped. Memory sealing can automatically be
> > applied by the runtime loader to seal .text and .rodata pages and
> > applications can additionally seal security critical data at runtime.
> > A similar feature already exists in the XNU kernel with the
> > VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
> > Also, Chrome wants to adopt this feature for their CFI work [2] and this
> > patchset has been designed to be compatible with the Chrome use case.
>
> This [2] seems very generic and wide-ranging, not helpful. [5] was more
> useful to understand what you're trying to do.
>
> > The new mseal() is an architecture independent syscall, and with
> > following signature:
> >
> > mseal(void addr, size_t len, unsigned int types, unsigned int flags)
> >
> > addr/len: memory range. Must be continuous/allocated memory, or else
> > mseal() will fail and no VMA is updated. For details on acceptable
> > arguments, please refer to comments in mseal.c. Those are also fully
> > covered by the selftest.
>
> Mmm. So when you say "continuous/allocated" what you really mean is
> "Must have contiguous VMAs" rather than "All pages in this range must
> be populated", yes?
>
There can't be a gap (unallocated memory) in the given range.
Those cases are covered in the selftest:
test_seal_unmapped_start()
test_seal_unmapped_middle()
test_seal_unmapped_end()
The comments in check_mm_seal() also mention this.

> > types: bit mask to specify which syscall to seal, currently they are:
> > MM_SEAL_MSEAL 0x1
> > MM_SEAL_MPROTECT 0x2
> > MM_SEAL_MUNMAP 0x4
> > MM_SEAL_MMAP 0x8
> > MM_SEAL_MREMAP 0x10
>
> I don't understand why we want this level of granularity. The OpenBSD
> and XNU examples just say "This must be immutable*". For values of
> immutable that allow downgrading access (eg RW to RO or RX to RO),
> but not upgrading access (RW->RX, RO->*, RX->RW).
>
> > Each bit represents sealing for one specific syscall type, e.g.
> > MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
> > is that the API is extendable, i.e. when needed, the sealing can be
> > extended to madvise, mlock, etc. Backward compatibility is also easy.
>
> Honestly, it feels too flexible. Why not just two flags to mprotect()
> -- PROT_IMMUTABLE and PROT_DOWNGRADABLE. I can see a use for that --
> maybe for some things we want to be able to downgrade and for other
> things, we don't.
>
Having a seal type per syscall type makes it possible to add the feature
incrementally. Applications also know exactly what is sealed.

I'm not against types such as IMMUTABLE and DOWNGRADABLE, if we
can define precisely what they seal. As Jann pointed out, there are
other scenarios that potentially affect IMMUTABLE. Implementing all of
those will take time, and if we miss a case, we could introduce backward
compatibility issues for applications. A bitmask solves this nicely:
an application has to apply each newly added sealing type explicitly.

> I'd like to see some discussion of how this interacts with mprotect().
> As far as I can tell, the intent is to lock the protections/existence
> of the mapping, and not to force memory to stay in core. So it's fine
> for the kernel to swap out the page and set up a PTE as a swap entry.
> It's also fine for the kernel to mark PTEs as RO to catch page faults;
> we're concerned with the LOGICAL permissions, and not the page tables.

Yes. That is correct.

-Jeff Xu
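
P.S. To make the two rules discussed above concrete (no gap allowed in the
range, and one seal bit per syscall type), here is a minimal user-space
sketch in C. It is only a model for discussion, not the code in mseal.c.
Only the MM_SEAL_* values come from the cover letter; struct model_vma and
the model_*() functions are hypothetical names made up for this example.
The sketch also assumes that ranges line up with VMA boundaries (no VMA
splitting) and that MM_SEAL_MSEAL, read like the other bits, denies later
attempts to change the seals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Seal type bits as listed in the cover letter. */
#define MM_SEAL_MSEAL    0x1u
#define MM_SEAL_MPROTECT 0x2u
#define MM_SEAL_MUNMAP   0x4u
#define MM_SEAL_MMAP     0x8u
#define MM_SEAL_MREMAP   0x10u

/* Hypothetical stand-in for a VMA: the range [start, end) and its seals. */
struct model_vma {
    uintptr_t start;
    uintptr_t end;
    unsigned int seals;
};

/*
 * Return true if the VMAs (sorted by address, non-overlapping) cover
 * [addr, addr + len) without any unmapped gap at the start, in the
 * middle, or at the end.
 */
bool model_range_is_mapped(const struct model_vma *vmas, size_t n,
                           uintptr_t addr, size_t len)
{
    uintptr_t pos = addr;
    uintptr_t end = addr + len;

    if (len == 0 || end < addr)          /* empty range or wrap-around */
        return false;

    for (size_t i = 0; i < n && pos < end; i++) {
        if (vmas[i].end <= pos)          /* VMA lies before the cursor */
            continue;
        if (vmas[i].start > pos)         /* hole before this VMA */
            return false;
        pos = vmas[i].end;               /* covered up to this VMA's end */
    }
    return pos >= end;
}

/*
 * Return false if any VMA overlapping the range carries seal_bit,
 * i.e. the matching operation would be denied in this model.
 */
bool model_op_allowed(const struct model_vma *vmas, size_t n,
                      uintptr_t addr, size_t len, unsigned int seal_bit)
{
    uintptr_t end = addr + len;

    for (size_t i = 0; i < n; i++) {
        bool overlaps = vmas[i].start < end && vmas[i].end > addr;

        if (overlaps && (vmas[i].seals & seal_bit))
            return false;
    }
    return true;
}

/*
 * All-or-nothing sealing: validate the whole range first, and only then
 * update the VMAs. Returns 0 on success, -1 if nothing was changed.
 */
int model_mseal(struct model_vma *vmas, size_t n,
                uintptr_t addr, size_t len, unsigned int types)
{
    if (!model_range_is_mapped(vmas, n, addr, len))
        return -1;
    if (!model_op_allowed(vmas, n, addr, len, MM_SEAL_MSEAL))
        return -1;

    uintptr_t end = addr + len;

    for (size_t i = 0; i < n; i++) {
        if (vmas[i].start < end && vmas[i].end > addr)
            vmas[i].seals |= types;
    }
    return 0;
}
```

In this model, a caller of an mprotect-like operation would first check
model_op_allowed(..., MM_SEAL_MPROTECT), while an munmap-like operation
would check MM_SEAL_MUNMAP. A newly added seal type stays off for existing
applications until they pass the new bit explicitly, which is the backward
compatibility argument made above.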