From: Jeff Xu
Date: Thu, 18 May 2023 13:20:06 -0700
Subject: Re: [PATCH 0/6] Memory Mapping (VMA) protection using PKU - set 1
To: Dave Hansen
Cc: Stephen Röttger, jeffxu@chromium.org, luto@kernel.org,
    jorgelo@chromium.org, keescook@chromium.org, groeck@chromium.org,
    jannh@google.com, akpm@linux-foundation.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-mm@kvack.org, linux-hardening@vger.kernel.org

Hello Dave,

Thanks for your email.

On Thu, May 18, 2023 at 8:38 AM Dave Hansen wrote:
>
> On 5/17/23 16:48, Jeff Xu wrote:
> > However, there are a few challenges I have not yet worked through.
> > First, the code needs to track when the first signaling entry occurs
> > (saving the PKRU register to the thread struct) and when it is last
> > returned (restoring the PKRU register from the thread struct).
>
> Would tracking signal "depth" work in the face of things like siglongjmp?
>
Thank you for your question! I am eager to learn more about this area,
and I worry about blind spots. I will investigate and get back to you.

> Taking a step back...
>
> Here's my concern about this whole thing: it's headed down a rabbit hole
> which is *highly* specialized both in the apps that will use it and the
> attacks it will mitigate. It probably *requires* turning off a bunch of
> syscalls (like io_uring) that folks kinda like in general.
>
ChromeOS currently disables io_uring, but this feature does not require
doing so. io_uring supports the IORING_OP_MADVISE operation, which calls
do_madvise(). This means that io_uring will go through the same pkey
checks as the madvise() system call. From that perspective, we will
fully support io_uring for this feature.

> We're balancing that highly specialized mitigation with a feature that
> add new ABI, touches core memory management code and signal handling.
>
The ABI change uses the existing flags argument of pkey_alloc(), which
is currently reserved. The implementation is backward compatible with
all existing pkey usage in both kernel and user space. Or do you have
other ABI concerns in mind?

Yes, you are right about the risk of touching core mm code. To minimize
the risk, I tried to limit the scope of the change: it is about 3 lines
in mprotect, and more in munmap, but really just 3 effective lines from
syscall entry. I added new selftests in mm to make sure the API behavior
does not regress. I ran those tests before and after my kernel change to
confirm that the behavior remains the same, on 5.15, 6.1, and 6.4-rc1.
In fact, the testing discovered a behavior change for mprotect() between
6.1 and 6.4 (not caused by this patch; there is refactoring work going
on in mm), see this thread [1].

I hope those steps will help to mitigate the risk.
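
On the siglongjmp question, a small userspace illustration may help frame
what needs investigating. This is only a sketch under assumptions, not
kernel code: the depth counter and the escape flag are hypothetical
stand-ins for the per-thread tracking discussed above, and
depth_after_signal() is a name made up for this example. It shows that a
counter incremented on handler entry and decremented on handler return
stays unbalanced once the handler leaves through siglongjmp().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for a per-thread "signal depth" counter. */
static volatile sig_atomic_t depth;
/* When nonzero, the handler leaves through siglongjmp(). */
static volatile sig_atomic_t escape;
static sigjmp_buf env;

static void handler(int sig)
{
	(void)sig;
	depth++;			/* "entry": where PKRU would be saved */
	if (escape)
		siglongjmp(env, 1);	/* skips the bookkeeping below */
	depth--;			/* "return": where PKRU would be restored */
}

/*
 * Deliver SIGUSR1 once and return the counter afterwards.
 * Returns -1 if the handler could not be installed.
 */
int depth_after_signal(int use_siglongjmp)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	depth = 0;
	escape = use_siglongjmp;
	/* savemask = 1 so SIGUSR1 is unblocked again after siglongjmp(). */
	if (sigsetjmp(env, 1) == 0)
		raise(SIGUSR1);

	return depth;
}
```

Calling depth_after_signal(0) leaves the counter balanced, while
depth_after_signal(1) leaves it at one because the decrement never runs.
Whether the in-kernel tracking would have the same problem is exactly
what still needs to be checked.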

I agree that signal handling is the tough part. What do you think about
the approach (modifying PKRU from the saved stack after XSAVE)? Is there
a blocker?

> On the x86 side, PKRU is a painfully special snowflake. It's exposed in
> the "XSAVE" ABIs, but not actually managed *with* XSAVE in the kernel.
> This would be making it an even more special snowflake because it would

I admit I do not know XSAVE well enough to understand the above
statement and how it is related. Could you please explain it to me? And
what do you have in mind that might improve the situation?

> need new altstack ABI and handling.
>
I thought adding protected memory support to signal handling is an
independent project with its own weight. As Jann Horn points out in [2]:
"we could prevent the attacker from corrupting the signal context if we
can protect the signal stack with a pkey." However, the kernel currently
sends SIGSEGV when the signal stack is protected by a pkey, so there is
a benefit to making this work. (Maybe Jann can share some more thoughts
on the benefits.)

I believe we could do this with a minimal ABI change, as below:
- Allocate the pkey with a new flag (PKEY_ALTSTACK).
- At the sigaltstack() call, detect that the memory is PKEY_ALTSTACK
  protected (similar to what mprotect does in this patch) and save that
  along with the stack address/size.
- At signal handling, use the saved info to fill in PKRU.

The ABI change is similar to PKEY_ENFORCE_API, and there is no backward
compatibility issue.

Would the points above help our case? What do you think?

(Stephen has more info on the gains. As far as I know, V8 engineers have
worked and thought really hard to come to a suitable solution to make
the Chrome browser safer.)

[1] https://lore.kernel.org/linux-mm/20230516165754.pocx4kaagn3yyw3r@revolver/T/
[2] https://docs.google.com/document/d/1OlnJbR5TMoaOAJsf4hHOc-FdTmYK2aDUI7d2hfCZSOo/edit?resourcekey=0-v9UJXONYsnG5PlCBbcYqIw#

Thanks!
Best regards,
-Jeff
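
P.S. To make the last PKEY_ALTSTACK step listed above more concrete,
here is a rough userspace model of "use the saved info to fill in PKRU".
It is a sketch under assumptions, not the proposed kernel code: struct
altstack_info, pkru_allow() and pkru_for_signal() are hypothetical names
made up for this example, and PKEY_ALTSTACK itself is only a proposal.
The one hardware fact it relies on is the x86 PKRU layout, where each
pkey owns two bits: access-disable at bit 2*pkey and write-disable at
bit 2*pkey + 1.

```c
#include <stddef.h>
#include <stdint.h>

/* x86 PKRU layout: two bits per pkey. */
#define PKRU_AD_BIT(pkey) (1u << (2 * (pkey)))		/* access disable */
#define PKRU_WD_BIT(pkey) (1u << (2 * (pkey) + 1))	/* write disable */

/*
 * Hypothetical record of what sigaltstack() would save in this
 * proposal: the stack range plus the pkey protecting it.
 */
struct altstack_info {
	void *sp;
	size_t size;
	int pkey;	/* -1 if the stack is not PKEY_ALTSTACK memory */
};

/* Clear both disable bits for one pkey, leaving other keys unchanged. */
uint32_t pkru_allow(uint32_t pkru, int pkey)
{
	return pkru & ~(PKRU_AD_BIT(pkey) | PKRU_WD_BIT(pkey));
}

/*
 * Model of signal entry: if the alternate stack is pkey protected,
 * open that pkey so the signal frame can be written; otherwise keep
 * the PKRU value as it is.
 */
uint32_t pkru_for_signal(const struct altstack_info *info, uint32_t pkru)
{
	if (info->pkey < 0)
		return pkru;
	return pkru_allow(pkru, info->pkey);
}
```

For example, starting from a PKRU of 0x55555554 (access disabled for
every key except key 0) with an alternate stack protected by pkey 3,
the model yields 0x55555514: only bits 6 and 7 are cleared. Which other
keys should be open or closed while the handler runs is a policy
question this sketch leaves out.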