From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mm01.cs.columbia.edu (mm01.cs.columbia.edu [128.59.11.253])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7A8C3C43334
	for ; Mon, 27 Jun 2022 11:43:19 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by mm01.cs.columbia.edu (Postfix) with ESMTP id E937D40FD3;
	Mon, 27 Jun 2022 07:43:18 -0400 (EDT)
X-Virus-Scanned: at lists.cs.columbia.edu
Received: from mm01.cs.columbia.edu ([127.0.0.1])
	by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id uxhRxF4Gi6ip; Mon, 27 Jun 2022 07:43:17 -0400 (EDT)
Received: from mm01.cs.columbia.edu (localhost [127.0.0.1])
	by mm01.cs.columbia.edu (Postfix) with ESMTP id A712040BC2;
	Mon, 27 Jun 2022 07:43:17 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])
	by mm01.cs.columbia.edu (Postfix) with ESMTP id 8B81640BEE
	for ; Mon, 27 Jun 2022 07:43:16 -0400 (EDT)
X-Virus-Scanned: at lists.cs.columbia.edu
Received: from mm01.cs.columbia.edu ([127.0.0.1])
	by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id ah3E70edpD6N
	for ; Mon, 27 Jun 2022 07:43:15 -0400 (EDT)
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217])
	by mm01.cs.columbia.edu (Postfix) with ESMTPS id 2DAEB40B92
	for ; Mon, 27 Jun 2022 07:43:15 -0400 (EDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dfw.source.kernel.org (Postfix) with ESMTPS id 7F1AD61192;
	Mon, 27 Jun 2022 11:43:14 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id BD194C3411D;
	Mon, 27 Jun 2022 11:43:11 +0000 (UTC)
Date: Mon, 27 Jun 2022 12:43:08 +0100
From: Catalin Marinas
To: Peter Collingbourne
Subject: Re: [PATCH] KVM: arm64: permit MAP_SHARED mappings with MTE enabled
Message-ID:
References: <20220623234944.141869-1-pcc@google.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
Cc: kvm@vger.kernel.org, Marc Zyngier, Andy Lutomirski, Evgenii Stepanov,
	Michael Roth, Chao Peng, Steven Price, Will Deacon,
	kvmarm@lists.cs.columbia.edu, Linux ARM
X-BeenThere: kvmarm@lists.cs.columbia.edu
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Where KVM/ARM decisions are made
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: kvmarm-bounces@lists.cs.columbia.edu
Sender: kvmarm-bounces@lists.cs.columbia.edu

On Fri, Jun 24, 2022 at 02:50:53PM -0700, Peter Collingbourne wrote:
> On Fri, Jun 24, 2022 at 10:05 AM Catalin Marinas wrote:
> > + Steven as he added the KVM and swap support for MTE.
> >
> > On Thu, Jun 23, 2022 at 04:49:44PM -0700, Peter Collingbourne wrote:
> > > Certain VMMs such as crosvm have features (e.g. sandboxing, pmem) that
> > > depend on being able to map guest memory as MAP_SHARED. The current
> > > restriction on sharing MAP_SHARED pages with the guest is preventing
> > > the use of those features with MTE. Therefore, remove this restriction.
> >
> > We already have some corner cases where the PG_mte_tagged logic fails
> > even for MAP_PRIVATE (but page shared with CoW). Adding this on top for
> > KVM MAP_SHARED will potentially make things worse (or hard to reason
> > about; for example the VMM sets PROT_MTE as well). I'm more inclined to
> > get rid of PG_mte_tagged altogether, always zero (or restore) the tags
> > on user page allocation, copy them on write. For swap we can scan and if
> > all tags are 0 and just skip saving them.
>
> A problem with this approach is that it would conflict with any
> potential future changes that we might make that would require the
> kernel to avoid modifying the tags for non-PROT_MTE pages.

Not if in all those cases we check VM_MTE_ALLOWED.
We seem to have the vma available where it matters. We can keep
PG_mte_tagged around but always set it on page allocation (e.g. when
zeroing or CoW) and check VM_MTE_ALLOWED rather than VM_MTE.

I'm not sure how Linux can deal with pages that do not support MTE.
Currently we only handle this at the vm_flags level. Assuming that we
somehow manage to, we can still use PG_mte_tagged to mark the pages that
supported tags on allocation (and they have been zeroed or copied). I
guess if you want to move a page around, you'd need to go through
something like try_to_unmap() (or set all mappings to PROT_NONE like in
NUMA migration). Then you can either check whether the page is PROT_MTE
anywhere and maybe read the tags to see whether all are zero after
unmapping.

Deferring tag zeroing/restoring to set_pte_at() can be racy without a
lock (or your approach with another flag) but I'm not sure it's worth
the extra logic if zeroing or copying the tags doesn't have a
significant overhead for non-PROT_MTE pages.

Another issue with set_pte_at() is that it can write tags on mprotect()
even if the page is mapped read-only. So far I couldn't find a problem
with this but it adds to the complexity.

> Thinking about this some more, another idea that I had was to only
> allow MAP_SHARED mappings in a guest with MTE enabled if the mapping
> is PROT_MTE and there are no non-PROT_MTE aliases. For anonymous
> mappings I don't think it's possible to create a non-PROT_MTE alias in
> another mm (since you can't turn off PROT_MTE with mprotect), and for
> memfd maybe we could introduce a flag that requires PROT_MTE on all
> mappings. That way, we are guaranteed that either the page has been
> tagged prior to fault or we have exclusive access to it so it can be
> tagged on demand without racing. Let me see what effect that has on
> crosvm.

You could still have all initial shared mappings as !PROT_MTE and some
mprotect() afterwards setting PG_mte_tagged and clearing the tags and
this can race.
AFAICT, the easiest way to avoid the race is to set PG_mte_tagged on
allocation before it ends up in set_pte_at().

-- 
Catalin
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
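
A rough user-space model of the race discussed above, in case it helps
to see the interleaving spelled out. This is only a sketch and not
kernel code: struct toy_page, PG_MTE_TAGGED and every toy_*() helper
are made-up stand-ins, a 4 KiB page with 16-byte tag granules is
assumed, and the "concurrent" mappers are replayed by hand in a single
thread so the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GRANULES      256   /* assumed: 4 KiB page / 16-byte MTE granule */
#define PG_MTE_TAGGED 0x1u  /* hypothetical stand-in for PG_mte_tagged */

/* Toy page: NOT the kernel's struct page, just enough state for the model. */
struct toy_page {
	unsigned int flags;
	uint8_t tags[GRANULES];
};

/* Allocation-time approach: zero the tags and set the flag before the
 * page becomes visible to any mapper. */
static void toy_alloc_init(struct toy_page *p)
{
	memset(p->tags, 0, sizeof(p->tags));
	p->flags |= PG_MTE_TAGGED;
}

/* Deferred approach, split into its "check" and "act" halves so that
 * two mappers can be interleaved by hand. */
static bool toy_lazy_check(const struct toy_page *p)
{
	return !(p->flags & PG_MTE_TAGGED); /* true: caller will clear tags */
}

static void toy_lazy_act(struct toy_page *p, bool must_clear)
{
	if (must_clear) {
		memset(p->tags, 0, sizeof(p->tags));
		p->flags |= PG_MTE_TAGGED;
	}
}

/* Replay one bad interleaving: A checks, B checks and acts, a tag is
 * stored through the already-mapped page, then A acts on its stale
 * check. Returns the tag that survives in granule 0. */
static uint8_t toy_replay(struct toy_page *p, uint8_t tag)
{
	assert(tag < 16); /* MTE allocation tags are 4 bits wide */

	bool a = toy_lazy_check(p);
	bool b = toy_lazy_check(p);

	toy_lazy_act(p, b);
	p->tags[0] = tag;
	toy_lazy_act(p, a);

	return p->tags[0];
}
```

With a page whose flag starts clear, toy_replay() returns 0 because the
second clear wipes the tag stored in between. If toy_alloc_init() has
run first, both checks come back false, nothing is cleared later and
the stored tag survives. The model says nothing about the cost of
zeroing tags for non-PROT_MTE pages, which is the trade-off raised
above.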