From: Catalin Marinas <catalin.marinas@arm.com>
To: Peter Collingbourne <pcc@google.com>
Cc: kvm@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
Andy Lutomirski <luto@amacapital.net>,
Evgenii Stepanov <eugenis@google.com>,
Michael Roth <michael.roth@amd.com>,
Chao Peng <chao.p.peng@linux.intel.com>,
Steven Price <steven.price@arm.com>,
Will Deacon <will@kernel.org>,
kvmarm@lists.cs.columbia.edu,
Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] KVM: arm64: permit MAP_SHARED mappings with MTE enabled
Date: Mon, 27 Jun 2022 12:43:08 +0100 [thread overview]
Message-ID: <YrmXzHXv4babwbNZ@arm.com> (raw)
In-Reply-To: <CAMn1gO7-qVzZrAt63BJC-M8gKLw4=60iVUo6Eu8T_5y3AZnKcA@mail.gmail.com>
On Fri, Jun 24, 2022 at 02:50:53PM -0700, Peter Collingbourne wrote:
> On Fri, Jun 24, 2022 at 10:05 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > + Steven as he added the KVM and swap support for MTE.
> >
> > On Thu, Jun 23, 2022 at 04:49:44PM -0700, Peter Collingbourne wrote:
> > > Certain VMMs such as crosvm have features (e.g. sandboxing, pmem) that
> > > depend on being able to map guest memory as MAP_SHARED. The current
> > > restriction on sharing MAP_SHARED pages with the guest is preventing
> > > the use of those features with MTE. Therefore, remove this restriction.
> >
> > We already have some corner cases where the PG_mte_tagged logic fails
> > even for MAP_PRIVATE (but page shared with CoW). Adding this on top for
> > KVM MAP_SHARED will potentially make things worse (or hard to reason
> > about; for example the VMM sets PROT_MTE as well). I'm more inclined to
> > get rid of PG_mte_tagged altogether, always zero (or restore) the tags
> > on user page allocation, copy them on write. For swap we can scan and if
> > all tags are 0 and just skip saving them.
>
> A problem with this approach is that it would conflict with any
> potential future changes that we might make that would require the
> kernel to avoid modifying the tags for non-PROT_MTE pages.
Not if in all those cases we check VM_MTE_ALLOWED. We seem to have the
vma available where it matters. We can keep PG_mte_tagged around but
always set it on page allocation (e.g. when zeroing or CoW) and check
VM_MTE_ALLOWED rather than VM_MTE.
I'm not sure how Linux can deal with pages that do not support MTE.
Currently we only handle this at the vm_flags level. Assuming that we
somehow manage to, we can still use PG_mte_tagged to mark the pages that
supported tags on allocation (and they have been zeroed or copied). I
guess if you want to move a page around, you'd need to go through
something like try_to_unmap() (or set all mappings to PROT_NONE like in
NUMA migration). Then you can either check whether the page is PROT_MTE
anywhere and maybe read the tags to see whether all are zero after
unmapping.
Deferring tag zeroing/restoring to set_pte_at() can be racy without a
lock (or your approach with another flag) but I'm not sure it's worth
the extra logic if zeroing or copying the tags doesn't have a
significant overhead for non-PROT_MTE pages.
Another issue with set_pte_at() is that it can write tags on mprotect()
even if the page is mapped read-only. So far I couldn't find a problem
with this but it adds to the complexity.
> Thinking about this some more, another idea that I had was to only
> allow MAP_SHARED mappings in a guest with MTE enabled if the mapping
> is PROT_MTE and there are no non-PROT_MTE aliases. For anonymous
> mappings I don't think it's possible to create a non-PROT_MTE alias in
> another mm (since you can't turn off PROT_MTE with mprotect), and for
> memfd maybe we could introduce a flag that requires PROT_MTE on all
> mappings. That way, we are guaranteed that either the page has been
> tagged prior to fault or we have exclusive access to it so it can be
> tagged on demand without racing. Let me see what effect that has on
> crosvm.
You could still have all initial shared mappings as !PROT_MTE and some
mprotect() afterwards setting PG_mte_tagged and clearing the tags and
this can race. AFAICT, the easiest way to avoid the race is to set
PG_mte_tagged on allocation before it ends up in set_pte_at().
--
Catalin
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Peter Collingbourne <pcc@google.com>
Cc: kvmarm@lists.cs.columbia.edu, Marc Zyngier <maz@kernel.org>,
kvm@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Michael Roth <michael.roth@amd.com>,
Chao Peng <chao.p.peng@linux.intel.com>,
Will Deacon <will@kernel.org>,
Evgenii Stepanov <eugenis@google.com>,
Steven Price <steven.price@arm.com>
Subject: Re: [PATCH] KVM: arm64: permit MAP_SHARED mappings with MTE enabled
Date: Mon, 27 Jun 2022 12:43:08 +0100 [thread overview]
Message-ID: <YrmXzHXv4babwbNZ@arm.com> (raw)
In-Reply-To: <CAMn1gO7-qVzZrAt63BJC-M8gKLw4=60iVUo6Eu8T_5y3AZnKcA@mail.gmail.com>
On Fri, Jun 24, 2022 at 02:50:53PM -0700, Peter Collingbourne wrote:
> On Fri, Jun 24, 2022 at 10:05 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > + Steven as he added the KVM and swap support for MTE.
> >
> > On Thu, Jun 23, 2022 at 04:49:44PM -0700, Peter Collingbourne wrote:
> > > Certain VMMs such as crosvm have features (e.g. sandboxing, pmem) that
> > > depend on being able to map guest memory as MAP_SHARED. The current
> > > restriction on sharing MAP_SHARED pages with the guest is preventing
> > > the use of those features with MTE. Therefore, remove this restriction.
> >
> > We already have some corner cases where the PG_mte_tagged logic fails
> > even for MAP_PRIVATE (but page shared with CoW). Adding this on top for
> > KVM MAP_SHARED will potentially make things worse (or hard to reason
> > about; for example the VMM sets PROT_MTE as well). I'm more inclined to
> > get rid of PG_mte_tagged altogether, always zero (or restore) the tags
> > on user page allocation, copy them on write. For swap we can scan and if
> > all tags are 0 and just skip saving them.
>
> A problem with this approach is that it would conflict with any
> potential future changes that we might make that would require the
> kernel to avoid modifying the tags for non-PROT_MTE pages.
Not if in all those cases we check VM_MTE_ALLOWED. We seem to have the
vma available where it matters. We can keep PG_mte_tagged around but
always set it on page allocation (e.g. when zeroing or CoW) and check
VM_MTE_ALLOWED rather than VM_MTE.
I'm not sure how Linux can deal with pages that do not support MTE.
Currently we only handle this at the vm_flags level. Assuming that we
somehow manage to, we can still use PG_mte_tagged to mark the pages that
supported tags on allocation (and they have been zeroed or copied). I
guess if you want to move a page around, you'd need to go through
something like try_to_unmap() (or set all mappings to PROT_NONE like in
NUMA migration). Then you can either check whether the page is PROT_MTE
anywhere and maybe read the tags to see whether all are zero after
unmapping.
Deferring tag zeroing/restoring to set_pte_at() can be racy without a
lock (or your approach with another flag) but I'm not sure it's worth
the extra logic if zeroing or copying the tags doesn't have a
significant overhead for non-PROT_MTE pages.
Another issue with set_pte_at() is that it can write tags on mprotect()
even if the page is mapped read-only. So far I couldn't find a problem
with this but it adds to the complexity.
> Thinking about this some more, another idea that I had was to only
> allow MAP_SHARED mappings in a guest with MTE enabled if the mapping
> is PROT_MTE and there are no non-PROT_MTE aliases. For anonymous
> mappings I don't think it's possible to create a non-PROT_MTE alias in
> another mm (since you can't turn off PROT_MTE with mprotect), and for
> memfd maybe we could introduce a flag that requires PROT_MTE on all
> mappings. That way, we are guaranteed that either the page has been
> tagged prior to fault or we have exclusive access to it so it can be
> tagged on demand without racing. Let me see what effect that has on
> crosvm.
You could still have all initial shared mappings as !PROT_MTE and some
mprotect() afterwards setting PG_mte_tagged and clearing the tags and
this can race. AFAICT, the easiest way to avoid the race is to set
PG_mte_tagged on allocation before it ends up in set_pte_at().
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Peter Collingbourne <pcc@google.com>
Cc: kvmarm@lists.cs.columbia.edu, Marc Zyngier <maz@kernel.org>,
kvm@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Michael Roth <michael.roth@amd.com>,
Chao Peng <chao.p.peng@linux.intel.com>,
Will Deacon <will@kernel.org>,
Evgenii Stepanov <eugenis@google.com>,
Steven Price <steven.price@arm.com>
Subject: Re: [PATCH] KVM: arm64: permit MAP_SHARED mappings with MTE enabled
Date: Mon, 27 Jun 2022 12:43:08 +0100 [thread overview]
Message-ID: <YrmXzHXv4babwbNZ@arm.com> (raw)
In-Reply-To: <CAMn1gO7-qVzZrAt63BJC-M8gKLw4=60iVUo6Eu8T_5y3AZnKcA@mail.gmail.com>
On Fri, Jun 24, 2022 at 02:50:53PM -0700, Peter Collingbourne wrote:
> On Fri, Jun 24, 2022 at 10:05 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > + Steven as he added the KVM and swap support for MTE.
> >
> > On Thu, Jun 23, 2022 at 04:49:44PM -0700, Peter Collingbourne wrote:
> > > Certain VMMs such as crosvm have features (e.g. sandboxing, pmem) that
> > > depend on being able to map guest memory as MAP_SHARED. The current
> > > restriction on sharing MAP_SHARED pages with the guest is preventing
> > > the use of those features with MTE. Therefore, remove this restriction.
> >
> > We already have some corner cases where the PG_mte_tagged logic fails
> > even for MAP_PRIVATE (but page shared with CoW). Adding this on top for
> > KVM MAP_SHARED will potentially make things worse (or hard to reason
> > about; for example the VMM sets PROT_MTE as well). I'm more inclined to
> > get rid of PG_mte_tagged altogether, always zero (or restore) the tags
> > on user page allocation, copy them on write. For swap we can scan and if
> > all tags are 0 and just skip saving them.
>
> A problem with this approach is that it would conflict with any
> potential future changes that we might make that would require the
> kernel to avoid modifying the tags for non-PROT_MTE pages.
Not if in all those cases we check VM_MTE_ALLOWED. We seem to have the
vma available where it matters. We can keep PG_mte_tagged around but
always set it on page allocation (e.g. when zeroing or CoW) and check
VM_MTE_ALLOWED rather than VM_MTE.
I'm not sure how Linux can deal with pages that do not support MTE.
Currently we only handle this at the vm_flags level. Assuming that we
somehow manage to, we can still use PG_mte_tagged to mark the pages that
supported tags on allocation (and they have been zeroed or copied). I
guess if you want to move a page around, you'd need to go through
something like try_to_unmap() (or set all mappings to PROT_NONE like in
NUMA migration). Then you can either check whether the page is PROT_MTE
anywhere and maybe read the tags to see whether all are zero after
unmapping.
Deferring tag zeroing/restoring to set_pte_at() can be racy without a
lock (or your approach with another flag) but I'm not sure it's worth
the extra logic if zeroing or copying the tags doesn't have a
significant overhead for non-PROT_MTE pages.
Another issue with set_pte_at() is that it can write tags on mprotect()
even if the page is mapped read-only. So far I couldn't find a problem
with this but it adds to the complexity.
> Thinking about this some more, another idea that I had was to only
> allow MAP_SHARED mappings in a guest with MTE enabled if the mapping
> is PROT_MTE and there are no non-PROT_MTE aliases. For anonymous
> mappings I don't think it's possible to create a non-PROT_MTE alias in
> another mm (since you can't turn off PROT_MTE with mprotect), and for
> memfd maybe we could introduce a flag that requires PROT_MTE on all
> mappings. That way, we are guaranteed that either the page has been
> tagged prior to fault or we have exclusive access to it so it can be
> tagged on demand without racing. Let me see what effect that has on
> crosvm.
You could still have all initial shared mappings as !PROT_MTE and some
mprotect() afterwards setting PG_mte_tagged and clearing the tags and
this can race. AFAICT, the easiest way to avoid the race is to set
PG_mte_tagged on allocation before it ends up in set_pte_at().
--
Catalin
next prev parent reply other threads:[~2022-06-27 11:43 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-23 23:49 [PATCH] KVM: arm64: permit MAP_SHARED mappings with MTE enabled Peter Collingbourne
2022-06-23 23:49 ` Peter Collingbourne
2022-06-23 23:49 ` Peter Collingbourne
2022-06-24 17:05 ` Catalin Marinas
2022-06-24 17:05 ` Catalin Marinas
2022-06-24 17:05 ` Catalin Marinas
2022-06-24 21:50 ` Peter Collingbourne
2022-06-24 21:50 ` Peter Collingbourne
2022-06-24 21:50 ` Peter Collingbourne
2022-06-27 11:43 ` Catalin Marinas [this message]
2022-06-27 11:43 ` Catalin Marinas
2022-06-27 11:43 ` Catalin Marinas
2022-06-27 18:16 ` Peter Collingbourne
2022-06-27 18:16 ` Peter Collingbourne
2022-06-27 18:16 ` Peter Collingbourne
2022-06-28 17:57 ` Catalin Marinas
2022-06-28 17:57 ` Catalin Marinas
2022-06-28 17:57 ` Catalin Marinas
2022-06-28 18:54 ` Peter Collingbourne
2022-06-28 18:54 ` Peter Collingbourne
2022-06-28 18:54 ` Peter Collingbourne
2022-06-29 19:15 ` Catalin Marinas
2022-06-29 19:15 ` Catalin Marinas
2022-06-29 19:15 ` Catalin Marinas
2022-06-30 17:24 ` Catalin Marinas
2022-06-30 17:24 ` Catalin Marinas
2022-06-30 17:24 ` Catalin Marinas
2022-06-25 8:14 ` Steven Price
2022-06-25 8:14 ` Steven Price
2022-06-25 8:14 ` Steven Price
2022-06-27 15:55 ` Cornelia Huck
2022-06-27 15:55 ` Cornelia Huck
2022-06-27 15:55 ` Cornelia Huck
2022-06-29 8:45 ` Catalin Marinas
2022-06-29 8:45 ` Catalin Marinas
2022-06-29 8:45 ` Catalin Marinas
2022-07-04 9:52 ` Steven Price
2022-07-04 9:52 ` Steven Price
2022-07-04 9:52 ` Steven Price
2022-07-04 12:19 ` Cornelia Huck
2022-07-04 12:19 ` Cornelia Huck
2022-07-04 12:19 ` Cornelia Huck
2022-07-04 15:00 ` Steven Price
2022-07-04 15:00 ` Steven Price
2022-07-04 15:00 ` Steven Price
2022-07-08 13:03 ` Cornelia Huck
2022-07-08 13:03 ` Cornelia Huck
2022-07-08 13:03 ` Cornelia Huck
2022-07-08 13:58 ` Peter Xu
2022-07-08 13:58 ` Peter Xu
2022-07-08 13:58 ` Peter Xu
2022-07-14 13:30 ` Cornelia Huck
2022-07-14 13:30 ` Cornelia Huck
2022-07-14 13:30 ` Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YrmXzHXv4babwbNZ@arm.com \
--to=catalin.marinas@arm.com \
--cc=chao.p.peng@linux.intel.com \
--cc=eugenis@google.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=luto@amacapital.net \
--cc=maz@kernel.org \
--cc=michael.roth@amd.com \
--cc=pcc@google.com \
--cc=steven.price@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.