From: Marc Zyngier <maz@kernel.org>
To: Quentin Perret <qperret@google.com>
Cc: kernel-team@android.com, qwandor@google.com, will@kernel.org,
catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 03/14] KVM: arm64: Continue stage-2 map when re-creating mappings
Date: Tue, 20 Jul 2021 09:26:10 +0100
Message-ID: <875yx59ysd.wl-maz@kernel.org>
In-Reply-To: <YPV+2jQ/Q/ie14Fn@google.com>
On Mon, 19 Jul 2021 14:32:10 +0100,
Quentin Perret <qperret@google.com> wrote:
>
> On Monday 19 Jul 2021 at 13:14:48 (+0100), Marc Zyngier wrote:
> > On Mon, 19 Jul 2021 11:47:24 +0100,
> > Quentin Perret <qperret@google.com> wrote:
> > >
> > > The stage-2 map walkers currently return -EAGAIN when re-creating
> > > identical mappings or only changing access permissions. This allows us
> > > to optimize mapping pages for concurrent (v)CPUs faulting on the same
> > > page.
> > >
> > > While this works as expected when touching one page-table leaf at a
> > > time, this can lead to difficult situations when mapping larger ranges.
> > > Indeed, a large map operation can fail in the middle if an existing
> > > mapping is found in the range, even if it has compatible attributes,
> > > hence leaving only half of the range mapped.
> >
> > I'm curious about when this can happen. We normally map a single leaf at
> > a time, and we don't have a way to map multiple leaves at once: we
> > either use the VMA base size or try to upgrade it to a THP, but the
> > result is always a single leaf entry. What changed?
>
> Nothing _yet_ :-)
>
> The 'share' hypercall introduced near the end of the series allows sharing
> multiple physically contiguous pages in one go -- this is mostly to
> allow sharing data structures that are larger than a page.
>
> So if one of the pages happens to be already mapped by the time the
> hypercall is issued, mapping the range with the right SW bits becomes
> difficult as kvm_pgtable_stage2_map() will fail halfway through, which
> is tricky to handle.
>
> This patch shouldn't change anything for existing users that only map
> things that are nicely aligned at block/page granularity, but should
> make the life of new users easier, so that seemed like a win.
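The failure mode described above can be illustrated with a toy model. Everything here is hypothetical: `toy_map_range()` and `toy_pte_t` are not KVM code, a "PTE" is just a pfn (0 meaning invalid), and every pre-existing entry is assumed to have compatible attributes. The sketch contrasts stopping at the first existing entry, which leaves the tail of the range unmapped, with skipping it, finishing the walk and still returning -EAGAIN, which is roughly what the patch proposes for kvm_pgtable_stage2_map().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_PTES 8

/* Hypothetical toy PTE: 0 is invalid, anything else is the mapped pfn. */
typedef uint64_t toy_pte_t;

/*
 * Map [start, start + nr) to consecutive pfns starting at @pfn.
 * stop_on_existing == true:  bail out at the first valid entry, leaving
 *                            the rest of the range unmapped.
 * stop_on_existing == false: skip valid entries, walk the whole range,
 *                            and only then report -EAGAIN if any were skipped.
 */
static int toy_map_range(toy_pte_t *ptes, size_t start, size_t nr,
			 uint64_t pfn, bool stop_on_existing)
{
	bool skipped = false;

	assert(start + nr <= TOY_NR_PTES && pfn != 0);
	for (size_t i = 0; i < nr; i++) {
		toy_pte_t *ptep = &ptes[start + i];

		if (*ptep) {
			/* Existing entry: assumed compatible in this model. */
			if (stop_on_existing)
				return -EAGAIN;
			skipped = true;
			continue;
		}
		*ptep = pfn + i;
	}
	return skipped ? -EAGAIN : 0;
}
```

For example, with entry 2 already mapped, mapping entries 0-3 in "stop" mode leaves entry 3 unmapped, while "continue" mode maps all four entries and still returns -EAGAIN.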

Right, but this is on a different path, right? Guests can never fault
multiple mappings at once, and it takes you a host hypercall to
perform this "multiple leaves at once".

Is there any way we can restrict this to the hypercall? Or even
better, keep the hypercall as a "one page at a time" thing? I can't
imagine it being performance critical (it is a once-off, and only used
over a rather small region of memory). Then, the caller doesn't have
to worry about the page already being mapped or not. This would also
match the behaviour of what I do on the MMIO side.

Or do you anticipate much gain from this being able to use block
mappings?

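The "one page at a time" alternative can be sketched in the same toy style. This is a hypothetical model, not the actual share hypercall or the MMIO code: `toy_map_one()`, `toy_share_range()` and the `toy_s2` array are invented for illustration, and returning -EPERM for a conflicting mapping is an assumption chosen for the example. The point is that an identical pre-existing mapping (reported as -EAGAIN by the single-page helper) is simply treated as success by the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_PAGES 16

/* Hypothetical toy stage-2: 0 = unmapped, otherwise the mapped pfn. */
static uint64_t toy_s2[TOY_NR_PAGES];

/* Map a single page; re-creating the identical mapping returns -EAGAIN. */
static int toy_map_one(size_t idx, uint64_t pfn)
{
	assert(idx < TOY_NR_PAGES && pfn != 0);
	if (toy_s2[idx] == pfn)
		return -EAGAIN;
	if (toy_s2[idx])
		return -EPERM;	/* conflicting mapping: a real error */
	toy_s2[idx] = pfn;
	return 0;
}

/*
 * Hypothetical share loop, one page per iteration. A benign collision
 * with an identical mapping is ignored, so the caller never sees a
 * half-mapped range unless a genuine conflict occurs.
 */
static int toy_share_range(size_t idx, uint64_t pfn, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++) {
		int ret = toy_map_one(idx + i, pfn + i);

		if (ret && ret != -EAGAIN)
			return ret;
	}
	return 0;
}
```

In this model, sharing pages 1-4 while page 3 is already identically mapped succeeds, whereas a conflicting entry still stops the loop with an error.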
>
> > > To avoid having to deal with such failures in the caller, don't
> > > interrupt the map operation when hitting existing PTEs, but make sure to
> > > still return -EAGAIN so that user_mem_abort() can mark the page dirty
> > > when needed.
> >
> > I don't follow you here: if you return -EAGAIN for a writable mapping,
> > we don't account for the page to be dirty on the assumption that
> > nothing has been mapped. But if there is a way to map more than a
> > single entry and to get -EAGAIN at the same time, then we're bound to
> > lose data on page eviction.
> >
> > Can you shed some light on this?
>
> Sure. For guests, hitting the -EAGAIN case means we've lost the race
> with another vCPU that faulted the same page. In this case the other
> vCPU either mapped the page RO, which means that our vCPU will then get
> a permission fault next time we run it which will lead to the page being
> marked dirty, or the other vCPU mapped the page RW in which case it
> already marked the page dirty for us and we can safely re-enter the
> guest without doing anything else.
>
> So what I meant by "still return -EAGAIN so that user_mem_abort() can
> mark the page dirty when needed" is "make sure to mark the page dirty
> only when necessary: if winning the race and marking the page RW, or
> in the permission fault path". That is, by keeping the -EAGAIN I want to
> make sure we don't mark the page dirty twice. (This might be fine, but this
> would be new behaviour, and it was not clear that it would scale well to
> many vCPUs faulting the same page).
>
> I see how this wording can be highly confusing though; I'll re-word it
> for the next version.
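The dirty-tracking rule described above (mark the page dirty only when this vCPU's fault handling succeeded on a write) can be modelled for a single page. This is a hypothetical sketch, not user_mem_abort(): `toy_handle_fault()`, `struct toy_page` and `enum toy_fault` are invented for illustration, and -EAGAIN stands for "another vCPU already installed an identical or permission-only-different mapping, nothing was changed".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical kinds of stage-2 fault, for this toy model only. */
enum toy_fault { TOY_FAULT_TRANSLATION, TOY_FAULT_PERMISSION };

/* Hypothetical per-page state shared by all vCPUs in the model. */
struct toy_page {
	bool mapped;
	bool writable;
	int dirty_marks;	/* number of times the page was marked dirty */
};

static int toy_handle_fault(struct toy_page *pg, enum toy_fault kind,
			    bool write)
{
	int ret;

	assert(pg);
	if (kind == TOY_FAULT_PERMISSION) {
		/* Permission fault on a present mapping: relax RO -> RW. */
		if (write)
			pg->writable = true;
		ret = 0;
	} else if (!pg->mapped) {
		/* Won the race: install the mapping with this fault's perms. */
		pg->mapped = true;
		pg->writable = write;
		ret = 0;
	} else {
		/*
		 * Lost the race: the existing mapping is identical or differs
		 * only in permissions, so nothing changes and -EAGAIN is returned.
		 */
		ret = -EAGAIN;
	}

	/* Mark dirty only if this call successfully handled a write fault. */
	if (write && !ret)
		pg->dirty_marks++;

	return ret;
}
```

Both race outcomes end with exactly one dirty mark: if the winner mapped the page read-only, the loser's -EAGAIN is followed by a permission fault that marks it dirty; if the winner mapped it read-write, the winner already marked it. Marc's concern is precisely the case this single-page model cannot express: a multi-page map that installs new writable entries and still returns -EAGAIN would skip the dirty mark for those new entries.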

I indeed found it pretty confusing. A reword would be much appreciated.

Thanks,
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Thread overview:
2021-07-19 10:47 [PATCH 00/14] Track shared pages at EL2 in protected mode Quentin Perret
2021-07-19 10:47 ` [PATCH 01/14] KVM: arm64: Provide the host_stage2_try() helper macro Quentin Perret
2021-07-20 13:57 ` Fuad Tabba
2021-07-19 10:47 ` [PATCH 02/14] KVM: arm64: Optimize kvm_pgtable_stage2_find_range() Quentin Perret
2021-07-19 10:47 ` [PATCH 03/14] KVM: arm64: Continue stage-2 map when re-creating mappings Quentin Perret
2021-07-19 12:14 ` Marc Zyngier
2021-07-19 13:32 ` Quentin Perret
2021-07-20 8:26 ` Marc Zyngier [this message]
2021-07-20 11:56 ` Quentin Perret
2021-07-19 10:47 ` [PATCH 04/14] KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED Quentin Perret
2021-07-19 10:47 ` [PATCH 05/14] KVM: arm64: Don't overwrite ignored bits with owner id Quentin Perret
2021-07-19 12:55 ` Marc Zyngier
2021-07-19 13:39 ` Quentin Perret
2021-07-20 8:46 ` Marc Zyngier
2021-07-19 10:47 ` [PATCH 06/14] KVM: arm64: Tolerate re-creating hyp mappings to set ignored bits Quentin Perret
2021-07-20 10:17 ` Fuad Tabba
2021-07-20 10:30 ` Quentin Perret
2021-07-20 10:59 ` Fuad Tabba
2021-07-20 11:14 ` Quentin Perret
2021-07-19 10:47 ` [PATCH 07/14] KVM: arm64: Enable forcing page-level stage-2 mappings Quentin Perret
2021-07-19 14:24 ` Marc Zyngier
2021-07-19 15:36 ` Quentin Perret
2021-07-19 10:47 ` [PATCH 08/14] KVM: arm64: Add support for tagging shared pages in page-table Quentin Perret
2021-07-19 14:43 ` Marc Zyngier
2021-07-19 15:49 ` Quentin Perret
2021-07-20 10:13 ` Marc Zyngier
2021-07-20 11:48 ` Quentin Perret
2021-07-20 13:48 ` Fuad Tabba
2021-07-20 14:06 ` Quentin Perret
2021-07-21 7:34 ` Fuad Tabba
2021-07-19 10:47 ` [PATCH 09/14] KVM: arm64: Mark host bss and rodata section as shared Quentin Perret
2021-07-19 15:01 ` Marc Zyngier
2021-07-19 15:56 ` Quentin Perret
2021-07-19 10:47 ` [PATCH 10/14] KVM: arm64: Enable retrieving protections attributes of PTEs Quentin Perret
2021-07-19 10:47 ` [PATCH 11/14] KVM: arm64: Expose kvm_pte_valid() helper Quentin Perret
2021-07-21 8:20 ` Fuad Tabba
2021-07-19 10:47 ` [PATCH 12/14] KVM: arm64: Refactor pkvm_pgtable locking Quentin Perret
2021-07-21 8:37 ` Fuad Tabba
2021-07-19 10:47 ` [PATCH 13/14] KVM: arm64: Restrict hyp stage-1 manipulation in protected mode Quentin Perret
2021-07-21 10:45 ` Fuad Tabba
2021-07-21 13:35 ` Quentin Perret
2021-07-19 10:47 ` [PATCH 14/14] KVM: arm64: Prevent late calls to __pkvm_create_private_mapping() Quentin Perret