From: Quentin Perret <qperret@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kernel-team@android.com, Will Deacon <will@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH 1/2] KVM: arm64: Fix host stage-2 PGD refcount
Date: Mon, 4 Oct 2021 11:05:12 +0100 [thread overview]
Message-ID: <YVrR2M8OmU6ug1Yk@google.com> (raw)
In-Reply-To: <87bl45ru66.wl-maz@kernel.org>
Hey Marc,
On Monday 04 Oct 2021 at 10:55:13 (+0100), Marc Zyngier wrote:
> Hi Quentin,
>
> On Mon, 04 Oct 2021 10:03:13 +0100,
> Quentin Perret <qperret@google.com> wrote:
> >
> > The KVM page-table library refcounts the pages of concatenated stage-2
> > PGDs individually. However, the host's stage-2 PGD is currently managed
> > by EL2 as a single high-order compound page, which can cause the
> > refcount of the tail pages to reach 0 when they really shouldn't, hence
> > corrupting the page-table.
>
> nit: this comment only applies to the protected mode, right? As far as
> I can tell, 'classic' KVM is just fine.
Correct, this really only applies to the host stage-2, which implies
we're in protected mode. I'll make that a bit more explicit.
> > Fix this by introducing a new hyp_split_page() helper in the EL2 page
> > allocator (matching EL1's split_page() function), and make use of it
>
> uber nit: split_page() is not an EL1 function. more of a standard
> kernel function.
Fair enough :)
> > from host_s2_zalloc_page().
> >
> > Fixes: 1025c8c0c6ac ("KVM: arm64: Wrap the host with a stage 2")
> > Suggested-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> > arch/arm64/kvm/hyp/include/nvhe/gfp.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 6 +++++-
> > arch/arm64/kvm/hyp/nvhe/page_alloc.c | 14 ++++++++++++++
> > 3 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > index fb0f523d1492..0a048dc06a7d 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > @@ -24,6 +24,7 @@ struct hyp_pool {
> >
> > /* Allocation */
> > void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
> > +void hyp_split_page(struct hyp_page *page);
> > void hyp_get_page(struct hyp_pool *pool, void *addr);
> > void hyp_put_page(struct hyp_pool *pool, void *addr);
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index bacd493a4eac..93a79736c283 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -35,7 +35,11 @@ const u8 pkvm_hyp_id = 1;
> >
> > static void *host_s2_zalloc_pages_exact(size_t size)
> > {
> > - return hyp_alloc_pages(&host_s2_pool, get_order(size));
> > + void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
> > +
> > + hyp_split_page(hyp_virt_to_page(addr));
>
> The only reason this doesn't lead to a subsequent memory leak is that
> concatenated page tables are always a power of two, right?
Indeed, and also because the host stage-2 is _never_ freed, so that's
not memory we're going to reclaim anyway -- we don't have an
implementation of ->free_pages_exact() in the host stage-2 mm_ops.
> If so, that deserves a comment, because I don't think this works in
> the general case unless you actively free the pages that are between
> size and (1 << order).
Ack, that'll probably confuse me too in a few weeks, so a comment won't
hurt. I'll re-spin shortly.
Thanks,
Quentin
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
WARNING: multiple messages have this Message-ID (diff)
From: Quentin Perret <qperret@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Fuad Tabba <tabba@google.com>,
David Brazdil <dbrazdil@google.com>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
kernel-team@android.com
Subject: Re: [PATCH 1/2] KVM: arm64: Fix host stage-2 PGD refcount
Date: Mon, 4 Oct 2021 11:05:12 +0100 [thread overview]
Message-ID: <YVrR2M8OmU6ug1Yk@google.com> (raw)
In-Reply-To: <87bl45ru66.wl-maz@kernel.org>
Hey Marc,
On Monday 04 Oct 2021 at 10:55:13 (+0100), Marc Zyngier wrote:
> Hi Quentin,
>
> On Mon, 04 Oct 2021 10:03:13 +0100,
> Quentin Perret <qperret@google.com> wrote:
> >
> > The KVM page-table library refcounts the pages of concatenated stage-2
> > PGDs individually. However, the host's stage-2 PGD is currently managed
> > by EL2 as a single high-order compound page, which can cause the
> > refcount of the tail pages to reach 0 when they really shouldn't, hence
> > corrupting the page-table.
>
> nit: this comment only applies to the protected mode, right? As far as
> I can tell, 'classic' KVM is just fine.
Correct, this really only applies to the host stage-2, which implies
we're in protected mode. I'll make that a bit more explicit.
> > Fix this by introducing a new hyp_split_page() helper in the EL2 page
> > allocator (matching EL1's split_page() function), and make use of it
>
> uber nit: split_page() is not an EL1 function. more of a standard
> kernel function.
Fair enough :)
> > from host_s2_zalloc_page().
> >
> > Fixes: 1025c8c0c6ac ("KVM: arm64: Wrap the host with a stage 2")
> > Suggested-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> > arch/arm64/kvm/hyp/include/nvhe/gfp.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 6 +++++-
> > arch/arm64/kvm/hyp/nvhe/page_alloc.c | 14 ++++++++++++++
> > 3 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > index fb0f523d1492..0a048dc06a7d 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > @@ -24,6 +24,7 @@ struct hyp_pool {
> >
> > /* Allocation */
> > void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
> > +void hyp_split_page(struct hyp_page *page);
> > void hyp_get_page(struct hyp_pool *pool, void *addr);
> > void hyp_put_page(struct hyp_pool *pool, void *addr);
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index bacd493a4eac..93a79736c283 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -35,7 +35,11 @@ const u8 pkvm_hyp_id = 1;
> >
> > static void *host_s2_zalloc_pages_exact(size_t size)
> > {
> > - return hyp_alloc_pages(&host_s2_pool, get_order(size));
> > + void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
> > +
> > + hyp_split_page(hyp_virt_to_page(addr));
>
> The only reason this doesn't lead to a subsequent memory leak is that
> concatenated page tables are always a power of two, right?
Indeed, and also because the host stage-2 is _never_ freed, so that's
not memory we're going to reclaim anyway -- we don't have an
implementation of ->free_pages_exact() in the host stage-2 mm_ops.
> If so, that deserves a comment, because I don't think this works in
> the general case unless you actively free the pages that are between
> size and (1 << order).
Ack, that'll probably confuse me too in a few weeks, so a comment won't
hurt. I'll re-spin shortly.
Thanks,
Quentin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Quentin Perret <qperret@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Fuad Tabba <tabba@google.com>,
David Brazdil <dbrazdil@google.com>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
kernel-team@android.com
Subject: Re: [PATCH 1/2] KVM: arm64: Fix host stage-2 PGD refcount
Date: Mon, 4 Oct 2021 11:05:12 +0100 [thread overview]
Message-ID: <YVrR2M8OmU6ug1Yk@google.com> (raw)
In-Reply-To: <87bl45ru66.wl-maz@kernel.org>
Hey Marc,
On Monday 04 Oct 2021 at 10:55:13 (+0100), Marc Zyngier wrote:
> Hi Quentin,
>
> On Mon, 04 Oct 2021 10:03:13 +0100,
> Quentin Perret <qperret@google.com> wrote:
> >
> > The KVM page-table library refcounts the pages of concatenated stage-2
> > PGDs individually. However, the host's stage-2 PGD is currently managed
> > by EL2 as a single high-order compound page, which can cause the
> > refcount of the tail pages to reach 0 when they really shouldn't, hence
> > corrupting the page-table.
>
> nit: this comment only applies to the protected mode, right? As far as
> I can tell, 'classic' KVM is just fine.
Correct, this really only applies to the host stage-2, which implies
we're in protected mode. I'll make that a bit more explicit.
> > Fix this by introducing a new hyp_split_page() helper in the EL2 page
> > allocator (matching EL1's split_page() function), and make use of it
>
> uber nit: split_page() is not an EL1 function. more of a standard
> kernel function.
Fair enough :)
> > from host_s2_zalloc_page().
> >
> > Fixes: 1025c8c0c6ac ("KVM: arm64: Wrap the host with a stage 2")
> > Suggested-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> > arch/arm64/kvm/hyp/include/nvhe/gfp.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 6 +++++-
> > arch/arm64/kvm/hyp/nvhe/page_alloc.c | 14 ++++++++++++++
> > 3 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > index fb0f523d1492..0a048dc06a7d 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > @@ -24,6 +24,7 @@ struct hyp_pool {
> >
> > /* Allocation */
> > void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
> > +void hyp_split_page(struct hyp_page *page);
> > void hyp_get_page(struct hyp_pool *pool, void *addr);
> > void hyp_put_page(struct hyp_pool *pool, void *addr);
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index bacd493a4eac..93a79736c283 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -35,7 +35,11 @@ const u8 pkvm_hyp_id = 1;
> >
> > static void *host_s2_zalloc_pages_exact(size_t size)
> > {
> > - return hyp_alloc_pages(&host_s2_pool, get_order(size));
> > + void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
> > +
> > + hyp_split_page(hyp_virt_to_page(addr));
>
> The only reason this doesn't lead to a subsequent memory leak is that
> concatenated page tables are always a power of two, right?
Indeed, and also because the host stage-2 is _never_ freed, so that's
not memory we're going to reclaim anyway -- we don't have an
implementation of ->free_pages_exact() in the host stage-2 mm_ops.
> If so, that deserves a comment, because I don't think this works in
> the general case unless you actively free the pages that are between
> size and (1 << order).
Ack, that'll probably confuse me too in a few weeks, so a comment won't
hurt. I'll re-spin shortly.
Thanks,
Quentin
next prev parent reply other threads:[~2021-10-04 10:05 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-04 9:03 [PATCH 0/2] A couple of EL2 refcounts fixes Quentin Perret
2021-10-04 9:03 ` Quentin Perret
2021-10-04 9:03 ` Quentin Perret
2021-10-04 9:03 ` [PATCH 1/2] KVM: arm64: Fix host stage-2 PGD refcount Quentin Perret
2021-10-04 9:03 ` Quentin Perret
2021-10-04 9:03 ` Quentin Perret
2021-10-04 9:55 ` Marc Zyngier
2021-10-04 9:55 ` Marc Zyngier
2021-10-04 9:55 ` Marc Zyngier
2021-10-04 10:05 ` Quentin Perret [this message]
2021-10-04 10:05 ` Quentin Perret
2021-10-04 10:05 ` Quentin Perret
2021-10-04 9:03 ` [PATCH 2/2] KVM: arm64: Report corrupted refcount at EL2 Quentin Perret
2021-10-04 9:03 ` Quentin Perret
2021-10-04 9:03 ` Quentin Perret
2021-10-04 9:39 ` [PATCH 0/2] A couple of EL2 refcounts fixes Will Deacon
2021-10-04 9:39 ` Will Deacon
2021-10-04 9:39 ` Will Deacon
-- strict thread matches above, loose matches on Subject: below --
2021-10-05 9:01 Quentin Perret
2021-10-05 9:01 ` [PATCH 1/2] KVM: arm64: Fix host stage-2 PGD refcount Quentin Perret
2021-10-05 9:01 ` Quentin Perret
2021-10-05 9:01 ` Quentin Perret
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YVrR2M8OmU6ug1Yk@google.com \
--to=qperret@google.com \
--cc=catalin.marinas@arm.com \
--cc=kernel-team@android.com \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.