Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
@ 2026-05-29  7:43 tabba
  2026-05-29  7:43 ` [PATCH 1/2] KVM: arm64: Free hyp-share tracking node when share hypercall fails tabba
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: tabba @ 2026-05-29  7:43 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	Vincent Donnefort
  Cc: linux-arm-kernel, kvmarm, linux-kernel

Hi folks,

Yet another bug I found while testing Sashiko locally with fixes to
review-prompts.

share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
maintain a host-side RB-tree mirroring the set of pages shared with
EL2. Both invoke a hypercall that can fail (page-state mismatch,
EL2 refcount still held), but neither cleans up on failure:

- share_pfn_hyp() inserts the tracking node before the hypercall
  and leaves it in the tree on failure, leaking the allocation and
  presenting a phantom share to a later unshare.

- unshare_pfn_hyp() erases the tracking node before the hypercall;
  on failure the host loses its record while EL2 still owns the
  share, breaking later operations on the same pfn.

Severity is low (no isolation impact) and the failure paths are rare
in practice, but the desync is real. Both patches are independent and
apply cleanly to current mainline. In other words, this can wait for
7.2.

Cheers,
/fuad

Fuad Tabba (2):
  KVM: arm64: Free hyp-share tracking node when share hypercall fails
  KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure

 arch/arm64/kvm/mmu.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

-- 
2.54.0.929.g9b7fa37559-goog




* [PATCH 1/2] KVM: arm64: Free hyp-share tracking node when share hypercall fails
  2026-05-29  7:43 [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure tabba
@ 2026-05-29  7:43 ` tabba
  2026-05-29  7:43 ` [PATCH 2/2] KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure tabba
  2026-05-29  8:02 ` [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare " Vincent Donnefort
  2 siblings, 0 replies; 12+ messages in thread
From: tabba @ 2026-05-29  7:43 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	Vincent Donnefort
  Cc: linux-arm-kernel, kvmarm, linux-kernel

share_pfn_hyp() inserts a tracking node into hyp_shared_pfns and
then invokes __pkvm_host_share_hyp. If the hypercall rejects the
share (page-state mismatch at EL2), the node stays in the tree
with refcount 1: a phantom share that leaks the allocation and
that a later unshare will trust.

Erase the node and free it on hypercall failure.

Fixes: a83e2191b7f1 ("KVM: arm64: pkvm: Refcount the pages shared with EL2")
Reported-by: Sashiko (local):gemini-3.1-pro
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4da9281312eb..4a928fb003ff 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -501,6 +501,10 @@ static int share_pfn_hyp(u64 pfn)
 	rb_link_node(&this->node, parent, node);
 	rb_insert_color(&this->node, &hyp_shared_pfns);
 	ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp, pfn);
+	if (ret) {
+		rb_erase(&this->node, &hyp_shared_pfns);
+		kfree(this);
+	}
 unlock:
 	mutex_unlock(&hyp_shared_pfns_lock);
 
-- 
2.54.0.929.g9b7fa37559-goog
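For readers who want to see the failure path in isolation, here is a minimal userspace sketch of the pattern in the patch above. It is not the kernel code: a singly linked list stands in for the hyp_shared_pfns RB-tree, malloc()/free() stand in for the kernel allocator, locking is omitted, and fake_hyp_share(), hyp_share_should_fail, find_shared_pfn() and model_share_pfn() are hypothetical names. The -EPERM value returned by the stub is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for the host-side tracking node (hypothetical layout). */
struct shared_pfn {
	uint64_t pfn;
	int count;
	struct shared_pfn *next;
};

static struct shared_pfn *shared_list;	/* models hyp_shared_pfns */
static bool hyp_share_should_fail;	/* knob: force the stub to fail */

/* Stub standing in for kvm_call_hyp_nvhe(__pkvm_host_share_hyp, pfn). */
static int fake_hyp_share(uint64_t pfn)
{
	(void)pfn;
	return hyp_share_should_fail ? -EPERM : 0;
}

static struct shared_pfn *find_shared_pfn(uint64_t pfn)
{
	struct shared_pfn *p;

	for (p = shared_list; p; p = p->next)
		if (p->pfn == pfn)
			return p;
	return NULL;
}

static int model_share_pfn(uint64_t pfn)
{
	struct shared_pfn *this = find_shared_pfn(pfn);
	int ret;

	/* Already shared: only the host-side refcount changes. */
	if (this) {
		this->count++;
		return 0;
	}

	this = malloc(sizeof(*this));
	if (!this)
		return -ENOMEM;

	/* Insert the tracking node first, as share_pfn_hyp() does. */
	this->pfn = pfn;
	this->count = 1;
	this->next = shared_list;
	shared_list = this;

	ret = fake_hyp_share(pfn);
	if (ret) {
		/* Roll back: the node was just pushed at the list head. */
		shared_list = this->next;
		free(this);
	}
	return ret;
}
```

With the rollback in place, a failed share leaves no node behind, so a later lookup for the same pfn finds nothing and a retry starts from a clean state.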




* [PATCH 2/2] KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure
  2026-05-29  7:43 [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure tabba
  2026-05-29  7:43 ` [PATCH 1/2] KVM: arm64: Free hyp-share tracking node when share hypercall fails tabba
@ 2026-05-29  7:43 ` tabba
  2026-05-29  8:02 ` [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare " Vincent Donnefort
  2 siblings, 0 replies; 12+ messages in thread
From: tabba @ 2026-05-29  7:43 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	Vincent Donnefort
  Cc: linux-arm-kernel, kvmarm, linux-kernel

unshare_pfn_hyp() erases the tracking node from hyp_shared_pfns
and frees it before invoking __pkvm_host_unshare_hyp. If the
hypercall fails (e.g. EL2 refcount still held, or page-state
mismatch), the host loses its record while EL2 still holds the
share, breaking later share/unshare attempts on the same pfn.

Invoke the hypercall first; erase and free only on success.

Fixes: 52b28657ebd7 ("KVM: arm64: pkvm: Unshare guest structs during teardown")
Reported-by: Sashiko (local):gemini-3.1-pro
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4a928fb003ff..8026e834528d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -524,13 +524,17 @@ static int unshare_pfn_hyp(u64 pfn)
 		goto unlock;
 	}
 
-	this->count--;
-	if (this->count)
+	if (this->count > 1) {
+		this->count--;
+		goto unlock;
+	}
+
+	ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn);
+	if (ret)
 		goto unlock;
 
 	rb_erase(&this->node, &hyp_shared_pfns);
 	kfree(this);
-	ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn);
 unlock:
 	mutex_unlock(&hyp_shared_pfns_lock);
 
-- 
2.54.0.929.g9b7fa37559-goog
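The reordering in the patch above can be modelled in userspace as follows. This is a sketch under stated assumptions, not the kernel implementation: a linked list replaces the RB-tree, locking is omitted, and fake_hyp_unshare(), hyp_unshare_should_fail, track_pfn(), tracked_count() and model_unshare_pfn() are hypothetical names. The -EBUSY value is an arbitrary error chosen for the stub.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for the host-side tracking node (hypothetical layout). */
struct tracked_pfn {
	uint64_t pfn;
	int count;
	struct tracked_pfn *next;
};

static struct tracked_pfn *tracked_list;	/* models hyp_shared_pfns */
static bool hyp_unshare_should_fail;		/* knob: force the stub to fail */

/* Stub standing in for kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn). */
static int fake_hyp_unshare(uint64_t pfn)
{
	(void)pfn;
	return hyp_unshare_should_fail ? -EBUSY : 0;
}

/* Helper for the example: record a pfn as shared 'count' times. */
static int track_pfn(uint64_t pfn, int count)
{
	struct tracked_pfn *t = malloc(sizeof(*t));

	if (!t)
		return -ENOMEM;
	t->pfn = pfn;
	t->count = count;
	t->next = tracked_list;
	tracked_list = t;
	return 0;
}

/* Returns the host-side refcount, or 0 if the pfn is not tracked. */
static int tracked_count(uint64_t pfn)
{
	struct tracked_pfn *t;

	for (t = tracked_list; t; t = t->next)
		if (t->pfn == pfn)
			return t->count;
	return 0;
}

static int model_unshare_pfn(uint64_t pfn)
{
	struct tracked_pfn **link, *this;
	int ret;

	for (link = &tracked_list; *link; link = &(*link)->next)
		if ((*link)->pfn == pfn)
			break;
	this = *link;
	if (!this)
		return -ENOENT;

	/* Other users remain: drop one reference, no hypercall needed. */
	if (this->count > 1) {
		this->count--;
		return 0;
	}

	/* Last reference: call the hypervisor before touching the record. */
	ret = fake_hyp_unshare(pfn);
	if (ret)
		return ret;	/* record stays intact on failure */

	*link = this->next;
	free(this);
	return 0;
}
```

Because the record is only dropped after the stub succeeds, a failed unshare leaves the host-side count at 1 and the operation can be retried.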




* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  7:43 [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure tabba
  2026-05-29  7:43 ` [PATCH 1/2] KVM: arm64: Free hyp-share tracking node when share hypercall fails tabba
  2026-05-29  7:43 ` [PATCH 2/2] KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure tabba
@ 2026-05-29  8:02 ` Vincent Donnefort
  2026-05-29  8:05   ` Fuad Tabba
  2 siblings, 1 reply; 12+ messages in thread
From: Vincent Donnefort @ 2026-05-29  8:02 UTC (permalink / raw)
  To: tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> Hi folks,
> 
> Yet another bug I found while testing Sashiko locally with fixes to
> review-prompts.
> 
> share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> maintain a host-side RB-tree mirroring the set of pages shared with
> EL2. Both invoke a hypercall that can fail (page-state mismatch,
> EL2 refcount still held), but neither cleans up on failure:
> 
> - share_pfn_hyp() inserts the tracking node before the hypercall
>   and leaves it in the tree on failure, leaking the allocation and
>   presenting a phantom share to a later unshare.
> 
> - unshare_pfn_hyp() erases the tracking node before the hypercall;
>   on failure the host loses its record while EL2 still owns the
>   share, breaking later operations on the same pfn.
> 
> Severity is low (no isolation impact) and the failure paths are rare
> in practice, but the desync is real. Both patches are independent and
> apply cleanly to current mainline. In other words, this can wait for
> 7.2.


I believe I fixed that here: https://lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com/
but, as Quentin pointed out, there's absolutely no reason for the hypercall to
fail, so I haven't sent a v2.

> 
> Cheers,
> /fuad
> 
> Fuad Tabba (2):
>   KVM: arm64: Free hyp-share tracking node when share hypercall fails
>   KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure
> 
>  arch/arm64/kvm/mmu.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> -- 
> 2.54.0.929.g9b7fa37559-goog
> 



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  8:02 ` [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare " Vincent Donnefort
@ 2026-05-29  8:05   ` Fuad Tabba
  2026-05-29  8:15     ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Fuad Tabba @ 2026-05-29  8:05 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > Hi folks,
> >
> > Yet another bug I found while testing Sashiko locally with fixes to
> > review-prompts.
> >
> > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > maintain a host-side RB-tree mirroring the set of pages shared with
> > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > EL2 refcount still held), but neither cleans up on failure:
> >
> > - share_pfn_hyp() inserts the tracking node before the hypercall
> >   and leaves it in the tree on failure, leaking the allocation and
> >   presenting a phantom share to a later unshare.
> >
> > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> >   on failure the host loses its record while EL2 still owns the
> >   share, breaking later operations on the same pfn.
> >
> > Severity is low (no isolation impact) and the failure paths are rare
> > in practice, but the desync is real. Both patches are independent and
> > apply cleanly to current mainline. In other words, this can wait for
> > 7.2.
>
>
> I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> So I haven't sent a v2.

At the very least we need to add a comment; otherwise, people like me
and LLMs like Sashiko will keep stumbling over it.

That said, this fix adds no real overhead, makes the code clearer, and
guards us against a future where that call might fail.
Self-documenting, in essence.

WDYT?

/fuad

>
> >
> > Cheers,
> > /fuad
> >
> > Fuad Tabba (2):
> >   KVM: arm64: Free hyp-share tracking node when share hypercall fails
> >   KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure
> >
> >  arch/arm64/kvm/mmu.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > --
> > 2.54.0.929.g9b7fa37559-goog
> >



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  8:05   ` Fuad Tabba
@ 2026-05-29  8:15     ` Marc Zyngier
  2026-05-29  8:20       ` Fuad Tabba
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2026-05-29  8:15 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Vincent Donnefort, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, 29 May 2026 09:05:35 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > > Hi folks,
> > >
> > > Yet another bug I found while testing Sashiko locally with fixes to
> > > review-prompts.
> > >
> > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > EL2 refcount still held), but neither cleans up on failure:
> > >
> > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > >   and leaves it in the tree on failure, leaking the allocation and
> > >   presenting a phantom share to a later unshare.
> > >
> > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > >   on failure the host loses its record while EL2 still owns the
> > >   share, breaking later operations on the same pfn.
> > >
> > > Severity is low (no isolation impact) and the failure paths are rare
> > > in practice, but the desync is real. Both patches are independent and
> > > apply cleanly to current mainline. In other words, this can wait for
> > > 7.2.
> >
> >
> > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> > as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> > So I haven't sent a v2.
> 
> At the very least we need to add a comment, otherwise, people like me
> and LLMs like Sashiko would stumble upon it.
> 
> That said, this fix adds no real overhead, makes the code clearer, and
> guards us against a future where that call might fail.
> Self-documenting in essence.
> 
> WDYT?

If a hypercall really cannot fail, why does it have a return value?

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  8:15     ` Marc Zyngier
@ 2026-05-29  8:20       ` Fuad Tabba
  2026-05-29  9:21         ` Vincent Donnefort
  2026-05-29  9:29         ` Marc Zyngier
  0 siblings, 2 replies; 12+ messages in thread
From: Fuad Tabba @ 2026-05-29  8:20 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Vincent Donnefort, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 29 May 2026 09:05:35 +0100,
> Fuad Tabba <tabba@google.com> wrote:
> >
> > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
> > >
> > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > > > Hi folks,
> > > >
> > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > review-prompts.
> > > >
> > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > EL2 refcount still held), but neither cleans up on failure:
> > > >
> > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > >   and leaves it in the tree on failure, leaking the allocation and
> > > >   presenting a phantom share to a later unshare.
> > > >
> > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > >   on failure the host loses its record while EL2 still owns the
> > > >   share, breaking later operations on the same pfn.
> > > >
> > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > in practice, but the desync is real. Both patches are independent and
> > > > apply cleanly to current mainline. In other words, this can wait for
> > > > 7.2.
> > >
> > >
> > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> > > as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> > > So I haven't sent a v2.
> >
> > At the very least we need to add a comment, otherwise, people like me
> > and LLMs like Sashiko would stumble upon it.
> >
> > That said, this fix adds no real overhead, makes the code clearer, and
> > guards us against a future where that call might fail.
> > Self-documenting in essence.
> >
> > WDYT?
>
> If a hypercall really cannot fail, why does it have a return value?

Good point. If we know it cannot fail, how about just `void`?

That said, Vincent's exact words are `very much unlikely`, which is not
the same as cannot fail :)

https://lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com/

/fuad

>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  8:20       ` Fuad Tabba
@ 2026-05-29  9:21         ` Vincent Donnefort
  2026-05-29  9:23           ` Fuad Tabba
  2026-05-29  9:29         ` Marc Zyngier
  1 sibling, 1 reply; 12+ messages in thread
From: Vincent Donnefort @ 2026-05-29  9:21 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, May 29, 2026 at 09:20:50AM +0100, Fuad Tabba wrote:
> On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 29 May 2026 09:05:35 +0100,
> > Fuad Tabba <tabba@google.com> wrote:
> > >
> > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
> > > >
> > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > > > > Hi folks,
> > > > >
> > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > review-prompts.
> > > > >
> > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > >
> > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > >   and leaves it in the tree on failure, leaking the allocation and
> > > > >   presenting a phantom share to a later unshare.
> > > > >
> > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > >   on failure the host loses its record while EL2 still owns the
> > > > >   share, breaking later operations on the same pfn.
> > > > >
> > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > in practice, but the desync is real. Both patches are independent and
> > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > 7.2.
> > > >
> > > >
> > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> > > > as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> > > > So I haven't sent a v2.
> > >
> > > At the very least we need to add a comment, otherwise, people like me
> > > and LLMs like Sashiko would stumble upon it.
> > >
> > > That said, this fix adds no real overhead, makes the code clearer, and
> > > guards us against a future where that call might fail.
> > > Self-documenting in essence.
> > >
> > > WDYT?
> >
> > If a hypercall really cannot fail, why does it have a return value?
> 
> Good point. If we know it cannot fail, how about just `void`?
> 
> That said, Vincent's exact words are: `very much unlikely`, not the
> same as cannot fail :)
> 
> https://lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com/

The error would happen only if the host tries to share/unshare a page in the
wrong state, which would only happen with a misbehaving host.

And Quentin's point was that this fix is incomplete anyway. To handle this error
properly, kvm_share_hyp()/kvm_unshare_hyp() would also need to roll things back,
and the callers of the unshare would have to leak the memory that couldn't be
unshared properly. That isn't the case today (although we do WARN_ON()).

> 
> /fuad
> 
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  9:21         ` Vincent Donnefort
@ 2026-05-29  9:23           ` Fuad Tabba
  2026-05-29 10:07             ` Vincent Donnefort
  0 siblings, 1 reply; 12+ messages in thread
From: Fuad Tabba @ 2026-05-29  9:23 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, 29 May 2026 at 10:21, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> On Fri, May 29, 2026 at 09:20:50AM +0100, Fuad Tabba wrote:
> > On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 29 May 2026 09:05:35 +0100,
> > > Fuad Tabba <tabba@google.com> wrote:
> > > >
> > > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
> > > > >
> > > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > > review-prompts.
> > > > > >
> > > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > > >
> > > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > > >   and leaves it in the tree on failure, leaking the allocation and
> > > > > >   presenting a phantom share to a later unshare.
> > > > > >
> > > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > > >   on failure the host loses its record while EL2 still owns the
> > > > > >   share, breaking later operations on the same pfn.
> > > > > >
> > > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > > in practice, but the desync is real. Both patches are independent and
> > > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > > 7.2.
> > > > >
> > > > >
> > > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> > > > > as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> > > > > So I haven't sent a v2.
> > > >
> > > > At the very least we need to add a comment, otherwise, people like me
> > > > and LLMs like Sashiko would stumble upon it.
> > > >
> > > > That said, this fix adds no real overhead, makes the code clearer, and
> > > > guards us against a future where that call might fail.
> > > > Self-documenting in essence.
> > > >
> > > > WDYT?
> > >
> > > If a hypercall really cannot fail, why does it have a return value?
> >
> > Good point. If we know it cannot fail, how about just `void`?
> >
> > That said, Vincent's exact words are: `very much unlikely`, not the
> > same as cannot fail :)
> >
> > https://lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com/
>
> The error would happen only if the host tries to share/unshare a page with the
> wrong state. This would only happen in the case of a misbehaving host.
>
> And Quentin's point was that this is anyway incomplete. To handle this error
> properly, kvm_share_hyp/kvm_unshare_hyp would also need to rollback things...
> The callers of the unshare should also leak the memory which couldn't be
> unshared properly. This isn't the case now, (however we do WARN_ON).

If we WARN_ON() in hyp, then I argue we shouldn't have a return value.
Or at least add a comment, or a BUG_ON(), here. Think of the poor LLMs and
the people who run them :)

/fuad

>
> >
> > /fuad
> >
> > >
> > >         M.
> > >
> > > --
> > > Without deviation from the norm, progress is not possible.



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  8:20       ` Fuad Tabba
  2026-05-29  9:21         ` Vincent Donnefort
@ 2026-05-29  9:29         ` Marc Zyngier
  2026-05-29 10:06           ` Vincent Donnefort
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2026-05-29  9:29 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Vincent Donnefort, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, 29 May 2026 09:20:50 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 29 May 2026 09:05:35 +0100,
> > Fuad Tabba <tabba@google.com> wrote:
> > >
> > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
> > > >
> > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > > > > Hi folks,
> > > > >
> > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > review-prompts.
> > > > >
> > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > >
> > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > >   and leaves it in the tree on failure, leaking the allocation and
> > > > >   presenting a phantom share to a later unshare.
> > > > >
> > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > >   on failure the host loses its record while EL2 still owns the
> > > > >   share, breaking later operations on the same pfn.
> > > > >
> > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > in practice, but the desync is real. Both patches are independent and
> > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > 7.2.
> > > >
> > > >
> > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> > > > as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> > > > So I haven't sent a v2.
> > >
> > > At the very least we need to add a comment, otherwise, people like me
> > > and LLMs like Sashiko would stumble upon it.
> > >
> > > That said, this fix adds no real overhead, makes the code clearer, and
> > > guards us against a future where that call might fail.
> > > Self-documenting in essence.
> > >
> > > WDYT?
> >
> > If a hypercall really cannot fail, why does it have a return value?
> 
> Good point. If we know it cannot fail, how about just `void`?
> 
> That said, Vincent's exact words are: `very much unlikely`, not the
> same as cannot fail :)
> 
> https://lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com/

I think the rules are simple:

- if something can fail, we need to handle the failure

- if something should not fail and has the potential of compromising
  the system, we should panic

- if something absolutely cannot fail, then there is nothing to handle

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
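As an illustration only, the three rules listed above map onto three function shapes. The sketch below is generic userspace C, not KVM code: MODEL_BUG_ON() is a hypothetical stand-in for the kernel's BUG_ON() that calls abort(), and model_reserve(), model_release() and model_reset() are invented names operating on a simple slot counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's BUG_ON(): report and stop. */
#define MODEL_BUG_ON(cond)					\
	do {							\
		if (cond) {					\
			fprintf(stderr, "BUG: %s\n", #cond);	\
			abort();				\
		}						\
	} while (0)

/* Rule 1: can fail, so return an error and let the caller handle it. */
static int model_reserve(int *free_slots)
{
	if (*free_slots == 0)
		return -ENOSPC;
	(*free_slots)--;
	return 0;
}

/* Rule 2: should not fail, and carrying on would corrupt state: stop. */
static void model_release(int *free_slots, int capacity)
{
	MODEL_BUG_ON(*free_slots >= capacity);
	(*free_slots)++;
}

/* Rule 3: cannot fail, so there is no return value to check. */
static void model_reset(int *free_slots, int capacity)
{
	*free_slots = capacity;
}
```

The point of the split is that the return type tells the caller which rule applies: an int must be checked, while a void function either cannot fail or stops the system itself.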



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  9:29         ` Marc Zyngier
@ 2026-05-29 10:06           ` Vincent Donnefort
  0 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2026-05-29 10:06 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Fuad Tabba, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, May 29, 2026 at 10:29:40AM +0100, Marc Zyngier wrote:
> On Fri, 29 May 2026 09:20:50 +0100,
> Fuad Tabba <tabba@google.com> wrote:
> > 
> > On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 29 May 2026 09:05:35 +0100,
> > > Fuad Tabba <tabba@google.com> wrote:
> > > >
> > > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
> > > > >
> > > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > > review-prompts.
> > > > > >
> > > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > > >
> > > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > > >   and leaves it in the tree on failure, leaking the allocation and
> > > > > >   presenting a phantom share to a later unshare.
> > > > > >
> > > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > > >   on failure the host loses its record while EL2 still owns the
> > > > > >   share, breaking later operations on the same pfn.
> > > > > >
> > > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > > in practice, but the desync is real. Both patches are independent and
> > > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > > 7.2.
> > > > >
> > > > >
> > > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> > > > > as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> > > > > So I haven't sent a v2.
> > > >
> > > > At the very least we need to add a comment, otherwise, people like me
> > > > and LLMs like Sashiko would stumble upon it.
> > > >
> > > > That said, this fix adds no real overhead, makes the code clearer, and
> > > > guards us against a future where that call might fail.
> > > > Self-documenting in essence.
> > > >
> > > > WDYT?
> > >
> > > If a hypercall really cannot fail, why does it have a return value?
> > 
> > Good point. If we know it cannot fail, how about just `void`?
> > 
> > That said, Vincent's exact words are: `very much unlikely`, not the
> > same as cannot fail :)
> > 
> > https://lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com/
> 
> I think the rules are simple:
> 
> - if something can fail, we need to handle the failure

Looking at kvm_share_hyp(), it should then roll back the pages it has already
shared. I think that is fine.

> 
> - if something should not fail and has the potential of compromising
>   the system, we should panic

Then kvm_unshare_hyp(), being void, should BUG_ON(unshare_pfn_hyp(pfn));

> 
> - if something absolutely cannot fail, then there is nothing to handle
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
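The two suggestions in the reply above (roll back in the share path, stop hard in the unshare path) could look roughly like this in a userspace model. It is a sketch under assumptions, not the kernel's kvm_share_hyp()/kvm_unshare_hyp(): a fixed array of counters stands in for the tracking tree, assert() stands in for BUG_ON(), and model_share_one(), model_unshare_one(), model_share_range() and failing_pfn are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_NR_PFNS 16

static int share_count[MODEL_NR_PFNS];		/* per-pfn share refcount */
static uint64_t failing_pfn = UINT64_MAX;	/* knob: pfn whose share fails */

/* Can fail: returns an error that the caller has to handle. */
static int model_share_one(uint64_t pfn)
{
	if (pfn >= MODEL_NR_PFNS || pfn == failing_pfn)
		return -EPERM;
	share_count[pfn]++;
	return 0;
}

/* Should not fail: assert() stands in for BUG_ON() in this model. */
static void model_unshare_one(uint64_t pfn)
{
	assert(pfn < MODEL_NR_PFNS && share_count[pfn] > 0);
	share_count[pfn]--;
}

/* Share [start, end); on failure, undo the pfns shared so far. */
static int model_share_range(uint64_t start, uint64_t end)
{
	uint64_t cur;
	int ret;

	for (cur = start; cur < end; cur++) {
		ret = model_share_one(cur);
		if (ret)
			goto rollback;
	}
	return 0;

rollback:
	/* cur is the pfn that failed; everything below it was shared. */
	while (cur > start)
		model_unshare_one(--cur);
	return ret;
}
```

After a failed model_share_range() call, every counter in the range is back to its previous value, so the caller sees an all-or-nothing operation.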



* Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
  2026-05-29  9:23           ` Fuad Tabba
@ 2026-05-29 10:07             ` Vincent Donnefort
  0 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2026-05-29 10:07 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Quentin Perret,
	linux-arm-kernel, kvmarm, linux-kernel

On Fri, May 29, 2026 at 10:23:22AM +0100, Fuad Tabba wrote:
> On Fri, 29 May 2026 at 10:21, Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > On Fri, May 29, 2026 at 09:20:50AM +0100, Fuad Tabba wrote:
> > > On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Fri, 29 May 2026 09:05:35 +0100,
> > > > Fuad Tabba <tabba@google.com> wrote:
> > > > >
> > > > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@google.com> wrote:
> > > > > >
> > > > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@google.com wrote:
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > > > review-prompts.
> > > > > > >
> > > > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > > > >
> > > > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > > > >   and leaves it in the tree on failure, leaking the allocation and
> > > > > > >   presenting a phantom share to a later unshare.
> > > > > > >
> > > > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > > > >   on failure the host loses its record while EL2 still owns the
> > > > > > >   share, breaking later operations on the same pfn.
> > > > > > >
> > > > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > > > in practice, but the desync is real. Both patches are independent and
> > > > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > > > 7.2.
> > > > > >
> > > > > >
> > > > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com but
> > > > > > as Quentin pointed-out, there's absolutely no reason for the hypercall to fail.
> > > > > > So I haven't sent a v2.
> > > > >
> > > > > At the very least we need to add a comment, otherwise, people like me
> > > > > and LLMs like Sashiko would stumble upon it.
> > > > >
> > > > > That said, this fix adds no real overhead, makes the code clearer, and
> > > > > guards us against a future where that call might fail.
> > > > > Self-documenting in essence.
> > > > >
> > > > > WDYT?
> > > >
> > > > If a hypercall really cannot fail, why does it have a return value?
> > >
> > > Good point. If we know it cannot fail, how about just `void`?
> > >
> > > That said, Vincent's exact words are: `very much unlikely`, not the
> > > same as cannot fail :)
> > >
> > > https://lore.kernel.org/all/acyKhZL2di_QQ9xm@google.com/
> >
> > The error would happen only if the host tries to share/unshare a page with the
> > wrong state. This would only happen in the case of a misbehaving host.
> >
> > And Quentin's point was that this is anyway incomplete. To handle this error
> > properly, kvm_share_hyp/kvm_unshare_hyp would also need to rollback things...
> > The callers of the unshare should also leak the memory which couldn't be
> > unshared properly. This isn't the case now, (however we do WARN_ON).
> 
> If we WARN_ON() in hyp, then I argue we shouldn't have a return value.

I meant the WARN_ON() in the host's kvm_unshare_hyp().

> Or at least add a comment, BUG_ON() here. Think of the poor LLMs and
> the people who run them :)
> 
> /fuad
> 
> >
> > >
> > > /fuad
> > >
> > > >
> > > >         M.
> > > >
> > > > --
> > > > Without deviation from the norm, progress is not possible.


