* [PATCH 09/10] mshv: Add missing vp_index bounds check in intercept ISR
From: Stanislav Kinsburskii @ 2026-04-29 18:18 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
mshv_intercept_isr() reads vp_index from the intercept message payload
and uses it directly to index into partition->pt_vp_array without
validating it against MSHV_MAX_VPS. A malformed or corrupted hypervisor
message with a vp_index beyond the array bounds would cause an
out-of-bounds memory access in interrupt context, likely crashing the
host.
Both handle_bitset_message() and handle_pair_message() already validate
vp_index before use. Add the same check to mshv_intercept_isr() for
consistency.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_synic.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index 43f1bcbbf2d34..5bceb81229817 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -384,6 +384,10 @@ mshv_intercept_isr(struct hv_message *msg)
*/
vp_index =
((struct hv_opaque_intercept_message *)msg->u.payload)->vp_index;
+ if (unlikely(vp_index >= MSHV_MAX_VPS)) {
+ pr_debug("VP index %u out of bounds\n", vp_index);
+ goto unlock_out;
+ }
vp = partition->pt_vp_array[vp_index];
if (unlikely(!vp)) {
pr_debug("failed to find VP %u\n", vp_index);
^ permalink raw reply related
* [PATCH 08/10] mshv: Use kfree_rcu in mshv_portid_free
From: Stanislav Kinsburskii @ 2026-04-29 18:18 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
mshv_portid_free() uses synchronize_rcu() followed by kfree() to
reclaim port table entries. This blocks the caller until a full RCU
grace period elapses, which is unnecessary since the same module already
uses the non-blocking kfree_rcu() pattern in mshv_port_table_fini().
Replace with kfree_rcu() to avoid the blocking wait and keep the
reclamation strategy consistent across the file.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_portid_table.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/hv/mshv_portid_table.c b/drivers/hv/mshv_portid_table.c
index f1aaef69eb9b7..7ffe7a1c1e518 100644
--- a/drivers/hv/mshv_portid_table.c
+++ b/drivers/hv/mshv_portid_table.c
@@ -60,8 +60,7 @@ mshv_portid_free(int port_id)
WARN_ON(!info);
idr_unlock(&port_table_idr);
- synchronize_rcu();
- kfree(info);
+ kfree_rcu(info, portbl_rcu);
}
int
^ permalink raw reply related
* [PATCH 07/10] mshv: Fix use-after-RCU in mshv_portid_lookup
From: Stanislav Kinsburskii @ 2026-04-29 18:18 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
mshv_portid_lookup() drops the RCU read lock before copying the
port_table_info struct found by idr_find(). If mshv_portid_free() runs
concurrently on another CPU, it can remove the entry and free it (via
synchronize_rcu + kfree) before the copy at line *info = *_info
completes — resulting in a use-after-free.
Move rcu_read_unlock() after the struct copy so the object remains
protected for the entire duration of the read-side access.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_portid_table.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/hv/mshv_portid_table.c b/drivers/hv/mshv_portid_table.c
index c349af1f0aaac..f1aaef69eb9b7 100644
--- a/drivers/hv/mshv_portid_table.c
+++ b/drivers/hv/mshv_portid_table.c
@@ -72,12 +72,11 @@ mshv_portid_lookup(int port_id, struct port_table_info *info)
rcu_read_lock();
_info = idr_find(&port_table_idr, port_id);
- rcu_read_unlock();
-
if (_info) {
*info = *_info;
ret = 0;
}
+ rcu_read_unlock();
return ret;
}
^ permalink raw reply related
* [PATCH 06/10] mshv: Fix duplicate GSI detection for GSI 0
From: Stanislav Kinsburskii @ 2026-04-29 18:18 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
The duplicate routing entry check in mshv_update_routing_table() uses
guest_irq_num != 0 to detect whether a GSI slot is already occupied.
This fails for GSI 0 because its guest_irq_num is 0 both when the slot
is unused (zero-initialized) and when legitimately assigned. As a
result, duplicate entries for GSI 0 are silently accepted, with the
second entry overwriting the first — corrupting the routing table
without any error reported to userspace.
While GSI 0 (legacy timer) is unlikely to appear in MSI-based routing
in practice, the check is semantically wrong — it conflates
"uninitialized" with "GSI number 0." Use girq_entry_valid instead,
which is explicitly set to true when an entry is populated and remains
zero for unused slots regardless of the GSI number.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_irq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/mshv_irq.c b/drivers/hv/mshv_irq.c
index b3142c84dcbc2..65a4ffc82d566 100644
--- a/drivers/hv/mshv_irq.c
+++ b/drivers/hv/mshv_irq.c
@@ -51,7 +51,7 @@ int mshv_update_routing_table(struct mshv_partition *partition,
/*
* Allow only one to one mapping between GSI and MSI routing.
*/
- if (girq->guest_irq_num != 0) {
+ if (girq->girq_entry_valid) {
r = -EINVAL;
goto out;
}
^ permalink raw reply related
* [PATCH 05/10] mshv: Fix level-triggered check on uninitialized data
From: Stanislav Kinsburskii @ 2026-04-29 18:18 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
In mshv_irqfd_assign(), the level-triggered validation for resample
irqfds checks irqfd_lapic_irq.lapic_control.level_triggered before
mshv_irqfd_update() has populated the field. Since the irqfd struct is
zero-allocated, level_triggered is always 0 at that point, causing the
check to always reject resample irqfds with -EINVAL. This makes
level-triggered interrupt resampling — used to avoid interrupt storms
with assigned devices — completely non-functional.
Move the check after the mshv_irqfd_update() call, which resolves the
IRQ routing entry and populates irqfd_lapic_irq with the actual trigger
mode.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_eventfd.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index d9491a14f30f1..fd594acce3235 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -478,6 +478,19 @@ static int mshv_irqfd_assign(struct mshv_partition *pt,
init_poll_funcptr(&irqfd->irqfd_polltbl, mshv_irqfd_queue_proc);
spin_lock_irq(&pt->pt_irqfds_lock);
+ ret = 0;
+ hlist_for_each_entry(tmp, &pt->pt_irqfds_list, irqfd_hnode) {
+ if (irqfd->irqfd_eventfd_ctx != tmp->irqfd_eventfd_ctx)
+ continue;
+ /* This fd is used for another irq already. */
+ ret = -EBUSY;
+ spin_unlock_irq(&pt->pt_irqfds_lock);
+ goto fail;
+ }
+
+ idx = srcu_read_lock(&pt->pt_irq_srcu);
+ mshv_irqfd_update(pt, irqfd);
+
#if IS_ENABLED(CONFIG_X86)
if (args->flags & BIT(MSHV_IRQFD_BIT_RESAMPLE) &&
!irqfd->irqfd_lapic_irq.lapic_control.level_triggered) {
@@ -486,22 +499,12 @@ static int mshv_irqfd_assign(struct mshv_partition *pt,
* Otherwise return with failure
*/
spin_unlock_irq(&pt->pt_irqfds_lock);
+ srcu_read_unlock(&pt->pt_irq_srcu, idx);
ret = -EINVAL;
goto fail;
}
#endif
- ret = 0;
- hlist_for_each_entry(tmp, &pt->pt_irqfds_list, irqfd_hnode) {
- if (irqfd->irqfd_eventfd_ctx != tmp->irqfd_eventfd_ctx)
- continue;
- /* This fd is used for another irq already. */
- ret = -EBUSY;
- spin_unlock_irq(&pt->pt_irqfds_lock);
- goto fail;
- }
- idx = srcu_read_lock(&pt->pt_irq_srcu);
- mshv_irqfd_update(pt, irqfd);
hlist_add_head(&irqfd->irqfd_hnode, &pt->pt_irqfds_list);
spin_unlock_irq(&pt->pt_irqfds_lock);
^ permalink raw reply related
* [PATCH 04/10] mshv: Fix broken seqcount read protection
From: Stanislav Kinsburskii @ 2026-04-29 18:17 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
mshv_irqfd_update() writes both irqfd_girq_ent and irqfd_lapic_irq as a
logical unit under seqcount write protection. Readers must snapshot these
fields inside the seqcount begin/retry loop to obtain a consistent
point-in-time view — otherwise a concurrent update can produce a torn
read where one field comes from the old state and the other from the new.
Both mshv_assert_irq_slow() and mshv_irqfd_wakeup() get this wrong: the
seqcount loop bodies are empty (just spinning until a stable sequence is
observed), and all reads of the protected fields happen after the loop
with no protection from concurrent writes. If mshv_irqfd_update() races
with interrupt assertion, the caller may use a stale or mixed
vector/apic_id/control combination — delivering an interrupt to the
wrong vCPU, with the wrong vector, or with the wrong trigger mode. This
can cause spurious or lost interrupts in the guest, or a stuck interrupt
line in the level-triggered case.
Fix mshv_assert_irq_slow() by snapshotting both irqfd_girq_ent and
irqfd_lapic_irq into local variables inside the seqcount loop, then
using those locals for the validity check and the hypercall.
Fix mshv_irqfd_wakeup() by snapshotting irqfd_lapic_irq inside its
seqcount loop and passing the snapshot to mshv_try_assert_irq_fast(),
so the fast path operates on the consistent copy rather than reading
the field directly outside seqcount protection.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_eventfd.c | 47 +++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 21 deletions(-)
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 704c229ee3b19..d9491a14f30f1 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -151,10 +151,10 @@ static int mshv_vp_irq_set_vector(struct mshv_vp *vp, u32 vector)
* Try to raise irq for guest via shared vector array. hyp does the actual
* inject of the interrupt.
*/
-static int mshv_try_assert_irq_fast(struct mshv_irqfd *irqfd)
+static int mshv_try_assert_irq_fast(struct mshv_irqfd *irqfd,
+ const struct mshv_lapic_irq *irq)
{
struct mshv_partition *partition = irqfd->irqfd_partn;
- struct mshv_lapic_irq *irq = &irqfd->irqfd_lapic_irq;
struct mshv_vp *vp;
if (!(ms_hyperv.ext_features &
@@ -186,7 +186,8 @@ static int mshv_try_assert_irq_fast(struct mshv_irqfd *irqfd)
return 0;
}
#else /* CONFIG_X86_64 */
-static int mshv_try_assert_irq_fast(struct mshv_irqfd *irqfd)
+static int mshv_try_assert_irq_fast(struct mshv_irqfd *irqfd,
+ const struct mshv_lapic_irq *irq)
{
return -EOPNOTSUPP;
}
@@ -195,30 +196,32 @@ static int mshv_try_assert_irq_fast(struct mshv_irqfd *irqfd)
static void mshv_assert_irq_slow(struct mshv_irqfd *irqfd)
{
struct mshv_partition *partition = irqfd->irqfd_partn;
- struct mshv_lapic_irq *irq = &irqfd->irqfd_lapic_irq;
+ struct mshv_guest_irq_ent girq_ent;
+ struct mshv_lapic_irq irq;
unsigned int seq;
int idx;
-#if IS_ENABLED(CONFIG_X86)
- WARN_ON(irqfd->irqfd_resampler &&
- !irq->lapic_control.level_triggered);
-#endif
-
idx = srcu_read_lock(&partition->pt_irq_srcu);
- if (irqfd->irqfd_girq_ent.guest_irq_num) {
- if (!irqfd->irqfd_girq_ent.girq_entry_valid) {
- srcu_read_unlock(&partition->pt_irq_srcu, idx);
- return;
- }
- do {
- seq = read_seqcount_begin(&irqfd->irqfd_irqe_sc);
- } while (read_seqcount_retry(&irqfd->irqfd_irqe_sc, seq));
+ do {
+ seq = read_seqcount_begin(&irqfd->irqfd_irqe_sc);
+ girq_ent = irqfd->irqfd_girq_ent;
+ irq = irqfd->irqfd_lapic_irq;
+ } while (read_seqcount_retry(&irqfd->irqfd_irqe_sc, seq));
+
+ if (girq_ent.guest_irq_num && !girq_ent.girq_entry_valid) {
+ srcu_read_unlock(&partition->pt_irq_srcu, idx);
+ return;
}
- hv_call_assert_virtual_interrupt(irqfd->irqfd_partn->pt_id,
- irq->lapic_vector, irq->lapic_apic_id,
- irq->lapic_control);
+#if IS_ENABLED(CONFIG_X86)
+ WARN_ON(irqfd->irqfd_resampler &&
+ !irq.lapic_control.level_triggered);
+#endif
+
+ hv_call_assert_virtual_interrupt(partition->pt_id,
+ irq.lapic_vector, irq.lapic_apic_id,
+ irq.lapic_control);
srcu_read_unlock(&partition->pt_irq_srcu, idx);
}
@@ -304,16 +307,18 @@ static int mshv_irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
int ret = 0;
if (flags & EPOLLIN) {
+ struct mshv_lapic_irq irq;
u64 cnt;
eventfd_ctx_do_read(irqfd->irqfd_eventfd_ctx, &cnt);
idx = srcu_read_lock(&pt->pt_irq_srcu);
do {
seq = read_seqcount_begin(&irqfd->irqfd_irqe_sc);
+ irq = irqfd->irqfd_lapic_irq;
} while (read_seqcount_retry(&irqfd->irqfd_irqe_sc, seq));
/* An event has been signaled, raise an interrupt */
- ret = mshv_try_assert_irq_fast(irqfd);
+ ret = mshv_try_assert_irq_fast(irqfd, &irq);
if (ret)
mshv_assert_irq_slow(irqfd);
^ permalink raw reply related
* [PATCH 03/10] mshv: Fix missing lock in mshv_irqfd_deassign
From: Stanislav Kinsburskii @ 2026-04-29 18:17 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
mshv_irqfd_deactivate() and the hlist traversal of pt_irqfds_list
require pt->pt_irqfds_lock to be held, but mshv_irqfd_deassign()
omits it. This races with the EPOLLHUP path in mshv_irqfd_wakeup(),
which does take the lock before calling mshv_irqfd_deactivate().
Add the missing spin_lock_irq/spin_unlock_irq around the list
traversal, matching the pattern in mshv_irqfd_release().
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_eventfd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 90959f639dc32..704c229ee3b19 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -541,13 +541,14 @@ static int mshv_irqfd_deassign(struct mshv_partition *pt,
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);
+ spin_lock_irq(&pt->pt_irqfds_lock);
hlist_for_each_entry_safe(irqfd, n, &pt->pt_irqfds_list,
irqfd_hnode) {
if (irqfd->irqfd_eventfd_ctx == eventfd &&
irqfd->irqfd_irqnum == args->gsi)
-
mshv_irqfd_deactivate(irqfd);
}
+ spin_unlock_irq(&pt->pt_irqfds_lock);
eventfd_ctx_put(eventfd);
^ permalink raw reply related
* [PATCH 02/10] mshv: Fix potential integer overflow in mshv_region_create
From: Stanislav Kinsburskii @ 2026-04-29 18:17 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
The allocation size is computed as:
sizeof(*region) + sizeof(struct page *) * nr_pages
where nr_pages is a u64 originating from userspace. A sufficiently
large nr_pages can overflow the multiplication, resulting in a small
allocation followed by out-of-bounds writes when populating mreg_pages.
Use struct_size() which returns SIZE_MAX on overflow, causing vzalloc
to safely return NULL — caught by the existing error check.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_regions.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index fdffd4f002f6f..1d04a97980b8b 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -177,7 +177,7 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
{
struct mshv_mem_region *region;
- region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
+ region = vzalloc(struct_size(region, mreg_pages, nr_pages));
if (!region)
return ERR_PTR(-ENOMEM);
^ permalink raw reply related
* [PATCH 01/10] mshv: Fix IRQ leak and type hazards in hv_call_modify_spa_host_access
From: Stanislav Kinsburskii @ 2026-04-29 18:17 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177748522635.144491.1565666089881726479.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
The bounds check inside the PFN-filling loop can return -EINVAL while
interrupts are disabled via local_irq_save(), leaking IRQ state.
Remove the check — it is redundant because the loop invariant
(done + i < page_count == page_struct_count >> large_shift) guarantees
(done + i) << large_shift < page_struct_count always holds.
While here, fix type mismatches: change 'int done' to 'u64 done' and
use u64 for loop and batch-size variables so they match the u64
page_count they are compared against.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_root_hv_call.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index f8c2341193da5..61871ad131b4b 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -1041,7 +1041,7 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
{
struct hv_input_modify_sparse_spa_page_host_access *input_page;
u64 status;
- int done = 0;
+ u64 done = 0;
unsigned long irq_flags, large_shift = 0;
u64 page_count = page_struct_count;
u16 code = acquire ? HVCALL_ACQUIRE_SPARSE_SPA_PAGE_HOST_ACCESS :
@@ -1058,9 +1058,9 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
}
while (done < page_count) {
- ulong i, completed, remain = page_count - done;
- int rep_count = min(remain,
- HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT);
+ u64 i, completed, remain = page_count - done;
+ u64 rep_count = min_t(u64, remain,
+ HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT);
local_irq_save(irq_flags);
input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -1074,15 +1074,9 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
input_page->flags = flags;
input_page->host_access = host_access;
- for (i = 0; i < rep_count; i++) {
- u64 index = (done + i) << large_shift;
-
- if (index >= page_struct_count)
- return -EINVAL;
-
+ for (i = 0; i < rep_count; i++)
input_page->spa_page_list[i] =
- page_to_pfn(pages[index]);
- }
+ page_to_pfn(pages[(done + i) << large_shift]);
status = hv_do_rep_hypercall(code, rep_count, 0, input_page,
NULL);
^ permalink raw reply related
* [PATCH 00/10] mshv: Bug fixes across the mshv_root module
From: Stanislav Kinsburskii @ 2026-04-29 18:17 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
This series addresses bugs found during a review of the mshv_root module
introduced by commit 621191d709b14 ("Drivers: hv: Introduce mshv_root
module to expose /dev/mshv to VMMs").
The fixes range from data corruption and use-after-free to silent
functional failures:
- IRQ state leak and type truncation in hypercall helpers
(hv_call_modify_spa_host_access)
- Integer overflow on userspace-controlled allocation size
(mshv_region_create)
- Missing locking, broken seqcount read protection, and a check on
uninitialized data in the irqfd path — the latter makes
level-triggered interrupt resampling completely non-functional
- Duplicate GSI 0 detection using the wrong predicate
- Use-after-RCU in port ID lookup
- Missing VP index bounds check in intercept ISR (OOB in interrupt
context)
- Missing error code on VP allocation failure (silent success to
userspace)
---
Stanislav Kinsburskii (10):
mshv: Fix IRQ leak and type hazards in hv_call_modify_spa_host_access
mshv: Fix potential integer overflow in mshv_region_create
mshv: Fix missing lock in mshv_irqfd_deassign
mshv: Fix broken seqcount read protection
mshv: Fix level-triggered check on uninitialized data
mshv: Fix duplicate GSI detection for GSI 0
mshv: Fix use-after-RCU in mshv_portid_lookup
mshv: Use kfree_rcu in mshv_portid_free
mshv: Add missing vp_index bounds check in intercept ISR
mshv: Fix missing error code on VP allocation failure
drivers/hv/mshv_eventfd.c | 75 ++++++++++++++++++++++------------------
drivers/hv/mshv_irq.c | 2 +
drivers/hv/mshv_portid_table.c | 6 +--
drivers/hv/mshv_regions.c | 2 +
drivers/hv/mshv_root_hv_call.c | 18 +++-------
drivers/hv/mshv_root_main.c | 4 ++
drivers/hv/mshv_synic.c | 4 ++
7 files changed, 59 insertions(+), 52 deletions(-)
^ permalink raw reply
* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
From: Michael Kelley @ 2026-04-29 18:01 UTC (permalink / raw)
To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
matthew.ruffell@canonical.com, kjlx@templeofstupid.com
Cc: Krister Johansen, stable@vger.kernel.org
In-Reply-To: <SA1PR21MB69213486F821CA5A2C793C81BF342@SA1PR21MB6921.namprd21.prod.outlook.com>
From: Dexuan Cui <DECUI@microsoft.com> Sent: Tuesday, April 28, 2026 6:58 PM
> > From: Michael Kelley <mhklinux@outlook.com> Sent: Thursday, April 23, 2026 10:40 AM
[snip]
>
> > Question about Gen 1 VMs: If the Linux frame buffer driver moves
> > the frame buffer somewhere other than the default location, and
> > then the VM does a kexec/kdump, what does the legacy PCI graphic
> > device BAR report as the frame buffer location? Does it *always*
> > report 4G-128MB, or does it report the new location? I can run
>
> It always reports 4G-128MB.
OK, good to know. I was hoping it might report the new location. :-(
> BTW, I suspect a Gen2 VM may have the same issue, i.e.
> currently we only reserve 8MB below 4GB; if hyperv_drm uses
> high MMIO, I suspect the UEFI firmware would still report the
> same original low MMIO framebuffer base/size to the kdump kernel,
> but there is no easy way to verify this for Gen2 VMs...
>
[snip]
>
> However, when the kdump kernel starts to run, and I print the
> pci_resource_start(pdev, 0) and pci_resource_len(pdev, 0)
> from vmbus_reserve_fb(), I still see 4G-128MB:
> [ 12.506159] Gen1 VM: start=0xf8000000, size=0x4000000
>
> In this case, we can't really fix the MMIO conflict, e.g.
> if both hv_pci and hyperv_drm are built as modules, then
> the order of loading them can be nondeterministic:if the order
> in the first kernel is different from the order in
> the kdump kernel, we run into trouble.
Yep.
>
> If the order is deterministic (e.g. hv_pci is
> built-in, and hyperv_drm is built as a module),
> we should be good since both allocates MMIO from
> the high MMIO range in a deterministic way.
>
Yep.
Thanks,
Michael
^ permalink raw reply
* RE: [PATCH] Drivers: hv: vmbus: Improve the logc of reserving fb_mmio on Gen2 VMs
From: Michael Kelley @ 2026-04-29 18:01 UTC (permalink / raw)
To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
wei.liu@kernel.org, Long Li, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, matthew.ruffell@canonical.com,
johansen@templeofstupid.com
Cc: stable@vger.kernel.org
In-Reply-To: <SA1PR21MB69214DC322549834104D26E0BF342@SA1PR21MB6921.namprd21.prod.outlook.com>
From: Dexuan Cui <DECUI@microsoft.com> Sent: Tuesday, April 28, 2026 8:13 PM
> > From: Michael Kelley <mhklinux@outlook.com> Sent: Thursday, April 23, 2026 10:40 AM
[snip]
> > > + /* Hyper-V CoCo guests do not have a framebuffer device. */
> > > + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> > > + return;
> >
> > This test is testing feature "A" (mem encryption) in order to determine
> > the presence of feature "B" (no framebuffer), because current
> > configurations happen to always have "A" and "B" at the same time. But
> > the linkage between the features is tenuous, and if configurations should
> > change in the future, testing this way could be bogus. It works now, but I'm
> > leery of depending on the linkage between "A" and "B".
> >
> > You could set up a "can_have_framebuffer" flag in ms_hyperv_init_platform()
> > if running in a CVM, and test that flag here. But I'd suggest just dropping
> > this optimization. CVMs are always Gen2 (and that's not going to change),
> > so they have plenty of low mmio space.
>
> This is not true on a lab host, e.g. I have a TDX VM on a lab host created
> by these 2 commands (without the 2nd command, Hyper-V won't allow
> the TDX VM to start):
>
> New-VM -Generation 2 -GuestStateIsolationType Tdx -Name $vmName
> Disable-VMConsoleSupport -VMName $vmName
>
> The low_mmio_base is still 4GB-128MB. In this case, it's not a good idea
> to try to reserve the 128MB:
>
> 1) the available low MMIO size is smaller than 128MB due to the vTPM
> MMIO range.
>
> 2) even if we can reserve the 109.25 low mmio range
> [0xf8000000-0xfed3ffff], we may not want to do that, just in case
> some assigned PCI device has 32-bit BARs.
>
> So, IMO we need to keep the check:
> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> + return;
>
> BTW, I think this may be a slightly better check here:
> + if (hv_is_isolation_supported())
> + return;
Agreed. Using hv_is_isolation_supported() seems better than
cc_platform_has() for this purpose.
>
> A CVM on Hyper-V won't start without the command line
> Disable-VMConsoleSupport -VMName $vmName
Unfortunately, on my laptop Hyper-V, a VM with VBS Isolation appears
to *not* require Disable-VMConsoleSupport. I can start the VM, and the
VM is offered the VMBus synthvid, mouse, and keyboard devices.
But what's weird in this case is that vmbus_reserved_fb() sees lfb_base
and lfb_start as 0. Furthermore, as a test, I changed the "allowed_in_isolated"
flag to true for the synthvid device, and the Hyper-V DRM driver loads and
initializes. In doing so, the vmconnect.exe window is resized larger, as is
done in a normal VM. /proc/iomem shows that the DRM driver claimed
the expected MMIO range at the start of low MMIO space. I can run a user
space program that mmaps /dev/fb0 and writes pixels to the mmap'ed
memory, and that succeeds as it would in a normal VM, but the
vmconnect.exe window doesn't show anything. It appears that the Hyper-V
host has allocated memory for the frame buffer, but is ignoring anything
that is written to it.
Running Disable-VMConsoleSupport works as expected -- the synthvid,
mouse, and keyboard devices are no longer offered to the VM.
>
> IMO this is very unlikely to change in the future, because the Hyper-V
> synthetic framebuffer VMBus device is not a trusted device for a CVM,
> so there is no reason for Hyper-V to offer such a device to CVMs; even
> if the host offers it, currently the guest hv_vmbus driver ignores it.
>
In the case of VBS Isolation, if such a VM also had a PCI pass-thru device,
the core problem could recur. I.e., not reserving space for the framebuffer
could allow the PCI device to try to use MMIO space that Hyper-V has
set up for the frame buffer, causing the PCI device to fail. And that's a
worse problem than just having the graphics console not function. I
can't actually try the failure case because I don't have an assignable PCI
device on my laptop, but it seems likely based on the evidence that
Hyper-V is setting up a framebuffer device.
So instead of not reserving any MMIO space for the framebuffer on
CVMs, the code you already have limits the reservation to half of the
MMIO space below 4 GB. Won't that work to avoid exhausting the low
MMIO space in a CVM that's running on a local Hyper-V with only 128
MiB of low MMIO space?
> When we assign a physical PCI GPU device to a CVM, I'm not sure if there
> is any framebuffer from the GPU or not. Even if there is, that's a completely
> different scenario and not reserving some low MMIO for "framebuffer"
> is unrelated: I think hyperv_drm (or the deprecated hyperv_fb) is the only
> driver that sets the fb_overlap_ok parameter of vmbus_allocate_mmio().
>
> > And at the moment, CVMs don't
> > support PCI devices,
>
> This is not true: recently I created a "Standard DC16eds v6" TDX CVM
> on Azure, and I did see two NVMe local temporary disks in "nvme list"
> (here TDISP is not used). In 2023, we added the commit
> 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
> and I believe some users are running CVMs with GPUs.
Interesting! I worked on commit 2c6ba4216844, but had not noticed
that Azure now has offerings that makes use of it. I'll take a look at
that TDX VM size.
Thanks,
Michael
^ permalink raw reply
* [PATCH v2] mshv: Simplify GPA map/unmap hypercall helpers
From: Stanislav Kinsburskii @ 2026-04-29 16:48 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
Clean up hv_do_map_gpa_hcall() and hv_call_unmap_gpa_pages() after the
preceding bug-fix patches:
Move "done += completed" before the status checks so that pages mapped
by a partially-successful batch are included in the error cleanup unmap.
Previously these mappings were leaked on failure.
While here, improve type safety and readability:
- Change "int done" to "u64 done" to match the u64 page_count it is
compared against, avoiding signed/unsigned comparison hazards.
- Use u64 for loop iteration and batch size variables consistently.
- Add proper braces to the for-loop body in hv_do_map_gpa_hcall().
- Remove unnecessary "ret" variable from hv_call_unmap_gpa_pages().
- Simplify the error-path unmap to use "done << large_shift" directly
instead of mutating done in place.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_root_hv_call.c | 55 +++++++++++++++-------------------------
1 file changed, 20 insertions(+), 35 deletions(-)
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index e5992c324904a..1f19a4ca824f0 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -195,8 +195,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
struct hv_input_map_gpa_pages *input_page;
u64 status, *pfnlist;
unsigned long irq_flags, large_shift = 0;
- int ret = 0, done = 0;
- u64 page_count = page_struct_count;
+ u64 done = 0, page_count = page_struct_count;
+ int ret = 0;
if (page_count == 0 || (pages && mmio_spa))
return -EINVAL;
@@ -213,8 +213,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
}
while (done < page_count) {
- ulong i, completed, remain = page_count - done;
- int rep_count = min(remain, HV_MAP_GPA_BATCH_SIZE);
+ u64 i, completed, remain = page_count - done;
+ u64 rep_count = min_t(u64, remain, HV_MAP_GPA_BATCH_SIZE);
local_irq_save(irq_flags);
input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -224,23 +224,13 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
input_page->map_flags = flags;
pfnlist = input_page->source_gpa_page_list;
- for (i = 0; i < rep_count; i++)
- if (flags & HV_MAP_GPA_NO_ACCESS) {
+ for (i = 0; i < rep_count; i++) {
+ if (flags & HV_MAP_GPA_NO_ACCESS)
pfnlist[i] = 0;
- } else if (pages) {
- u64 index = (done + i) << large_shift;
-
- if (index >= page_struct_count) {
- ret = -EINVAL;
- break;
- }
- pfnlist[i] = page_to_pfn(pages[index]);
- } else {
+ else if (pages)
+ pfnlist[i] = page_to_pfn(pages[(done + i) << large_shift]);
+ else
pfnlist[i] = mmio_spa + done + i;
- }
- if (ret) {
- local_irq_restore(irq_flags);
- break;
}
status = hv_do_rep_hypercall(HVCALL_MAP_GPA_PAGES, rep_count, 0,
@@ -248,29 +238,26 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
local_irq_restore(irq_flags);
completed = hv_repcomp(status);
+ done += completed;
if (hv_result_needs_memory(status)) {
ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
HV_MAP_GPA_DEPOSIT_PAGES);
if (ret)
break;
-
} else if (!hv_result_success(status)) {
ret = hv_result_to_errno(status);
break;
}
-
- done += completed;
}
if (ret && done) {
u32 unmap_flags = 0;
- if (flags & HV_MAP_GPA_LARGE_PAGE) {
+ if (flags & HV_MAP_GPA_LARGE_PAGE)
unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
- done <<= large_shift;
- }
- hv_call_unmap_gpa_pages(partition_id, gfn, done, unmap_flags);
+ hv_call_unmap_gpa_pages(partition_id, gfn,
+ done << large_shift, unmap_flags);
}
return ret;
@@ -305,7 +292,7 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
struct hv_input_unmap_gpa_pages *input_page;
u64 status, page_count = page_count_4k;
unsigned long irq_flags, large_shift = 0;
- int ret = 0, done = 0;
+ u64 done = 0;
if (page_count == 0)
return -EINVAL;
@@ -319,8 +306,8 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
}
while (done < page_count) {
- ulong completed, remain = page_count - done;
- int rep_count = min(remain, HV_UMAP_GPA_PAGES);
+ u64 completed, remain = page_count - done;
+ u64 rep_count = min_t(u64, remain, HV_UMAP_GPA_PAGES);
local_irq_save(irq_flags);
input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -333,15 +320,13 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
local_irq_restore(irq_flags);
completed = hv_repcomp(status);
- if (!hv_result_success(status)) {
- ret = hv_result_to_errno(status);
- break;
- }
-
done += completed;
+
+ if (!hv_result_success(status))
+ return hv_result_to_errno(status);
}
- return ret;
+ return 0;
}
int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
^ permalink raw reply related
* Re: [PATCH] mshv: Simplify GPA map/unmap hypercall helpers
From: Stanislav Kinsburskii @ 2026-04-29 15:15 UTC (permalink / raw)
To: Anirudh Rayabharam
Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <20260429-orca-of-legal-symmetry-3c72bc@anirudhrb>
On Wed, Apr 29, 2026 at 11:02:37AM +0000, Anirudh Rayabharam wrote:
> On Tue, Apr 28, 2026 at 11:21:12PM +0000, Stanislav Kinsburskii wrote:
> > Clean up hv_do_map_gpa_hcall() and hv_call_unmap_gpa_pages() after the
> > preceding bug-fix patches:
> >
> > Move "done += completed" before the status checks so that pages mapped
> > by a partially-successful batch are included in the error cleanup unmap.
> > Previously these mappings were leaked on failure.
> >
> > While here, improve type safety and readability:
> > - Change "int done" to "u64 done" to match the u64 page_count it is
> > compared against, avoiding signed/unsigned comparison hazards.
> > - Use u64 for loop iteration and batch size variables consistently.
> > - Add proper braces to the for-loop body in hv_do_map_gpa_hcall().
> > - Remove unnecessary "ret" variable from hv_call_unmap_gpa_pages().
> > - Simplify the error-path unmap to use "done << large_shift" directly
> > instead of mutating done in place.
> >
> > Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> > drivers/hv/mshv_root_hv_call.c | 55 +++++++++++++++-------------------------
> > 1 file changed, 20 insertions(+), 35 deletions(-)
> >
> > diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > index e5992c324904a..f5f205a397834 100644
> > --- a/drivers/hv/mshv_root_hv_call.c
> > +++ b/drivers/hv/mshv_root_hv_call.c
> > @@ -195,8 +195,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> > struct hv_input_map_gpa_pages *input_page;
> > u64 status, *pfnlist;
> > unsigned long irq_flags, large_shift = 0;
> > - int ret = 0, done = 0;
> > - u64 page_count = page_struct_count;
> > + u64 done = 0, page_count = page_struct_count;
> > + int ret = 0;
> >
> > if (page_count == 0 || (pages && mmio_spa))
> > return -EINVAL;
> > @@ -213,8 +213,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> > }
> >
> > while (done < page_count) {
> > - ulong i, completed, remain = page_count - done;
> > - int rep_count = min(remain, HV_MAP_GPA_BATCH_SIZE);
> > + u64 i, completed, remain = page_count - done;
> > + u64 rep_count = min(remain, (u64)HV_MAP_GPA_BATCH_SIZE);
> >
> > local_irq_save(irq_flags);
> > input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > @@ -224,23 +224,13 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> > input_page->map_flags = flags;
> > pfnlist = input_page->source_gpa_page_list;
> >
> > - for (i = 0; i < rep_count; i++)
> > - if (flags & HV_MAP_GPA_NO_ACCESS) {
> > + for (i = 0; i < rep_count; i++) {
> > + if (flags & HV_MAP_GPA_NO_ACCESS)
> > pfnlist[i] = 0;
> > - } else if (pages) {
> > - u64 index = (done + i) << large_shift;
> > -
> > - if (index >= page_struct_count) {
> > - ret = -EINVAL;
> > - break;
> > - }
> > - pfnlist[i] = page_to_pfn(pages[index]);
> > - } else {
> > + else if (pages)
> > + pfnlist[i] = page_to_pfn(pages[(done + i) << large_shift]);
> > + else
> > pfnlist[i] = mmio_spa + done + i;
> > - }
> > - if (ret) {
> > - local_irq_restore(irq_flags);
> > - break;
> > }
> >
> > status = hv_do_rep_hypercall(HVCALL_MAP_GPA_PAGES, rep_count, 0,
> > @@ -248,29 +238,26 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> > local_irq_restore(irq_flags);
> >
> > completed = hv_repcomp(status);
> > + done += completed;
> >
> > if (hv_result_needs_memory(status)) {
> > ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
> > HV_MAP_GPA_DEPOSIT_PAGES);
> > if (ret)
> > break;
> > -
> > } else if (!hv_result_success(status)) {
> > ret = hv_result_to_errno(status);
> > break;
> > }
> > -
> > - done += completed;
> > }
> >
> > if (ret && done) {
> > u32 unmap_flags = 0;
> >
> > - if (flags & HV_MAP_GPA_LARGE_PAGE) {
> > + if (flags & HV_MAP_GPA_LARGE_PAGE)
> > unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > - done <<= large_shift;
> > - }
> > - hv_call_unmap_gpa_pages(partition_id, gfn, done, unmap_flags);
> > + hv_call_unmap_gpa_pages(partition_id, gfn,
> > + done << large_shift, unmap_flags);
>
> How does this work? Earlier we were doing "done << large_shift" only if
> HV_MAP_GPA_LARGE_PAGE is set but now we always do it.
>
It works becuase large_shift in initialized to 0 when
HV_MAP_GPA_LARGE_PAGE is not set.
Thanks,
Stanislav
> Thanks,
> Anirudh.
^ permalink raw reply
* Re: [PATCH] mshv: Add dedicated ioctl for GVA to GPA translation
From: Anirudh Rayabharam @ 2026-04-29 13:11 UTC (permalink / raw)
To: Stanislav Kinsburskii
Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <177741648871.626779.11067281081219290277.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
On Tue, Apr 28, 2026 at 10:48:24PM +0000, Stanislav Kinsburskii wrote:
> Add an MSHV_TRANSLATE_GVA ioctl on the VP fd that wraps
> HVCALL_TRANSLATE_VIRTUAL_ADDRESS_EX with transparent fault-in handling for
> movable memory regions. The passthrough path for this hypercall is retained
> for backward compatibility.
>
> When guest-backing pages reside in movable memory regions, the mmu_notifier
> invalidation path remaps them to NO_ACCESS in the hypervisor's second-level
> address translation tables. If the VMM issues a GVA translation (e.g.
> during MMIO emulation) while a page-table page is invalidated, the
> hypervisor returns HV_TRANSLATE_GVA_GPA_NO_READ_ACCESS. The VMM cannot
> resolve this on its own.
>
> The new ioctl detects this transient GPA access failure, faults the page
> back in via mshv_region_handle_gfn_fault(), and retries the translation
> until it succeeds or an unrecoverable error occurs.
>
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
> drivers/hv/mshv_root.h | 3 ++
> drivers/hv/mshv_root_hv_call.c | 37 +++++++++++++++++++++
> drivers/hv/mshv_root_main.c | 69 ++++++++++++++++++++++++++++++++++++++++
> include/hyperv/hvgdk_mini.h | 1 +
> include/hyperv/hvhdk.h | 41 ++++++++++++++++++++++++
> include/uapi/linux/mshv.h | 10 ++++++
> 6 files changed, 161 insertions(+)
>
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 1f086dcb7aa1a..2e6c4414740cc 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -290,6 +290,9 @@ int hv_call_delete_vp(u64 partition_id, u32 vp_index);
> int hv_call_assert_virtual_interrupt(u64 partition_id, u32 vector,
> u64 dest_addr,
> union hv_interrupt_control control);
> +int hv_call_translate_virtual_address_ex(u32 vp_index, u64 partition_id,
> + u64 flags, u64 gva, u64 *gfn,
> + struct hv_translate_gva_result_ex *result);
> int hv_call_clear_virtual_interrupt(u64 partition_id);
> int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
> union hv_gpa_page_access_state_flags state_flags,
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index e5992c324904a..9ff4ba5373f59 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -692,6 +692,43 @@ int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
> return 0;
> }
>
> +int hv_call_translate_virtual_address_ex(u32 vp_index, u64 partition_id,
> + u64 flags, u64 gva, u64 *gfn,
> + struct hv_translate_gva_result_ex *result)
> +{
> + struct hv_input_translate_virtual_address *input;
> + struct hv_output_translate_virtual_address_ex *output;
> + unsigned long irq_flags;
> + u64 status;
> +
> + local_irq_save(irq_flags);
> +
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> + memset(input, 0, sizeof(*input));
> + input->partition_id = partition_id;
> + input->vp_index = vp_index;
> + input->control_flags = flags;
> + input->gva_page = gva >> HV_HYP_PAGE_SHIFT;
> +
> + status = hv_do_hypercall(HVCALL_TRANSLATE_VIRTUAL_ADDRESS_EX,
> + input, output);
> +
> + if (!hv_result_success(status)) {
> + local_irq_restore(irq_flags);
> + pr_err("%s: %s\n", __func__, hv_result_to_string(status));
> + return hv_result_to_errno(status);
> + }
> +
> + *result = output->translation_result;
> + *gfn = output->gpa_page;
> +
> + local_irq_restore(irq_flags);
> +
> + return 0;
> +}
> +
> int
> hv_call_clear_virtual_interrupt(u64 partition_id)
> {
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index bd1359eb58dd4..2d7b6923415a8 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -898,6 +898,72 @@ mshv_vp_ioctl_get_set_state(struct mshv_vp *vp,
> return 0;
> }
>
> +static bool mshv_gpa_fault_retryable(u32 result_code)
> +{
> + /*
> + * Note: HV_TRANSLATE_GVA_GPA_UNMAPPED is intentionally not handled
> + * here. The guest page table cannot be unmapped under normal
> + * operation. It may be mapped with no access during page moves,
> + * but a truly unmapped state indicates a kernel driver bug.
> + * Retrying in this case would only mask the underlying problem of
> + * an unmapped guest page table.
> + */
> + return result_code == HV_TRANSLATE_GVA_GPA_NO_READ_ACCESS;
> +}
> +
> +static long
> +mshv_vp_ioctl_translate_gva(struct mshv_vp *vp, void __user *user_args)
> +{
> + struct mshv_partition *partition = vp->vp_partition;
> + struct mshv_translate_gva args;
> + struct hv_translate_gva_result_ex result;
> + u64 gfn, gpa;
> + int ret;
> +
> + if (copy_from_user(&args, user_args, sizeof(args)))
> + return -EFAULT;
> +
> + do {
> + ret = hv_call_translate_virtual_address_ex(vp->vp_index,
> + partition->pt_id,
> + args.flags, args.gva,
> + &gfn, &result);
> + if (ret)
> + return ret;
> +
> + if (mshv_gpa_fault_retryable(result.result_code)) {
> + struct mshv_mem_region *region;
> + bool faulted;
> +
> + region = mshv_partition_region_by_gfn_get(partition,
> + gfn);
> + if (!region)
> + return -EFAULT;
> +
> + faulted = false;
> + if (region->mreg_type == MSHV_REGION_TYPE_MEM_MOVABLE)
> + faulted = mshv_region_handle_gfn_fault(region,
> + gfn);
> + mshv_region_put(region);
> +
> + if (!faulted)
> + return -EFAULT;
> +
> + cond_resched();
> + }
> + } while (mshv_gpa_fault_retryable(result.result_code));
> +
> + gpa = (gfn << PAGE_SHIFT) | (args.gva & ~PAGE_MASK);
> +
> + if (copy_to_user(args.result, &result, sizeof(*args.result)))
Indentation is a bit off here.
With that fixed:
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
^ permalink raw reply
* Re: [PATCH] hv: utils: handle and propagate errors in kvp_register
From: Olaf Hering @ 2026-04-29 12:44 UTC (permalink / raw)
To: Thorsten Blum
Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Greg Kroah-Hartman, stable, linux-hyperv, linux-kernel
In-Reply-To: <afH7VELGgh8eGBUC@linux.dev>
[-- Attachment #1: Type: text/plain, Size: 252 bytes --]
Wed, 29 Apr 2026 14:36:36 +0200 Thorsten Blum <thorsten.blum@linux.dev>:
> What makes you think this is just "cosmetics"?
It does fix an unlikely bug indeed, but it does not need to trigger the whole paperwork attached to a Fixes tag.
Olaf
[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH] hv: utils: handle and propagate errors in kvp_register
From: Thorsten Blum @ 2026-04-29 12:36 UTC (permalink / raw)
To: Olaf Hering
Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Greg Kroah-Hartman, stable, Ky Srinivasan, linux-hyperv,
linux-kernel
In-Reply-To: <20260429142724.4d74641a.olaf@aepfle.de>
On Wed, Apr 29, 2026 at 02:27:24PM +0200, Olaf Hering wrote:
> Tue, 14 Apr 2026 13:10:08 +0200 Thorsten Blum <thorsten.blum@linux.dev>:
>
> > Fixes: 245ba56a52a3 ("Staging: hv: Implement key/value pair (KVP)")
>
> Please do not abuse the Fixes tag when it fact this change is "cosmetics".
What makes you think this is just "cosmetics"?
^ permalink raw reply
* Re: [PATCH] hv: utils: handle and propagate errors in kvp_register
From: Olaf Hering @ 2026-04-29 12:27 UTC (permalink / raw)
To: Thorsten Blum
Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Greg Kroah-Hartman, stable, Ky Srinivasan, linux-hyperv,
linux-kernel
In-Reply-To: <20260414111008.307220-2-thorsten.blum@linux.dev>
[-- Attachment #1: Type: text/plain, Size: 235 bytes --]
Tue, 14 Apr 2026 13:10:08 +0200 Thorsten Blum <thorsten.blum@linux.dev>:
> Fixes: 245ba56a52a3 ("Staging: hv: Implement key/value pair (KVP)")
Please do not abuse the Fixes tag when it fact this change is "cosmetics".
Olaf
[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH] mshv: Simplify GPA map/unmap hypercall helpers
From: Anirudh Rayabharam @ 2026-04-29 11:02 UTC (permalink / raw)
To: Stanislav Kinsburskii
Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <177741845948.632922.14128507833980339307.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
On Tue, Apr 28, 2026 at 11:21:12PM +0000, Stanislav Kinsburskii wrote:
> Clean up hv_do_map_gpa_hcall() and hv_call_unmap_gpa_pages() after the
> preceding bug-fix patches:
>
> Move "done += completed" before the status checks so that pages mapped
> by a partially-successful batch are included in the error cleanup unmap.
> Previously these mappings were leaked on failure.
>
> While here, improve type safety and readability:
> - Change "int done" to "u64 done" to match the u64 page_count it is
> compared against, avoiding signed/unsigned comparison hazards.
> - Use u64 for loop iteration and batch size variables consistently.
> - Add proper braces to the for-loop body in hv_do_map_gpa_hcall().
> - Remove unnecessary "ret" variable from hv_call_unmap_gpa_pages().
> - Simplify the error-path unmap to use "done << large_shift" directly
> instead of mutating done in place.
>
> Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
> drivers/hv/mshv_root_hv_call.c | 55 +++++++++++++++-------------------------
> 1 file changed, 20 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index e5992c324904a..f5f205a397834 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -195,8 +195,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> struct hv_input_map_gpa_pages *input_page;
> u64 status, *pfnlist;
> unsigned long irq_flags, large_shift = 0;
> - int ret = 0, done = 0;
> - u64 page_count = page_struct_count;
> + u64 done = 0, page_count = page_struct_count;
> + int ret = 0;
>
> if (page_count == 0 || (pages && mmio_spa))
> return -EINVAL;
> @@ -213,8 +213,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> }
>
> while (done < page_count) {
> - ulong i, completed, remain = page_count - done;
> - int rep_count = min(remain, HV_MAP_GPA_BATCH_SIZE);
> + u64 i, completed, remain = page_count - done;
> + u64 rep_count = min(remain, (u64)HV_MAP_GPA_BATCH_SIZE);
>
> local_irq_save(irq_flags);
> input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
> @@ -224,23 +224,13 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> input_page->map_flags = flags;
> pfnlist = input_page->source_gpa_page_list;
>
> - for (i = 0; i < rep_count; i++)
> - if (flags & HV_MAP_GPA_NO_ACCESS) {
> + for (i = 0; i < rep_count; i++) {
> + if (flags & HV_MAP_GPA_NO_ACCESS)
> pfnlist[i] = 0;
> - } else if (pages) {
> - u64 index = (done + i) << large_shift;
> -
> - if (index >= page_struct_count) {
> - ret = -EINVAL;
> - break;
> - }
> - pfnlist[i] = page_to_pfn(pages[index]);
> - } else {
> + else if (pages)
> + pfnlist[i] = page_to_pfn(pages[(done + i) << large_shift]);
> + else
> pfnlist[i] = mmio_spa + done + i;
> - }
> - if (ret) {
> - local_irq_restore(irq_flags);
> - break;
> }
>
> status = hv_do_rep_hypercall(HVCALL_MAP_GPA_PAGES, rep_count, 0,
> @@ -248,29 +238,26 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> local_irq_restore(irq_flags);
>
> completed = hv_repcomp(status);
> + done += completed;
>
> if (hv_result_needs_memory(status)) {
> ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
> HV_MAP_GPA_DEPOSIT_PAGES);
> if (ret)
> break;
> -
> } else if (!hv_result_success(status)) {
> ret = hv_result_to_errno(status);
> break;
> }
> -
> - done += completed;
> }
>
> if (ret && done) {
> u32 unmap_flags = 0;
>
> - if (flags & HV_MAP_GPA_LARGE_PAGE) {
> + if (flags & HV_MAP_GPA_LARGE_PAGE)
> unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> - done <<= large_shift;
> - }
> - hv_call_unmap_gpa_pages(partition_id, gfn, done, unmap_flags);
> + hv_call_unmap_gpa_pages(partition_id, gfn,
> + done << large_shift, unmap_flags);
How does this work? Earlier we were doing "done << large_shift" only if
HV_MAP_GPA_LARGE_PAGE is set but now we always do it.
Thanks,
Anirudh.
^ permalink raw reply
* Re: [PATCH V1 04/13] mshv: Provide a way to get partition id if running in a VMM process
From: Anirudh Rayabharam @ 2026-04-29 10:35 UTC (permalink / raw)
To: Mukesh R
Cc: hpa, robin.murphy, robh, wei.liu, mhklinux, muislam, namjain,
magnuskulke, anbelski, linux-kernel, linux-hyperv, iommu,
linux-pci, linux-arch, kys, haiyangz, decui, longli, tglx, mingo,
bp, dave.hansen, x86, joro, will, lpieralisi, kwilczynski,
bhelgaas, arnd
In-Reply-To: <20260422023239.1171963-5-mrathor@linux.microsoft.com>
On Tue, Apr 21, 2026 at 07:32:30PM -0700, Mukesh R wrote:
> Many PCI passthru related hypercalls require partition id of the target
> guest. Guests are actually managed by MSHV driver and the partition id
> is only maintained there. Add a field in the partition struct in MSHV
> driver to save the tgid of the VMM process creating the partition,
> and add a function there to retrieve partition id if current process
> is a VMM process.
>
> Signed-off-by: Mukesh R <mrathor@linux.microsoft.com>
> ---
> drivers/hv/mshv_root.h | 1 +
> drivers/hv/mshv_root_main.c | 22 ++++++++++++++++++++++
> include/asm-generic/mshyperv.h | 5 +++++
> 3 files changed, 28 insertions(+)
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
^ permalink raw reply
* Re: [PATCH v4 3/3] mshv: unmap debugfs stats pages on kexec
From: Anirudh Rayabharam @ 2026-04-29 10:10 UTC (permalink / raw)
To: Jork Loeser
Cc: linux-hyperv, x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H . Peter Anvin, Arnd Bergmann,
Michael Kelley, linux-kernel, linux-arch
In-Reply-To: <20260427213855.1675044-4-jloeser@linux.microsoft.com>
On Mon, Apr 27, 2026 at 02:38:54PM -0700, Jork Loeser wrote:
> On L1VH, debugfs stats pages are overlay pages: the kernel allocates
> them and registers the GPAs with the hypervisor via
> HVCALL_MAP_STATS_PAGE2. These overlay mappings persist in the
> hypervisor across kexec. If the kexec'd kernel reuses those physical
> pages, the hypervisor's overlay semantics cause a machine check
> exception.
>
> Fix this by calling mshv_debugfs_exit() from the reboot notifier,
> which issues HVCALL_UNMAP_STATS_PAGE for each mapped stats page before
> kexec. This releases the overlay bindings so the physical pages can be
> safely reused. Guard mshv_debugfs_exit() against being called when
> init failed.
>
> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
> ---
> drivers/hv/mshv_debugfs.c | 7 ++++++-
> drivers/hv/mshv_synic.c | 1 +
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/mshv_debugfs.c b/drivers/hv/mshv_debugfs.c
> index 418b6dc8f3c2..3c3e02237ae9 100644
> --- a/drivers/hv/mshv_debugfs.c
> +++ b/drivers/hv/mshv_debugfs.c
> @@ -674,8 +674,10 @@ int __init mshv_debugfs_init(void)
>
> mshv_debugfs = debugfs_create_dir("mshv", NULL);
> if (IS_ERR(mshv_debugfs)) {
> + err = PTR_ERR(mshv_debugfs);
> + mshv_debugfs = NULL;
> pr_err("%s: failed to create debugfs directory\n", __func__);
Might as well print err here.
Nevertheless:
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Thanks,
Anirudh.
^ permalink raw reply
* Re: [PATCH v2 12/15] mshv_vtl: Move VSM code page offset logic to x86 files
From: Naman Jain @ 2026-04-29 10:00 UTC (permalink / raw)
To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H . Peter Anvin, Arnd Bergmann, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Alexandre Ghiti
Cc: Marc Zyngier, Timothy Hayes, Lorenzo Pieralisi, Sascha Bischoff,
mrigendrachaubey, linux-hyperv@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-riscv@lists.infradead.org, vdso@mailbox.org,
ssengar@linux.microsoft.com
In-Reply-To: <SN6PR02MB4157E0525DDDD153888F5AFBD4362@SN6PR02MB4157.namprd02.prod.outlook.com>
On 4/27/2026 11:10 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Thursday, April 23, 2026 5:42 AM
>>
>> The VSM code page offset register (HV_REGISTER_VSM_CODE_PAGE_OFFSETS)
>> is x86 specific, its value configures the static call used to return
>> to VTL0 via the hypercall page. Move the register read from the common
>> mshv_vtl_get_vsm_regs() into the x86 mshv_vtl_return_call_init(),
>> which is the sole consumer of the offset.
>>
>> Change mshv_vtl_return_call_init() from taking a u64 parameter
>> to taking no arguments, and rename mshv_vtl_get_vsm_regs() to
>> mshv_vtl_get_vsm_cap_reg() since it now only fetches
>> HV_REGISTER_VSM_CAPABILITIES.
>>
>> No functional change on x86. This prepares the common driver code for
>> ARM64 where VSM code page offsets do not apply.
>>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>> arch/x86/hyperv/hv_vtl.c | 19 +++++++++++++++++--
>> arch/x86/include/asm/mshyperv.h | 4 ++--
>> drivers/hv/mshv_vtl_main.c | 24 +++++++++++++-----------
>> 3 files changed, 32 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
>> index f3ffb6a7cb2d..7c10b34cf8a4 100644
>> --- a/arch/x86/hyperv/hv_vtl.c
>> +++ b/arch/x86/hyperv/hv_vtl.c
>> @@ -293,10 +293,25 @@ EXPORT_SYMBOL_GPL(hv_vtl_configure_reg_page);
>>
>> DEFINE_STATIC_CALL_NULL(__mshv_vtl_return_hypercall, void (*)(void));
>>
>> -void mshv_vtl_return_call_init(u64 vtl_return_offset)
>> +int mshv_vtl_return_call_init(void)
>> {
>> + struct hv_register_assoc vsm_pg_offset_reg;
>> + union hv_register_vsm_page_offsets offsets;
>> + int ret;
>> +
>> + vsm_pg_offset_reg.name = HV_REGISTER_VSM_CODE_PAGE_OFFSETS;
>> +
>> + ret = hv_call_get_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF,
>> + 1, input_vtl_zero, &vsm_pg_offset_reg);
>> + if (ret)
>> + return ret;
>> +
>> + offsets.as_uint64 = vsm_pg_offset_reg.value.reg64;
>> +
>> static_call_update(__mshv_vtl_return_hypercall,
>> - (void *)((u8 *)hv_hypercall_pg + vtl_return_offset));
>> + (void *)((u8 *)hv_hypercall_pg + offsets.vtl_return_offset));
>> +
>> + return 0;
>> }
>> EXPORT_SYMBOL(mshv_vtl_return_call_init);
>>
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index b4d80c9a673a..b48f115c1292 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -286,14 +286,14 @@ struct mshv_vtl_cpu_context {
>> #ifdef CONFIG_HYPERV_VTL_MODE
>> void __init hv_vtl_init_platform(void);
>> int __init hv_vtl_early_init(void);
>> -void mshv_vtl_return_call_init(u64 vtl_return_offset);
>> +int mshv_vtl_return_call_init(void);
>> void mshv_vtl_return_hypercall(void);
>> void __mshv_vtl_return_call(struct mshv_vtl_cpu_context *vtl0);
>> int hv_vtl_get_set_reg(struct hv_register_assoc *regs, bool set, bool shared);
>> #else
>> static inline void __init hv_vtl_init_platform(void) {}
>> static inline int __init hv_vtl_early_init(void) { return 0; }
>> -static inline void mshv_vtl_return_call_init(u64 vtl_return_offset) {}
>> +static inline int mshv_vtl_return_call_init(void) { return 0; }
>> static inline void mshv_vtl_return_hypercall(void) {}
>> static inline void __mshv_vtl_return_call(struct mshv_vtl_cpu_context *vtl0) {}
>> #endif
>> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
>> index 4c9ae65ad3e8..be498c9234fd 100644
>> --- a/drivers/hv/mshv_vtl_main.c
>> +++ b/drivers/hv/mshv_vtl_main.c
>> @@ -79,7 +79,6 @@ struct mshv_vtl {
>> };
>>
>> static struct mutex mshv_vtl_poll_file_lock;
>> -static union hv_register_vsm_page_offsets mshv_vsm_page_offsets;
>> static union hv_register_vsm_capabilities mshv_vsm_capabilities;
>>
>> static DEFINE_PER_CPU(struct mshv_vtl_poll_file, mshv_vtl_poll_file);
>> @@ -203,21 +202,19 @@ static void mshv_vtl_synic_enable_regs(unsigned int cpu)
>> /* VTL2 Host VSP SINT is (un)masked when the user mode requests that */
>> }
>>
>> -static int mshv_vtl_get_vsm_regs(void)
>> +static int mshv_vtl_get_vsm_cap_reg(void)
>> {
>> - struct hv_register_assoc registers[2];
>> - int ret, count = 2;
>> + struct hv_register_assoc vsm_capability_reg;
>> + int ret;
>>
>> - registers[0].name = HV_REGISTER_VSM_CODE_PAGE_OFFSETS;
>> - registers[1].name = HV_REGISTER_VSM_CAPABILITIES;
>> + vsm_capability_reg.name = HV_REGISTER_VSM_CAPABILITIES;
>>
>> ret = hv_call_get_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF,
>> - count, input_vtl_zero, registers);
>> + 1, input_vtl_zero, &vsm_capability_reg);
>> if (ret)
>> return ret;
>>
>> - mshv_vsm_page_offsets.as_uint64 = registers[0].value.reg64;
>> - mshv_vsm_capabilities.as_uint64 = registers[1].value.reg64;
>> + mshv_vsm_capabilities.as_uint64 = vsm_capability_reg.value.reg64;
>>
>> return ret;
>
> Nit: This could be just "return 0".
Acked.
>
>> }
>> @@ -1139,13 +1136,18 @@ static int __init mshv_vtl_init(void)
>> tasklet_init(&msg_dpc, mshv_vtl_sint_on_msg_dpc, 0);
>> init_waitqueue_head(&fd_wait_queue);
>>
>> - if (mshv_vtl_get_vsm_regs()) {
>> + if (mshv_vtl_get_vsm_cap_reg()) {
>> dev_emerg(dev, "Unable to get VSM capabilities !!\n");
>
> Why is this failure an emergency message, while the other failures
> here in mshv_vtl_init() are just error messages? When there's lack
> of consistency, I always wonder if there is a reason ..... :-)
It might be because I didn’t pay enough attention to the old code :)
dev_err() should work just fine, I'll change it.
>
>> ret = -ENODEV;
>> goto free_dev;
>> }
>>
>> - mshv_vtl_return_call_init(mshv_vsm_page_offsets.vtl_return_offset);
>> + ret = mshv_vtl_return_call_init();
>> + if (ret) {
>> + dev_err(dev, "mshv_vtl_return_call_init failed: %d\n", ret);
>> + goto free_dev;
>> + }
>> +
>> ret = hv_vtl_setup_synic();
>> if (ret)
>> goto free_dev;
>> --
>> 2.43.0
>>
Regards,
Naman
^ permalink raw reply
* Re: [PATCH v4 2/3] mshv: clean up SynIC state on kexec for L1VH
From: Anirudh Rayabharam @ 2026-04-29 9:58 UTC (permalink / raw)
To: Jork Loeser
Cc: linux-hyperv, x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H . Peter Anvin, Arnd Bergmann,
Michael Kelley, linux-kernel, linux-arch
In-Reply-To: <20260427213855.1675044-3-jloeser@linux.microsoft.com>
On Mon, Apr 27, 2026 at 02:38:53PM -0700, Jork Loeser wrote:
> The reboot notifier that tears down the SynIC cpuhp state guards the
> cleanup with hv_root_partition(), so on L1VH (where
> hv_root_partition() is false) SINT0, SINT5, and SIRBP are never
> cleaned up before kexec. The kexec'd kernel then inherits stale
> unmasked SINTs and an enabled SIRBP pointing to freed memory.
>
> Remove the hv_root_partition() guard so the cleanup runs for all
> parent partitions.
>
> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
> ---
> drivers/hv/mshv_synic.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
> index 2db3b0192eac..978a1cace341 100644
> --- a/drivers/hv/mshv_synic.c
> +++ b/drivers/hv/mshv_synic.c
> @@ -723,9 +723,6 @@ mshv_unregister_doorbell(u64 partition_id, int doorbell_portid)
> static int mshv_synic_reboot_notify(struct notifier_block *nb,
> unsigned long code, void *unused)
> {
> - if (!hv_root_partition())
> - return 0;
> -
> cpuhp_remove_state(synic_cpuhp_online);
> return 0;
> }
> --
> 2.43.0
>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
^ permalink raw reply
* Re: [PATCH v4 1/3] mshv: limit SynIC management to MSHV-owned resources
From: Anirudh Rayabharam @ 2026-04-29 9:58 UTC (permalink / raw)
To: Jork Loeser
Cc: linux-hyperv, x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H . Peter Anvin, Arnd Bergmann,
Michael Kelley, linux-kernel, linux-arch
In-Reply-To: <20260427213855.1675044-2-jloeser@linux.microsoft.com>
On Mon, Apr 27, 2026 at 02:38:52PM -0700, Jork Loeser wrote:
> The SynIC is shared between VMBus and MSHV. VMBus owns the message
> page (SIMP), event flags page (SIEFP), global enable (SCONTROL),
> and SINT2. MSHV adds SINT0, SINT5, and the event ring page (SIRBP).
>
> Currently mshv_synic_cpu_init() redundantly enables SIMP, SIEFP, and
> SCONTROL that VMBus already configured, and mshv_synic_cpu_exit()
> disables all of them. This is wrong because MSHV can be torn down
> while VMBus is still active. In particular, a kexec reboot notifier
> tears down MSHV first. Disabling SCONTROL, SIMP, and SIEFP out
> from under VMBus causes its later cleanup to write SynIC MSRs while
> SynIC is disabled, which the hypervisor does not tolerate.
>
> Restrict MSHV to managing only the resources it owns:
> - SINT0, SINT5: mask on cleanup, unmask on init
> - SIRBP: enable/disable as before
> - SIMP, SIEFP, SCONTROL: leave to VMBus when it is active (L1VH
> and nested root partition); on a non-nested root partition VMBus
> does not run, so MSHV must enable/disable them
>
> While here, fix the SIEFP and SIRBP memremap() and virt_to_phys()
> calls to use HV_HYP_PAGE_SHIFT/HV_HYP_PAGE_SIZE instead of
> PAGE_SHIFT/PAGE_SIZE. The hypervisor always uses 4K pages for SynIC
> register GPAs regardless of the kernel page size, so using PAGE_SHIFT
> produces wrong addresses on ARM64 with 64K pages.
>
> Note that initialization order matters - VMBUS first, MSHV second,
> and the reverse on de-init. Ideally, we would want a dedicated SYNIC
> driver that replaces the cross-dependencies with a clear API and
> dynamic tracking. Such refactor should go into its own dedicated
> series, outside of this kexec fix series.
>
> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
> ---
> drivers/hv/hv.c | 3 +
> drivers/hv/mshv_synic.c | 150 ++++++++++++++++++++++++++--------------
> 2 files changed, 103 insertions(+), 50 deletions(-)
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
^ permalink raw reply
* Re: [PATCH v2 09/15] Drivers: hv: mshv_vtl: Move hv_vtl_configure_reg_page() to x86
From: Naman Jain @ 2026-04-29 9:57 UTC (permalink / raw)
To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Catalin Marinas, Will Deacon,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H . Peter Anvin, Arnd Bergmann, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Alexandre Ghiti
Cc: Marc Zyngier, Timothy Hayes, Lorenzo Pieralisi, Sascha Bischoff,
mrigendrachaubey, linux-hyperv@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-riscv@lists.infradead.org, vdso@mailbox.org,
ssengar@linux.microsoft.com
In-Reply-To: <SN6PR02MB4157467FDBC0203C67A67042D4362@SN6PR02MB4157.namprd02.prod.outlook.com>
On 4/27/2026 11:10 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Thursday, April 23, 2026 5:42 AM
>>
>> Move hv_vtl_configure_reg_page() from drivers/hv/mshv_vtl_main.c to
>> arch/x86/hyperv/hv_vtl.c. The register page overlay is an x86-specific
>> feature that uses HV_X64_REGISTER_REG_PAGE, so its configuration belongs
>> in architecture-specific code.
>>
>> Move struct mshv_vtl_per_cpu and union hv_synic_overlay_page_msr to
>> include/asm-generic/mshyperv.h so they are visible to both arch and
>> driver code.
>>
>> Change the return type from void to bool so the caller can determine
>> whether the register page was successfully configured and set
>> mshv_has_reg_page accordingly.
>>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>> arch/x86/hyperv/hv_vtl.c | 32 ++++++++++++++++++++++
>> drivers/hv/mshv_vtl_main.c | 49 +++-------------------------------
>> include/asm-generic/mshyperv.h | 17 ++++++++++++
>> 3 files changed, 53 insertions(+), 45 deletions(-)
>>
<snip>
>> #if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
>> +/* SYNIC_OVERLAY_PAGE_MSR - internal, identical to hv_synic_simp */
>
> This comment pre-dates your patch, but I don't understand the point
> it is trying to make. The comment is factually true, but I don't know
> why calling that out is relevant. The REG_PAGE MSR seems to be
> conceptually separate and distinct from the SIMP MSR, so the fact
> that the layouts are the same is just a coincidence. Or is there some
> relationship between the two MSRs that I'm not aware of, and the
> comment is trying (and failing?) to point out?
This was added as per suggestion from Nuno in my initial series for
MSHV_VTL. If the reference in "identical to" is misleading, I should
remove it.
https://lore.kernel.org/all/68143eb0-e6a7-4579-bedb-4c2ec5aaef6b@linux.microsoft.com/
Quoting:
"""
it is a generic structure that
appears to be used for several overlay page MSRs (SIMP, SIEF, etc).
But, the type doesn't appear in the hv*dk headers explicitly; it's just
used internally by the hypervisor.
I think it should be renamed with a hv_ prefix to indicate it's part of
the hypervisor ABI, and a brief comment with the provenance:
/* SYNIC_OVERLAY_PAGE_MSR - internal, identical to hv_synic_simp */
union hv_synic_overlay_page_msr {
/* <snip> */
};
"""
>
>> +union hv_synic_overlay_page_msr {
>> + u64 as_uint64;
>> + struct {
>> + u64 enabled: 1;
>> + u64 reserved: 11;
>> + u64 pfn: 52;
>> + } __packed;
>> +};
>> +
>> u8 __init get_vtl(void);
>> void mshv_vtl_return_call(struct mshv_vtl_cpu_context *vtl0);
>> +bool hv_vtl_configure_reg_page(struct mshv_vtl_per_cpu *per_cpu);
>> #else
>> static inline u8 get_vtl(void) { return 0; }
>> static inline void mshv_vtl_return_call(struct mshv_vtl_cpu_context *vtl0) {}
>> +static inline bool hv_vtl_configure_reg_page(struct mshv_vtl_per_cpu *per_cpu) { return false; }
>
> As with Patch 8, if CONFIG_HYPERV_VTL_MODE caused mshv_common.o
> to be built, this stub wouldn't be needed.
>
Acked.
>> #endif
>>
>> #endif
>> --
>> 2.43.0
>>
Regards,
Naman
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox