public inbox for linux-hyperv@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 6.19] mshv: Ignore second stats page map result failure
       [not found] <20260223123738.1532940-1-sashal@kernel.org>
@ 2026-02-23 12:37 ` Sasha Levin
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] x86/hyperv: Move hv crash init after hypercall pg setup Sasha Levin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-02-23 12:37 UTC (permalink / raw)
  To: patches, stable
  Cc: Purna Pavan Chandra Aekkaladevi, Nuno Das Neves,
	Stanislav Kinsburskii, Michael Kelley, Wei Liu, Sasha Levin, kys,
	haiyangz, decui, longli, linux-hyperv, linux-kernel

From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>

[ Upstream commit 7538b80e5a4b473b73428d13b3a47ceaad9a8a7c ]

Older versions of the hypervisor do not have a concept of separate SELF
and PARENT stats areas. In this case, mapping the HV_STATS_AREA_SELF page
is sufficient - it's the only page and it contains all available stats.

Mapping HV_STATS_AREA_PARENT returns HV_STATUS_INVALID_PARAMETER which
currently causes module init to fail on older hypevisor versions.

Detect this case and gracefully fall back to populating
stats_pages[HV_STATS_AREA_PARENT] with the already-mapped SELF page.

Add comments to clarify the behavior, including a clarification of why
this isn't needed for hv_call_map_stats_page2() which always supports
PARENT and SELF areas.

Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed for a thorough analysis.

## Analysis

### 1. Commit Message Analysis

The commit clearly describes a backward compatibility bug: on older
Hyper-V hypervisor versions that don't support separate SELF and PARENT
stats areas, mapping `HV_STATS_AREA_PARENT` returns
`HV_STATUS_INVALID_PARAMETER`, which causes **module initialization to
fail entirely**. This is not a feature addition — it's fixing a
regression/incompatibility where the driver doesn't work on older
hypervisors.

### 2. Code Change Analysis

The fix has three parts:

**a) New helper `hv_stats_get_area_type()`** (~15 lines): Extracts the
stats area type from the identity union based on the object type. This
is needed to distinguish PARENT from SELF area mapping requests.

**b) Modified `hv_call_map_stats_page()`** (~20 lines changed): When the
hypercall returns `HV_STATUS_INVALID_PARAMETER` specifically for a
PARENT area mapping, instead of failing with an error, it returns
success but with `*addr = NULL`. This signals to the caller that PARENT
isn't supported.

**c) Modified `mshv_vp_stats_map()`** (+3 lines): After mapping PARENT,
if the address is NULL (meaning older hypervisor), it falls back to
using the already-mapped SELF page for both areas. This is safe because
on older hypervisors, the SELF page contains all available stats.

### 3. Bug Impact

- **Severity**: HIGH — the driver completely fails to create VPs
  (virtual processors), making it unusable on older hypervisor versions
- **User impact**: Anyone running the mshv_root driver on an older
  Hyper-V hypervisor version cannot use the driver at all
- **Trigger**: Deterministic — always fails on affected hypervisor
  versions, not a race or edge case

### 4. Scope and Risk

- The change is small (~40 lines including comments) and well-contained
  to the stats page mapping path
- It adds graceful degradation, not new behavior — the driver works the
  same on new hypervisors
- Multiple reviewers: Reviewed-by and Acked-by from Stanislav
  Kinsburskii, Reviewed-by from Michael Kelley
- The fallback logic (using SELF page for both) is explicitly described
  as correct by the authors since older hypervisors put all stats in the
  single SELF page

### 5. Stable Tree Applicability

- The mshv_root driver was introduced in **v6.15** (commit
  `621191d709b14`)
- The bug has been present **since the driver was first introduced** —
  the original `mshv_vp_stats_map()` function already mapped both SELF
  and PARENT areas with no fallback
- In v6.19 the function was renamed from `hv_call_map_stat_page` to
  `hv_call_map_stats_page` and wrapped by `hv_map_stats_page` (commit
  `d62313bdf5961`), so backporting to pre-6.19 trees would require minor
  adaptation
- The fix applies to all stable trees containing the mshv_root driver
  (6.15.y through 6.18.y)

### 6. Stable Criteria Assessment

| Criterion | Assessment |
|-----------|-----------|
| Obviously correct and tested | YES — well-reviewed, clear logic |
| Fixes a real bug | YES — module init failure |
| Important issue | YES — driver completely non-functional on older
hypervisors |
| Small and contained | YES — ~40 lines, 2 files, single subsystem |
| No new features | YES — adds backward compatibility/graceful
degradation |

### Verification

- `git show 621191d709b14 -- drivers/hv/mshv_root_main.c` confirmed
  `mshv_vp_stats_map()` with SELF+PARENT mapping was present in the
  original v6.15 driver introduction
- `git show 621191d709b14 -- drivers/hv/mshv_root_hv_call.c` confirmed
  the original `hv_call_map_stat_page()` had no fallback for PARENT area
  failures
- `git describe --contains 621191d709b14` returned `v6.15-rc1~193^2~2`,
  confirming v6.15 introduction
- `git describe --contains d62313bdf5961` returned `v6.19-rc1~56^2~30`,
  confirming function rename happened in v6.19
- `git log -p` diff confirmed the rename from `hv_call_map_stat_page` to
  `hv_call_map_stats_page` occurred in `d62313bdf5961`
- The bug mechanism is verified: older hypervisors return
  `HV_STATUS_INVALID_PARAMETER` for PARENT area mapping, which was not
  handled, causing the error to propagate up and fail VP creation
- Could NOT independently verify which exact stable trees (6.15.y,
  6.16.y, 6.17.y, 6.18.y) are still actively maintained (unverified)

### Conclusion

This is a legitimate bug fix for a complete driver initialization
failure on older Hyper-V hypervisor versions. The bug has existed since
the driver was first introduced in v6.15. The fix is small, well-
contained, well-reviewed, and poses minimal regression risk. Backporting
to pre-6.19 stable trees would require minor adaptation due to function
renames, but the logic is straightforward. It meets all stable kernel
criteria.

**YES**

 drivers/hv/mshv_root_hv_call.c | 52 +++++++++++++++++++++++++++++++---
 drivers/hv/mshv_root_main.c    |  3 ++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index 598eaff4ff299..1f93b94d7580c 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -813,6 +813,13 @@ hv_call_notify_port_ring_empty(u32 sint_index)
 	return hv_result_to_errno(status);
 }
 
+/*
+ * Equivalent of hv_call_map_stats_page() for cases when the caller provides
+ * the map location.
+ *
+ * NOTE: This is a newer hypercall that always supports SELF and PARENT stats
+ * areas, unlike hv_call_map_stats_page().
+ */
 static int hv_call_map_stats_page2(enum hv_stats_object_type type,
 				   const union hv_stats_object_identity *identity,
 				   u64 map_location)
@@ -855,6 +862,34 @@ static int hv_call_map_stats_page2(enum hv_stats_object_type type,
 	return ret;
 }
 
+static int
+hv_stats_get_area_type(enum hv_stats_object_type type,
+		       const union hv_stats_object_identity *identity)
+{
+	switch (type) {
+	case HV_STATS_OBJECT_HYPERVISOR:
+		return identity->hv.stats_area_type;
+	case HV_STATS_OBJECT_LOGICAL_PROCESSOR:
+		return identity->lp.stats_area_type;
+	case HV_STATS_OBJECT_PARTITION:
+		return identity->partition.stats_area_type;
+	case HV_STATS_OBJECT_VP:
+		return identity->vp.stats_area_type;
+	}
+
+	return -EINVAL;
+}
+
+/*
+ * Map a stats page, where the page location is provided by the hypervisor.
+ *
+ * NOTE: The concept of separate SELF and PARENT stats areas does not exist on
+ * older hypervisor versions. All the available stats information can be found
+ * on the SELF page. When attempting to map the PARENT area on a hypervisor
+ * that doesn't support it, return "success" but with a NULL address. The
+ * caller should check for this case and instead fallback to the SELF area
+ * alone.
+ */
 static int hv_call_map_stats_page(enum hv_stats_object_type type,
 				  const union hv_stats_object_identity *identity,
 				  void **addr)
@@ -863,7 +898,7 @@ static int hv_call_map_stats_page(enum hv_stats_object_type type,
 	struct hv_input_map_stats_page *input;
 	struct hv_output_map_stats_page *output;
 	u64 status, pfn;
-	int ret = 0;
+	int hv_status, ret = 0;
 
 	do {
 		local_irq_save(flags);
@@ -878,11 +913,20 @@ static int hv_call_map_stats_page(enum hv_stats_object_type type,
 		pfn = output->map_location;
 
 		local_irq_restore(flags);
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
-			ret = hv_result_to_errno(status);
+
+		hv_status = hv_result(status);
+		if (hv_status != HV_STATUS_INSUFFICIENT_MEMORY) {
 			if (hv_result_success(status))
 				break;
-			return ret;
+
+			if (hv_stats_get_area_type(type, identity) == HV_STATS_AREA_PARENT &&
+			    hv_status == HV_STATUS_INVALID_PARAMETER) {
+				*addr = NULL;
+				return 0;
+			}
+
+			hv_status_debug(status, "\n");
+			return hv_result_to_errno(status);
 		}
 
 		ret = hv_call_deposit_pages(NUMA_NO_NODE,
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 681b58154d5ea..d3e8a66443ad6 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -993,6 +993,9 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
 	if (err)
 		goto unmap_self;
 
+	if (!stats_pages[HV_STATS_AREA_PARENT])
+		stats_pages[HV_STATS_AREA_PARENT] = stats_pages[HV_STATS_AREA_SELF];
+
 	return 0;
 
 unmap_self:
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.19] x86/hyperv: Move hv crash init after hypercall pg setup
       [not found] <20260223123738.1532940-1-sashal@kernel.org>
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] mshv: Ignore second stats page map result failure Sasha Levin
@ 2026-02-23 12:37 ` Sasha Levin
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19-6.18] mshv: clear eventfd counter on irqfd shutdown Sasha Levin
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-02-23 12:37 UTC (permalink / raw)
  To: patches, stable
  Cc: Mukesh R, Wei Liu, Sasha Levin, kys, haiyangz, decui, longli,
	tglx, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

From: Mukesh R <mrathor@linux.microsoft.com>

[ Upstream commit c3a6ae7ea2d3f507cbddb5818ccc65b9d84d6dc7 ]

hv_root_crash_init() is not setting up the hypervisor crash collection
for baremetal cases because when it's called, hypervisor page is not
setup.

Fix is simple, just move the crash init call after the hypercall
page setup.

Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a clear picture. Let me write my analysis.

## Analysis

### What the commit does

This commit fixes an initialization ordering bug in the Hyper-V root
partition crash dump setup on x86. The function `hv_root_crash_init()`
needs to make hypercalls (specifically `HVCALL_GET_SYSTEM_PROPERTY`) to
set up crash dump collection for the root partition. However, it was
being called **before** `hv_set_hypercall_pg()`, which is the function
that installs the hypercall page into the static call trampoline.
Without `hv_set_hypercall_pg()` having run, any hypercalls made by
`hv_root_crash_init()` would fail silently (or call the `__hv_hyperfail`
stub), meaning crash collection was never properly set up on baremetal
Hyper-V root partitions.

The fix is small and surgical: move the `hv_root_crash_init()` call from
inside the `if (hv_root_partition())` block (before
`hv_set_hypercall_pg()`) to after `hv_set_hypercall_pg()`, with an
explicit `hv_root_partition()` guard.

### Does it fix a real bug?

Yes. On baremetal Hyper-V root partitions, crash dump collection was
completely non-functional. This is a real bug that affects kernel crash
diagnostics in production Hyper-V environments.

### Size and scope

Very small: 1 file, 3 lines added, 1 line removed. The change is a
simple reordering of an existing function call.

### Dependency analysis - CRITICAL ISSUE

The prerequisite commit `77c860d2dbb72` ("x86/hyperv: Enable build of
hypervisor crashdump collection files") that **introduced**
`hv_root_crash_init()` was first merged in **v6.19-rc1**. It is NOT
present in v6.18.y or any earlier stable trees.

This means:
- The code being fixed (`hv_root_crash_init()`) does not exist in any
  stable tree prior to 6.19.y
- The bug was introduced in v6.19-rc1 and this fix targets the same
  v6.19.y tree
- For stable trees 6.18.y and older, there is nothing to fix — the buggy
  code doesn't exist there

### Risk assessment

For 6.19.y stable: Very low risk. The change is a simple reordering of
an initialization call, only affects Hyper-V root partition (baremetal)
configurations, and the commit is authored by the same developer who
introduced the feature.

### Stable kernel criteria

- Obviously correct: Yes, the ordering dependency is clear
- Fixes a real bug: Yes, crash dump collection fails on root partitions
- Small and contained: Yes, 4-line change in 1 file
- No new features: Correct, just reorders existing initialization

### Verdict

This is a valid bugfix for v6.19.y stable. It fixes code that was
introduced in v6.19-rc1 and is only relevant to the 6.19.y stable tree.
For that tree, it should be backported.

### Verification

- **git log** confirmed `77c860d2dbb72` introduced
  `hv_root_crash_init()` on 2025-10-06
- **git tag --contains** confirmed `77c860d2dbb72` is in v6.19-rc1 and
  v6.19 but NOT in v6.18.13
- **git merge-base --is-ancestor** confirmed the prerequisite is NOT in
  v6.18.y stable
- **Read of hv_init.c:63-70** confirmed `hv_set_hypercall_pg()` sets up
  the static call trampoline needed for hypercalls to work
- **Read of hv_init.c:530-589** confirmed the ordering:
  `hv_root_crash_init()` was called at line 561, before
  `hv_set_hypercall_pg()` at line 568
- The fix commit `c3a6ae7ea2d3f` changes 1 file, 3 insertions, 1
  deletion — verified via `git show --stat`
- The Explore agent confirmed `hv_root_crash_init()` makes hypercalls
  (`HVCALL_GET_SYSTEM_PROPERTY`) that require the hypercall page to be
  set up

**YES**

 arch/x86/hyperv/hv_init.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 14de43f4bc6c1..7f3301bd081ec 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -558,7 +558,6 @@ void __init hyperv_init(void)
 		memunmap(src);
 
 		hv_remap_tsc_clocksource();
-		hv_root_crash_init();
 		hv_sleep_notifiers_register();
 	} else {
 		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
@@ -567,6 +566,9 @@ void __init hyperv_init(void)
 
 	hv_set_hypercall_pg(hv_hypercall_pg);
 
+	if (hv_root_partition())        /* after set hypercall pg */
+		hv_root_crash_init();
+
 skip_hypercall_pg_init:
 	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.19-6.18] mshv: clear eventfd counter on irqfd shutdown
       [not found] <20260223123738.1532940-1-sashal@kernel.org>
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] mshv: Ignore second stats page map result failure Sasha Levin
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] x86/hyperv: Move hv crash init after hypercall pg setup Sasha Levin
@ 2026-02-23 12:37 ` Sasha Levin
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-02-23 12:37 UTC (permalink / raw)
  To: patches, stable
  Cc: Carlos López, Wei Liu, Sasha Levin, kys, haiyangz, decui,
	longli, linux-hyperv, linux-kernel

From: Carlos López <clopez@suse.de>

[ Upstream commit 2b4246153e2184e3a3b4edc8cc35337d7a2455a6 ]

While unhooking from the irqfd waitqueue, clear the internal eventfd
counter by using eventfd_ctx_remove_wait_queue() instead of
remove_wait_queue(), preventing potential spurious interrupts. This
removes the need to store a pointer into the workqueue, as the eventfd
already keeps track of it.

This mimicks what other similar subsystems do on their equivalent paths
with their irqfds (KVM, Xen, ACRN support, etc).

Signed-off-by: Carlos López <clopez@suse.de>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a clear picture. Let me provide my analysis.

## Analysis

### 1. Commit Message Analysis

The commit replaces `remove_wait_queue()` with
`eventfd_ctx_remove_wait_queue()` in `mshv_irqfd_shutdown()`. The key
claim is that this "prevents potential spurious interrupts" by clearing
the eventfd counter atomically when unhooking from the waitqueue. The
commit also removes the now-unnecessary `irqfd_wqh` pointer from the
struct.

The phrase "potential spurious interrupts" uses the word "potential" —
suggesting this is a preventive/hardening fix rather than a response to
an observed bug.

### 2. Code Change Analysis

The change is small and well-defined:
- **`mshv_irqfd_shutdown()`**: `remove_wait_queue(irqfd->irqfd_wqh,
  &irqfd->irqfd_wait)` →
  `eventfd_ctx_remove_wait_queue(irqfd->irqfd_eventfd_ctx,
  &irqfd->irqfd_wait, &cnt)`. The new call atomically removes the waiter
  AND resets the eventfd counter to zero.
- **`mshv_irqfd_queue_proc()`**: Removes `irqfd->irqfd_wqh = wqh` since
  the field is no longer needed.
- **`struct mshv_irqfd`**: Removes the `irqfd_wqh` field.

Without clearing the counter, if an eventfd had been signaled before
shutdown completes, stale events could remain in the counter. This is a
real correctness concern, though labeled as "potential."

### 3. Pattern Match with KVM/Xen/ACRN/VFIO

All four analogous subsystems use `eventfd_ctx_remove_wait_queue()` in
their irqfd shutdown paths:
- `virt/kvm/eventfd.c:136`
- `drivers/xen/privcmd.c:906`
- `drivers/virt/acrn/irqfd.c:55`
- `drivers/vfio/virqfd.c:90`

The mshv code was the sole outlier using plain `remove_wait_queue()`.
This is a well-established pattern for correct irqfd teardown.

### 4. Driver Age and Stable Tree Applicability

The mshv driver was introduced in v6.15-rc1 (commit `621191d709b14`). It
would only exist in stable trees 6.15.y and newer (6.16.y, 6.17.y,
6.18.y). It does NOT exist in any LTS trees (6.12.y, 6.6.y, 6.1.y,
5.15.y).

### 5. Risk Assessment

- **Size**: Very small — changes 2 files, net removal of code (removes a
  struct field and an assignment)
- **Risk**: Very low — uses a well-understood API that's already used by
  KVM, Xen, ACRN, and VFIO
- **Scope**: Confined to mshv irqfd shutdown path only

### 6. Bug Severity

Without this fix, if the eventfd counter is non-zero during shutdown,
leftover events could trigger spurious interrupt injection into the
guest VM. In a hypervisor context, spurious interrupts can cause guest
OS instability. While this is described as "potential" (no specific bug
report), the failure mode is real and the fix is straightforward.

### 7. Stable Criteria Assessment

- **Obviously correct**: YES — follows the exact pattern used by KVM,
  Xen, ACRN, VFIO
- **Fixes a real bug**: Borderline — fixes a latent correctness issue
  rather than a reported crash
- **Small and contained**: YES — minimal change
- **No new features**: YES — this is purely a correctness fix
- **Tested**: It mirrors established code in KVM and other subsystems

### Verification

- `git log --follow --diff-filter=A -- drivers/hv/mshv_eventfd.c`
  confirmed the file was introduced in commit `621191d709b14` ("Drivers:
  hv: Introduce mshv_root module")
- `git describe --tags --contains 621191d709b14` confirmed this was
  introduced in v6.15-rc1
- Grep for `eventfd_ctx_remove_wait_queue` confirmed all four analogous
  subsystems (KVM, Xen, ACRN, VFIO) use this API in their shutdown paths
- Read of `drivers/hv/mshv_eventfd.c` confirmed the pre-patch code uses
  `remove_wait_queue()` with the stored `irqfd_wqh` pointer (line 255)
- Verified stable tree tags exist for 6.15.y through 6.18.y that would
  contain this driver
- Could NOT verify any specific user-reported bugs caused by the lack of
  counter clearing (unverified — commit only says "potential")

### Conclusion

This is a small, low-risk correctness fix that aligns mshv with the
well-established pattern used by KVM, Xen, ACRN, and VFIO for irqfd
shutdown. It prevents stale eventfd events from potentially causing
spurious interrupts in guest VMs. The fix is obviously correct, tiny in
scope, and carries essentially zero regression risk. While it addresses
a "potential" rather than actively reported issue, the fix is clearly
the right thing to do for stable users running Microsoft Hypervisor
workloads on 6.15+ kernels.

**YES**

 drivers/hv/mshv_eventfd.c | 5 ++---
 drivers/hv/mshv_eventfd.h | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 0b75ff1edb735..cb8b24b81cd5e 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -247,12 +247,13 @@ static void mshv_irqfd_shutdown(struct work_struct *work)
 {
 	struct mshv_irqfd *irqfd =
 			container_of(work, struct mshv_irqfd, irqfd_shutdown);
+	u64 cnt;
 
 	/*
 	 * Synchronize with the wait-queue and unhook ourselves to prevent
 	 * further events.
 	 */
-	remove_wait_queue(irqfd->irqfd_wqh, &irqfd->irqfd_wait);
+	eventfd_ctx_remove_wait_queue(irqfd->irqfd_eventfd_ctx, &irqfd->irqfd_wait, &cnt);
 
 	if (irqfd->irqfd_resampler) {
 		mshv_irqfd_resampler_shutdown(irqfd);
@@ -371,8 +372,6 @@ static void mshv_irqfd_queue_proc(struct file *file, wait_queue_head_t *wqh,
 	struct mshv_irqfd *irqfd =
 			container_of(polltbl, struct mshv_irqfd, irqfd_polltbl);
 
-	irqfd->irqfd_wqh = wqh;
-
 	/*
 	 * TODO: Ensure there isn't already an exclusive, priority waiter, e.g.
 	 * that the irqfd isn't already bound to another partition.  Only the
diff --git a/drivers/hv/mshv_eventfd.h b/drivers/hv/mshv_eventfd.h
index 332e7670a3442..464c6b81ab336 100644
--- a/drivers/hv/mshv_eventfd.h
+++ b/drivers/hv/mshv_eventfd.h
@@ -32,7 +32,6 @@ struct mshv_irqfd {
 	struct mshv_lapic_irq		     irqfd_lapic_irq;
 	struct hlist_node		     irqfd_hnode;
 	poll_table			     irqfd_polltbl;
-	wait_queue_head_t		    *irqfd_wqh;
 	wait_queue_entry_t		     irqfd_wait;
 	struct work_struct		     irqfd_shutdown;
 	struct mshv_irqfd_resampler	    *irqfd_resampler;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.19] Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
       [not found] <20260223123738.1532940-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2026-02-23 12:37 ` [PATCH AUTOSEL 6.19-6.18] mshv: clear eventfd counter on irqfd shutdown Sasha Levin
@ 2026-02-23 12:37 ` Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-02-23 12:37 UTC (permalink / raw)
  To: patches, stable
  Cc: Jan Kiszka, Florian Bezdeka, Michael Kelley, Wei Liu, Sasha Levin,
	kys, haiyangz, decui, longli, linux-hyperv, linux-kernel

From: Jan Kiszka <jan.kiszka@siemens.com>

[ Upstream commit f8e6343b7a89c7c649db5a9e309ba7aa20401813 ]

Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
with related guest support enabled:

[    1.127941] hv_vmbus: registering driver hyperv_drm

[    1.132518] =============================
[    1.132519] [ BUG: Invalid wait context ]
[    1.132521] 6.19.0-rc8+ #9 Not tainted
[    1.132524] -----------------------------
[    1.132525] swapper/0/0 is trying to lock:
[    1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
[    1.132543] other info that might help us debug this:
[    1.132544] context-{2:2}
[    1.132545] 1 lock held by swapper/0/0:
[    1.132547]  #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
[    1.132557] stack backtrace:
[    1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
[    1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
[    1.132567] Call Trace:
[    1.132570]  <IRQ>
[    1.132573]  dump_stack_lvl+0x6e/0xa0
[    1.132581]  __lock_acquire+0xee0/0x21b0
[    1.132592]  lock_acquire+0xd5/0x2d0
[    1.132598]  ? vmbus_chan_sched+0xc4/0x2b0
[    1.132606]  ? lock_acquire+0xd5/0x2d0
[    1.132613]  ? vmbus_chan_sched+0x31/0x2b0
[    1.132619]  rt_spin_lock+0x3f/0x1f0
[    1.132623]  ? vmbus_chan_sched+0xc4/0x2b0
[    1.132629]  ? vmbus_chan_sched+0x31/0x2b0
[    1.132634]  vmbus_chan_sched+0xc4/0x2b0
[    1.132641]  vmbus_isr+0x2c/0x150
[    1.132648]  __sysvec_hyperv_callback+0x5f/0xa0
[    1.132654]  sysvec_hyperv_callback+0x88/0xb0
[    1.132658]  </IRQ>
[    1.132659]  <TASK>
[    1.132660]  asm_sysvec_hyperv_callback+0x1a/0x20

As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT,
the vmbus_isr execution needs to be moved into thread context. Open-
coding this allows to skip the IPI that irq_work would additionally
bring and which we do not need, being an IRQ, never an NMI.

This affects both x86 and arm64, therefore hook into the common driver
logic.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

The bug exists in all stable trees that have PREEMPT_RT support.
However, the patch would need significant adaptation to apply to them
due to the 6.19-specific refactoring of `vmbus_isr` and
`vmbus_chan_sched`.

## Analysis

### What the commit fixes

This commit fixes a **sleeping-in-atomic-context bug** on PREEMPT_RT
kernels running on Hyper-V. The issue is that `vmbus_isr()` runs in hard
IRQ context (called from `sysvec_hyperv_callback` on x86) and acquires
`spin_lock(&channel->sched_lock)` via `vmbus_chan_sched()`. Under
PREEMPT_RT, spinlocks are converted to `rt_spin_lock` (sleeping locks),
which cannot be acquired from hard IRQ context. This triggers a lockdep
"BUG: Invalid wait context" and represents a real correctness issue (not
just a warning).

### Does it fix a real bug?

**Yes.** This is a legitimate bug that makes Hyper-V VMs with PREEMPT_RT
unusable or unstable. The lockdep trace is from real testing (6.19-rc8).
The issue affects all PREEMPT_RT Hyper-V guests.

### Stable kernel rule assessment

1. **Obviously correct and tested**: Yes - reviewed by Michael Kelley
   (Hyper-V maintainer) and Florian Bezdeka, tested by both.
2. **Fixes a real bug**: Yes - sleeping in hardirq context is a real bug
   on PREEMPT_RT.
3. **Important issue**: Moderate - affects PREEMPT_RT on Hyper-V, which
   is a meaningful but somewhat niche combination.
4. **Small and contained**: Borderline - ~80 lines in one file, but adds
   new per-CPU thread infrastructure.
5. **No new features**: The kthread is a mechanism to fix the bug, not a
   feature.

### Risk vs benefit

- **Benefit**: Fixes a real bug that makes PREEMPT_RT on Hyper-V broken.
- **Risk**: Low for non-RT kernels (everything is behind
  `IS_ENABLED(CONFIG_PREEMPT_RT)`, which is compile-time). Moderate for
  RT kernels (new kthread infrastructure, though using well-established
  `smpboot` API).

### Backport concerns

**Critical issue: Dependencies.** This patch was written against the
6.19 codebase which has undergone significant refactoring:
- `vmbus_isr()` changed from `static` to exported
  (`EXPORT_SYMBOL_FOR_MODULES`) in 6.19 via commit `cffe9f58de1eb`
- `vmbus_chan_sched()` signature changed from `vmbus_chan_sched(hv_cpu)`
  to `vmbus_chan_sched(event_page_addr)` in 6.19 via commit
  `163224c189e8b`
- The `vmbus_message_sched()` helper was factored out in 6.19

The patch **will not apply cleanly** to any existing stable tree
(6.12.y, 6.6.y, 6.1.y). A manual backport would be needed, adapting the
fix to the older `vmbus_isr` structure. While the core concept (use
kthread for RT) would work, the adaptation is non-trivial.

### Verification

- **git show v6.12, v6.6, v6.1 kernel/Kconfig.preempt**: Confirmed
  PREEMPT_RT config option exists in all these stable trees
- **git show v6.1 include/linux/smpboot.h**: Confirmed
  `smpboot_register_percpu_thread` API available since at least 6.1
- **git show v6.18 drivers/hv/vmbus_drv.c**: Confirmed `vmbus_isr` is
  `static` in v6.18, only exported in 6.19
- **git show v6.12, v6.6 drivers/hv/vmbus_drv.c**: Confirmed
  `vmbus_chan_sched()` takes `hv_cpu` (not `event_page_addr`) in older
  kernels
- **git log v6.18..v6.19 drivers/hv/vmbus_drv.c**: Identified
  prerequisite commits (163224c189e8b, cffe9f58de1eb) that refactored
  the code
- **Read arch/x86/kernel/cpu/mshyperv.c lines 153-168**: Confirmed
  `vmbus_handler()` is called from `sysvec_hyperv_callback` IDTENTRY
  (hard IRQ context)
- **Read drivers/hv/vmbus_drv.c lines 1305**: Confirmed
  `spin_lock(&channel->sched_lock)` is the sleeping lock in the IRQ path
- **git show v6.12, v6.6**: Confirmed the `sched_lock` spin_lock exists
  in stable trees' vmbus_chan_sched, confirming the bug exists there too

### Decision

This is a legitimate bug fix for PREEMPT_RT on Hyper-V. The bug is real
and affects all stable trees with PREEMPT_RT. However, the patch:

1. Is moderate in size, adding new per-CPU thread infrastructure (~80
   lines)
2. Has significant dependencies on 6.19-specific refactoring and won't
   apply cleanly to any stable tree
3. Targets a somewhat niche combination (PREEMPT_RT + Hyper-V)
4. Would require careful manual adaptation for each stable tree

The fix is well-reviewed and technically sound, and it fixes a real bug.
Despite the backport complexity, the bug is severe enough (sleeping-in-
atomic-context breaks PREEMPT_RT on Hyper-V) that it warrants
backporting with appropriate adaptation. The
`IS_ENABLED(CONFIG_PREEMPT_RT)` guard ensures zero risk to non-RT users.

**YES**

 drivers/hv/vmbus_drv.c | 66 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index a53af6fe81a65..1d5cba142828e 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -25,6 +25,7 @@
 #include <linux/cpu.h>
 #include <linux/sched/isolation.h>
 #include <linux/sched/task_stack.h>
+#include <linux/smpboot.h>
 
 #include <linux/delay.h>
 #include <linux/panic_notifier.h>
@@ -1350,7 +1351,7 @@ static void vmbus_message_sched(struct hv_per_cpu_context *hv_cpu, void *message
 	}
 }
 
-void vmbus_isr(void)
+static void __vmbus_isr(void)
 {
 	struct hv_per_cpu_context *hv_cpu
 		= this_cpu_ptr(hv_context.cpu_context);
@@ -1363,6 +1364,53 @@ void vmbus_isr(void)
 
 	add_interrupt_randomness(vmbus_interrupt);
 }
+
+static DEFINE_PER_CPU(bool, vmbus_irq_pending);
+static DEFINE_PER_CPU(struct task_struct *, vmbus_irqd);
+
+static void vmbus_irqd_wake(void)
+{
+	struct task_struct *tsk = __this_cpu_read(vmbus_irqd);
+
+	__this_cpu_write(vmbus_irq_pending, true);
+	wake_up_process(tsk);
+}
+
+static void vmbus_irqd_setup(unsigned int cpu)
+{
+	sched_set_fifo(current);
+}
+
+static int vmbus_irqd_should_run(unsigned int cpu)
+{
+	return __this_cpu_read(vmbus_irq_pending);
+}
+
+static void run_vmbus_irqd(unsigned int cpu)
+{
+	__this_cpu_write(vmbus_irq_pending, false);
+	__vmbus_isr();
+}
+
+static bool vmbus_irq_initialized;
+
+static struct smp_hotplug_thread vmbus_irq_threads = {
+	.store                  = &vmbus_irqd,
+	.setup			= vmbus_irqd_setup,
+	.thread_should_run      = vmbus_irqd_should_run,
+	.thread_fn              = run_vmbus_irqd,
+	.thread_comm            = "vmbus_irq/%u",
+};
+
+void vmbus_isr(void)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		vmbus_irqd_wake();
+	} else {
+		lockdep_hardirq_threaded();
+		__vmbus_isr();
+	}
+}
 EXPORT_SYMBOL_FOR_MODULES(vmbus_isr, "mshv_vtl");
 
 static irqreturn_t vmbus_percpu_isr(int irq, void *dev_id)
@@ -1462,6 +1510,13 @@ static int vmbus_bus_init(void)
 	 * the VMbus interrupt handler.
 	 */
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !vmbus_irq_initialized) {
+		ret = smpboot_register_percpu_thread(&vmbus_irq_threads);
+		if (ret)
+			goto err_kthread;
+		vmbus_irq_initialized = true;
+	}
+
 	if (vmbus_irq == -1) {
 		hv_setup_vmbus_handler(vmbus_isr);
 	} else {
@@ -1507,6 +1562,11 @@ static int vmbus_bus_init(void)
 		free_percpu(vmbus_evt);
 	}
 err_setup:
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && vmbus_irq_initialized) {
+		smpboot_unregister_percpu_thread(&vmbus_irq_threads);
+		vmbus_irq_initialized = false;
+	}
+err_kthread:
 	bus_unregister(&hv_bus);
 	return ret;
 }
@@ -2976,6 +3036,10 @@ static void __exit vmbus_exit(void)
 		free_percpu_irq(vmbus_irq, vmbus_evt);
 		free_percpu(vmbus_evt);
 	}
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && vmbus_irq_initialized) {
+		smpboot_unregister_percpu_thread(&vmbus_irq_threads);
+		vmbus_irq_initialized = false;
+	}
 	for_each_online_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-02-23 12:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260223123738.1532940-1-sashal@kernel.org>
2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] mshv: Ignore second stats page map result failure Sasha Levin
2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] x86/hyperv: Move hv crash init after hypercall pg setup Sasha Levin
2026-02-23 12:37 ` [PATCH AUTOSEL 6.19-6.18] mshv: clear eventfd counter on irqfd shutdown Sasha Levin
2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox