* [PATCH] PCI: hv: add hard timeout to wait_for_response()
@ 2026-06-12 17:40 Hamza Mahfooz
2026-06-12 17:52 ` sashiko-bot
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Hamza Mahfooz @ 2026-06-12 17:40 UTC (permalink / raw)
To: linux-hyperv
Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, linux-pci,
Hamza Mahfooz, stable
It is possible that we never receive a rescind event, in which case we
will wait indefinitely for a device that will never show up. So, assume
a device is gone if have been polling for more than 5 seconds.
Cc: stable@vger.kernel.org
Fixes: c3635da2a336 ("PCI: hv: Do not wait forever on a device that has disappeared")
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
---
drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index cfc8fa403dad..bd63efc4a210 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -52,6 +52,7 @@
#include <linux/acpi.h>
#include <linux/sizes.h>
#include <linux/of_irq.h>
+#include <linux/jiffies.h>
#include <asm/mshyperv.h>
/*
@@ -1038,6 +1039,8 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
kfree(hpdev);
}
+#define TIMEOUT_MS 5000
+
/*
* There is no good way to get notified from vmbus_onoffer_rescind(),
* so let's use polling here, since this is not a hot path.
@@ -1045,8 +1048,13 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
static int wait_for_response(struct hv_device *hdev,
struct completion *comp)
{
+ unsigned long timeout = get_jiffies_64() + msecs_to_jiffies(TIMEOUT_MS);
+ unsigned long now;
+
while (true) {
- if (hdev->channel->rescind) {
+ now = get_jiffies_64();
+ if (hdev->channel->rescind ||
+ time_after(now, timeout)) {
dev_warn_once(&hdev->device, "The device is gone.\n");
return -ENODEV;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] PCI: hv: add hard timeout to wait_for_response()
2026-06-12 17:40 [PATCH] PCI: hv: add hard timeout to wait_for_response() Hamza Mahfooz
@ 2026-06-12 17:52 ` sashiko-bot
2026-06-12 18:04 ` Long Li
2026-06-12 19:27 ` Easwar Hariharan
2 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-12 17:52 UTC (permalink / raw)
To: Hamza Mahfooz; +Cc: linux-pci, linux-hyperv
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Use-After-Free and Stack Corruption due to abandoned VMBus requests on timeout.
- [Medium] Race condition in wait_for_response() polling loop can falsely trigger timeout.
--
commit 18b7ab3e09172b3f0651ad176d0ab6367f1e43bb
Author: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
PCI: hv: add hard timeout to wait_for_response()
This commit adds a hard 5-second timeout to wait_for_response() in the
hyper-V PCI controller driver. It addresses an issue where the driver could
wait indefinitely for a device to appear if a rescind event is never
received.
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index cfc8fa403dad6..bd63efc4a2103 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
[ ... ]
> @@ -1045,8 +1048,13 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
> static int wait_for_response(struct hv_device *hdev,
> struct completion *comp)
> {
> + unsigned long timeout = get_jiffies_64() + msecs_to_jiffies(TIMEOUT_MS);
> + unsigned long now;
> +
> while (true) {
> - if (hdev->channel->rescind) {
> + now = get_jiffies_64();
> + if (hdev->channel->rescind ||
> + time_after(now, timeout)) {
[Severity: Medium]
Is there a race condition if the thread is preempted right before updating
the time?
If the thread is preempted just before calling get_jiffies_64() and the
delay crosses the 5-second boundary, this will evaluate to true and
immediately return -ENODEV.
Since there is no final check of the completion status after the timeout
check, could this falsely assume a timeout even if the Hyper-V host
successfully replied during the preemption window?
> dev_warn_once(&hdev->device, "The device is gone.\n");
> return -ENODEV;
[Severity: Critical]
By adding this timeout, do we risk a use-after-free and stack corruption if
the VMBus request is never explicitly cancelled?
When we return -ENODEV early here, callers like hv_send_resources_allocated()
will free their heap-allocated request packets and return, which pops the
local completion structure off the stack.
However, since the VMBus channel remains active and the pending request is
not removed from the requestor array, what happens if the Hyper-V host
eventually replies?
Wouldn't hv_pci_onchannelcallback() fetch the dangling packet pointer and
invoke comp_packet->completion_func(...) on freed memory?
It seems this could also corrupt the stack when writing to the destroyed
completion structure. Additionally, in hv_pci_query_relations(), returning
on timeout leaves hbus->survey_event pointing to a local stack variable
that will later be completed by survey_child_resources().
Does this also leak a slot in the VMBus requestor array for each timed-out
request?
> }
>
> if (wait_for_completion_timeout(comp, HZ / 10))
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612174010.2598695-1-hamzamahfooz@linux.microsoft.com?part=1
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [PATCH] PCI: hv: add hard timeout to wait_for_response()
2026-06-12 17:40 [PATCH] PCI: hv: add hard timeout to wait_for_response() Hamza Mahfooz
2026-06-12 17:52 ` sashiko-bot
@ 2026-06-12 18:04 ` Long Li
2026-06-12 18:44 ` Hamza Mahfooz
2026-06-12 19:27 ` Easwar Hariharan
2 siblings, 1 reply; 5+ messages in thread
From: Long Li @ 2026-06-12 18:04 UTC (permalink / raw)
To: Hamza Mahfooz, linux-hyperv@vger.kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
linux-pci@vger.kernel.org, stable@vger.kernel.org
> Subject: [PATCH] PCI: hv: add hard timeout to wait_for_response()
>
> It is possible that we never receive a rescind event, in which case we will wait
> indefinitely for a device that will never show up. So, assume a device is gone if
> have been polling for more than 5 seconds.
Can you explain in what situation we never receive a rescind event? If this for dealing a device unload when the vmbus is dead? Please provide more context. A kernel trace is helpful.
Does this patch handle the case where the rescind event comes in right after the timeout?
>
> Cc: stable@vger.kernel.org
> Fixes: c3635da2a336 ("PCI: hv: Do not wait forever on a device that has
> disappeared")
> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
> ---
> drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-
> hyperv.c
> index cfc8fa403dad..bd63efc4a210 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -52,6 +52,7 @@
> #include <linux/acpi.h>
> #include <linux/sizes.h>
> #include <linux/of_irq.h>
> +#include <linux/jiffies.h>
> #include <asm/mshyperv.h>
>
> /*
> @@ -1038,6 +1039,8 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
> kfree(hpdev);
> }
>
> +#define TIMEOUT_MS 5000
> +
> /*
> * There is no good way to get notified from vmbus_onoffer_rescind(),
> * so let's use polling here, since this is not a hot path.
> @@ -1045,8 +1048,13 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
> static int wait_for_response(struct hv_device *hdev,
> struct completion *comp)
> {
> + unsigned long timeout = get_jiffies_64() +
> msecs_to_jiffies(TIMEOUT_MS);
> + unsigned long now;
> +
> while (true) {
> - if (hdev->channel->rescind) {
> + now = get_jiffies_64();
> + if (hdev->channel->rescind ||
> + time_after(now, timeout)) {
What if the VMBUS response comes in right after this check? The completion is allocated on the caller stack, and it will cause kernel OOP.
How do you test this patch?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PCI: hv: add hard timeout to wait_for_response()
2026-06-12 18:04 ` Long Li
@ 2026-06-12 18:44 ` Hamza Mahfooz
0 siblings, 0 replies; 5+ messages in thread
From: Hamza Mahfooz @ 2026-06-12 18:44 UTC (permalink / raw)
To: Long Li
Cc: linux-hyperv@vger.kernel.org, KY Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
linux-pci@vger.kernel.org, stable@vger.kernel.org
On Fri, Jun 12, 2026 at 06:04:22PM +0000, Long Li wrote:
> What if the VMBUS response comes in right after this check? The completion is allocated on the caller stack, and it will cause kernel OOP.
That is a fair point, I'll try to see if there is a better way to handle this error case.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PCI: hv: add hard timeout to wait_for_response()
2026-06-12 17:40 [PATCH] PCI: hv: add hard timeout to wait_for_response() Hamza Mahfooz
2026-06-12 17:52 ` sashiko-bot
2026-06-12 18:04 ` Long Li
@ 2026-06-12 19:27 ` Easwar Hariharan
2 siblings, 0 replies; 5+ messages in thread
From: Easwar Hariharan @ 2026-06-12 19:27 UTC (permalink / raw)
To: Hamza Mahfooz
Cc: linux-hyperv, easwar.hariharan, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Long Li, Lorenzo Pieralisi,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Bjorn Helgaas, linux-pci, stable
On 6/12/2026 10:40, Hamza Mahfooz wrote:
> It is possible that we never receive a rescind event, in which case we
> will wait indefinitely for a device that will never show up. So, assume
> a device is gone if have been polling for more than 5 seconds.
>
Echo Long's request for more context on where this was seen.
> Cc: stable@vger.kernel.org
> Fixes: c3635da2a336 ("PCI: hv: Do not wait forever on a device that has disappeared")
> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
> ---
> drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index cfc8fa403dad..bd63efc4a210 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -52,6 +52,7 @@
> #include <linux/acpi.h>
> #include <linux/sizes.h>
> #include <linux/of_irq.h>
> +#include <linux/jiffies.h>
> #include <asm/mshyperv.h>
>
> /*
> @@ -1038,6 +1039,8 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
> kfree(hpdev);
> }
>
> +#define TIMEOUT_MS 5000
> +
This can be in seconds, see below.
> /*
> * There is no good way to get notified from vmbus_onoffer_rescind(),
> * so let's use polling here, since this is not a hot path.
> @@ -1045,8 +1048,13 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
> static int wait_for_response(struct hv_device *hdev,
> struct completion *comp)
> {
> + unsigned long timeout = get_jiffies_64() + msecs_to_jiffies(TIMEOUT_MS);
You can use secs_to_jiffies() here instead.
> + unsigned long now;
> +
> while (true) {
> - if (hdev->channel->rescind) {
> + now = get_jiffies_64();
> + if (hdev->channel->rescind ||
> + time_after(now, timeout)) {
> dev_warn_once(&hdev->device, "The device is gone.\n");
> return -ENODEV;
> }
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-12 19:27 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12 17:40 [PATCH] PCI: hv: add hard timeout to wait_for_response() Hamza Mahfooz
2026-06-12 17:52 ` sashiko-bot
2026-06-12 18:04 ` Long Li
2026-06-12 18:44 ` Hamza Mahfooz
2026-06-12 19:27 ` Easwar Hariharan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox