public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xen/xenbus: better handle backend crash
@ 2025-11-02  3:20 Marek Marczykowski-Górecki
  2025-11-13 18:43 ` Marek Marczykowski-Górecki
  2025-11-17 11:06 ` Jürgen Groß
  0 siblings, 2 replies; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-11-02  3:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Marek Marczykowski-Górecki, Juergen Gross,
	Stefano Stabellini, Oleksandr Tyshchenko, Peng Jiang, Qiu-ji Chen,
	Jason Andryuk, moderated list:XEN HYPERVISOR INTERFACE

When the backend domain crashes, coordinated device cleanup is not
possible (as it involves waiting for the backend state change). In that
case, toolstack forcefully removes frontend xenstore entries.
xenbus_dev_changed() handles this case, and triggers device cleanup.
It's possible that toolstack manages to connect new device in that
place, before xenbus_dev_changed() notices the old one is missing. If
that happens, new one won't be probed and will forever remain in
XenbusStateInitialising.

Fix this by checking backend-id and if it changes, consider it
unplug+plug operation. It's important that cleanup on such unplug
doesn't modify xenstore entries (especially the "state" key) as it
belong to the new device to be probed - changing it would derail
establishing connection to the new backend (most likely, closing the
device before it was even connected). Handle this case by setting new
xenbus_device->vanished flag to true, and check it before changing state
entry.

And even if xenbus_dev_changed() correctly detects the device was
forcefully removed, the cleanup handling is still racy. Since this whole
handling doesn't happend in a single xenstore transaction, it's possible
that toolstack might put a new device there already. Avoid re-creating
the state key (which in the case of loosing the race would actually
close newly attached device).

The problem does not apply to frontend domain crash, as this case
involves coordinated cleanup.

Problem originally reported at
https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
including reproduction steps.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
I considered re-using one of existing fields instead of a new
xenbus_device->vanished, but I wasn't sure if that would work better.
Setting xenbus_device->nodename to NULL would prevent few other places
using it (including some log messages). Setting xenbus_device->otherend
might have less unintentional impact, but logically it doesn't feel
correct.

With this patch applied, I cannot reproduce the issue anymore - neither
with the simplified reproducer script, nor with the full test suite.
---
 drivers/xen/xenbus/xenbus_client.c |  2 ++
 drivers/xen/xenbus/xenbus_probe.c  | 25 +++++++++++++++++++++++++
 include/xen/xenbus.h               |  1 +
 3 files changed, 28 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index e73ec225d4a61..ce2f49d9aa4ad 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -275,6 +275,8 @@ __xenbus_switch_state(struct xenbus_device *dev,
  */
 int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state state)
 {
+	if (dev->vanished)
+		return 0;
 	return __xenbus_switch_state(dev, state, 0);
 }
 
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 86fe6e7790566..3c3e56b544976 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -444,6 +444,9 @@ static void xenbus_cleanup_devices(const char *path, struct bus_type *bus)
 		info.dev = NULL;
 		bus_for_each_dev(bus, NULL, &info, cleanup_dev);
 		if (info.dev) {
+			dev_warn(&info.dev->dev,
+			         "device forcefully removed from xenstore\n");
+			info.dev->vanished = true;
 			device_unregister(&info.dev->dev);
 			put_device(&info.dev->dev);
 		}
@@ -659,6 +662,28 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
 		return;
 
 	dev = xenbus_device_find(root, &bus->bus);
+	/* Backend domain crash results in not coordinated frontend removal,
+	 * without going through XenbusStateClosing. Check if the device
+	 * wasn't replaced to point at another backend in the meantime.
+	 */
+	if (dev && !strncmp(node, "device/", sizeof("device/")-1)) {
+		int backend_id;
+		int err = xenbus_gather(XBT_NIL, root,
+				        "backend-id", "%i", &backend_id,
+					NULL);
+		if (!err && backend_id != dev->otherend_id) {
+			/* It isn't the same device, assume the old one
+			 * vanished and new one needs to be probed.
+			 */
+			dev_warn(&dev->dev,
+				 "backend-id mismatch (%d != %d), reconnecting\n",
+				 backend_id, dev->otherend_id);
+			dev->vanished = true;
+			device_unregister(&dev->dev);
+			put_device(&dev->dev);
+			dev = NULL;
+		}
+	}
 	if (!dev)
 		xenbus_probe_node(bus, type, root);
 	else
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 7dab04cf4a36c..43a5335f1d5a3 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -87,6 +87,7 @@ struct xenbus_device {
 	struct completion down;
 	struct work_struct work;
 	struct semaphore reclaim_sem;
+	bool vanished;
 
 	/* Event channel based statistics and settings. */
 	atomic_t event_channels;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2025-11-02  3:20 [PATCH] xen/xenbus: better handle backend crash Marek Marczykowski-Górecki
@ 2025-11-13 18:43 ` Marek Marczykowski-Górecki
  2025-11-17 11:06 ` Jürgen Groß
  1 sibling, 0 replies; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-11-13 18:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Peng Jiang, Qiu-ji Chen, Jason Andryuk,
	moderated list:XEN HYPERVISOR INTERFACE

[-- Attachment #1: Type: text/plain, Size: 5284 bytes --]

Ping?

On Sun, Nov 02, 2025 at 04:20:12AM +0100, Marek Marczykowski-Górecki wrote:
> When the backend domain crashes, coordinated device cleanup is not
> possible (as it involves waiting for the backend state change). In that
> case, toolstack forcefully removes frontend xenstore entries.
> xenbus_dev_changed() handles this case, and triggers device cleanup.
> It's possible that toolstack manages to connect new device in that
> place, before xenbus_dev_changed() notices the old one is missing. If
> that happens, new one won't be probed and will forever remain in
> XenbusStateInitialising.
> 
> Fix this by checking backend-id and if it changes, consider it
> unplug+plug operation. It's important that cleanup on such unplug
> doesn't modify xenstore entries (especially the "state" key) as it
> belong to the new device to be probed - changing it would derail
> establishing connection to the new backend (most likely, closing the
> device before it was even connected). Handle this case by setting new
> xenbus_device->vanished flag to true, and check it before changing state
> entry.
> 
> And even if xenbus_dev_changed() correctly detects the device was
> forcefully removed, the cleanup handling is still racy. Since this whole
> handling doesn't happend in a single xenstore transaction, it's possible
> that toolstack might put a new device there already. Avoid re-creating
> the state key (which in the case of loosing the race would actually
> close newly attached device).
> 
> The problem does not apply to frontend domain crash, as this case
> involves coordinated cleanup.
> 
> Problem originally reported at
> https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
> including reproduction steps.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> I considered re-using one of existing fields instead of a new
> xenbus_device->vanished, but I wasn't sure if that would work better.
> Setting xenbus_device->nodename to NULL would prevent few other places
> using it (including some log messages). Setting xenbus_device->otherend
> might have less unintentional impact, but logically it doesn't feel
> correct.
> 
> With this patch applied, I cannot reproduce the issue anymore - neither
> with the simplified reproducer script, nor with the full test suite.
> ---
>  drivers/xen/xenbus/xenbus_client.c |  2 ++
>  drivers/xen/xenbus/xenbus_probe.c  | 25 +++++++++++++++++++++++++
>  include/xen/xenbus.h               |  1 +
>  3 files changed, 28 insertions(+)
> 
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index e73ec225d4a61..ce2f49d9aa4ad 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -275,6 +275,8 @@ __xenbus_switch_state(struct xenbus_device *dev,
>   */
>  int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state state)
>  {
> +	if (dev->vanished)
> +		return 0;
>  	return __xenbus_switch_state(dev, state, 0);
>  }
>  
> diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
> index 86fe6e7790566..3c3e56b544976 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -444,6 +444,9 @@ static void xenbus_cleanup_devices(const char *path, struct bus_type *bus)
>  		info.dev = NULL;
>  		bus_for_each_dev(bus, NULL, &info, cleanup_dev);
>  		if (info.dev) {
> +			dev_warn(&info.dev->dev,
> +			         "device forcefully removed from xenstore\n");
> +			info.dev->vanished = true;
>  			device_unregister(&info.dev->dev);
>  			put_device(&info.dev->dev);
>  		}
> @@ -659,6 +662,28 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
>  		return;
>  
>  	dev = xenbus_device_find(root, &bus->bus);
> +	/* Backend domain crash results in not coordinated frontend removal,
> +	 * without going through XenbusStateClosing. Check if the device
> +	 * wasn't replaced to point at another backend in the meantime.
> +	 */
> +	if (dev && !strncmp(node, "device/", sizeof("device/")-1)) {
> +		int backend_id;
> +		int err = xenbus_gather(XBT_NIL, root,
> +				        "backend-id", "%i", &backend_id,
> +					NULL);
> +		if (!err && backend_id != dev->otherend_id) {
> +			/* It isn't the same device, assume the old one
> +			 * vanished and new one needs to be probed.
> +			 */
> +			dev_warn(&dev->dev,
> +				 "backend-id mismatch (%d != %d), reconnecting\n",
> +				 backend_id, dev->otherend_id);
> +			dev->vanished = true;
> +			device_unregister(&dev->dev);
> +			put_device(&dev->dev);
> +			dev = NULL;
> +		}
> +	}
>  	if (!dev)
>  		xenbus_probe_node(bus, type, root);
>  	else
> diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> index 7dab04cf4a36c..43a5335f1d5a3 100644
> --- a/include/xen/xenbus.h
> +++ b/include/xen/xenbus.h
> @@ -87,6 +87,7 @@ struct xenbus_device {
>  	struct completion down;
>  	struct work_struct work;
>  	struct semaphore reclaim_sem;
> +	bool vanished;
>  
>  	/* Event channel based statistics and settings. */
>  	atomic_t event_channels;
> -- 
> 2.51.0
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2025-11-02  3:20 [PATCH] xen/xenbus: better handle backend crash Marek Marczykowski-Górecki
  2025-11-13 18:43 ` Marek Marczykowski-Górecki
@ 2025-11-17 11:06 ` Jürgen Groß
  2026-01-26  7:08   ` Jürgen Groß
  1 sibling, 1 reply; 9+ messages in thread
From: Jürgen Groß @ 2025-11-17 11:06 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Peng Jiang, Qiu-ji Chen,
	Jason Andryuk, moderated list:XEN HYPERVISOR INTERFACE


[-- Attachment #1.1.1: Type: text/plain, Size: 2870 bytes --]

On 02.11.25 04:20, Marek Marczykowski-Górecki wrote:
> When the backend domain crashes, coordinated device cleanup is not
> possible (as it involves waiting for the backend state change). In that
> case, toolstack forcefully removes frontend xenstore entries.
> xenbus_dev_changed() handles this case, and triggers device cleanup.
> It's possible that toolstack manages to connect new device in that
> place, before xenbus_dev_changed() notices the old one is missing. If
> that happens, new one won't be probed and will forever remain in
> XenbusStateInitialising.
> 
> Fix this by checking backend-id and if it changes, consider it
> unplug+plug operation. It's important that cleanup on such unplug
> doesn't modify xenstore entries (especially the "state" key) as it
> belong to the new device to be probed - changing it would derail
> establishing connection to the new backend (most likely, closing the
> device before it was even connected). Handle this case by setting new
> xenbus_device->vanished flag to true, and check it before changing state
> entry.
> 
> And even if xenbus_dev_changed() correctly detects the device was
> forcefully removed, the cleanup handling is still racy. Since this whole
> handling doesn't happend in a single xenstore transaction, it's possible
> that toolstack might put a new device there already. Avoid re-creating
> the state key (which in the case of loosing the race would actually
> close newly attached device).
> 
> The problem does not apply to frontend domain crash, as this case
> involves coordinated cleanup.
> 
> Problem originally reported at
> https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
> including reproduction steps.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Sorry I didn't get earlier to this.

My main problem with this patch is that it is basically just papering over
a more general problem.

You are just making the problem much more improbable, but not impossible to
occur again. In case the new driver domain has the same domid as the old one
you can still have the same race.

The clean way to handle that would be to add a unique Id in Xenstore to each
device on the backend side, which can be tested on the frontend side to
match. In case it doesn't match, an old device with the same kind and devid
can be cleaned up.

The unique Id would obviously need to be set by the Xen tools inside the
transaction writing the initial backend Xenstore nodes, as doing that from
the backend would add another potential ambiguity by the driver domain
choosing the same unique id as the previous one did.

The question is whether something like your patch should be used as a
fallback in case there is no unique Id on the backend side of the device
due to a too old Xen version.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2025-11-17 11:06 ` Jürgen Groß
@ 2026-01-26  7:08   ` Jürgen Groß
  2026-01-29  7:02     ` Jürgen Groß
  0 siblings, 1 reply; 9+ messages in thread
From: Jürgen Groß @ 2026-01-26  7:08 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Peng Jiang, Qiu-ji Chen,
	Jason Andryuk, moderated list:XEN HYPERVISOR INTERFACE


[-- Attachment #1.1.1: Type: text/plain, Size: 4007 bytes --]

On 17.11.25 12:06, Jürgen Groß wrote:
> On 02.11.25 04:20, Marek Marczykowski-Górecki wrote:
>> When the backend domain crashes, coordinated device cleanup is not
>> possible (as it involves waiting for the backend state change). In that
>> case, toolstack forcefully removes frontend xenstore entries.
>> xenbus_dev_changed() handles this case, and triggers device cleanup.
>> It's possible that toolstack manages to connect new device in that
>> place, before xenbus_dev_changed() notices the old one is missing. If
>> that happens, new one won't be probed and will forever remain in
>> XenbusStateInitialising.
>>
>> Fix this by checking backend-id and if it changes, consider it
>> unplug+plug operation. It's important that cleanup on such unplug
>> doesn't modify xenstore entries (especially the "state" key) as it
>> belong to the new device to be probed - changing it would derail
>> establishing connection to the new backend (most likely, closing the
>> device before it was even connected). Handle this case by setting new
>> xenbus_device->vanished flag to true, and check it before changing state
>> entry.
>>
>> And even if xenbus_dev_changed() correctly detects the device was
>> forcefully removed, the cleanup handling is still racy. Since this whole
>> handling doesn't happend in a single xenstore transaction, it's possible
>> that toolstack might put a new device there already. Avoid re-creating
>> the state key (which in the case of loosing the race would actually
>> close newly attached device).
>>
>> The problem does not apply to frontend domain crash, as this case
>> involves coordinated cleanup.
>>
>> Problem originally reported at
>> https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
>> including reproduction steps.
>>
>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> 
> Sorry I didn't get earlier to this.
> 
> My main problem with this patch is that it is basically just papering over
> a more general problem.
> 
> You are just making the problem much more improbable, but not impossible to
> occur again. In case the new driver domain has the same domid as the old one
> you can still have the same race.
> 
> The clean way to handle that would be to add a unique Id in Xenstore to each
> device on the backend side, which can be tested on the frontend side to
> match. In case it doesn't match, an old device with the same kind and devid
> can be cleaned up.
> 
> The unique Id would obviously need to be set by the Xen tools inside the
> transaction writing the initial backend Xenstore nodes, as doing that from
> the backend would add another potential ambiguity by the driver domain
> choosing the same unique id as the previous one did.
> 
> The question is whether something like your patch should be used as a
> fallback in case there is no unique Id on the backend side of the device
> due to a too old Xen version.

I think I have found a solution which is much more simple, as it doesn't
need any change of the protocol or any addition of new identifiers.

When creating a new PV device, Xen tools will always write all generic
frontend- and backend-nodes. This includes the frontend state, which is
initialized as XenbusStateInitialising.

The Linux kernel's xenbus driver is already storing the last known state
of a xenbus device in struct xenbus_device. When changing the state, the
xenbus driver is even reading the state from Xenstore (even if only for
making sure the path is still existing). So all what is needed is to check,
whether the read current state is matching the locally saved state. If it
is not matching AND the read state is XenbusStateInitialising, you can be
sure that the backend has been replaced.

Handling this will need to check the return value of xenbus_switch_state()
in all related drivers, but this is just a more or less mechanical change.

I'll prepare a patch series for that.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2026-01-26  7:08   ` Jürgen Groß
@ 2026-01-29  7:02     ` Jürgen Groß
  2026-02-06 16:57       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 9+ messages in thread
From: Jürgen Groß @ 2026-01-29  7:02 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Peng Jiang, Qiu-ji Chen,
	Jason Andryuk, moderated list:XEN HYPERVISOR INTERFACE


[-- Attachment #1.1.1: Type: text/plain, Size: 4562 bytes --]

On 26.01.26 08:08, Jürgen Groß wrote:
> On 17.11.25 12:06, Jürgen Groß wrote:
>> On 02.11.25 04:20, Marek Marczykowski-Górecki wrote:
>>> When the backend domain crashes, coordinated device cleanup is not
>>> possible (as it involves waiting for the backend state change). In that
>>> case, toolstack forcefully removes frontend xenstore entries.
>>> xenbus_dev_changed() handles this case, and triggers device cleanup.
>>> It's possible that toolstack manages to connect new device in that
>>> place, before xenbus_dev_changed() notices the old one is missing. If
>>> that happens, new one won't be probed and will forever remain in
>>> XenbusStateInitialising.
>>>
>>> Fix this by checking backend-id and if it changes, consider it
>>> unplug+plug operation. It's important that cleanup on such unplug
>>> doesn't modify xenstore entries (especially the "state" key) as it
>>> belong to the new device to be probed - changing it would derail
>>> establishing connection to the new backend (most likely, closing the
>>> device before it was even connected). Handle this case by setting new
>>> xenbus_device->vanished flag to true, and check it before changing state
>>> entry.
>>>
>>> And even if xenbus_dev_changed() correctly detects the device was
>>> forcefully removed, the cleanup handling is still racy. Since this whole
>>> handling doesn't happend in a single xenstore transaction, it's possible
>>> that toolstack might put a new device there already. Avoid re-creating
>>> the state key (which in the case of loosing the race would actually
>>> close newly attached device).
>>>
>>> The problem does not apply to frontend domain crash, as this case
>>> involves coordinated cleanup.
>>>
>>> Problem originally reported at
>>> https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
>>> including reproduction steps.
>>>
>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>
>> Sorry I didn't get earlier to this.
>>
>> My main problem with this patch is that it is basically just papering over
>> a more general problem.
>>
>> You are just making the problem much more improbable, but not impossible to
>> occur again. In case the new driver domain has the same domid as the old one
>> you can still have the same race.
>>
>> The clean way to handle that would be to add a unique Id in Xenstore to each
>> device on the backend side, which can be tested on the frontend side to
>> match. In case it doesn't match, an old device with the same kind and devid
>> can be cleaned up.
>>
>> The unique Id would obviously need to be set by the Xen tools inside the
>> transaction writing the initial backend Xenstore nodes, as doing that from
>> the backend would add another potential ambiguity by the driver domain
>> choosing the same unique id as the previous one did.
>>
>> The question is whether something like your patch should be used as a
>> fallback in case there is no unique Id on the backend side of the device
>> due to a too old Xen version.
> 
> I think I have found a solution which is much more simple, as it doesn't
> need any change of the protocol or any addition of new identifiers.
> 
> When creating a new PV device, Xen tools will always write all generic
> frontend- and backend-nodes. This includes the frontend state, which is
> initialized as XenbusStateInitialising.
> 
> The Linux kernel's xenbus driver is already storing the last known state
> of a xenbus device in struct xenbus_device. When changing the state, the
> xenbus driver is even reading the state from Xenstore (even if only for
> making sure the path is still existing). So all what is needed is to check,
> whether the read current state is matching the locally saved state. If it
> is not matching AND the read state is XenbusStateInitialising, you can be
> sure that the backend has been replaced.
> 
> Handling this will need to check the return value of xenbus_switch_state()
> in all related drivers, but this is just a more or less mechanical change.
> 
> I'll prepare a patch series for that.

In the end the result is more like your patch, avoiding the need to modify
all drivers.

I just added my idea to your patch and modified some of your code to be more
simple. I _think_ I have covered all possible scenarios now, resulting in
the need to keep the backend id check in case the backend died during the
early init phase of the device.

Could you please verify the attached patch is working for you?


Juergen

[-- Attachment #1.1.2: 0001-xen-xenbus-better-handle-backend-crash.patch --]
[-- Type: text/x-patch, Size: 5819 bytes --]

From 3a7a8257613141edb03833c5d277b14ee28697dc Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 26 Jan 2026 08:45:17 +0100
Subject: [PATCH] xen/xenbus: better handle backend crash
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When the backend domain crashes, coordinated device cleanup is not
possible (as it involves waiting for the backend state change). In that
case, toolstack forcefully removes frontend xenstore entries.
xenbus_dev_changed() handles this case, and triggers device cleanup.
It's possible that toolstack manages to connect new device in that
place, before xenbus_dev_changed() notices the old one is missing. If
that happens, new one won't be probed and will forever remain in
XenbusStateInitialising.

Fix this by checking the frontend's state in Xenstore. In case it has
been reset to XenbusStateInitialising by Xen tools, consider this
being the result of an unplug+plug operation.

It's important that cleanup on such unplug doesn't modify Xenstore
entries (especially the "state" key) as it belong to the new device
to be probed - changing it would derail establishing connection to the
new backend (most likely, closing the device before it was even
connected). Handle this case by setting new xenbus_device->vanished
flag to true, and check it before changing state entry.

And even if xenbus_dev_changed() correctly detects the device was
forcefully removed, the cleanup handling is still racy. Since this whole
handling doesn't happened in a single Xenstore transaction, it's possible
that toolstack might put a new device there already. Avoid re-creating
the state key (which in the case of loosing the race would actually
close newly attached device).

The problem does not apply to frontend domain crash, as this case
involves coordinated cleanup.

Problem originally reported at
https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
including reproduction steps.

Based-on-patch-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/xenbus/xenbus_client.c |  9 ++++++--
 drivers/xen/xenbus/xenbus_probe.c  | 34 ++++++++++++++++++++++++++++++
 include/xen/xenbus.h               |  1 +
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 2dc874fb5506..0d99a1f60df4 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -226,8 +226,9 @@ __xenbus_switch_state(struct xenbus_device *dev,
 	struct xenbus_transaction xbt;
 	int current_state;
 	int err, abort;
+	bool vanished = false;
 
-	if (state == dev->state)
+	if (state == dev->state || dev->vanished)
 		return 0;
 
 again:
@@ -242,6 +243,10 @@ __xenbus_switch_state(struct xenbus_device *dev,
 	err = xenbus_scanf(xbt, dev->nodename, "state", "%d", &current_state);
 	if (err != 1)
 		goto abort;
+	if (current_state != dev->state && current_state == XenbusStateInitialising) {
+		vanished = true;
+		goto abort;
+	}
 
 	err = xenbus_printf(xbt, dev->nodename, "state", "%d", state);
 	if (err) {
@@ -256,7 +261,7 @@ __xenbus_switch_state(struct xenbus_device *dev,
 		if (err == -EAGAIN && !abort)
 			goto again;
 		xenbus_switch_fatal(dev, depth, err, "ending transaction");
-	} else
+	} else if (!vanished)
 		dev->state = state;
 
 	return 0;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 86fe6e779056..fcd59a2109c3 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -444,6 +444,9 @@ static void xenbus_cleanup_devices(const char *path, struct bus_type *bus)
 		info.dev = NULL;
 		bus_for_each_dev(bus, NULL, &info, cleanup_dev);
 		if (info.dev) {
+			dev_warn(&info.dev->dev,
+				 "device forcefully removed from xenstore\n");
+			info.dev->vanished = true;
 			device_unregister(&info.dev->dev);
 			put_device(&info.dev->dev);
 		}
@@ -659,6 +662,37 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
 		return;
 
 	dev = xenbus_device_find(root, &bus->bus);
+	/*
+	 * Backend domain crash results in not coordinated frontend removal,
+	 * without going through XenbusStateClosing. If this is a new instance
+	 * of the same device Xen tools will have reset the state to
+	 * XenbusStateInitializing.
+	 * It might be that the backend crashed early during the init phase of
+	 * device setup, in which case the known state would have been
+	 * XenbusStateInitializing. So test the backend domid to match the
+	 * saved one. In case the new backend happens to have the same domid as
+	 * the old one, we can just carry on, as there is no inconsistency
+	 * resulting in this case.
+	 */
+	if (dev && !strcmp(bus->root, "device")) {
+		enum xenbus_state state = xenbus_read_driver_state(dev->nodename);
+		unsigned int backend = xenbus_read_unsigned(root, "backend-id",
+							    dev->otherend_id);
+
+		if (state == XenbusStateInitialising &&
+		    (state != dev->state || backend != dev->otherend_id)) {
+			/*
+			 * State has been reset, assume the old one vanished
+			 * and new one needs to be probed.
+			 */
+			dev_warn(&dev->dev,
+				 "state reset occurred, reconnecting\n");
+			dev->vanished = true;
+			device_unregister(&dev->dev);
+			put_device(&dev->dev);
+			dev = NULL;
+		}
+	}
 	if (!dev)
 		xenbus_probe_node(bus, type, root);
 	else
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c94caf852aea..064cb6db5e9f 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -80,6 +80,7 @@ struct xenbus_device {
 	const char *devicetype;
 	const char *nodename;
 	const char *otherend;
+	bool vanished;
 	int otherend_id;
 	struct xenbus_watch otherend_watch;
 	struct device dev;
-- 
2.52.0


[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2026-01-29  7:02     ` Jürgen Groß
@ 2026-02-06 16:57       ` Marek Marczykowski-Górecki
  2026-02-09  7:30         ` Jürgen Groß
  2026-02-09  9:02         ` Jürgen Groß
  0 siblings, 2 replies; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-02-06 16:57 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: linux-kernel, Stefano Stabellini, Oleksandr Tyshchenko,
	Peng Jiang, Qiu-ji Chen, Jason Andryuk,
	moderated list:XEN HYPERVISOR INTERFACE

[-- Attachment #1: Type: text/plain, Size: 8644 bytes --]

On Thu, Jan 29, 2026 at 08:02:35AM +0100, Jürgen Groß wrote:
> On 26.01.26 08:08, Jürgen Groß wrote:
> > On 17.11.25 12:06, Jürgen Groß wrote:
> > > On 02.11.25 04:20, Marek Marczykowski-Górecki wrote:
> > > > When the backend domain crashes, coordinated device cleanup is not
> > > > possible (as it involves waiting for the backend state change). In that
> > > > case, toolstack forcefully removes frontend xenstore entries.
> > > > xenbus_dev_changed() handles this case, and triggers device cleanup.
> > > > It's possible that toolstack manages to connect new device in that
> > > > place, before xenbus_dev_changed() notices the old one is missing. If
> > > > that happens, new one won't be probed and will forever remain in
> > > > XenbusStateInitialising.
> > > > 
> > > > Fix this by checking backend-id and if it changes, consider it
> > > > unplug+plug operation. It's important that cleanup on such unplug
> > > > doesn't modify xenstore entries (especially the "state" key) as it
> > > > belong to the new device to be probed - changing it would derail
> > > > establishing connection to the new backend (most likely, closing the
> > > > device before it was even connected). Handle this case by setting new
> > > > xenbus_device->vanished flag to true, and check it before changing state
> > > > entry.
> > > > 
> > > > And even if xenbus_dev_changed() correctly detects the device was
> > > > forcefully removed, the cleanup handling is still racy. Since this whole
> > > > handling doesn't happend in a single xenstore transaction, it's possible
> > > > that toolstack might put a new device there already. Avoid re-creating
> > > > the state key (which in the case of loosing the race would actually
> > > > close newly attached device).
> > > > 
> > > > The problem does not apply to frontend domain crash, as this case
> > > > involves coordinated cleanup.
> > > > 
> > > > Problem originally reported at
> > > > https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
> > > > including reproduction steps.
> > > > 
> > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > 
> > > Sorry I didn't get earlier to this.
> > > 
> > > My main problem with this patch is that it is basically just papering over
> > > a more general problem.
> > > 
> > > You are just making the problem much more improbable, but not impossible to
> > > occur again. In case the new driver domain has the same domid as the old one
> > > you can still have the same race.
> > > 
> > > The clean way to handle that would be to add a unique Id in Xenstore to each
> > > device on the backend side, which can be tested on the frontend side to
> > > match. In case it doesn't match, an old device with the same kind and devid
> > > can be cleaned up.
> > > 
> > > The unique Id would obviously need to be set by the Xen tools inside the
> > > transaction writing the initial backend Xenstore nodes, as doing that from
> > > the backend would add another potential ambiguity by the driver domain
> > > choosing the same unique id as the previous one did.
> > > 
> > > The question is whether something like your patch should be used as a
> > > fallback in case there is no unique Id on the backend side of the device
> > > due to a too old Xen version.
> > 
> > I think I have found a solution which is much more simple, as it doesn't
> > need any change of the protocol or any addition of new identifiers.
> > 
> > When creating a new PV device, Xen tools will always write all generic
> > frontend- and backend-nodes. This includes the frontend state, which is
> > initialized as XenbusStateInitialising.
> > 
> > The Linux kernel's xenbus driver is already storing the last known state
> > of a xenbus device in struct xenbus_device. When changing the state, the
> > xenbus driver is even reading the state from Xenstore (even if only for
> > making sure the path is still existing). So all what is needed is to check,
> > whether the read current state is matching the locally saved state. If it
> > is not matching AND the read state is XenbusStateInitialising, you can be
> > sure that the backend has been replaced.
> > 
> > Handling this will need to check the return value of xenbus_switch_state()
> > in all related drivers, but this is just a more or less mechanical change.
> > 
> > I'll prepare a patch series for that.
> 
> In the end the result is more like your patch, avoiding the need to modify
> all drivers.
> 
> I just added my idea to your patch and modified some of your code to be more
> simple. I _think_ I have covered all possible scenarios now, resulting in
> the need to keep the backend id check in case the backend died during the
> early init phase of the device.
> 
> Could you please verify the attached patch is working for you?

Thanks for the patch!

I ran it through relevant tests, and I got inconsistent results.
Specifically, sometimes, the domU hangs (actually, just one vCPU spins
inside xenwatch thread). Last console messages are:

    systemd[626]: Starting dconf.service - User preferences database...
    gnome-keyring-daemon[975]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
    gnome-keyring-daemon[975]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
    xen vif-0: xenbus: state reset occurred, reconnecting
    gnome-keyring-daemon[974]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
    gnome-keyring-daemon[976]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
    gnome-keyring-daemon[976]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
    gnome-keyring-daemon[974]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
    xen vif-0: xenbus: state reset occurred, reconnecting
    systemd[626]: Started dconf.service - User preferences database.
    xen_netfront: Initialising Xen virtual ethernet driver
    vif vif-0: xenbus: state reset occurred, reconnecting

And the call trace of the spinning xenwatch thread is:

    task:xenwatch        state:D stack:0     pid:64    tgid:64    ppid:2      task_flags:0x288040 flags:0x00080000
    Call Trace:
     <TASK>
     __schedule+0x2f3/0x780
     schedule+0x27/0x80
     xs_wait_for_reply+0xab/0x1f0
     ? __pfx_autoremove_wake_function+0x10/0x10
     xs_talkv+0xec/0x200
     xs_single+0x4a/0x70
     xenbus_gather+0xe4/0x1a0
     xenbus_read_driver_state+0x42/0x70
     xennet_bus_close+0x113/0x2c0 [xen_netfront]
     ? __pfx_autoremove_wake_function+0x10/0x10
     xennet_remove+0x16/0x80 [xen_netfront]
     xenbus_dev_remove+0x71/0xf0
     device_release_driver_internal+0x19c/0x200
     bus_remove_device+0xc6/0x130
     device_del+0x160/0x3e0
     device_unregister+0x17/0x60
     xenbus_dev_changed.cold+0x5e/0x6b
     ? __pfx_xenwatch_thread+0x10/0x10
     xenwatch_thread+0x92/0x1c0
     ? __pfx_autoremove_wake_function+0x10/0x10
     kthread+0xfc/0x240
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0xf5/0x110
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    task:xenbus          state:S stack:0     pid:63    tgid:63    ppid:2      task_flags:0x208040 flags:0x00080000
    Call Trace:
     <TASK>
     __schedule+0x2f3/0x780
     ? __pfx_xenbus_thread+0x10/0x10
     schedule+0x27/0x80
     xenbus_thread+0x1a8/0x200
     ? __pfx_autoremove_wake_function+0x10/0x10
     kthread+0xfc/0x240
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0xf5/0x110
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>

(technically, `top` says it's the xenbus thread spinning, but it looks
like the actual issue is in xenwatch one)

Note that other xenwatch actions in this domU are not executed, for
example `xl sysrq` does nothing. Not surprising, given xenwatch thread
is busy... Fortunately, it blocks only a single vCPU, so I'm able to
interact with the domU over console (to get the above traces).

It isn't a reliable failure, in this test run it failed once, out of 4
related tests.

The specific test is: https://github.com/QubesOS/qubes-core-admin/blob/main/qubes/tests/integ/network.py#L234
In short:
1. Start a domU
2. Pause it
3. Attach network (backend is != dom0)
4. Unpause

TBH, I'm not sure why the "state reset occurred" message is triggered at
all, I think it shouldn't be in this case...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2026-02-06 16:57       ` Marek Marczykowski-Górecki
@ 2026-02-09  7:30         ` Jürgen Groß
  2026-02-09  9:02         ` Jürgen Groß
  1 sibling, 0 replies; 9+ messages in thread
From: Jürgen Groß @ 2026-02-09  7:30 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: linux-kernel, Stefano Stabellini, Oleksandr Tyshchenko,
	Peng Jiang, Qiu-ji Chen, Jason Andryuk,
	moderated list:XEN HYPERVISOR INTERFACE


[-- Attachment #1.1.1: Type: text/plain, Size: 9179 bytes --]

On 06.02.26 17:57, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 29, 2026 at 08:02:35AM +0100, Jürgen Groß wrote:
>> On 26.01.26 08:08, Jürgen Groß wrote:
>>> On 17.11.25 12:06, Jürgen Groß wrote:
>>>> On 02.11.25 04:20, Marek Marczykowski-Górecki wrote:
>>>>> When the backend domain crashes, coordinated device cleanup is not
>>>>> possible (as it involves waiting for the backend state change). In that
>>>>> case, toolstack forcefully removes frontend xenstore entries.
>>>>> xenbus_dev_changed() handles this case, and triggers device cleanup.
>>>>> It's possible that toolstack manages to connect new device in that
>>>>> place, before xenbus_dev_changed() notices the old one is missing. If
>>>>> that happens, new one won't be probed and will forever remain in
>>>>> XenbusStateInitialising.
>>>>>
>>>>> Fix this by checking backend-id and if it changes, consider it
>>>>> unplug+plug operation. It's important that cleanup on such unplug
>>>>> doesn't modify xenstore entries (especially the "state" key) as it
>>>>> belong to the new device to be probed - changing it would derail
>>>>> establishing connection to the new backend (most likely, closing the
>>>>> device before it was even connected). Handle this case by setting new
>>>>> xenbus_device->vanished flag to true, and check it before changing state
>>>>> entry.
>>>>>
>>>>> And even if xenbus_dev_changed() correctly detects the device was
>>>>> forcefully removed, the cleanup handling is still racy. Since this whole
>>>>> handling doesn't happend in a single xenstore transaction, it's possible
>>>>> that toolstack might put a new device there already. Avoid re-creating
>>>>> the state key (which in the case of loosing the race would actually
>>>>> close newly attached device).
>>>>>
>>>>> The problem does not apply to frontend domain crash, as this case
>>>>> involves coordinated cleanup.
>>>>>
>>>>> Problem originally reported at
>>>>> https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
>>>>> including reproduction steps.
>>>>>
>>>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>>
>>>> Sorry I didn't get earlier to this.
>>>>
>>>> My main problem with this patch is that it is basically just papering over
>>>> a more general problem.
>>>>
>>>> You are just making the problem much more improbable, but not impossible to
>>>> occur again. In case the new driver domain has the same domid as the old one
>>>> you can still have the same race.
>>>>
>>>> The clean way to handle that would be to add a unique Id in Xenstore to each
>>>> device on the backend side, which can be tested on the frontend side to
>>>> match. In case it doesn't match, an old device with the same kind and devid
>>>> can be cleaned up.
>>>>
>>>> The unique Id would obviously need to be set by the Xen tools inside the
>>>> transaction writing the initial backend Xenstore nodes, as doing that from
>>>> the backend would add another potential ambiguity by the driver domain
>>>> choosing the same unique id as the previous one did.
>>>>
>>>> The question is whether something like your patch should be used as a
>>>> fallback in case there is no unique Id on the backend side of the device
>>>> due to a too old Xen version.
>>>
>>> I think I have found a solution which is much more simple, as it doesn't
>>> need any change of the protocol or any addition of new identifiers.
>>>
>>> When creating a new PV device, Xen tools will always write all generic
>>> frontend- and backend-nodes. This includes the frontend state, which is
>>> initialized as XenbusStateInitialising.
>>>
>>> The Linux kernel's xenbus driver is already storing the last known state
>>> of a xenbus device in struct xenbus_device. When changing the state, the
>>> xenbus driver is even reading the state from Xenstore (even if only for
>>> making sure the path is still existing). So all what is needed is to check,
>>> whether the read current state is matching the locally saved state. If it
>>> is not matching AND the read state is XenbusStateInitialising, you can be
>>> sure that the backend has been replaced.
>>>
>>> Handling this will need to check the return value of xenbus_switch_state()
>>> in all related drivers, but this is just a more or less mechanical change.
>>>
>>> I'll prepare a patch series for that.
>>
>> In the end the result is more like your patch, avoiding the need to modify
>> all drivers.
>>
>> I just added my idea to your patch and modified some of your code to be more
>> simple. I _think_ I have covered all possible scenarios now, resulting in
>> the need to keep the backend id check in case the backend died during the
>> early init phase of the device.
>>
>> Could you please verify the attached patch is working for you?
> 
> Thanks for the patch!
> 
> I ran it through relevant tests, and I got inconsistent results.
> Specifically, sometimes, the domU hangs (actually, just one vCPU spins
> inside xenwatch thread). Last console messages are:
> 
>      systemd[626]: Starting dconf.service - User preferences database...
>      gnome-keyring-daemon[975]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
>      gnome-keyring-daemon[975]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
>      xen vif-0: xenbus: state reset occurred, reconnecting
>      gnome-keyring-daemon[974]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
>      gnome-keyring-daemon[976]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
>      gnome-keyring-daemon[976]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
>      gnome-keyring-daemon[974]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
>      xen vif-0: xenbus: state reset occurred, reconnecting
>      systemd[626]: Started dconf.service - User preferences database.
>      xen_netfront: Initialising Xen virtual ethernet driver
>      vif vif-0: xenbus: state reset occurred, reconnecting
> 
> And the call trace of the spinning xenwatch thread is:
> 
>      task:xenwatch        state:D stack:0     pid:64    tgid:64    ppid:2      task_flags:0x288040 flags:0x00080000
>      Call Trace:
>       <TASK>
>       __schedule+0x2f3/0x780
>       schedule+0x27/0x80
>       xs_wait_for_reply+0xab/0x1f0
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       xs_talkv+0xec/0x200
>       xs_single+0x4a/0x70
>       xenbus_gather+0xe4/0x1a0
>       xenbus_read_driver_state+0x42/0x70
>       xennet_bus_close+0x113/0x2c0 [xen_netfront]
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       xennet_remove+0x16/0x80 [xen_netfront]
>       xenbus_dev_remove+0x71/0xf0
>       device_release_driver_internal+0x19c/0x200
>       bus_remove_device+0xc6/0x130
>       device_del+0x160/0x3e0
>       device_unregister+0x17/0x60
>       xenbus_dev_changed.cold+0x5e/0x6b
>       ? __pfx_xenwatch_thread+0x10/0x10
>       xenwatch_thread+0x92/0x1c0
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       kthread+0xfc/0x240
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork+0xf5/0x110
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork_asm+0x1a/0x30
>       </TASK>
>      task:xenbus          state:S stack:0     pid:63    tgid:63    ppid:2      task_flags:0x208040 flags:0x00080000
>      Call Trace:
>       <TASK>
>       __schedule+0x2f3/0x780
>       ? __pfx_xenbus_thread+0x10/0x10
>       schedule+0x27/0x80
>       xenbus_thread+0x1a8/0x200
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       kthread+0xfc/0x240
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork+0xf5/0x110
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork_asm+0x1a/0x30
>       </TASK>
> 
> (technically, `top` says it's the xenbus thread spinning, but it looks
> like the actual issue is in xenwatch one)
> 
> Note that other xenwatch actions in this domU are not executed, for
> example `xl sysrq` does nothing. Not surprising, given xenwatch thread
> is busy... Fortunately, it blocks only a single vCPU, so I'm able to
> interact with the domU over console (to get the above traces).
> 
> It isn't a reliable failure, in this test run it failed once, out of 4
> related tests.
> 
> The specific test is: https://github.com/QubesOS/qubes-core-admin/blob/main/qubes/tests/integ/network.py#L234
> In short:
> 1. Start a domU
> 2. Pause it
> 3. Attach network (backend is != dom0)
> 4. Unpause
> 
> TBH, I'm not sure why the "state reset occurred" message is triggered at
> all, I think it shouldn't be in this case...
> 

Thanks for the test.

I guess the hang happens due to xennet_bus_close() waiting for a state
change which won't happen at all, as it is already XenbusStateInitialising.

The right thing to do would be to add the xenbus_device pointer to the
parameters of xenbus_read_driver_state(), in order to be able to return
XenbusStateUnknown in case the device has vanished.

I'll add a patch for that.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2026-02-06 16:57       ` Marek Marczykowski-Górecki
  2026-02-09  7:30         ` Jürgen Groß
@ 2026-02-09  9:02         ` Jürgen Groß
  2026-02-09 20:46           ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 9+ messages in thread
From: Jürgen Groß @ 2026-02-09  9:02 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: linux-kernel, Stefano Stabellini, Oleksandr Tyshchenko,
	Peng Jiang, Qiu-ji Chen, Jason Andryuk,
	moderated list:XEN HYPERVISOR INTERFACE


[-- Attachment #1.1.1: Type: text/plain, Size: 8789 bytes --]

On 06.02.26 17:57, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 29, 2026 at 08:02:35AM +0100, Jürgen Groß wrote:
>> On 26.01.26 08:08, Jürgen Groß wrote:
>>> On 17.11.25 12:06, Jürgen Groß wrote:
>>>> On 02.11.25 04:20, Marek Marczykowski-Górecki wrote:
>>>>> When the backend domain crashes, coordinated device cleanup is not
>>>>> possible (as it involves waiting for the backend state change). In that
>>>>> case, toolstack forcefully removes frontend xenstore entries.
>>>>> xenbus_dev_changed() handles this case, and triggers device cleanup.
>>>>> It's possible that toolstack manages to connect new device in that
>>>>> place, before xenbus_dev_changed() notices the old one is missing. If
>>>>> that happens, new one won't be probed and will forever remain in
>>>>> XenbusStateInitialising.
>>>>>
>>>>> Fix this by checking backend-id and if it changes, consider it
>>>>> unplug+plug operation. It's important that cleanup on such unplug
>>>>> doesn't modify xenstore entries (especially the "state" key) as it
>>>>> belong to the new device to be probed - changing it would derail
>>>>> establishing connection to the new backend (most likely, closing the
>>>>> device before it was even connected). Handle this case by setting new
>>>>> xenbus_device->vanished flag to true, and check it before changing state
>>>>> entry.
>>>>>
>>>>> And even if xenbus_dev_changed() correctly detects the device was
>>>>> forcefully removed, the cleanup handling is still racy. Since this whole
>>>>> handling doesn't happend in a single xenstore transaction, it's possible
>>>>> that toolstack might put a new device there already. Avoid re-creating
>>>>> the state key (which in the case of loosing the race would actually
>>>>> close newly attached device).
>>>>>
>>>>> The problem does not apply to frontend domain crash, as this case
>>>>> involves coordinated cleanup.
>>>>>
>>>>> Problem originally reported at
>>>>> https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
>>>>> including reproduction steps.
>>>>>
>>>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>>
>>>> Sorry I didn't get earlier to this.
>>>>
>>>> My main problem with this patch is that it is basically just papering over
>>>> a more general problem.
>>>>
>>>> You are just making the problem much more improbable, but not impossible to
>>>> occur again. In case the new driver domain has the same domid as the old one
>>>> you can still have the same race.
>>>>
>>>> The clean way to handle that would be to add a unique Id in Xenstore to each
>>>> device on the backend side, which can be tested on the frontend side to
>>>> match. In case it doesn't match, an old device with the same kind and devid
>>>> can be cleaned up.
>>>>
>>>> The unique Id would obviously need to be set by the Xen tools inside the
>>>> transaction writing the initial backend Xenstore nodes, as doing that from
>>>> the backend would add another potential ambiguity by the driver domain
>>>> choosing the same unique id as the previous one did.
>>>>
>>>> The question is whether something like your patch should be used as a
>>>> fallback in case there is no unique Id on the backend side of the device
>>>> due to a too old Xen version.
>>>
>>> I think I have found a solution which is much more simple, as it doesn't
>>> need any change of the protocol or any addition of new identifiers.
>>>
>>> When creating a new PV device, Xen tools will always write all generic
>>> frontend- and backend-nodes. This includes the frontend state, which is
>>> initialized as XenbusStateInitialising.
>>>
>>> The Linux kernel's xenbus driver is already storing the last known state
>>> of a xenbus device in struct xenbus_device. When changing the state, the
>>> xenbus driver is even reading the state from Xenstore (even if only for
>>> making sure the path is still existing). So all what is needed is to check,
>>> whether the read current state is matching the locally saved state. If it
>>> is not matching AND the read state is XenbusStateInitialising, you can be
>>> sure that the backend has been replaced.
>>>
>>> Handling this will need to check the return value of xenbus_switch_state()
>>> in all related drivers, but this is just a more or less mechanical change.
>>>
>>> I'll prepare a patch series for that.
>>
>> In the end the result is more like your patch, avoiding the need to modify
>> all drivers.
>>
>> I just added my idea to your patch and modified some of your code to be more
>> simple. I _think_ I have covered all possible scenarios now, resulting in
>> the need to keep the backend id check in case the backend died during the
>> early init phase of the device.
>>
>> Could you please verify the attached patch is working for you?
> 
> Thanks for the patch!
> 
> I ran it through relevant tests, and I got inconsistent results.
> Specifically, sometimes, the domU hangs (actually, just one vCPU spins
> inside xenwatch thread). Last console messages are:
> 
>      systemd[626]: Starting dconf.service - User preferences database...
>      gnome-keyring-daemon[975]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
>      gnome-keyring-daemon[975]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
>      xen vif-0: xenbus: state reset occurred, reconnecting
>      gnome-keyring-daemon[974]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
>      gnome-keyring-daemon[976]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
>      gnome-keyring-daemon[976]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
>      gnome-keyring-daemon[974]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
>      xen vif-0: xenbus: state reset occurred, reconnecting
>      systemd[626]: Started dconf.service - User preferences database.
>      xen_netfront: Initialising Xen virtual ethernet driver
>      vif vif-0: xenbus: state reset occurred, reconnecting
> 
> And the call trace of the spinning xenwatch thread is:
> 
>      task:xenwatch        state:D stack:0     pid:64    tgid:64    ppid:2      task_flags:0x288040 flags:0x00080000
>      Call Trace:
>       <TASK>
>       __schedule+0x2f3/0x780
>       schedule+0x27/0x80
>       xs_wait_for_reply+0xab/0x1f0
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       xs_talkv+0xec/0x200
>       xs_single+0x4a/0x70
>       xenbus_gather+0xe4/0x1a0
>       xenbus_read_driver_state+0x42/0x70
>       xennet_bus_close+0x113/0x2c0 [xen_netfront]
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       xennet_remove+0x16/0x80 [xen_netfront]
>       xenbus_dev_remove+0x71/0xf0
>       device_release_driver_internal+0x19c/0x200
>       bus_remove_device+0xc6/0x130
>       device_del+0x160/0x3e0
>       device_unregister+0x17/0x60
>       xenbus_dev_changed.cold+0x5e/0x6b
>       ? __pfx_xenwatch_thread+0x10/0x10
>       xenwatch_thread+0x92/0x1c0
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       kthread+0xfc/0x240
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork+0xf5/0x110
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork_asm+0x1a/0x30
>       </TASK>
>      task:xenbus          state:S stack:0     pid:63    tgid:63    ppid:2      task_flags:0x208040 flags:0x00080000
>      Call Trace:
>       <TASK>
>       __schedule+0x2f3/0x780
>       ? __pfx_xenbus_thread+0x10/0x10
>       schedule+0x27/0x80
>       xenbus_thread+0x1a8/0x200
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       kthread+0xfc/0x240
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork+0xf5/0x110
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork_asm+0x1a/0x30
>       </TASK>
> 
> (technically, `top` says it's the xenbus thread spinning, but it looks
> like the actual issue is in xenwatch one)
> 
> Note that other xenwatch actions in this domU are not executed, for
> example `xl sysrq` does nothing. Not surprising, given xenwatch thread
> is busy... Fortunately, it blocks only a single vCPU, so I'm able to
> interact with the domU over console (to get the above traces).
> 
> It isn't a reliable failure, in this test run it failed once, out of 4
> related tests.
> 
> The specific test is: https://github.com/QubesOS/qubes-core-admin/blob/main/qubes/tests/integ/network.py#L234
> In short:
> 1. Start a domU
> 2. Pause it
> 3. Attach network (backend is != dom0)
> 4. Unpause
> 
> TBH, I'm not sure why the "state reset occurred" message is triggered at
> all, I think it shouldn't be in this case...
> 

Second try.


Juergen

[-- Attachment #1.1.2: 0001-xenbus-add-xenbus_device-parameter-to-xenbus_read_dr.patch --]
[-- Type: text/x-patch, Size: 10690 bytes --]

From be50f0a4041116392864a9acb37315c8b8c076a7 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 9 Feb 2026 08:42:06 +0100
Subject: [PATCH 1/2] xenbus: add xenbus_device parameter to
 xenbus_read_driver_state()

In order to prepare checking the xenbus device status in
xenbus_read_driver_state(), add the pointer to struct xenbus_device
as a parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/net/xen-netfront.c                 | 34 +++++++++++-----------
 drivers/pci/xen-pcifront.c                 |  8 ++---
 drivers/scsi/xen-scsifront.c               |  2 +-
 drivers/xen/xen-pciback/xenbus.c           | 10 +++----
 drivers/xen/xenbus/xenbus_client.c         |  3 +-
 drivers/xen/xenbus/xenbus_probe.c          |  6 ++--
 drivers/xen/xenbus/xenbus_probe_frontend.c |  2 +-
 include/xen/xenbus.h                       |  3 +-
 8 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 7c2220366623..e2da977c9c50 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1646,7 +1646,7 @@ static int xennet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 
 	/* avoid the race with XDP headroom adjustment */
 	wait_event(module_wq,
-		   xenbus_read_driver_state(np->xbdev->otherend) ==
+		   xenbus_read_driver_state(np->xbdev, np->xbdev->otherend) ==
 		   XenbusStateReconfigured);
 	np->netfront_xdp_enabled = true;
 
@@ -1764,9 +1764,9 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	do {
 		xenbus_switch_state(dev, XenbusStateInitialising);
 		err = wait_event_timeout(module_wq,
-				 xenbus_read_driver_state(dev->otherend) !=
+				 xenbus_read_driver_state(dev, dev->otherend) !=
 				 XenbusStateClosed &&
-				 xenbus_read_driver_state(dev->otherend) !=
+				 xenbus_read_driver_state(dev, dev->otherend) !=
 				 XenbusStateUnknown, XENNET_TIMEOUT);
 	} while (!err);
 
@@ -2627,31 +2627,31 @@ static void xennet_bus_close(struct xenbus_device *dev)
 {
 	int ret;
 
-	if (xenbus_read_driver_state(dev->otherend) == XenbusStateClosed)
+	if (xenbus_read_driver_state(dev, dev->otherend) == XenbusStateClosed)
 		return;
 	do {
 		xenbus_switch_state(dev, XenbusStateClosing);
 		ret = wait_event_timeout(module_wq,
-				   xenbus_read_driver_state(dev->otherend) ==
-				   XenbusStateClosing ||
-				   xenbus_read_driver_state(dev->otherend) ==
-				   XenbusStateClosed ||
-				   xenbus_read_driver_state(dev->otherend) ==
-				   XenbusStateUnknown,
-				   XENNET_TIMEOUT);
+				xenbus_read_driver_state(dev, dev->otherend) ==
+				XenbusStateClosing ||
+				xenbus_read_driver_state(dev, dev->otherend) ==
+				XenbusStateClosed ||
+				xenbus_read_driver_state(dev, dev->otherend) ==
+				XenbusStateUnknown,
+				XENNET_TIMEOUT);
 	} while (!ret);
 
-	if (xenbus_read_driver_state(dev->otherend) == XenbusStateClosed)
+	if (xenbus_read_driver_state(dev, dev->otherend) == XenbusStateClosed)
 		return;
 
 	do {
 		xenbus_switch_state(dev, XenbusStateClosed);
 		ret = wait_event_timeout(module_wq,
-				   xenbus_read_driver_state(dev->otherend) ==
-				   XenbusStateClosed ||
-				   xenbus_read_driver_state(dev->otherend) ==
-				   XenbusStateUnknown,
-				   XENNET_TIMEOUT);
+				xenbus_read_driver_state(dev, dev->otherend) ==
+				XenbusStateClosed ||
+				xenbus_read_driver_state(dev, dev->otherend) ==
+				XenbusStateUnknown,
+				XENNET_TIMEOUT);
 	} while (!ret);
 }
 
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 11636634ae51..cd22bf984024 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -856,7 +856,7 @@ static void pcifront_try_connect(struct pcifront_device *pdev)
 	int err;
 
 	/* Only connect once */
-	if (xenbus_read_driver_state(pdev->xdev->nodename) !=
+	if (xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename) !=
 	    XenbusStateInitialised)
 		return;
 
@@ -876,7 +876,7 @@ static int pcifront_try_disconnect(struct pcifront_device *pdev)
 	enum xenbus_state prev_state;
 
 
-	prev_state = xenbus_read_driver_state(pdev->xdev->nodename);
+	prev_state = xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename);
 
 	if (prev_state >= XenbusStateClosing)
 		goto out;
@@ -895,7 +895,7 @@ static int pcifront_try_disconnect(struct pcifront_device *pdev)
 
 static void pcifront_attach_devices(struct pcifront_device *pdev)
 {
-	if (xenbus_read_driver_state(pdev->xdev->nodename) ==
+	if (xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename) ==
 	    XenbusStateReconfiguring)
 		pcifront_connect(pdev);
 }
@@ -909,7 +909,7 @@ static int pcifront_detach_devices(struct pcifront_device *pdev)
 	struct pci_dev *pci_dev;
 	char str[64];
 
-	state = xenbus_read_driver_state(pdev->xdev->nodename);
+	state = xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename);
 	if (state == XenbusStateInitialised) {
 		dev_dbg(&pdev->xdev->dev, "Handle skipped connect.\n");
 		/* We missed Connected and need to initialize. */
diff --git a/drivers/scsi/xen-scsifront.c b/drivers/scsi/xen-scsifront.c
index 924025305753..ef74d4da5ab0 100644
--- a/drivers/scsi/xen-scsifront.c
+++ b/drivers/scsi/xen-scsifront.c
@@ -1175,7 +1175,7 @@ static void scsifront_backend_changed(struct xenbus_device *dev,
 			return;
 		}
 
-		if (xenbus_read_driver_state(dev->nodename) ==
+		if (xenbus_read_driver_state(dev, dev->nodename) ==
 		    XenbusStateInitialised)
 			scsifront_do_lun_hotplug(info, VSCSIFRONT_OP_ADD_LUN);
 
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index b11e401f1b1e..4bd1c7a8957e 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -149,12 +149,12 @@ static int xen_pcibk_attach(struct xen_pcibk_device *pdev)
 
 	mutex_lock(&pdev->dev_lock);
 	/* Make sure we only do this setup once */
-	if (xenbus_read_driver_state(pdev->xdev->nodename) !=
+	if (xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename) !=
 	    XenbusStateInitialised)
 		goto out;
 
 	/* Wait for frontend to state that it has published the configuration */
-	if (xenbus_read_driver_state(pdev->xdev->otherend) !=
+	if (xenbus_read_driver_state(pdev->xdev, pdev->xdev->otherend) !=
 	    XenbusStateInitialised)
 		goto out;
 
@@ -374,7 +374,7 @@ static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev,
 	dev_dbg(&pdev->xdev->dev, "Reconfiguring device ...\n");
 
 	mutex_lock(&pdev->dev_lock);
-	if (xenbus_read_driver_state(pdev->xdev->nodename) != state)
+	if (xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename) != state)
 		goto out;
 
 	err = xenbus_scanf(XBT_NIL, pdev->xdev->nodename, "num_devs", "%d",
@@ -572,7 +572,7 @@ static int xen_pcibk_setup_backend(struct xen_pcibk_device *pdev)
 	/* It's possible we could get the call to setup twice, so make sure
 	 * we're not already connected.
 	 */
-	if (xenbus_read_driver_state(pdev->xdev->nodename) !=
+	if (xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename) !=
 	    XenbusStateInitWait)
 		goto out;
 
@@ -662,7 +662,7 @@ static void xen_pcibk_be_watch(struct xenbus_watch *watch,
 	struct xen_pcibk_device *pdev =
 	    container_of(watch, struct xen_pcibk_device, be_watch);
 
-	switch (xenbus_read_driver_state(pdev->xdev->nodename)) {
+	switch (xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename)) {
 	case XenbusStateInitWait:
 		xen_pcibk_setup_backend(pdev);
 		break;
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 2dc874fb5506..6ed0cd8e9676 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -936,7 +936,8 @@ static int xenbus_unmap_ring_hvm(struct xenbus_device *dev, void *vaddr)
  * Returns: the state of the driver rooted at the given store path, or
  * XenbusStateUnknown if no state can be read.
  */
-enum xenbus_state xenbus_read_driver_state(const char *path)
+enum xenbus_state xenbus_read_driver_state(const struct xenbus_device *dev,
+					   const char *path)
 {
 	enum xenbus_state result;
 	int err = xenbus_gather(XBT_NIL, path, "state", "%d", &result, NULL);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 86fe6e779056..d4f1dbf7cbc4 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -191,7 +191,7 @@ void xenbus_otherend_changed(struct xenbus_watch *watch,
 		return;
 	}
 
-	state = xenbus_read_driver_state(dev->otherend);
+	state = xenbus_read_driver_state(dev, dev->otherend);
 
 	dev_dbg(&dev->dev, "state is %d, (%s), %s, %s\n",
 		state, xenbus_strstate(state), dev->otherend_watch.node, path);
@@ -364,7 +364,7 @@ void xenbus_dev_remove(struct device *_dev)
 	 * closed.
 	 */
 	if (!drv->allow_rebind ||
-	    xenbus_read_driver_state(dev->nodename) == XenbusStateClosing)
+	    xenbus_read_driver_state(dev, dev->nodename) == XenbusStateClosing)
 		xenbus_switch_state(dev, XenbusStateClosed);
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_remove);
@@ -514,7 +514,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
 	size_t stringlen;
 	char *tmpstring;
 
-	enum xenbus_state state = xenbus_read_driver_state(nodename);
+	enum xenbus_state state = xenbus_read_driver_state(NULL, nodename);
 
 	if (state != XenbusStateInitialising) {
 		/* Device is not new, so ignore it.  This can happen if a
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 6d1819269cbe..41832994f8b1 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -255,7 +255,7 @@ static int print_device_status(struct device *dev, void *data)
 	} else if (xendev->state < XenbusStateConnected) {
 		enum xenbus_state rstate = XenbusStateUnknown;
 		if (xendev->otherend)
-			rstate = xenbus_read_driver_state(xendev->otherend);
+			rstate = xenbus_read_driver_state(xendev, xendev->otherend);
 		pr_warn("Timeout connecting to device: %s (local state %d, remote state %d)\n",
 			xendev->nodename, xendev->state, rstate);
 	}
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c94caf852aea..15319da65b7f 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -228,7 +228,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr);
 int xenbus_alloc_evtchn(struct xenbus_device *dev, evtchn_port_t *port);
 int xenbus_free_evtchn(struct xenbus_device *dev, evtchn_port_t port);
 
-enum xenbus_state xenbus_read_driver_state(const char *path);
+enum xenbus_state xenbus_read_driver_state(const struct xenbus_device *dev,
+					   const char *path);
 
 __printf(3, 4)
 void xenbus_dev_error(struct xenbus_device *dev, int err, const char *fmt, ...);
-- 
2.53.0


[-- Attachment #1.1.3: 0002-xen-xenbus-better-handle-backend-crash.patch --]
[-- Type: text/x-patch, Size: 6192 bytes --]

From 0292f843bff05c1b266c32e9e802ca3f01e030f0 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 26 Jan 2026 08:45:17 +0100
Subject: [PATCH 2/2] xen/xenbus: better handle backend crash
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When the backend domain crashes, coordinated device cleanup is not
possible (as it involves waiting for the backend state change). In that
case, toolstack forcefully removes frontend xenstore entries.
xenbus_dev_changed() handles this case, and triggers device cleanup.
It's possible that toolstack manages to connect new device in that
place, before xenbus_dev_changed() notices the old one is missing. If
that happens, new one won't be probed and will forever remain in
XenbusStateInitialising.

Fix this by checking the frontend's state in Xenstore. In case it has
been reset to XenbusStateInitialising by Xen tools, consider this
being the result of an unplug+plug operation.

It's important that cleanup on such unplug doesn't modify Xenstore
entries (especially the "state" key) as it belong to the new device
to be probed - changing it would derail establishing connection to the
new backend (most likely, closing the device before it was even
connected). Handle this case by setting new xenbus_device->vanished
flag to true, and check it before changing state entry.

And even if xenbus_dev_changed() correctly detects the device was
forcefully removed, the cleanup handling is still racy. Since this whole
handling doesn't happened in a single Xenstore transaction, it's possible
that toolstack might put a new device there already. Avoid re-creating
the state key (which in the case of loosing the race would actually
close newly attached device).

The problem does not apply to frontend domain crash, as this case
involves coordinated cleanup.

Problem originally reported at
https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
including reproduction steps.

Based-on-patch-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/xenbus/xenbus_client.c | 13 +++++++++--
 drivers/xen/xenbus/xenbus_probe.c  | 36 ++++++++++++++++++++++++++++++
 include/xen/xenbus.h               |  1 +
 3 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 6ed0cd8e9676..00ee8f62c28c 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -226,8 +226,9 @@ __xenbus_switch_state(struct xenbus_device *dev,
 	struct xenbus_transaction xbt;
 	int current_state;
 	int err, abort;
+	bool vanished = false;
 
-	if (state == dev->state)
+	if (state == dev->state || dev->vanished)
 		return 0;
 
 again:
@@ -242,6 +243,10 @@ __xenbus_switch_state(struct xenbus_device *dev,
 	err = xenbus_scanf(xbt, dev->nodename, "state", "%d", &current_state);
 	if (err != 1)
 		goto abort;
+	if (current_state != dev->state && current_state == XenbusStateInitialising) {
+		vanished = true;
+		goto abort;
+	}
 
 	err = xenbus_printf(xbt, dev->nodename, "state", "%d", state);
 	if (err) {
@@ -256,7 +261,7 @@ __xenbus_switch_state(struct xenbus_device *dev,
 		if (err == -EAGAIN && !abort)
 			goto again;
 		xenbus_switch_fatal(dev, depth, err, "ending transaction");
-	} else
+	} else if (!vanished)
 		dev->state = state;
 
 	return 0;
@@ -940,6 +945,10 @@ enum xenbus_state xenbus_read_driver_state(const struct xenbus_device *dev,
 					   const char *path)
 {
 	enum xenbus_state result;
+
+	if (dev && dev->vanished)
+		return XenbusStateUnknown;
+
 	int err = xenbus_gather(XBT_NIL, path, "state", "%d", &result, NULL);
 	if (err)
 		result = XenbusStateUnknown;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index d4f1dbf7cbc4..47e992032014 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -444,6 +444,9 @@ static void xenbus_cleanup_devices(const char *path, struct bus_type *bus)
 		info.dev = NULL;
 		bus_for_each_dev(bus, NULL, &info, cleanup_dev);
 		if (info.dev) {
+			dev_warn(&info.dev->dev,
+				 "device forcefully removed from xenstore\n");
+			info.dev->vanished = true;
 			device_unregister(&info.dev->dev);
 			put_device(&info.dev->dev);
 		}
@@ -659,6 +662,39 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
 		return;
 
 	dev = xenbus_device_find(root, &bus->bus);
+	/*
+	 * Backend domain crash results in not coordinated frontend removal,
+	 * without going through XenbusStateClosing. If this is a new instance
+	 * of the same device Xen tools will have reset the state to
+	 * XenbusStateInitializing.
+	 * It might be that the backend crashed early during the init phase of
+	 * device setup, in which case the known state would have been
+	 * XenbusStateInitializing. So test the backend domid to match the
+	 * saved one. In case the new backend happens to have the same domid as
+	 * the old one, we can just carry on, as there is no inconsistency
+	 * resulting in this case.
+	 */
+	if (dev && !strcmp(bus->root, "device")) {
+		enum xenbus_state state = xenbus_read_driver_state(dev, dev->nodename);
+		unsigned int backend = xenbus_read_unsigned(root, "backend-id",
+							    dev->otherend_id);
+
+		if (state == XenbusStateInitialising &&
+		    (state != dev->state || backend != dev->otherend_id)) {
+			/*
+			 * State has been reset, assume the old one vanished
+			 * and new one needs to be probed.
+			 */
+			dev_warn(&dev->dev,
+				 "state reset occurred, reconnecting\n");
+			dev->vanished = true;
+		}
+		if (dev->vanished) {
+			device_unregister(&dev->dev);
+			put_device(&dev->dev);
+			dev = NULL;
+		}
+	}
 	if (!dev)
 		xenbus_probe_node(bus, type, root);
 	else
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 15319da65b7f..8ca15743af7f 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -80,6 +80,7 @@ struct xenbus_device {
 	const char *devicetype;
 	const char *nodename;
 	const char *otherend;
+	bool vanished;
 	int otherend_id;
 	struct xenbus_watch otherend_watch;
 	struct device dev;
-- 
2.53.0


[-- Attachment #1.1.4: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen/xenbus: better handle backend crash
  2026-02-09  9:02         ` Jürgen Groß
@ 2026-02-09 20:46           ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-02-09 20:46 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: linux-kernel, Stefano Stabellini, Oleksandr Tyshchenko,
	Peng Jiang, Qiu-ji Chen, Jason Andryuk,
	moderated list:XEN HYPERVISOR INTERFACE

[-- Attachment #1: Type: text/plain, Size: 9649 bytes --]

On Mon, Feb 09, 2026 at 10:02:28AM +0100, Jürgen Groß wrote:
> On 06.02.26 17:57, Marek Marczykowski-Górecki wrote:
> > On Thu, Jan 29, 2026 at 08:02:35AM +0100, Jürgen Groß wrote:
> > > On 26.01.26 08:08, Jürgen Groß wrote:
> > > > On 17.11.25 12:06, Jürgen Groß wrote:
> > > > > On 02.11.25 04:20, Marek Marczykowski-Górecki wrote:
> > > > > > When the backend domain crashes, coordinated device cleanup is not
> > > > > > possible (as it involves waiting for the backend state change). In that
> > > > > > case, toolstack forcefully removes frontend xenstore entries.
> > > > > > xenbus_dev_changed() handles this case, and triggers device cleanup.
> > > > > > It's possible that toolstack manages to connect new device in that
> > > > > > place, before xenbus_dev_changed() notices the old one is missing. If
> > > > > > that happens, new one won't be probed and will forever remain in
> > > > > > XenbusStateInitialising.
> > > > > > 
> > > > > > Fix this by checking backend-id and if it changes, consider it
> > > > > > unplug+plug operation. It's important that cleanup on such unplug
> > > > > > doesn't modify xenstore entries (especially the "state" key) as it
> > > > > > belong to the new device to be probed - changing it would derail
> > > > > > establishing connection to the new backend (most likely, closing the
> > > > > > device before it was even connected). Handle this case by setting new
> > > > > > xenbus_device->vanished flag to true, and check it before changing state
> > > > > > entry.
> > > > > > 
> > > > > > And even if xenbus_dev_changed() correctly detects the device was
> > > > > > forcefully removed, the cleanup handling is still racy. Since this whole
> > > > > > handling doesn't happend in a single xenstore transaction, it's possible
> > > > > > that toolstack might put a new device there already. Avoid re-creating
> > > > > > the state key (which in the case of loosing the race would actually
> > > > > > close newly attached device).
> > > > > > 
> > > > > > The problem does not apply to frontend domain crash, as this case
> > > > > > involves coordinated cleanup.
> > > > > > 
> > > > > > Problem originally reported at
> > > > > > https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t,
> > > > > > including reproduction steps.
> > > > > > 
> > > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > > > 
> > > > > Sorry I didn't get earlier to this.
> > > > > 
> > > > > My main problem with this patch is that it is basically just papering over
> > > > > a more general problem.
> > > > > 
> > > > > You are just making the problem much more improbable, but not impossible to
> > > > > occur again. In case the new driver domain has the same domid as the old one
> > > > > you can still have the same race.
> > > > > 
> > > > > The clean way to handle that would be to add a unique Id in Xenstore to each
> > > > > device on the backend side, which can be tested on the frontend side to
> > > > > match. In case it doesn't match, an old device with the same kind and devid
> > > > > can be cleaned up.
> > > > > 
> > > > > The unique Id would obviously need to be set by the Xen tools inside the
> > > > > transaction writing the initial backend Xenstore nodes, as doing that from
> > > > > the backend would add another potential ambiguity by the driver domain
> > > > > choosing the same unique id as the previous one did.
> > > > > 
> > > > > The question is whether something like your patch should be used as a
> > > > > fallback in case there is no unique Id on the backend side of the device
> > > > > due to a too old Xen version.
> > > > 
> > > > I think I have found a solution which is much more simple, as it doesn't
> > > > need any change of the protocol or any addition of new identifiers.
> > > > 
> > > > When creating a new PV device, Xen tools will always write all generic
> > > > frontend- and backend-nodes. This includes the frontend state, which is
> > > > initialized as XenbusStateInitialising.
> > > > 
> > > > The Linux kernel's xenbus driver is already storing the last known state
> > > > of a xenbus device in struct xenbus_device. When changing the state, the
> > > > xenbus driver is even reading the state from Xenstore (even if only for
> > > > making sure the path is still existing). So all what is needed is to check,
> > > > whether the read current state is matching the locally saved state. If it
> > > > is not matching AND the read state is XenbusStateInitialising, you can be
> > > > sure that the backend has been replaced.
> > > > 
> > > > Handling this will need to check the return value of xenbus_switch_state()
> > > > in all related drivers, but this is just a more or less mechanical change.
> > > > 
> > > > I'll prepare a patch series for that.
> > > 
> > > In the end the result is more like your patch, avoiding the need to modify
> > > all drivers.
> > > 
> > > I just added my idea to your patch and modified some of your code to be more
> > > simple. I _think_ I have covered all possible scenarios now, resulting in
> > > the need to keep the backend id check in case the backend died during the
> > > early init phase of the device.
> > > 
> > > Could you please verify the attached patch is working for you?
> > 
> > Thanks for the patch!
> > 
> > I ran it through relevant tests, and I got inconsistent results.
> > Specifically, sometimes, the domU hangs (actually, just one vCPU spins
> > inside xenwatch thread). Last console messages are:
> > 
> >      systemd[626]: Starting dconf.service - User preferences database...
> >      gnome-keyring-daemon[975]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
> >      gnome-keyring-daemon[975]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
> >      xen vif-0: xenbus: state reset occurred, reconnecting
> >      gnome-keyring-daemon[974]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
> >      gnome-keyring-daemon[976]: ␛[0;1;39mcouldn't access control socket: /run/user/1000/keyring/control: No such file or directory␛[0m
> >      gnome-keyring-daemon[976]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
> >      gnome-keyring-daemon[974]: ␛[0;1;38:5:185mdiscover_other_daemon: 0␛[0m
> >      xen vif-0: xenbus: state reset occurred, reconnecting
> >      systemd[626]: Started dconf.service - User preferences database.
> >      xen_netfront: Initialising Xen virtual ethernet driver
> >      vif vif-0: xenbus: state reset occurred, reconnecting
> > 
> > And the call trace of the spinning xenwatch thread is:
> > 
> >      task:xenwatch        state:D stack:0     pid:64    tgid:64    ppid:2      task_flags:0x288040 flags:0x00080000
> >      Call Trace:
> >       <TASK>
> >       __schedule+0x2f3/0x780
> >       schedule+0x27/0x80
> >       xs_wait_for_reply+0xab/0x1f0
> >       ? __pfx_autoremove_wake_function+0x10/0x10
> >       xs_talkv+0xec/0x200
> >       xs_single+0x4a/0x70
> >       xenbus_gather+0xe4/0x1a0
> >       xenbus_read_driver_state+0x42/0x70
> >       xennet_bus_close+0x113/0x2c0 [xen_netfront]
> >       ? __pfx_autoremove_wake_function+0x10/0x10
> >       xennet_remove+0x16/0x80 [xen_netfront]
> >       xenbus_dev_remove+0x71/0xf0
> >       device_release_driver_internal+0x19c/0x200
> >       bus_remove_device+0xc6/0x130
> >       device_del+0x160/0x3e0
> >       device_unregister+0x17/0x60
> >       xenbus_dev_changed.cold+0x5e/0x6b
> >       ? __pfx_xenwatch_thread+0x10/0x10
> >       xenwatch_thread+0x92/0x1c0
> >       ? __pfx_autoremove_wake_function+0x10/0x10
> >       kthread+0xfc/0x240
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork+0xf5/0x110
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork_asm+0x1a/0x30
> >       </TASK>
> >      task:xenbus          state:S stack:0     pid:63    tgid:63    ppid:2      task_flags:0x208040 flags:0x00080000
> >      Call Trace:
> >       <TASK>
> >       __schedule+0x2f3/0x780
> >       ? __pfx_xenbus_thread+0x10/0x10
> >       schedule+0x27/0x80
> >       xenbus_thread+0x1a8/0x200
> >       ? __pfx_autoremove_wake_function+0x10/0x10
> >       kthread+0xfc/0x240
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork+0xf5/0x110
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork_asm+0x1a/0x30
> >       </TASK>
> > 
> > (technically, `top` says it's the xenbus thread spinning, but it looks
> > like the actual issue is in xenwatch one)
> > 
> > Note that other xenwatch actions in this domU are not executed, for
> > example `xl sysrq` does nothing. Not surprising, given xenwatch thread
> > is busy... Fortunately, it blocks only a single vCPU, so I'm able to
> > interact with the domU over console (to get the above traces).
> > 
> > It isn't a reliable failure, in this test run it failed once, out of 4
> > related tests.
> > 
> > The specific test is: https://github.com/QubesOS/qubes-core-admin/blob/main/qubes/tests/integ/network.py#L234
> > In short:
> > 1. Start a domU
> > 2. Pause it
> > 3. Attach network (backend is != dom0)
> > 4. Unpause
> > 
> > TBH, I'm not sure why the "state reset occurred" message is triggered at
> > all, I think it shouldn't be in this case...
> > 
> 
> Second try.

This time it's green: https://openqa.qubes-os.org/tests/166851 :)
You can treat it as T-by tag.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-02-09 20:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-02  3:20 [PATCH] xen/xenbus: better handle backend crash Marek Marczykowski-Górecki
2025-11-13 18:43 ` Marek Marczykowski-Górecki
2025-11-17 11:06 ` Jürgen Groß
2026-01-26  7:08   ` Jürgen Groß
2026-01-29  7:02     ` Jürgen Groß
2026-02-06 16:57       ` Marek Marczykowski-Górecki
2026-02-09  7:30         ` Jürgen Groß
2026-02-09  9:02         ` Jürgen Groß
2026-02-09 20:46           ` Marek Marczykowski-Górecki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox