* [PATCH v2] libxl: do not fail device removal if backend domain is gone
@ 2018-02-23 20:00 Marek Marczykowski-Górecki
2018-02-26 10:32 ` Roger Pau Monné
2018-02-26 10:35 ` Wei Liu
0 siblings, 2 replies; 3+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-02-23 20:00 UTC (permalink / raw)
To: xen-devel; +Cc: Wei Liu, Ian Jackson, Marek Marczykowski-Górecki
Backend domain may be independently destroyed - there is no
synchronization of libxl structures (including /libxl tree) elsewhere.
Backend might also remove the device info from its backend xenstore
subtree on its own.
We have various cases (not comprehensive list):
- both frontend and backend operational: after setting
be/state=XenbusStateClosing backend wait for frontend confirmation
and respond with be/state=XenbusStateClosed; then libxl in dom0
remove frontend entries and libxl in backend domain (which may be the
same) remove backend entries
- unresponsive backend/frontend: after a timeout, force=1 is used to remove
frontend entries, instead of just setting
be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
If that timeout too, remove both frontend and backend entries
- backend gone, with this patch: no place for setting/waiting on
be/state - go directly to removing frontend entries, without waiting
for be/state=XenbusStateClosed (this is the difference vs force=1)
Without this patch the end result is similar, both frontend and backend
entries are removed, but in case of backend gone:
- libxl waits for be/state=XenbusStateClosed (and obviously timeout)
- return value from the function signal an error, which for example
confuse libvirt - it thinks the device remove failed, so is still
there
If such situation is detected, do not fail the removal, but finish the
cleanup of the frontend side and return 0.
This is just workaround, the real fix should watch when the device
backend is removed (including backend domain destruction) and remove
frontend at that time. And report such event to higher layer code, so
for example libvirt could synchronize its state.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
Changes in v2:
- change WARN to INFO
- detailed explanation in commit message
---
tools/libxl/libxl_device.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 1b796bd392..c60cafe774 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -997,6 +997,13 @@ void libxl__initiate_device_generic_remove(libxl__egc *egc,
goto out;
}
+ /* if state_path is empty, assume backend is gone (backend domain
+ * shutdown?), cleanup frontend only; rc=0 */
+ if (!state) {
+ LOG(INFO, "backend %s already removed, cleanup frontend only", be_path);
+ goto out;
+ }
+
rc = libxl__xs_write_checked(gc, t, online_path, "0");
if (rc)
goto out;
--
2.13.6
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2] libxl: do not fail device removal if backend domain is gone
2018-02-23 20:00 [PATCH v2] libxl: do not fail device removal if backend domain is gone Marek Marczykowski-Górecki
@ 2018-02-26 10:32 ` Roger Pau Monné
2018-02-26 10:35 ` Wei Liu
1 sibling, 0 replies; 3+ messages in thread
From: Roger Pau Monné @ 2018-02-26 10:32 UTC (permalink / raw)
To: Marek Marczykowski-Górecki; +Cc: Ian Jackson, Wei Liu, xen-devel
On Fri, Feb 23, 2018 at 09:00:41PM +0100, Marek Marczykowski-Górecki wrote:
> Backend domain may be independently destroyed - there is no
> synchronization of libxl structures (including /libxl tree) elsewhere.
> Backend might also remove the device info from its backend xenstore
> subtree on its own.
>
> We have various cases (not comprehensive list):
>
> - both frontend and backend operational: after setting
> be/state=XenbusStateClosing backend wait for frontend confirmation
> and respond with be/state=XenbusStateClosed; then libxl in dom0
> remove frontend entries and libxl in backend domain (which may be the
> same) remove backend entries
> - unresponsive backend/frontend: after a timeout, force=1 is used to remove
> frontend entries, instead of just setting
> be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
> If that timeout too, remove both frontend and backend entries
> - backend gone, with this patch: no place for setting/waiting on
> be/state - go directly to removing frontend entries, without waiting
> for be/state=XenbusStateClosed (this is the difference vs force=1)
>
> Without this patch the end result is similar, both frontend and backend
> entries are removed, but in case of backend gone:
> - libxl waits for be/state=XenbusStateClosed (and obviously timeout)
> - return value from the function signal an error, which for example
> confuse libvirt - it thinks the device remove failed, so is still
> there
>
> If such situation is detected, do not fail the removal, but finish the
> cleanup of the frontend side and return 0.
>
> This is just workaround, the real fix should watch when the device
> backend is removed (including backend domain destruction) and remove
> frontend at that time. And report such event to higher layer code, so
> for example libvirt could synchronize its state.
>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
You seem to have dropped my RB:
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] libxl: do not fail device removal if backend domain is gone
2018-02-23 20:00 [PATCH v2] libxl: do not fail device removal if backend domain is gone Marek Marczykowski-Górecki
2018-02-26 10:32 ` Roger Pau Monné
@ 2018-02-26 10:35 ` Wei Liu
1 sibling, 0 replies; 3+ messages in thread
From: Wei Liu @ 2018-02-26 10:35 UTC (permalink / raw)
To: Marek Marczykowski-Górecki; +Cc: Wei Liu, Ian Jackson, xen-devel
On Fri, Feb 23, 2018 at 09:00:41PM +0100, Marek Marczykowski-Górecki wrote:
> Backend domain may be independently destroyed - there is no
> synchronization of libxl structures (including /libxl tree) elsewhere.
> Backend might also remove the device info from its backend xenstore
> subtree on its own.
>
> We have various cases (not comprehensive list):
>
> - both frontend and backend operational: after setting
> be/state=XenbusStateClosing backend wait for frontend confirmation
> and respond with be/state=XenbusStateClosed; then libxl in dom0
> remove frontend entries and libxl in backend domain (which may be the
> same) remove backend entries
> - unresponsive backend/frontend: after a timeout, force=1 is used to remove
> frontend entries, instead of just setting
> be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
> If that timeout too, remove both frontend and backend entries
> - backend gone, with this patch: no place for setting/waiting on
> be/state - go directly to removing frontend entries, without waiting
> for be/state=XenbusStateClosed (this is the difference vs force=1)
>
> Without this patch the end result is similar, both frontend and backend
> entries are removed, but in case of backend gone:
> - libxl waits for be/state=XenbusStateClosed (and obviously timeout)
> - return value from the function signal an error, which for example
> confuse libvirt - it thinks the device remove failed, so is still
> there
>
> If such situation is detected, do not fail the removal, but finish the
> cleanup of the frontend side and return 0.
>
> This is just workaround, the real fix should watch when the device
> backend is removed (including backend domain destruction) and remove
> frontend at that time. And report such event to higher layer code, so
> for example libvirt could synchronize its state.
>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2018-02-26 10:35 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-23 20:00 [PATCH v2] libxl: do not fail device removal if backend domain is gone Marek Marczykowski-Górecki
2018-02-26 10:32 ` Roger Pau Monné
2018-02-26 10:35 ` Wei Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).