xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] libxl: do not fail device removal if backend domain is gone
@ 2018-02-23 20:00 Marek Marczykowski-Górecki
  2018-02-26 10:32 ` Roger Pau Monné
  2018-02-26 10:35 ` Wei Liu
  0 siblings, 2 replies; 3+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-02-23 20:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Marek Marczykowski-Górecki

Backend domain may be independently destroyed - there is no
synchronization of libxl structures (including /libxl tree) elsewhere.
Backend might also remove the device info from its backend xenstore
subtree on its own.

We have various cases (not comprehensive list):

 - both frontend and backend operational: after setting
   be/state=XenbusStateClosing backend wait for frontend confirmation
   and respond with be/state=XenbusStateClosed; then libxl in dom0
   remove frontend entries and libxl in backend domain (which may be the
   same) remove backend entries
 - unresponsive backend/frontend: after a timeout, force=1 is used to remove
   frontend entries, instead of just setting
   be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
   If that timeout too, remove both frontend and backend entries
 - backend gone, with this patch: no place for setting/waiting on
   be/state - go directly to removing frontend entries, without waiting
   for be/state=XenbusStateClosed (this is the difference vs force=1)

Without this patch the end result is similar, both frontend and backend
entries are removed, but in case of backend gone:
 - libxl waits for be/state=XenbusStateClosed (and obviously timeout)
 - return value from the function signal an error, which for example
   confuse libvirt - it thinks the device remove failed, so is still
   there

If such situation is detected, do not fail the removal, but finish the
cleanup of the frontend side and return 0.

This is just workaround, the real fix should watch when the device
backend is removed (including backend domain destruction) and remove
frontend at that time. And report such event to higher layer code, so
for example libvirt could synchronize its state.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

---
Changes in v2:
 - change WARN to INFO
 - detailed explanation in commit message
---
 tools/libxl/libxl_device.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 1b796bd392..c60cafe774 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -997,6 +997,13 @@ void libxl__initiate_device_generic_remove(libxl__egc *egc,
             goto out;
         }
 
+        /* if state_path is empty, assume backend is gone (backend domain
+         * shutdown?), cleanup frontend only; rc=0 */
+        if (!state) {
+            LOG(INFO, "backend %s already removed, cleanup frontend only", be_path);
+            goto out;
+        }
+
         rc = libxl__xs_write_checked(gc, t, online_path, "0");
         if (rc)
             goto out;
-- 
2.13.6


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] libxl: do not fail device removal if backend domain is gone
  2018-02-23 20:00 [PATCH v2] libxl: do not fail device removal if backend domain is gone Marek Marczykowski-Górecki
@ 2018-02-26 10:32 ` Roger Pau Monné
  2018-02-26 10:35 ` Wei Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Roger Pau Monné @ 2018-02-26 10:32 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: Ian Jackson, Wei Liu, xen-devel

On Fri, Feb 23, 2018 at 09:00:41PM +0100, Marek Marczykowski-Górecki wrote:
> Backend domain may be independently destroyed - there is no
> synchronization of libxl structures (including /libxl tree) elsewhere.
> Backend might also remove the device info from its backend xenstore
> subtree on its own.
> 
> We have various cases (not comprehensive list):
> 
>  - both frontend and backend operational: after setting
>    be/state=XenbusStateClosing backend wait for frontend confirmation
>    and respond with be/state=XenbusStateClosed; then libxl in dom0
>    remove frontend entries and libxl in backend domain (which may be the
>    same) remove backend entries
>  - unresponsive backend/frontend: after a timeout, force=1 is used to remove
>    frontend entries, instead of just setting
>    be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
>    If that timeout too, remove both frontend and backend entries
>  - backend gone, with this patch: no place for setting/waiting on
>    be/state - go directly to removing frontend entries, without waiting
>    for be/state=XenbusStateClosed (this is the difference vs force=1)
> 
> Without this patch the end result is similar, both frontend and backend
> entries are removed, but in case of backend gone:
>  - libxl waits for be/state=XenbusStateClosed (and obviously timeout)
>  - return value from the function signal an error, which for example
>    confuse libvirt - it thinks the device remove failed, so is still
>    there
> 
> If such situation is detected, do not fail the removal, but finish the
> cleanup of the frontend side and return 0.
> 
> This is just workaround, the real fix should watch when the device
> backend is removed (including backend domain destruction) and remove
> frontend at that time. And report such event to higher layer code, so
> for example libvirt could synchronize its state.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

You seem to have dropped my RB:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] libxl: do not fail device removal if backend domain is gone
  2018-02-23 20:00 [PATCH v2] libxl: do not fail device removal if backend domain is gone Marek Marczykowski-Górecki
  2018-02-26 10:32 ` Roger Pau Monné
@ 2018-02-26 10:35 ` Wei Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Wei Liu @ 2018-02-26 10:35 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: Wei Liu, Ian Jackson, xen-devel

On Fri, Feb 23, 2018 at 09:00:41PM +0100, Marek Marczykowski-Górecki wrote:
> Backend domain may be independently destroyed - there is no
> synchronization of libxl structures (including /libxl tree) elsewhere.
> Backend might also remove the device info from its backend xenstore
> subtree on its own.
> 
> We have various cases (not comprehensive list):
> 
>  - both frontend and backend operational: after setting
>    be/state=XenbusStateClosing backend wait for frontend confirmation
>    and respond with be/state=XenbusStateClosed; then libxl in dom0
>    remove frontend entries and libxl in backend domain (which may be the
>    same) remove backend entries
>  - unresponsive backend/frontend: after a timeout, force=1 is used to remove
>    frontend entries, instead of just setting
>    be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
>    If that timeout too, remove both frontend and backend entries
>  - backend gone, with this patch: no place for setting/waiting on
>    be/state - go directly to removing frontend entries, without waiting
>    for be/state=XenbusStateClosed (this is the difference vs force=1)
> 
> Without this patch the end result is similar, both frontend and backend
> entries are removed, but in case of backend gone:
>  - libxl waits for be/state=XenbusStateClosed (and obviously timeout)
>  - return value from the function signal an error, which for example
>    confuse libvirt - it thinks the device remove failed, so is still
>    there
> 
> If such situation is detected, do not fail the removal, but finish the
> cleanup of the frontend side and return 0.
> 
> This is just workaround, the real fix should watch when the device
> backend is removed (including backend domain destruction) and remove
> frontend at that time. And report such event to higher layer code, so
> for example libvirt could synchronize its state.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-02-26 10:35 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-23 20:00 [PATCH v2] libxl: do not fail device removal if backend domain is gone Marek Marczykowski-Górecki
2018-02-26 10:32 ` Roger Pau Monné
2018-02-26 10:35 ` Wei Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).