All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/5] remove libxenctrl usage from xenstored
@ 2024-12-17 14:22 Juergen Gross
  2024-12-17 14:22 ` [PATCH v5 1/5] tools: add a dedicated header file for barrier definitions Juergen Gross
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Juergen Gross @ 2024-12-17 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Anthony PERARD, Andrew Cooper, Michal Orzel,
	Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, Daniel P. Smith, Samuel Thibault

Xenstored is using libxenctrl for only one purpose: to get information
about state of domains.

This patch series is removing that dependency by introducing a new
stable interface which can be used by xenstored instead.

There was a RFC series sent out 3 years ago, which I have taken as a
base and by addressing all comments from back then.

The main differences since that RFC series are:

- Instead of introducing an new main hypercall for a stable management
  interface I have just added a new domctl sub-op, as requested in 2021.

- I have added a new library libxenmanage for easy use of the new
  stable hypervisor interface. Main motivation for adding the library
  was the recent attempt to decouple oxenstored from the Xen git tree.
  By using the new library, oxenstored could benefit in the same way as
  xenstored from the new interface: it would be possible to rely on
  stable libraries only.

- Mini-OS has gained some more config options recently, so it was rather
  easy to make xenstore[pvh]-stubdom independent from libxenctrl, too.

- By moving the CPU barrier definitions out of xenctrl.h into a new
  dedicated header xenstored code no longer needs to #include xenctrl.h,
  thus removing any xenctrl reference from xenstored code.

Please note that the last patch can be committed only after the related
Mini-OS patch "config: add support for libxenmanage" has gone in AND
the Mini-OS commit-id has been updated in Config.mk accordingly! The
Mini-OS patch has been Acked already, so it can go in as soon as patch
4 of this series (the one introducing libxenmanage) has been committed.

Changes in V2:
- new patch 1
- former patch 5 mover earlier, now patch 2 (can go in without the rest
  of the series)
- addressed comments

Changes in V3:
- addressed comments

Changes in V4:
- patches 1 and 3 of V3 dropped, as already committed
- addressed comments

Changes in V5:
- addressed comments

Juergen Gross (5):
  tools: add a dedicated header file for barrier definitions
  xen: add bitmap to indicate per-domain state changes
  xen: add new domctl get_changed_domain
  tools/libs: add a new libxenmanage library
  tools/xenstored: use new stable interface instead of libxenctrl

 stubdom/Makefile                       |   8 +-
 stubdom/mini-os.mk                     |   1 +
 tools/9pfsd/io.c                       |   5 +-
 tools/flask/policy/modules/dom0.te     |   2 +-
 tools/flask/policy/modules/xen.if      |   4 +-
 tools/flask/policy/modules/xenstore.te |   1 +
 tools/include/xen-barrier.h            |  39 ++++++
 tools/include/xenctrl.h                |  28 +----
 tools/include/xenmanage.h              |  92 ++++++++++++++
 tools/libs/Makefile                    |   1 +
 tools/libs/ctrl/Makefile               |   2 +-
 tools/libs/manage/Makefile             |  10 ++
 tools/libs/manage/Makefile.common      |   3 +
 tools/libs/manage/core.c               | 168 +++++++++++++++++++++++++
 tools/libs/manage/libxenmanage.map     |   8 ++
 tools/libs/uselibs.mk                  |   2 +
 tools/xenstored/Makefile               |   2 +-
 tools/xenstored/Makefile.common        |   2 +-
 tools/xenstored/core.h                 |   1 -
 tools/xenstored/domain.c               |  52 +++-----
 tools/xenstored/lu.c                   |   1 +
 tools/xenstored/lu_daemon.c            |   1 +
 xen/common/domain.c                    | 127 +++++++++++++++++++
 xen/common/domctl.c                    |  18 ++-
 xen/common/event_channel.c             |  25 +++-
 xen/include/public/domctl.h            |  26 ++++
 xen/include/xen/event.h                |   7 ++
 xen/include/xen/sched.h                |   5 +
 xen/include/xsm/dummy.h                |   8 ++
 xen/include/xsm/xsm.h                  |   6 +
 xen/xsm/dummy.c                        |   1 +
 xen/xsm/flask/hooks.c                  |   7 ++
 xen/xsm/flask/policy/access_vectors    |   2 +
 33 files changed, 591 insertions(+), 74 deletions(-)
 create mode 100644 tools/include/xen-barrier.h
 create mode 100644 tools/include/xenmanage.h
 create mode 100644 tools/libs/manage/Makefile
 create mode 100644 tools/libs/manage/Makefile.common
 create mode 100644 tools/libs/manage/core.c
 create mode 100644 tools/libs/manage/libxenmanage.map

-- 
2.43.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 1/5] tools: add a dedicated header file for barrier definitions
  2024-12-17 14:22 [PATCH v5 0/5] remove libxenctrl usage from xenstored Juergen Gross
@ 2024-12-17 14:22 ` Juergen Gross
  2024-12-17 14:22 ` [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes Juergen Gross
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2024-12-17 14:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Anthony PERARD

Instead of having to include xenctrl.h for getting definitions of cpu
barriers, add a dedicated header for that purpose.

Switch the xen-9pfsd daemon to use the new header instead of xenctrl.h.

This is in preparation of making Xenstore independent from libxenctrl.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
---
V1:
- new patch
V2:
- use SPDX
- modify guard define (Anthony PERARD)
- moved earlier in the series
V4:
- use LGPL-2.1-only SPDX identifier (Anthony PERARD)
---
 tools/9pfsd/io.c            |  5 ++++-
 tools/include/xen-barrier.h | 39 +++++++++++++++++++++++++++++++++++++
 tools/include/xenctrl.h     | 28 +-------------------------
 tools/libs/ctrl/Makefile    |  2 +-
 4 files changed, 45 insertions(+), 29 deletions(-)
 create mode 100644 tools/include/xen-barrier.h

diff --git a/tools/9pfsd/io.c b/tools/9pfsd/io.c
index 468e0241f5..14cfcaf568 100644
--- a/tools/9pfsd/io.c
+++ b/tools/9pfsd/io.c
@@ -13,15 +13,18 @@
 
 #include <assert.h>
 #include <errno.h>
+#include <stdarg.h>
 #include <stdbool.h>
+#include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <syslog.h>
+#include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <dirent.h>
 #include <fcntl.h>
-#include <xenctrl.h>           /* For cpu barriers. */
+#include <xen-barrier.h>
 #include <xen-tools/common-macros.h>
 
 #include "xen-9pfsd.h"
diff --git a/tools/include/xen-barrier.h b/tools/include/xen-barrier.h
new file mode 100644
index 0000000000..5c22ee112c
--- /dev/null
+++ b/tools/include/xen-barrier.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: LGPL-2.1-only */
+/******************************************************************************
+ * xen-barrier.h
+ *
+ * Definition of CPU barriers, part of libxenctrl.
+ *
+ * Copyright (c) 2003-2004, K A Fraser.
+ */
+
+#ifndef XEN_BARRIER_H
+#define XEN_BARRIER_H
+
+/*
+ *  DEFINITIONS FOR CPU BARRIERS
+ */
+
+#define xen_barrier() asm volatile ( "" : : : "memory")
+
+#if defined(__i386__)
+#define xen_mb()  asm volatile ( "lock addl $0, -4(%%esp)" ::: "memory" )
+#define xen_rmb() xen_barrier()
+#define xen_wmb() xen_barrier()
+#elif defined(__x86_64__)
+#define xen_mb()  asm volatile ( "lock addl $0, -32(%%rsp)" ::: "memory" )
+#define xen_rmb() xen_barrier()
+#define xen_wmb() xen_barrier()
+#elif defined(__arm__)
+#define xen_mb()   asm volatile ("dmb" : : : "memory")
+#define xen_rmb()  asm volatile ("dmb" : : : "memory")
+#define xen_wmb()  asm volatile ("dmb" : : : "memory")
+#elif defined(__aarch64__)
+#define xen_mb()   asm volatile ("dmb sy" : : : "memory")
+#define xen_rmb()  asm volatile ("dmb sy" : : : "memory")
+#define xen_wmb()  asm volatile ("dmb sy" : : : "memory")
+#else
+#error "Define barriers"
+#endif
+
+#endif /* XEN_BARRIER_H */
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 29617585c5..ea57e9dbb9 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -48,6 +48,7 @@
 #include <xen/platform.h>
 
 #include "xentoollog.h"
+#include "xen-barrier.h"
 
 #if defined(__i386__) || defined(__x86_64__)
 #include <xen/foreign/x86_32.h>
@@ -61,33 +62,6 @@
 
 #define INVALID_MFN  (~0UL)
 
-/*
- *  DEFINITIONS FOR CPU BARRIERS
- */
-
-#define xen_barrier() asm volatile ( "" : : : "memory")
-
-#if defined(__i386__)
-#define xen_mb()  asm volatile ( "lock addl $0, -4(%%esp)" ::: "memory" )
-#define xen_rmb() xen_barrier()
-#define xen_wmb() xen_barrier()
-#elif defined(__x86_64__)
-#define xen_mb()  asm volatile ( "lock addl $0, -32(%%rsp)" ::: "memory" )
-#define xen_rmb() xen_barrier()
-#define xen_wmb() xen_barrier()
-#elif defined(__arm__)
-#define xen_mb()   asm volatile ("dmb" : : : "memory")
-#define xen_rmb()  asm volatile ("dmb" : : : "memory")
-#define xen_wmb()  asm volatile ("dmb" : : : "memory")
-#elif defined(__aarch64__)
-#define xen_mb()   asm volatile ("dmb sy" : : : "memory")
-#define xen_rmb()  asm volatile ("dmb sy" : : : "memory")
-#define xen_wmb()  asm volatile ("dmb sy" : : : "memory")
-#else
-#error "Define barriers"
-#endif
-
-
 #define XENCTRL_HAS_XC_INTERFACE 1
 /* In Xen 4.0 and earlier, xc_interface_open and xc_evtchn_open would
  * both return ints being the file descriptor.  In 4.1 and later, they
diff --git a/tools/libs/ctrl/Makefile b/tools/libs/ctrl/Makefile
index 5fe0bfad0c..acce8639d3 100644
--- a/tools/libs/ctrl/Makefile
+++ b/tools/libs/ctrl/Makefile
@@ -3,7 +3,7 @@ include $(XEN_ROOT)/tools/Rules.mk
 
 include Makefile.common
 
-LIBHEADER := xenctrl.h xenctrl_compat.h
+LIBHEADER := xenctrl.h xenctrl_compat.h xen-barrier.h
 PKG_CONFIG_FILE := xencontrol.pc
 PKG_CONFIG_NAME := Xencontrol
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
  2024-12-17 14:22 [PATCH v5 0/5] remove libxenctrl usage from xenstored Juergen Gross
  2024-12-17 14:22 ` [PATCH v5 1/5] tools: add a dedicated header file for barrier definitions Juergen Gross
@ 2024-12-17 14:22 ` Juergen Gross
  2024-12-17 15:19   ` Jan Beulich
  2024-12-17 14:22 ` [PATCH v5 3/5] xen: add new domctl get_changed_domain Juergen Gross
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2024-12-17 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Add a bitmap with one bit per possible domid indicating the respective
domain has changed its state (created, deleted, dying, crashed,
shutdown).

Registering the VIRQ_DOM_EXC event will result in setting the bits for
all existing domains and resetting all other bits.

As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
event, it is meant to be used only by a single consumer in the system,
just like the VIRQ_DOM_EXC event.

Resetting a bit will be done in a future patch.

This information is needed for Xenstore to keep track of all domains.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- use DOMID_FIRST_RESERVED instead of DOMID_MASK + 1 (Jan Beulich)
- use const (Jan Beulich)
- move call of domain_reset_states() into evtchn_bind_virq() (Jan Beulich)
- dynamically allocate dom_state_changed bitmap (Jan Beulich)
V3:
- use xvzalloc_array() (Jan Beulich)
- don't rename existing label (Jan Beulich)
V4:
- add __read_mostly (Jan Beulich)
- use __set_biz() (Jan Beulich)
- fix error handling in evtchn_bind_virq() (Jan Beulich)
V5:
- domain_init_states() may be called only if evtchn_bind_virq() has been
  called validly (Jan Beulich)
---
 xen/common/domain.c        | 60 ++++++++++++++++++++++++++++++++++++++
 xen/common/event_channel.c | 16 ++++++++++
 xen/include/xen/sched.h    |  3 ++
 3 files changed, 79 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index e33a0a5a21..87633b1f7b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -34,6 +34,7 @@
 #include <xen/xenoprof.h>
 #include <xen/irq.h>
 #include <xen/argo.h>
+#include <xen/xvmalloc.h>
 #include <asm/p2m.h>
 #include <asm/processor.h>
 #include <public/sched.h>
@@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
 
 bool __read_mostly vpmu_is_available;
 
+static DEFINE_SPINLOCK(dom_state_changed_lock);
+static unsigned long *__read_mostly dom_state_changed;
+
+int domain_init_states(void)
+{
+    const struct domain *d;
+    int rc = -ENOMEM;
+
+    spin_lock(&dom_state_changed_lock);
+
+    if ( dom_state_changed )
+        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
+    else
+    {
+        dom_state_changed = xvzalloc_array(unsigned long,
+                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));
+        if ( !dom_state_changed )
+            goto unlock;
+    }
+
+    rcu_read_lock(&domlist_read_lock);
+
+    for_each_domain ( d )
+        __set_bit(d->domain_id, dom_state_changed);
+
+    rcu_read_unlock(&domlist_read_lock);
+
+    rc = 0;
+
+ unlock:
+    spin_unlock(&dom_state_changed_lock);
+
+    return rc;
+}
+
+void domain_deinit_states(void)
+{
+    spin_lock(&dom_state_changed_lock);
+
+    XVFREE(dom_state_changed);
+
+    spin_unlock(&dom_state_changed_lock);
+}
+
+static void domain_changed_state(const struct domain *d)
+{
+    spin_lock(&dom_state_changed_lock);
+
+    if ( dom_state_changed )
+        __set_bit(d->domain_id, dom_state_changed);
+
+    spin_unlock(&dom_state_changed_lock);
+}
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
     struct vcpu *v;
@@ -152,6 +207,7 @@ static void __domain_finalise_shutdown(struct domain *d)
             return;
 
     d->is_shut_down = 1;
+    domain_changed_state(d);
     if ( (d->shutdown_code == SHUTDOWN_suspend) && d->suspend_evtchn )
         evtchn_send(d, d->suspend_evtchn);
     else
@@ -839,6 +895,7 @@ struct domain *domain_create(domid_t domid,
      */
     domlist_insert(d);
 
+    domain_changed_state(d);
     memcpy(d->handle, config->handle, sizeof(d->handle));
 
     return d;
@@ -1104,6 +1161,7 @@ int domain_kill(struct domain *d)
         /* Mem event cleanup has to go here because the rings 
          * have to be put before we call put_domain. */
         vm_event_cleanup(d);
+        domain_changed_state(d);
         put_domain(d);
         send_global_virq(VIRQ_DOM_EXC);
         /* fallthrough */
@@ -1293,6 +1351,8 @@ static void cf_check complete_domain_destroy(struct rcu_head *head)
 
     xfree(d->vcpu);
 
+    domain_changed_state(d);
+
     _domain_destroy(d);
 
     send_global_virq(VIRQ_DOM_EXC);
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 8db2ca4ba2..aa947efba7 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -469,6 +469,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
     struct domain *d = current->domain;
     int            virq = bind->virq, vcpu = bind->vcpu;
     int            rc = 0;
+    bool           deinit_if_err = false;
 
     if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
         return -EINVAL;
@@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
         goto out;
     }
 
+    if ( virq == VIRQ_DOM_EXC )
+    {
+        rc = domain_init_states();
+        if ( rc )
+            goto out;
+
+        deinit_if_err = true;
+    }
+
     port = rc = evtchn_get_port(d, port);
     if ( rc < 0 )
     {
@@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
  out:
     write_unlock(&d->event_lock);
 
+    if ( rc && deinit_if_err )
+        domain_deinit_states();
+
     return rc;
 }
 
@@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
         struct vcpu *v;
         unsigned long flags;
 
+        if ( chn1->u.virq == VIRQ_DOM_EXC )
+            domain_deinit_states();
+
         v = d1->vcpu[virq_is_global(chn1->u.virq) ? 0 : chn1->notify_vcpu_id];
 
         write_lock_irqsave(&v->virq_lock, flags);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 711668e028..16684bbaf9 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -800,6 +800,9 @@ void domain_resume(struct domain *d);
 
 int domain_soft_reset(struct domain *d, bool resuming);
 
+int domain_init_states(void);
+void domain_deinit_states(void);
+
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 3/5] xen: add new domctl get_changed_domain
  2024-12-17 14:22 [PATCH v5 0/5] remove libxenctrl usage from xenstored Juergen Gross
  2024-12-17 14:22 ` [PATCH v5 1/5] tools: add a dedicated header file for barrier definitions Juergen Gross
  2024-12-17 14:22 ` [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes Juergen Gross
@ 2024-12-17 14:22 ` Juergen Gross
  2024-12-17 14:41   ` Jan Beulich
  2024-12-18  1:20   ` Daniel P. Smith
  2024-12-17 14:22 ` [PATCH v5 4/5] tools/libs: add a new libxenmanage library Juergen Gross
  2024-12-17 14:22 ` [PATCH v5 5/5] tools/xenstored: use new stable interface instead of libxenctrl Juergen Gross
  4 siblings, 2 replies; 16+ messages in thread
From: Juergen Gross @ 2024-12-17 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Daniel P. Smith, Anthony PERARD, Andrew Cooper,
	Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Add a new domctl sub-function to get data of a domain having changed
state (this is needed by Xenstore).

The returned state just contains the domid, the domain unique id,
and some flags (existing, shutdown, dying).

In order to enable Xenstore stubdom being built for multiple Xen
versions, make this domctl stable.  For stable domctls the
interface_version is always 0.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1:
- use a domctl subop for the new interface (Jan Beulich)
V2:
- fix XSM hooks (Daniel P. Smith)
- remove versioning of stable sub-ops (Jan Beulich)
- use domctl.domain for retuning domid of a changed domain (Jan Beulich)
- simplify locking in get_domain_state() (Jan Beulich)
- undo stray change in event_channel.c (Jan Beulich)
V3:
- have disjunct states "dying" and "dead" (Jan Beulich)
- check padding fields to be 0 (Jan Beulich)
- drop memset() (Jan Beulich)
V4:
- add locking in get_domain_state() (Jan Beulich)
- only allow querying domain having changed state by domain receiving
  VIRQ_DOM_EXC events (Jan Beulich)
V5:
- use memset() (Jan Beulich)
---
 tools/flask/policy/modules/dom0.te     |  2 +-
 tools/flask/policy/modules/xen.if      |  4 +-
 tools/flask/policy/modules/xenstore.te |  1 +
 xen/common/domain.c                    | 67 ++++++++++++++++++++++++++
 xen/common/domctl.c                    | 18 ++++++-
 xen/common/event_channel.c             |  9 +++-
 xen/include/public/domctl.h            | 26 ++++++++++
 xen/include/xen/event.h                |  7 +++
 xen/include/xen/sched.h                |  2 +
 xen/include/xsm/dummy.h                |  8 +++
 xen/include/xsm/xsm.h                  |  6 +++
 xen/xsm/dummy.c                        |  1 +
 xen/xsm/flask/hooks.c                  |  7 +++
 xen/xsm/flask/policy/access_vectors    |  2 +
 14 files changed, 155 insertions(+), 5 deletions(-)

diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index 16b8c9646d..6043c01b12 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -40,7 +40,7 @@ allow dom0_t dom0_t:domain {
 };
 allow dom0_t dom0_t:domain2 {
 	set_cpu_policy gettsc settsc setscheduler set_vnumainfo
-	get_vnumainfo psr_cmt_op psr_alloc get_cpu_policy
+	get_vnumainfo psr_cmt_op psr_alloc get_cpu_policy get_domain_state
 };
 allow dom0_t dom0_t:resource { add remove };
 
diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index 11c1562aa5..2e06f3ed94 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -54,7 +54,7 @@ define(`create_domain_common', `
 	allow $1 $2:domain2 { set_cpu_policy settsc setscheduler setclaim
 			set_vnumainfo get_vnumainfo cacheflush
 			psr_cmt_op psr_alloc soft_reset
-			resource_map get_cpu_policy };
+			resource_map get_cpu_policy get_domain_state };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
 	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
@@ -94,7 +94,7 @@ define(`manage_domain', `
 			getaddrsize pause unpause trigger shutdown destroy
 			setaffinity setdomainmaxmem getscheduler resume
 			setpodtarget getpodtarget getpagingmempool setpagingmempool };
-    allow $1 $2:domain2 set_vnumainfo;
+    allow $1 $2:domain2 { set_vnumainfo get_domain_state };
 ')
 
 # migrate_domain_out(priv, target)
diff --git a/tools/flask/policy/modules/xenstore.te b/tools/flask/policy/modules/xenstore.te
index 519566ab81..49de53ebe2 100644
--- a/tools/flask/policy/modules/xenstore.te
+++ b/tools/flask/policy/modules/xenstore.te
@@ -13,6 +13,7 @@ allow dom0_t xenstore_t:domain set_virq_handler;
 allow xenstore_t xen_t:xen writeconsole;
 # Xenstore queries domaininfo on all domains
 allow xenstore_t domain_type:domain getdomaininfo;
+allow xenstore_t domain_type:domain2 get_domain_state;
 
 # As a shortcut, the following 3 rules are used instead of adding a domain_comms
 # rule between xenstore_t and every domain type that talks to xenstore
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 87633b1f7b..1580754b53 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -193,6 +193,73 @@ static void domain_changed_state(const struct domain *d)
     spin_unlock(&dom_state_changed_lock);
 }
 
+static void set_domain_state_info(struct xen_domctl_get_domain_state *info,
+                                  const struct domain *d)
+{
+    info->state = XEN_DOMCTL_GETDOMSTATE_STATE_EXIST;
+    if ( d->is_shut_down )
+        info->state |= XEN_DOMCTL_GETDOMSTATE_STATE_SHUTDOWN;
+    if ( d->is_dying == DOMDYING_dying )
+        info->state |= XEN_DOMCTL_GETDOMSTATE_STATE_DYING;
+    if ( d->is_dying == DOMDYING_dead )
+        info->state |= XEN_DOMCTL_GETDOMSTATE_STATE_DEAD;
+    info->unique_id = d->unique_id;
+}
+
+int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
+                     domid_t *domid)
+{
+    unsigned int dom;
+    int rc = -ENOENT;
+
+    if ( info->pad0 || info->pad1 )
+        return -EINVAL;
+
+    if ( d )
+    {
+        set_domain_state_info(info, d);
+
+        return 0;
+    }
+
+    /*
+     * Only domain registered for VIRQ_DOM_EXC event is allowed to query
+     * domains having changed state.
+     */
+    if ( !domain_handles_global_virq(current->domain, VIRQ_DOM_EXC) )
+        return -EACCES;
+
+    spin_lock(&dom_state_changed_lock);
+
+    if ( dom_state_changed )
+    {
+        dom = find_first_bit(dom_state_changed, DOMID_MASK + 1);
+        if ( dom < DOMID_FIRST_RESERVED )
+        {
+            __clear_bit(dom, dom_state_changed);
+
+            *domid = dom;
+
+            d = rcu_lock_domain_by_id(dom);
+
+            if ( d )
+            {
+                set_domain_state_info(info, d);
+
+                rcu_unlock_domain(d);
+            }
+            else
+                memset(info, 0, sizeof(*info));
+
+            rc = 0;
+        }
+    }
+
+    spin_unlock(&dom_state_changed_lock);
+
+    return rc;
+}
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
     struct vcpu *v;
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 444e072fdc..802bd7596e 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -278,6 +278,11 @@ static struct vnuma_info *vnuma_init(const struct xen_domctl_vnuma *uinfo,
     return ERR_PTR(ret);
 }
 
+static bool is_stable_domctl(uint32_t cmd)
+{
+    return cmd == XEN_DOMCTL_get_domain_state;
+}
+
 long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
     long ret = 0;
@@ -288,7 +293,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     if ( copy_from_guest(op, u_domctl, 1) )
         return -EFAULT;
 
-    if ( op->interface_version != XEN_DOMCTL_INTERFACE_VERSION )
+    if ( op->interface_version !=
+         (is_stable_domctl(op->cmd) ? 0 : XEN_DOMCTL_INTERFACE_VERSION) )
         return -EACCES;
 
     switch ( op->cmd )
@@ -309,6 +315,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         fallthrough;
     case XEN_DOMCTL_test_assign_device:
     case XEN_DOMCTL_vm_event_op:
+    case XEN_DOMCTL_get_domain_state:
         if ( op->domain == DOMID_INVALID )
         {
             d = NULL;
@@ -866,6 +873,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                 __HYPERVISOR_domctl, "h", u_domctl);
         break;
 
+    case XEN_DOMCTL_get_domain_state:
+        ret = xsm_get_domain_state(XSM_XS_PRIV, d);
+        if ( ret )
+            break;
+
+        copyback = 1;
+        ret = get_domain_state(&op->u.get_domain_state, d, &op->domain);
+        break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index aa947efba7..9ff486d28e 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -992,6 +992,13 @@ void send_global_virq(uint32_t virq)
     send_guest_global_virq(global_virq_handlers[virq] ?: hardware_domain, virq);
 }
 
+bool domain_handles_global_virq(struct domain *d, uint32_t virq)
+{
+    ASSERT(virq_is_global(virq));
+
+    return global_virq_handlers[virq] == d;
+}
+
 int set_global_virq_handler(struct domain *d, uint32_t virq)
 {
     struct domain *old;
@@ -1001,7 +1008,7 @@ int set_global_virq_handler(struct domain *d, uint32_t virq)
     if (!virq_is_global(virq))
         return -EINVAL;
 
-    if (global_virq_handlers[virq] == d)
+    if ( domain_handles_global_virq(d, virq) )
         return 0;
 
     if (unlikely(!get_domain(d)))
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 353f831e40..13fe17ab0a 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -28,6 +28,7 @@
  * Pure additions (e.g. new sub-commands) or compatible interface changes
  * (e.g. adding semantics to 0-checked input fields or data to zeroed output
  * fields) don't require a change of the version.
+ * Stable ops are NOT covered by XEN_DOMCTL_INTERFACE_VERSION!
  *
  * Last version bump: Xen 4.19
  */
@@ -1236,7 +1237,30 @@ struct xen_domctl_dt_overlay {
 };
 #endif
 
+/*
+ * XEN_DOMCTL_get_domain_state (stable interface)
+ *
+ * Get state information of a domain.
+ *
+ * In case domain is DOMID_INVALID, return information about a domain having
+ * changed state and reset the state change indicator for that domain. This
+ * function is usable only by a domain having registered the VIRQ_DOM_EXC
+ * event (normally Xenstore).
+ * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
+ */
+struct xen_domctl_get_domain_state {
+    uint16_t state;
+#define XEN_DOMCTL_GETDOMSTATE_STATE_EXIST     0x0001  /* Domain is existing. */
+#define XEN_DOMCTL_GETDOMSTATE_STATE_SHUTDOWN  0x0002  /* Shutdown finished. */
+#define XEN_DOMCTL_GETDOMSTATE_STATE_DYING     0x0004  /* Domain dying. */
+#define XEN_DOMCTL_GETDOMSTATE_STATE_DEAD      0x0008  /* Domain dead. */
+    uint16_t pad0;           /* Must be 0 on input, returned as 0. */
+    uint32_t pad1;           /* Must be 0 on input, returned as 0. */
+    uint64_t unique_id;      /* Unique domain identifier. */
+};
+
 struct xen_domctl {
+/* Stable domctl ops: interface_version is required to be 0.  */
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
 #define XEN_DOMCTL_destroydomain                  2
@@ -1325,6 +1349,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_set_paging_mempool_size       86
 #define XEN_DOMCTL_dt_overlay                    87
 #define XEN_DOMCTL_gsi_permission                88
+#define XEN_DOMCTL_get_domain_state              89 /* stable interface */
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1391,6 +1416,7 @@ struct xen_domctl {
 #if defined(__arm__) || defined(__aarch64__)
         struct xen_domctl_dt_overlay        dt_overlay;
 #endif
+        struct xen_domctl_get_domain_state  get_domain_state;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 48b79f3d62..3a119505c6 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -36,6 +36,13 @@ void send_global_virq(uint32_t virq);
  */
 void send_guest_global_virq(struct domain *d, uint32_t virq);
 
+/*
+ * domain_handles_global_virq:
+ *  @d:        Suspected target domain for this VIRQ
+ *  @virq:     Virtual IRQ number (VIRQ_*), must be global
+ */
+bool domain_handles_global_virq(struct domain *d, uint32_t virq);
+
 /*
  * sent_global_virq_handler: Set a global VIRQ handler.
  *  @d:        New target domain for this VIRQ
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 16684bbaf9..5e9858c3ba 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -802,6 +802,8 @@ int domain_soft_reset(struct domain *d, bool resuming);
 
 int domain_init_states(void);
 void domain_deinit_states(void);
+int get_domain_state(struct xen_domctl_get_domain_state *info,
+                     struct domain *d, domid_t *domid);
 
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index f8a3c4b81e..a1a5bb60e9 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -173,6 +173,7 @@ static XSM_INLINE int cf_check xsm_domctl(
     case XEN_DOMCTL_unbind_pt_irq:
         return xsm_default_action(XSM_DM_PRIV, current->domain, d);
     case XEN_DOMCTL_getdomaininfo:
+    case XEN_DOMCTL_get_domain_state:
         return xsm_default_action(XSM_XS_PRIV, current->domain, d);
     default:
         return xsm_default_action(XSM_PRIV, current->domain, d);
@@ -815,6 +816,13 @@ static XSM_INLINE int cf_check xsm_argo_send(
 
 #endif /* CONFIG_ARGO */
 
+static XSM_INLINE int cf_check xsm_get_domain_state(
+    XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_XS_PRIV);
+    return xsm_default_action(action, current->domain, d);
+}
+
 #include <public/version.h>
 static XSM_INLINE int cf_check xsm_xen_version(XSM_DEFAULT_ARG uint32_t op)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 4dbff9d866..0689bf5c9f 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -200,6 +200,7 @@ struct xsm_ops {
     int (*argo_register_any_source)(const struct domain *d);
     int (*argo_send)(const struct domain *d, const struct domain *t);
 #endif
+    int (*get_domain_state)(struct domain *d);
 };
 
 #ifdef CONFIG_XSM
@@ -774,6 +775,11 @@ static inline int xsm_argo_send(const struct domain *d, const struct domain *t)
 
 #endif /* CONFIG_ARGO */
 
+static inline int xsm_get_domain_state(struct domain *d)
+{
+    return alternative_call(xsm_ops.get_domain_state, d);
+}
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index e6ffa948f7..ce6fbdc6c5 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -148,6 +148,7 @@ static const struct xsm_ops __initconst_cf_clobber dummy_ops = {
     .argo_register_any_source      = xsm_argo_register_any_source,
     .argo_send                     = xsm_argo_send,
 #endif
+    .get_domain_state              = xsm_get_domain_state,
 };
 
 void __init xsm_fixup_ops(struct xsm_ops *ops)
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index a79474ffe4..e110846ad9 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -688,6 +688,7 @@ static int cf_check flask_domctl(struct domain *d, unsigned int cmd,
     case XEN_DOMCTL_memory_mapping:
     case XEN_DOMCTL_set_target:
     case XEN_DOMCTL_vm_event_op:
+    case XEN_DOMCTL_get_domain_state:
 
     /* These have individual XSM hooks (arch/../domctl.c) */
     case XEN_DOMCTL_bind_pt_irq:
@@ -1856,6 +1857,11 @@ static int cf_check flask_argo_send(
 
 #endif
 
+static int cf_check flask_get_domain_state(struct domain *d)
+{
+    return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__GET_DOMAIN_STATE);
+}
+
 static const struct xsm_ops __initconst_cf_clobber flask_ops = {
     .set_system_active = flask_set_system_active,
     .security_domaininfo = flask_security_domaininfo,
@@ -1992,6 +1998,7 @@ static const struct xsm_ops __initconst_cf_clobber flask_ops = {
     .argo_register_any_source = flask_argo_register_any_source,
     .argo_send = flask_argo_send,
 #endif
+    .get_domain_state = flask_get_domain_state,
 };
 
 const struct xsm_ops *__init flask_init(
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index a35e3d4c51..c9a8eeda4e 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -251,6 +251,8 @@ class domain2
     resource_map
 # XEN_DOMCTL_get_cpu_policy
     get_cpu_policy
+# XEN_DOMCTL_get_domain_state
+    get_domain_state
 }
 
 # Similar to class domain, but primarily contains domctls related to HVM domains
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 4/5] tools/libs: add a new libxenmanage library
  2024-12-17 14:22 [PATCH v5 0/5] remove libxenctrl usage from xenstored Juergen Gross
                   ` (2 preceding siblings ...)
  2024-12-17 14:22 ` [PATCH v5 3/5] xen: add new domctl get_changed_domain Juergen Gross
@ 2024-12-17 14:22 ` Juergen Gross
  2024-12-17 14:22 ` [PATCH v5 5/5] tools/xenstored: use new stable interface instead of libxenctrl Juergen Gross
  4 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2024-12-17 14:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Anthony PERARD

In order to have a stable interface in user land for using stable
domctl and possibly later sysctl interfaces, add a new library
libxenmanage.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
---
V1:
- new patch
V2:
- define __XEN_TOOLS__ via Makefile (Anthony PERARD)
- use SPDX in header file (Anthony PERARD)
- change function name to xenmanage_poll_changed_domain() (Anthony PERARD)
- add short library description (Anthony PERARD)
- narrow scope of xen_domctl_get_domain_state pointer (Anthony PERARD)
V4:
- use LGPL-2.1-only SPDX identifier (Anthony PERARD)
---
 tools/include/xenmanage.h          |  92 ++++++++++++++++
 tools/libs/Makefile                |   1 +
 tools/libs/manage/Makefile         |  10 ++
 tools/libs/manage/Makefile.common  |   3 +
 tools/libs/manage/core.c           | 168 +++++++++++++++++++++++++++++
 tools/libs/manage/libxenmanage.map |   8 ++
 tools/libs/uselibs.mk              |   2 +
 7 files changed, 284 insertions(+)
 create mode 100644 tools/include/xenmanage.h
 create mode 100644 tools/libs/manage/Makefile
 create mode 100644 tools/libs/manage/Makefile.common
 create mode 100644 tools/libs/manage/core.c
 create mode 100644 tools/libs/manage/libxenmanage.map

diff --git a/tools/include/xenmanage.h b/tools/include/xenmanage.h
new file mode 100644
index 0000000000..956b7a0a44
--- /dev/null
+++ b/tools/include/xenmanage.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: LGPL-2.1-only */
+/*
+ * Copyright (c) 2024 SUSE Software Solutions Germany GmbH
+ *
+ * Interfaces of libxenmanage.
+ *
+ * libxenmanage provides management functions for the host using stable
+ * hypercall interfaces.
+ */
+#ifndef XENMANAGE_H
+#define XENMANAGE_H
+
+#include <stdint.h>
+
+/* Avoid the need to #include <xentoollog.h> */
+struct xentoollog_logger;
+
+typedef struct xenmanage_handle xenmanage_handle;
+
+/*
+ * Open libxenmanage.
+ *
+ * Get a handle of the xenmanage library. The handle is required for all
+ * further operations of the library.
+ * Parameters:
+ *   logger:     Logging function to use. If NULL logging is done to stderr.
+ *   open_flags: Only 0 supported.
+ * Return value: Handle or NULL if error.
+ */
+xenmanage_handle *xenmanage_open(struct xentoollog_logger *logger,
+                                 unsigned int open_flags);
+
+/*
+ * Close libxenmanage.
+ *
+ * Return a handle of the xenmanage library.
+ * Parameters:
+ *    hdl: Handle obtained by xenmanage_open().
+ * Return value: always 0.
+ */
+int xenmanage_close(xenmanage_handle *hdl);
+
+#define XENMANAGE_GETDOMSTATE_STATE_EXIST     0x0001  /* Domain is existing. */
+#define XENMANAGE_GETDOMSTATE_STATE_SHUTDOWN  0x0002  /* Shutdown finished. */
+#define XENMANAGE_GETDOMSTATE_STATE_DYING     0x0004  /* Domain dying. */
+#define XENMANAGE_GETDOMSTATE_STATE_DEAD      0x0008  /* Domain dead. */
+
+/*
+ * Return state information of an existing domain.
+ *
+ * Returns the domain state and unique id of the given domain.
+ * Parameters:
+ *   hdl:       handle returned by xenmanage_open()
+ *   domid:     domain id of the domain to get the information for
+ *   state:     where to store the state (XENMANAGE_GETDOMSTATE_STATE_ flags,
+ *              nothing stored if NULL)
+ *   unique_id: where to store the unique id of the domain (nothing stored if
+ *              NULL)
+ * Return value: 0 if information was stored, -1 else (errno is set)
+ */
+int xenmanage_get_domain_info(xenmanage_handle *hdl, unsigned int domid,
+                              unsigned int *state, uint64_t *unique_id);
+
+/*
+ * Return information of a domain having changed state recently.
+ *
+ * Returns the domain id, state and unique id of a domain having changed
+ * state (any of the state bits was modified) since the last time information
+ * for that domain was returned by this function. Only usable by callers who
+ * have registered the VIRQ_DOM_EXC event (normally Xenstore).
+ * Parameters:
+ *   hdl:       handle returned by xenmanage_open()
+ *   domid:     where to store the domid of the domain (not NULL)
+ *   state:     where to store the state (XENMANAGE_GETDOMSTATE_STATE_ flags,
+ *              nothing stored if NULL)
+ *   unique_id: where to store the unique id of the domain (nothing stored if
+ *              NULL)
+ * Return value: 0 if information was stored, -1 else (errno is set)
+ */
+int xenmanage_poll_changed_domain(xenmanage_handle *hdl, unsigned int *domid,
+                                  unsigned int *state, uint64_t *unique_id);
+#endif /* XENMANAGE_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libs/Makefile b/tools/libs/Makefile
index 1afcd12e2b..d39516c1b3 100644
--- a/tools/libs/Makefile
+++ b/tools/libs/Makefile
@@ -12,6 +12,7 @@ SUBDIRS-y += devicemodel
 SUBDIRS-y += ctrl
 SUBDIRS-y += guest
 SUBDIRS-y += hypfs
+SUBDIRS-y += manage
 SUBDIRS-y += store
 SUBDIRS-y += stat
 SUBDIRS-$(CONFIG_Linux) += vchan
diff --git a/tools/libs/manage/Makefile b/tools/libs/manage/Makefile
new file mode 100644
index 0000000000..dbfe70d259
--- /dev/null
+++ b/tools/libs/manage/Makefile
@@ -0,0 +1,10 @@
+XEN_ROOT = $(CURDIR)/../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+MAJOR    = 1
+MINOR    = 0
+version-script := libxenmanage.map
+
+include Makefile.common
+
+include $(XEN_ROOT)/tools/libs/libs.mk
diff --git a/tools/libs/manage/Makefile.common b/tools/libs/manage/Makefile.common
new file mode 100644
index 0000000000..533ba30fba
--- /dev/null
+++ b/tools/libs/manage/Makefile.common
@@ -0,0 +1,3 @@
+CFLAGS += -D__XEN_TOOLS__
+
+OBJS-y  += core.o
diff --git a/tools/libs/manage/core.c b/tools/libs/manage/core.c
new file mode 100644
index 0000000000..b5fa67b036
--- /dev/null
+++ b/tools/libs/manage/core.c
@@ -0,0 +1,168 @@
+/*
+ * Copyright (c) 2024 SUSE Software Solutions Germany GmbH
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <xentoollog.h>
+#include <xenmanage.h>
+#include <xencall.h>
+#include <xentoolcore_internal.h>
+
+#include <xen/xen.h>
+#include <xen/domctl.h>
+
+struct xenmanage_handle {
+    xentoollog_logger *logger, *logger_tofree;
+    unsigned int flags;
+    xencall_handle *xcall;
+};
+
+xenmanage_handle *xenmanage_open(xentoollog_logger *logger,
+                                 unsigned open_flags)
+{
+    xenmanage_handle *hdl = calloc(1, sizeof(*hdl));
+    int saved_errno;
+
+    if ( !hdl )
+        return NULL;
+
+    if ( open_flags )
+    {
+        errno = EINVAL;
+        goto err;
+    }
+
+    hdl->flags = open_flags;
+    hdl->logger = logger;
+    hdl->logger_tofree = NULL;
+
+    if ( !hdl->logger )
+    {
+        hdl->logger = hdl->logger_tofree =
+            (xentoollog_logger *)
+            xtl_createlogger_stdiostream(stderr, XTL_PROGRESS, 0);
+        if ( !hdl->logger )
+            goto err;
+    }
+
+    hdl->xcall = xencall_open(hdl->logger, 0);
+    if ( !hdl->xcall )
+        goto err;
+
+    return hdl;
+
+err:
+    saved_errno = errno;
+    xenmanage_close(hdl);
+    errno = saved_errno;
+
+    return NULL;
+}
+
+int xenmanage_close(xenmanage_handle *hdl)
+{
+    if ( !hdl )
+        return 0;
+
+    xencall_close(hdl->xcall);
+    xtl_logger_destroy(hdl->logger_tofree);
+    free(hdl);
+    return 0;
+}
+
+static int xenmanage_do_domctl_get_domain_state(xenmanage_handle *hdl,
+                                                unsigned int domid_in,
+                                                unsigned int *domid_out,
+                                                unsigned int *state,
+                                                uint64_t *unique_id)
+{
+    struct xen_domctl *buf;
+    int saved_errno;
+    int ret;
+
+    buf = xencall_alloc_buffer(hdl->xcall, sizeof(*buf));
+    if ( !buf )
+    {
+        errno = ENOMEM;
+        return -1;
+    }
+
+    memset(buf, 0, sizeof(*buf));
+
+    buf->cmd = XEN_DOMCTL_get_domain_state;
+    buf->domain = domid_in;
+
+    ret = xencall1(hdl->xcall, __HYPERVISOR_domctl, (unsigned long)buf);
+    saved_errno = errno;
+    if ( !ret )
+    {
+        struct xen_domctl_get_domain_state *st = &buf->u.get_domain_state;
+
+        if ( domid_out )
+            *domid_out = buf->domain;
+        if ( state )
+        {
+            *state = 0;
+            if ( st->state & XEN_DOMCTL_GETDOMSTATE_STATE_EXIST )
+                *state |= XENMANAGE_GETDOMSTATE_STATE_EXIST;
+            if ( st->state & XEN_DOMCTL_GETDOMSTATE_STATE_SHUTDOWN )
+                *state |= XENMANAGE_GETDOMSTATE_STATE_SHUTDOWN;
+            if ( st->state & XEN_DOMCTL_GETDOMSTATE_STATE_DYING )
+                *state |= XENMANAGE_GETDOMSTATE_STATE_DYING;
+            if ( st->state & XEN_DOMCTL_GETDOMSTATE_STATE_DEAD )
+                *state |= XENMANAGE_GETDOMSTATE_STATE_DEAD;
+        }
+        if ( unique_id )
+            *unique_id = st->unique_id;
+    }
+
+    xencall_free_buffer(hdl->xcall, buf);
+
+    errno = saved_errno;
+
+    return ret;
+}
+
+int xenmanage_get_domain_info(xenmanage_handle *hdl, unsigned int domid,
+                              unsigned int *state, uint64_t *unique_id)
+{
+    if ( !hdl || domid >= DOMID_FIRST_RESERVED )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    return xenmanage_do_domctl_get_domain_state(hdl, domid, NULL, state,
+                                                unique_id);
+}
+
+int xenmanage_poll_changed_domain(xenmanage_handle *hdl, unsigned int *domid,
+                                  unsigned int *state, uint64_t *unique_id)
+{
+    if ( !hdl || !domid )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    return xenmanage_do_domctl_get_domain_state(hdl, DOMID_INVALID, domid,
+                                                state, unique_id);
+}
diff --git a/tools/libs/manage/libxenmanage.map b/tools/libs/manage/libxenmanage.map
new file mode 100644
index 0000000000..64c793e603
--- /dev/null
+++ b/tools/libs/manage/libxenmanage.map
@@ -0,0 +1,8 @@
+VERS_1.0 {
+	global:
+		xenmanage_open;
+		xenmanage_close;
+		xenmanage_get_domain_info;
+		xenmanage_poll_changed_domain;
+	local: *; /* Do not expose anything by default */
+};
diff --git a/tools/libs/uselibs.mk b/tools/libs/uselibs.mk
index 7aa8d83e06..c0a234cfec 100644
--- a/tools/libs/uselibs.mk
+++ b/tools/libs/uselibs.mk
@@ -16,6 +16,8 @@ LIBS_LIBS += devicemodel
 USELIBS_devicemodel := toollog toolcore call
 LIBS_LIBS += hypfs
 USELIBS_hypfs := toollog toolcore call
+LIBS_LIBS += manage
+USELIBS_manage := toollog toolcore call
 LIBS_LIBS += ctrl
 USELIBS_ctrl := toollog call evtchn gnttab foreignmemory devicemodel
 LIBS_LIBS += guest
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 5/5] tools/xenstored: use new stable interface instead of libxenctrl
  2024-12-17 14:22 [PATCH v5 0/5] remove libxenctrl usage from xenstored Juergen Gross
                   ` (3 preceding siblings ...)
  2024-12-17 14:22 ` [PATCH v5 4/5] tools/libs: add a new libxenmanage library Juergen Gross
@ 2024-12-17 14:22 ` Juergen Gross
  2024-12-17 17:33   ` Anthony PERARD
  4 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2024-12-17 14:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Anthony PERARD, Samuel Thibault, Julien Grall

Replace the current use of the unstable xc_domain_getinfo_single()
interface with the stable domctl XEN_DOMCTL_get_domain_state call
via the new libxenmanage library.

This will remove the last usage of libxenctrl by Xenstore, so update
the library dependencies accordingly.

For now only do a direct replacement without using the functionality
of obtaining information about domains having changed the state.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
---
V1:
- use library instead of direct hypercall, only replace current
  libxenctrl use case

Please note that this patch can be committed only after the related
Mini-OS patch "config: add support for libxenmanage" has gone in AND
the Mini-OS commit-id has been updated in Config.mk accordingly!

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 stubdom/Makefile                |  8 ++---
 stubdom/mini-os.mk              |  1 +
 tools/xenstored/Makefile        |  2 +-
 tools/xenstored/Makefile.common |  2 +-
 tools/xenstored/core.h          |  1 -
 tools/xenstored/domain.c        | 52 ++++++++++++---------------------
 tools/xenstored/lu.c            |  1 +
 tools/xenstored/lu_daemon.c     |  1 +
 8 files changed, 28 insertions(+), 40 deletions(-)

diff --git a/stubdom/Makefile b/stubdom/Makefile
index 2a81af28a1..ca800b243c 100644
--- a/stubdom/Makefile
+++ b/stubdom/Makefile
@@ -307,7 +307,7 @@ endif
 # libraries under tools/libs
 #######
 
-STUB_LIBS := toolcore toollog evtchn gnttab call foreignmemory devicemodel ctrl guest
+STUB_LIBS := toolcore toollog evtchn gnttab call foreignmemory devicemodel ctrl guest manage
 
 LIBDEP_guest := cross-zlib
 
@@ -465,7 +465,7 @@ grub: cross-polarssl grub-upstream $(CROSS_ROOT) grub-$(XEN_TARGET_ARCH)-minios-
 # xenstore
 ##########
 
-xenstore-minios.gen.cfg: APP_LIBS = gnttab evtchn toollog ctrl
+xenstore-minios.gen.cfg: APP_LIBS = gnttab evtchn toollog manage
 xenstore-minios.gen.cfg: xenstore-minios.cfg Makefile
 	$(GEN_config) >$@
 
@@ -480,7 +480,7 @@ xenstore: $(CROSS_ROOT) xenstore-minios-config.mk
 # xenstorepvh
 #############
 
-xenstorepvh-minios.gen.cfg: APP_LIBS = gnttab evtchn toollog ctrl
+xenstorepvh-minios.gen.cfg: APP_LIBS = gnttab evtchn toollog manage
 xenstorepvh-minios.gen.cfg: xenstorepvh-minios.cfg Makefile
 	$(GEN_config) >$@
 
@@ -523,7 +523,7 @@ else
 pv-grub-if-enabled:
 endif
 
-XENSTORE_DEPS := libxenevtchn libxengnttab libxenctrl
+XENSTORE_DEPS := libxenevtchn libxengnttab libxenmanage
 
 .PHONY: xenstore-stubdom
 xenstore-stubdom: mini-os-$(XEN_TARGET_ARCH)-xenstore $(XENSTORE_DEPS) xenstore
diff --git a/stubdom/mini-os.mk b/stubdom/mini-os.mk
index 7e4968e026..be32302f9e 100644
--- a/stubdom/mini-os.mk
+++ b/stubdom/mini-os.mk
@@ -13,5 +13,6 @@ GNTTAB_PATH = $(XEN_ROOT)/stubdom/libs-$(MINIOS_TARGET_ARCH)/gnttab
 CALL_PATH = $(XEN_ROOT)/stubdom/libs-$(MINIOS_TARGET_ARCH)/call
 FOREIGNMEMORY_PATH = $(XEN_ROOT)/stubdom/libs-$(MINIOS_TARGET_ARCH)/foreignmemory
 DEVICEMODEL_PATH = $(XEN_ROOT)/stubdom/libs-$(MINIOS_TARGET_ARCH)/devicemodel
+MANAGE_PATH = $(XEN_ROOT)/stubdom/libs-$(MINIOS_TARGET_ARCH)/manage
 CTRL_PATH = $(XEN_ROOT)/stubdom/libs-$(MINIOS_TARGET_ARCH)/ctrl
 GUEST_PATH = $(XEN_ROOT)/stubdom/libs-$(MINIOS_TARGET_ARCH)/guest
diff --git a/tools/xenstored/Makefile b/tools/xenstored/Makefile
index 09adfe1d50..81c42838e0 100644
--- a/tools/xenstored/Makefile
+++ b/tools/xenstored/Makefile
@@ -5,7 +5,7 @@ include Makefile.common
 
 xenstored: LDLIBS += $(LDLIBS_libxenevtchn)
 xenstored: LDLIBS += $(LDLIBS_libxengnttab)
-xenstored: LDLIBS += $(LDLIBS_libxenctrl)
+xenstored: LDLIBS += $(LDLIBS_libxenmanage)
 xenstored: LDLIBS += -lrt
 xenstored: LDLIBS += $(SOCKET_LIBS)
 
diff --git a/tools/xenstored/Makefile.common b/tools/xenstored/Makefile.common
index 27fdb3b49e..271134fcc1 100644
--- a/tools/xenstored/Makefile.common
+++ b/tools/xenstored/Makefile.common
@@ -12,7 +12,7 @@ XENSTORED_OBJS-$(CONFIG_MiniOS) += minios.o lu_minios.o
 # Include configure output (config.h)
 CFLAGS += -include $(XEN_ROOT)/tools/config.h
 CFLAGS += $(CFLAGS_libxenevtchn)
-CFLAGS += $(CFLAGS_libxenctrl)
+CFLAGS += $(CFLAGS_libxenmanage)
 CFLAGS += $(CFLAGS_libxentoolcore)
 
 $(XENSTORED_OBJS-y): CFLAGS += $(CFLAGS_libxengnttab)
diff --git a/tools/xenstored/core.h b/tools/xenstored/core.h
index e58779e88c..632886cecf 100644
--- a/tools/xenstored/core.h
+++ b/tools/xenstored/core.h
@@ -19,7 +19,6 @@
 #ifndef _XENSTORED_CORE_H
 #define _XENSTORED_CORE_H
 
-#include <xenctrl.h>
 #include <xengnttab.h>
 
 #include <sys/types.h>
diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c
index 64c8fd0cc3..a6506a5bb2 100644
--- a/tools/xenstored/domain.c
+++ b/tools/xenstored/domain.c
@@ -34,14 +34,15 @@
 #include "control.h"
 
 #include <xenevtchn.h>
-#include <xenctrl.h>
+#include <xenmanage.h>
+#include <xen-barrier.h>
 #include <xen/grant_table.h>
 
 #ifdef __MINIOS__
 #include <mini-os/xenbus.h>
 #endif
 
-static xc_interface **xc_handle;
+static xenmanage_handle *xm_handle;
 xengnttab_handle **xgt_handle;
 static evtchn_port_t virq_port;
 
@@ -619,32 +620,28 @@ static int destroy_domain(void *_domain)
 	return 0;
 }
 
-static bool get_domain_info(unsigned int domid, xc_domaininfo_t *dominfo)
-{
-	return xc_domain_getinfo_single(*xc_handle, domid, dominfo) == 0;
-}
-
 static int check_domain(const void *k, void *v, void *arg)
 {
-	xc_domaininfo_t dominfo;
+	unsigned int state;
 	struct connection *conn;
-	bool dom_valid;
+	int dom_invalid;
 	struct domain *domain = v;
 	bool *notify = arg;
 
-	dom_valid = get_domain_info(domain->domid, &dominfo);
+	dom_invalid = xenmanage_get_domain_info(xm_handle, domain->domid,
+						&state, NULL);
 	if (!domain->introduced) {
-		if (!dom_valid)
+		if (dom_invalid)
 			talloc_free(domain);
 		return 0;
 	}
-	if (dom_valid) {
-		if ((dominfo.flags & XEN_DOMINF_shutdown)
+	if (!dom_invalid) {
+		if ((state & XENMANAGE_GETDOMSTATE_STATE_SHUTDOWN)
 		    && !domain->shutdown) {
 			domain->shutdown = true;
 			*notify = true;
 		}
-		if (!(dominfo.flags & XEN_DOMINF_dying))
+		if (!(state & XENMANAGE_GETDOMSTATE_STATE_DEAD))
 			return 0;
 	}
 	if (domain->conn) {
@@ -786,10 +783,9 @@ static struct domain *find_or_alloc_domain(const void *ctx, unsigned int domid)
 static struct domain *find_or_alloc_existing_domain(unsigned int domid)
 {
 	struct domain *domain;
-	xc_domaininfo_t dominfo;
 
 	domain = find_domain_struct(domid);
-	if (!domain && get_domain_info(domid, &dominfo))
+	if (!domain && !xenmanage_get_domain_info(xm_handle, domid, NULL, NULL))
 		domain = alloc_domain(NULL, domid);
 
 	return domain;
@@ -1187,12 +1183,6 @@ int do_reset_watches(const void *ctx, struct connection *conn,
 	return 0;
 }
 
-static int close_xc_handle(void *_handle)
-{
-	xc_interface_close(*(xc_interface**)_handle);
-	return 0;
-}
-
 static int close_xgt_handle(void *_handle)
 {
 	xengnttab_close(*(xengnttab_handle **)_handle);
@@ -1258,15 +1248,9 @@ void domain_early_init(void)
 	if (!domhash)
 		barf_perror("Failed to allocate domain hashtable");
 
-	xc_handle = talloc(talloc_autofree_context(), xc_interface*);
-	if (!xc_handle)
-		barf_perror("Failed to allocate domain handle");
-
-	*xc_handle = xc_interface_open(0,0,0);
-	if (!*xc_handle)
-		barf_perror("Failed to open connection to hypervisor");
-
-	talloc_set_destructor(xc_handle, close_xc_handle);
+	xm_handle = xenmanage_open(NULL, 0);
+	if (!xm_handle)
+		barf_perror("Failed to open connection to libxenmanage");
 
 	xgt_handle = talloc(talloc_autofree_context(), xengnttab_handle*);
 	if (!xgt_handle)
@@ -1306,6 +1290,8 @@ void domain_deinit(void)
 {
 	if (virq_port)
 		xenevtchn_unbind(xce_handle, virq_port);
+
+	xenmanage_close(xm_handle);
 }
 
 /*
@@ -1335,13 +1321,13 @@ int domain_alloc_permrefs(struct node_perms *perms)
 {
 	unsigned int i, domid;
 	struct domain *d;
-	xc_domaininfo_t dominfo;
 
 	for (i = 0; i < perms->num; i++) {
 		domid = perms->p[i].id;
 		d = find_domain_struct(domid);
 		if (!d) {
-			if (!get_domain_info(domid, &dominfo))
+			if (xenmanage_get_domain_info(xm_handle, domid,
+						      NULL, NULL))
 				perms->p[i].perms |= XS_PERM_IGNORE;
 			else if (!alloc_domain(NULL, domid))
 				return ENOMEM;
diff --git a/tools/xenstored/lu.c b/tools/xenstored/lu.c
index bec2a84e10..4fccbbc195 100644
--- a/tools/xenstored/lu.c
+++ b/tools/xenstored/lu.c
@@ -11,6 +11,7 @@
 #include <stdlib.h>
 #include <syslog.h>
 #include <time.h>
+#include <unistd.h>
 #include <sys/mman.h>
 #include <sys/stat.h>
 
diff --git a/tools/xenstored/lu_daemon.c b/tools/xenstored/lu_daemon.c
index 6df6c80a2a..88d8d9e1b3 100644
--- a/tools/xenstored/lu_daemon.c
+++ b/tools/xenstored/lu_daemon.c
@@ -6,6 +6,7 @@
  */
 
 #include <syslog.h>
+#include <unistd.h>
 #include <sys/stat.h>
 
 #include "talloc.h"
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/5] xen: add new domctl get_changed_domain
  2024-12-17 14:22 ` [PATCH v5 3/5] xen: add new domctl get_changed_domain Juergen Gross
@ 2024-12-17 14:41   ` Jan Beulich
  2024-12-18  1:20   ` Daniel P. Smith
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2024-12-17 14:41 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Daniel P. Smith, Anthony PERARD, Andrew Cooper, Michal Orzel,
	Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel

On 17.12.2024 15:22, Juergen Gross wrote:
> Add a new domctl sub-function to get data of a domain having changed
> state (this is needed by Xenstore).
> 
> The returned state just contains the domid, the domain unique id,
> and some flags (existing, shutdown, dying).
> 
> In order to enable Xenstore stubdom being built for multiple Xen
> versions, make this domctl stable.  For stable domctls the
> interface_version is always 0.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

As before:
Reviewed-by: Jan Beulich <jbeulich@suse.com> # non-XSM/Flask




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
  2024-12-17 14:22 ` [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes Juergen Gross
@ 2024-12-17 15:19   ` Jan Beulich
  2024-12-17 15:55     ` Jürgen Groß
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2024-12-17 15:19 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
	Roger Pau Monné, Stefano Stabellini, xen-devel

On 17.12.2024 15:22, Juergen Gross wrote:
> Add a bitmap with one bit per possible domid indicating the respective
> domain has changed its state (created, deleted, dying, crashed,
> shutdown).
> 
> Registering the VIRQ_DOM_EXC event will result in setting the bits for
> all existing domains and resetting all other bits.
> 
> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
> event, it is meant to be used only by a single consumer in the system,
> just like the VIRQ_DOM_EXC event.

I'm sorry, but I need to come back to this. I thought I had got convinced
that only a single entity in the system can bind this vIRQ. Yet upon
checking I can't seem to find what would guarantee this. In particular
binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
could, on the assumption that the interested privileged entity (xenstore)
is already up and running, bind and unbind this vIRQ, just to have the
global map freed. What am I overlooking (which would likely want stating
here)?

> V5:
> - domain_init_states() may be called only if evtchn_bind_virq() has been
>   called validly (Jan Beulich)

I now recall why I had first suggested the placement later in the handling:
You're now doing the allocation with yet another lock held. It's likely not
the end of the world, but ...

> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>  
>  bool __read_mostly vpmu_is_available;
>  
> +static DEFINE_SPINLOCK(dom_state_changed_lock);
> +static unsigned long *__read_mostly dom_state_changed;
> +
> +int domain_init_states(void)
> +{
> +    const struct domain *d;
> +    int rc = -ENOMEM;
> +
> +    spin_lock(&dom_state_changed_lock);
> +
> +    if ( dom_state_changed )
> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
> +    else
> +    {
> +        dom_state_changed = xvzalloc_array(unsigned long,
> +                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));

... already this alone wasn't nice, and could be avoided (by doing the
allocation prior to acquiring the lock, which of course complicates the
logic some).

What's perhaps less desirable is that ...

> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>          goto out;
>      }
>  
> +    if ( virq == VIRQ_DOM_EXC )
> +    {
> +        rc = domain_init_states();
> +        if ( rc )
> +            goto out;
> +
> +        deinit_if_err = true;
> +    }
> +
>      port = rc = evtchn_get_port(d, port);
>      if ( rc < 0 )
>      {

... the placement here additionally introduces lock nesting when really
the two locks shouldn't have any relationship.

How about giving domain_init_states() a boolean parameter, such that it
can be called twice, with the first invocation moved back up where it
was, and the second one put ...

> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>   out:
>      write_unlock(&d->event_lock);
>  
> +    if ( rc && deinit_if_err )
> +        domain_deinit_states();
> +
>      return rc;
>  }

... down here, not doing any allocation at all (only the clearing), and
hence eliminating the need to deal with its failure? (Alternatively
there could of course be a split into an init and a reset function.)

There of course is the chance of races with such an approach. I'd like
to note though that with the placement of the call in the hunk above
there's a minor race, too (against ...

> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
>          struct vcpu *v;
>          unsigned long flags;
>  
> +        if ( chn1->u.virq == VIRQ_DOM_EXC )
> +            domain_deinit_states();

... this and the same remote vCPU then immediately binding the vIRQ
again). Hence yet another alternative would appear to be to drop the
new global lock and use d->event_lock for synchronization instead
(provided - see above - that only a single entity can actually set up
all of this). That would pretty much want to have the allocation kept
with the lock already held (which isn't nice, but as said is perhaps
tolerable), but would at least eliminate the undesirable lock nesting.

Re-use of the domain's event lock is at least somewhat justified by
the bit array being tied to VIRQ_DOM_EXEC.

Thoughts?

Jan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
  2024-12-17 15:19   ` Jan Beulich
@ 2024-12-17 15:55     ` Jürgen Groß
  2024-12-17 16:12       ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Jürgen Groß @ 2024-12-17 15:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
	Roger Pau Monné, Stefano Stabellini, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 5327 bytes --]

On 17.12.24 16:19, Jan Beulich wrote:
> On 17.12.2024 15:22, Juergen Gross wrote:
>> Add a bitmap with one bit per possible domid indicating the respective
>> domain has changed its state (created, deleted, dying, crashed,
>> shutdown).
>>
>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>> all existing domains and resetting all other bits.
>>
>> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
>> event, it is meant to be used only by a single consumer in the system,
>> just like the VIRQ_DOM_EXC event.
> 
> I'm sorry, but I need to come back to this. I thought I had got convinced
> that only a single entity in the system can bind this vIRQ. Yet upon
> checking I can't seem to find what would guarantee this. In particular
> binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
> could, on the assumption that the interested privileged entity (xenstore)
> is already up and running, bind and unbind this vIRQ, just to have the
> global map freed. What am I overlooking (which would likely want stating
> here)?

I think you are not overlooking anything.

I guess this can easily be handled by checking that the VIRQ_DOM_EXC handling
domain is the calling one in domain_[de]init_states(). Note that global virqs
are only ever sent to vcpu[0] of the handling domain, so rebinding the event
to another vcpu is possible, but doesn't make sense.

> 
>> V5:
>> - domain_init_states() may be called only if evtchn_bind_virq() has been
>>    called validly (Jan Beulich)
> 
> I now recall why I had first suggested the placement later in the handling:
> You're now doing the allocation with yet another lock held. It's likely not
> the end of the world, but ...
> 
>> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>>   
>>   bool __read_mostly vpmu_is_available;
>>   
>> +static DEFINE_SPINLOCK(dom_state_changed_lock);
>> +static unsigned long *__read_mostly dom_state_changed;
>> +
>> +int domain_init_states(void)
>> +{
>> +    const struct domain *d;
>> +    int rc = -ENOMEM;
>> +
>> +    spin_lock(&dom_state_changed_lock);
>> +
>> +    if ( dom_state_changed )
>> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
>> +    else
>> +    {
>> +        dom_state_changed = xvzalloc_array(unsigned long,
>> +                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));
> 
> ... already this alone wasn't nice, and could be avoided (by doing the
> allocation prior to acquiring the lock, which of course complicates the
> logic some).
> 
> What's perhaps less desirable is that ...
> 
>> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>           goto out;
>>       }
>>   
>> +    if ( virq == VIRQ_DOM_EXC )
>> +    {
>> +        rc = domain_init_states();
>> +        if ( rc )
>> +            goto out;
>> +
>> +        deinit_if_err = true;
>> +    }
>> +
>>       port = rc = evtchn_get_port(d, port);
>>       if ( rc < 0 )
>>       {
> 
> ... the placement here additionally introduces lock nesting when really
> the two locks shouldn't have any relationship.
> 
> How about giving domain_init_states() a boolean parameter, such that it
> can be called twice, with the first invocation moved back up where it
> was, and the second one put ...
> 
>> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>    out:
>>       write_unlock(&d->event_lock);
>>   
>> +    if ( rc && deinit_if_err )
>> +        domain_deinit_states();
>> +
>>       return rc;
>>   }
> 
> ... down here, not doing any allocation at all (only the clearing), and
> hence eliminating the need to deal with its failure? (Alternatively
> there could of course be a split into an init and a reset function.)
> 
> There of course is the chance of races with such an approach. I'd like
> to note though that with the placement of the call in the hunk above
> there's a minor race, too (against ...
> 
>> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
>>           struct vcpu *v;
>>           unsigned long flags;
>>   
>> +        if ( chn1->u.virq == VIRQ_DOM_EXC )
>> +            domain_deinit_states();
> 
> ... this and the same remote vCPU then immediately binding the vIRQ
> again). Hence yet another alternative would appear to be to drop the
> new global lock and use d->event_lock for synchronization instead
> (provided - see above - that only a single entity can actually set up
> all of this). That would pretty much want to have the allocation kept
> with the lock already held (which isn't nice, but as said is perhaps
> tolerable), but would at least eliminate the undesirable lock nesting.
> 
> Re-use of the domain's event lock is at least somewhat justified by
> the bit array being tied to VIRQ_DOM_EXEC.
> 
> Thoughts?

With my suggestion above I think there is no race, as only the domain handling
VIRQ_DOM_EXC could alloc/dealloc dom_state_changed.

Using d->event_lock for synchronization is not a nice option IMO, as it would
require to take the event_lock of the domain handling VIRQ_DOM_EXEC when trying
to set a bit for another domain changing state.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
  2024-12-17 15:55     ` Jürgen Groß
@ 2024-12-17 16:12       ` Jan Beulich
  2024-12-18  7:15         ` Jürgen Groß
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2024-12-17 16:12 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
	Roger Pau Monné, Stefano Stabellini, xen-devel

On 17.12.2024 16:55, Jürgen Groß wrote:
> On 17.12.24 16:19, Jan Beulich wrote:
>> On 17.12.2024 15:22, Juergen Gross wrote:
>>> Add a bitmap with one bit per possible domid indicating the respective
>>> domain has changed its state (created, deleted, dying, crashed,
>>> shutdown).
>>>
>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>> all existing domains and resetting all other bits.
>>>
>>> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
>>> event, it is meant to be used only by a single consumer in the system,
>>> just like the VIRQ_DOM_EXC event.
>>
>> I'm sorry, but I need to come back to this. I thought I had got convinced
>> that only a single entity in the system can bind this vIRQ. Yet upon
>> checking I can't seem to find what would guarantee this. In particular
>> binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
>> could, on the assumption that the interested privileged entity (xenstore)
>> is already up and running, bind and unbind this vIRQ, just to have the
>> global map freed. What am I overlooking (which would likely want stating
>> here)?
> 
> I think you are not overlooking anything.
> 
> I guess this can easily be handled by checking that the VIRQ_DOM_EXC handling
> domain is the calling one in domain_[de]init_states(). Note that global virqs
> are only ever sent to vcpu[0] of the handling domain, so rebinding the event
> to another vcpu is possible, but doesn't make sense.

No, that's precluded by

    if ( virq_is_global(virq) && (vcpu != 0) )
        return -EINVAL;

afaict. That doesn't, however, preclude multiple vCPU-s from trying to bind
the vIRQ to vCPU 0.

>>> V5:
>>> - domain_init_states() may be called only if evtchn_bind_virq() has been
>>>    called validly (Jan Beulich)
>>
>> I now recall why I had first suggested the placement later in the handling:
>> You're now doing the allocation with yet another lock held. It's likely not
>> the end of the world, but ...
>>
>>> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>>>   
>>>   bool __read_mostly vpmu_is_available;
>>>   
>>> +static DEFINE_SPINLOCK(dom_state_changed_lock);
>>> +static unsigned long *__read_mostly dom_state_changed;
>>> +
>>> +int domain_init_states(void)
>>> +{
>>> +    const struct domain *d;
>>> +    int rc = -ENOMEM;
>>> +
>>> +    spin_lock(&dom_state_changed_lock);
>>> +
>>> +    if ( dom_state_changed )
>>> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
>>> +    else
>>> +    {
>>> +        dom_state_changed = xvzalloc_array(unsigned long,
>>> +                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));
>>
>> ... already this alone wasn't nice, and could be avoided (by doing the
>> allocation prior to acquiring the lock, which of course complicates the
>> logic some).
>>
>> What's perhaps less desirable is that ...
>>
>>> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>           goto out;
>>>       }
>>>   
>>> +    if ( virq == VIRQ_DOM_EXC )
>>> +    {
>>> +        rc = domain_init_states();
>>> +        if ( rc )
>>> +            goto out;
>>> +
>>> +        deinit_if_err = true;
>>> +    }
>>> +
>>>       port = rc = evtchn_get_port(d, port);
>>>       if ( rc < 0 )
>>>       {
>>
>> ... the placement here additionally introduces lock nesting when really
>> the two locks shouldn't have any relationship.
>>
>> How about giving domain_init_states() a boolean parameter, such that it
>> can be called twice, with the first invocation moved back up where it
>> was, and the second one put ...
>>
>>> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>    out:
>>>       write_unlock(&d->event_lock);
>>>   
>>> +    if ( rc && deinit_if_err )
>>> +        domain_deinit_states();
>>> +
>>>       return rc;
>>>   }
>>
>> ... down here, not doing any allocation at all (only the clearing), and
>> hence eliminating the need to deal with its failure? (Alternatively
>> there could of course be a split into an init and a reset function.)
>>
>> There of course is the chance of races with such an approach. I'd like
>> to note though that with the placement of the call in the hunk above
>> there's a minor race, too (against ...
>>
>>> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
>>>           struct vcpu *v;
>>>           unsigned long flags;
>>>   
>>> +        if ( chn1->u.virq == VIRQ_DOM_EXC )
>>> +            domain_deinit_states();
>>
>> ... this and the same remote vCPU then immediately binding the vIRQ
>> again). Hence yet another alternative would appear to be to drop the
>> new global lock and use d->event_lock for synchronization instead
>> (provided - see above - that only a single entity can actually set up
>> all of this). That would pretty much want to have the allocation kept
>> with the lock already held (which isn't nice, but as said is perhaps
>> tolerable), but would at least eliminate the undesirable lock nesting.
>>
>> Re-use of the domain's event lock is at least somewhat justified by
>> the bit array being tied to VIRQ_DOM_EXEC.
>>
>> Thoughts?
> 
> With my suggestion above I think there is no race, as only the domain handling
> VIRQ_DOM_EXC could alloc/dealloc dom_state_changed.

Yet still it could be multiple vCPU-s therein to try to in parallel.

> Using d->event_lock for synchronization is not a nice option IMO, as it would
> require to take the event_lock of the domain handling VIRQ_DOM_EXEC when trying
> to set a bit for another domain changing state.

Well, yes, it's that domain's data that's to be modified, after all.

Jan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 5/5] tools/xenstored: use new stable interface instead of libxenctrl
  2024-12-17 14:22 ` [PATCH v5 5/5] tools/xenstored: use new stable interface instead of libxenctrl Juergen Gross
@ 2024-12-17 17:33   ` Anthony PERARD
  2024-12-17 18:40     ` Jürgen Groß
  0 siblings, 1 reply; 16+ messages in thread
From: Anthony PERARD @ 2024-12-17 17:33 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Samuel Thibault, Julien Grall

On Tue, Dec 17, 2024 at 03:22:18PM +0100, Juergen Gross wrote:
> diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c
> index 64c8fd0cc3..a6506a5bb2 100644
> --- a/tools/xenstored/domain.c
> +++ b/tools/xenstored/domain.c
> @@ -1258,15 +1248,9 @@ void domain_early_init(void)
>  	if (!domhash)
>  		barf_perror("Failed to allocate domain hashtable");
>  
> -	xc_handle = talloc(talloc_autofree_context(), xc_interface*);
> -	if (!xc_handle)
> -		barf_perror("Failed to allocate domain handle");
> -
> -	*xc_handle = xc_interface_open(0,0,0);
> -	if (!*xc_handle)
> -		barf_perror("Failed to open connection to hypervisor");
> -
> -	talloc_set_destructor(xc_handle, close_xc_handle);
> +	xm_handle = xenmanage_open(NULL, 0);
> +	if (!xm_handle)
> +		barf_perror("Failed to open connection to libxenmanage");
>  
>  	xgt_handle = talloc(talloc_autofree_context(), xengnttab_handle*);
>  	if (!xgt_handle)
> @@ -1306,6 +1290,8 @@ void domain_deinit(void)
>  {
>  	if (virq_port)
>  		xenevtchn_unbind(xce_handle, virq_port);
> +
> +	xenmanage_close(xm_handle);

Is this the rigth place to free `xm_handle`? domain_deinit() seems to
only be called by the live update code. All the other initialisation
done in domain_early_init() are free via talloc_autofree() it seems,
which is called by atexit().

So, shouldn't `xm_handle` by handle with talloc like the others?

Thanks,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 5/5] tools/xenstored: use new stable interface instead of libxenctrl
  2024-12-17 17:33   ` Anthony PERARD
@ 2024-12-17 18:40     ` Jürgen Groß
  0 siblings, 0 replies; 16+ messages in thread
From: Jürgen Groß @ 2024-12-17 18:40 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, Samuel Thibault, Julien Grall


[-- Attachment #1.1.1: Type: text/plain, Size: 1769 bytes --]

On 17.12.24 18:33, Anthony PERARD wrote:
> On Tue, Dec 17, 2024 at 03:22:18PM +0100, Juergen Gross wrote:
>> diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c
>> index 64c8fd0cc3..a6506a5bb2 100644
>> --- a/tools/xenstored/domain.c
>> +++ b/tools/xenstored/domain.c
>> @@ -1258,15 +1248,9 @@ void domain_early_init(void)
>>   	if (!domhash)
>>   		barf_perror("Failed to allocate domain hashtable");
>>   
>> -	xc_handle = talloc(talloc_autofree_context(), xc_interface*);
>> -	if (!xc_handle)
>> -		barf_perror("Failed to allocate domain handle");
>> -
>> -	*xc_handle = xc_interface_open(0,0,0);
>> -	if (!*xc_handle)
>> -		barf_perror("Failed to open connection to hypervisor");
>> -
>> -	talloc_set_destructor(xc_handle, close_xc_handle);
>> +	xm_handle = xenmanage_open(NULL, 0);
>> +	if (!xm_handle)
>> +		barf_perror("Failed to open connection to libxenmanage");
>>   
>>   	xgt_handle = talloc(talloc_autofree_context(), xengnttab_handle*);
>>   	if (!xgt_handle)
>> @@ -1306,6 +1290,8 @@ void domain_deinit(void)
>>   {
>>   	if (virq_port)
>>   		xenevtchn_unbind(xce_handle, virq_port);
>> +
>> +	xenmanage_close(xm_handle);
> 
> Is this the rigth place to free `xm_handle`? domain_deinit() seems to
> only be called by the live update code. All the other initialisation
> done in domain_early_init() are free via talloc_autofree() it seems,
> which is called by atexit().
> 
> So, shouldn't `xm_handle` by handle with talloc like the others?

No, I don't think so.

TBH, I'm questioning the talloc() allocation of the other handles, too.

The only case where xenstored is stopped IS the live update case. Apart
from crashes, of course, but those won't execute the destructors either.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/5] xen: add new domctl get_changed_domain
  2024-12-17 14:22 ` [PATCH v5 3/5] xen: add new domctl get_changed_domain Juergen Gross
  2024-12-17 14:41   ` Jan Beulich
@ 2024-12-18  1:20   ` Daniel P. Smith
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel P. Smith @ 2024-12-18  1:20 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Anthony PERARD, Andrew Cooper, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

On 12/17/24 09:22, Juergen Gross wrote:
> Add a new domctl sub-function to get data of a domain having changed
> state (this is needed by Xenstore).
> 
> The returned state just contains the domid, the domain unique id,
> and some flags (existing, shutdown, dying).
> 
> In order to enable Xenstore stubdom being built for multiple Xen
> versions, make this domctl stable.  For stable domctls the
> interface_version is always 0.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V1:
> - use a domctl subop for the new interface (Jan Beulich)
> V2:
> - fix XSM hooks (Daniel P. Smith)
> - remove versioning of stable sub-ops (Jan Beulich)
> - use domctl.domain for retuning domid of a changed domain (Jan Beulich)
> - simplify locking in get_domain_state() (Jan Beulich)
> - undo stray change in event_channel.c (Jan Beulich)
> V3:
> - have disjunct states "dying" and "dead" (Jan Beulich)
> - check padding fields to be 0 (Jan Beulich)
> - drop memset() (Jan Beulich)
> V4:
> - add locking in get_domain_state() (Jan Beulich)
> - only allow querying domain having changed state by domain receiving
>    VIRQ_DOM_EXC events (Jan Beulich)
> V5:
> - use memset() (Jan Beulich)
> ---
>   tools/flask/policy/modules/dom0.te     |  2 +-
>   tools/flask/policy/modules/xen.if      |  4 +-
>   tools/flask/policy/modules/xenstore.te |  1 +
>   xen/common/domain.c                    | 67 ++++++++++++++++++++++++++
>   xen/common/domctl.c                    | 18 ++++++-
>   xen/common/event_channel.c             |  9 +++-
>   xen/include/public/domctl.h            | 26 ++++++++++
>   xen/include/xen/event.h                |  7 +++
>   xen/include/xen/sched.h                |  2 +
>   xen/include/xsm/dummy.h                |  8 +++
>   xen/include/xsm/xsm.h                  |  6 +++
>   xen/xsm/dummy.c                        |  1 +
>   xen/xsm/flask/hooks.c                  |  7 +++
>   xen/xsm/flask/policy/access_vectors    |  2 +
>   14 files changed, 155 insertions(+), 5 deletions(-)
> 


> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index f8a3c4b81e..a1a5bb60e9 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -173,6 +173,7 @@ static XSM_INLINE int cf_check xsm_domctl(
>       case XEN_DOMCTL_unbind_pt_irq:
>           return xsm_default_action(XSM_DM_PRIV, current->domain, d);
>       case XEN_DOMCTL_getdomaininfo:
> +    case XEN_DOMCTL_get_domain_state:
>           return xsm_default_action(XSM_XS_PRIV, current->domain, d);
>       default:
>           return xsm_default_action(XSM_PRIV, current->domain, d);
> @@ -815,6 +816,13 @@ static XSM_INLINE int cf_check xsm_argo_send(
>   
>   #endif /* CONFIG_ARGO */
>   
> +static XSM_INLINE int cf_check xsm_get_domain_state(
> +    XSM_DEFAULT_ARG struct domain *d)
> +{
> +    XSM_ASSERT_ACTION(XSM_XS_PRIV);
> +    return xsm_default_action(action, current->domain, d);
> +}
> +
>   #include <public/version.h>
>   static XSM_INLINE int cf_check xsm_xen_version(XSM_DEFAULT_ARG uint32_t op)
>   {
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index 4dbff9d866..0689bf5c9f 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -200,6 +200,7 @@ struct xsm_ops {
>       int (*argo_register_any_source)(const struct domain *d);
>       int (*argo_send)(const struct domain *d, const struct domain *t);
>   #endif
> +    int (*get_domain_state)(struct domain *d);
>   };
>   
>   #ifdef CONFIG_XSM
> @@ -774,6 +775,11 @@ static inline int xsm_argo_send(const struct domain *d, const struct domain *t)
>   
>   #endif /* CONFIG_ARGO */
>   
> +static inline int xsm_get_domain_state(struct domain *d)
> +{
> +    return alternative_call(xsm_ops.get_domain_state, d);
> +}
> +
>   #endif /* XSM_NO_WRAPPERS */
>   
>   #ifdef CONFIG_MULTIBOOT
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index e6ffa948f7..ce6fbdc6c5 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -148,6 +148,7 @@ static const struct xsm_ops __initconst_cf_clobber dummy_ops = {
>       .argo_register_any_source      = xsm_argo_register_any_source,
>       .argo_send                     = xsm_argo_send,
>   #endif
> +    .get_domain_state              = xsm_get_domain_state,
>   };
>   
>   void __init xsm_fixup_ops(struct xsm_ops *ops)
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index a79474ffe4..e110846ad9 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -688,6 +688,7 @@ static int cf_check flask_domctl(struct domain *d, unsigned int cmd,
>       case XEN_DOMCTL_memory_mapping:
>       case XEN_DOMCTL_set_target:
>       case XEN_DOMCTL_vm_event_op:
> +    case XEN_DOMCTL_get_domain_state:
>   
>       /* These have individual XSM hooks (arch/../domctl.c) */
>       case XEN_DOMCTL_bind_pt_irq:
> @@ -1856,6 +1857,11 @@ static int cf_check flask_argo_send(
>   
>   #endif
>   
> +static int cf_check flask_get_domain_state(struct domain *d)
> +{
> +    return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__GET_DOMAIN_STATE);
> +}
> +
>   static const struct xsm_ops __initconst_cf_clobber flask_ops = {
>       .set_system_active = flask_set_system_active,
>       .security_domaininfo = flask_security_domaininfo,
> @@ -1992,6 +1998,7 @@ static const struct xsm_ops __initconst_cf_clobber flask_ops = {
>       .argo_register_any_source = flask_argo_register_any_source,
>       .argo_send = flask_argo_send,
>   #endif
> +    .get_domain_state = flask_get_domain_state,
>   };
>   
>   const struct xsm_ops *__init flask_init(
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index a35e3d4c51..c9a8eeda4e 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -251,6 +251,8 @@ class domain2
>       resource_map
>   # XEN_DOMCTL_get_cpu_policy
>       get_cpu_policy
> +# XEN_DOMCTL_get_domain_state
> +    get_domain_state
>   }
>   
>   # Similar to class domain, but primarily contains domctls related to HVM domains

Apologies, I missed v5 when responding to v4 a moment ago. For good measure,

Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
  2024-12-17 16:12       ` Jan Beulich
@ 2024-12-18  7:15         ` Jürgen Groß
  2024-12-19  8:01           ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Jürgen Groß @ 2024-12-18  7:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
	Roger Pau Monné, Stefano Stabellini, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 6742 bytes --]

On 17.12.24 17:12, Jan Beulich wrote:
> On 17.12.2024 16:55, Jürgen Groß wrote:
>> On 17.12.24 16:19, Jan Beulich wrote:
>>> On 17.12.2024 15:22, Juergen Gross wrote:
>>>> Add a bitmap with one bit per possible domid indicating the respective
>>>> domain has changed its state (created, deleted, dying, crashed,
>>>> shutdown).
>>>>
>>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>>> all existing domains and resetting all other bits.
>>>>
>>>> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
>>>> event, it is meant to be used only by a single consumer in the system,
>>>> just like the VIRQ_DOM_EXC event.
>>>
>>> I'm sorry, but I need to come back to this. I thought I had got convinced
>>> that only a single entity in the system can bind this vIRQ. Yet upon
>>> checking I can't seem to find what would guarantee this. In particular
>>> binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
>>> could, on the assumption that the interested privileged entity (xenstore)
>>> is already up and running, bind and unbind this vIRQ, just to have the
>>> global map freed. What am I overlooking (which would likely want stating
>>> here)?
>>
>> I think you are not overlooking anything.
>>
>> I guess this can easily be handled by checking that the VIRQ_DOM_EXC handling
>> domain is the calling one in domain_[de]init_states(). Note that global virqs
>> are only ever sent to vcpu[0] of the handling domain, so rebinding the event
>> to another vcpu is possible, but doesn't make sense.
> 
> No, that's precluded by
> 
>      if ( virq_is_global(virq) && (vcpu != 0) )
>          return -EINVAL;
> 
> afaict. That doesn't, however, preclude multiple vCPU-s from trying to bind
> the vIRQ to vCPU 0.

I let myself fool by the ability to use EVTCHNOP_bind_vcpu for a global
virq. While it is possible to send the event to another vcpu, it is still
vcpu[0] which is used for the bookkeeping.

> 
>>>> V5:
>>>> - domain_init_states() may be called only if evtchn_bind_virq() has been
>>>>     called validly (Jan Beulich)
>>>
>>> I now recall why I had first suggested the placement later in the handling:
>>> You're now doing the allocation with yet another lock held. It's likely not
>>> the end of the world, but ...
>>>
>>>> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>>>>    
>>>>    bool __read_mostly vpmu_is_available;
>>>>    
>>>> +static DEFINE_SPINLOCK(dom_state_changed_lock);
>>>> +static unsigned long *__read_mostly dom_state_changed;
>>>> +
>>>> +int domain_init_states(void)
>>>> +{
>>>> +    const struct domain *d;
>>>> +    int rc = -ENOMEM;
>>>> +
>>>> +    spin_lock(&dom_state_changed_lock);
>>>> +
>>>> +    if ( dom_state_changed )
>>>> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
>>>> +    else
>>>> +    {
>>>> +        dom_state_changed = xvzalloc_array(unsigned long,
>>>> +                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));
>>>
>>> ... already this alone wasn't nice, and could be avoided (by doing the
>>> allocation prior to acquiring the lock, which of course complicates the
>>> logic some).
>>>
>>> What's perhaps less desirable is that ...
>>>
>>>> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>>            goto out;
>>>>        }
>>>>    
>>>> +    if ( virq == VIRQ_DOM_EXC )
>>>> +    {
>>>> +        rc = domain_init_states();
>>>> +        if ( rc )
>>>> +            goto out;
>>>> +
>>>> +        deinit_if_err = true;
>>>> +    }
>>>> +
>>>>        port = rc = evtchn_get_port(d, port);
>>>>        if ( rc < 0 )
>>>>        {
>>>
>>> ... the placement here additionally introduces lock nesting when really
>>> the two locks shouldn't have any relationship.
>>>
>>> How about giving domain_init_states() a boolean parameter, such that it
>>> can be called twice, with the first invocation moved back up where it
>>> was, and the second one put ...
>>>
>>>> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>>     out:
>>>>        write_unlock(&d->event_lock);
>>>>    
>>>> +    if ( rc && deinit_if_err )
>>>> +        domain_deinit_states();
>>>> +
>>>>        return rc;
>>>>    }
>>>
>>> ... down here, not doing any allocation at all (only the clearing), and
>>> hence eliminating the need to deal with its failure? (Alternatively
>>> there could of course be a split into an init and a reset function.)
>>>
>>> There of course is the chance of races with such an approach. I'd like
>>> to note though that with the placement of the call in the hunk above
>>> there's a minor race, too (against ...
>>>
>>>> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
>>>>            struct vcpu *v;
>>>>            unsigned long flags;
>>>>    
>>>> +        if ( chn1->u.virq == VIRQ_DOM_EXC )
>>>> +            domain_deinit_states();
>>>
>>> ... this and the same remote vCPU then immediately binding the vIRQ
>>> again). Hence yet another alternative would appear to be to drop the
>>> new global lock and use d->event_lock for synchronization instead
>>> (provided - see above - that only a single entity can actually set up
>>> all of this). That would pretty much want to have the allocation kept
>>> with the lock already held (which isn't nice, but as said is perhaps
>>> tolerable), but would at least eliminate the undesirable lock nesting.
>>>
>>> Re-use of the domain's event lock is at least somewhat justified by
>>> the bit array being tied to VIRQ_DOM_EXEC.
>>>
>>> Thoughts?
>>
>> With my suggestion above I think there is no race, as only the domain handling
>> VIRQ_DOM_EXC could alloc/dealloc dom_state_changed.
> 
> Yet still it could be multiple vCPU-s therein to try to in parallel.

But isn't this again the need for trusting other processes within the domain
having the right to consume the virq?

In the end it doesn't matter whether there is such a race or not. Some
process in that domain having the power to do event channel operations could
simply close the event channel. So it IS really about trust.

> 
>> Using d->event_lock for synchronization is not a nice option IMO, as it would
>> require to take the event_lock of the domain handling VIRQ_DOM_EXEC when trying
>> to set a bit for another domain changing state.
> 
> Well, yes, it's that domain's data that's to be modified, after all.

True, but using d->event_lock would probably increase lock contention, as this
lock is used much more often than the new lock introduced by my patch.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
  2024-12-18  7:15         ` Jürgen Groß
@ 2024-12-19  8:01           ` Jan Beulich
  2025-01-06  9:34             ` Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2024-12-19  8:01 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
	Roger Pau Monné, Stefano Stabellini, xen-devel

On 18.12.2024 08:15, Jürgen Groß wrote:
> On 17.12.24 17:12, Jan Beulich wrote:
>> On 17.12.2024 16:55, Jürgen Groß wrote:
>>> On 17.12.24 16:19, Jan Beulich wrote:
>>>> On 17.12.2024 15:22, Juergen Gross wrote:
>>>>> Add a bitmap with one bit per possible domid indicating the respective
>>>>> domain has changed its state (created, deleted, dying, crashed,
>>>>> shutdown).
>>>>>
>>>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>>>> all existing domains and resetting all other bits.
>>>>>
>>>>> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
>>>>> event, it is meant to be used only by a single consumer in the system,
>>>>> just like the VIRQ_DOM_EXC event.
>>>>
>>>> I'm sorry, but I need to come back to this. I thought I had got convinced
>>>> that only a single entity in the system can bind this vIRQ. Yet upon
>>>> checking I can't seem to find what would guarantee this. In particular
>>>> binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
>>>> could, on the assumption that the interested privileged entity (xenstore)
>>>> is already up and running, bind and unbind this vIRQ, just to have the
>>>> global map freed. What am I overlooking (which would likely want stating
>>>> here)?
>>>
>>> I think you are not overlooking anything.
>>>
>>> I guess this can easily be handled by checking that the VIRQ_DOM_EXC handling
>>> domain is the calling one in domain_[de]init_states(). Note that global virqs
>>> are only ever sent to vcpu[0] of the handling domain, so rebinding the event
>>> to another vcpu is possible, but doesn't make sense.
>>
>> No, that's precluded by
>>
>>      if ( virq_is_global(virq) && (vcpu != 0) )
>>          return -EINVAL;
>>
>> afaict. That doesn't, however, preclude multiple vCPU-s from trying to bind
>> the vIRQ to vCPU 0.
> 
> I let myself fool by the ability to use EVTCHNOP_bind_vcpu for a global
> virq. While it is possible to send the event to another vcpu, it is still
> vcpu[0] which is used for the bookkeeping.
> 
>>
>>>>> V5:
>>>>> - domain_init_states() may be called only if evtchn_bind_virq() has been
>>>>>     called validly (Jan Beulich)
>>>>
>>>> I now recall why I had first suggested the placement later in the handling:
>>>> You're now doing the allocation with yet another lock held. It's likely not
>>>> the end of the world, but ...
>>>>
>>>>> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>>>>>    
>>>>>    bool __read_mostly vpmu_is_available;
>>>>>    
>>>>> +static DEFINE_SPINLOCK(dom_state_changed_lock);
>>>>> +static unsigned long *__read_mostly dom_state_changed;
>>>>> +
>>>>> +int domain_init_states(void)
>>>>> +{
>>>>> +    const struct domain *d;
>>>>> +    int rc = -ENOMEM;
>>>>> +
>>>>> +    spin_lock(&dom_state_changed_lock);
>>>>> +
>>>>> +    if ( dom_state_changed )
>>>>> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
>>>>> +    else
>>>>> +    {
>>>>> +        dom_state_changed = xvzalloc_array(unsigned long,
>>>>> +                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));
>>>>
>>>> ... already this alone wasn't nice, and could be avoided (by doing the
>>>> allocation prior to acquiring the lock, which of course complicates the
>>>> logic some).
>>>>
>>>> What's perhaps less desirable is that ...
>>>>
>>>>> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>>>            goto out;
>>>>>        }
>>>>>    
>>>>> +    if ( virq == VIRQ_DOM_EXC )
>>>>> +    {
>>>>> +        rc = domain_init_states();
>>>>> +        if ( rc )
>>>>> +            goto out;
>>>>> +
>>>>> +        deinit_if_err = true;
>>>>> +    }
>>>>> +
>>>>>        port = rc = evtchn_get_port(d, port);
>>>>>        if ( rc < 0 )
>>>>>        {
>>>>
>>>> ... the placement here additionally introduces lock nesting when really
>>>> the two locks shouldn't have any relationship.
>>>>
>>>> How about giving domain_init_states() a boolean parameter, such that it
>>>> can be called twice, with the first invocation moved back up where it
>>>> was, and the second one put ...
>>>>
>>>>> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>>>     out:
>>>>>        write_unlock(&d->event_lock);
>>>>>    
>>>>> +    if ( rc && deinit_if_err )
>>>>> +        domain_deinit_states();
>>>>> +
>>>>>        return rc;
>>>>>    }
>>>>
>>>> ... down here, not doing any allocation at all (only the clearing), and
>>>> hence eliminating the need to deal with its failure? (Alternatively
>>>> there could of course be a split into an init and a reset function.)
>>>>
>>>> There of course is the chance of races with such an approach. I'd like
>>>> to note though that with the placement of the call in the hunk above
>>>> there's a minor race, too (against ...
>>>>
>>>>> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
>>>>>            struct vcpu *v;
>>>>>            unsigned long flags;
>>>>>    
>>>>> +        if ( chn1->u.virq == VIRQ_DOM_EXC )
>>>>> +            domain_deinit_states();
>>>>
>>>> ... this and the same remote vCPU then immediately binding the vIRQ
>>>> again). Hence yet another alternative would appear to be to drop the
>>>> new global lock and use d->event_lock for synchronization instead
>>>> (provided - see above - that only a single entity can actually set up
>>>> all of this). That would pretty much want to have the allocation kept
>>>> with the lock already held (which isn't nice, but as said is perhaps
>>>> tolerable), but would at least eliminate the undesirable lock nesting.
>>>>
>>>> Re-use of the domain's event lock is at least somewhat justified by
>>>> the bit array being tied to VIRQ_DOM_EXEC.
>>>>
>>>> Thoughts?
>>>
>>> With my suggestion above I think there is no race, as only the domain handling
>>> VIRQ_DOM_EXC could alloc/dealloc dom_state_changed.
>>
>> Yet still it could be multiple vCPU-s therein to try to in parallel.
> 
> But isn't this again the need for trusting other processes within the domain
> having the right to consume the virq?
> 
> In the end it doesn't matter whether there is such a race or not. Some
> process in that domain having the power to do event channel operations could
> simply close the event channel. So it IS really about trust.

Well. What we do in Xen should be correct without regard to what a guest might
be doing. And it should be safe against any not-fully-privileged entity in the
system. Hence why I think such a race needs dealing with correctly, no matter
that it's not a sane thing to do for a guest.

>>> Using d->event_lock for synchronization is not a nice option IMO, as it would
>>> require to take the event_lock of the domain handling VIRQ_DOM_EXEC when trying
>>> to set a bit for another domain changing state.
>>
>> Well, yes, it's that domain's data that's to be modified, after all.
> 
> True, but using d->event_lock would probably increase lock contention, as this
> lock is used much more often than the new lock introduced by my patch.

On a system with extremely heavy domain creation / teardown activity there may
be an increase of contention, yes. Whether that's a price worth to pay to avoid
introducing an otherwise unnecessary relationship between two locks is
precisely what we're trying to determine in this discussion. If the two of us
are taking opposite positions here, we'll simply need a 3rd view.

Jan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
  2024-12-19  8:01           ` Jan Beulich
@ 2025-01-06  9:34             ` Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2025-01-06  9:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
	Roger Pau Monné, Stefano Stabellini, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 7704 bytes --]

On 19.12.24 09:01, Jan Beulich wrote:
> On 18.12.2024 08:15, Jürgen Groß wrote:
>> On 17.12.24 17:12, Jan Beulich wrote:
>>> On 17.12.2024 16:55, Jürgen Groß wrote:
>>>> On 17.12.24 16:19, Jan Beulich wrote:
>>>>> On 17.12.2024 15:22, Juergen Gross wrote:
>>>>>> Add a bitmap with one bit per possible domid indicating the respective
>>>>>> domain has changed its state (created, deleted, dying, crashed,
>>>>>> shutdown).
>>>>>>
>>>>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>>>>> all existing domains and resetting all other bits.
>>>>>>
>>>>>> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
>>>>>> event, it is meant to be used only by a single consumer in the system,
>>>>>> just like the VIRQ_DOM_EXC event.
>>>>>
>>>>> I'm sorry, but I need to come back to this. I thought I had got convinced
>>>>> that only a single entity in the system can bind this vIRQ. Yet upon
>>>>> checking I can't seem to find what would guarantee this. In particular
>>>>> binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
>>>>> could, on the assumption that the interested privileged entity (xenstore)
>>>>> is already up and running, bind and unbind this vIRQ, just to have the
>>>>> global map freed. What am I overlooking (which would likely want stating
>>>>> here)?
>>>>
>>>> I think you are not overlooking anything.
>>>>
>>>> I guess this can easily be handled by checking that the VIRQ_DOM_EXC handling
>>>> domain is the calling one in domain_[de]init_states(). Note that global virqs
>>>> are only ever sent to vcpu[0] of the handling domain, so rebinding the event
>>>> to another vcpu is possible, but doesn't make sense.
>>>
>>> No, that's precluded by
>>>
>>>       if ( virq_is_global(virq) && (vcpu != 0) )
>>>           return -EINVAL;
>>>
>>> afaict. That doesn't, however, preclude multiple vCPU-s from trying to bind
>>> the vIRQ to vCPU 0.
>>
>> I let myself fool by the ability to use EVTCHNOP_bind_vcpu for a global
>> virq. While it is possible to send the event to another vcpu, it is still
>> vcpu[0] which is used for the bookkeeping.
>>
>>>
>>>>>> V5:
>>>>>> - domain_init_states() may be called only if evtchn_bind_virq() has been
>>>>>>      called validly (Jan Beulich)
>>>>>
>>>>> I now recall why I had first suggested the placement later in the handling:
>>>>> You're now doing the allocation with yet another lock held. It's likely not
>>>>> the end of the world, but ...
>>>>>
>>>>>> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>>>>>>     
>>>>>>     bool __read_mostly vpmu_is_available;
>>>>>>     
>>>>>> +static DEFINE_SPINLOCK(dom_state_changed_lock);
>>>>>> +static unsigned long *__read_mostly dom_state_changed;
>>>>>> +
>>>>>> +int domain_init_states(void)
>>>>>> +{
>>>>>> +    const struct domain *d;
>>>>>> +    int rc = -ENOMEM;
>>>>>> +
>>>>>> +    spin_lock(&dom_state_changed_lock);
>>>>>> +
>>>>>> +    if ( dom_state_changed )
>>>>>> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
>>>>>> +    else
>>>>>> +    {
>>>>>> +        dom_state_changed = xvzalloc_array(unsigned long,
>>>>>> +                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));
>>>>>
>>>>> ... already this alone wasn't nice, and could be avoided (by doing the
>>>>> allocation prior to acquiring the lock, which of course complicates the
>>>>> logic some).
>>>>>
>>>>> What's perhaps less desirable is that ...
>>>>>
>>>>>> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>>>>             goto out;
>>>>>>         }
>>>>>>     
>>>>>> +    if ( virq == VIRQ_DOM_EXC )
>>>>>> +    {
>>>>>> +        rc = domain_init_states();
>>>>>> +        if ( rc )
>>>>>> +            goto out;
>>>>>> +
>>>>>> +        deinit_if_err = true;
>>>>>> +    }
>>>>>> +
>>>>>>         port = rc = evtchn_get_port(d, port);
>>>>>>         if ( rc < 0 )
>>>>>>         {
>>>>>
>>>>> ... the placement here additionally introduces lock nesting when really
>>>>> the two locks shouldn't have any relationship.
>>>>>
>>>>> How about giving domain_init_states() a boolean parameter, such that it
>>>>> can be called twice, with the first invocation moved back up where it
>>>>> was, and the second one put ...
>>>>>
>>>>>> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>>>>      out:
>>>>>>         write_unlock(&d->event_lock);
>>>>>>     
>>>>>> +    if ( rc && deinit_if_err )
>>>>>> +        domain_deinit_states();
>>>>>> +
>>>>>>         return rc;
>>>>>>     }
>>>>>
>>>>> ... down here, not doing any allocation at all (only the clearing), and
>>>>> hence eliminating the need to deal with its failure? (Alternatively
>>>>> there could of course be a split into an init and a reset function.)
>>>>>
>>>>> There of course is the chance of races with such an approach. I'd like
>>>>> to note though that with the placement of the call in the hunk above
>>>>> there's a minor race, too (against ...
>>>>>
>>>>>> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
>>>>>>             struct vcpu *v;
>>>>>>             unsigned long flags;
>>>>>>     
>>>>>> +        if ( chn1->u.virq == VIRQ_DOM_EXC )
>>>>>> +            domain_deinit_states();
>>>>>
>>>>> ... this and the same remote vCPU then immediately binding the vIRQ
>>>>> again). Hence yet another alternative would appear to be to drop the
>>>>> new global lock and use d->event_lock for synchronization instead
>>>>> (provided - see above - that only a single entity can actually set up
>>>>> all of this). That would pretty much want to have the allocation kept
>>>>> with the lock already held (which isn't nice, but as said is perhaps
>>>>> tolerable), but would at least eliminate the undesirable lock nesting.
>>>>>
>>>>> Re-use of the domain's event lock is at least somewhat justified by
>>>>> the bit array being tied to VIRQ_DOM_EXEC.
>>>>>
>>>>> Thoughts?
>>>>
>>>> With my suggestion above I think there is no race, as only the domain handling
>>>> VIRQ_DOM_EXC could alloc/dealloc dom_state_changed.
>>>
>>> Yet still it could be multiple vCPU-s therein to try to in parallel.
>>
>> But isn't this again the need for trusting other processes within the domain
>> having the right to consume the virq?
>>
>> In the end it doesn't matter whether there is such a race or not. Some
>> process in that domain having the power to do event channel operations could
>> simply close the event channel. So it IS really about trust.
> 
> Well. What we do in Xen should be correct without regard to what a guest might
> be doing. And it should be safe against any not-fully-privileged entity in the
> system. Hence why I think such a race needs dealing with correctly, no matter
> that it's not a sane thing to do for a guest.

I think I have now found a solution covering all your remarks:

- I've added 2 patches making sure that only a single consumer of a global
   virq is able to bind to that virq, while ensuring that it isn't possible
   to switch the handling domain while a virq is bound by a domain. This will
   eliminate all the races.

- I'll switch to use the d->event_lock for protecting the bitmap.

- I'm dropping the case where the bitmap is already allocated when doing the
   bind. After all this shouldn't be possible any longer, as the bitmap will be
   freed when closing the event channel. This again will remove the case where
   binding would potentially change the bitmap without owning it.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-01-06  9:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-17 14:22 [PATCH v5 0/5] remove libxenctrl usage from xenstored Juergen Gross
2024-12-17 14:22 ` [PATCH v5 1/5] tools: add a dedicated header file for barrier definitions Juergen Gross
2024-12-17 14:22 ` [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes Juergen Gross
2024-12-17 15:19   ` Jan Beulich
2024-12-17 15:55     ` Jürgen Groß
2024-12-17 16:12       ` Jan Beulich
2024-12-18  7:15         ` Jürgen Groß
2024-12-19  8:01           ` Jan Beulich
2025-01-06  9:34             ` Juergen Gross
2024-12-17 14:22 ` [PATCH v5 3/5] xen: add new domctl get_changed_domain Juergen Gross
2024-12-17 14:41   ` Jan Beulich
2024-12-18  1:20   ` Daniel P. Smith
2024-12-17 14:22 ` [PATCH v5 4/5] tools/libs: add a new libxenmanage library Juergen Gross
2024-12-17 14:22 ` [PATCH v5 5/5] tools/xenstored: use new stable interface instead of libxenctrl Juergen Gross
2024-12-17 17:33   ` Anthony PERARD
2024-12-17 18:40     ` Jürgen Groß

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.