* [PATCH v5 00/10] runstate/time area registration by (guest) physical address
@ 2023-10-02 15:11 Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page Roger Pau Monne
` (12 more replies)
0 siblings, 13 replies; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Roger Pau Monne, Tamas K Lengyel, Jan Beulich, Andrew Cooper,
George Dunlap, Wei Liu, Julien Grall, Stefano Stabellini
Since it was indicated that introducing specific new vCPU ops may be
beneficial independent of the introduction of a fully physical-
address-based ABI flavor, here we go. There continue to be a few open
questions throughout the series, resolving of which was one of the main
goals of the earlier postings.
v5 adds one vm-fork specific pre-patch that does simply the introduced
code later on. It does also fix a vm-fork bug.
Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.
Whole series will need a Release-Ack.
Thanks, Roger.
Jan Beulich (9):
x86/shim: zap runstate and time area handles during shutdown
domain: GADDR based shared guest area registration alternative -
teardown
domain: update GADDR based runstate guest area
x86: update GADDR based secondary time area
x86/mem-sharing: copy GADDR based shared guest areas
domain: map/unmap GADDR based shared guest areas
domain: introduce GADDR based runstate area registration alternative
x86: introduce GADDR based secondary time area registration
alternative
common: convert vCPU info area registration
Roger Pau Monne (1):
mem_sharing/fork: do not attempt to populate vcpu_info page
xen/arch/x86/domain.c | 33 +++
xen/arch/x86/include/asm/domain.h | 3 +
xen/arch/x86/include/asm/shared.h | 19 +-
xen/arch/x86/mm/mem_sharing.c | 73 +++----
xen/arch/x86/pv/shim.c | 10 +-
xen/arch/x86/time.c | 34 +++-
xen/arch/x86/x86_64/asm-offsets.c | 2 +-
xen/arch/x86/x86_64/domain.c | 36 ++++
xen/arch/x86/x86_64/traps.c | 2 +-
xen/common/compat/domain.c | 2 +-
xen/common/domain.c | 324 ++++++++++++++++++++++--------
xen/include/public/vcpu.h | 19 ++
xen/include/xen/domain.h | 12 +-
xen/include/xen/sched.h | 8 +-
xen/include/xen/shared.h | 3 +-
15 files changed, 440 insertions(+), 140 deletions(-)
--
2.42.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
[not found] ` <CABfawhm2XMmfyx7vZvGdLZcot3=Mrrx3T5nS3vUR+Ur9j5mkWg@mail.gmail.com>
` (3 more replies)
2023-10-02 15:11 ` [PATCH v5 02/10] x86/shim: zap runstate and time area handles during shutdown Roger Pau Monne
` (11 subsequent siblings)
12 siblings, 4 replies; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Roger Pau Monne, Tamas K Lengyel, Jan Beulich, Andrew Cooper,
George Dunlap, Wei Liu
Instead let map_vcpu_info() and it's call to get_page_from_gfn()
populate the page in the child as needed. Also remove the bogus
copy_domain_page(): should be placed before the call to map_vcpu_info(),
as the later can update the contents of the vcpu_info page.
Note that this eliminates a bug in copy_vcpu_settings(): The function did
allocate a new page regardless of the GFN already having a mapping, thus in
particular breaking the case of two vCPU-s having their info areas on the same
page.
Fixes: 41548c5472a3 ('mem_sharing: VM forking')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Only build tested.
---
Changes since v4:
- New in this version.
---
xen/arch/x86/mm/mem_sharing.c | 36 ++++++-----------------------------
1 file changed, 6 insertions(+), 30 deletions(-)
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index ae5366d4476e..5f8f1fb4d871 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1689,48 +1689,24 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
unsigned int i;
struct p2m_domain *p2m = p2m_get_hostp2m(cd);
int ret = -EINVAL;
+ mfn_t vcpu_info_mfn;
for ( i = 0; i < cd->max_vcpus; i++ )
{
struct vcpu *d_vcpu = d->vcpu[i];
struct vcpu *cd_vcpu = cd->vcpu[i];
- mfn_t vcpu_info_mfn;
if ( !d_vcpu || !cd_vcpu )
continue;
- /* Copy & map in the vcpu_info page if the guest uses one */
+ /* Map in the vcpu_info page if the guest uses one */
vcpu_info_mfn = d_vcpu->vcpu_info_mfn;
if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
{
- mfn_t new_vcpu_info_mfn = cd_vcpu->vcpu_info_mfn;
-
- /* Allocate & map the page for it if it hasn't been already */
- if ( mfn_eq(new_vcpu_info_mfn, INVALID_MFN) )
- {
- gfn_t gfn = mfn_to_gfn(d, vcpu_info_mfn);
- unsigned long gfn_l = gfn_x(gfn);
- struct page_info *page;
-
- if ( !(page = alloc_domheap_page(cd, 0)) )
- return -ENOMEM;
-
- new_vcpu_info_mfn = page_to_mfn(page);
- set_gpfn_from_mfn(mfn_x(new_vcpu_info_mfn), gfn_l);
-
- ret = p2m->set_entry(p2m, gfn, new_vcpu_info_mfn,
- PAGE_ORDER_4K, p2m_ram_rw,
- p2m->default_access, -1);
- if ( ret )
- return ret;
-
- ret = map_vcpu_info(cd_vcpu, gfn_l,
- PAGE_OFFSET(d_vcpu->vcpu_info));
- if ( ret )
- return ret;
- }
-
- copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
+ ret = map_vcpu_info(cd_vcpu, mfn_to_gfn(d, vcpu_info_mfn),
+ PAGE_OFFSET(d_vcpu->vcpu_info));
+ if ( ret )
+ return ret;
}
ret = copy_vpmu(d_vcpu, cd_vcpu);
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 02/10] x86/shim: zap runstate and time area handles during shutdown
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 03/10] domain: GADDR based shared guest area registration alternative - teardown Roger Pau Monne
` (10 subsequent siblings)
12 siblings, 0 replies; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu
From: Jan Beulich <jbeulich@suse.com>
While likely the guest would just re-register the same areas after
a possible resume, let's not take this for granted and avoid the risk of
otherwise corrupting guest memory.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/pv/shim.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index ca0e639db323..7e4bacf7ae40 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -385,6 +385,10 @@ int pv_shim_shutdown(uint8_t reason)
/* Unmap guest vcpu_info pages. */
unmap_vcpu_info(v);
+ /* Zap runstate and time area handles. */
+ set_xen_guest_handle(runstate_guest(v), NULL);
+ set_xen_guest_handle(v->arch.time_info_guest, NULL);
+
/* Reset the periodic timer to the default value. */
vcpu_set_periodic_timer(v, MILLISECS(10));
/* Stop the singleshot timer. */
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 03/10] domain: GADDR based shared guest area registration alternative - teardown
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 02/10] x86/shim: zap runstate and time area handles during shutdown Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 04/10] domain: update GADDR based runstate guest area Roger Pau Monne
` (9 subsequent siblings)
12 siblings, 0 replies; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
George Dunlap, Julien Grall, Stefano Stabellini, Julien Grall
From: Jan Beulich <jbeulich@suse.com>
In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, add the necessary domain cleanup hooks.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/domain.c | 5 +++++
xen/arch/x86/include/asm/domain.h | 1 +
xen/arch/x86/pv/shim.c | 4 +++-
xen/common/domain.c | 17 +++++++++++++++++
xen/include/xen/domain.h | 11 +++++++++++
xen/include/xen/sched.h | 1 +
6 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 645675d87d9d..9d352defa25e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1024,7 +1024,10 @@ int arch_domain_soft_reset(struct domain *d)
}
for_each_vcpu ( d, v )
+ {
set_xen_guest_handle(v->arch.time_info_guest, NULL);
+ unmap_guest_area(v, &v->arch.time_guest_area);
+ }
exit_put_gfn:
put_gfn(d, gfn_x(gfn));
@@ -2381,6 +2384,8 @@ int domain_relinquish_resources(struct domain *d)
if ( ret )
return ret;
+ unmap_guest_area(v, &v->arch.time_guest_area);
+
vpmu_destroy(v);
}
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index c2d9fc333be5..e0bd28e424e0 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -669,6 +669,7 @@ struct arch_vcpu
/* A secondary copy of the vcpu time info. */
XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
+ struct guest_area time_guest_area;
struct arch_vm_event *vm_event;
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 7e4bacf7ae40..f08b16bae2fe 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -382,8 +382,10 @@ int pv_shim_shutdown(uint8_t reason)
for_each_vcpu ( d, v )
{
- /* Unmap guest vcpu_info pages. */
+ /* Unmap guest vcpu_info page and runstate/time areas. */
unmap_vcpu_info(v);
+ unmap_guest_area(v, &v->runstate_guest_area);
+ unmap_guest_area(v, &v->arch.time_guest_area);
/* Zap runstate and time area handles. */
set_xen_guest_handle(runstate_guest(v), NULL);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 304aa04fa6cb..76a4c2072e10 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -992,7 +992,10 @@ int domain_kill(struct domain *d)
if ( cpupool_move_domain(d, cpupool0) )
return -ERESTART;
for_each_vcpu ( d, v )
+ {
unmap_vcpu_info(v);
+ unmap_guest_area(v, &v->runstate_guest_area);
+ }
d->is_dying = DOMDYING_dead;
/* Mem event cleanup has to go here because the rings
* have to be put before we call put_domain. */
@@ -1446,6 +1449,7 @@ int domain_soft_reset(struct domain *d, bool resuming)
{
set_xen_guest_handle(runstate_guest(v), NULL);
unmap_vcpu_info(v);
+ unmap_guest_area(v, &v->runstate_guest_area);
}
rc = arch_domain_soft_reset(d);
@@ -1597,6 +1601,19 @@ void unmap_vcpu_info(struct vcpu *v)
put_page_and_type(mfn_to_page(mfn));
}
+/*
+ * This is only intended to be used for domain cleanup (or more generally only
+ * with at least the respective vCPU, if it's not the current one, reliably
+ * paused).
+ */
+void unmap_guest_area(struct vcpu *v, struct guest_area *area)
+{
+ struct domain *d = v->domain;
+
+ if ( v != current )
+ ASSERT(atomic_read(&v->pause_count) | atomic_read(&d->pause_count));
+}
+
int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
{
struct vcpu_guest_context *ctxt;
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 81fb05a64275..a6b22fa2cac8 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -5,6 +5,12 @@
#include <xen/types.h>
#include <public/xen.h>
+
+struct guest_area {
+ struct page_info *pg;
+ void *map;
+};
+
#include <asm/domain.h>
#include <asm/numa.h>
@@ -77,6 +83,11 @@ void arch_vcpu_destroy(struct vcpu *v);
int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned int offset);
void unmap_vcpu_info(struct vcpu *v);
+int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
+ struct guest_area *area,
+ void (*populate)(void *dst, struct vcpu *v));
+void unmap_guest_area(struct vcpu *v, struct guest_area *area);
+
struct xen_domctl_createdomain;
int arch_domain_create(struct domain *d,
struct xen_domctl_createdomain *config,
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d8c8dd85a67d..f30f3b0ebeab 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -203,6 +203,7 @@ struct vcpu
XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
} runstate_guest; /* guest address */
#endif
+ struct guest_area runstate_guest_area;
unsigned int new_state;
/* Has the FPU been initialised? */
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 04/10] domain: update GADDR based runstate guest area
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (2 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 03/10] domain: GADDR based shared guest area registration alternative - teardown Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 05/10] x86: update GADDR based secondary time area Roger Pau Monne
` (8 subsequent siblings)
12 siblings, 0 replies; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, George Dunlap, Julien Grall,
Stefano Stabellini, Wei Liu, Roger Pau Monné, Julien Grall
From: Jan Beulich <jbeulich@suse.com>
Before adding a new vCPU operation to register the runstate area by
guest-physical address, add code to actually keep such areas up-to-date.
Note that updating of the area will be done exclusively following the
model enabled by VMASST_TYPE_runstate_update_flag for virtual-address
based registered areas.
Note further that pages aren't marked dirty when written to (matching
the handling of space mapped by map_vcpu_info()), on the basis that the
registrations are lost anyway across migration (or would need re-
populating at the target for transparent migration). Plus the contents
of the areas in question have to be deemed volatile in the first place
(so saving a "most recent" value is pretty meaningless even for e.g.
snapshotting).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
---
xen/common/domain.c | 43 ++++++++++++++++++++++++++++++++++++++---
xen/include/xen/sched.h | 2 ++
2 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 76a4c2072e10..d4958ec5e149 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1644,15 +1644,52 @@ bool update_runstate_area(struct vcpu *v)
bool rc;
struct guest_memory_policy policy = { };
void __user *guest_handle = NULL;
- struct vcpu_runstate_info runstate;
+ struct vcpu_runstate_info runstate = v->runstate;
+ struct vcpu_runstate_info *map = v->runstate_guest_area.map;
+
+ if ( map )
+ {
+ uint64_t *pset;
+#ifdef CONFIG_COMPAT
+ struct compat_vcpu_runstate_info *cmap = NULL;
+
+ if ( v->runstate_guest_area_compat )
+ cmap = (void *)map;
+#endif
+
+ /*
+ * NB: No VM_ASSIST(v->domain, runstate_update_flag) check here.
+ * Always using that updating model.
+ */
+#ifdef CONFIG_COMPAT
+ if ( cmap )
+ pset = &cmap->state_entry_time;
+ else
+#endif
+ pset = &map->state_entry_time;
+ runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+ write_atomic(pset, runstate.state_entry_time);
+ smp_wmb();
+
+#ifdef CONFIG_COMPAT
+ if ( cmap )
+ XLAT_vcpu_runstate_info(cmap, &runstate);
+ else
+#endif
+ *map = runstate;
+
+ smp_wmb();
+ runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+ write_atomic(pset, runstate.state_entry_time);
+
+ return true;
+ }
if ( guest_handle_is_null(runstate_guest(v)) )
return true;
update_guest_memory_policy(v, &policy);
- memcpy(&runstate, &v->runstate, sizeof(runstate));
-
if ( VM_ASSIST(v->domain, runstate_update_flag) )
{
#ifdef CONFIG_COMPAT
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index f30f3b0ebeab..6e1028785d8c 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -232,6 +232,8 @@ struct vcpu
#ifdef CONFIG_COMPAT
/* A hypercall is using the compat ABI? */
bool hcall_compat;
+ /* Physical runstate area registered via compat ABI? */
+ bool runstate_guest_area_compat;
#endif
#ifdef CONFIG_IOREQ_SERVER
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 05/10] x86: update GADDR based secondary time area
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (3 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 04/10] domain: update GADDR based runstate guest area Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas Roger Pau Monne
` (7 subsequent siblings)
12 siblings, 0 replies; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu
From: Jan Beulich <jbeulich@suse.com>
Before adding a new vCPU operation to register the secondary time area
by guest-physical address, add code to actually keep such areas up-to-
date.
Note that pages aren't marked dirty when written to (matching the
handling of space mapped by map_vcpu_info()), on the basis that the
registrations are lost anyway across migration (or would need re-
populating at the target for transparent migration). Plus the contents
of the areas in question have to be deemed volatile in the first place
(so saving a "most recent" value is pretty meaningless even for e.g.
snapshotting).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/time.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index af40a9993c81..332d2d79aeae 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1566,12 +1566,34 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
v->arch.pv.pending_system_time = _u;
}
+static void write_time_guest_area(struct vcpu_time_info *map,
+ const struct vcpu_time_info *src)
+{
+ /* 1. Update userspace version. */
+ write_atomic(&map->version, src->version);
+ smp_wmb();
+
+ /* 2. Update all other userspace fields. */
+ *map = *src;
+
+ /* 3. Update userspace version again. */
+ smp_wmb();
+ write_atomic(&map->version, version_update_end(src->version));
+}
+
bool update_secondary_system_time(struct vcpu *v,
struct vcpu_time_info *u)
{
XEN_GUEST_HANDLE(vcpu_time_info_t) user_u = v->arch.time_info_guest;
+ struct vcpu_time_info *map = v->arch.time_guest_area.map;
struct guest_memory_policy policy = { .nested_guest_mode = false };
+ if ( map )
+ {
+ write_time_guest_area(map, u);
+ return true;
+ }
+
if ( guest_handle_is_null(user_u) )
return true;
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (4 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 05/10] x86: update GADDR based secondary time area Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
[not found] ` <CABfawhnHg3KrGP-hp4_Q8GvSf2nVSVSyK24HKqAGuWp_AtD8-A@mail.gmail.com>
2023-10-02 15:11 ` [PATCH v5 07/10] domain: map/unmap " Roger Pau Monne
` (6 subsequent siblings)
12 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Tamas K Lengyel, Andrew Cooper, George Dunlap,
Roger Pau Monné, Wei Liu, Julien Grall, Stefano Stabellini
From: Jan Beulich <jbeulich@suse.com>
In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, add the necessary fork handling (with the
backing function yet to be filled in).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v4:
- Rely on map_guest_area() to populate the child p2m if necessary.
---
xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
xen/common/domain.c | 7 +++++++
2 files changed, 38 insertions(+)
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5f8f1fb4d871..99cf001fd70f 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
hvm_set_nonreg_state(cd_vcpu, &nrs);
}
+static int copy_guest_area(struct guest_area *cd_area,
+ const struct guest_area *d_area,
+ struct vcpu *cd_vcpu,
+ const struct domain *d)
+{
+ unsigned int offset;
+
+ /* Check if no area to map, or already mapped. */
+ if ( !d_area->pg || cd_area->pg )
+ return 0;
+
+ offset = PAGE_OFFSET(d_area->map);
+ return map_guest_area(cd_vcpu, gfn_to_gaddr(
+ mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
+ offset,
+ PAGE_SIZE - offset, cd_area, NULL);
+}
+
static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
{
struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
@@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
return ret;
}
+ /* Same for the (physically registered) runstate and time info areas. */
+ ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
+ &d_vcpu->runstate_guest_area, cd_vcpu, d);
+ if ( ret )
+ return ret;
+ ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
+ &d_vcpu->arch.time_guest_area, cd_vcpu, d);
+ if ( ret )
+ return ret;
+
ret = copy_vpmu(d_vcpu, cd_vcpu);
if ( ret )
return ret;
@@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
state:
if ( reset_state )
+ {
rc = copy_settings(d, pd);
+ /* TBD: What to do here with -ERESTART? */
+ }
domain_unpause(d);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d4958ec5e149..47fc90271901 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1601,6 +1601,13 @@ void unmap_vcpu_info(struct vcpu *v)
put_page_and_type(mfn_to_page(mfn));
}
+int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
+ struct guest_area *area,
+ void (*populate)(void *dst, struct vcpu *v))
+{
+ return -EOPNOTSUPP;
+}
+
/*
* This is only intended to be used for domain cleanup (or more generally only
* with at least the respective vCPU, if it's not the current one, reliably
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 07/10] domain: map/unmap GADDR based shared guest areas
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (5 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-02 16:40 ` Julien Grall
2023-10-02 15:11 ` [PATCH v5 08/10] domain: introduce GADDR based runstate area registration alternative Roger Pau Monne
` (5 subsequent siblings)
12 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, George Dunlap, Julien Grall,
Stefano Stabellini, Wei Liu, Roger Pau Monné
From: Jan Beulich <jbeulich@suse.com>
The registration by virtual/linear address has downsides: At least on
x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
PV domains the areas are inaccessible (and hence cannot be updated by
Xen) when in guest-user mode, and for HVM guests they may be
inaccessible when Meltdown mitigations are in place. (There are yet
more issues.)
In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, flesh out the map/unmap functions.
Noteworthy differences from map_vcpu_info():
- areas can be registered more than once (and de-registered),
- remote vCPU-s are paused rather than checked for being down (which in
principle can change right after the check),
- the domain lock is taken for a much smaller region.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/common/domain.c | 92 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 91 insertions(+), 1 deletion(-)
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 47fc90271901..747bf5c87a8d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1605,7 +1605,82 @@ int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
struct guest_area *area,
void (*populate)(void *dst, struct vcpu *v))
{
- return -EOPNOTSUPP;
+ struct domain *d = v->domain;
+ void *map = NULL;
+ struct page_info *pg = NULL;
+ int rc = 0;
+
+ if ( ~gaddr ) /* Map (i.e. not just unmap)? */
+ {
+ unsigned long gfn = PFN_DOWN(gaddr);
+ unsigned int align;
+ p2m_type_t p2mt;
+
+ if ( gfn != PFN_DOWN(gaddr + size - 1) )
+ return -ENXIO;
+
+#ifdef CONFIG_COMPAT
+ if ( has_32bit_shinfo(d) )
+ align = alignof(compat_ulong_t);
+ else
+#endif
+ align = alignof(xen_ulong_t);
+ if ( !IS_ALIGNED(gaddr, align) )
+ return -ENXIO;
+
+ rc = check_get_page_from_gfn(d, _gfn(gfn), false, &p2mt, &pg);
+ if ( rc )
+ return rc;
+
+ if ( !get_page_type(pg, PGT_writable_page) )
+ {
+ put_page(pg);
+ return -EACCES;
+ }
+
+ map = __map_domain_page_global(pg);
+ if ( !map )
+ {
+ put_page_and_type(pg);
+ return -ENOMEM;
+ }
+ map += PAGE_OFFSET(gaddr);
+ }
+
+ if ( v != current )
+ {
+ if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+ {
+ rc = -ERESTART;
+ goto unmap;
+ }
+
+ vcpu_pause(v);
+
+ spin_unlock(&d->hypercall_deadlock_mutex);
+ }
+
+ domain_lock(d);
+
+ if ( map && populate )
+ populate(map, v);
+
+ SWAP(area->pg, pg);
+ SWAP(area->map, map);
+
+ domain_unlock(d);
+
+ if ( v != current )
+ vcpu_unpause(v);
+
+ unmap:
+ if ( pg )
+ {
+ unmap_domain_page_global(map);
+ put_page_and_type(pg);
+ }
+
+ return rc;
}
/*
@@ -1616,9 +1691,24 @@ int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
void unmap_guest_area(struct vcpu *v, struct guest_area *area)
{
struct domain *d = v->domain;
+ void *map;
+ struct page_info *pg;
if ( v != current )
ASSERT(atomic_read(&v->pause_count) | atomic_read(&d->pause_count));
+
+ domain_lock(d);
+ map = area->map;
+ area->map = NULL;
+ pg = area->pg;
+ area->pg = NULL;
+ domain_unlock(d);
+
+ if ( pg )
+ {
+ unmap_domain_page_global(map);
+ put_page_and_type(pg);
+ }
}
int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 08/10] domain: introduce GADDR based runstate area registration alternative
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (6 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 07/10] domain: map/unmap " Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-02 16:43 ` Julien Grall
2023-10-02 15:11 ` [PATCH v5 09/10] x86: introduce GADDR based secondary time " Roger Pau Monne
` (4 subsequent siblings)
12 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
George Dunlap, Julien Grall, Stefano Stabellini
From: Jan Beulich <jbeulich@suse.com>
The registration by virtual/linear address has downsides: At least on
x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
PV domains the area is inaccessible (and hence cannot be updated by Xen)
when in guest-user mode.
Introduce a new vCPU operation allowing to register the runstate area by
guest-physical address.
An at least theoretical downside to using physically registered areas is
that PV then won't see dirty (and perhaps also accessed) bits set in its
respective page table entries.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/x86_64/domain.c | 35 ++++++++++++++++++++++++++++++++
xen/common/domain.c | 39 ++++++++++++++++++++++++++++++++++++
xen/include/public/vcpu.h | 15 ++++++++++++++
3 files changed, 89 insertions(+)
diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c
index bfaea17fe718..494b0b54e64e 100644
--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -12,6 +12,22 @@
CHECK_vcpu_get_physid;
#undef xen_vcpu_get_physid
+static void cf_check
+runstate_area_populate(void *map, struct vcpu *v)
+{
+ if ( is_pv_vcpu(v) )
+ v->arch.pv.need_update_runstate_area = false;
+
+ v->runstate_guest_area_compat = true;
+
+ if ( v == current )
+ {
+ struct compat_vcpu_runstate_info *info = map;
+
+ XLAT_vcpu_runstate_info(info, &v->runstate);
+ }
+}
+
int
compat_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
{
@@ -58,6 +74,25 @@ compat_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
break;
}
+ case VCPUOP_register_runstate_phys_area:
+ {
+ struct compat_vcpu_register_runstate_memory_area area;
+
+ rc = -EFAULT;
+ if ( copy_from_guest(&area.addr.p, arg, 1) )
+ break;
+
+ rc = map_guest_area(v, area.addr.p,
+ sizeof(struct compat_vcpu_runstate_info),
+ &v->runstate_guest_area,
+ runstate_area_populate);
+ if ( rc == -ERESTART )
+ rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+ cmd, vcpuid, arg);
+
+ break;
+ }
+
case VCPUOP_register_vcpu_time_memory_area:
{
struct compat_vcpu_register_time_memory_area area = { .addr.p = 0 };
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 747bf5c87a8d..486c1ae3f7f3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1830,6 +1830,26 @@ bool update_runstate_area(struct vcpu *v)
return rc;
}
+static void cf_check
+runstate_area_populate(void *map, struct vcpu *v)
+{
+#ifdef CONFIG_PV
+ if ( is_pv_vcpu(v) )
+ v->arch.pv.need_update_runstate_area = false;
+#endif
+
+#ifdef CONFIG_COMPAT
+ v->runstate_guest_area_compat = false;
+#endif
+
+ if ( v == current )
+ {
+ struct vcpu_runstate_info *info = map;
+
+ *info = v->runstate;
+ }
+}
+
long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
{
long rc = 0;
@@ -2012,6 +2032,25 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
break;
}
+ case VCPUOP_register_runstate_phys_area:
+ {
+ struct vcpu_register_runstate_memory_area area;
+
+ rc = -EFAULT;
+ if ( copy_from_guest(&area.addr.p, arg, 1) )
+ break;
+
+ rc = map_guest_area(v, area.addr.p,
+ sizeof(struct vcpu_runstate_info),
+ &v->runstate_guest_area,
+ runstate_area_populate);
+ if ( rc == -ERESTART )
+ rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+ cmd, vcpuid, arg);
+
+ break;
+ }
+
default:
rc = -ENOSYS;
break;
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index a836b264a911..9dac0f9748ca 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -110,6 +110,8 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_t);
* runstate.state will always be RUNSTATE_running and
* runstate.state_entry_time will indicate the system time at which the
* VCPU was last scheduled to run.
+ * 3. New code wants to prefer VCPUOP_register_runstate_phys_area, and only
+ * fall back to the operation here for backwards compatibility.
* @extra_arg == pointer to vcpu_register_runstate_memory_area structure.
*/
#define VCPUOP_register_runstate_memory_area 5
@@ -221,6 +223,19 @@ struct vcpu_register_time_memory_area {
typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
+/*
+ * Like the respective VCPUOP_register_*_memory_area, just using the "addr.p"
+ * field of the supplied struct as a guest physical address (i.e. in GFN space).
+ * The respective area may not cross a page boundary. Pass ~0 to unregister an
+ * area. Note that as long as an area is registered by physical address, the
+ * linear address based area will not be serviced (updated) by the hypervisor.
+ *
+ * Note that the area registered via VCPUOP_register_runstate_memory_area will
+ * be updated in the same manner as the one registered via virtual address PLUS
+ * VMASST_TYPE_runstate_update_flag engaged by the domain.
+ */
+#define VCPUOP_register_runstate_phys_area 14
+
#endif /* __XEN_PUBLIC_VCPU_H__ */
/*
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 09/10] x86: introduce GADDR based secondary time area registration alternative
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (7 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 08/10] domain: introduce GADDR based runstate area registration alternative Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-02 16:44 ` Julien Grall
2023-10-02 15:11 ` [PATCH v5 10/10] common: convert vCPU info area registration Roger Pau Monne
` (3 subsequent siblings)
12 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
George Dunlap, Julien Grall, Stefano Stabellini
From: Jan Beulich <jbeulich@suse.com>
The registration by virtual/linear address has downsides: The access is
expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area
is inaccessible (and hence cannot be updated by Xen) when in guest-user
mode.
Introduce a new vCPU operation allowing to register the secondary time
area by guest-physical address.
An at least theoretical downside to using physically registered areas is
that PV then won't see dirty (and perhaps also accessed) bits set in its
respective page table entries.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/domain.c | 28 ++++++++++++++++++++++++++++
xen/arch/x86/include/asm/domain.h | 2 ++
xen/arch/x86/time.c | 10 ++++++++++
xen/arch/x86/x86_64/domain.c | 1 +
xen/include/public/vcpu.h | 4 ++++
5 files changed, 45 insertions(+)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9d352defa25e..8e0af2278104 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1529,6 +1529,15 @@ int arch_vcpu_reset(struct vcpu *v)
return 0;
}
+static void cf_check
+time_area_populate(void *map, struct vcpu *v)
+{
+ if ( is_pv_vcpu(v) )
+ v->arch.pv.pending_system_time.version = 0;
+
+ force_update_secondary_system_time(v, map);
+}
+
long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
{
long rc = 0;
@@ -1567,6 +1576,25 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
break;
}
+ case VCPUOP_register_vcpu_time_phys_area:
+ {
+ struct vcpu_register_time_memory_area area;
+
+ rc = -EFAULT;
+ if ( copy_from_guest(&area.addr.p, arg, 1) )
+ break;
+
+ rc = map_guest_area(v, area.addr.p,
+ sizeof(vcpu_time_info_t),
+ &v->arch.time_guest_area,
+ time_area_populate);
+ if ( rc == -ERESTART )
+ rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+ cmd, vcpuid, arg);
+
+ break;
+ }
+
case VCPUOP_get_physid:
{
struct vcpu_get_physid cpu_id;
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index e0bd28e424e0..619e667938ed 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -692,6 +692,8 @@ void domain_cpu_policy_changed(struct domain *d);
bool update_secondary_system_time(struct vcpu *,
struct vcpu_time_info *);
+void force_update_secondary_system_time(struct vcpu *,
+ struct vcpu_time_info *);
void vcpu_show_registers(const struct vcpu *);
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 332d2d79aeae..73df1639a301 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1628,6 +1628,16 @@ void force_update_vcpu_system_time(struct vcpu *v)
__update_vcpu_system_time(v, 1);
}
+void force_update_secondary_system_time(struct vcpu *v,
+ struct vcpu_time_info *map)
+{
+ struct vcpu_time_info u;
+
+ collect_time_info(v, &u);
+ u.version = -1; /* Compensate for version_update_end(). */
+ write_time_guest_area(map, &u);
+}
+
static void update_domain_rtc(void)
{
struct domain *d;
diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c
index 494b0b54e64e..a02d4f569ee5 100644
--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -115,6 +115,7 @@ compat_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
case VCPUOP_send_nmi:
case VCPUOP_get_physid:
+ case VCPUOP_register_vcpu_time_phys_area:
rc = do_vcpu_op(cmd, vcpuid, arg);
break;
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 9dac0f9748ca..8fb0bd1b6c03 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -209,6 +209,9 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_get_physid_t);
* segment limit). It can then apply the normal algorithm to compute
* system time from the tsc.
*
+ * New code wants to prefer VCPUOP_register_vcpu_time_phys_area, and only
+ * fall back to the operation here for backwards compatibility.
+ *
* @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
*/
#define VCPUOP_register_vcpu_time_memory_area 13
@@ -235,6 +238,7 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
* VMASST_TYPE_runstate_update_flag engaged by the domain.
*/
#define VCPUOP_register_runstate_phys_area 14
+#define VCPUOP_register_vcpu_time_phys_area 15
#endif /* __XEN_PUBLIC_VCPU_H__ */
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 10/10] common: convert vCPU info area registration
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (8 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 09/10] x86: introduce GADDR based secondary time " Roger Pau Monne
@ 2023-10-02 15:11 ` Roger Pau Monne
2023-10-04 17:15 ` Julien Grall
2023-10-05 1:27 ` [PATCH v5 00/10] runstate/time area registration by (guest) physical address Henry Wang
` (2 subsequent siblings)
12 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-02 15:11 UTC (permalink / raw)
To: xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
George Dunlap, Julien Grall, Stefano Stabellini, Tamas K Lengyel
From: Jan Beulich <jbeulich@suse.com>
Switch to using map_guest_area(). Noteworthy differences from
map_vcpu_info():
- remote vCPU-s are paused rather than checked for being down (which in
principle can change right after the check),
- the domain lock is taken for a much smaller region,
- the error code for an attempt to re-register the area is now -EBUSY,
- we could in principle permit de-registration when no area was
previously registered (which would permit "probing", if necessary for
anything).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/include/asm/shared.h | 19 ++-
xen/arch/x86/mm/mem_sharing.c | 20 ++-
xen/arch/x86/pv/shim.c | 2 +-
xen/arch/x86/time.c | 2 +-
xen/arch/x86/x86_64/asm-offsets.c | 2 +-
xen/arch/x86/x86_64/traps.c | 2 +-
xen/common/compat/domain.c | 2 +-
xen/common/domain.c | 204 ++++++++++++------------------
xen/include/xen/domain.h | 3 -
xen/include/xen/sched.h | 5 +-
xen/include/xen/shared.h | 3 +-
11 files changed, 112 insertions(+), 152 deletions(-)
diff --git a/xen/arch/x86/include/asm/shared.h b/xen/arch/x86/include/asm/shared.h
index dd3ae8c2639d..60b67fa4b427 100644
--- a/xen/arch/x86/include/asm/shared.h
+++ b/xen/arch/x86/include/asm/shared.h
@@ -26,17 +26,20 @@ static inline void arch_set_##field(struct domain *d, \
#define GET_SET_VCPU(type, field) \
static inline type arch_get_##field(const struct vcpu *v) \
{ \
+ const vcpu_info_t *vi = v->vcpu_info_area.map; \
+ \
return !has_32bit_shinfo(v->domain) ? \
- v->vcpu_info->native.arch.field : \
- v->vcpu_info->compat.arch.field; \
+ vi->native.arch.field : vi->compat.arch.field; \
} \
static inline void arch_set_##field(struct vcpu *v, \
type val) \
{ \
+ vcpu_info_t *vi = v->vcpu_info_area.map; \
+ \
if ( !has_32bit_shinfo(v->domain) ) \
- v->vcpu_info->native.arch.field = val; \
+ vi->native.arch.field = val; \
else \
- v->vcpu_info->compat.arch.field = val; \
+ vi->compat.arch.field = val; \
}
#else
@@ -57,12 +60,16 @@ static inline void arch_set_##field(struct domain *d, \
#define GET_SET_VCPU(type, field) \
static inline type arch_get_##field(const struct vcpu *v) \
{ \
- return v->vcpu_info->arch.field; \
+ const vcpu_info_t *vi = v->vcpu_info_area.map; \
+ \
+ return vi->arch.field; \
} \
static inline void arch_set_##field(struct vcpu *v, \
type val) \
{ \
- v->vcpu_info->arch.field = val; \
+ vcpu_info_t *vi = v->vcpu_info_area.map; \
+ \
+ vi->arch.field = val; \
}
#endif
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 99cf001fd70f..6f1ce1623b8d 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1705,7 +1705,6 @@ static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
{
unsigned int i;
- struct p2m_domain *p2m = p2m_get_hostp2m(cd);
int ret = -EINVAL;
mfn_t vcpu_info_mfn;
@@ -1717,17 +1716,14 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
if ( !d_vcpu || !cd_vcpu )
continue;
- /* Map in the vcpu_info page if the guest uses one */
- vcpu_info_mfn = d_vcpu->vcpu_info_mfn;
- if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
- {
- ret = map_vcpu_info(cd_vcpu, mfn_to_gfn(d, vcpu_info_mfn),
- PAGE_OFFSET(d_vcpu->vcpu_info));
- if ( ret )
- return ret;
- }
-
- /* Same for the (physically registered) runstate and time info areas. */
+ /*
+ * Map the vcpu_info page and the (physically registered) runstate and
+ * time info areas.
+ */
+ ret = copy_guest_area(&cd_vcpu->vcpu_info_area,
+ &d_vcpu->vcpu_info_area, cd_vcpu, d);
+ if ( ret )
+ return ret;
ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
&d_vcpu->runstate_guest_area, cd_vcpu, d);
if ( ret )
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index f08b16bae2fe..81e4a0516d18 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -383,7 +383,7 @@ int pv_shim_shutdown(uint8_t reason)
for_each_vcpu ( d, v )
{
/* Unmap guest vcpu_info page and runstate/time areas. */
- unmap_vcpu_info(v);
+ unmap_guest_area(v, &v->vcpu_info_area);
unmap_guest_area(v, &v->runstate_guest_area);
unmap_guest_area(v, &v->arch.time_guest_area);
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 73df1639a301..d0b0986509b2 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1542,7 +1542,7 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
struct vcpu_time_info *u = &vcpu_info(v, time), _u;
const struct domain *d = v->domain;
- if ( v->vcpu_info == NULL )
+ if ( !v->vcpu_info_area.map )
return;
collect_time_info(v, &_u);
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index fbd6c54188db..57b73a4e6214 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -53,7 +53,7 @@ void __dummy__(void)
OFFSET(VCPU_processor, struct vcpu, processor);
OFFSET(VCPU_domain, struct vcpu, domain);
- OFFSET(VCPU_vcpu_info, struct vcpu, vcpu_info);
+ OFFSET(VCPU_vcpu_info, struct vcpu, vcpu_info_area.map);
OFFSET(VCPU_trap_bounce, struct vcpu, arch.pv.trap_bounce);
OFFSET(VCPU_thread_flags, struct vcpu, arch.flags);
OFFSET(VCPU_event_addr, struct vcpu, arch.pv.event_callback_eip);
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index f4d17b483032..e03e80813e36 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -96,7 +96,7 @@ static void _show_registers(
if ( context == CTXT_hypervisor )
printk(" %pS", _p(regs->rip));
printk("\nRFLAGS: %016lx ", regs->rflags);
- if ( (context == CTXT_pv_guest) && v && v->vcpu_info )
+ if ( (context == CTXT_pv_guest) && v && v->vcpu_info_area.map )
printk("EM: %d ", !!vcpu_info(v, evtchn_upcall_mask));
printk("CONTEXT: %s", context_names[context]);
if ( v && !is_idle_vcpu(v) )
diff --git a/xen/common/compat/domain.c b/xen/common/compat/domain.c
index c4254905359e..7ff238cc2656 100644
--- a/xen/common/compat/domain.c
+++ b/xen/common/compat/domain.c
@@ -49,7 +49,7 @@ int compat_common_vcpu_op(int cmd, struct vcpu *v,
{
case VCPUOP_initialise:
{
- if ( v->vcpu_info == &dummy_vcpu_info )
+ if ( v->vcpu_info_area.map == &dummy_vcpu_info )
return -EINVAL;
#ifdef CONFIG_HVM
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 486c1ae3f7f3..b8281d7cff9d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -127,10 +127,10 @@ static void vcpu_info_reset(struct vcpu *v)
{
struct domain *d = v->domain;
- v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
- ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
- : &dummy_vcpu_info);
- v->vcpu_info_mfn = INVALID_MFN;
+ v->vcpu_info_area.map =
+ ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
+ ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
+ : &dummy_vcpu_info);
}
static void vmtrace_free_buffer(struct vcpu *v)
@@ -993,7 +993,7 @@ int domain_kill(struct domain *d)
return -ERESTART;
for_each_vcpu ( d, v )
{
- unmap_vcpu_info(v);
+ unmap_guest_area(v, &v->vcpu_info_area);
unmap_guest_area(v, &v->runstate_guest_area);
}
d->is_dying = DOMDYING_dead;
@@ -1448,7 +1448,7 @@ int domain_soft_reset(struct domain *d, bool resuming)
for_each_vcpu ( d, v )
{
set_xen_guest_handle(runstate_guest(v), NULL);
- unmap_vcpu_info(v);
+ unmap_guest_area(v, &v->vcpu_info_area);
unmap_guest_area(v, &v->runstate_guest_area);
}
@@ -1496,111 +1496,6 @@ int vcpu_reset(struct vcpu *v)
return rc;
}
-/*
- * Map a guest page in and point the vcpu_info pointer at it. This
- * makes sure that the vcpu_info is always pointing at a valid piece
- * of memory, and it sets a pending event to make sure that a pending
- * event doesn't get missed.
- */
-int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned int offset)
-{
- struct domain *d = v->domain;
- void *mapping;
- vcpu_info_t *new_info;
- struct page_info *page;
- unsigned int align;
-
- if ( offset > (PAGE_SIZE - sizeof(*new_info)) )
- return -ENXIO;
-
-#ifdef CONFIG_COMPAT
- BUILD_BUG_ON(sizeof(*new_info) != sizeof(new_info->compat));
- if ( has_32bit_shinfo(d) )
- align = alignof(new_info->compat);
- else
-#endif
- align = alignof(*new_info);
- if ( offset & (align - 1) )
- return -ENXIO;
-
- if ( !mfn_eq(v->vcpu_info_mfn, INVALID_MFN) )
- return -EINVAL;
-
- /* Run this command on yourself or on other offline VCPUS. */
- if ( (v != current) && !(v->pause_flags & VPF_down) )
- return -EINVAL;
-
- page = get_page_from_gfn(d, gfn, NULL, P2M_UNSHARE);
- if ( !page )
- return -EINVAL;
-
- if ( !get_page_type(page, PGT_writable_page) )
- {
- put_page(page);
- return -EINVAL;
- }
-
- mapping = __map_domain_page_global(page);
- if ( mapping == NULL )
- {
- put_page_and_type(page);
- return -ENOMEM;
- }
-
- new_info = (vcpu_info_t *)(mapping + offset);
-
- if ( v->vcpu_info == &dummy_vcpu_info )
- {
- memset(new_info, 0, sizeof(*new_info));
-#ifdef XEN_HAVE_PV_UPCALL_MASK
- __vcpu_info(v, new_info, evtchn_upcall_mask) = 1;
-#endif
- }
- else
- {
- memcpy(new_info, v->vcpu_info, sizeof(*new_info));
- }
-
- v->vcpu_info = new_info;
- v->vcpu_info_mfn = page_to_mfn(page);
-
- /* Set new vcpu_info pointer /before/ setting pending flags. */
- smp_wmb();
-
- /*
- * Mark everything as being pending just to make sure nothing gets
- * lost. The domain will get a spurious event, but it can cope.
- */
-#ifdef CONFIG_COMPAT
- if ( !has_32bit_shinfo(d) )
- write_atomic(&new_info->native.evtchn_pending_sel, ~0);
- else
-#endif
- write_atomic(&vcpu_info(v, evtchn_pending_sel), ~0);
- vcpu_mark_events_pending(v);
-
- return 0;
-}
-
-/*
- * Unmap the vcpu info page if the guest decided to place it somewhere
- * else. This is used from domain_kill() and domain_soft_reset().
- */
-void unmap_vcpu_info(struct vcpu *v)
-{
- mfn_t mfn = v->vcpu_info_mfn;
-
- if ( mfn_eq(mfn, INVALID_MFN) )
- return;
-
- unmap_domain_page_global((void *)
- ((unsigned long)v->vcpu_info & PAGE_MASK));
-
- vcpu_info_reset(v); /* NB: Clobbers v->vcpu_info_mfn */
-
- put_page_and_type(mfn_to_page(mfn));
-}
-
int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
struct guest_area *area,
void (*populate)(void *dst, struct vcpu *v))
@@ -1662,14 +1557,44 @@ int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
domain_lock(d);
- if ( map && populate )
- populate(map, v);
+ /* No re-registration of the vCPU info area. */
+ if ( area != &v->vcpu_info_area || !area->pg )
+ {
+ if ( map && populate )
+ populate(map, v);
- SWAP(area->pg, pg);
- SWAP(area->map, map);
+ SWAP(area->pg, pg);
+ SWAP(area->map, map);
+ }
+ else
+ rc = -EBUSY;
domain_unlock(d);
+ /* Set pending flags /after/ new vcpu_info pointer was set. */
+ if ( area == &v->vcpu_info_area && !rc )
+ {
+ /*
+ * Mark everything as being pending just to make sure nothing gets
+ * lost. The domain will get a spurious event, but it can cope.
+ */
+#ifdef CONFIG_COMPAT
+ if ( !has_32bit_shinfo(d) )
+ {
+ vcpu_info_t *info = area->map;
+
+ /* For VCPUOP_register_vcpu_info handling in common_vcpu_op(). */
+ BUILD_BUG_ON(sizeof(*info) != sizeof(info->compat));
+ write_atomic(&info->native.evtchn_pending_sel, ~0);
+ }
+ else
+#endif
+ write_atomic(&vcpu_info(v, evtchn_pending_sel), ~0);
+ vcpu_mark_events_pending(v);
+
+ force_update_vcpu_system_time(v);
+ }
+
if ( v != current )
vcpu_unpause(v);
@@ -1699,7 +1624,10 @@ void unmap_guest_area(struct vcpu *v, struct guest_area *area)
domain_lock(d);
map = area->map;
- area->map = NULL;
+ if ( area == &v->vcpu_info_area )
+ vcpu_info_reset(v);
+ else
+ area->map = NULL;
pg = area->pg;
area->pg = NULL;
domain_unlock(d);
@@ -1830,6 +1758,27 @@ bool update_runstate_area(struct vcpu *v)
return rc;
}
+/*
+ * This makes sure that the vcpu_info is always pointing at a valid piece of
+ * memory, and it sets a pending event to make sure that a pending event
+ * doesn't get missed.
+ */
+static void cf_check
+vcpu_info_populate(void *map, struct vcpu *v)
+{
+ vcpu_info_t *info = map;
+
+ if ( v->vcpu_info_area.map == &dummy_vcpu_info )
+ {
+ memset(info, 0, sizeof(*info));
+#ifdef XEN_HAVE_PV_UPCALL_MASK
+ __vcpu_info(v, info, evtchn_upcall_mask) = 1;
+#endif
+ }
+ else
+ memcpy(info, v->vcpu_info_area.map, sizeof(*info));
+}
+
static void cf_check
runstate_area_populate(void *map, struct vcpu *v)
{
@@ -1859,7 +1808,7 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
switch ( cmd )
{
case VCPUOP_initialise:
- if ( v->vcpu_info == &dummy_vcpu_info )
+ if ( v->vcpu_info_area.map == &dummy_vcpu_info )
return -EINVAL;
rc = arch_initialise_vcpu(v, arg);
@@ -1990,16 +1939,29 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
case VCPUOP_register_vcpu_info:
{
struct vcpu_register_vcpu_info info;
+ paddr_t gaddr;
rc = -EFAULT;
if ( copy_from_guest(&info, arg, 1) )
break;
- domain_lock(d);
- rc = map_vcpu_info(v, info.mfn, info.offset);
- domain_unlock(d);
+ rc = -EINVAL;
+ gaddr = gfn_to_gaddr(_gfn(info.mfn)) + info.offset;
+ if ( !~gaddr ||
+ gfn_x(gaddr_to_gfn(gaddr)) != info.mfn )
+ break;
- force_update_vcpu_system_time(v);
+ /* Preliminary check only; see map_guest_area(). */
+ rc = -EBUSY;
+ if ( v->vcpu_info_area.pg )
+ break;
+
+ /* See the BUILD_BUG_ON() in vcpu_info_populate(). */
+ rc = map_guest_area(v, gaddr, sizeof(vcpu_info_t),
+ &v->vcpu_info_area, vcpu_info_populate);
+ if ( rc == -ERESTART )
+ rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+ cmd, vcpuid, arg);
break;
}
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index a6b22fa2cac8..54d88bf5e34b 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -80,9 +80,6 @@ void cf_check free_pirq_struct(void *);
int arch_vcpu_create(struct vcpu *v);
void arch_vcpu_destroy(struct vcpu *v);
-int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned int offset);
-void unmap_vcpu_info(struct vcpu *v);
-
int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
struct guest_area *area,
void (*populate)(void *dst, struct vcpu *v));
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 6e1028785d8c..3609ef88c4ff 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -176,7 +176,7 @@ struct vcpu
int processor;
- vcpu_info_t *vcpu_info;
+ struct guest_area vcpu_info_area;
struct domain *domain;
@@ -289,9 +289,6 @@ struct vcpu
struct waitqueue_vcpu *waitqueue_vcpu;
- /* Guest-specified relocation of vcpu_info. */
- mfn_t vcpu_info_mfn;
-
struct evtchn_fifo_vcpu *evtchn_fifo;
/* vPCI per-vCPU area, used to store data for long running operations. */
diff --git a/xen/include/xen/shared.h b/xen/include/xen/shared.h
index a411a8a3e38d..5b71342cab32 100644
--- a/xen/include/xen/shared.h
+++ b/xen/include/xen/shared.h
@@ -44,6 +44,7 @@ typedef struct vcpu_info vcpu_info_t;
extern vcpu_info_t dummy_vcpu_info;
#define shared_info(d, field) __shared_info(d, (d)->shared_info, field)
-#define vcpu_info(v, field) __vcpu_info(v, (v)->vcpu_info, field)
+#define vcpu_info(v, field) \
+ __vcpu_info(v, (vcpu_info_t *)(v)->vcpu_info_area.map, field)
#endif /* __XEN_SHARED_H__ */
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v5 07/10] domain: map/unmap GADDR based shared guest areas
2023-10-02 15:11 ` [PATCH v5 07/10] domain: map/unmap " Roger Pau Monne
@ 2023-10-02 16:40 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2023-10-02 16:40 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, George Dunlap, Stefano Stabellini,
Wei Liu
On 02/10/2023 16:11, Roger Pau Monne wrote:
> From: Jan Beulich <jbeulich@suse.com>
>
> The registration by virtual/linear address has downsides: At least on
> x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
> PV domains the areas are inaccessible (and hence cannot be updated by
> Xen) when in guest-user mode, and for HVM guests they may be
> inaccessible when Meltdown mitigations are in place. (There are yet
> more issues.)
>
> In preparation of the introduction of new vCPU operations allowing to
> register the respective areas (one of the two is x86-specific) by
> guest-physical address, flesh out the map/unmap functions.
>
> Noteworthy differences from map_vcpu_info():
> - areas can be registered more than once (and de-registered),
> - remote vCPU-s are paused rather than checked for being down (which in
> principle can change right after the check),
> - the domain lock is taken for a much smaller region.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 08/10] domain: introduce GADDR based runstate area registration alternative
2023-10-02 15:11 ` [PATCH v5 08/10] domain: introduce GADDR based runstate area registration alternative Roger Pau Monne
@ 2023-10-02 16:43 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2023-10-02 16:43 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
Stefano Stabellini
Hi,
On 02/10/2023 16:11, Roger Pau Monne wrote:
> From: Jan Beulich <jbeulich@suse.com>
>
> The registration by virtual/linear address has downsides: At least on
> x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
> PV domains the area is inaccessible (and hence cannot be updated by Xen)
> when in guest-user mode.
>
> Introduce a new vCPU operation allowing to register the runstate area by
> guest-physical address.
>
> An at least theoretical downside to using physically registered areas is
> that PV then won't see dirty (and perhaps also accessed) bits set in its
> respective page table entries.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 09/10] x86: introduce GADDR based secondary time area registration alternative
2023-10-02 15:11 ` [PATCH v5 09/10] x86: introduce GADDR based secondary time " Roger Pau Monne
@ 2023-10-02 16:44 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2023-10-02 16:44 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
Stefano Stabellini
Hi,
On 02/10/2023 16:11, Roger Pau Monne wrote:
> From: Jan Beulich <jbeulich@suse.com>
>
> The registration by virtual/linear address has downsides: The access is
> expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area
> is inaccessible (and hence cannot be updated by Xen) when in guest-user
> mode.
>
> Introduce a new vCPU operation allowing to register the secondary time
> area by guest-physical address.
>
> An at least theoretical downside to using physically registered areas is
> that PV then won't see dirty (and perhaps also accessed) bits set in its
> respective page table entries.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
For the small change in include/public/vcpu.h:
Acked-by: Julien Grall <jgrall@amazon.com>
Cheers,--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page
[not found] ` <CABfawhm2XMmfyx7vZvGdLZcot3=Mrrx3T5nS3vUR+Ur9j5mkWg@mail.gmail.com>
@ 2023-10-03 7:15 ` Roger Pau Monné
0 siblings, 0 replies; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-03 7:15 UTC (permalink / raw)
To: Tamas K Lengyel
Cc: xen-devel, henry.wang, Jan Beulich, Andrew Cooper, George Dunlap,
Wei Liu
On Mon, Oct 02, 2023 at 01:05:24PM -0400, Tamas K Lengyel wrote:
> On Mon, Oct 2, 2023 at 11:12 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >
> > Instead let map_vcpu_info() and it's call to get_page_from_gfn()
> > populate the page in the child as needed. Also remove the bogus
> > copy_domain_page(): should be placed before the call to map_vcpu_info(),
> > as the later can update the contents of the vcpu_info page.
> >
> > Note that this eliminates a bug in copy_vcpu_settings(): The function did
> > allocate a new page regardless of the GFN already having a mapping, thus in
> > particular breaking the case of two vCPU-s having their info areas on the same
> > page.
> >
> > Fixes: 41548c5472a3 ('mem_sharing: VM forking')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Only build tested.
>
> So this will allocate & map a page in, but its content won't be
> matching that of the parent. Is that not an issue? If there is no
> state on this page the guest is going to rely on, it looks fine to me.
> But if the guest may expect a certain state to be already setup on
> this page we could run into weird issues in the fork, no?
mem_sharing_fork_page() will do the copy from the parent, and this is
what gets called from get_page_from_gfn().
map_vcpu_info() might change some of the vcpu_info contents when
compared to the parent, but such changes could also happen without the
fork, and the guest should be able to handle it just like any update
to vcpu_info. There will likely be an spurious event channel upcall
injected depending on the delivery method selected by the guest, but
again this could also happen without the fork taking place.
Thanks, Roger.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
[not found] ` <CABfawhnHg3KrGP-hp4_Q8GvSf2nVSVSyK24HKqAGuWp_AtD8-A@mail.gmail.com>
@ 2023-10-03 14:29 ` Roger Pau Monné
2023-10-03 15:07 ` Julien Grall
0 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-03 14:29 UTC (permalink / raw)
To: Tamas K Lengyel
Cc: xen-devel, henry.wang, Jan Beulich, Andrew Cooper, George Dunlap,
Wei Liu, Julien Grall, Stefano Stabellini
On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >
> > From: Jan Beulich <jbeulich@suse.com>
> >
> > In preparation of the introduction of new vCPU operations allowing to
> > register the respective areas (one of the two is x86-specific) by
> > guest-physical address, add the necessary fork handling (with the
> > backing function yet to be filled in).
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v4:
> > - Rely on map_guest_area() to populate the child p2m if necessary.
> > ---
> > xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> > xen/common/domain.c | 7 +++++++
> > 2 files changed, 38 insertions(+)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index 5f8f1fb4d871..99cf001fd70f 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > hvm_set_nonreg_state(cd_vcpu, &nrs);
> > }
> >
> > +static int copy_guest_area(struct guest_area *cd_area,
> > + const struct guest_area *d_area,
> > + struct vcpu *cd_vcpu,
> > + const struct domain *d)
> > +{
> > + unsigned int offset;
> > +
> > + /* Check if no area to map, or already mapped. */
> > + if ( !d_area->pg || cd_area->pg )
> > + return 0;
> > +
> > + offset = PAGE_OFFSET(d_area->map);
> > + return map_guest_area(cd_vcpu, gfn_to_gaddr(
> > + mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> > + offset,
> > + PAGE_SIZE - offset, cd_area, NULL);
> > +}
> > +
> > static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > {
> > struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> > @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> > return ret;
> > }
> >
> > + /* Same for the (physically registered) runstate and time info areas. */
> > + ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> > + &d_vcpu->runstate_guest_area, cd_vcpu, d);
> > + if ( ret )
> > + return ret;
> > + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> > + &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> > + if ( ret )
> > + return ret;
> > +
> > ret = copy_vpmu(d_vcpu, cd_vcpu);
> > if ( ret )
> > return ret;
> > @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> >
> > state:
> > if ( reset_state )
> > + {
> > rc = copy_settings(d, pd);
> > + /* TBD: What to do here with -ERESTART? */
>
> There is no situation where we get an -ERESTART here currently. Is
> map_guest_area expected to run into situations where it fails with
> that rc?
Yes, there's a spin_trylock() call that will result in
map_guest_area() returning -ERESTART.
> If yes we might need a lock in place so we can block until it
> can succeed.
I'm not sure whether returning -ERESTART can actually happen in
map_guest_area() for the fork case: the child domain is still paused
at this point, so there can't be concurrent guest hypercalls that
would also cause the domain hypercall_deadlock_mutex to be acquired.
The comment was added by Jan, so I cannot be certain about the
intention, neither I would like to misinterpret his words. My
understanding is that future uses of copy_settings() might indeed need
to report -ERESTART, and that it would need to be propagated for
proper hypercall continuations at some point.
Thanks, Roger.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-03 14:29 ` Roger Pau Monné
@ 2023-10-03 15:07 ` Julien Grall
2023-10-03 20:25 ` Tamas K Lengyel
0 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2023-10-03 15:07 UTC (permalink / raw)
To: Roger Pau Monné, Tamas K Lengyel
Cc: xen-devel, henry.wang, Jan Beulich, Andrew Cooper, George Dunlap,
Wei Liu, Stefano Stabellini
Hi Roger,
On 03/10/2023 15:29, Roger Pau Monné wrote:
> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
Tamas, somehow your e-mails don't show up in my inbox (even if I am
CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
folder.
>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>>>
>>> From: Jan Beulich <jbeulich@suse.com>
>>>
>>> In preparation of the introduction of new vCPU operations allowing to
>>> register the respective areas (one of the two is x86-specific) by
>>> guest-physical address, add the necessary fork handling (with the
>>> backing function yet to be filled in).
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Changes since v4:
>>> - Rely on map_guest_area() to populate the child p2m if necessary.
>>> ---
>>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
>>> xen/common/domain.c | 7 +++++++
>>> 2 files changed, 38 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>> index 5f8f1fb4d871..99cf001fd70f 100644
>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>> hvm_set_nonreg_state(cd_vcpu, &nrs);
>>> }
>>>
>>> +static int copy_guest_area(struct guest_area *cd_area,
>>> + const struct guest_area *d_area,
>>> + struct vcpu *cd_vcpu,
>>> + const struct domain *d)
>>> +{
>>> + unsigned int offset;
>>> +
>>> + /* Check if no area to map, or already mapped. */
>>> + if ( !d_area->pg || cd_area->pg )
>>> + return 0;
>>> +
>>> + offset = PAGE_OFFSET(d_area->map);
>>> + return map_guest_area(cd_vcpu, gfn_to_gaddr(
>>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
>>> + offset,
>>> + PAGE_SIZE - offset, cd_area, NULL);
>>> +}
>>> +
>>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>> {
>>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
>>> return ret;
>>> }
>>>
>>> + /* Same for the (physically registered) runstate and time info areas. */
>>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
>>> + &d_vcpu->runstate_guest_area, cd_vcpu, d);
>>> + if ( ret )
>>> + return ret;
>>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
>>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d);
>>> + if ( ret )
>>> + return ret;
>>> +
>>> ret = copy_vpmu(d_vcpu, cd_vcpu);
>>> if ( ret )
>>> return ret;
>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
>>>
>>> state:
>>> if ( reset_state )
>>> + {
>>> rc = copy_settings(d, pd);
>>> + /* TBD: What to do here with -ERESTART? */
>>
>> There is no situation where we get an -ERESTART here currently. Is
>> map_guest_area expected to run into situations where it fails with
>> that rc?
>
> Yes, there's a spin_trylock() call that will result in
> map_guest_area() returning -ERESTART.
>
>> If yes we might need a lock in place so we can block until it
>> can succeed.
>
> I'm not sure whether returning -ERESTART can actually happen in
> map_guest_area() for the fork case: the child domain is still paused
> at this point, so there can't be concurrent guest hypercalls that
> would also cause the domain hypercall_deadlock_mutex to be acquired.
hypercall_deadlock_mutex is also acquired by domctls. So, I believe,
-ERESTART could be returned if the toolstack is also issuing domclt
right at the same time as forking.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-03 15:07 ` Julien Grall
@ 2023-10-03 20:25 ` Tamas K Lengyel
2023-10-04 8:20 ` Roger Pau Monné
0 siblings, 1 reply; 42+ messages in thread
From: Tamas K Lengyel @ 2023-10-03 20:25 UTC (permalink / raw)
To: Julien Grall
Cc: Roger Pau Monné, xen-devel, henry.wang, Jan Beulich,
Andrew Cooper, George Dunlap, Wei Liu, Stefano Stabellini
On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Roger,
>
> On 03/10/2023 15:29, Roger Pau Monné wrote:
> > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
>
> Tamas, somehow your e-mails don't show up in my inbox (even if I am
> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
> folder.
Thanks, I've switched mailservers, hopefully that resolves the issue.
>
> >> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >>>
> >>> From: Jan Beulich <jbeulich@suse.com>
> >>>
> >>> In preparation of the introduction of new vCPU operations allowing to
> >>> register the respective areas (one of the two is x86-specific) by
> >>> guest-physical address, add the necessary fork handling (with the
> >>> backing function yet to be filled in).
> >>>
> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> Changes since v4:
> >>> - Rely on map_guest_area() to populate the child p2m if necessary.
> >>> ---
> >>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> >>> xen/common/domain.c | 7 +++++++
> >>> 2 files changed, 38 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> >>> index 5f8f1fb4d871..99cf001fd70f 100644
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >>> hvm_set_nonreg_state(cd_vcpu, &nrs);
> >>> }
> >>>
> >>> +static int copy_guest_area(struct guest_area *cd_area,
> >>> + const struct guest_area *d_area,
> >>> + struct vcpu *cd_vcpu,
> >>> + const struct domain *d)
> >>> +{
> >>> + unsigned int offset;
> >>> +
> >>> + /* Check if no area to map, or already mapped. */
> >>> + if ( !d_area->pg || cd_area->pg )
> >>> + return 0;
> >>> +
> >>> + offset = PAGE_OFFSET(d_area->map);
> >>> + return map_guest_area(cd_vcpu, gfn_to_gaddr(
> >>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> >>> + offset,
> >>> + PAGE_SIZE - offset, cd_area, NULL);
> >>> +}
> >>> +
> >>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >>> {
> >>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> >>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> >>> return ret;
> >>> }
> >>>
> >>> + /* Same for the (physically registered) runstate and time info areas. */
> >>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> >>> + &d_vcpu->runstate_guest_area, cd_vcpu, d);
> >>> + if ( ret )
> >>> + return ret;
> >>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> >>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> >>> + if ( ret )
> >>> + return ret;
> >>> +
> >>> ret = copy_vpmu(d_vcpu, cd_vcpu);
> >>> if ( ret )
> >>> return ret;
> >>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> >>>
> >>> state:
> >>> if ( reset_state )
> >>> + {
> >>> rc = copy_settings(d, pd);
> >>> + /* TBD: What to do here with -ERESTART? */
> >>
> >> There is no situation where we get an -ERESTART here currently. Is
> >> map_guest_area expected to run into situations where it fails with
> >> that rc?
> >
> > Yes, there's a spin_trylock() call that will result in
> > map_guest_area() returning -ERESTART.
> >
> >> If yes we might need a lock in place so we can block until it
> >> can succeed.
> >
> > I'm not sure whether returning -ERESTART can actually happen in
> > map_guest_area() for the fork case: the child domain is still paused
> > at this point, so there can't be concurrent guest hypercalls that
> > would also cause the domain hypercall_deadlock_mutex to be acquired.
Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
this point. If we run into any cases where it trips we can reason it
out.
> hypercall_deadlock_mutex is also acquired by domctls. So, I believe,
> -ERESTART could be returned if the toolstack is also issuing domclt
> right at the same time as forking.
That's not a concern in this path, only toolstack can start the reset
so we can assume it can coordinate not to have another toolstack
messing with the fork at the same time.
Thanks,
Tamas
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-03 20:25 ` Tamas K Lengyel
@ 2023-10-04 8:20 ` Roger Pau Monné
2023-10-04 11:01 ` Julien Grall
0 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-04 8:20 UTC (permalink / raw)
To: Tamas K Lengyel
Cc: Julien Grall, xen-devel, henry.wang, Jan Beulich, Andrew Cooper,
George Dunlap, Wei Liu, Stefano Stabellini
On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
> >
> > Hi Roger,
> >
> > On 03/10/2023 15:29, Roger Pau Monné wrote:
> > > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
> >
> > Tamas, somehow your e-mails don't show up in my inbox (even if I am
> > CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
> > folder.
>
> Thanks, I've switched mailservers, hopefully that resolves the issue.
>
> >
> > >> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> > >>>
> > >>> From: Jan Beulich <jbeulich@suse.com>
> > >>>
> > >>> In preparation of the introduction of new vCPU operations allowing to
> > >>> register the respective areas (one of the two is x86-specific) by
> > >>> guest-physical address, add the necessary fork handling (with the
> > >>> backing function yet to be filled in).
> > >>>
> > >>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > >>> ---
> > >>> Changes since v4:
> > >>> - Rely on map_guest_area() to populate the child p2m if necessary.
> > >>> ---
> > >>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> > >>> xen/common/domain.c | 7 +++++++
> > >>> 2 files changed, 38 insertions(+)
> > >>>
> > >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > >>> index 5f8f1fb4d871..99cf001fd70f 100644
> > >>> --- a/xen/arch/x86/mm/mem_sharing.c
> > >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> > >>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > >>> hvm_set_nonreg_state(cd_vcpu, &nrs);
> > >>> }
> > >>>
> > >>> +static int copy_guest_area(struct guest_area *cd_area,
> > >>> + const struct guest_area *d_area,
> > >>> + struct vcpu *cd_vcpu,
> > >>> + const struct domain *d)
> > >>> +{
> > >>> + unsigned int offset;
> > >>> +
> > >>> + /* Check if no area to map, or already mapped. */
> > >>> + if ( !d_area->pg || cd_area->pg )
> > >>> + return 0;
> > >>> +
> > >>> + offset = PAGE_OFFSET(d_area->map);
> > >>> + return map_guest_area(cd_vcpu, gfn_to_gaddr(
> > >>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> > >>> + offset,
> > >>> + PAGE_SIZE - offset, cd_area, NULL);
> > >>> +}
> > >>> +
> > >>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > >>> {
> > >>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> > >>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> > >>> return ret;
> > >>> }
> > >>>
> > >>> + /* Same for the (physically registered) runstate and time info areas. */
> > >>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> > >>> + &d_vcpu->runstate_guest_area, cd_vcpu, d);
> > >>> + if ( ret )
> > >>> + return ret;
> > >>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> > >>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> > >>> + if ( ret )
> > >>> + return ret;
> > >>> +
> > >>> ret = copy_vpmu(d_vcpu, cd_vcpu);
> > >>> if ( ret )
> > >>> return ret;
> > >>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> > >>>
> > >>> state:
> > >>> if ( reset_state )
> > >>> + {
> > >>> rc = copy_settings(d, pd);
> > >>> + /* TBD: What to do here with -ERESTART? */
> > >>
> > >> There is no situation where we get an -ERESTART here currently. Is
> > >> map_guest_area expected to run into situations where it fails with
> > >> that rc?
> > >
> > > Yes, there's a spin_trylock() call that will result in
> > > map_guest_area() returning -ERESTART.
> > >
> > >> If yes we might need a lock in place so we can block until it
> > >> can succeed.
> > >
> > > I'm not sure whether returning -ERESTART can actually happen in
> > > map_guest_area() for the fork case: the child domain is still paused
> > > at this point, so there can't be concurrent guest hypercalls that
> > > would also cause the domain hypercall_deadlock_mutex to be acquired.
>
> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
> this point. If we run into any cases where it trips we can reason it
> out.
In order to avoid possibly returning -ERESTART (which should never be
seen by hypercall callers) we might want to convert it to -EBUSY and
let the caller pick the pieces.
Thanks, Roger.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-04 8:20 ` Roger Pau Monné
@ 2023-10-04 11:01 ` Julien Grall
2023-10-04 13:00 ` Roger Pau Monné
0 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2023-10-04 11:01 UTC (permalink / raw)
To: Roger Pau Monné, Tamas K Lengyel
Cc: xen-devel, henry.wang, Jan Beulich, Andrew Cooper, George Dunlap,
Wei Liu, Stefano Stabellini
Hi Roger,
On 04/10/2023 09:20, Roger Pau Monné wrote:
> On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
>> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
>>>
>>> Hi Roger,
>>>
>>> On 03/10/2023 15:29, Roger Pau Monné wrote:
>>>> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
>>>
>>> Tamas, somehow your e-mails don't show up in my inbox (even if I am
>>> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
>>> folder.
>>
>> Thanks, I've switched mailservers, hopefully that resolves the issue.
It did. Thanks!
>>
>>>
>>>>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>>>>>>
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>>
>>>>>> In preparation of the introduction of new vCPU operations allowing to
>>>>>> register the respective areas (one of the two is x86-specific) by
>>>>>> guest-physical address, add the necessary fork handling (with the
>>>>>> backing function yet to be filled in).
>>>>>>
>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>>> ---
>>>>>> Changes since v4:
>>>>>> - Rely on map_guest_area() to populate the child p2m if necessary.
>>>>>> ---
>>>>>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
>>>>>> xen/common/domain.c | 7 +++++++
>>>>>> 2 files changed, 38 insertions(+)
>>>>>>
>>>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>>>>> index 5f8f1fb4d871..99cf001fd70f 100644
>>>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>> hvm_set_nonreg_state(cd_vcpu, &nrs);
>>>>>> }
>>>>>>
>>>>>> +static int copy_guest_area(struct guest_area *cd_area,
>>>>>> + const struct guest_area *d_area,
>>>>>> + struct vcpu *cd_vcpu,
>>>>>> + const struct domain *d)
>>>>>> +{
>>>>>> + unsigned int offset;
>>>>>> +
>>>>>> + /* Check if no area to map, or already mapped. */
>>>>>> + if ( !d_area->pg || cd_area->pg )
>>>>>> + return 0;
>>>>>> +
>>>>>> + offset = PAGE_OFFSET(d_area->map);
>>>>>> + return map_guest_area(cd_vcpu, gfn_to_gaddr(
>>>>>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
>>>>>> + offset,
>>>>>> + PAGE_SIZE - offset, cd_area, NULL);
>>>>>> +}
>>>>>> +
>>>>>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>> {
>>>>>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
>>>>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> + /* Same for the (physically registered) runstate and time info areas. */
>>>>>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
>>>>>> + &d_vcpu->runstate_guest_area, cd_vcpu, d);
>>>>>> + if ( ret )
>>>>>> + return ret;
>>>>>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
>>>>>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d);
>>>>>> + if ( ret )
>>>>>> + return ret;
>>>>>> +
>>>>>> ret = copy_vpmu(d_vcpu, cd_vcpu);
>>>>>> if ( ret )
>>>>>> return ret;
>>>>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
>>>>>>
>>>>>> state:
>>>>>> if ( reset_state )
>>>>>> + {
>>>>>> rc = copy_settings(d, pd);
>>>>>> + /* TBD: What to do here with -ERESTART? */
>>>>>
>>>>> There is no situation where we get an -ERESTART here currently. Is
>>>>> map_guest_area expected to run into situations where it fails with
>>>>> that rc?
>>>>
>>>> Yes, there's a spin_trylock() call that will result in
>>>> map_guest_area() returning -ERESTART.
>>>>
>>>>> If yes we might need a lock in place so we can block until it
>>>>> can succeed.
>>>>
>>>> I'm not sure whether returning -ERESTART can actually happen in
>>>> map_guest_area() for the fork case: the child domain is still paused
>>>> at this point, so there can't be concurrent guest hypercalls that
>>>> would also cause the domain hypercall_deadlock_mutex to be acquired.
>>
>> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
>> this point. If we run into any cases where it trips we can reason it
>> out.
>
> In order to avoid possibly returning -ERESTART (which should never be
> seen by hypercall callers) we might want to convert it to -EBUSY and
> let the caller pick the pieces.
I realize this is a matter of taste. I think EAGAIN is a better
conversion for ERESTART because we effectively want to caller to try again.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-04 11:01 ` Julien Grall
@ 2023-10-04 13:00 ` Roger Pau Monné
2023-10-04 13:06 ` Julien Grall
0 siblings, 1 reply; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-04 13:00 UTC (permalink / raw)
To: Julien Grall
Cc: Tamas K Lengyel, xen-devel, henry.wang, Jan Beulich,
Andrew Cooper, George Dunlap, Wei Liu, Stefano Stabellini
On Wed, Oct 04, 2023 at 12:01:21PM +0100, Julien Grall wrote:
> Hi Roger,
>
> On 04/10/2023 09:20, Roger Pau Monné wrote:
> > On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
> > > On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
> > > >
> > > > Hi Roger,
> > > >
> > > > On 03/10/2023 15:29, Roger Pau Monné wrote:
> > > > > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
> > > >
> > > > Tamas, somehow your e-mails don't show up in my inbox (even if I am
> > > > CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
> > > > folder.
> > >
> > > Thanks, I've switched mailservers, hopefully that resolves the issue.
>
> It did. Thanks!
>
> > >
> > > >
> > > > > > On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> > > > > > >
> > > > > > > From: Jan Beulich <jbeulich@suse.com>
> > > > > > >
> > > > > > > In preparation of the introduction of new vCPU operations allowing to
> > > > > > > register the respective areas (one of the two is x86-specific) by
> > > > > > > guest-physical address, add the necessary fork handling (with the
> > > > > > > backing function yet to be filled in).
> > > > > > >
> > > > > > > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > > > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > > > > > ---
> > > > > > > Changes since v4:
> > > > > > > - Rely on map_guest_area() to populate the child p2m if necessary.
> > > > > > > ---
> > > > > > > xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> > > > > > > xen/common/domain.c | 7 +++++++
> > > > > > > 2 files changed, 38 insertions(+)
> > > > > > >
> > > > > > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > > > > > > index 5f8f1fb4d871..99cf001fd70f 100644
> > > > > > > --- a/xen/arch/x86/mm/mem_sharing.c
> > > > > > > +++ b/xen/arch/x86/mm/mem_sharing.c
> > > > > > > @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > > > > > > hvm_set_nonreg_state(cd_vcpu, &nrs);
> > > > > > > }
> > > > > > >
> > > > > > > +static int copy_guest_area(struct guest_area *cd_area,
> > > > > > > + const struct guest_area *d_area,
> > > > > > > + struct vcpu *cd_vcpu,
> > > > > > > + const struct domain *d)
> > > > > > > +{
> > > > > > > + unsigned int offset;
> > > > > > > +
> > > > > > > + /* Check if no area to map, or already mapped. */
> > > > > > > + if ( !d_area->pg || cd_area->pg )
> > > > > > > + return 0;
> > > > > > > +
> > > > > > > + offset = PAGE_OFFSET(d_area->map);
> > > > > > > + return map_guest_area(cd_vcpu, gfn_to_gaddr(
> > > > > > > + mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> > > > > > > + offset,
> > > > > > > + PAGE_SIZE - offset, cd_area, NULL);
> > > > > > > +}
> > > > > > > +
> > > > > > > static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > > > > > > {
> > > > > > > struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> > > > > > > @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> > > > > > > return ret;
> > > > > > > }
> > > > > > >
> > > > > > > + /* Same for the (physically registered) runstate and time info areas. */
> > > > > > > + ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> > > > > > > + &d_vcpu->runstate_guest_area, cd_vcpu, d);
> > > > > > > + if ( ret )
> > > > > > > + return ret;
> > > > > > > + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> > > > > > > + &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> > > > > > > + if ( ret )
> > > > > > > + return ret;
> > > > > > > +
> > > > > > > ret = copy_vpmu(d_vcpu, cd_vcpu);
> > > > > > > if ( ret )
> > > > > > > return ret;
> > > > > > > @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> > > > > > >
> > > > > > > state:
> > > > > > > if ( reset_state )
> > > > > > > + {
> > > > > > > rc = copy_settings(d, pd);
> > > > > > > + /* TBD: What to do here with -ERESTART? */
> > > > > >
> > > > > > There is no situation where we get an -ERESTART here currently. Is
> > > > > > map_guest_area expected to run into situations where it fails with
> > > > > > that rc?
> > > > >
> > > > > Yes, there's a spin_trylock() call that will result in
> > > > > map_guest_area() returning -ERESTART.
> > > > >
> > > > > > If yes we might need a lock in place so we can block until it
> > > > > > can succeed.
> > > > >
> > > > > I'm not sure whether returning -ERESTART can actually happen in
> > > > > map_guest_area() for the fork case: the child domain is still paused
> > > > > at this point, so there can't be concurrent guest hypercalls that
> > > > > would also cause the domain hypercall_deadlock_mutex to be acquired.
> > >
> > > Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
> > > this point. If we run into any cases where it trips we can reason it
> > > out.
> >
> > In order to avoid possibly returning -ERESTART (which should never be
> > seen by hypercall callers) we might want to convert it to -EBUSY and
> > let the caller pick the pieces.
>
> I realize this is a matter of taste. I think EAGAIN is a better conversion
> for ERESTART because we effectively want to caller to try again.
That's fine with me, but could we leave adding such translation to a
further patch?
I would rather modify Jans code as less as possible.
Thanks, Roger.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-04 13:00 ` Roger Pau Monné
@ 2023-10-04 13:06 ` Julien Grall
2023-10-04 13:53 ` [PATCH v6 " Roger Pau Monne
0 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2023-10-04 13:06 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Tamas K Lengyel, xen-devel, henry.wang, Jan Beulich,
Andrew Cooper, George Dunlap, Wei Liu, Stefano Stabellini
On 04/10/2023 14:00, Roger Pau Monné wrote:
> On Wed, Oct 04, 2023 at 12:01:21PM +0100, Julien Grall wrote:
>> Hi Roger,
>>
>> On 04/10/2023 09:20, Roger Pau Monné wrote:
>>> On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
>>>> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
>>>>>
>>>>> Hi Roger,
>>>>>
>>>>> On 03/10/2023 15:29, Roger Pau Monné wrote:
>>>>>> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
>>>>>
>>>>> Tamas, somehow your e-mails don't show up in my inbox (even if I am
>>>>> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
>>>>> folder.
>>>>
>>>> Thanks, I've switched mailservers, hopefully that resolves the issue.
>>
>> It did. Thanks!
>>
>>>>
>>>>>
>>>>>>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>>>>>>>>
>>>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>>>>
>>>>>>>> In preparation of the introduction of new vCPU operations allowing to
>>>>>>>> register the respective areas (one of the two is x86-specific) by
>>>>>>>> guest-physical address, add the necessary fork handling (with the
>>>>>>>> backing function yet to be filled in).
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>>>>> ---
>>>>>>>> Changes since v4:
>>>>>>>> - Rely on map_guest_area() to populate the child p2m if necessary.
>>>>>>>> ---
>>>>>>>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
>>>>>>>> xen/common/domain.c | 7 +++++++
>>>>>>>> 2 files changed, 38 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>>>>>>> index 5f8f1fb4d871..99cf001fd70f 100644
>>>>>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>>>>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>>>> hvm_set_nonreg_state(cd_vcpu, &nrs);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static int copy_guest_area(struct guest_area *cd_area,
>>>>>>>> + const struct guest_area *d_area,
>>>>>>>> + struct vcpu *cd_vcpu,
>>>>>>>> + const struct domain *d)
>>>>>>>> +{
>>>>>>>> + unsigned int offset;
>>>>>>>> +
>>>>>>>> + /* Check if no area to map, or already mapped. */
>>>>>>>> + if ( !d_area->pg || cd_area->pg )
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + offset = PAGE_OFFSET(d_area->map);
>>>>>>>> + return map_guest_area(cd_vcpu, gfn_to_gaddr(
>>>>>>>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
>>>>>>>> + offset,
>>>>>>>> + PAGE_SIZE - offset, cd_area, NULL);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>>>> {
>>>>>>>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
>>>>>>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>>
>>>>>>>> + /* Same for the (physically registered) runstate and time info areas. */
>>>>>>>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
>>>>>>>> + &d_vcpu->runstate_guest_area, cd_vcpu, d);
>>>>>>>> + if ( ret )
>>>>>>>> + return ret;
>>>>>>>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
>>>>>>>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d);
>>>>>>>> + if ( ret )
>>>>>>>> + return ret;
>>>>>>>> +
>>>>>>>> ret = copy_vpmu(d_vcpu, cd_vcpu);
>>>>>>>> if ( ret )
>>>>>>>> return ret;
>>>>>>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
>>>>>>>>
>>>>>>>> state:
>>>>>>>> if ( reset_state )
>>>>>>>> + {
>>>>>>>> rc = copy_settings(d, pd);
>>>>>>>> + /* TBD: What to do here with -ERESTART? */
>>>>>>>
>>>>>>> There is no situation where we get an -ERESTART here currently. Is
>>>>>>> map_guest_area expected to run into situations where it fails with
>>>>>>> that rc?
>>>>>>
>>>>>> Yes, there's a spin_trylock() call that will result in
>>>>>> map_guest_area() returning -ERESTART.
>>>>>>
>>>>>>> If yes we might need a lock in place so we can block until it
>>>>>>> can succeed.
>>>>>>
>>>>>> I'm not sure whether returning -ERESTART can actually happen in
>>>>>> map_guest_area() for the fork case: the child domain is still paused
>>>>>> at this point, so there can't be concurrent guest hypercalls that
>>>>>> would also cause the domain hypercall_deadlock_mutex to be acquired.
>>>>
>>>> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
>>>> this point. If we run into any cases where it trips we can reason it
>>>> out.
>>>
>>> In order to avoid possibly returning -ERESTART (which should never be
>>> seen by hypercall callers) we might want to convert it to -EBUSY and
>>> let the caller pick the pieces.
>>
>> I realize this is a matter of taste. I think EAGAIN is a better conversion
>> for ERESTART because we effectively want to caller to try again.
>
> That's fine with me, but could we leave adding such translation to a
> further patch?
Wouldn't this mean that -ERESTART could be returned to the caller? If
yes, then I think this should be handled here. Otherwise, we will be
exposing a value that is not supposed to be exposed.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v6 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-04 13:06 ` Julien Grall
@ 2023-10-04 13:53 ` Roger Pau Monne
2023-10-04 21:36 ` Tamas K Lengyel
2023-10-16 9:55 ` Jan Beulich
0 siblings, 2 replies; 42+ messages in thread
From: Roger Pau Monne @ 2023-10-04 13:53 UTC (permalink / raw)
To: xen-devel
Cc: Jan Beulich, Tamas K Lengyel, Andrew Cooper, George Dunlap,
Roger Pau Monné, Wei Liu, Julien Grall, Stefano Stabellini
From: Jan Beulich <jbeulich@suse.com>
In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, add the necessary fork handling (with the
backing function yet to be filled in).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v5:
- Convert -ERESTART to -EAGAIN.
Changes since v4:
- Rely on map_guest_area() to populate the child p2m if necessary.
---
xen/arch/x86/mm/mem_sharing.c | 36 +++++++++++++++++++++++++++++++++++
xen/common/domain.c | 7 +++++++
2 files changed, 43 insertions(+)
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5f8f1fb4d871..445947b6a918 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
hvm_set_nonreg_state(cd_vcpu, &nrs);
}
+static int copy_guest_area(struct guest_area *cd_area,
+ const struct guest_area *d_area,
+ struct vcpu *cd_vcpu,
+ const struct domain *d)
+{
+ unsigned int offset;
+
+ /* Check if no area to map, or already mapped. */
+ if ( !d_area->pg || cd_area->pg )
+ return 0;
+
+ offset = PAGE_OFFSET(d_area->map);
+ return map_guest_area(cd_vcpu, gfn_to_gaddr(
+ mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
+ offset,
+ PAGE_SIZE - offset, cd_area, NULL);
+}
+
static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
{
struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
@@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
return ret;
}
+ /* Same for the (physically registered) runstate and time info areas. */
+ ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
+ &d_vcpu->runstate_guest_area, cd_vcpu, d);
+ if ( ret )
+ return ret;
+ ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
+ &d_vcpu->arch.time_guest_area, cd_vcpu, d);
+ if ( ret )
+ return ret;
+
ret = copy_vpmu(d_vcpu, cd_vcpu);
if ( ret )
return ret;
@@ -1950,7 +1978,15 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
state:
if ( reset_state )
+ {
rc = copy_settings(d, pd);
+ if ( rc == -ERESTART )
+ /*
+ * Translate to -EAGAIN, see TODO comment at top of function about
+ * hypercall continuations.
+ */
+ rc = -EAGAIN;
+ }
domain_unpause(d);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d4958ec5e149..47fc90271901 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1601,6 +1601,13 @@ void unmap_vcpu_info(struct vcpu *v)
put_page_and_type(mfn_to_page(mfn));
}
+int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
+ struct guest_area *area,
+ void (*populate)(void *dst, struct vcpu *v))
+{
+ return -EOPNOTSUPP;
+}
+
/*
* This is only intended to be used for domain cleanup (or more generally only
* with at least the respective vCPU, if it's not the current one, reliably
--
2.42.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v5 10/10] common: convert vCPU info area registration
2023-10-02 15:11 ` [PATCH v5 10/10] common: convert vCPU info area registration Roger Pau Monne
@ 2023-10-04 17:15 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2023-10-04 17:15 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel, henry.wang
Cc: Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
Stefano Stabellini, Tamas K Lengyel
Hi Roger,
On 02/10/2023 16:11, Roger Pau Monne wrote:
> From: Jan Beulich <jbeulich@suse.com>
>
> Switch to using map_guest_area(). Noteworthy differences from
> map_vcpu_info():
> - remote vCPU-s are paused rather than checked for being down (which in
> principle can change right after the check),
> - the domain lock is taken for a much smaller region,
> - the error code for an attempt to re-register the area is now -EBUSY,
> - we could in principle permit de-registration when no area was
> previously registered (which would permit "probing", if necessary for
> anything).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v6 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-04 13:53 ` [PATCH v6 " Roger Pau Monne
@ 2023-10-04 21:36 ` Tamas K Lengyel
2023-10-16 9:55 ` Jan Beulich
1 sibling, 0 replies; 42+ messages in thread
From: Tamas K Lengyel @ 2023-10-04 21:36 UTC (permalink / raw)
To: Roger Pau Monne
Cc: xen-devel, Jan Beulich, Andrew Cooper, George Dunlap, Wei Liu,
Julien Grall, Stefano Stabellini
On Wed, Oct 4, 2023 at 9:54 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>
> From: Jan Beulich <jbeulich@suse.com>
>
> In preparation of the introduction of new vCPU operations allowing to
> register the respective areas (one of the two is x86-specific) by
> guest-physical address, add the necessary fork handling (with the
> backing function yet to be filled in).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (9 preceding siblings ...)
2023-10-02 15:11 ` [PATCH v5 10/10] common: convert vCPU info area registration Roger Pau Monne
@ 2023-10-05 1:27 ` Henry Wang
2023-10-05 13:12 ` Julien Grall
2023-10-05 18:58 ` [CRITICAL for 4.18] " Andrew Cooper
2023-10-16 10:07 ` Jan Beulich
12 siblings, 1 reply; 42+ messages in thread
From: Henry Wang @ 2023-10-05 1:27 UTC (permalink / raw)
To: Roger Pau Monne
Cc: Xen-devel, Tamas K Lengyel, Jan Beulich, Andrew Cooper,
George Dunlap, Wei Liu, Julien Grall, Stefano Stabellini
Hi Roger,
> On Oct 2, 2023, at 23:11, Roger Pau Monne <roger.pau@citrix.com> wrote:
>
> Since it was indicated that introducing specific new vCPU ops may be
> beneficial independent of the introduction of a fully physical-
> address-based ABI flavor, here we go. There continue to be a few open
> questions throughout the series, resolving of which was one of the main
> goals of the earlier postings.
>
> v5 adds one vm-fork specific pre-patch that does simply the introduced
> code later on. It does also fix a vm-fork bug.
>
> Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.
>
> Whole series will need a Release-Ack.
We agreed in [1] that this series is a good candidate for 4.18, so for the whole
series,
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
[1] https://lore.kernel.org/xen-devel/0be1e32f-5600-7b3a-8d72-84297a1ebee0@suse.com/
Kind regards,
Henry
>
> Thanks, Roger.
>
> Jan Beulich (9):
> x86/shim: zap runstate and time area handles during shutdown
> domain: GADDR based shared guest area registration alternative -
> teardown
> domain: update GADDR based runstate guest area
> x86: update GADDR based secondary time area
> x86/mem-sharing: copy GADDR based shared guest areas
> domain: map/unmap GADDR based shared guest areas
> domain: introduce GADDR based runstate area registration alternative
> x86: introduce GADDR based secondary time area registration
> alternative
> common: convert vCPU info area registration
>
> Roger Pau Monne (1):
> mem_sharing/fork: do not attempt to populate vcpu_info page
>
> xen/arch/x86/domain.c | 33 +++
> xen/arch/x86/include/asm/domain.h | 3 +
> xen/arch/x86/include/asm/shared.h | 19 +-
> xen/arch/x86/mm/mem_sharing.c | 73 +++----
> xen/arch/x86/pv/shim.c | 10 +-
> xen/arch/x86/time.c | 34 +++-
> xen/arch/x86/x86_64/asm-offsets.c | 2 +-
> xen/arch/x86/x86_64/domain.c | 36 ++++
> xen/arch/x86/x86_64/traps.c | 2 +-
> xen/common/compat/domain.c | 2 +-
> xen/common/domain.c | 324 ++++++++++++++++++++++--------
> xen/include/public/vcpu.h | 19 ++
> xen/include/xen/domain.h | 12 +-
> xen/include/xen/sched.h | 8 +-
> xen/include/xen/shared.h | 3 +-
> 15 files changed, 440 insertions(+), 140 deletions(-)
>
> --
> 2.42.0
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page
2023-10-02 15:11 ` [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page Roger Pau Monne
[not found] ` <CABfawhm2XMmfyx7vZvGdLZcot3=Mrrx3T5nS3vUR+Ur9j5mkWg@mail.gmail.com>
@ 2023-10-05 12:10 ` Julien Grall
2023-10-05 12:42 ` George Dunlap
[not found] ` <CABfawhmyP_y38002v=v1G2p66ZamhGKrj=0Jm1H_-c_j9VQG8Q@mail.gmail.com>
2023-10-05 13:15 ` Tamas K Lengyel
3 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2023-10-05 12:10 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel, henry.wang, Tamas K Lengyel
Cc: Jan Beulich, Andrew Cooper, George Dunlap, Wei Liu
Hi,
While preparing to commit this series, I have noticed that there are no
Acked-by/Reviewed-by from Tamas (or at least not present in my inbox).
@Tamas, can you provide one?
Cheers,
On 02/10/2023 16:11, Roger Pau Monne wrote:
> Instead let map_vcpu_info() and it's call to get_page_from_gfn()
> populate the page in the child as needed. Also remove the bogus
> copy_domain_page(): should be placed before the call to map_vcpu_info(),
> as the later can update the contents of the vcpu_info page.
>
> Note that this eliminates a bug in copy_vcpu_settings(): The function did
> allocate a new page regardless of the GFN already having a mapping, thus in
> particular breaking the case of two vCPU-s having their info areas on the same
> page.
>
> Fixes: 41548c5472a3 ('mem_sharing: VM forking')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Only build tested.
> ---
> Changes since v4:
> - New in this version.
> ---
> xen/arch/x86/mm/mem_sharing.c | 36 ++++++-----------------------------
> 1 file changed, 6 insertions(+), 30 deletions(-)
>
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index ae5366d4476e..5f8f1fb4d871 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1689,48 +1689,24 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> unsigned int i;
> struct p2m_domain *p2m = p2m_get_hostp2m(cd);
> int ret = -EINVAL;
> + mfn_t vcpu_info_mfn;
>
> for ( i = 0; i < cd->max_vcpus; i++ )
> {
> struct vcpu *d_vcpu = d->vcpu[i];
> struct vcpu *cd_vcpu = cd->vcpu[i];
> - mfn_t vcpu_info_mfn;
>
> if ( !d_vcpu || !cd_vcpu )
> continue;
>
> - /* Copy & map in the vcpu_info page if the guest uses one */
> + /* Map in the vcpu_info page if the guest uses one */
> vcpu_info_mfn = d_vcpu->vcpu_info_mfn;
> if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
> {
> - mfn_t new_vcpu_info_mfn = cd_vcpu->vcpu_info_mfn;
> -
> - /* Allocate & map the page for it if it hasn't been already */
> - if ( mfn_eq(new_vcpu_info_mfn, INVALID_MFN) )
> - {
> - gfn_t gfn = mfn_to_gfn(d, vcpu_info_mfn);
> - unsigned long gfn_l = gfn_x(gfn);
> - struct page_info *page;
> -
> - if ( !(page = alloc_domheap_page(cd, 0)) )
> - return -ENOMEM;
> -
> - new_vcpu_info_mfn = page_to_mfn(page);
> - set_gpfn_from_mfn(mfn_x(new_vcpu_info_mfn), gfn_l);
> -
> - ret = p2m->set_entry(p2m, gfn, new_vcpu_info_mfn,
> - PAGE_ORDER_4K, p2m_ram_rw,
> - p2m->default_access, -1);
> - if ( ret )
> - return ret;
> -
> - ret = map_vcpu_info(cd_vcpu, gfn_l,
> - PAGE_OFFSET(d_vcpu->vcpu_info));
> - if ( ret )
> - return ret;
> - }
> -
> - copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
> + ret = map_vcpu_info(cd_vcpu, mfn_to_gfn(d, vcpu_info_mfn),
> + PAGE_OFFSET(d_vcpu->vcpu_info));
> + if ( ret )
> + return ret;
> }
>
> ret = copy_vpmu(d_vcpu, cd_vcpu);
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page
[not found] ` <CABfawhmyP_y38002v=v1G2p66ZamhGKrj=0Jm1H_-c_j9VQG8Q@mail.gmail.com>
@ 2023-10-05 12:42 ` Roger Pau Monné
0 siblings, 0 replies; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-05 12:42 UTC (permalink / raw)
To: Tamas K Lengyel
Cc: xen-devel, henry.wang, Jan Beulich, Andrew Cooper, George Dunlap,
Wei Liu, Julien Grall
On Tue, Oct 03, 2023 at 09:46:13AM -0400, Tamas K Lengyel wrote:
> On Mon, Oct 2, 2023 at 11:12 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >
> > Instead let map_vcpu_info() and it's call to get_page_from_gfn()
> > populate the page in the child as needed. Also remove the bogus
> > copy_domain_page(): should be placed before the call to map_vcpu_info(),
> > as the later can update the contents of the vcpu_info page.
> >
> > Note that this eliminates a bug in copy_vcpu_settings(): The function did
> > allocate a new page regardless of the GFN already having a mapping, thus in
> > particular breaking the case of two vCPU-s having their info areas on the same
> > page.
> >
> > Fixes: 41548c5472a3 ('mem_sharing: VM forking')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Thanks, Roger.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page
2023-10-05 12:10 ` Julien Grall
@ 2023-10-05 12:42 ` George Dunlap
2023-10-05 12:47 ` Julien Grall
0 siblings, 1 reply; 42+ messages in thread
From: George Dunlap @ 2023-10-05 12:42 UTC (permalink / raw)
To: Julien Grall
Cc: Roger Pau Monne, xen-devel, henry.wang, Tamas K Lengyel,
Jan Beulich, Andrew Cooper, Wei Liu
On Thu, Oct 5, 2023 at 1:11 PM Julien Grall <julien@xen.org> wrote:
>
> Hi,
>
> While preparing to commit this series, I have noticed that there are no
> Acked-by/Reviewed-by from Tamas (or at least not present in my inbox).
>
> @Tamas, can you provide one?
I see an "Acked-by" from Tamas two days ago.
-George
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page
2023-10-05 12:42 ` George Dunlap
@ 2023-10-05 12:47 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2023-10-05 12:47 UTC (permalink / raw)
To: George Dunlap
Cc: Roger Pau Monne, xen-devel, henry.wang, Tamas K Lengyel,
Jan Beulich, Andrew Cooper, Wei Liu
Hi George,
On 05/10/2023 13:42, George Dunlap wrote:
> On Thu, Oct 5, 2023 at 1:11 PM Julien Grall <julien@xen.org> wrote:
>>
>> Hi,
>>
>> While preparing to commit this series, I have noticed that there are no
>> Acked-by/Reviewed-by from Tamas (or at least not present in my inbox).
>>
>> @Tamas, can you provide one?
>
> I see an "Acked-by" from Tamas two days ago.
Sadly, it is also not on lore.kernel.org or our xenproject mail archives.
In [1], Tamas pointed out he had some e-mail trouble. So anything sent
by Tamas before last Tuesday is not present in my inbox or any mail
archives.
Not clear why the Citrix folks received it.
Cheers,
[1]
https://lore.kernel.org/all/CABfawhn0vH6rS8-SJQJVZtto2HA61By1bCG3w9bJMJR3x+rXsg@mail.gmail.com/
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-05 1:27 ` [PATCH v5 00/10] runstate/time area registration by (guest) physical address Henry Wang
@ 2023-10-05 13:12 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2023-10-05 13:12 UTC (permalink / raw)
To: Henry Wang, Roger Pau Monne
Cc: Xen-devel, Tamas K Lengyel, Jan Beulich, Andrew Cooper,
George Dunlap, Wei Liu, Stefano Stabellini
Hi,
On 05/10/2023 02:27, Henry Wang wrote:
> Hi Roger,
>
>> On Oct 2, 2023, at 23:11, Roger Pau Monne <roger.pau@citrix.com> wrote:
>>
>> Since it was indicated that introducing specific new vCPU ops may be
>> beneficial independent of the introduction of a fully physical-
>> address-based ABI flavor, here we go. There continue to be a few open
>> questions throughout the series, resolving of which was one of the main
>> goals of the earlier postings.
>>
>> v5 adds one vm-fork specific pre-patch that does simply the introduced
>> code later on. It does also fix a vm-fork bug.
>>
>> Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.
>>
>> Whole series will need a Release-Ack.
>
> We agreed in [1] that this series is a good candidate for 4.18, so for the whole
> series,
>
> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
I have now committed the series.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page
2023-10-02 15:11 ` [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page Roger Pau Monne
` (2 preceding siblings ...)
[not found] ` <CABfawhmyP_y38002v=v1G2p66ZamhGKrj=0Jm1H_-c_j9VQG8Q@mail.gmail.com>
@ 2023-10-05 13:15 ` Tamas K Lengyel
3 siblings, 0 replies; 42+ messages in thread
From: Tamas K Lengyel @ 2023-10-05 13:15 UTC (permalink / raw)
To: Roger Pau Monne
Cc: xen-devel, henry.wang, Jan Beulich, Andrew Cooper, George Dunlap,
Wei Liu
On Mon, Oct 2, 2023 at 11:12 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>
> Instead let map_vcpu_info() and it's call to get_page_from_gfn()
> populate the page in the child as needed. Also remove the bogus
> copy_domain_page(): should be placed before the call to map_vcpu_info(),
> as the later can update the contents of the vcpu_info page.
>
> Note that this eliminates a bug in copy_vcpu_settings(): The function did
> allocate a new page regardless of the GFN already having a mapping, thus in
> particular breaking the case of two vCPU-s having their info areas on the same
> page.
>
> Fixes: 41548c5472a3 ('mem_sharing: VM forking')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Re-sending due to mailserver issues:
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
^ permalink raw reply [flat|nested] 42+ messages in thread
* [CRITICAL for 4.18] Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (10 preceding siblings ...)
2023-10-05 1:27 ` [PATCH v5 00/10] runstate/time area registration by (guest) physical address Henry Wang
@ 2023-10-05 18:58 ` Andrew Cooper
2023-10-05 22:40 ` Julien Grall
` (2 more replies)
2023-10-16 10:07 ` Jan Beulich
12 siblings, 3 replies; 42+ messages in thread
From: Andrew Cooper @ 2023-10-05 18:58 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel, henry.wang
Cc: Tamas K Lengyel, Jan Beulich, George Dunlap, Wei Liu,
Julien Grall, Stefano Stabellini
I see this series has been committed. But it's broken in a really
fundamental way.
This is a new extension with persistent side effects to an existing part
of the guest ABI.
Yet there doesn't appear to be any enumeration that the interface is
available to begin with. Requiring the guest to probe subops, and
having no way to disable it on a per-domain basis is unacceptable, and
has exploded on us more times than I care to count in security fixes
alone, and that doesn't even cover the issues Amazon have reported over
the years.
Henry: Blocker for 4.18. The absolutely bare minimum necessary to
avoid reversion is some kind of positive enumeration that the two new
hypercalls are available.
Otherwise I will be #if 0'ing out the new hypercalls before this ABI
mistake gets set in stone.
If this were x86-only it would need to be a CPUID flag, but it will need
to be something arch-agnostic in this case. The series should not have
come without a proper per-domain control and toolstack integration, but
everything else can be retrofitted in an emergency.
And on a related note, where is the documentation describing this new
feature? Some tests perhaps, or any single implementation of the guest
side interface?
This is engineering principles so basic that they do go without saying.
~Andrew
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [CRITICAL for 4.18] Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-05 18:58 ` [CRITICAL for 4.18] " Andrew Cooper
@ 2023-10-05 22:40 ` Julien Grall
2023-10-05 22:43 ` Julien Grall
2023-10-06 8:00 ` Roger Pau Monné
2023-10-16 10:04 ` Jan Beulich
2 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2023-10-05 22:40 UTC (permalink / raw)
To: Andrew Cooper, Roger Pau Monne, xen-devel, henry.wang
Cc: Tamas K Lengyel, Jan Beulich, George Dunlap, Wei Liu,
Stefano Stabellini
Hi Andrew,
On 05/10/2023 19:58, Andrew Cooper wrote:
> I see this series has been committed. But it's broken in a really
> fundamental way.
Thanks for pointing out. But I'd like to understand how I come to only
hear about those concerns on the series after committing. Did I miss any
thread? Even if this series has been pending for over 6 months, should
have I waited longer before committing?
Furthermore, Jan pointed out that this series was discussed recently
during the x86 meeting. Did you raise any concern during the call and
they were not carried on the ML?
> This is a new extension with persistent side effects to an existing part
> of the guest ABI.
>
> Yet there doesn't appear to be any enumeration that the interface is
> available to begin with. Requiring the guest to probe subops, and
> having no way to disable it on a per-domain basis is unacceptable, and
> has exploded on us more times than I care to count in security fixes
> alone, and that doesn't even cover the issues Amazon have reported over
> the years.
Indeed. But, AFAIR, all those patches got stuck because of diverging
opinions between you and you. Can we finally come to an agreement on how
to disable/expose a new hypercall/feature so we can move on?
>
> Henry: Blocker for 4.18. The absolutely bare minimum necessary to
> avoid reversion is some kind of positive enumeration that the two new
> hypercalls are available.
So to clarify, you would like both an interface for the guest and the
toolstack for 4.18. Is this correct?
>
> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
> mistake gets set in stone.
>
>
> If this were x86-only it would need to be a CPUID flag, but it will need
> to be something arch-agnostic in this case.
I think we can add two new flags in XENVER_get_features to indicate to
the guest that the feature is supported. What do you think?
> The series should not have
> come without a proper per-domain control and toolstack integration,
I think this requirement should be written down in a document we can use
for future reference and not expect people to remember what may have
been said on the ML for previous hypercall addition.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [CRITICAL for 4.18] Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-05 22:40 ` Julien Grall
@ 2023-10-05 22:43 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2023-10-05 22:43 UTC (permalink / raw)
To: Andrew Cooper, Roger Pau Monne, xen-devel, henry.wang
Cc: Tamas K Lengyel, Jan Beulich, George Dunlap, Wei Liu,
Stefano Stabellini
On 05/10/2023 23:40, Julien Grall wrote:
> Hi Andrew,
>
> On 05/10/2023 19:58, Andrew Cooper wrote:
>> I see this series has been committed. But it's broken in a really
>> fundamental way.
>
> Thanks for pointing out. But I'd like to understand how I come to only
> hear about those concerns on the series after committing. Did I miss any
> thread? Even if this series has been pending for over 6 months, should
> have I waited longer before committing?
>
> Furthermore, Jan pointed out that this series was discussed recently
> during the x86 meeting. Did you raise any concern during the call and
> they were not carried on the ML?
>
>> This is a new extension with persistent side effects to an existing part
>> of the guest ABI.
>>
>> Yet there doesn't appear to be any enumeration that the interface is
>> available to begin with. Requiring the guest to probe subops, and
>> having no way to disable it on a per-domain basis is unacceptable, and
>> has exploded on us more times than I care to count in security fixes
>> alone, and that doesn't even cover the issues Amazon have reported over
>> the years.
>
> Indeed. But, AFAIR, all those patches got stuck because of diverging
> opinions between you and you. Can we finally come to an agreement on how
The second 'you' was meant to be Jan. I didn't intend to say you were
disagreing with yourself.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [CRITICAL for 4.18] Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-05 18:58 ` [CRITICAL for 4.18] " Andrew Cooper
2023-10-05 22:40 ` Julien Grall
@ 2023-10-06 8:00 ` Roger Pau Monné
2023-10-06 8:22 ` Roger Pau Monné
2023-10-06 10:21 ` Andrew Cooper
2023-10-16 10:04 ` Jan Beulich
2 siblings, 2 replies; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-06 8:00 UTC (permalink / raw)
To: Andrew Cooper
Cc: xen-devel, henry.wang, Tamas K Lengyel, Jan Beulich,
George Dunlap, Wei Liu, Julien Grall, Stefano Stabellini
On Thu, Oct 05, 2023 at 07:58:50PM +0100, Andrew Cooper wrote:
> I see this series has been committed. But it's broken in a really
> fundamental way.
>
>
> This is a new extension with persistent side effects to an existing part
> of the guest ABI.
The only change in the ABI is the different return code for multiple
attempts to map the vcpu_info page, it used to be -EINVAL and it's
-EBUSY now, which seems more descriptive.
The added hypercalls are an extension of the ABI, not not a
modification of an existing part. Or maybe I'm not understanding the
complaint.
> Yet there doesn't appear to be any enumeration that the interface is
> available to begin with. Requiring the guest to probe subops, and
> having no way to disable it on a per-domain basis is unacceptable,
We have never mandated such disables to be part of the series adding
the new hypercalls, those have always been retro fitted in case of
need. Not saying we shouldn't do it, but it's not something we have
asked submitters to do.
> and
> has exploded on us more times than I care to count in security fixes
> alone, and that doesn't even cover the issues Amazon have reported over
> the years.
That's fine, I can add the enumeration. A CHANGELOG entry should also
be added.
>
> Henry: Blocker for 4.18. The absolutely bare minimum necessary to
> avoid reversion is some kind of positive enumeration that the two new
> hypercalls are available.
>
> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
> mistake gets set in stone.
>
>
> If this were x86-only it would need to be a CPUID flag, but it will need
> to be something arch-agnostic in this case. The series should not have
> come without a proper per-domain control and toolstack integration, but
> everything else can be retrofitted in an emergency.
>
> And on a related note, where is the documentation describing this new
> feature? Some tests perhaps, or any single implementation of the guest
> side interface?
Not that I know, I was expecting Jan to post that once he gets back
from PTO.
I already noted somewhere that I wasn't able to test myself because I
couldn't find any Linux side patches to test the feature with, and I
didn't have time to write ones myself (was expecting Jan to have the
Linux side done already for testing reasons).
> This is engineering principles so basic that they do go without saying.
>
> ~Andrew
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [CRITICAL for 4.18] Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-06 8:00 ` Roger Pau Monné
@ 2023-10-06 8:22 ` Roger Pau Monné
2023-10-06 10:21 ` Andrew Cooper
1 sibling, 0 replies; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-06 8:22 UTC (permalink / raw)
To: Andrew Cooper
Cc: xen-devel, henry.wang, Tamas K Lengyel, Jan Beulich,
George Dunlap, Wei Liu, Julien Grall, Stefano Stabellini
On Fri, Oct 06, 2023 at 10:00:35AM +0200, Roger Pau Monné wrote:
> On Thu, Oct 05, 2023 at 07:58:50PM +0100, Andrew Cooper wrote:
> > I see this series has been committed. But it's broken in a really
> > fundamental way.
> >
> >
> > This is a new extension with persistent side effects to an existing part
> > of the guest ABI.
>
> The only change in the ABI is the different return code for multiple
> attempts to map the vcpu_info page, it used to be -EINVAL and it's
> -EBUSY now, which seems more descriptive.
>
> The added hypercalls are an extension of the ABI, not not a
> modification of an existing part. Or maybe I'm not understanding the
> complaint.
>
> > Yet there doesn't appear to be any enumeration that the interface is
> > available to begin with. Requiring the guest to probe subops, and
> > having no way to disable it on a per-domain basis is unacceptable,
>
> We have never mandated such disables to be part of the series adding
> the new hypercalls, those have always been retro fitted in case of
> need. Not saying we shouldn't do it, but it's not something we have
> asked submitters to do.
I've been thinking about this, and I assume that we would like some
kind of tools interface to list supported features by the hypervisor,
and then a way to disable them. We could then use such information to
level the hypercall interface across a pool of hosts, and make sure
it's always the same.
In principle guest should cope with some features/hypercalls not being
able on resume, but we all know this is not always the case.
I'm going to punt this, as adding such interface would be too
disruptive at this point in the release, and in any case it's
unlikely we could reach an agreement on how the interface should look
like in a very short time frame.
Roger.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [CRITICAL for 4.18] Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-06 8:00 ` Roger Pau Monné
2023-10-06 8:22 ` Roger Pau Monné
@ 2023-10-06 10:21 ` Andrew Cooper
1 sibling, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2023-10-06 10:21 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, henry.wang, Tamas K Lengyel, Jan Beulich,
George Dunlap, Wei Liu, Julien Grall, Stefano Stabellini
On 06/10/2023 9:00 am, Roger Pau Monné wrote:
> On Thu, Oct 05, 2023 at 07:58:50PM +0100, Andrew Cooper wrote:
>> Henry: Blocker for 4.18. The absolutely bare minimum necessary to
>> avoid reversion is some kind of positive enumeration that the two new
>> hypercalls are available.
>>
>> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
>> mistake gets set in stone.
>>
>>
>> If this were x86-only it would need to be a CPUID flag, but it will need
>> to be something arch-agnostic in this case. The series should not have
>> come without a proper per-domain control and toolstack integration, but
>> everything else can be retrofitted in an emergency.
>>
>> And on a related note, where is the documentation describing this new
>> feature? Some tests perhaps, or any single implementation of the guest
>> side interface?
> Not that I know, I was expecting Jan to post that once he gets back
> from PTO.
>
> I already noted somewhere that I wasn't able to test myself because I
> couldn't find any Linux side patches to test the feature with, and I
> didn't have time to write ones myself (was expecting Jan to have the
> Linux side done already for testing reasons).
Big new feature, getting merged after RC1 and feature freeze, with no
enumeration and no disable.
How did this not set off massive alarm bells?
I was about to say that this is a disaster waiting to happen, except
OSSTest has beaten me to it.
This "new feature" managed to regress existing behaviour of something it
was trying not to change, and that's in an area that even I wouldn't
expect to put in a disable.
I'm very sorely tempted to insist that this gets disabled by default in
4.18. It's clear that it isn't in a fit state, and the absence of any
testing whatsoever means it is likely to explode on the first guest
kernel which tries to use the new interface.
And to be clear, disabled-by-default is a question for Henry and Henry
alone, as the person ultimately responsible for 4.18.
~Andrew
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v6 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-04 13:53 ` [PATCH v6 " Roger Pau Monne
2023-10-04 21:36 ` Tamas K Lengyel
@ 2023-10-16 9:55 ` Jan Beulich
2023-10-16 10:59 ` Roger Pau Monné
1 sibling, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2023-10-16 9:55 UTC (permalink / raw)
To: Roger Pau Monne, Tamas K Lengyel
Cc: Andrew Cooper, George Dunlap, Wei Liu, Julien Grall,
Stefano Stabellini, xen-devel
On 04.10.2023 15:53, Roger Pau Monne wrote:
> @@ -1950,7 +1978,15 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
>
> state:
> if ( reset_state )
> + {
> rc = copy_settings(d, pd);
> + if ( rc == -ERESTART )
> + /*
> + * Translate to -EAGAIN, see TODO comment at top of function about
> + * hypercall continuations.
> + */
> + rc = -EAGAIN;
> + }
Are existing callers known to properly handle EAGAIN? I'm worried of the
verbosity that was no lost here.
Jan
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [CRITICAL for 4.18] Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-05 18:58 ` [CRITICAL for 4.18] " Andrew Cooper
2023-10-05 22:40 ` Julien Grall
2023-10-06 8:00 ` Roger Pau Monné
@ 2023-10-16 10:04 ` Jan Beulich
2 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2023-10-16 10:04 UTC (permalink / raw)
To: Andrew Cooper
Cc: Tamas K Lengyel, George Dunlap, Wei Liu, Julien Grall,
Stefano Stabellini, Roger Pau Monne, xen-devel, henry.wang
On 05.10.2023 20:58, Andrew Cooper wrote:
> I see this series has been committed. But it's broken in a really
> fundamental way.
>
>
> This is a new extension with persistent side effects to an existing part
> of the guest ABI.
>
> Yet there doesn't appear to be any enumeration that the interface is
> available to begin with. Requiring the guest to probe subops, and
> having no way to disable it on a per-domain basis is unacceptable, and
> has exploded on us more times than I care to count in security fixes
> alone, and that doesn't even cover the issues Amazon have reported over
> the years.
This has never been a requirement. Plus you had ample time to raise such
a request.
> Henry: Blocker for 4.18. The absolutely bare minimum necessary to
> avoid reversion is some kind of positive enumeration that the two new
> hypercalls are available.
I disagree; to me this is a nice-to-have, not a requirement.
> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
> mistake gets set in stone.
>
>
> If this were x86-only it would need to be a CPUID flag, but it will need
> to be something arch-agnostic in this case. The series should not have
> come without a proper per-domain control and toolstack integration, but
> everything else can be retrofitted in an emergency.
To be honest, had it been clear that you expect a per-domain control, I
probably would not have taken on this piece of work.
> And on a related note, where is the documentation describing this new
> feature? Some tests perhaps, or any single implementation of the guest
> side interface?
Documentation is as for sibling interfaces - as much or as little as
we have in the public headers. I did test all of this with XTF, but I've
pretty much given up posting XTF patches, seeing how even XSA tests and
alike never made it anywhere.
Jan
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 00/10] runstate/time area registration by (guest) physical address
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
` (11 preceding siblings ...)
2023-10-05 18:58 ` [CRITICAL for 4.18] " Andrew Cooper
@ 2023-10-16 10:07 ` Jan Beulich
12 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2023-10-16 10:07 UTC (permalink / raw)
To: Roger Pau Monne
Cc: Tamas K Lengyel, Andrew Cooper, George Dunlap, Wei Liu,
Julien Grall, Stefano Stabellini, xen-devel, henry.wang
On 02.10.2023 17:11, Roger Pau Monne wrote:
> Since it was indicated that introducing specific new vCPU ops may be
> beneficial independent of the introduction of a fully physical-
> address-based ABI flavor, here we go. There continue to be a few open
> questions throughout the series, resolving of which was one of the main
> goals of the earlier postings.
>
> v5 adds one vm-fork specific pre-patch that does simply the introduced
> code later on. It does also fix a vm-fork bug.
>
> Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.
>
> Whole series will need a Release-Ack.
>
> Thanks, Roger.
>
> Jan Beulich (9):
> x86/shim: zap runstate and time area handles during shutdown
> domain: GADDR based shared guest area registration alternative -
> teardown
> domain: update GADDR based runstate guest area
> x86: update GADDR based secondary time area
> x86/mem-sharing: copy GADDR based shared guest areas
> domain: map/unmap GADDR based shared guest areas
> domain: introduce GADDR based runstate area registration alternative
> x86: introduce GADDR based secondary time area registration
> alternative
> common: convert vCPU info area registration
>
> Roger Pau Monne (1):
> mem_sharing/fork: do not attempt to populate vcpu_info page
Thanks much for picking this up during my absence.
Jan
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v6 06/10] x86/mem-sharing: copy GADDR based shared guest areas
2023-10-16 9:55 ` Jan Beulich
@ 2023-10-16 10:59 ` Roger Pau Monné
0 siblings, 0 replies; 42+ messages in thread
From: Roger Pau Monné @ 2023-10-16 10:59 UTC (permalink / raw)
To: Jan Beulich
Cc: Tamas K Lengyel, Andrew Cooper, George Dunlap, Wei Liu,
Julien Grall, Stefano Stabellini, xen-devel
On Mon, Oct 16, 2023 at 11:55:25AM +0200, Jan Beulich wrote:
> On 04.10.2023 15:53, Roger Pau Monne wrote:
> > @@ -1950,7 +1978,15 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> >
> > state:
> > if ( reset_state )
> > + {
> > rc = copy_settings(d, pd);
> > + if ( rc == -ERESTART )
> > + /*
> > + * Translate to -EAGAIN, see TODO comment at top of function about
> > + * hypercall continuations.
> > + */
> > + rc = -EAGAIN;
> > + }
>
> Are existing callers known to properly handle EAGAIN? I'm worried of the
> verbosity that was no lost here.
No idea about the callers using XENMEM_sharing_op_fork_reset, but it
did seem the best option rather than leaking -ERESTART to callers. We
have no callers of xc_memshr_fork_reset() in the tree.
vm_event_resume() will trigger an assert if mem_sharing_fork_reset()
fails with any error code, so doesn't make much difference there if
the return is -EAGAIN or -ERESTART.
My initial proposal had -EBUSY IIRC.
Thanks, Roger.
^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2023-10-16 11:00 UTC | newest]
Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-02 15:11 [PATCH v5 00/10] runstate/time area registration by (guest) physical address Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 01/10] mem_sharing/fork: do not attempt to populate vcpu_info page Roger Pau Monne
[not found] ` <CABfawhm2XMmfyx7vZvGdLZcot3=Mrrx3T5nS3vUR+Ur9j5mkWg@mail.gmail.com>
2023-10-03 7:15 ` Roger Pau Monné
2023-10-05 12:10 ` Julien Grall
2023-10-05 12:42 ` George Dunlap
2023-10-05 12:47 ` Julien Grall
[not found] ` <CABfawhmyP_y38002v=v1G2p66ZamhGKrj=0Jm1H_-c_j9VQG8Q@mail.gmail.com>
2023-10-05 12:42 ` Roger Pau Monné
2023-10-05 13:15 ` Tamas K Lengyel
2023-10-02 15:11 ` [PATCH v5 02/10] x86/shim: zap runstate and time area handles during shutdown Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 03/10] domain: GADDR based shared guest area registration alternative - teardown Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 04/10] domain: update GADDR based runstate guest area Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 05/10] x86: update GADDR based secondary time area Roger Pau Monne
2023-10-02 15:11 ` [PATCH v5 06/10] x86/mem-sharing: copy GADDR based shared guest areas Roger Pau Monne
[not found] ` <CABfawhnHg3KrGP-hp4_Q8GvSf2nVSVSyK24HKqAGuWp_AtD8-A@mail.gmail.com>
2023-10-03 14:29 ` Roger Pau Monné
2023-10-03 15:07 ` Julien Grall
2023-10-03 20:25 ` Tamas K Lengyel
2023-10-04 8:20 ` Roger Pau Monné
2023-10-04 11:01 ` Julien Grall
2023-10-04 13:00 ` Roger Pau Monné
2023-10-04 13:06 ` Julien Grall
2023-10-04 13:53 ` [PATCH v6 " Roger Pau Monne
2023-10-04 21:36 ` Tamas K Lengyel
2023-10-16 9:55 ` Jan Beulich
2023-10-16 10:59 ` Roger Pau Monné
2023-10-02 15:11 ` [PATCH v5 07/10] domain: map/unmap " Roger Pau Monne
2023-10-02 16:40 ` Julien Grall
2023-10-02 15:11 ` [PATCH v5 08/10] domain: introduce GADDR based runstate area registration alternative Roger Pau Monne
2023-10-02 16:43 ` Julien Grall
2023-10-02 15:11 ` [PATCH v5 09/10] x86: introduce GADDR based secondary time " Roger Pau Monne
2023-10-02 16:44 ` Julien Grall
2023-10-02 15:11 ` [PATCH v5 10/10] common: convert vCPU info area registration Roger Pau Monne
2023-10-04 17:15 ` Julien Grall
2023-10-05 1:27 ` [PATCH v5 00/10] runstate/time area registration by (guest) physical address Henry Wang
2023-10-05 13:12 ` Julien Grall
2023-10-05 18:58 ` [CRITICAL for 4.18] " Andrew Cooper
2023-10-05 22:40 ` Julien Grall
2023-10-05 22:43 ` Julien Grall
2023-10-06 8:00 ` Roger Pau Monné
2023-10-06 8:22 ` Roger Pau Monné
2023-10-06 10:21 ` Andrew Cooper
2023-10-16 10:04 ` Jan Beulich
2023-10-16 10:07 ` Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.