* [PATCH v4 01/16] libs/guest: Reduce number of parts in write_split_record
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-08 14:22 ` Roger Pau Monné
2026-06-03 13:05 ` [PATCH v4 02/16] libs/guest: Reduce number of I/O vectors in write_batch Frediano Ziglio
` (14 subsequent siblings)
15 siblings, 1 reply; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Small optimization.
There's no much sense to split the header in 2 pieces, it will
just take more time and space to reassemble them in the final
buffer.
This also avoids truncating combined_length to 32 bit in case of
64 bit machines potentially avoiding following record_length check
(it could still be truncated writing it in xc_sr_rhdr structure
but the following check will catch it).
The function become more coherent with following read_record
function.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
--
Changes since v2:
- change prefix in subject.
Changes since v3:
- clarify commit message.
---
tools/libs/guest/xg_sr_common.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/libs/guest/xg_sr_common.c b/tools/libs/guest/xg_sr_common.c
index 7ccdc3b1f6..86c148c62f 100644
--- a/tools/libs/guest/xg_sr_common.c
+++ b/tools/libs/guest/xg_sr_common.c
@@ -59,11 +59,11 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
static const char zeroes[(1u << REC_ALIGN_ORDER) - 1] = { 0 };
xc_interface *xch = ctx->xch;
- typeof(rec->length) combined_length = rec->length + sz;
+ size_t combined_length = rec->length + sz;
size_t record_length = ROUNDUP(combined_length, REC_ALIGN_ORDER);
+ struct xc_sr_rhdr rhdr = { rec->type, combined_length };
struct iovec parts[] = {
- { &rec->type, sizeof(rec->type) },
- { &combined_length, sizeof(combined_length) },
+ { &rhdr, sizeof(rhdr) },
{ rec->data, rec->length },
{ buf, sz },
{ (void *)zeroes, record_length - combined_length },
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH v4 01/16] libs/guest: Reduce number of parts in write_split_record
2026-06-03 13:05 ` [PATCH v4 01/16] libs/guest: Reduce number of parts in write_split_record Frediano Ziglio
@ 2026-06-08 14:22 ` Roger Pau Monné
0 siblings, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2026-06-08 14:22 UTC (permalink / raw)
To: Frediano Ziglio
Cc: xen-devel, Frediano Ziglio, Jan Beulich, Andrew Cooper,
Teddy Astie, Anthony PERARD, Juergen Gross
On Wed, Jun 03, 2026 at 02:05:48PM +0100, Frediano Ziglio wrote:
> From: Frediano Ziglio <frediano.ziglio@citrix.com>
>
> Small optimization.
> There's no much sense to split the header in 2 pieces, it will
> just take more time and space to reassemble them in the final
> buffer.
> This also avoids truncating combined_length to 32 bit in case of
> 64 bit machines potentially avoiding following record_length check
I'm not sure I understand the sentence above: rec->length is a fixed
width type uint32_t, and hence it will always be 32bit, regardless of
whether it's build in 32 or 64 bit modes.
I do get the truncation part, and that using size_t is indeed better.
> (it could still be truncated writing it in xc_sr_rhdr structure
> but the following check will catch it).
> The function become more coherent with following read_record
> function.
>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
With the commit message possibly clarified.
Thanks, Roger.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 02/16] libs/guest: Reduce number of I/O vectors in write_batch
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 01/16] libs/guest: Reduce number of parts in write_split_record Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-08 14:42 ` Roger Pau Monné
2026-06-03 13:05 ` [PATCH v4 03/16] " Frediano Ziglio
` (13 subsequent siblings)
15 siblings, 1 reply; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Small optimization.
Reduce number of pieces passed to writev.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
--
Changes since v2:
- change prefix in subject.
---
tools/libs/guest/xg_sr_save.c | 35 ++++++++++++++++-------------------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 3b2c5222e4..1700d81905 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -97,9 +97,12 @@ static int write_batch(struct xc_sr_context *ctx)
void *page, *orig_page;
uint64_t *rec_pfns = NULL;
struct iovec *iov = NULL; int iovcnt = 0;
- struct xc_sr_rec_page_data_header hdr = { 0 };
- struct xc_sr_record rec = {
- .type = REC_TYPE_PAGE_DATA,
+ struct {
+ struct xc_sr_rhdr rec;
+ struct xc_sr_rec_page_data_header page_data;
+ } hdrs = {
+ { .type = REC_TYPE_PAGE_DATA },
+ { 0 },
};
assert(nr_pfns != 0);
@@ -115,7 +118,7 @@ static int write_batch(struct xc_sr_context *ctx)
/* Pointers to locally allocated pages. Need freeing. */
local_pages = calloc(nr_pfns, sizeof(*local_pages));
/* iovec[] for writev(). */
- iov = malloc((nr_pfns + 4) * sizeof(*iov));
+ iov = malloc((nr_pfns + 2) * sizeof(*iov));
if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov )
{
@@ -216,28 +219,22 @@ static int write_batch(struct xc_sr_context *ctx)
goto err;
}
- hdr.count = nr_pfns;
+ hdrs.rec.length = sizeof(hdrs.page_data);
+ hdrs.rec.length += nr_pfns * sizeof(*rec_pfns);
+ hdrs.rec.length += nr_pages * PAGE_SIZE;
- rec.length = sizeof(hdr);
- rec.length += nr_pfns * sizeof(*rec_pfns);
- rec.length += nr_pages * PAGE_SIZE;
+ hdrs.page_data.count = nr_pfns;
for ( i = 0; i < nr_pfns; ++i )
rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.batch_pfns[i];
- iov[0].iov_base = &rec.type;
- iov[0].iov_len = sizeof(rec.type);
+ iov[0].iov_base = &hdrs;
+ iov[0].iov_len = sizeof(hdrs);
- iov[1].iov_base = &rec.length;
- iov[1].iov_len = sizeof(rec.length);
+ iov[1].iov_base = rec_pfns;
+ iov[1].iov_len = nr_pfns * sizeof(*rec_pfns);
- iov[2].iov_base = &hdr;
- iov[2].iov_len = sizeof(hdr);
-
- iov[3].iov_base = rec_pfns;
- iov[3].iov_len = nr_pfns * sizeof(*rec_pfns);
-
- iovcnt = 4;
+ iovcnt = 2;
if ( nr_pages )
{
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH v4 02/16] libs/guest: Reduce number of I/O vectors in write_batch
2026-06-03 13:05 ` [PATCH v4 02/16] libs/guest: Reduce number of I/O vectors in write_batch Frediano Ziglio
@ 2026-06-08 14:42 ` Roger Pau Monné
0 siblings, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2026-06-08 14:42 UTC (permalink / raw)
To: Frediano Ziglio
Cc: xen-devel, Frediano Ziglio, Jan Beulich, Andrew Cooper,
Teddy Astie, Anthony PERARD, Juergen Gross
On Wed, Jun 03, 2026 at 02:05:49PM +0100, Frediano Ziglio wrote:
> From: Frediano Ziglio <frediano.ziglio@citrix.com>
>
> Small optimization.
> Reduce number of pieces passed to writev.
>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> --
> Changes since v2:
> - change prefix in subject.
> ---
> tools/libs/guest/xg_sr_save.c | 35 ++++++++++++++++-------------------
> 1 file changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
> index 3b2c5222e4..1700d81905 100644
> --- a/tools/libs/guest/xg_sr_save.c
> +++ b/tools/libs/guest/xg_sr_save.c
> @@ -97,9 +97,12 @@ static int write_batch(struct xc_sr_context *ctx)
> void *page, *orig_page;
> uint64_t *rec_pfns = NULL;
> struct iovec *iov = NULL; int iovcnt = 0;
> - struct xc_sr_rec_page_data_header hdr = { 0 };
> - struct xc_sr_record rec = {
> - .type = REC_TYPE_PAGE_DATA,
> + struct {
> + struct xc_sr_rhdr rec;
> + struct xc_sr_rec_page_data_header page_data;
Is there a possible worry that the compiler (for another
architecture) will introduce non-zero padding between those two structs?
> + } hdrs = {
> + { .type = REC_TYPE_PAGE_DATA },
> + { 0 },
Do you need the explicit initialization to 0 here? All unspecified
fields in the initialization will already be set to 0.
> };
>
> assert(nr_pfns != 0);
> @@ -115,7 +118,7 @@ static int write_batch(struct xc_sr_context *ctx)
> /* Pointers to locally allocated pages. Need freeing. */
> local_pages = calloc(nr_pfns, sizeof(*local_pages));
> /* iovec[] for writev(). */
> - iov = malloc((nr_pfns + 4) * sizeof(*iov));
> + iov = malloc((nr_pfns + 2) * sizeof(*iov));
It would seem more natural to use calloc() here, but it would also do
a zeroing that we don't care.
Thanks, Roger.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 03/16] libs/guest: Reduce number of I/O vectors in write_batch
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 01/16] libs/guest: Reduce number of parts in write_split_record Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 02/16] libs/guest: Reduce number of I/O vectors in write_batch Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-08 15:03 ` Roger Pau Monné
2026-06-03 13:05 ` [PATCH v4 04/16] libs/guest: Use a single write_exact in write_headers Frediano Ziglio
` (12 subsequent siblings)
15 siblings, 1 reply; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Each page was sent using a different iovec item. This potentially exceed
Linux maximum (1024).
Also some implementation (MiniOS) emulate writev with multiple write calls.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
--
Changes since v2:
- change prefix in subject.
---
tools/libs/guest/xg_sr_save.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 1700d81905..62a39dfecc 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -240,13 +240,20 @@ static int write_batch(struct xc_sr_context *ctx)
{
for ( i = 0; i < nr_pfns; ++i )
{
- if ( guest_data[i] )
+ if ( !guest_data[i] )
+ continue;
+
+ if ( iov[iovcnt-1].iov_base + iov[iovcnt-1].iov_len != guest_data[i] )
{
iov[iovcnt].iov_base = guest_data[i];
iov[iovcnt].iov_len = PAGE_SIZE;
iovcnt++;
- --nr_pages;
}
+ else
+ {
+ iov[iovcnt-1].iov_len += PAGE_SIZE;
+ }
+ --nr_pages;
}
}
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH v4 03/16] libs/guest: Reduce number of I/O vectors in write_batch
2026-06-03 13:05 ` [PATCH v4 03/16] " Frediano Ziglio
@ 2026-06-08 15:03 ` Roger Pau Monné
0 siblings, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2026-06-08 15:03 UTC (permalink / raw)
To: Frediano Ziglio
Cc: xen-devel, Frediano Ziglio, Jan Beulich, Andrew Cooper,
Teddy Astie, Anthony PERARD, Juergen Gross
On Wed, Jun 03, 2026 at 02:05:50PM +0100, Frediano Ziglio wrote:
> From: Frediano Ziglio <frediano.ziglio@citrix.com>
>
> Each page was sent using a different iovec item. This potentially exceed
> Linux maximum (1024).
> Also some implementation (MiniOS) emulate writev with multiple write calls.
I would add to the commit message:
"Coalesce adjacent IO vector elements to attempt to reduce the number
of overall IO vectors for each operation."
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> --
> Changes since v2:
> - change prefix in subject.
> ---
> tools/libs/guest/xg_sr_save.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
> index 1700d81905..62a39dfecc 100644
> --- a/tools/libs/guest/xg_sr_save.c
> +++ b/tools/libs/guest/xg_sr_save.c
> @@ -240,13 +240,20 @@ static int write_batch(struct xc_sr_context *ctx)
> {
> for ( i = 0; i < nr_pfns; ++i )
> {
> - if ( guest_data[i] )
> + if ( !guest_data[i] )
> + continue;
> +
> + if ( iov[iovcnt-1].iov_base + iov[iovcnt-1].iov_len != guest_data[i] )
Nit: space between subtraction and operands: iovcnt - 1.
Thanks, Roger.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 04/16] libs/guest: Use a single write_exact in write_headers
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (2 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 03/16] " Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-08 15:26 ` Roger Pau Monné
2026-06-03 13:05 ` [PATCH v4 05/16] libs/guest: allocate various migration arrays just once Frediano Ziglio
` (11 subsequent siblings)
15 siblings, 1 reply; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Reduce number of syscalls.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
--
Changes since v2:
- change prefix in subject.
---
tools/libs/guest/xg_sr_save.c | 37 +++++++++++++++++------------------
1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 62a39dfecc..8c4e760f8d 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -10,17 +10,22 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
{
xc_interface *xch = ctx->xch;
int32_t xen_version = xc_version(xch, XENVER_version, NULL);
- struct xc_sr_ihdr ihdr = {
- .marker = IHDR_MARKER,
- .id = htonl(IHDR_ID),
- .version = htonl(3),
- .options = htons(IHDR_OPT_LITTLE_ENDIAN),
- };
- struct xc_sr_dhdr dhdr = {
- .type = guest_type,
- .page_shift = XC_PAGE_SHIFT,
- .xen_major = (xen_version >> 16) & 0xffff,
- .xen_minor = (xen_version) & 0xffff,
+ struct {
+ struct xc_sr_ihdr ihdr;
+ struct xc_sr_dhdr dhdr;
+ } hdrs = {
+ {
+ .marker = IHDR_MARKER,
+ .id = htonl(IHDR_ID),
+ .version = htonl(3),
+ .options = htons(IHDR_OPT_LITTLE_ENDIAN),
+ },
+ {
+ .type = guest_type,
+ .page_shift = XC_PAGE_SHIFT,
+ .xen_major = (xen_version >> 16) & 0xffff,
+ .xen_minor = (xen_version) & 0xffff,
+ },
};
if ( xen_version < 0 )
@@ -29,15 +34,9 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
return -1;
}
- if ( write_exact(ctx->fd, &ihdr, sizeof(ihdr)) )
- {
- PERROR("Unable to write Image Header to stream");
- return -1;
- }
-
- if ( write_exact(ctx->fd, &dhdr, sizeof(dhdr)) )
+ if ( write_exact(ctx->fd, &hdrs, sizeof(hdrs)) )
{
- PERROR("Unable to write Domain Header to stream");
+ PERROR("Unable to write Headers to stream");
return -1;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH v4 04/16] libs/guest: Use a single write_exact in write_headers
2026-06-03 13:05 ` [PATCH v4 04/16] libs/guest: Use a single write_exact in write_headers Frediano Ziglio
@ 2026-06-08 15:26 ` Roger Pau Monné
0 siblings, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2026-06-08 15:26 UTC (permalink / raw)
To: Frediano Ziglio
Cc: xen-devel, Frediano Ziglio, Jan Beulich, Andrew Cooper,
Teddy Astie, Anthony PERARD, Juergen Gross
On Wed, Jun 03, 2026 at 02:05:51PM +0100, Frediano Ziglio wrote:
> From: Frediano Ziglio <frediano.ziglio@citrix.com>
>
> Reduce number of syscalls.
... by coalescing the image and the domain headers into a single IO
vector array.
>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> --
> Changes since v2:
> - change prefix in subject.
> ---
> tools/libs/guest/xg_sr_save.c | 37 +++++++++++++++++------------------
> 1 file changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
> index 62a39dfecc..8c4e760f8d 100644
> --- a/tools/libs/guest/xg_sr_save.c
> +++ b/tools/libs/guest/xg_sr_save.c
> @@ -10,17 +10,22 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
> {
> xc_interface *xch = ctx->xch;
> int32_t xen_version = xc_version(xch, XENVER_version, NULL);
> - struct xc_sr_ihdr ihdr = {
> - .marker = IHDR_MARKER,
> - .id = htonl(IHDR_ID),
> - .version = htonl(3),
> - .options = htons(IHDR_OPT_LITTLE_ENDIAN),
> - };
> - struct xc_sr_dhdr dhdr = {
> - .type = guest_type,
> - .page_shift = XC_PAGE_SHIFT,
> - .xen_major = (xen_version >> 16) & 0xffff,
> - .xen_minor = (xen_version) & 0xffff,
> + struct {
> + struct xc_sr_ihdr ihdr;
> + struct xc_sr_dhdr dhdr;
> + } hdrs = {
> + {
> + .marker = IHDR_MARKER,
> + .id = htonl(IHDR_ID),
> + .version = htonl(3),
> + .options = htons(IHDR_OPT_LITTLE_ENDIAN),
> + },
> + {
> + .type = guest_type,
> + .page_shift = XC_PAGE_SHIFT,
> + .xen_major = (xen_version >> 16) & 0xffff,
You don't strictly need the mask here AFAICT?
Thanks, Roger.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 05/16] libs/guest: allocate various migration arrays just once
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (3 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 04/16] libs/guest: Use a single write_exact in write_headers Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-08 15:36 ` Andrew Cooper
2026-06-08 15:49 ` Roger Pau Monné
2026-06-03 13:05 ` [PATCH v4 06/16] libs/call: cache up to 4 pages in hypercall bounce buffers Frediano Ziglio
` (10 subsequent siblings)
15 siblings, 2 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Teddy Astie, Anthony PERARD, Juergen Gross,
Frediano Ziglio
From: Edwin Török <edwin.torok@citrix.com>
Allocate these array just once at the start of migration,
using the maximum batch size, and free them at the end.
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
--
Changes since v2:
- change prefix in subject.
Changes since v3:
- fix comment style
---
tools/libs/guest/xg_sr_common.h | 13 +++++++
tools/libs/guest/xg_sr_save.c | 66 +++++++++++++--------------------
2 files changed, 39 insertions(+), 40 deletions(-)
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index f1573aefcb..82549b5589 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -209,6 +209,18 @@ static inline int update_blob(struct xc_sr_blob *blob,
return 0;
}
+struct xc_sr_context_save_buffers
+{
+ xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
+ xen_pfn_t mfns[MAX_BATCH_SIZE];
+ xen_pfn_t types[MAX_BATCH_SIZE];
+ int errors[MAX_BATCH_SIZE];
+ void *guest_data[MAX_BATCH_SIZE];
+ void *local_pages[MAX_BATCH_SIZE];
+ struct iovec iov[MAX_BATCH_SIZE + 2]; /* Headers + data. */
+ uint64_t rec_pfns[MAX_BATCH_SIZE];
+};
+
struct xc_sr_context
{
xc_interface *xch;
@@ -244,6 +256,7 @@ struct xc_sr_context
unsigned long *deferred_pages;
unsigned long nr_deferred_pages;
xc_hypercall_buffer_t dirty_bitmap_hbuf;
+ struct xc_sr_context_save_buffers *buffers;
} save;
struct /* Restore data. */
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 8c4e760f8d..7d8055a3f9 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -86,16 +86,16 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
static int write_batch(struct xc_sr_context *ctx)
{
xc_interface *xch = ctx->xch;
- xen_pfn_t *mfns = NULL, *types = NULL;
+ xen_pfn_t *mfns, *types;
void *guest_mapping = NULL;
- void **guest_data = NULL;
- void **local_pages = NULL;
- int *errors = NULL, rc = -1;
+ void **guest_data;
+ void **local_pages;
+ int *errors, rc = -1;
unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
unsigned int nr_pfns = ctx->save.nr_batch_pfns;
void *page, *orig_page;
- uint64_t *rec_pfns = NULL;
- struct iovec *iov = NULL; int iovcnt = 0;
+ uint64_t *rec_pfns;
+ struct iovec *iov; int iovcnt = 0;
struct {
struct xc_sr_rhdr rec;
struct xc_sr_rec_page_data_header page_data;
@@ -105,26 +105,24 @@ static int write_batch(struct xc_sr_context *ctx)
};
assert(nr_pfns != 0);
+ assert(nr_pfns <= MAX_BATCH_SIZE);
+ assert(ctx->save.buffers);
/* Mfns of the batch pfns. */
- mfns = malloc(nr_pfns * sizeof(*mfns));
+ mfns = ctx->save.buffers->mfns;
/* Types of the batch pfns. */
- types = malloc(nr_pfns * sizeof(*types));
+ types = ctx->save.buffers->types;
/* Errors from attempting to map the gfns. */
- errors = malloc(nr_pfns * sizeof(*errors));
+ errors = ctx->save.buffers->errors;
/* Pointers to page data to send. Mapped gfns or local allocations. */
- guest_data = calloc(nr_pfns, sizeof(*guest_data));
+ guest_data = ctx->save.buffers->guest_data;
+ memset(guest_data, 0, sizeof(*guest_data) * nr_pfns);
/* Pointers to locally allocated pages. Need freeing. */
- local_pages = calloc(nr_pfns, sizeof(*local_pages));
+ local_pages = ctx->save.buffers->local_pages;
+ memset(local_pages, 0, sizeof(*local_pages) * nr_pfns);
/* iovec[] for writev(). */
- iov = malloc((nr_pfns + 2) * sizeof(*iov));
-
- if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov )
- {
- ERROR("Unable to allocate arrays for a batch of %u pages",
- nr_pfns);
- goto err;
- }
+ iov = ctx->save.buffers->iov;
+ rec_pfns = ctx->save.buffers->rec_pfns;
for ( i = 0; i < nr_pfns; ++i )
{
@@ -210,14 +208,6 @@ static int write_batch(struct xc_sr_context *ctx)
}
}
- rec_pfns = malloc(nr_pfns * sizeof(*rec_pfns));
- if ( !rec_pfns )
- {
- ERROR("Unable to allocate %zu bytes of memory for page data pfn list",
- nr_pfns * sizeof(*rec_pfns));
- goto err;
- }
-
hdrs.rec.length = sizeof(hdrs.page_data);
hdrs.rec.length += nr_pfns * sizeof(*rec_pfns);
hdrs.rec.length += nr_pages * PAGE_SIZE;
@@ -267,17 +257,13 @@ static int write_batch(struct xc_sr_context *ctx)
rc = ctx->save.nr_batch_pfns = 0;
err:
- free(rec_pfns);
if ( guest_mapping )
xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
for ( i = 0; local_pages && i < nr_pfns; ++i )
+ {
free(local_pages[i]);
- free(iov);
- free(local_pages);
- free(guest_data);
- free(errors);
- free(types);
- free(mfns);
+ local_pages[i] = NULL;
+ }
return rc;
}
@@ -805,18 +791,18 @@ static int setup(struct xc_sr_context *ctx)
dirty_bitmap = xc_hypercall_buffer_alloc_pages(
xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
- ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
- sizeof(*ctx->save.batch_pfns));
ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+ ctx->save.buffers = calloc(1, sizeof(*ctx->save.buffers));
- if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+ if ( !dirty_bitmap || !ctx->save.deferred_pages || !ctx->save.buffers)
{
- ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and"
- " deferred pages");
+ ERROR("Unable to allocate memory for dirty bitmaps, deferred pages"
+ " and various batch buffers");
rc = -1;
errno = ENOMEM;
goto err;
}
+ ctx->save.batch_pfns = ctx->save.buffers->batch_pfns;
rc = 0;
@@ -840,7 +826,7 @@ static void cleanup(struct xc_sr_context *ctx)
xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
NRPAGES(bitmap_size(ctx->save.p2m_size)));
free(ctx->save.deferred_pages);
- free(ctx->save.batch_pfns);
+ free(ctx->save.buffers);
}
/*
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH v4 05/16] libs/guest: allocate various migration arrays just once
2026-06-03 13:05 ` [PATCH v4 05/16] libs/guest: allocate various migration arrays just once Frediano Ziglio
@ 2026-06-08 15:36 ` Andrew Cooper
2026-06-08 15:50 ` Roger Pau Monné
2026-06-08 15:49 ` Roger Pau Monné
1 sibling, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2026-06-08 15:36 UTC (permalink / raw)
To: Frediano Ziglio, xen-devel
Cc: Andrew Cooper, Edwin Török, Jan Beulich,
Roger Pau Monné, Teddy Astie, Anthony PERARD, Juergen Gross,
Frediano Ziglio
On 03/06/2026 2:05 pm, Frediano Ziglio wrote:
> From: Edwin Török <edwin.torok@citrix.com>
>
> Allocate these array just once at the start of migration,
> using the maximum batch size, and free them at the end.
>
> Signed-off-by: Edwin Török <edwin.torok@citrix.com>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
> --
> Changes since v2:
> - change prefix in subject.
>
> Changes since v3:
> - fix comment style
These are intentionally freed/reallocated so valgrind can find overflows.
~Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 05/16] libs/guest: allocate various migration arrays just once
2026-06-08 15:36 ` Andrew Cooper
@ 2026-06-08 15:50 ` Roger Pau Monné
0 siblings, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2026-06-08 15:50 UTC (permalink / raw)
To: Andrew Cooper
Cc: Frediano Ziglio, xen-devel, Edwin Török, Jan Beulich,
Teddy Astie, Anthony PERARD, Juergen Gross, Frediano Ziglio
On Mon, Jun 08, 2026 at 04:36:54PM +0100, Andrew Cooper wrote:
> On 03/06/2026 2:05 pm, Frediano Ziglio wrote:
> > From: Edwin Török <edwin.torok@citrix.com>
> >
> > Allocate these array just once at the start of migration,
> > using the maximum batch size, and free them at the end.
> >
> > Signed-off-by: Edwin Török <edwin.torok@citrix.com>
> > Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
> > --
> > Changes since v2:
> > - change prefix in subject.
> >
> > Changes since v3:
> > - fix comment style
>
> These are intentionally freed/reallocated so valgrind can find overflows.
Hm, then we likely need to keep the current behavior for CONFIG_DEBUG
builds of the library?
Thanks, Roger.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 05/16] libs/guest: allocate various migration arrays just once
2026-06-03 13:05 ` [PATCH v4 05/16] libs/guest: allocate various migration arrays just once Frediano Ziglio
2026-06-08 15:36 ` Andrew Cooper
@ 2026-06-08 15:49 ` Roger Pau Monné
2026-06-08 16:24 ` Andrew Cooper
1 sibling, 1 reply; 30+ messages in thread
From: Roger Pau Monné @ 2026-06-08 15:49 UTC (permalink / raw)
To: Frediano Ziglio
Cc: xen-devel, Edwin Török, Jan Beulich, Andrew Cooper,
Teddy Astie, Anthony PERARD, Juergen Gross, Frediano Ziglio
On Wed, Jun 03, 2026 at 02:05:52PM +0100, Frediano Ziglio wrote:
> From: Edwin Török <edwin.torok@citrix.com>
>
> Allocate these array just once at the start of migration,
> using the maximum batch size, and free them at the end.
>
> Signed-off-by: Edwin Török <edwin.torok@citrix.com>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Jan made a comment about this patch (and how it related to a still
pending patch of his):
https://lore.kernel.org/xen-devel/e3f22fa6-c497-4afc-9498-12449548acfd@suse.com/
That is still unresolved AFAICT.
> --
> Changes since v2:
> - change prefix in subject.
>
> Changes since v3:
> - fix comment style
> ---
> tools/libs/guest/xg_sr_common.h | 13 +++++++
> tools/libs/guest/xg_sr_save.c | 66 +++++++++++++--------------------
> 2 files changed, 39 insertions(+), 40 deletions(-)
>
> diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
> index f1573aefcb..82549b5589 100644
> --- a/tools/libs/guest/xg_sr_common.h
> +++ b/tools/libs/guest/xg_sr_common.h
> @@ -209,6 +209,18 @@ static inline int update_blob(struct xc_sr_blob *blob,
> return 0;
> }
>
> +struct xc_sr_context_save_buffers
> +{
> + xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
> + xen_pfn_t mfns[MAX_BATCH_SIZE];
> + xen_pfn_t types[MAX_BATCH_SIZE];
> + int errors[MAX_BATCH_SIZE];
FWIW: I would possibly place errors at the end of the structure. It
seems more natural and is the only array that has 4 byte alignment
instead of 8 (on 64bits at least).
> + void *guest_data[MAX_BATCH_SIZE];
> + void *local_pages[MAX_BATCH_SIZE];
> + struct iovec iov[MAX_BATCH_SIZE + 2]; /* Headers + data. */
> + uint64_t rec_pfns[MAX_BATCH_SIZE];
> +};
> +
> struct xc_sr_context
> {
> xc_interface *xch;
> @@ -244,6 +256,7 @@ struct xc_sr_context
> unsigned long *deferred_pages;
> unsigned long nr_deferred_pages;
> xc_hypercall_buffer_t dirty_bitmap_hbuf;
> + struct xc_sr_context_save_buffers *buffers;
> } save;
>
> struct /* Restore data. */
> diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
> index 8c4e760f8d..7d8055a3f9 100644
> --- a/tools/libs/guest/xg_sr_save.c
> +++ b/tools/libs/guest/xg_sr_save.c
> @@ -86,16 +86,16 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
> static int write_batch(struct xc_sr_context *ctx)
> {
> xc_interface *xch = ctx->xch;
> - xen_pfn_t *mfns = NULL, *types = NULL;
> + xen_pfn_t *mfns, *types;
> void *guest_mapping = NULL;
> - void **guest_data = NULL;
> - void **local_pages = NULL;
> - int *errors = NULL, rc = -1;
> + void **guest_data;
> + void **local_pages;
> + int *errors, rc = -1;
> unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
> unsigned int nr_pfns = ctx->save.nr_batch_pfns;
> void *page, *orig_page;
> - uint64_t *rec_pfns = NULL;
> - struct iovec *iov = NULL; int iovcnt = 0;
> + uint64_t *rec_pfns;
> + struct iovec *iov; int iovcnt = 0;
> struct {
> struct xc_sr_rhdr rec;
> struct xc_sr_rec_page_data_header page_data;
> @@ -105,26 +105,24 @@ static int write_batch(struct xc_sr_context *ctx)
> };
>
> assert(nr_pfns != 0);
> + assert(nr_pfns <= MAX_BATCH_SIZE);
> + assert(ctx->save.buffers);
>
> /* Mfns of the batch pfns. */
> - mfns = malloc(nr_pfns * sizeof(*mfns));
> + mfns = ctx->save.buffers->mfns;
> /* Types of the batch pfns. */
> - types = malloc(nr_pfns * sizeof(*types));
> + types = ctx->save.buffers->types;
> /* Errors from attempting to map the gfns. */
> - errors = malloc(nr_pfns * sizeof(*errors));
> + errors = ctx->save.buffers->errors;
> /* Pointers to page data to send. Mapped gfns or local allocations. */
> - guest_data = calloc(nr_pfns, sizeof(*guest_data));
> + guest_data = ctx->save.buffers->guest_data;
> + memset(guest_data, 0, sizeof(*guest_data) * nr_pfns);
> /* Pointers to locally allocated pages. Need freeing. */
> - local_pages = calloc(nr_pfns, sizeof(*local_pages));
> + local_pages = ctx->save.buffers->local_pages;
> + memset(local_pages, 0, sizeof(*local_pages) * nr_pfns);
See below - I think it's possible to avoid the memset() and keep the
same guarantees.
> /* iovec[] for writev(). */
> - iov = malloc((nr_pfns + 2) * sizeof(*iov));
> -
> - if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov )
> - {
> - ERROR("Unable to allocate arrays for a batch of %u pages",
> - nr_pfns);
> - goto err;
> - }
> + iov = ctx->save.buffers->iov;
> + rec_pfns = ctx->save.buffers->rec_pfns;
>
> for ( i = 0; i < nr_pfns; ++i )
> {
> @@ -210,14 +208,6 @@ static int write_batch(struct xc_sr_context *ctx)
> }
> }
>
> - rec_pfns = malloc(nr_pfns * sizeof(*rec_pfns));
> - if ( !rec_pfns )
> - {
> - ERROR("Unable to allocate %zu bytes of memory for page data pfn list",
> - nr_pfns * sizeof(*rec_pfns));
> - goto err;
> - }
> -
> hdrs.rec.length = sizeof(hdrs.page_data);
> hdrs.rec.length += nr_pfns * sizeof(*rec_pfns);
> hdrs.rec.length += nr_pages * PAGE_SIZE;
> @@ -267,17 +257,13 @@ static int write_batch(struct xc_sr_context *ctx)
> rc = ctx->save.nr_batch_pfns = 0;
>
> err:
> - free(rec_pfns);
> if ( guest_mapping )
> xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
> for ( i = 0; local_pages && i < nr_pfns; ++i )
> + {
> free(local_pages[i]);
> - free(iov);
> - free(local_pages);
> - free(guest_data);
> - free(errors);
> - free(types);
> - free(mfns);
> + local_pages[i] = NULL;
If you are doing this cleanup here, you could also do guest_data[i] =
NULL and avoid the memset, since at the start of each write_batch()
the arrays will already be zeroed (either because they are allocated
with calloc() on the first call, and always cleaned up in
write_batch() after usage.
Thanks, Roger.
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH v4 05/16] libs/guest: allocate various migration arrays just once
2026-06-08 15:49 ` Roger Pau Monné
@ 2026-06-08 16:24 ` Andrew Cooper
0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2026-06-08 16:24 UTC (permalink / raw)
To: Roger Pau Monné, Frediano Ziglio
Cc: Andrew Cooper, xen-devel, Jan Beulich, Teddy Astie,
Anthony PERARD, Juergen Gross, Frediano Ziglio
On 08/06/2026 4:49 pm, Roger Pau Monné wrote:
> On Wed, Jun 03, 2026 at 02:05:52PM +0100, Frediano Ziglio wrote:
>> From: Edwin Török <edwin.torok@citrix.com>
>>
>> Allocate these array just once at the start of migration,
>> using the maximum batch size, and free them at the end.
>>
>> Signed-off-by: Edwin Török <edwin.torok@citrix.com>
>> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
> Jan made a comment about this patch (and how it related to a still
> pending patch of his):
>
> https://lore.kernel.org/xen-devel/e3f22fa6-c497-4afc-9498-12449548acfd@suse.com/
>
> That is still unresolved AFAICT.
That patch is not pending. It, along with it's series, is wholly nacked.
I have explained in public and in private what it would take to retract
my nack, but until such time as the feedback is actioned, that old
series is going nowhere, and has no baring on other changes.
~Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 06/16] libs/call: cache up to 4 pages in hypercall bounce buffers
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (4 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 05/16] libs/guest: allocate various migration arrays just once Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 07/16] libs/guest: avoids using 2 indexes Frediano Ziglio
` (9 subsequent siblings)
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Teddy Astie, Anthony PERARD, Juergen Gross,
Frediano Ziglio
From: Edwin Török <edwin.torok@citrix.com>
During migration there are a lot of mmap/munmap calls,
because `xc_get_pfn_type_batch` exceeds the default hypercall bounce
buffer cache size, and needs to allocate every time it is called.
`munmap` is slow, especially in a PV Dom0 (takes an emulation fault),
so is best avoided.
Eventually it'd be good if the memory pool from xmalloc_tlsf.c
was reused here, but for now make it handle the commonly encountered
sizes (so far up to 4 pages).
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
--
Changes since v2:
- change prefix in subject.
---
tools/libs/call/buffer.c | 28 +++++++++++++++++-----------
tools/libs/call/core.c | 3 ++-
tools/libs/call/private.h | 8 +++++---
3 files changed, 24 insertions(+), 15 deletions(-)
diff --git a/tools/libs/call/buffer.c b/tools/libs/call/buffer.c
index 2579b8c719..15d2f98a6b 100644
--- a/tools/libs/call/buffer.c
+++ b/tools/libs/call/buffer.c
@@ -56,13 +56,13 @@ static void *cache_alloc(xencall_handle *xcall, size_t nr_pages)
if ( xcall->buffer_current_allocations > xcall->buffer_maximum_allocations )
xcall->buffer_maximum_allocations = xcall->buffer_current_allocations;
- if ( nr_pages > 1 )
+ if ( nr_pages > ARRAY_SIZE(xcall->buffer_cache) )
{
xcall->buffer_cache_toobig++;
}
- else if ( xcall->buffer_cache_nr > 0 )
+ else if ( xcall->buffer_cache_nr[nr_pages-1] > 0 )
{
- p = xcall->buffer_cache[--xcall->buffer_cache_nr];
+ p = xcall->buffer_cache[nr_pages-1][--xcall->buffer_cache_nr[nr_pages-1]];
xcall->buffer_cache_hits++;
}
else
@@ -84,10 +84,10 @@ static int cache_free(xencall_handle *xcall, void *p, size_t nr_pages)
xcall->buffer_total_releases++;
xcall->buffer_current_allocations--;
- if ( nr_pages == 1 &&
- xcall->buffer_cache_nr < BUFFER_CACHE_SIZE )
+ if ( nr_pages && nr_pages < ARRAY_SIZE(xcall->buffer_cache) &&
+ xcall->buffer_cache_nr[nr_pages-1] < BUFFER_CACHE_SIZE )
{
- xcall->buffer_cache[xcall->buffer_cache_nr++] = p;
+ xcall->buffer_cache[nr_pages-1][xcall->buffer_cache_nr[nr_pages-1]++] = p;
rc = 1;
}
@@ -108,17 +108,23 @@ void buffer_release_cache(xencall_handle *xcall)
DBGPRINTF("current allocations:%d maximum allocations:%d",
xcall->buffer_current_allocations,
xcall->buffer_maximum_allocations);
- DBGPRINTF("cache current size:%d",
- xcall->buffer_cache_nr);
+ for ( unsigned i = 0; i < ARRAY_SIZE(xcall->buffer_cache_nr); ++i )
+ {
+ DBGPRINTF("cache current size[%u pages]:%d", i+1,
+ xcall->buffer_cache_nr[i]);
+ }
DBGPRINTF("cache hits:%d misses:%d toobig:%d",
xcall->buffer_cache_hits,
xcall->buffer_cache_misses,
xcall->buffer_cache_toobig);
- while ( xcall->buffer_cache_nr > 0 )
+ for ( unsigned i = 0; i < ARRAY_SIZE(xcall->buffer_cache_nr); ++i )
{
- p = xcall->buffer_cache[--xcall->buffer_cache_nr];
- osdep_free_pages(xcall, p, 1);
+ while ( xcall->buffer_cache_nr[i] > 0 )
+ {
+ p = xcall->buffer_cache[i][--xcall->buffer_cache_nr[i]];
+ osdep_free_pages(xcall, p, i + 1);
+ }
}
cache_unlock(xcall);
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1ae..dd8877c1a0 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -14,6 +14,7 @@
*/
#include <stdlib.h>
+#include <string.h>
#include "private.h"
@@ -44,7 +45,7 @@ xencall_handle *xencall_open(xentoollog_logger *logger, unsigned open_flags)
xentoolcore__register_active_handle(&xcall->tc_ah);
xcall->flags = open_flags;
- xcall->buffer_cache_nr = 0;
+ memset(xcall->buffer_cache_nr, 0, sizeof(xcall->buffer_cache_nr));
xcall->buffer_total_allocations = 0;
xcall->buffer_total_releases = 0;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432ef..8e6a208975 100644
--- a/tools/libs/call/private.h
+++ b/tools/libs/call/private.h
@@ -31,13 +31,15 @@ struct xencall_handle {
Xentoolcore__Active_Handle tc_ah;
/*
- * A simple cache of unused, single page, hypercall buffers
+ * A simple cache of unused, small, hypercall buffers
+ * buffer_cache[i]'s size is (i+1) pages
*
* Protected by a global lock.
*/
#define BUFFER_CACHE_SIZE 4
- int buffer_cache_nr;
- void *buffer_cache[BUFFER_CACHE_SIZE];
+#define BUFFER_CACHE_NRPAGES 4
+ int buffer_cache_nr[BUFFER_CACHE_NRPAGES];
+ void *buffer_cache[BUFFER_CACHE_NRPAGES][BUFFER_CACHE_SIZE];
/*
* Hypercall buffer statistics. All protected by the global
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH v4 07/16] libs/guest: avoids using 2 indexes
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (5 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 06/16] libs/call: cache up to 4 pages in hypercall bounce buffers Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 08/16] libs/guest: fill directly iov structure Frediano Ziglio
` (8 subsequent siblings)
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Simplify code, after the first scan of the various arrays we don't need to
keep original types and PFNs but only the ones having data.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libs/guest/xg_sr_restore.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index e148fc594a..fb46142d87 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -260,9 +260,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
int *map_errs = malloc(count * sizeof(*map_errs));
int rc;
void *mapping = NULL, *guest_page = NULL;
- unsigned int i, /* i indexes the pfns from the record. */
- j, /* j indexes the subset of pfns we decide to map. */
- nr_pages = 0;
+ unsigned nr_pages;
if ( !mfns || !map_errs )
{
@@ -279,12 +277,18 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
goto err;
}
- for ( i = 0; i < count; ++i )
+ nr_pages = 0;
+ for ( unsigned i = 0; i < count; ++i )
{
ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
- if ( page_type_has_stream_data(types[i]) )
- mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
+ if ( !page_type_has_stream_data(types[i]) )
+ continue;
+
+ mfns[nr_pages] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
+ pfns[nr_pages] = pfns[i];
+ types[nr_pages] = types[i];
+ nr_pages++;
}
/* Nothing to do? */
@@ -302,16 +306,13 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
goto err;
}
- for ( i = 0, j = 0; i < count; ++i )
+ for ( unsigned i = 0; i < nr_pages; ++i )
{
- if ( !page_type_has_stream_data(types[i]) )
- continue;
-
- if ( map_errs[j] )
+ if ( map_errs[i] )
{
rc = -1;
ERROR("Mapping pfn %#"PRIpfn" (mfn %#"PRIpfn", type %#"PRIx32") failed with %d",
- pfns[i], mfns[j], types[i], map_errs[j]);
+ pfns[i], mfns[i], types[i], map_errs[i]);
goto err;
}
@@ -337,7 +338,6 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
memcpy(guest_page, page_data, PAGE_SIZE);
}
- ++j;
guest_page += PAGE_SIZE;
page_data += PAGE_SIZE;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH v4 08/16] libs/guest: fill directly iov structure
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (6 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 07/16] libs/guest: avoids using 2 indexes Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 09/16] libs/ctrl: Allows writev_exact to change iov array Frediano Ziglio
` (7 subsequent siblings)
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Instead of storing page pointers into an array and lately adding to
iov vector add the pages directly to iov to avoid "guest_data"
array.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libs/guest/xg_sr_common.h | 1 -
tools/libs/guest/xg_sr_save.c | 62 ++++++++++++---------------------
2 files changed, 22 insertions(+), 41 deletions(-)
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 82549b5589..e8452746e4 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -215,7 +215,6 @@ struct xc_sr_context_save_buffers
xen_pfn_t mfns[MAX_BATCH_SIZE];
xen_pfn_t types[MAX_BATCH_SIZE];
int errors[MAX_BATCH_SIZE];
- void *guest_data[MAX_BATCH_SIZE];
void *local_pages[MAX_BATCH_SIZE];
struct iovec iov[MAX_BATCH_SIZE + 2]; /* Headers + data. */
uint64_t rec_pfns[MAX_BATCH_SIZE];
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 7d8055a3f9..593268f176 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -88,7 +88,6 @@ static int write_batch(struct xc_sr_context *ctx)
xc_interface *xch = ctx->xch;
xen_pfn_t *mfns, *types;
void *guest_mapping = NULL;
- void **guest_data;
void **local_pages;
int *errors, rc = -1;
unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
@@ -114,9 +113,6 @@ static int write_batch(struct xc_sr_context *ctx)
types = ctx->save.buffers->types;
/* Errors from attempting to map the gfns. */
errors = ctx->save.buffers->errors;
- /* Pointers to page data to send. Mapped gfns or local allocations. */
- guest_data = ctx->save.buffers->guest_data;
- memset(guest_data, 0, sizeof(*guest_data) * nr_pfns);
/* Pointers to locally allocated pages. Need freeing. */
local_pages = ctx->save.buffers->local_pages;
memset(local_pages, 0, sizeof(*local_pages) * nr_pfns);
@@ -159,6 +155,19 @@ static int write_batch(struct xc_sr_context *ctx)
mfns[nr_pages++] = mfns[i];
}
+ hdrs.rec.length = sizeof(hdrs.page_data);
+ hdrs.rec.length += nr_pfns * sizeof(*rec_pfns);
+
+ hdrs.page_data.count = nr_pfns;
+
+ iov[0].iov_base = &hdrs;
+ iov[0].iov_len = sizeof(hdrs);
+
+ iov[1].iov_base = rec_pfns;
+ iov[1].iov_len = nr_pfns * sizeof(*rec_pfns);
+
+ iovcnt = 2;
+
if ( nr_pages > 0 )
{
guest_mapping = xenforeignmemory_map(
@@ -200,60 +209,33 @@ static int write_batch(struct xc_sr_context *ctx)
else
goto err;
}
+ else if ( iov[iovcnt-1].iov_base + iov[iovcnt-1].iov_len != page )
+ {
+ iov[iovcnt].iov_base = page;
+ iov[iovcnt].iov_len = PAGE_SIZE;
+ iovcnt++;
+ }
else
- guest_data[i] = page;
+ {
+ iov[iovcnt-1].iov_len += PAGE_SIZE;
+ }
rc = -1;
++p;
}
}
- hdrs.rec.length = sizeof(hdrs.page_data);
- hdrs.rec.length += nr_pfns * sizeof(*rec_pfns);
hdrs.rec.length += nr_pages * PAGE_SIZE;
- hdrs.page_data.count = nr_pfns;
-
for ( i = 0; i < nr_pfns; ++i )
rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.batch_pfns[i];
- iov[0].iov_base = &hdrs;
- iov[0].iov_len = sizeof(hdrs);
-
- iov[1].iov_base = rec_pfns;
- iov[1].iov_len = nr_pfns * sizeof(*rec_pfns);
-
- iovcnt = 2;
-
- if ( nr_pages )
- {
- for ( i = 0; i < nr_pfns; ++i )
- {
- if ( !guest_data[i] )
- continue;
-
- if ( iov[iovcnt-1].iov_base + iov[iovcnt-1].iov_len != guest_data[i] )
- {
- iov[iovcnt].iov_base = guest_data[i];
- iov[iovcnt].iov_len = PAGE_SIZE;
- iovcnt++;
- }
- else
- {
- iov[iovcnt-1].iov_len += PAGE_SIZE;
- }
- --nr_pages;
- }
- }
-
if ( writev_exact(ctx->fd, iov, iovcnt) )
{
PERROR("Failed to write page data to stream");
goto err;
}
- /* Sanity check we have sent all the pages we expected to. */
- assert(nr_pages == 0);
rc = ctx->save.nr_batch_pfns = 0;
err:
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH v4 09/16] libs/ctrl: Allows writev_exact to change iov array
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (7 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 08/16] libs/guest: fill directly iov structure Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 10/16] libs/guest: add xg_foreignmemory_copy_{from,to} Frediano Ziglio
` (6 subsequent siblings)
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Avoid having to allocate and copy the array if a partial write
happens.
The implementation in tools/libs/store/xs.c already use this
signature and method.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
--
Changes since v2:
- change prefix in subject.
---
tools/libs/ctrl/xc_private.c | 26 +++++---------------------
tools/libs/ctrl/xc_private.h | 2 +-
2 files changed, 6 insertions(+), 22 deletions(-)
diff --git a/tools/libs/ctrl/xc_private.c b/tools/libs/ctrl/xc_private.c
index bb0f81d6f3..946fc307aa 100644
--- a/tools/libs/ctrl/xc_private.c
+++ b/tools/libs/ctrl/xc_private.c
@@ -635,7 +635,7 @@ int write_exact(int fd, const void *data, size_t size)
/*
* MiniOS's libc doesn't know about writev(). Implement it as multiple write()s.
*/
-int writev_exact(int fd, const struct iovec *iov, int iovcnt)
+int writev_exact(int fd, struct iovec *iov, int iovcnt)
{
int rc, i;
@@ -649,9 +649,8 @@ int writev_exact(int fd, const struct iovec *iov, int iovcnt)
return 0;
}
#else
-int writev_exact(int fd, const struct iovec *iov, int iovcnt)
+int writev_exact(int fd, struct iovec *iov, int iovcnt)
{
- struct iovec *local_iov = NULL;
int rc = 0, iov_idx = 0, saved_errno = 0;
ssize_t len;
@@ -686,23 +685,9 @@ int writev_exact(int fd, const struct iovec *iov, int iovcnt)
len -= iov[iov_idx++].iov_len;
else
{
- /* Partial write of iov[iov_idx]. Copy iov so we can adjust
- * element iov_idx and resubmit the rest. */
- if ( !local_iov )
- {
- local_iov = malloc(iovcnt * sizeof(*iov));
- if ( !local_iov )
- {
- saved_errno = ENOMEM;
- rc = -1;
- goto out;
- }
-
- iov = memcpy(local_iov, iov, iovcnt * sizeof(*iov));
- }
-
- local_iov[iov_idx].iov_base += len;
- local_iov[iov_idx].iov_len -= len;
+ /* Partial write of iov[iov_idx]. */
+ iov[iov_idx].iov_base += len;
+ iov[iov_idx].iov_len -= len;
break;
}
}
@@ -711,7 +696,6 @@ int writev_exact(int fd, const struct iovec *iov, int iovcnt)
saved_errno = 0;
out:
- free(local_iov);
errno = saved_errno;
return rc;
}
diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index b5892ae8dc..3af996e900 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -383,7 +383,7 @@ int xc_flush_mmu_updates(xc_interface *xch, struct xc_mmu *mmu);
/* Return 0 on success; -1 on error setting errno. */
int read_exact(int fd, void *data, size_t size); /* EOF => -1, errno=0 */
int write_exact(int fd, const void *data, size_t size);
-int writev_exact(int fd, const struct iovec *iov, int iovcnt);
+int writev_exact(int fd, struct iovec *iov, int iovcnt);
int xc_ffs8(uint8_t x);
int xc_ffs16(uint16_t x);
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH v4 10/16] libs/guest: add xg_foreignmemory_copy_{from,to}
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (8 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 09/16] libs/ctrl: Allows writev_exact to change iov array Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-03 13:05 ` [PATCH v4 11/16] PoC: libs/guest: use foreign copy during migration Frediano Ziglio
` (5 subsequent siblings)
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
This change prepare code to use a new "foreign copy" hypercall.
The new hypercall will copy memory from/to a foreign domain.
The new hypercall can be emulated with a sequence of:
- map foreign memory;
- copy memory;
- unmap foreign memory.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libs/guest/xg_sr_common.c | 60 +++++++++++++++++++++++++++++++++
tools/libs/guest/xg_sr_common.h | 8 +++++
2 files changed, 68 insertions(+)
diff --git a/tools/libs/guest/xg_sr_common.c b/tools/libs/guest/xg_sr_common.c
index 86c148c62f..a94e9dfbff 100644
--- a/tools/libs/guest/xg_sr_common.c
+++ b/tools/libs/guest/xg_sr_common.c
@@ -156,6 +156,66 @@ static void __attribute__((unused)) build_assertions(void)
BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params) != 8);
}
+enum {
+ foreigncopy_from,
+ foreigncopy_to
+};
+
+static int xg_foreignmemory_copy(xc_interface *xch, domid_t domid,
+ int dir, size_t nr_pages, void *buffer,
+ const xen_pfn_t foreign_pfns[nr_pages])
+{
+ if ( nr_pages == 0 )
+ return 0;
+
+ if ( !buffer || !foreign_pfns )
+ {
+ errno = EINVAL;
+ return -1;
+ }
+
+ int err[nr_pages];
+ const int prot = (dir == foreigncopy_from) ? PROT_READ : PROT_READ|PROT_WRITE;
+
+ void *p = xenforeignmemory_map(xch->fmem, domid, prot, nr_pages, foreign_pfns, err);
+ if ( !p )
+ {
+ errno = EINVAL;
+ return -1;
+ }
+
+ for ( size_t n = 0; n < nr_pages; ++n )
+ if ( err[n] )
+ {
+ xenforeignmemory_unmap(xch->fmem, p, nr_pages);
+ errno = -err[n];
+ return -1;
+ }
+
+ if ( dir == foreigncopy_from )
+ memcpy(buffer, p, nr_pages * XC_PAGE_SIZE);
+ else
+ memcpy(p, buffer, nr_pages * XC_PAGE_SIZE);
+
+ return xenforeignmemory_unmap(xch->fmem, p, nr_pages);
+}
+
+int xg_foreignmemory_copy_from(xc_interface *xch, domid_t dom,
+ size_t nr_pages, void *dest,
+ const xen_pfn_t source[nr_pages])
+{
+ return xg_foreignmemory_copy(xch, dom, foreigncopy_from,
+ nr_pages, dest, source);
+}
+
+int xg_foreignmemory_copy_to(xc_interface *xch, domid_t dom,
+ size_t nr_pages, const xen_pfn_t dest[nr_pages],
+ const void *source)
+{
+ return xg_foreignmemory_copy(xch, dom, foreigncopy_to,
+ nr_pages, (void *) source, dest);
+}
+
/*
* Local variables:
* mode: C
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index e8452746e4..72c9511f38 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -553,6 +553,14 @@ static inline bool page_type_has_stream_data(uint32_t type)
}
}
+int xg_foreignmemory_copy_from(xc_interface *xch, domid_t dom,
+ size_t nr_pages, void *dest,
+ const xen_pfn_t source[nr_pages]);
+
+int xg_foreignmemory_copy_to(xc_interface *xch, domid_t dom,
+ size_t nr_pages, const xen_pfn_t dest[nr_pages],
+ const void *source);
+
#endif
/*
* Local variables:
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH v4 11/16] PoC: libs/guest: use foreign copy during migration
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (9 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 10/16] libs/guest: add xg_foreignmemory_copy_{from,to} Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-03 14:09 ` Andrew Cooper
2026-06-03 13:05 ` [PATCH v4 12/16] xen: implement new foreign copy hypercall Frediano Ziglio
` (4 subsequent siblings)
15 siblings, 1 reply; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Teddy Astie, Anthony PERARD, Juergen Gross,
Frediano Ziglio
From: Edwin Török <edwin.torok@citrix.com>
ministat confirms the improvement:
```
x baseline
+ foreigncopy
N Min Max Median Avg Stddev
x 20 1.1306997 1.1447931 1.1356569 1.1365742 0.003242175
+ 20 0.4311504 0.44180303 0.43616705 0.43600089 0.0031094689
Difference at 95.0% confidence
-0.700573 +/- 0.00203311
-61.639% +/- 0.133355%
(Student's t, pooled s = 0.00317652)
```
The tests pass too, which means that it has correctly migrated all guest
memory.
Frediano: This PoC was adapted to be included in a final series.
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libs/guest/xg_sr_common.h | 1 +
tools/libs/guest/xg_sr_restore.c | 42 +++--------------
tools/libs/guest/xg_sr_save.c | 81 +++++++++-----------------------
3 files changed, 30 insertions(+), 94 deletions(-)
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 72c9511f38..0e0e279ae1 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -256,6 +256,7 @@ struct xc_sr_context
unsigned long nr_deferred_pages;
xc_hypercall_buffer_t dirty_bitmap_hbuf;
struct xc_sr_context_save_buffers *buffers;
+ void *dest_buf;
} save;
struct /* Restore data. */
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index fb46142d87..b589f0397d 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -259,7 +259,6 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
int *map_errs = malloc(count * sizeof(*map_errs));
int rc;
- void *mapping = NULL, *guest_page = NULL;
unsigned nr_pages;
if ( !mfns || !map_errs )
@@ -295,27 +294,8 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
if ( nr_pages == 0 )
goto done;
- mapping = guest_page = xenforeignmemory_map(
- xch->fmem, ctx->domid, PROT_READ | PROT_WRITE,
- nr_pages, mfns, map_errs);
- if ( !mapping )
- {
- rc = -1;
- PERROR("Unable to map %u mfns for %u pages of data",
- nr_pages, count);
- goto err;
- }
-
for ( unsigned i = 0; i < nr_pages; ++i )
{
- if ( map_errs[i] )
- {
- rc = -1;
- ERROR("Mapping pfn %#"PRIpfn" (mfn %#"PRIpfn", type %#"PRIx32") failed with %d",
- pfns[i], mfns[i], types[i], map_errs[i]);
- goto err;
- }
-
/* Undo page normalisation done by the saver. */
rc = ctx->restore.ops.localise_page(ctx, types[i], page_data);
if ( rc )
@@ -325,29 +305,19 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
goto err;
}
- if ( ctx->restore.verify )
- {
- /* Verify mode - compare incoming data to what we already have. */
- if ( memcmp(guest_page, page_data, PAGE_SIZE) )
- ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
- pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
- }
- else
- {
- /* Regular mode - copy incoming data into place. */
- memcpy(guest_page, page_data, PAGE_SIZE);
- }
-
- guest_page += PAGE_SIZE;
page_data += PAGE_SIZE;
}
+ if ( !ctx->restore.verify )
+ {
+ rc = xg_foreignmemory_copy_to(xch, ctx->domid, nr_pages, mfns, page_data);
+ if ( rc < 0 )
+ goto err;
+ }
done:
rc = 0;
err:
- if ( mapping )
- xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
free(map_errs);
free(mfns);
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 593268f176..ae61f97a47 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -87,12 +87,10 @@ static int write_batch(struct xc_sr_context *ctx)
{
xc_interface *xch = ctx->xch;
xen_pfn_t *mfns, *types;
- void *guest_mapping = NULL;
void **local_pages;
int *errors, rc = -1;
- unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
+ unsigned int i, nr_pages = 0;
unsigned int nr_pfns = ctx->save.nr_batch_pfns;
- void *page, *orig_page;
uint64_t *rec_pfns;
struct iovec *iov; int iovcnt = 0;
struct {
@@ -168,61 +166,18 @@ static int write_batch(struct xc_sr_context *ctx)
iovcnt = 2;
- if ( nr_pages > 0 )
+ rc = xg_foreignmemory_copy_from(xch, ctx->domid, nr_pages, ctx->save.dest_buf, mfns);
+ if ( rc < 0 )
{
- guest_mapping = xenforeignmemory_map(
- xch->fmem, ctx->domid, PROT_READ, nr_pages, mfns, errors);
- if ( !guest_mapping )
- {
- PERROR("Failed to map guest pages");
- goto err;
- }
- nr_pages_mapped = nr_pages;
-
- for ( i = 0, p = 0; i < nr_pfns; ++i )
- {
- if ( !page_type_has_stream_data(types[i]) )
- continue;
-
- if ( errors[p] )
- {
- ERROR("Mapping of pfn %#"PRIpfn" (mfn %#"PRIpfn") failed %d",
- ctx->save.batch_pfns[i], mfns[p], errors[p]);
- goto err;
- }
-
- orig_page = page = guest_mapping + (p * PAGE_SIZE);
- rc = ctx->save.ops.normalise_page(ctx, types[i], &page);
-
- if ( orig_page != page )
- local_pages[i] = page;
-
- if ( rc )
- {
- if ( rc == -1 && errno == EAGAIN )
- {
- set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
- ++ctx->save.nr_deferred_pages;
- types[i] = XEN_DOMCTL_PFINFO_XTAB;
- --nr_pages;
- }
- else
- goto err;
- }
- else if ( iov[iovcnt-1].iov_base + iov[iovcnt-1].iov_len != page )
- {
- iov[iovcnt].iov_base = page;
- iov[iovcnt].iov_len = PAGE_SIZE;
- iovcnt++;
- }
- else
- {
- iov[iovcnt-1].iov_len += PAGE_SIZE;
- }
+ ERROR("xg_foreignmemory_copy_from failed");
+ goto err;
+ }
- rc = -1;
- ++p;
- }
+ if ( nr_pages )
+ {
+ iov[iovcnt].iov_base = ctx->save.dest_buf;
+ iov[iovcnt].iov_len = nr_pages << XC_PAGE_SHIFT;
+ iovcnt++;
}
hdrs.rec.length += nr_pages * PAGE_SIZE;
@@ -239,8 +194,6 @@ static int write_batch(struct xc_sr_context *ctx)
rc = ctx->save.nr_batch_pfns = 0;
err:
- if ( guest_mapping )
- xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
for ( i = 0; local_pages && i < nr_pfns; ++i )
{
free(local_pages[i]);
@@ -764,6 +717,7 @@ static int setup(struct xc_sr_context *ctx)
{
xc_interface *xch = ctx->xch;
int rc;
+ const unsigned dest_buf_len = MAX_BATCH_SIZE * XC_PAGE_SIZE;
DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
&ctx->save.dirty_bitmap_hbuf);
@@ -775,6 +729,16 @@ static int setup(struct xc_sr_context *ctx)
xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
ctx->save.buffers = calloc(1, sizeof(*ctx->save.buffers));
+ ctx->save.dest_buf = NULL;
+
+ rc = posix_memalign(&ctx->save.dest_buf, XC_PAGE_SIZE, dest_buf_len);
+ if ( rc )
+ {
+ ERROR("Unable to allocate %u bytes of buffer", dest_buf_len);
+ errno = rc;
+ rc = -1;
+ goto err;
+ }
if ( !dirty_bitmap || !ctx->save.deferred_pages || !ctx->save.buffers)
{
@@ -809,6 +773,7 @@ static void cleanup(struct xc_sr_context *ctx)
NRPAGES(bitmap_size(ctx->save.p2m_size)));
free(ctx->save.deferred_pages);
free(ctx->save.buffers);
+ free(ctx->save.dest_buf);
}
/*
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH v4 11/16] PoC: libs/guest: use foreign copy during migration
2026-06-03 13:05 ` [PATCH v4 11/16] PoC: libs/guest: use foreign copy during migration Frediano Ziglio
@ 2026-06-03 14:09 ` Andrew Cooper
2026-06-04 14:51 ` Frediano Ziglio
0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2026-06-03 14:09 UTC (permalink / raw)
To: Frediano Ziglio, xen-devel
Cc: Andrew Cooper, Edwin Török, Jan Beulich,
Roger Pau Monné, Teddy Astie, Anthony PERARD, Juergen Gross,
Frediano Ziglio
On 03/06/2026 2:05 pm, Frediano Ziglio wrote:
> From: Edwin Török <edwin.torok@citrix.com>
>
> ministat confirms the improvement:
>
> ```
> x baseline
> + foreigncopy
> N Min Max Median Avg Stddev
> x 20 1.1306997 1.1447931 1.1356569 1.1365742 0.003242175
> + 20 0.4311504 0.44180303 0.43616705 0.43600089 0.0031094689
> Difference at 95.0% confidence
> -0.700573 +/- 0.00203311
> -61.639% +/- 0.133355%
> (Student's t, pooled s = 0.00317652)
> ```
>
> The tests pass too, which means that it has correctly migrated all guest
> memory.
>
> Frediano: This PoC was adapted to be included in a final series.
>
> Signed-off-by: Edwin Török <edwin.torok@citrix.com>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
This patch is definitely out of order, seeing as you don't introduce the
new hypercall until the following patch.
But there's also absolutely no information what those stats are. From
memory, I think it was wallclock time of migrating a VM, but there are
no units or sizes of the VM presented, so those are just random numbers.
It also doesn't state whether it's measured from a PV or a PVH guest.
However bad PV is (and it is bad), I think the improvement will be
better in a PVH guest, because the foreign map/unmap operations being
replaced are even more expensive in PVH.
There's another area in libxenguest which would likely benefit; domain
construction. Even with kernels and initrds in the MB range, foreign
copy is probalby a win, and it surely will be for e.g. the ACPI tables
which are a few kB.
~Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 11/16] PoC: libs/guest: use foreign copy during migration
2026-06-03 14:09 ` Andrew Cooper
@ 2026-06-04 14:51 ` Frediano Ziglio
0 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-04 14:51 UTC (permalink / raw)
To: Andrew Cooper
Cc: xen-devel, Edwin Török, Jan Beulich,
Roger Pau Monné, Teddy Astie, Anthony PERARD, Juergen Gross,
Frediano Ziglio
On Wed, 3 Jun 2026 at 15:09, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 03/06/2026 2:05 pm, Frediano Ziglio wrote:
> > From: Edwin Török <edwin.torok@citrix.com>
> >
> > ministat confirms the improvement:
> >
> > ```
> > x baseline
> > + foreigncopy
> > N Min Max Median Avg Stddev
> > x 20 1.1306997 1.1447931 1.1356569 1.1365742 0.003242175
> > + 20 0.4311504 0.44180303 0.43616705 0.43600089 0.0031094689
> > Difference at 95.0% confidence
> > -0.700573 +/- 0.00203311
> > -61.639% +/- 0.133355%
> > (Student's t, pooled s = 0.00317652)
> > ```
> >
> > The tests pass too, which means that it has correctly migrated all guest
> > memory.
> >
> > Frediano: This PoC was adapted to be included in a final series.
> >
> > Signed-off-by: Edwin Török <edwin.torok@citrix.com>
> > Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
>
> This patch is definitely out of order, seeing as you don't introduce the
> new hypercall until the following patch.
>
Yes and no. In the original PoC it was, as it was trying to check
performance difference using the new hypercall. But to prove that code
was working the new hypercalls were "emulated" using functions doing
map/copy/unmap. This also makes sure no performance regressions using
the new wrappers. So the code works with or without the new
hypercalls.
> But there's also absolutely no information what those stats are. From
> memory, I think it was wallclock time of migrating a VM, but there are
> no units or sizes of the VM presented, so those are just random numbers.
>
I had the same though. I quickly searched if Edwin reappeared on ML
but it seems not at the moment.
> It also doesn't state whether it's measured from a PV or a PVH guest.
> However bad PV is (and it is bad), I think the improvement will be
> better in a PVH guest, because the foreign map/unmap operations being
> replaced are even more expensive in PVH.
>
The PoC was simply stripping out PV support so surely they were
PVH/HVM statistics.
> There's another area in libxenguest which would likely benefit; domain
> construction. Even with kernels and initrds in the MB range, foreign
> copy is probalby a win, and it surely will be for e.g. the ACPI tables
> which are a few kB.
>
> ~Andrew
Frediano
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 12/16] xen: implement new foreign copy hypercall
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (10 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 11/16] PoC: libs/guest: use foreign copy during migration Frediano Ziglio
@ 2026-06-03 13:05 ` Frediano Ziglio
2026-06-03 13:39 ` Jan Beulich
2026-06-08 14:59 ` Teddy Astie
2026-06-03 13:06 ` [PATCH v4 13/16] privcmd: Add definition for new Linux privcmd to access new Xen hypercall Frediano Ziglio
` (3 subsequent siblings)
15 siblings, 2 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:05 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Add a sub hypercall to __HYPERVISOR_memory_op to allow to
read/write memory from/to a foreign domain.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
xen/common/memory.c | 133 ++++++++++++++++++++++++++++++++++++
xen/include/public/memory.h | 40 ++++++++++-
2 files changed, 172 insertions(+), 1 deletion(-)
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 3672bda025..6a2d9c3190 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1545,6 +1545,132 @@ static int acquire_resource(
return rc;
}
+/*
+ * The "noinline" qualifier avoid the compiler to create a large function
+ * consuming quite a lot of stack.
+ */
+static int noinline mem_foreigncopy(
+ XEN_GUEST_HANDLE_PARAM(xen_foreigncopy_t) arg)
+{
+ struct domain *d, *const currd = current->domain;
+ xen_foreigncopy_t copy;
+ int rc, direction;
+
+ if ( !arch_acquire_resource_check(currd) )
+ return -EACCES;
+
+ if ( copy_from_guest(©, arg, 1) )
+ return -EFAULT;
+
+ if ( copy.flags & ~1u )
+ return -EINVAL;
+
+ direction = copy.flags & XENMEM_foreigncopy_direction;
+
+ if ( copy.nr_frames == 0 )
+ return 0;
+
+ rc = rcu_lock_remote_domain_by_id(copy.domid, &d);
+ if ( rc )
+ return rc;
+
+ /*
+ * Check we are allowed to map and access these foreign pages.
+ */
+ rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d);
+ if ( rc )
+ goto out;
+
+ do {
+ /*
+ * Arbitrary size. Not too much stack space, and a reasonable stride
+ * for continuation checks.
+ */
+ xen_pfn_t gfn_list[32];
+ unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames);
+
+ rc = -EFAULT;
+ if ( copy_from_guest(gfn_list, copy.frame_list, todo) )
+ goto out;
+
+ for ( unsigned i = 0; i < todo; i++ )
+ {
+ struct page_info *foreign_page;
+ void *foreign;
+ p2m_type_t p2mt;
+
+ foreign_page = get_page_from_gfn(d, gfn_list[i], &p2mt, P2M_ALLOC);
+
+ if ( unlikely(p2mt != p2m_ram_rw
+#ifdef CONFIG_X86
+ && p2mt != p2m_ram_logdirty
+#endif
+ ) && foreign_page )
+ {
+ put_page(foreign_page);
+ foreign_page = NULL;
+ }
+ if ( unlikely(!foreign_page) )
+ {
+ gdprintk(XENLOG_WARNING,
+ "Error accessing foreign mfn %" PRI_mfn "\n",
+ gfn_list[i]);
+ rc = -EINVAL;
+ copy.nr_frames -= i;
+ guest_handle_add_offset(copy.frame_list, i);
+ goto out;
+ }
+
+ /* A page is dirtied when it's being copied to. */
+ if ( direction == XENMEM_foreigncopy_to )
+ paging_mark_dirty(d, page_to_mfn(foreign_page));
+
+ foreign = map_domain_page(page_to_mfn(foreign_page));
+ if ( direction == XENMEM_foreigncopy_from )
+ rc = copy_to_guest(copy.buffer, foreign, PAGE_SIZE);
+ else
+ rc = copy_from_guest(foreign, copy.buffer, PAGE_SIZE);
+ unmap_domain_page(foreign);
+ put_page(foreign_page);
+
+ if ( unlikely(rc) )
+ {
+ gdprintk(XENLOG_WARNING,
+ "Error copying to mfn %" PRI_mfn "\n", gfn_list[i]);
+ copy.nr_frames -= i;
+ guest_handle_add_offset(copy.frame_list, i);
+ goto out;
+ }
+
+ guest_handle_add_offset(copy.buffer, PAGE_SIZE);
+ }
+
+ copy.nr_frames -= todo;
+ guest_handle_add_offset(copy.frame_list, todo);
+
+ if ( copy.nr_frames && hypercall_preempt_check() )
+ {
+ rc = hypercall_create_continuation(
+ __HYPERVISOR_memory_op, "lh", XENMEM_foreigncopy, arg);
+ goto out;
+ }
+ } while ( copy.nr_frames );
+
+ rc = 0;
+
+ out:
+ rcu_unlock_domain(d);
+
+ /* Update in all cases, it allows the caller to know how many
+ * frames were successfully copied and the continuation to
+ * continue correctly.
+ */
+ if ( copy_to_guest(arg, ©, 1) )
+ rc = -EFAULT;
+
+ return rc;
+}
+
long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
{
struct domain *d, *curr_d = current->domain;
@@ -2012,6 +2138,13 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
start_extent);
break;
+ case XENMEM_foreigncopy:
+ if ( unlikely(start_extent) )
+ return -EINVAL;
+
+ rc = mem_foreigncopy(guest_handle_cast(arg, xen_foreigncopy_t));
+ break;
+
default:
rc = arch_memory_op(cmd, arg);
break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index bd9fc37b52..b48d1f378f 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -740,7 +740,45 @@ struct xen_vnuma_topology_info {
typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
-/* Next available subop number is 29 */
+/*
+ * Copy memory from/to a given domain.
+ */
+#define XENMEM_foreigncopy 29
+struct xen_foreigncopy {
+ /* IN - The domain whose resource is to be copied. */
+ domid_t domid;
+
+ /* IN - Flags. */
+#define XENMEM_foreigncopy_from 0
+#define XENMEM_foreigncopy_to 1
+#define XENMEM_foreigncopy_direction 1
+ uint16_t flags;
+
+ /*
+ * IN
+ *
+ * As an IN parameter number of frames of the domain to be copied.
+ */
+ uint32_t nr_frames;
+
+ /*
+ * IN
+ *
+ * Frames to be copied.
+ */
+ XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
+
+ /*
+ * IN/OUT
+ *
+ * Userspace buffer to read/write from.
+ */
+ XEN_GUEST_HANDLE(uint8) buffer;
+};
+typedef struct xen_foreigncopy xen_foreigncopy_t;
+DEFINE_XEN_GUEST_HANDLE(xen_foreigncopy_t);
+
+/* Next available subop number is 30 */
#endif /* __XEN_PUBLIC_MEMORY_H__ */
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH v4 12/16] xen: implement new foreign copy hypercall
2026-06-03 13:05 ` [PATCH v4 12/16] xen: implement new foreign copy hypercall Frediano Ziglio
@ 2026-06-03 13:39 ` Jan Beulich
2026-06-03 14:00 ` Andrew Cooper
2026-06-08 14:59 ` Teddy Astie
1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2026-06-03 13:39 UTC (permalink / raw)
To: Frediano Ziglio
Cc: Frediano Ziglio, Andrew Cooper, Roger Pau Monné, Teddy Astie,
Anthony PERARD, Juergen Gross, xen-devel
On 03.06.2026 15:05, Frediano Ziglio wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1545,6 +1545,132 @@ static int acquire_resource(
> return rc;
> }
>
> +/*
> + * The "noinline" qualifier avoid the compiler to create a large function
> + * consuming quite a lot of stack.
> + */
> +static int noinline mem_foreigncopy(
> + XEN_GUEST_HANDLE_PARAM(xen_foreigncopy_t) arg)
> +{
> + struct domain *d, *const currd = current->domain;
> + xen_foreigncopy_t copy;
> + int rc, direction;
> +
> + if ( !arch_acquire_resource_check(currd) )
> + return -EACCES;
How is the new sub-op related to acquire-resource? And aren't you instead
lacking an XSM check?
> + if ( copy_from_guest(©, arg, 1) )
> + return -EFAULT;
> +
> + if ( copy.flags & ~1u )
If a suffix is needed here in the first place, please use an upper-case one.
Misra only demands L suffixes to be upper-case, but iirc we decided that
then we want all suffixes that way.
> + return -EINVAL;
> +
> + direction = copy.flags & XENMEM_foreigncopy_direction;
> +
> + if ( copy.nr_frames == 0 )
> + return 0;
> +
> + rc = rcu_lock_remote_domain_by_id(copy.domid, &d);
Why not rcu_lock_domain_by_any_id()? IOW why would a self-copy need
prohibiting?
> + if ( rc )
> + return rc;
> +
> + /*
> + * Check we are allowed to map and access these foreign pages.
> + */
> + rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d);
> + if ( rc )
> + goto out;
> +
> + do {
> + /*
> + * Arbitrary size. Not too much stack space, and a reasonable stride
> + * for continuation checks.
> + */
> + xen_pfn_t gfn_list[32];
> + unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames);
> +
> + rc = -EFAULT;
> + if ( copy_from_guest(gfn_list, copy.frame_list, todo) )
> + goto out;
> +
> + for ( unsigned i = 0; i < todo; i++ )
> + {
> + struct page_info *foreign_page;
> + void *foreign;
> + p2m_type_t p2mt;
> +
> + foreign_page = get_page_from_gfn(d, gfn_list[i], &p2mt, P2M_ALLOC);
> +
> + if ( unlikely(p2mt != p2m_ram_rw
> +#ifdef CONFIG_X86
> + && p2mt != p2m_ram_logdirty
> +#endif
> + ) && foreign_page )
This is ugly formatting wise, and the use of unlikely() isn't very likely
to have the effect you intend: As long as the compiler can't translate the
&& expression to something involving only a single conditional branch,
which of the branches is it that is unlikely to be taken?
> + {
> + put_page(foreign_page);
> + foreign_page = NULL;
> + }
> + if ( unlikely(!foreign_page) )
> + {
> + gdprintk(XENLOG_WARNING,
> + "Error accessing foreign mfn %" PRI_mfn "\n",
> + gfn_list[i]);
As per get_page_from_gfn() and gfn_list[] it's a GFN, not an MFN.
> + rc = -EINVAL;
> + copy.nr_frames -= i;
> + guest_handle_add_offset(copy.frame_list, i);
> + goto out;
> + }
> +
> + /* A page is dirtied when it's being copied to. */
> + if ( direction == XENMEM_foreigncopy_to )
> + paging_mark_dirty(d, page_to_mfn(foreign_page));
> +
> + foreign = map_domain_page(page_to_mfn(foreign_page));
> + if ( direction == XENMEM_foreigncopy_from )
> + rc = copy_to_guest(copy.buffer, foreign, PAGE_SIZE);
> + else
> + rc = copy_from_guest(foreign, copy.buffer, PAGE_SIZE);
> + unmap_domain_page(foreign);
> + put_page(foreign_page);
> +
> + if ( unlikely(rc) )
> + {
> + gdprintk(XENLOG_WARNING,
> + "Error copying to mfn %" PRI_mfn "\n", gfn_list[i]);
"to" isn't always correct here, and the problem wasn't with the GFN (not
MFN) anyway, but with the buffer.
> + copy.nr_frames -= i;
> + guest_handle_add_offset(copy.frame_list, i);
> + goto out;
> + }
> +
> + guest_handle_add_offset(copy.buffer, PAGE_SIZE);
> + }
> +
> + copy.nr_frames -= todo;
> + guest_handle_add_offset(copy.frame_list, todo);
> +
> + if ( copy.nr_frames && hypercall_preempt_check() )
> + {
> + rc = hypercall_create_continuation(
> + __HYPERVISOR_memory_op, "lh", XENMEM_foreigncopy, arg);
> + goto out;
> + }
> + } while ( copy.nr_frames );
> +
> + rc = 0;
> +
> + out:
> + rcu_unlock_domain(d);
> +
> + /* Update in all cases, it allows the caller to know how many
> + * frames were successfully copied and the continuation to
> + * continue correctly.
> + */
> + if ( copy_to_guest(arg, ©, 1) )
Since you already used copy_from_guest() one the way in, __copy_to_guest()
will do here.
> + rc = -EFAULT;
> +
> + return rc;
> +}
Before looking at the implementation in yet more detail: This is quite a
bit of new code. Did you at least consider extending MMUEXT_COPY_PAGE
along the lines of what c6b8bdfe3b47 ("x86: extend mmu_update hypercall
to allow update of foreign pagetables") did, allowing two domains to be
specified in the foreigndom argument?
> @@ -2012,6 +2138,13 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> start_extent);
> break;
>
> + case XENMEM_foreigncopy:
> + if ( unlikely(start_extent) )
> + return -EINVAL;
Why make this different from other continuable sub-ops?
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -740,7 +740,45 @@ struct xen_vnuma_topology_info {
> typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
> DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>
> -/* Next available subop number is 29 */
> +/*
> + * Copy memory from/to a given domain.
> + */
> +#define XENMEM_foreigncopy 29
> +struct xen_foreigncopy {
> + /* IN - The domain whose resource is to be copied. */
> + domid_t domid;
> +
> + /* IN - Flags. */
> +#define XENMEM_foreigncopy_from 0
> +#define XENMEM_foreigncopy_to 1
> +#define XENMEM_foreigncopy_direction 1
> + uint16_t flags;
> +
> + /*
> + * IN
> + *
> + * As an IN parameter number of frames of the domain to be copied.
> + */
> + uint32_t nr_frames;
> +
> + /*
> + * IN
> + *
> + * Frames to be copied.
> + */
> + XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
> +
> + /*
> + * IN/OUT
> + *
> + * Userspace buffer to read/write from.
> + */
> + XEN_GUEST_HANDLE(uint8) buffer;
> +};
Seeing the two handles - what about compat_memory_op()?
Jan
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH v4 12/16] xen: implement new foreign copy hypercall
2026-06-03 13:39 ` Jan Beulich
@ 2026-06-03 14:00 ` Andrew Cooper
0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2026-06-03 14:00 UTC (permalink / raw)
To: Jan Beulich, Frediano Ziglio
Cc: Andrew Cooper, Frediano Ziglio, Roger Pau Monné, Teddy Astie,
Anthony PERARD, Juergen Gross, xen-devel
On 03/06/2026 2:39 pm, Jan Beulich wrote:
> On 03.06.2026 15:05, Frediano Ziglio wrote:
>> + if ( rc )
>> + return rc;
>> +
>> + /*
>> + * Check we are allowed to map and access these foreign pages.
>> + */
>> + rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d);
>> + if ( rc )
>> + goto out;
>> +
>> + do {
>> + /*
>> + * Arbitrary size. Not too much stack space, and a reasonable stride
>> + * for continuation checks.
>> + */
>> + xen_pfn_t gfn_list[32];
>> + unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames);
>> +
>> + rc = -EFAULT;
>> + if ( copy_from_guest(gfn_list, copy.frame_list, todo) )
>> + goto out;
>> +
>> + for ( unsigned i = 0; i < todo; i++ )
>> + {
>> + struct page_info *foreign_page;
>> + void *foreign;
>> + p2m_type_t p2mt;
>> +
>> + foreign_page = get_page_from_gfn(d, gfn_list[i], &p2mt, P2M_ALLOC);
>> +
>> + if ( unlikely(p2mt != p2m_ram_rw
>> +#ifdef CONFIG_X86
>> + && p2mt != p2m_ram_logdirty
>> +#endif
>> + ) && foreign_page )
> This is ugly formatting wise, and the use of unlikely() isn't very likely
> to have the effect you intend: As long as the compiler can't translate the
> && expression to something involving only a single conditional branch,
> which of the branches is it that is unlikely to be taken?
Irrespective of what the compiler thinks or may do, Eclair will hard
reject it because unlikely() is a macro.
~Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 12/16] xen: implement new foreign copy hypercall
2026-06-03 13:05 ` [PATCH v4 12/16] xen: implement new foreign copy hypercall Frediano Ziglio
2026-06-03 13:39 ` Jan Beulich
@ 2026-06-08 14:59 ` Teddy Astie
1 sibling, 0 replies; 30+ messages in thread
From: Teddy Astie @ 2026-06-08 14:59 UTC (permalink / raw)
To: Frediano Ziglio, xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Juergen Gross
[-- Attachment #1.1.1: Type: text/plain, Size: 7374 bytes --]
Le 03/06/2026 à 15:08, Frediano Ziglio a écrit :
> From: Frediano Ziglio <frediano.ziglio@citrix.com>
>
> Add a sub hypercall to __HYPERVISOR_memory_op to allow to
> read/write memory from/to a foreign domain.
>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
> ---
> xen/common/memory.c | 133 ++++++++++++++++++++++++++++++++++++
> xen/include/public/memory.h | 40 ++++++++++-
> 2 files changed, 172 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 3672bda025..6a2d9c3190 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1545,6 +1545,132 @@ static int acquire_resource(
> return rc;
> }
>
> +/*
> + * The "noinline" qualifier avoid the compiler to create a large function
> + * consuming quite a lot of stack.
> + */
> +static int noinline mem_foreigncopy(
> + XEN_GUEST_HANDLE_PARAM(xen_foreigncopy_t) arg)
> +{
> + struct domain *d, *const currd = current->domain;
> + xen_foreigncopy_t copy;
> + int rc, direction;
> +
> + if ( !arch_acquire_resource_check(currd) )
> + return -EACCES;
> +
> + if ( copy_from_guest(©, arg, 1) )
> + return -EFAULT;
> +
> + if ( copy.flags & ~1u )
> + return -EINVAL;
> +
> + direction = copy.flags & XENMEM_foreigncopy_direction;
> +
> + if ( copy.nr_frames == 0 )
> + return 0;
> +
> + rc = rcu_lock_remote_domain_by_id(copy.domid, &d);
> + if ( rc )
> + return rc;
> +
> + /*
> + * Check we are allowed to map and access these foreign pages.
> + */
> + rc = xsm_map_gmfn_foreign(XSM_TARGET, currd, d);
> + if ( rc )
> + goto out;
> +
> + do {
> + /*
> + * Arbitrary size. Not too much stack space, and a reasonable stride
> + * for continuation checks.
> + */
> + xen_pfn_t gfn_list[32];
> + unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames);
> +
> + rc = -EFAULT;
> + if ( copy_from_guest(gfn_list, copy.frame_list, todo) )
> + goto out;
> +
> + for ( unsigned i = 0; i < todo; i++ )
> + {
> + struct page_info *foreign_page;
> + void *foreign;
> + p2m_type_t p2mt;
> +
> + foreign_page = get_page_from_gfn(d, gfn_list[i], &p2mt, P2M_ALLOC);
> +
> + if ( unlikely(p2mt != p2m_ram_rw
> +#ifdef CONFIG_X86
> + && p2mt != p2m_ram_logdirty
> +#endif
> + ) && foreign_page )
> + {
> + put_page(foreign_page);
> + foreign_page = NULL;
> + }
> + if ( unlikely(!foreign_page) )
> + {
> + gdprintk(XENLOG_WARNING,
> + "Error accessing foreign mfn %" PRI_mfn "\n",
> + gfn_list[i]);
> + rc = -EINVAL;
> + copy.nr_frames -= i;
> + guest_handle_add_offset(copy.frame_list, i);
> + goto out;
> + }
> +
> + /* A page is dirtied when it's being copied to. */
> + if ( direction == XENMEM_foreigncopy_to )
> + paging_mark_dirty(d, page_to_mfn(foreign_page));
> +
> + foreign = map_domain_page(page_to_mfn(foreign_page));
> + if ( direction == XENMEM_foreigncopy_from )
> + rc = copy_to_guest(copy.buffer, foreign, PAGE_SIZE);
> + else
> + rc = copy_from_guest(foreign, copy.buffer, PAGE_SIZE);
> + unmap_domain_page(foreign);
> + put_page(foreign_page);
> +
> + if ( unlikely(rc) )
> + {
> + gdprintk(XENLOG_WARNING,
> + "Error copying to mfn %" PRI_mfn "\n", gfn_list[i]);
> + copy.nr_frames -= i;
> + guest_handle_add_offset(copy.frame_list, i);
> + goto out;
> + }
> +
> + guest_handle_add_offset(copy.buffer, PAGE_SIZE);
> + }
> +
> + copy.nr_frames -= todo;
> + guest_handle_add_offset(copy.frame_list, todo);
> +
> + if ( copy.nr_frames && hypercall_preempt_check() )
> + {
> + rc = hypercall_create_continuation(
> + __HYPERVISOR_memory_op, "lh", XENMEM_foreigncopy, arg);
> + goto out;
> + }
> + } while ( copy.nr_frames );
> +
> + rc = 0;
> +
> + out:
> + rcu_unlock_domain(d);
> +
> + /* Update in all cases, it allows the caller to know how many
> + * frames were successfully copied and the continuation to
> + * continue correctly.
> + */
> + if ( copy_to_guest(arg, ©, 1) )
> + rc = -EFAULT;
> +
> + return rc;
> +}
> +
> long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> {
> struct domain *d, *curr_d = current->domain;
> @@ -2012,6 +2138,13 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> start_extent);
> break;
>
> + case XENMEM_foreigncopy:
> + if ( unlikely(start_extent) )
> + return -EINVAL;
> +
> + rc = mem_foreigncopy(guest_handle_cast(arg, xen_foreigncopy_t));
> + break;
> +
> default:
> rc = arch_memory_op(cmd, arg);
> break;
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index bd9fc37b52..b48d1f378f 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -740,7 +740,45 @@ struct xen_vnuma_topology_info {
> typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
> DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>
> -/* Next available subop number is 29 */
> +/*
> + * Copy memory from/to a given domain.
> + */
> +#define XENMEM_foreigncopy 29
> +struct xen_foreigncopy {
> + /* IN - The domain whose resource is to be copied. */
> + domid_t domid;
> +
> + /* IN - Flags. */
> +#define XENMEM_foreigncopy_from 0
> +#define XENMEM_foreigncopy_to 1
> +#define XENMEM_foreigncopy_direction 1
> + uint16_t flags;
> +
> + /*
> + * IN
> + *
> + * As an IN parameter number of frames of the domain to be copied.
> + */
> + uint32_t nr_frames;
> +
The interface only allows copies to be made at page granularity, while
that can be ok, that probably wants to be stated.
> + /*
> + * IN
> + *
> + * Frames to be copied.
> + */
> + XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
> +
> + /*
> + * IN/OUT
> + *
> + * Userspace buffer to read/write from.
> + */
> + XEN_GUEST_HANDLE(uint8) buffer;
> +};
The interface looks a bit asymetric, on one hand, it takes pfns and on
the other hand, it takes a guest virtual address.
Though, using guest pointers is not great for PVH domains, as it
requires expensive pagewalks (especially for a lot of pages).
Would it be preferable to have a list of gmfn for both sides ?
> +typedef struct xen_foreigncopy xen_foreigncopy_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_foreigncopy_t);
> +
> +/* Next available subop number is 30 */
>
> #endif /* __XEN_PUBLIC_MEMORY_H__ */
>
Teddy
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 2489 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 665 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 13/16] privcmd: Add definition for new Linux privcmd to access new Xen hypercall
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (11 preceding siblings ...)
2026-06-03 13:05 ` [PATCH v4 12/16] xen: implement new foreign copy hypercall Frediano Ziglio
@ 2026-06-03 13:06 ` Frediano Ziglio
2026-06-03 13:06 ` [PATCH v4 14/16] libs/guest: use new hypercall if available Frediano Ziglio
` (2 subsequent siblings)
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:06 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Userspace should use new ioctl to access new hypercall.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/include/xen-sys/Linux/privcmd.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index 607dfa2287..4b80eeae06 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -100,6 +100,14 @@ typedef struct privcmd_pcidev_get_gsi {
__u32 gsi;
} privcmd_pcidev_get_gsi_t;
+typedef struct privcmd_foreigncopy {
+ domid_t dom; /* Foreign domain. */
+ __u16 dir; /* Direction, 0 from, 1 to. */
+ __u32 num; /* Number of pages to copy. */
+ const xen_pfn_t __user *pfns; /* Array of pfns. */
+ void __user *buffer; /* Buffer to copy to/from, must be page aligned. */
+} privcmd_foreigncopy_t;
+
/*
* @cmd: IOCTL_PRIVCMD_HYPERCALL
* @arg: &privcmd_hypercall_t
@@ -121,6 +129,8 @@ typedef struct privcmd_pcidev_get_gsi {
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
#define IOCTL_PRIVCMD_PCIDEV_GET_GSI \
_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t))
+#define IOCTL_PRIVCMD_FOREIGNCOPY \
+ _IOC(_IOC_NONE, 'P', 11, sizeof(privcmd_foreigncopy_t))
#define IOCTL_PRIVCMD_UNIMPLEMENTED \
_IOC(_IOC_NONE, 'P', 0xFF, 0)
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH v4 14/16] libs/guest: use new hypercall if available
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (12 preceding siblings ...)
2026-06-03 13:06 ` [PATCH v4 13/16] privcmd: Add definition for new Linux privcmd to access new Xen hypercall Frediano Ziglio
@ 2026-06-03 13:06 ` Frediano Ziglio
2026-06-03 13:06 ` [PATCH v4 15/16] libs/guest: finalize PoC Frediano Ziglio
2026-06-03 13:06 ` [PATCH Linux v4 16/16] xen/privcmd: Add new ABI to allow copying foreign memory Frediano Ziglio
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:06 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Use new hypercall if available, otherwise fall back to map+copy+unmap
sequence.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libs/guest/xg_sr_common.c | 47 ++++++++++++++++++++++++++-------
1 file changed, 38 insertions(+), 9 deletions(-)
diff --git a/tools/libs/guest/xg_sr_common.c b/tools/libs/guest/xg_sr_common.c
index a94e9dfbff..cb3e6a5658 100644
--- a/tools/libs/guest/xg_sr_common.c
+++ b/tools/libs/guest/xg_sr_common.c
@@ -156,11 +156,6 @@ static void __attribute__((unused)) build_assertions(void)
BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params) != 8);
}
-enum {
- foreigncopy_from,
- foreigncopy_to
-};
-
static int xg_foreignmemory_copy(xc_interface *xch, domid_t domid,
int dir, size_t nr_pages, void *buffer,
const xen_pfn_t foreign_pfns[nr_pages])
@@ -174,8 +169,42 @@ static int xg_foreignmemory_copy(xc_interface *xch, domid_t domid,
return -1;
}
+ /*
+ * If foreign copy is supported, -1 not initialized, 0 not supported,
+ * 1 supported.
+ */
+ static char foreign_copy_supported = -1;
+
+ if ( foreign_copy_supported )
+ {
+ int rc;
+ privcmd_foreigncopy_t copy = {
+ .dom = domid,
+ .dir = dir,
+ .num = nr_pages,
+ .buffer = buffer,
+ };
+ DECLARE_HYPERCALL_BOUNCE_IN(foreign_pfns, nr_pages * sizeof(xen_pfn_t));
+
+ if ( xc_hypercall_bounce_pre(xch, foreign_pfns) )
+ return -1;
+
+ copy.pfns = foreign_pfns;
+
+ rc = ioctl(xencall_fd(xch->xcall), IOCTL_PRIVCMD_FOREIGNCOPY, ©);
+ if ( foreign_copy_supported < 0 )
+ foreign_copy_supported =
+ (!rc || (errno != ENOTTY && errno != ENOSYS));
+
+ xc_hypercall_bounce_post(xch, foreign_pfns);
+
+ if ( foreign_copy_supported )
+ return rc;
+ }
+
+ /* Fallback, emulate. */
int err[nr_pages];
- const int prot = (dir == foreigncopy_from) ? PROT_READ : PROT_READ|PROT_WRITE;
+ const int prot = (dir == XENMEM_foreigncopy_from) ? PROT_READ : PROT_READ|PROT_WRITE;
void *p = xenforeignmemory_map(xch->fmem, domid, prot, nr_pages, foreign_pfns, err);
if ( !p )
@@ -192,7 +221,7 @@ static int xg_foreignmemory_copy(xc_interface *xch, domid_t domid,
return -1;
}
- if ( dir == foreigncopy_from )
+ if ( dir == XENMEM_foreigncopy_from )
memcpy(buffer, p, nr_pages * XC_PAGE_SIZE);
else
memcpy(p, buffer, nr_pages * XC_PAGE_SIZE);
@@ -204,7 +233,7 @@ int xg_foreignmemory_copy_from(xc_interface *xch, domid_t dom,
size_t nr_pages, void *dest,
const xen_pfn_t source[nr_pages])
{
- return xg_foreignmemory_copy(xch, dom, foreigncopy_from,
+ return xg_foreignmemory_copy(xch, dom, XENMEM_foreigncopy_from,
nr_pages, dest, source);
}
@@ -212,7 +241,7 @@ int xg_foreignmemory_copy_to(xc_interface *xch, domid_t dom,
size_t nr_pages, const xen_pfn_t dest[nr_pages],
const void *source)
{
- return xg_foreignmemory_copy(xch, dom, foreigncopy_to,
+ return xg_foreignmemory_copy(xch, dom, XENMEM_foreigncopy_to,
nr_pages, (void *) source, dest);
}
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH v4 15/16] libs/guest: finalize PoC
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (13 preceding siblings ...)
2026-06-03 13:06 ` [PATCH v4 14/16] libs/guest: use new hypercall if available Frediano Ziglio
@ 2026-06-03 13:06 ` Frediano Ziglio
2026-06-03 13:06 ` [PATCH Linux v4 16/16] xen/privcmd: Add new ABI to allow copying foreign memory Frediano Ziglio
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:06 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
Remove now unused map_errs array.
Test and restore verification code.
Report correctly errors from writev_exact.
Allocate verification buffer using hypercall buffer to avoid errors
using hypercall.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
tools/libs/guest/xg_sr_common.h | 4 +-
tools/libs/guest/xg_sr_restore.c | 48 +++++++++++++++---
tools/libs/guest/xg_sr_save.c | 83 +++++++++++++++++++++-----------
3 files changed, 100 insertions(+), 35 deletions(-)
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 0e0e279ae1..cd562f028a 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -214,7 +214,6 @@ struct xc_sr_context_save_buffers
xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
xen_pfn_t mfns[MAX_BATCH_SIZE];
xen_pfn_t types[MAX_BATCH_SIZE];
- int errors[MAX_BATCH_SIZE];
void *local_pages[MAX_BATCH_SIZE];
struct iovec iov[MAX_BATCH_SIZE + 2]; /* Headers + data. */
uint64_t rec_pfns[MAX_BATCH_SIZE];
@@ -255,8 +254,8 @@ struct xc_sr_context
unsigned long *deferred_pages;
unsigned long nr_deferred_pages;
xc_hypercall_buffer_t dirty_bitmap_hbuf;
+ xc_hypercall_buffer_t dest_buf;
struct xc_sr_context_save_buffers *buffers;
- void *dest_buf;
} save;
struct /* Restore data. */
@@ -267,6 +266,7 @@ struct xc_sr_context
int send_back_fd;
unsigned long p2m_size;
xc_hypercall_buffer_t dirty_bitmap_hbuf;
+ xc_hypercall_buffer_t verify_buf;
/* From Image Header. */
uint32_t format_version;
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index b589f0397d..b2df36c6f6 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -257,15 +257,15 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
{
xc_interface *xch = ctx->xch;
xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
- int *map_errs = malloc(count * sizeof(*map_errs));
int rc;
unsigned nr_pages;
+ void *const source = page_data;
- if ( !mfns || !map_errs )
+ if ( !mfns )
{
rc = -1;
ERROR("Failed to allocate %zu bytes to process page data",
- count * (sizeof(*mfns) + sizeof(*map_errs)));
+ count * sizeof(*mfns));
goto err;
}
@@ -309,17 +309,37 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
}
if ( !ctx->restore.verify )
{
- rc = xg_foreignmemory_copy_to(xch, ctx->domid, nr_pages, mfns, page_data);
+ rc = xg_foreignmemory_copy_to(xch, ctx->domid, nr_pages, mfns, source);
if ( rc < 0 )
goto err;
}
+ else
+ {
+ DECLARE_HYPERCALL_BUFFER_SHADOW(uint8_t, verify_buf,
+ &ctx->restore.verify_buf);
+
+ rc = xg_foreignmemory_copy_from(xch, ctx->domid, nr_pages, verify_buf, mfns);
+ if ( rc < 0 )
+ goto err;
+
+ void *guest_page = verify_buf;
+ page_data = source;
+ for ( unsigned i = 0; i < nr_pages; ++i )
+ {
+ /* Verify mode - compare incoming data to what we already have. */
+ if ( memcmp(guest_page, page_data, PAGE_SIZE) )
+ ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
+ pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+
+ guest_page += PAGE_SIZE;
+ page_data += PAGE_SIZE;
+ }
+ }
done:
rc = 0;
err:
-
- free(map_errs);
free(mfns);
return rc;
@@ -709,6 +729,18 @@ static int setup(struct xc_sr_context *ctx)
int rc;
DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
&ctx->restore.dirty_bitmap_hbuf);
+ DECLARE_HYPERCALL_BUFFER_SHADOW(uint8_t, verify_buf,
+ &ctx->restore.verify_buf);
+
+ verify_buf = xc_hypercall_buffer_alloc_pages(
+ xch, verify_buf, MAX_BATCH_SIZE);
+
+ if ( !verify_buf )
+ {
+ ERROR("Unable to allocate memory for test buffer");
+ rc = -1;
+ goto err;
+ }
if ( ctx->stream_type == XC_STREAM_COLO )
{
@@ -757,6 +789,8 @@ static void cleanup(struct xc_sr_context *ctx)
unsigned int i;
DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
&ctx->restore.dirty_bitmap_hbuf);
+ DECLARE_HYPERCALL_BUFFER_SHADOW(uint8_t, verify_buf,
+ &ctx->restore.verify_buf);
for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
free(ctx->restore.buffered_records[i].data);
@@ -765,6 +799,8 @@ static void cleanup(struct xc_sr_context *ctx)
xc_hypercall_buffer_free_pages(
xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->restore.p2m_size)));
+ xc_hypercall_buffer_free_pages(xch, verify_buf, MAX_BATCH_SIZE);
+
free(ctx->restore.buffered_records);
free(ctx->restore.populated_pfns);
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index ae61f97a47..514ca4be63 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -88,7 +88,7 @@ static int write_batch(struct xc_sr_context *ctx)
xc_interface *xch = ctx->xch;
xen_pfn_t *mfns, *types;
void **local_pages;
- int *errors, rc = -1;
+ int rc = -1;
unsigned int i, nr_pages = 0;
unsigned int nr_pfns = ctx->save.nr_batch_pfns;
uint64_t *rec_pfns;
@@ -109,8 +109,6 @@ static int write_batch(struct xc_sr_context *ctx)
mfns = ctx->save.buffers->mfns;
/* Types of the batch pfns. */
types = ctx->save.buffers->types;
- /* Errors from attempting to map the gfns. */
- errors = ctx->save.buffers->errors;
/* Pointers to locally allocated pages. Need freeing. */
local_pages = ctx->save.buffers->local_pages;
memset(local_pages, 0, sizeof(*local_pages) * nr_pfns);
@@ -166,18 +164,54 @@ static int write_batch(struct xc_sr_context *ctx)
iovcnt = 2;
- rc = xg_foreignmemory_copy_from(xch, ctx->domid, nr_pages, ctx->save.dest_buf, mfns);
- if ( rc < 0 )
- {
- ERROR("xg_foreignmemory_copy_from failed");
- goto err;
- }
-
if ( nr_pages )
{
- iov[iovcnt].iov_base = ctx->save.dest_buf;
- iov[iovcnt].iov_len = nr_pages << XC_PAGE_SHIFT;
- iovcnt++;
+ int p;
+ void *page, *orig_page;
+
+ DECLARE_HYPERCALL_BUFFER_SHADOW(uint8_t, dest_buf,
+ &ctx->save.dest_buf);
+
+ rc = xg_foreignmemory_copy_from(xch, ctx->domid, nr_pages, dest_buf, mfns);
+ if ( rc < 0 )
+ {
+ ERROR("xg_foreignmemory_copy_from failed");
+ goto err;
+ }
+
+ for ( i = 0, p = 0; i < nr_pfns; ++i )
+ {
+ if ( !page_type_has_stream_data(types[i]) )
+ continue;
+
+ orig_page = page = dest_buf + (p * PAGE_SIZE);
+ rc = ctx->save.ops.normalise_page(ctx, types[i], &page);
+
+ if ( orig_page != page )
+ local_pages[i] = page;
+
+ if ( rc )
+ {
+ if ( rc != -1 || errno != EAGAIN )
+ goto err;
+
+ set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
+ ++ctx->save.nr_deferred_pages;
+ types[i] = XEN_DOMCTL_PFINFO_XTAB;
+ --nr_pages;
+ }
+ else if ( iov[iovcnt-1].iov_base + iov[iovcnt-1].iov_len == page )
+ {
+ iov[iovcnt-1].iov_len += PAGE_SIZE;
+ }
+ else
+ {
+ iov[iovcnt].iov_base = page;
+ iov[iovcnt].iov_len = PAGE_SIZE;
+ iovcnt++;
+ }
+ ++p;
+ }
}
hdrs.rec.length += nr_pages * PAGE_SIZE;
@@ -188,6 +222,7 @@ static int write_batch(struct xc_sr_context *ctx)
if ( writev_exact(ctx->fd, iov, iovcnt) )
{
PERROR("Failed to write page data to stream");
+ rc = -1;
goto err;
}
@@ -717,30 +752,23 @@ static int setup(struct xc_sr_context *ctx)
{
xc_interface *xch = ctx->xch;
int rc;
- const unsigned dest_buf_len = MAX_BATCH_SIZE * XC_PAGE_SIZE;
DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
&ctx->save.dirty_bitmap_hbuf);
+ DECLARE_HYPERCALL_BUFFER_SHADOW(uint8_t, dest_buf,
+ &ctx->save.dest_buf);
rc = ctx->save.ops.setup(ctx);
if ( rc )
goto err;
+ dest_buf = xc_hypercall_buffer_alloc_pages(
+ xch, dest_buf, MAX_BATCH_SIZE);
dirty_bitmap = xc_hypercall_buffer_alloc_pages(
xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
ctx->save.buffers = calloc(1, sizeof(*ctx->save.buffers));
- ctx->save.dest_buf = NULL;
-
- rc = posix_memalign(&ctx->save.dest_buf, XC_PAGE_SIZE, dest_buf_len);
- if ( rc )
- {
- ERROR("Unable to allocate %u bytes of buffer", dest_buf_len);
- errno = rc;
- rc = -1;
- goto err;
- }
- if ( !dirty_bitmap || !ctx->save.deferred_pages || !ctx->save.buffers)
+ if ( !dirty_bitmap || !ctx->save.deferred_pages || !ctx->save.buffers || !dest_buf )
{
ERROR("Unable to allocate memory for dirty bitmaps, deferred pages"
" and various batch buffers");
@@ -761,7 +789,8 @@ static void cleanup(struct xc_sr_context *ctx)
xc_interface *xch = ctx->xch;
DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
&ctx->save.dirty_bitmap_hbuf);
-
+ DECLARE_HYPERCALL_BUFFER_SHADOW(uint8_t, dest_buf,
+ &ctx->save.dest_buf);
xc_shadow_control(xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_OFF,
NULL, 0);
@@ -771,9 +800,9 @@ static void cleanup(struct xc_sr_context *ctx)
xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
NRPAGES(bitmap_size(ctx->save.p2m_size)));
+ xc_hypercall_buffer_free_pages(xch, dest_buf, MAX_BATCH_SIZE);
free(ctx->save.deferred_pages);
free(ctx->save.buffers);
- free(ctx->save.dest_buf);
}
/*
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH Linux v4 16/16] xen/privcmd: Add new ABI to allow copying foreign memory
2026-06-03 13:05 [PATCH v4 00/16] xenguest optimisations Frediano Ziglio
` (14 preceding siblings ...)
2026-06-03 13:06 ` [PATCH v4 15/16] libs/guest: finalize PoC Frediano Ziglio
@ 2026-06-03 13:06 ` Frediano Ziglio
15 siblings, 0 replies; 30+ messages in thread
From: Frediano Ziglio @ 2026-06-03 13:06 UTC (permalink / raw)
To: xen-devel
Cc: Frediano Ziglio, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Teddy Astie, Anthony PERARD, Juergen Gross
From: Frediano Ziglio <frediano.ziglio@citrix.com>
This new ABI allows to copy foreign domain memory to/from a buffer.
This avoids having to map/copy/unmap foreign memory which is
expensive.
This operation is done particularly when migrating VMs.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
---
arch/x86/include/asm/xen/interface.h | 1 +
drivers/xen/privcmd.c | 51 ++++++++++++++++++++++++++++
include/uapi/xen/privcmd.h | 10 ++++++
include/xen/interface/memory.h | 37 ++++++++++++++++++++
4 files changed, 99 insertions(+)
diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index a078a2b0f032..bac3c3bc60fd 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -91,6 +91,7 @@ DEFINE_GUEST_HANDLE(int);
DEFINE_GUEST_HANDLE(void);
DEFINE_GUEST_HANDLE(uint64_t);
DEFINE_GUEST_HANDLE(uint32_t);
+DEFINE_GUEST_HANDLE(uint8_t);
DEFINE_GUEST_HANDLE(xen_pfn_t);
DEFINE_GUEST_HANDLE(xen_ulong_t);
#endif
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 725a49a0eee7..4ae9138dd314 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -62,6 +62,10 @@ MODULE_LICENSE("GPL");
#define PRIV_VMA_LOCKED ((void *)1)
+#ifndef UINT32_MAX
+#define UINT32_MAX ((uint32_t)~0U)
+#endif
+
static unsigned int privcmd_dm_op_max_num = 16;
module_param_named(dm_op_max_nr_bufs, privcmd_dm_op_max_num, uint, 0644);
MODULE_PARM_DESC(dm_op_max_nr_bufs,
@@ -1522,6 +1526,49 @@ static inline void privcmd_ioeventfd_exit(void)
}
#endif /* CONFIG_XEN_PRIVCMD_EVENTFD */
+static long privcmd_ioctl_foreigncopy(
+ struct file *file, void __user *udata)
+{
+ const struct privcmd_data *const data = file->private_data;
+ long ret;
+ struct privcmd_foreigncopy copy;
+ struct xen_foreigncopy xcopy;
+
+ if (copy_from_user(©, udata, sizeof(copy)))
+ return -EFAULT;
+ if (copy.dir & ~1u)
+ return -EINVAL;
+ if (copy.num >= UINT32_MAX >> PAGE_SHIFT)
+ return -EINVAL;
+ if (!access_ok(copy.pfns, copy.num * sizeof(*copy.pfns)))
+ return -EFAULT;
+ if (!access_ok(copy.buffer, copy.num << PAGE_SHIFT))
+ return -EFAULT;
+
+ /* If restriction is in place, check the domid matches */
+ if (data->domid != DOMID_INVALID && data->domid != copy.dom)
+ return -EPERM;
+
+ xcopy.domid = copy.dom;
+ xcopy.flags = copy.dir;
+ xcopy.nr_frames = copy.num;
+ xcopy.frame_list = (void *) copy.pfns;
+ xcopy.buffer = copy.buffer;
+
+ ret = HYPERVISOR_memory_op(XENMEM_foreigncopy, &xcopy);
+
+ /* copy values back in case of error */
+ if (ret) {
+ copy.num = xcopy.nr_frames = copy.num;
+ copy.pfns = xcopy.frame_list;
+ copy.buffer = xcopy.buffer;
+ if (copy_to_user(udata, ©, sizeof(copy)))
+ ret = -EFAULT;
+ }
+
+ return ret;
+}
+
static long privcmd_ioctl(struct file *file,
unsigned int cmd, unsigned long data)
{
@@ -1569,6 +1616,10 @@ static long privcmd_ioctl(struct file *file,
ret = privcmd_ioctl_pcidev_get_gsi(file, udata);
break;
+ case IOCTL_PRIVCMD_FOREIGNCOPY:
+ ret = privcmd_ioctl_foreigncopy(file, udata);
+ break;
+
default:
break;
}
diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
index 8e2c8fd44764..786d769ad4f8 100644
--- a/include/uapi/xen/privcmd.h
+++ b/include/uapi/xen/privcmd.h
@@ -131,6 +131,14 @@ struct privcmd_pcidev_get_gsi {
__u32 gsi;
};
+struct privcmd_foreigncopy {
+ domid_t dom; /* foreign domain */
+ __u16 dir; /* direction, 0 from, 1 to */
+ __u32 num; /* number of pages to copy */
+ const xen_pfn_t __user *pfns; /* array of pfns */
+ void __user *buffer; /* buffer to copy to/from */
+};
+
/*
* @cmd: IOCTL_PRIVCMD_HYPERCALL
* @arg: &privcmd_hypercall_t
@@ -164,5 +172,7 @@ struct privcmd_pcidev_get_gsi {
_IOW('P', 9, struct privcmd_ioeventfd)
#define IOCTL_PRIVCMD_PCIDEV_GET_GSI \
_IOC(_IOC_NONE, 'P', 10, sizeof(struct privcmd_pcidev_get_gsi))
+#define IOCTL_PRIVCMD_FOREIGNCOPY \
+ _IOC(_IOC_NONE, 'P', 11, sizeof(struct privcmd_foreigncopy))
#endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index 1a371a825c55..5981402fccde 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -325,4 +325,41 @@ struct xen_mem_acquire_resource {
};
DEFINE_GUEST_HANDLE_STRUCT(xen_mem_acquire_resource);
+/*
+ * Copy memory from/to a given domain.
+ */
+#define XENMEM_foreigncopy 29
+struct xen_foreigncopy {
+ /* IN - The domain whose resource is to be copied */
+ domid_t domid;
+
+ /* IN - Flags */
+#define XENMEM_foreigncopy_from 0
+#define XENMEM_foreigncopy_to 1
+#define XENMEM_foreigncopy_direction 1
+ uint16_t flags;
+
+ /*
+ * IN
+ *
+ * As an IN parameter number of frames of the domain to be copied.
+ */
+ uint32_t nr_frames;
+
+ /*
+ * IN
+ *
+ * Frames to be copied.
+ */
+ GUEST_HANDLE(xen_pfn_t) frame_list;
+
+ /*
+ * IN/OUT
+ *
+ * Userspace buffer to read/write from.
+ */
+ GUEST_HANDLE(uint8_t) buffer;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_foreigncopy);
+
#endif /* __XEN_PUBLIC_MEMORY_H__ */
--
2.54.0
^ permalink raw reply related [flat|nested] 30+ messages in thread