All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: "Yichen Wang" <yichen.wang@bytedance.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Dr. David Alan Gilbert" <dave@treblig.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	qemu-devel@nongnu.org
Cc: Hao Xiang <hao.xiang@linux.dev>,
	"Liu, Yuan1" <yuan1.liu@intel.com>,
	Shivam Kumar <shivam.kumar1@nutanix.com>,
	"Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com>,
	Yichen Wang <yichen.wang@bytedance.com>
Subject: Re: [PATCH v9 09/12] migration/multifd: Enable DSA offloading in multifd sender path.
Date: Fri, 03 Jan 2025 17:59:21 -0300	[thread overview]
Message-ID: <87y0zr7gwm.fsf@suse.de> (raw)
In-Reply-To: <20241225005919.26853-10-yichen.wang@bytedance.com>

Yichen Wang <yichen.wang@bytedance.com> writes:

> From: Hao Xiang <hao.xiang@linux.dev>
>
> Multifd sender path gets an array of pages queued by the migration
> thread. It performs zero page checking on every page in the array.
> The pages are classfied as either a zero page or a normal page. This
> change uses Intel DSA to offload the zero page checking from CPU to
> the DSA accelerator. The sender thread submits a batch of pages to DSA
> hardware and waits for the DSA completion thread to signal for work
> completion.
>
> Signed-off-by: Hao Xiang <hao.xiang@linux.dev>
> Signed-off-by: Yichen Wang <yichen.wang@bytedance.com>
> ---
>  migration/multifd-zero-page.c | 149 ++++++++++++++++++++++++++++++----
>  migration/multifd.c           |  23 +++++-
>  migration/multifd.h           |   6 ++
>  migration/options.c           |  13 +++
>  migration/options.h           |   1 +
>  5 files changed, 176 insertions(+), 16 deletions(-)
>
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index f1e988a959..0a4e3fb9bd 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -21,7 +21,9 @@
>  
>  static bool multifd_zero_page_enabled(void)
>  {
> -    return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
> +    ZeroPageDetection curMethod = migrate_zero_page_detection();
> +    return (curMethod == ZERO_PAGE_DETECTION_MULTIFD ||
> +            curMethod == ZERO_PAGE_DETECTION_DSA_ACCEL);
>  }
>  
>  static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
> @@ -37,26 +39,49 @@ static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
>      pages_offset[b] = temp;
>  }
>  
> +#ifdef CONFIG_DSA_OPT
> +
> +static void swap_result(bool *results, int a, int b)
> +{
> +    bool temp;
> +
> +    if (a == b) {
> +        return;
> +    }
> +
> +    temp = results[a];
> +    results[a] = results[b];
> +    results[b] = temp;
> +}
> +
>  /**
> - * multifd_send_zero_page_detect: Perform zero page detection on all pages.
> + * zero_page_detect_dsa: Perform zero page detection using
> + * Intel Data Streaming Accelerator (DSA).
>   *
> - * Sorts normal pages before zero pages in p->pages->offset and updates
> - * p->pages->normal_num.
> + * Sorts normal pages before zero pages in pages->offset and updates
> + * pages->normal_num.
>   *
>   * @param p A pointer to the send params.
>   */
> -void multifd_send_zero_page_detect(MultiFDSendParams *p)
> +static void zero_page_detect_dsa(MultiFDSendParams *p)
>  {
>      MultiFDPages_t *pages = &p->data->u.ram;
>      RAMBlock *rb = pages->block;
> -    int i = 0;
> -    int j = pages->num - 1;
> +    bool *results = p->dsa_batch_task->results;
>  
> -    if (!multifd_zero_page_enabled()) {
> -        pages->normal_num = pages->num;
> -        goto out;
> +    for (int i = 0; i < pages->num; i++) {
> +        p->dsa_batch_task->addr[i] =
> +            (ram_addr_t)(rb->host + pages->offset[i]);
>      }
>  
> +    buffer_is_zero_dsa_batch_sync(p->dsa_batch_task,
> +                                  (const void **)p->dsa_batch_task->addr,
> +                                  pages->num,
> +                                  multifd_ram_page_size());
> +
> +    int i = 0;
> +    int j = pages->num - 1;
> +
>      /*
>       * Sort the page offset array by moving all normal pages to
>       * the left and all zero pages to the right of the array.
> @@ -64,23 +89,59 @@ void multifd_send_zero_page_detect(MultiFDSendParams *p)
>      while (i <= j) {
>          uint64_t offset = pages->offset[i];
>  
> -        if (!buffer_is_zero(rb->host + offset, multifd_ram_page_size())) {
> +        if (!results[i]) {
>              i++;
>              continue;
>          }
>  
> +        swap_result(results, i, j);
>          swap_page_offset(pages->offset, i, j);
>          ram_release_page(rb->idstr, offset);
>          j--;
>      }
>  
>      pages->normal_num = i;
> +}
>  
> -out:
> -    stat64_add(&mig_stats.normal_pages, pages->normal_num);
> -    stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
> +int multifd_dsa_setup(MigrationState *s, Error *local_err)
> +{
> +    g_autofree strList *dsa_parameter = g_malloc0(sizeof(strList));
> +    migrate_dsa_accel_path(&dsa_parameter);
> +    if (qemu_dsa_init(dsa_parameter, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        return -1;
> +    } else {
> +        qemu_dsa_start();
> +    }
> +
> +    return 0;
> +}
> +
> +void multifd_dsa_cleanup(void)
> +{
> +    qemu_dsa_cleanup();
> +}
> +
> +#else
> +
> +static void zero_page_detect_dsa(MultiFDSendParams *p)
> +{
> +    g_assert_not_reached();
>  }
>  
> +int multifd_dsa_setup(MigrationState *s, Error *local_err)
> +{
> +    g_assert_not_reached();
> +    return -1;
> +}
> +
> +void multifd_dsa_cleanup(void)
> +{
> +    return ;
> +}
> +
> +#endif
> +
>  void multifd_recv_zero_page_process(MultiFDRecvParams *p)
>  {
>      for (int i = 0; i < p->zero_num; i++) {
> @@ -92,3 +153,63 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
>          }
>      }
>  }
> +
> +/**
> + * zero_page_detect_cpu: Perform zero page detection using CPU.
> + *
> + * Sorts normal pages before zero pages in p->pages->offset and updates
> + * p->pages->normal_num.
> + *
> + * @param p A pointer to the send params.
> + */
> +static void zero_page_detect_cpu(MultiFDSendParams *p)
> +{
> +    MultiFDPages_t *pages = &p->data->u.ram;
> +    RAMBlock *rb = pages->block;
> +    int i = 0;
> +    int j = pages->num - 1;
> +
> +    /*
> +     * Sort the page offset array by moving all normal pages to
> +     * the left and all zero pages to the right of the array.
> +     */
> +    while (i <= j) {
> +        uint64_t offset = pages->offset[i];
> +
> +        if (!buffer_is_zero(rb->host + offset, multifd_ram_page_size())) {
> +            i++;
> +            continue;
> +        }
> +
> +        swap_page_offset(pages->offset, i, j);
> +        ram_release_page(rb->idstr, offset);
> +        j--;
> +    }
> +
> +    pages->normal_num = i;
> +}
> +
> +/**
> + * multifd_send_zero_page_detect: Perform zero page detection on all pages.
> + *
> + * @param p A pointer to the send params.
> + */
> +void multifd_send_zero_page_detect(MultiFDSendParams *p)
> +{
> +    MultiFDPages_t *pages = &p->data->u.ram;
> +
> +    if (!multifd_zero_page_enabled()) {
> +        pages->normal_num = pages->num;
> +        goto out;
> +    }
> +
> +    if (qemu_dsa_is_running()) {
> +        zero_page_detect_dsa(p);
> +    } else {
> +        zero_page_detect_cpu(p);
> +    }
> +
> +out:
> +    stat64_add(&mig_stats.normal_pages, pages->normal_num);
> +    stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
> +}
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 4f973d70e0..50cdbd21d0 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/cutils.h"
>  #include "qemu/rcu.h"
> +#include "qemu/dsa.h"
>  #include "exec/target_page.h"
>  #include "system/system.h"
>  #include "exec/ramblock.h"
> @@ -462,6 +463,8 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
>      p->name = NULL;
>      g_free(p->data);
>      p->data = NULL;
> +    buffer_zero_batch_task_destroy(p->dsa_batch_task);
> +    p->dsa_batch_task = NULL;
>      p->packet_len = 0;
>      g_free(p->packet);
>      p->packet = NULL;
> @@ -493,6 +496,8 @@ void multifd_send_shutdown(void)
>  
>      multifd_send_terminate_threads();
>  
> +    multifd_dsa_cleanup();
> +
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>          Error *local_err = NULL;
> @@ -814,6 +819,7 @@ bool multifd_send_setup(void)
>      uint32_t page_count = multifd_ram_page_count();
>      bool use_packets = multifd_use_packets();
>      uint8_t i;
> +    Error *local_err = NULL;
>  
>      if (!migrate_multifd()) {
>          return true;
> @@ -827,9 +833,12 @@ bool multifd_send_setup(void)
>      qatomic_set(&multifd_send_state->exiting, 0);
>      multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
>  
> +    if (ret) {
> +        goto err;
> +    }

This looks like an artifact of rebase.

> +
>      for (i = 0; i < thread_count; i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
> -        Error *local_err = NULL;
>  
>          qemu_sem_init(&p->sem, 0);
>          qemu_sem_init(&p->sem_sync, 0);
> @@ -863,10 +872,19 @@ bool multifd_send_setup(void)
>          goto err;
>      }

This one as well.

>  
> +    if (s && ret == 0 &&

if (!ret && ...

The migration state pointer will always be live here.

> +        s->parameters.zero_page_detection == ZERO_PAGE_DETECTION_DSA_ACCEL) {
> +        ret = multifd_dsa_setup(s, local_err);
> +    }
> +
> +    if (ret) {
> +        goto err;
> +    }
> +
>      for (i = 0; i < thread_count; i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
> -        Error *local_err = NULL;
>  
> +        p->dsa_batch_task = buffer_zero_batch_task_init(page_count);

We should do something about the case where CONFIG_DSA_OPT=y, but DSA is
nonetheless not enabled. We don't want to go into this function and
allocate a bunch of memory, etc.

Can you check inside buffer_zero_batch_task_init() whether dsa_setup
happened and return early if not?

Alternatively, I'd just repeat the for loop inside the conditional above,
like this:

for (i = 0; i < thread_count; i++) {
    qemu_sem_wait(&multifd_send_state->channels_created);
}

if (ret) {
    goto err;
}

+if (s->parameters.zero_page_detection == ZERO_PAGE_DETECTION_DSA_ACCEL) {
+    ret = multifd_dsa_setup(s, local_err);
+    if (ret) {
+        goto err;
+    }
+
+    for (i = 0; i < thread_count; i++) {
+        MultiFDSendParams *p = &multifd_send_state->params[i];
+        p->dsa_batch_task = buffer_zero_batch_task_init(page_count);
+    }
+}

>          ret = multifd_send_state->ops->send_setup(p, &local_err);
>          if (ret) {
>              migrate_set_error(s, local_err);
> @@ -1047,6 +1065,7 @@ void multifd_recv_cleanup(void)
>              qemu_thread_join(&p->thread);
>          }
>      }
> +    multifd_dsa_cleanup();
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          multifd_recv_cleanup_channel(&multifd_recv_state->params[i]);
>      }
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 50d58c0c9c..da53b0bdfd 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -15,6 +15,7 @@
>  
>  #include "exec/target_page.h"
>  #include "ram.h"
> +#include "qemu/dsa.h"
>  
>  typedef struct MultiFDRecvData MultiFDRecvData;
>  typedef struct MultiFDSendData MultiFDSendData;
> @@ -155,6 +156,9 @@ typedef struct {
>      bool pending_sync;
>      MultiFDSendData *data;
>  
> +    /* Zero page checking batch task */
> +    QemuDsaBatchTask *dsa_batch_task;
> +
>      /* thread local variables. No locking required */
>  
>      /* pointer to the packet */
> @@ -313,6 +317,8 @@ void multifd_send_fill_packet(MultiFDSendParams *p);
>  bool multifd_send_prepare_common(MultiFDSendParams *p);
>  void multifd_send_zero_page_detect(MultiFDSendParams *p);
>  void multifd_recv_zero_page_process(MultiFDRecvParams *p);
> +int multifd_dsa_setup(MigrationState *s, Error *local_err);
> +void multifd_dsa_cleanup(void);
>  
>  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
>  {
> diff --git a/migration/options.c b/migration/options.c
> index 68547b358b..9011e7f6c3 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -817,6 +817,19 @@ const strList *migrate_accel_path(void)
>      return s->parameters.accel_path;
>  }
>  
> +void migrate_dsa_accel_path(strList **dsa_accel_path)
> +{
> +    MigrationState *s = migrate_get_current();
> +    strList *accel_path = s->parameters.accel_path;
> +    strList **tail = dsa_accel_path;
> +    while (accel_path) {
> +        if (strncmp(accel_path->value, "dsa:", 4) == 0) {
> +            QAPI_LIST_APPEND(tail, &accel_path->value[4]);
> +        }
> +        accel_path = accel_path->next;
> +    }
> +}
> +
>  const char *migrate_tls_hostname(void)
>  {
>      MigrationState *s = migrate_get_current();
> diff --git a/migration/options.h b/migration/options.h
> index c994b04cb6..586d091733 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -86,6 +86,7 @@ const char *migrate_tls_hostname(void);
>  uint64_t migrate_xbzrle_cache_size(void);
>  ZeroPageDetection migrate_zero_page_detection(void);
>  const strList *migrate_accel_path(void);
> +void migrate_dsa_accel_path(strList **dsa_accel_path);
>  
>  /* parameters helpers */


  reply	other threads:[~2025-01-03 21:00 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-25  0:59 [PATCH v9 00/12] Use Intel DSA accelerator to offload zero page checking in multifd live migration Yichen Wang
2024-12-25  0:59 ` [PATCH v9 01/12] meson: Introduce new instruction set enqcmd to the build system Yichen Wang
2024-12-25  0:59 ` [PATCH v9 02/12] util/dsa: Add idxd into linux header copy list Yichen Wang
2024-12-25  0:59 ` [PATCH v9 03/12] util/dsa: Implement DSA device start and stop logic Yichen Wang
2024-12-25  0:59 ` [PATCH v9 04/12] util/dsa: Implement DSA task enqueue and dequeue Yichen Wang
2024-12-25  0:59 ` [PATCH v9 05/12] util/dsa: Implement DSA task asynchronous completion thread model Yichen Wang
2024-12-25  0:59 ` [PATCH v9 06/12] util/dsa: Implement zero page checking in DSA task Yichen Wang
2024-12-25  0:59 ` [PATCH v9 07/12] util/dsa: Implement DSA task asynchronous submission and wait for completion Yichen Wang
2024-12-25  0:59 ` [PATCH v9 08/12] migration/multifd: Add new migration option for multifd DSA offloading Yichen Wang
2025-01-07 10:57   ` Markus Armbruster
2024-12-25  0:59 ` [PATCH v9 09/12] migration/multifd: Enable DSA offloading in multifd sender path Yichen Wang
2025-01-03 20:59   ` Fabiano Rosas [this message]
2024-12-25  0:59 ` [PATCH v9 10/12] util/dsa: Add unit test coverage for Intel DSA task submission and completion Yichen Wang
2025-01-03 20:39   ` Fabiano Rosas
2024-12-25  0:59 ` [PATCH v9 11/12] migration/multifd: Add integration tests for multifd with Intel DSA offloading Yichen Wang
2024-12-25  0:59 ` [PATCH v9 12/12] migration/doc: Add DSA zero page detection doc Yichen Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y0zr7gwm.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=dave@treblig.org \
    --cc=eblake@redhat.com \
    --cc=hao.xiang@linux.dev \
    --cc=horenchuang@bytedance.com \
    --cc=lvivier@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=shivam.kumar1@nutanix.com \
    --cc=yichen.wang@bytedance.com \
    --cc=yuan1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.