All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Hao Xiang <hao.xiang@bytedance.com>,
	peter.maydell@linaro.org, quintela@redhat.com, peterx@redhat.com,
	marcandre.lureau@redhat.com, bryan.zhang@bytedance.com,
	qemu-devel@nongnu.org
Cc: Hao Xiang <hao.xiang@bytedance.com>
Subject: Re: [PATCH v2 07/20] util/dsa: Implement DSA device start and stop logic.
Date: Mon, 11 Dec 2023 18:28:43 -0300	[thread overview]
Message-ID: <87h6koh1is.fsf@suse.de> (raw)
In-Reply-To: <20231114054032.1192027-8-hao.xiang@bytedance.com>

Hao Xiang <hao.xiang@bytedance.com> writes:

> * DSA device open and close.
> * DSA group contains multiple DSA devices.
> * DSA group configure/start/stop/clean.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com>
> ---
>  include/qemu/dsa.h |  49 +++++++
>  util/dsa.c         | 338 +++++++++++++++++++++++++++++++++++++++++++++
>  util/meson.build   |   1 +
>  3 files changed, 388 insertions(+)
>  create mode 100644 include/qemu/dsa.h
>  create mode 100644 util/dsa.c
>
> diff --git a/include/qemu/dsa.h b/include/qemu/dsa.h
> new file mode 100644
> index 0000000000..30246b507e
> --- /dev/null
> +++ b/include/qemu/dsa.h
> @@ -0,0 +1,49 @@
> +#ifndef QEMU_DSA_H
> +#define QEMU_DSA_H
> +
> +#include "qemu/thread.h"
> +#include "qemu/queue.h"
> +
> +#ifdef CONFIG_DSA_OPT
> +
> +#pragma GCC push_options
> +#pragma GCC target("enqcmd")
> +
> +#include <linux/idxd.h>
> +#include "x86intrin.h"
> +
> +#endif
> +
> +/**
> + * @brief Initializes DSA devices.
> + *
> + * @param dsa_parameter A list of DSA device path from migration parameter.

This code seems pretty generic, let's decouple this doc from migration.

> + * @return int Zero if successful, otherwise non zero.
> + */
> +int dsa_init(const char *dsa_parameter);
> +
> +/**
> + * @brief Start logic to enable using DSA.
> + */
> +void dsa_start(void);
> +
> +/**
> + * @brief Stop logic to clean up DSA by halting the device group and cleaning up
> + * the completion thread.

"Stop the device group and the completion thread"

The mention of "clean/cleaning up" makes this confusing because of
dsa_cleanup() below.

> + */
> +void dsa_stop(void);
> +
> +/**
> + * @brief Clean up system resources created for DSA offloading.
> + *        This function is called during QEMU process teardown.

This is not called during QEMU process teardown. It's called at the end
of migration AFAICS. Maybe just leave this sentence out.

> + */
> +void dsa_cleanup(void);
> +
> +/**
> + * @brief Check if DSA is running.
> + *
> + * @return True if DSA is running, otherwise false.
> + */
> +bool dsa_is_running(void);
> +
> +#endif
> \ No newline at end of file
> diff --git a/util/dsa.c b/util/dsa.c
> new file mode 100644
> index 0000000000..8edaa892ec
> --- /dev/null
> +++ b/util/dsa.c
> @@ -0,0 +1,338 @@
> +/*
> + * Use Intel Data Streaming Accelerator to offload certain background
> + * operations.
> + *
> + * Copyright (c) 2023 Hao Xiang <hao.xiang@bytedance.com>
> + *                    Bryan Zhang <bryan.zhang@bytedance.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/queue.h"
> +#include "qemu/memalign.h"
> +#include "qemu/lockable.h"
> +#include "qemu/cutils.h"
> +#include "qemu/dsa.h"
> +#include "qemu/bswap.h"
> +#include "qemu/error-report.h"
> +#include "qemu/rcu.h"
> +
> +#ifdef CONFIG_DSA_OPT
> +
> +#pragma GCC push_options
> +#pragma GCC target("enqcmd")
> +
> +#include <linux/idxd.h>
> +#include "x86intrin.h"
> +
> +#define DSA_WQ_SIZE 4096
> +#define MAX_DSA_DEVICES 16
> +
> +typedef QSIMPLEQ_HEAD(dsa_task_queue, buffer_zero_batch_task) dsa_task_queue;
> +
> +struct dsa_device {
> +    void *work_queue;
> +};
> +
> +struct dsa_device_group {
> +    struct dsa_device *dsa_devices;
> +    int num_dsa_devices;
> +    uint32_t index;
> +    bool running;
> +    QemuMutex task_queue_lock;
> +    QemuCond task_queue_cond;
> +    dsa_task_queue task_queue;
> +};
> +
> +uint64_t max_retry_count;
> +static struct dsa_device_group dsa_group;
> +
> +
> +/**
> + * @brief This function opens a DSA device's work queue and
> + *        maps the DSA device memory into the current process.
> + *
> + * @param dsa_wq_path A pointer to the DSA device work queue's file path.
> + * @return A pointer to the mapped memory.
> + */
> +static void *
> +map_dsa_device(const char *dsa_wq_path)
> +{
> +    void *dsa_device;
> +    int fd;
> +
> +    fd = open(dsa_wq_path, O_RDWR);
> +    if (fd < 0) {
> +        fprintf(stderr, "open %s failed with errno = %d.\n",
> +                dsa_wq_path, errno);

Use error_report and error_setg* for these. Throughout the series.

> +        return MAP_FAILED;
> +    }
> +    dsa_device = mmap(NULL, DSA_WQ_SIZE, PROT_WRITE,
> +                      MAP_SHARED | MAP_POPULATE, fd, 0);
> +    close(fd);
> +    if (dsa_device == MAP_FAILED) {
> +        fprintf(stderr, "mmap failed with errno = %d.\n", errno);
> +        return MAP_FAILED;
> +    }
> +    return dsa_device;
> +}
> +
> +/**
> + * @brief Initializes a DSA device structure.
> + *
> + * @param instance A pointer to the DSA device.
> + * @param work_queue  A pointer to the DSA work queue.
> + */
> +static void
> +dsa_device_init(struct dsa_device *instance,
> +                void *dsa_work_queue)
> +{
> +    instance->work_queue = dsa_work_queue;
> +}
> +
> +/**
> + * @brief Cleans up a DSA device structure.
> + *
> + * @param instance A pointer to the DSA device to cleanup.
> + */
> +static void
> +dsa_device_cleanup(struct dsa_device *instance)
> +{
> +    if (instance->work_queue != MAP_FAILED) {
> +        munmap(instance->work_queue, DSA_WQ_SIZE);
> +    }
> +}
> +
> +/**
> + * @brief Initializes a DSA device group.
> + *
> + * @param group A pointer to the DSA device group.
> + * @param num_dsa_devices The number of DSA devices this group will have.
> + *
> + * @return Zero if successful, non-zero otherwise.
> + */
> +static int
> +dsa_device_group_init(struct dsa_device_group *group,
> +                      const char *dsa_parameter)

The documentation doesn't match the signature. This happens in other
places as well, please review all of them.

> +{
> +    if (dsa_parameter == NULL || strlen(dsa_parameter) == 0) {
> +        return 0;
> +    }
> +
> +    int ret = 0;
> +    char *local_dsa_parameter = g_strdup(dsa_parameter);
> +    const char *dsa_path[MAX_DSA_DEVICES];
> +    int num_dsa_devices = 0;
> +    char delim[2] = " ";

So we're using space separated strings. Let's document this in this file
and also on the migration parameter documentation.

> +
> +    char *current_dsa_path = strtok(local_dsa_parameter, delim);
> +
> +    while (current_dsa_path != NULL) {
> +        dsa_path[num_dsa_devices++] = current_dsa_path;
> +        if (num_dsa_devices == MAX_DSA_DEVICES) {
> +            break;
> +        }
> +        current_dsa_path = strtok(NULL, delim);
> +    }
> +
> +    group->dsa_devices =
> +        malloc(sizeof(struct dsa_device) * num_dsa_devices);

Use g_new0() here.

> +    group->num_dsa_devices = num_dsa_devices;
> +    group->index = 0;
> +
> +    group->running = false;
> +    qemu_mutex_init(&group->task_queue_lock);
> +    qemu_cond_init(&group->task_queue_cond);
> +    QSIMPLEQ_INIT(&group->task_queue);
> +
> +    void *dsa_wq = MAP_FAILED;
> +    for (int i = 0; i < num_dsa_devices; i++) {
> +        dsa_wq = map_dsa_device(dsa_path[i]);
> +        if (dsa_wq == MAP_FAILED) {
> +            fprintf(stderr, "map_dsa_device failed MAP_FAILED, "
> +                    "using simulation.\n");

What does "using simulation" means? And how are doing it by returning -1
from this function?

> +            ret = -1;

What about the memory for group->dsa_devices in the failure case? We
should either free it here or make sure the client code calls the
cleanup routines.

> +            goto exit;
> +        }
> +        dsa_device_init(&dsa_group.dsa_devices[i], dsa_wq);
> +    }
> +
> +exit:
> +    g_free(local_dsa_parameter);
> +    return ret;
> +}
> +
> +/**
> + * @brief Starts a DSA device group.
> + *
> + * @param group A pointer to the DSA device group.
> + * @param dsa_path An array of DSA device path.
> + * @param num_dsa_devices The number of DSA devices in the device group.
> + */
> +static void
> +dsa_device_group_start(struct dsa_device_group *group)
> +{
> +    group->running = true;
> +}
> +
> +/**
> + * @brief Stops a DSA device group.
> + *
> + * @param group A pointer to the DSA device group.
> + */
> +__attribute__((unused))
> +static void
> +dsa_device_group_stop(struct dsa_device_group *group)
> +{
> +    group->running = false;
> +}
> +
> +/**
> + * @brief Cleans up a DSA device group.
> + *
> + * @param group A pointer to the DSA device group.
> + */
> +static void
> +dsa_device_group_cleanup(struct dsa_device_group *group)
> +{
> +    if (!group->dsa_devices) {
> +        return;
> +    }
> +    for (int i = 0; i < group->num_dsa_devices; i++) {
> +        dsa_device_cleanup(&group->dsa_devices[i]);
> +    }
> +    free(group->dsa_devices);
> +    group->dsa_devices = NULL;
> +
> +    qemu_mutex_destroy(&group->task_queue_lock);
> +    qemu_cond_destroy(&group->task_queue_cond);
> +}
> +
> +/**
> + * @brief Returns the next available DSA device in the group.
> + *
> + * @param group A pointer to the DSA device group.
> + *
> + * @return struct dsa_device* A pointer to the next available DSA device
> + *         in the group.
> + */
> +__attribute__((unused))
> +static struct dsa_device *
> +dsa_device_group_get_next_device(struct dsa_device_group *group)
> +{
> +    if (group->num_dsa_devices == 0) {
> +        return NULL;
> +    }
> +    uint32_t current = qatomic_fetch_inc(&group->index);

The name "index" alone feels a bit opaque. Is there a more
representative name we could give it?

> +    current %= group->num_dsa_devices;
> +    return &group->dsa_devices[current];
> +}
> +
> +/**
> + * @brief Check if DSA is running.
> + *
> + * @return True if DSA is running, otherwise false.
> + */
> +bool dsa_is_running(void)
> +{
> +    return false;
> +}
> +
> +static void
> +dsa_globals_init(void)
> +{
> +    max_retry_count = UINT64_MAX;
> +}
> +
> +/**
> + * @brief Initializes DSA devices.
> + *
> + * @param dsa_parameter A list of DSA device path from migration parameter.
> + * @return int Zero if successful, otherwise non zero.
> + */
> +int dsa_init(const char *dsa_parameter)
> +{
> +    dsa_globals_init();
> +
> +    return dsa_device_group_init(&dsa_group, dsa_parameter);
> +}
> +
> +/**
> + * @brief Start logic to enable using DSA.
> + *
> + */
> +void dsa_start(void)
> +{
> +    if (dsa_group.num_dsa_devices == 0) {
> +        return;
> +    }
> +    if (dsa_group.running) {
> +        return;
> +    }
> +    dsa_device_group_start(&dsa_group);
> +}
> +
> +/**
> + * @brief Stop logic to clean up DSA by halting the device group and cleaning up
> + * the completion thread.
> + *
> + */
> +void dsa_stop(void)
> +{
> +    struct dsa_device_group *group = &dsa_group;
> +
> +    if (!group->running) {
> +        return;
> +    }
> +}
> +
> +/**
> + * @brief Clean up system resources created for DSA offloading.
> + *        This function is called during QEMU process teardown.
> + *
> + */
> +void dsa_cleanup(void)
> +{
> +    dsa_stop();
> +    dsa_device_group_cleanup(&dsa_group);
> +}
> +
> +#else
> +
> +bool dsa_is_running(void)
> +{
> +    return false;
> +}
> +
> +int dsa_init(const char *dsa_parameter)
> +{
> +    fprintf(stderr, "Intel Data Streaming Accelerator is not supported "
> +                    "on this platform.\n");
> +    return -1;

Nothing checks this later in the series and we end up trying to start a
migration when we shouldn't. Fixing the configure step would already
stop this happening, but make sure you check this anyway and abort the
migration.

> +}
> +
> +void dsa_start(void) {}
> +
> +void dsa_stop(void) {}
> +
> +void dsa_cleanup(void) {}
> +
> +#endif

These could all be in the header.

> +
> diff --git a/util/meson.build b/util/meson.build
> index c2322ef6e7..f7277c5e9b 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -85,6 +85,7 @@ if have_block or have_ga
>  endif
>  if have_block
>    util_ss.add(files('aio-wait.c'))
> +  util_ss.add(files('dsa.c'))

I find it clearer to add the file conditionally under CONFIG_DSA_OPT
here and remove the ifdef from the C file. I'm not sure if we have any
guidelines for this, so up to you.

>    util_ss.add(files('buffer.c'))
>    util_ss.add(files('bufferiszero.c'))
>    util_ss.add(files('hbitmap.c'))


  reply	other threads:[~2023-12-11 21:29 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-14  5:40 [PATCH v2 00/20] Use Intel DSA accelerator to offload zero page checking in multifd live migration Hao Xiang
2023-11-14  5:40 ` [PATCH v2 01/20] multifd: Add capability to enable/disable zero_page Hao Xiang
2023-11-16 15:15   ` Fabiano Rosas
2023-11-14  5:40 ` [PATCH v2 02/20] multifd: Support for zero pages transmission Hao Xiang
2023-11-14  5:40 ` [PATCH v2 03/20] multifd: Zero " Hao Xiang
2023-12-18  2:43   ` Wang, Lei
2023-11-14  5:40 ` [PATCH v2 04/20] So we use multifd to transmit zero pages Hao Xiang
2023-11-16 15:14   ` Fabiano Rosas
2024-01-23  4:28     ` [External] " Hao Xiang
2024-01-25 21:55       ` Hao Xiang
2024-01-25 23:14         ` Fabiano Rosas
2024-01-25 23:46           ` Hao Xiang
2023-11-14  5:40 ` [PATCH v2 05/20] meson: Introduce new instruction set enqcmd to the build system Hao Xiang
2023-12-11 15:41   ` Fabiano Rosas
2023-12-16  0:26     ` [External] " Hao Xiang
2023-11-14  5:40 ` [PATCH v2 06/20] util/dsa: Add dependency idxd Hao Xiang
2023-11-14  5:40 ` [PATCH v2 07/20] util/dsa: Implement DSA device start and stop logic Hao Xiang
2023-12-11 21:28   ` Fabiano Rosas [this message]
2023-12-19  6:41     ` [External] " Hao Xiang
2023-12-19 13:18       ` Fabiano Rosas
2023-12-27  6:00         ` Hao Xiang
2023-11-14  5:40 ` [PATCH v2 08/20] util/dsa: Implement DSA task enqueue and dequeue Hao Xiang
2023-12-12 16:10   ` Fabiano Rosas
2023-12-27  0:07     ` [External] " Hao Xiang
2023-11-14  5:40 ` [PATCH v2 09/20] util/dsa: Implement DSA task asynchronous completion thread model Hao Xiang
2023-12-12 19:36   ` Fabiano Rosas
2023-12-18  3:11   ` Wang, Lei
2023-12-18 18:57     ` [External] " Hao Xiang
2023-12-19  1:33       ` Wang, Lei
2023-12-19  5:12         ` Hao Xiang
2023-11-14  5:40 ` [PATCH v2 10/20] util/dsa: Implement zero page checking in DSA task Hao Xiang
2023-11-14  5:40 ` [PATCH v2 11/20] util/dsa: Implement DSA task asynchronous submission and wait for completion Hao Xiang
2023-12-13 14:01   ` Fabiano Rosas
2023-12-27  6:26     ` [External] " Hao Xiang
2023-11-14  5:40 ` [PATCH v2 12/20] migration/multifd: Add new migration option for multifd DSA offloading Hao Xiang
2023-12-11 19:44   ` Fabiano Rosas
2023-12-18 18:34     ` [External] " Hao Xiang
2023-12-18  3:12   ` Wang, Lei
2023-11-14  5:40 ` [PATCH v2 13/20] migration/multifd: Prepare to introduce DSA acceleration on the multifd path Hao Xiang
2023-12-18  3:20   ` Wang, Lei
2023-11-14  5:40 ` [PATCH v2 14/20] migration/multifd: Enable DSA offloading in multifd sender path Hao Xiang
2023-11-14  5:40 ` [PATCH v2 15/20] migration/multifd: Add test hook to set normal page ratio Hao Xiang
2023-11-14  5:40 ` [PATCH v2 16/20] migration/multifd: Enable set normal page ratio test hook in multifd Hao Xiang
2023-11-14  5:40 ` [PATCH v2 17/20] migration/multifd: Add migration option set packet size Hao Xiang
2023-11-14  5:40 ` [PATCH v2 18/20] migration/multifd: Enable set packet size migration option Hao Xiang
2023-12-13 17:33   ` Fabiano Rosas
2024-01-03 20:04     ` [External] " Hao Xiang
2023-11-14  5:40 ` [PATCH v2 19/20] util/dsa: Add unit test coverage for Intel DSA task submission and completion Hao Xiang
2023-11-14  5:40 ` [PATCH v2 20/20] migration/multifd: Add integration tests for multifd with Intel DSA offloading Hao Xiang
2023-11-15 17:43 ` [PATCH v2 00/20] Use Intel DSA accelerator to offload zero page checking in multifd live migration Elena Ufimtseva
2023-11-15 19:37   ` [External] " Hao Xiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h6koh1is.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=bryan.zhang@bytedance.com \
    --cc=hao.xiang@bytedance.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.