Re: [PATCH] qga: implement 'guest-get-nvidia-smi' command

From: Markus Armbruster <armbru@redhat.com>
To: "João Vilaça" <machadovilaca@gmail.com>
Cc: qemu-devel@nongnu.org,  kkostiuk@redhat.com
Subject: Re: [PATCH] qga: implement 'guest-get-nvidia-smi' command
Date: Wed, 01 Apr 2026 13:25:16 +0200
Message-ID: <871pgy3otf.fsf@pond.sub.org>
In-Reply-To: <20260331110135.92883-1-machadovilaca@gmail.com> ("João Vilaça"'s message of "Tue, 31 Mar 2026 12:01:35 +0100")

You neglected to cc: me.  We recommend using scripts/get_maintainer.pl
to find all the maintainers, then using common sense to trim.

João Vilaça <machadovilaca@gmail.com> writes:

The commit message needs to explain why and how the patch is useful.

For a patch adding a command to qemu-ga, like this one, it needs to
state the command's anticipated use cases.

> ---
>  qga/commands-posix.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  qga/commands-win32.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  qga/qapi-schema.json | 59 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 187 insertions(+)
>
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index 837be51c40..631a8a9ee6 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -1415,3 +1415,67 @@ GuestLoadAverage *qmp_guest_get_load(Error **errp)
>      return ret;
>  }
>  #endif
> +
> +GuestNvidiaGpuList *qmp_guest_get_nvidia_smi(Error **errp)
> +{
> +    const gchar *argv[] = {
> +        "nvidia-smi",
> +        "--query-gpu=index,name,driver_version,"
> +            "temperature.gpu,utilization.gpu,utilization.memory,"
> +            "memory.total,memory.free,memory.used",
> +        "--format=csv,noheader,nounits",
> +        NULL
> +    };
> +    g_autofree gchar *stdout_buf = NULL;
> +    g_autofree gchar *stderr_buf = NULL;
> +    gint exit_status;
> +    GError *gerr = NULL;
> +    GuestNvidiaGpuList *head = NULL, **tail = &head;
> +
> +    if (!g_spawn_sync(NULL, (gchar **)argv, NULL,
> +                      G_SPAWN_SEARCH_PATH,
> +                      NULL, NULL,
> +                      &stdout_buf, &stderr_buf,
> +                      &exit_status, &gerr)) {

Why not ga_run_command()?  Hmm, it throws away the command's output on
success.

Kostiantyn, should ga_run_command() be rewritten on top of
g_spawn_sync()?

> +        error_setg(errp, "failed to run nvidia-smi: %s", gerr->message);
> +        g_error_free(gerr);
> +        return NULL;
> +    }
> +
> +    if (exit_status != 0) {
> +        error_setg(errp, "nvidia-smi failed (exit %d): %s",
> +                   exit_status, stderr_buf ? stderr_buf : "unknown error");

This is wrong if @stderr_buf can contain newlines.  qapi/error.h:

 * The resulting message should be a single phrase, with no newline or
 * trailing punctuation.

However, there's similar misuse elsewhere in this file.  Oh well, carry
on.
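
For illustration only, a minimal sketch of how the message could be kept
to a single phrase: take just the first non-empty line of the captured
stderr and drop trailing blanks and periods.  The helper name
first_line_dup() is hypothetical, and the sketch uses plain libc rather
than GLib so that it stands alone; in qga code the GLib string helpers
would be the natural choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of QEMU: return a newly allocated copy
 * of the first non-empty line of @text, without leading whitespace and
 * without trailing blanks or periods.  Returns NULL if there is no
 * such line or allocation fails.  The caller frees the result.
 */
static char *first_line_dup(const char *text)
{
    const char *start, *end;
    size_t len;
    char *copy;

    assert(text != NULL);

    /* Skip leading whitespace, including blank lines */
    start = text + strspn(text, " \t\r\n");
    /* The line ends at the first line terminator or at end of string */
    end = start + strcspn(start, "\r\n");
    /* Drop trailing blanks and trailing periods */
    while (end > start &&
           (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '.')) {
        end--;
    }
    if (end == start) {
        return NULL;
    }

    len = (size_t)(end - start);
    copy = malloc(len + 1);
    if (!copy) {
        return NULL;
    }
    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}
```

The result (or a fallback such as "unknown error" when it is NULL) could
then be passed to error_setg() in place of the raw buffer.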

> +        return NULL;
> +    }
> +

I figure the command's output is some form of CSV.  Can you point to its
documentation?

> +    gchar **lines = g_strsplit(stdout_buf, "\n", -1);
> +    for (int i = 0; lines[i] != NULL; i++) {
> +        gchar *line = g_strstrip(lines[i]);
> +        if (*line == '\0') {
> +            continue;

Silently ignore empty lines.  Okay.

> +        }
> +
> +        gchar **f = g_strsplit(line, ", ", 9);

If the line has more than 9 values, they are squashed into the last one.

> +        if (g_strv_length(f) < 9) {
> +            g_strfreev(f);
> +            continue;

Silently ignore lines with fewer than 9 values.

> +        }
> +
> +        GuestNvidiaGpu *gpu     = g_new0(GuestNvidiaGpu, 1);
> +        gpu->index              = (int)g_ascii_strtoll(f[0], NULL, 10);

If the value doesn't parse as decimal signed integer, we silently assume
zero.

If it parses, we silently ignore any text following it.

If it parses a value outside the 64-bit signed range, we silently assume
that range's largest or smallest value.
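
A minimal sketch of a stricter conversion that reports all three cases
instead of hiding them.  The name parse_int_strict() is hypothetical and
the code is plain libc so that it stands alone; QEMU's own helpers in
cutils (qemu_strtoi() and friends) may well cover the same ground and
would be preferable if they fit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of QEMU: convert @s to int.
 * Fails when @s has no digits, has text after the number, or holds a
 * value that does not fit in int.  Leading whitespace is accepted,
 * because strtoll() skips it.  On failure, *@result is left untouched.
 */
static bool parse_int_strict(const char *s, int *result)
{
    char *end;
    long long val;

    assert(s != NULL && result != NULL);

    errno = 0;
    val = strtoll(s, &end, 10);
    if (end == s) {
        return false;           /* nothing parsed, e.g. "" or "[N/A]" */
    }
    if (*end != '\0') {
        return false;           /* trailing text, e.g. "42 MiB" */
    }
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
        return false;           /* out of range for long long or int */
    }
    *result = (int)val;
    return true;
}
```

A caller would then skip the line, or fail the command, when any field
is rejected, rather than reporting a zero that was never measured.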

> +        gpu->name               = g_strdup(g_strstrip(f[1]));
> +        gpu->driver_version     = g_strdup(g_strstrip(f[2]));
> +        gpu->temperature        = (int)g_ascii_strtoll(f[3], NULL, 10);
> +        gpu->gpu_utilization    = (int)g_ascii_strtoll(f[4], NULL, 10);
> +        gpu->memory_utilization = (int)g_ascii_strtoll(f[5], NULL, 10);
> +        gpu->memory_total       = (int)g_ascii_strtoll(f[6], NULL, 10);
> +        gpu->memory_free        = (int)g_ascii_strtoll(f[7], NULL, 10);
> +        gpu->memory_used        = (int)g_ascii_strtoll(f[8], NULL, 10);
> +
> +        QAPI_LIST_APPEND(tail, gpu);
> +        g_strfreev(f);
> +    }
> +    g_strfreev(lines);

Are you *sure* this is robust enough?

Please consider a bog-standard LL(1) parser.
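
To make the suggestion concrete, here is a rough sketch of a hand-written
predictive parser with one character of lookahead, in the spirit of
LL(1).  It rests on assumptions I have not checked against nvidia-smi's
documentation: fields are separated by ',', never contain ',' or quotes,
and may carry surrounding blanks.  All names (Parser, parse_field(),
parse_record(), MAX_FIELD_LEN) are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELD_LEN 128       /* arbitrary limit for this sketch */

typedef struct {
    const char *pos;            /* *pos is the single lookahead character */
} Parser;

static void skip_blanks(Parser *p)
{
    while (*p->pos == ' ' || *p->pos == '\t') {
        p->pos++;
    }
}

/* field := blanks, then one or more characters other than ',' and '\n' */
static bool parse_field(Parser *p, char *out, size_t out_size)
{
    size_t len = 0;

    skip_blanks(p);
    while (*p->pos != '\0' && *p->pos != ',' && *p->pos != '\n') {
        if (len + 1 >= out_size) {
            return false;       /* field too long */
        }
        out[len++] = *p->pos++;
    }
    while (len > 0 && (out[len - 1] == ' ' || out[len - 1] == '\t')) {
        len--;                  /* trim trailing blanks */
    }
    out[len] = '\0';
    return len > 0;             /* reject empty fields */
}

/*
 * record := field (',' field)* end-of-string, with exactly @nfields
 * fields.  @line must not include the line terminator.
 */
static bool parse_record(const char *line,
                         char fields[][MAX_FIELD_LEN], size_t nfields)
{
    Parser p = { line };
    size_t i;

    assert(line != NULL && nfields > 0);

    for (i = 0; i < nfields; i++) {
        if (i > 0) {
            if (*p.pos != ',') {
                return false;   /* too few fields */
            }
            p.pos++;
        }
        if (!parse_field(&p, fields[i], MAX_FIELD_LEN)) {
            return false;
        }
    }
    return *p.pos == '\0';      /* leftover input means too many fields */
}
```

Unlike g_strsplit() with a limit, this rejects both short and long
records instead of dropping or squashing them, so the caller can decide
whether a malformed line is an error.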

> +
> +    return head;
> +}
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index c0bf3467bd..a78d5b71f5 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -2764,3 +2764,67 @@ GuestNetworkRouteList *qmp_guest_network_get_route(Error **errp)
>      g_hash_table_destroy(interface_metric_cache);
>      return head;
>  }
> +
> +GuestNvidiaGpuList *qmp_guest_get_nvidia_smi(Error **errp)
> +{
> +    const gchar *argv[] = {
> +        "nvidia-smi",
> +        "--query-gpu=index,name,driver_version,"
> +            "temperature.gpu,utilization.gpu,utilization.memory,"
> +            "memory.total,memory.free,memory.used",
> +        "--format=csv,noheader,nounits",
> +        NULL
> +    };
> +    g_autofree gchar *stdout_buf = NULL;
> +    g_autofree gchar *stderr_buf = NULL;
> +    gint exit_status;
> +    GError *gerr = NULL;
> +    GuestNvidiaGpuList *head = NULL, **tail = &head;
> +
> +    if (!g_spawn_sync(NULL, (gchar **)argv, NULL,
> +                      G_SPAWN_SEARCH_PATH,
> +                      NULL, NULL,
> +                      &stdout_buf, &stderr_buf,
> +                      &exit_status, &gerr)) {
> +        error_setg(errp, "failed to run nvidia-smi: %s", gerr->message);
> +        g_error_free(gerr);
> +        return NULL;
> +    }
> +
> +    if (exit_status != 0) {
> +        error_setg(errp, "nvidia-smi failed (exit %d): %s",
> +                   exit_status, stderr_buf ? stderr_buf : "unknown error");
> +        return NULL;
> +    }
> +
> +    gchar **lines = g_strsplit(stdout_buf, "\n", -1);
> +    for (int i = 0; lines[i] != NULL; i++) {
> +        gchar *line = g_strstrip(lines[i]);
> +        if (*line == '\0') {
> +            continue;
> +        }
> +
> +        gchar **f = g_strsplit(line, ", ", 9);
> +        if (g_strv_length(f) < 9) {
> +            g_strfreev(f);
> +            continue;
> +        }
> +
> +        GuestNvidiaGpu *gpu     = g_new0(GuestNvidiaGpu, 1);
> +        gpu->index              = (int)g_ascii_strtoll(f[0], NULL, 10);
> +        gpu->name               = g_strdup(g_strstrip(f[1]));
> +        gpu->driver_version     = g_strdup(g_strstrip(f[2]));
> +        gpu->temperature        = (int)g_ascii_strtoll(f[3], NULL, 10);
> +        gpu->gpu_utilization    = (int)g_ascii_strtoll(f[4], NULL, 10);
> +        gpu->memory_utilization = (int)g_ascii_strtoll(f[5], NULL, 10);
> +        gpu->memory_total       = (int)g_ascii_strtoll(f[6], NULL, 10);
> +        gpu->memory_free        = (int)g_ascii_strtoll(f[7], NULL, 10);
> +        gpu->memory_used        = (int)g_ascii_strtoll(f[8], NULL, 10);
> +
> +        QAPI_LIST_APPEND(tail, gpu);
> +        g_strfreev(f);
> +    }
> +    g_strfreev(lines);
> +
> +    return head;
> +}

Duplicates the output parser.  Can you avoid that?

> diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
> index c57bc9a02f..8abbf71131 100644
> --- a/qga/qapi-schema.json
> +++ b/qga/qapi-schema.json
> @@ -1876,6 +1876,65 @@
>    'if': { 'any': ['CONFIG_WIN32', 'CONFIG_GETLOADAVG'] }
>  }
>  
> +##
> +# @GuestNvidiaGpu:
> +#
> +# Information about a single NVIDIA GPU as reported by nvidia-smi.
> +#
> +# @index: GPU index (0-based), stable across reboots for a given
> +#         hardware slot

Please format like

   # @index: GPU index (0-based), stable across reboots for a given
   #     hardware slot

> +#
> +# @name: GPU product name (e.g. "NVIDIA A100-SXM4-80GB")
> +#
> +# @driver-version: version string of the installed NVIDIA driver
> +#
> +# @temperature: GPU die temperature in degrees Celsius
> +#
> +# @gpu-utilization: GPU compute engine utilization in percent (0-100)

(0-100) feels redundant.

> +#
> +# @memory-utilization: GPU memory controller utilization in percent
> +#                      (0-100)

Likewise.

> +#
> +# @memory-total: total framebuffer memory in MiB
> +#
> +# @memory-free: free framebuffer memory in MiB
> +#
> +# @memory-used: used framebuffer memory in MiB
> +#
> +# Since: 10.1

11.1 most likely.

> +##
> +{ 'struct': 'GuestNvidiaGpu',
> +  'data': {
> +      'index':              'int',
> +      'name':               'str',
> +      'driver-version':     'str',
> +      'temperature':        'int',
> +      'gpu-utilization':    'int',
> +      'memory-utilization': 'int',
> +      'memory-total':       'int',
> +      'memory-free':        'int',
> +      'memory-used':        'int'
> +  }
> +}
> +
> +##
> +# @guest-get-nvidia-smi:
> +#
> +# Query NVIDIA GPU information via nvidia-smi inside the guest.
> +#
> +# Returns one @GuestNvidiaGpu entry per physical GPU (or MIG instance)
> +# detected by the NVIDIA driver.
> +#
> +# Errors:
> +#   - If nvidia-smi is not installed or not found in $PATH
> +#   - If nvidia-smi exits with a non-zero status (e.g. no NVIDIA
> +#     device)

We commonly mention the error kind like this:

   #   - If nvidia-smi is not installed or not found in $PATH,
   #     GenericError
   #   - If nvidia-smi exits with a non-zero status (e.g. no NVIDIA
   #     device), GenericError

> +#
> +# Since: 10.1
> +##
> +{ 'command': 'guest-get-nvidia-smi',
> +  'returns': ['GuestNvidiaGpu'] }
> +
>  ##
>  # @GuestNetworkRoute:
>  #

Why not use existing guest-exec, and leave the parsing to the client?
