From: David Gibson <david@gibson.dropbear.id.au>
To: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: xiaoguangrong.eric@gmail.com, mst@redhat.com,
aneesh.kumar@linux.ibm.com, qemu-devel@nongnu.org,
kvm-ppc@vger.kernel.org, groug@kaod.org,
shivaprasadbhat@gmail.com, qemu-ppc@nongnu.org,
bharata@linux.vnet.ibm.com, imammedo@redhat.com,
ehabkost@redhat.com
Subject: Re: [RFC PATCH v4] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
Date: Thu, 27 May 2021 02:01:30 +0000 [thread overview]
Message-ID: <YK79eipSGbzTLMlp@yekko> (raw)
In-Reply-To: <20210526030057.395092-1-vaibhav@linux.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 16848 bytes --]
On Wed, May 26, 2021 at 08:30:57AM +0530, Vaibhav Jain wrote:
> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
> spapr nvdimms. This enables guest to fetch performance stats[2] like
> expected life of an nvdimm ('MemLife ') etc and display them to the
> user. Linux kernel support for fetching these performance stats and
> exposing them to the user-space was done via [3].
>
> The hcall semantics mandate that each nvdimm performance stats is
> uniquely identied by a 8-byte ascii string encoded as an unsigned
> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a
> 8-byte unsigned integer. These performance-stats are exchanged with
> the guest in via a guest allocated buffer called
> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains
> a header descibed by 'struct papr_scm_perf_stats' followed by an array
> of performance-stats described by 'struct papr_scm_perf_stat'. The
> hypervisor is expected to validate the rr-buffer header and then based
> on the request copy the needed performance-stats to the array of
> 'struct papr_scm_perf_stat' following the header.
>
> The patch proposes a new function h_scm_performance_stats() that
> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
> validity of the rr-buffer header via scm_perf_check_rr_buffer() it
> proceeds to fill the rr-buffer with requested performance-stats. The
> value of individual stats is retrived from individual accessor
> function for the stat which are indexed in the array
> 'nvdimm_perf_stats'.
>
> References:
> [1] "Hypercall Op-codes (hcalls)"
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
> [2] Sysfs attribute documentation for papr_scm
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Looking fairly good now. A few trivial nits, one remaining bug (I
suspect) and of course the fact that you don't yet have any
non-example stats to return.
> ---
> Changelog
>
> Since RFC-v3:
> * Introduced consistent types 'perf_stat_id/val' for handling
> perf-stats in code rather than switching between u64 and char[8]. [
> David Gibson ]
> * Removed a redundant temporary variable 'stat_id' usage from
> nvdimm_stat_getval() [ David Gibson ]
> * Move the tests for buffer sizes around to a a central place in
> h_scm_performance_stats() [ David Gibson ]
> * Update the allocation size/type of rr-buffer in
> h_scm_performance_stats [ David Gibson, Groug ]
> * Reworked the loop for generating canned-response in
> h_scm_performance_stats().
> * Reworked logic for returning an unknown stat-id in h_scm_performance_stats().
>
> Since RFC-v2:
> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the
> minimum size buffer needed to return all supported performance
> stats. Value for this is derived from size of array 'nvdimm_perf_stats'
> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
> rr-buffer size from a guest. Value for this is derived from hard
> limit of SCM_STATS_MAX_STATS.
> * Updated scm_perf_check_rr_buffer() to add a check for max size of
> rr-buffer. [David]
> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for
> min size of rr-buffer [David]
> * Updated h_scm_performance_stats() to have a single allocation for
> rr-buffer and copy it back to guest memory in a single shot. [David]
>
> Since RFC-v1:
> * Removed empty lines from code [ David ]
> * Updated struct papr_scm_perf_stat to use uint64_t for
> statistic_id.
> * Added a hard limit on max number of stats requested to 255 [ David ]
> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header
> size [ David ]
> * Removed a redundant check from nvdimm_stat_getval() [ David ]
> * Removed a redundant call to address_space_access_valid() in
> scm_perf_check_rr_buffer() [ David ]
> * Instead of allocating a minimum size local buffer, allocate a max
> possible size local rr-buffer. [ David ]
> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
> * Updated h_scm_performance_stats() to use a canned-response method
> for simplifying num_stats==0 case [ David ].
> ---
> hw/ppc/spapr_nvdimm.c | 219 +++++++++++++++++++++++++++++++++++++++++
> include/hw/ppc/spapr.h | 22 ++++-
> 2 files changed, 240 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index 252204e25f..81ab2811bc 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -35,6 +35,23 @@
> /* SCM device is unable to persist memory contents */
> #define PAPR_PMEM_UNARMED PPC_BIT(0)
>
> +/* Maximum number of stats that we can return back in a single stat request */
> +#define SCM_STATS_MAX_STATS 255
> +
> +/* Given _numstats_ != 0 calculate the size of RR buffer required */
> +#define SCM_STATS_RR_BUFFER_SIZE(_numstats_) \
> + (sizeof(struct papr_scm_perf_stats) + \
> + sizeof(struct papr_scm_perf_stat) * \
> + (_numstats_))
> +
> +/* Maximum possible output buffer to fulfill a perf-stats request */
> +#define SCM_STATS_MAX_OUTPUT_BUFFER \
> + (SCM_STATS_RR_BUFFER_SIZE(SCM_STATS_MAX_STATS))
> +
> +/* Minimum output buffer size needed to return all perf_stats except noopstat */
> +#define SCM_STATS_MIN_OUTPUT_BUFFER (SCM_STATS_RR_BUFFER_SIZE\
Nit: linebreak before the macro body, like MAX_OUTPUT_BUFFER would be
more readable.
> + (ARRAY_SIZE(nvdimm_perf_stats) - 1))
> +
> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> uint64_t size, Error **errp)
> {
> @@ -502,6 +519,207 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> return H_SUCCESS;
> }
>
> +static int perf_stat_noop(SpaprDrc *drc, perf_stat_id unused,
> + perf_stat_val *val)
> +{
> + *val = 0;
> + return H_SUCCESS;
> +}
> +
> +static int perf_stat_memlife(SpaprDrc *drc, perf_stat_id unused,
> + perf_stat_val *val)
> +{
> + /* Assume full life available of an NVDIMM right now */
> + *val = 100;
> + return H_SUCCESS;
> +}
> +
> +/*
> + * Holds all supported performance stats accessors. Each performance-statistic
> + * is uniquely identified by a 8-byte ascii string for example: 'MemLife '
> + * which indicate in percentage how much usage life of an nvdimm is remaining.
> + * 'NoopStat' which is primarily used to test support for retriving performance
> + * stats and also to replace unknown stats present in the rr-buffer.
> + *
> + */
> +static const struct {
> + perf_stat_id stat_id;
> + int (*stat_getval)(SpaprDrc *drc, perf_stat_id id, perf_stat_val *val);
> +} nvdimm_perf_stats[] = {
> + { "NoopStat", perf_stat_noop},
> + { "MemLife ", perf_stat_memlife},
> +};
> +
> +/*
> + * Given a nvdimm drc and stat-name return its value. In case given stat-name
> + * isnt supported then return H_PARTIAL.
> + */
> +static int nvdimm_stat_getval(SpaprDrc *drc, perf_stat_id id,
> + perf_stat_val *val)
> +{
> + int index;
> +
> + *val = 0;
> +
> + /* Lookup the stats-id in the nvdimm_perf_stats table */
> + for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
> + if (!memcmp(&nvdimm_perf_stats[index].stat_id, id,
> + sizeof(perf_stat_id))) {
> + return nvdimm_perf_stats[index].stat_getval(drc, id, val);
> + }
> + }
> + return H_PARTIAL;
> +}
> +
> +/*
> + * Given a request & result buffer header verify its contents. Also
> + * buffer-size and number of stats requested are within our expected
> + * sane bounds.
> + */
> +static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
> + hwaddr addr, size_t size,
> + uint32_t *num_stats)
> +{
> + size_t expected_buffsize;
> +
> + /* Verify the header eyecather and version */
> + if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
> + sizeof(header->eye_catcher))) {
> + return H_BAD_DATA;
> + }
> + if (be32_to_cpu(header->stats_version) != 0x1) {
> + return H_NOT_AVAILABLE;
> + }
> +
> + /* verify that rr buffer has enough space */
> + *num_stats = be32_to_cpu(header->num_statistics);
> + if (*num_stats > SCM_STATS_MAX_STATS) {
> + /* Too many stats requested */
> + return H_P3;
> + }
> +
> + expected_buffsize = *num_stats ?
> + SCM_STATS_RR_BUFFER_SIZE(*num_stats) : SCM_STATS_MIN_OUTPUT_BUFFER;
> + if (size < expected_buffsize) {
> + return H_P3;
> + }
> +
> + return H_SUCCESS;
> +}
> +
> +/*
> + * For a given DRC index (R3) return one ore more performance stats of an nvdimm
> + * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of
> + * given 'size' (R5). The rr-buffer consists of a header described by
> + * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and
> + * 'num_statistics' fields. This is followed by an array of
> + * 'struct papr_scm_perf_stat'. Based on the request type the writes the
> + * performance into the array of 'struct papr_scm_perf_stat' embedded inside
> + * the rr-buffer provided by the guest.
> + * Special cases handled are:
> + * 'size' == 0 : Return the maximum possible size of rr-buffer
> + * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats
> + *
> + * In case there was an error fetching a specific stats (e.g stat-id unknown or
> + * any other error) then return the stat-id in R4 and also replace its stat
> + * entry in rr-buffer with 'NoopStat'
> + */
> +static target_ulong h_scm_performance_stats(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + SpaprDrc *drc = spapr_drc_by_index(args[0]);
> + const hwaddr addr = args[1];
> + size_t size = args[2];
> + g_autofree struct papr_scm_perf_stats *perfstats = NULL;
> + struct papr_scm_perf_stat *stats;
> + perf_stat_val stat_val;
> + uint32_t num_stats;
> + int index;
> + long rc;
> +
> + /* Ensure that the drc is valid & is valid PMEM dimm and is plugged in */
> + if (!drc || !drc->dev ||
> + spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> + return H_PARAMETER;
> + }
> +
> + /* Guest requested max buffer size for output buffer */
Nit: comment reads confusingly with the name "MIN_OUTPUT_BUFFER",
clarify please.
> + if (size == 0) {
> + args[0] = SCM_STATS_MIN_OUTPUT_BUFFER;
> + return H_SUCCESS;
> + }
> +
> + /* verify size is enough to hold rr-buffer */
> + if (size < sizeof(struct papr_scm_perf_stats)) {
> + return H_BAD_DATA;
> + }
> +
> + if (size > SCM_STATS_MAX_OUTPUT_BUFFER) {
> + return H_P3;
> + }
> +
> + /* allocate enough buffer space locally for holding max stats */
Nit: comment is no longer accurate now that you allocate based on the
given size.
> + perfstats = g_try_malloc0(size);
> + if (!perfstats) {
> + return H_NO_MEM;
> + }
> + stats = &perfstats->scm_statistics[0];
> +
> + /* Read and verify rr-buffer */
> + rc = address_space_read(&address_space_memory, addr, MEMTXATTRS_UNSPECIFIED,
> + perfstats, size);
> + if (rc != MEMTX_OK) {
> + return H_PRIVILEGE;
> + }
> + rc = scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats);
> + if (rc != H_SUCCESS) {
> + return rc;
> + }
> +
> + /* when returning all stats generate a canned response first */
> + if (num_stats == 0) {
> + /* Ignore 'noopstat' when generating canned response */
> + for (num_stats = 0; num_stats < ARRAY_SIZE(nvdimm_perf_stats) - 1;
> + ++num_stats) {
> + memcpy(&stats[num_stats].statistic_id,
> + nvdimm_perf_stats[num_stats + 1].stat_id,
> + sizeof(perf_stat_id));
> + }
> + }
> +
> + /* Populate the rr-buffer with stat values */
> + for (args[0] = 0, index = 0; index < num_stats; ++index) {
> + rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val);
> +
> + /* On error add noop stat to rr buffer & save last inval stat-id */
> + if (rc != H_SUCCESS) {
> + if (!args[0]) {
> + memcpy(&args[0], stats[index].statistic_id,
> + sizeof(perf_stat_id));
args[0] is effectively a return register, stored in *host native*
order, so I think you need a be64_to_cpu() here, though I guess you
need to check how PAPR says you're supposed to load the stat ids into
a register.
> + }
> + memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id,
> + sizeof(perf_stat_id));
> + }
> + /* Caller expects stat values in BE encoding */
> + stats[index].statistic_value = cpu_to_be64(stat_val);
> + }
> +
> + /* Update and copy the local rr-buffer back to guest */
> + perfstats->num_statistics = cpu_to_be32(num_stats);
> + g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER);
> + rc = address_space_write(&address_space_memory, addr,
> + MEMTXATTRS_UNSPECIFIED, perfstats, size);
> +
> + if (rc != MEMTX_OK) {
> + return H_PRIVILEGE;
> + }
> +
> + /* Check if there was a failure in fetching any stat */
> + return args[0] ? H_PARTIAL : H_SUCCESS;
> +}
> +
> static void spapr_scm_register_types(void)
> {
> /* qemu/scm specific hcalls */
> @@ -511,6 +729,7 @@ static void spapr_scm_register_types(void)
> spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
> spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
> + spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats);
> }
>
> type_init(spapr_scm_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index bbf817af46..9e093b3a00 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -326,6 +326,7 @@ struct SpaprMachineState {
> #define H_P8 -61
> #define H_P9 -62
> #define H_OVERLAP -68
> +#define H_BAD_DATA -70
> #define H_UNSUPPORTED_FLAG -256
> #define H_MULTI_THREADS_ACTIVE -9005
>
> @@ -539,8 +540,9 @@ struct SpaprMachineState {
> #define H_SCM_UNBIND_MEM 0x3F0
> #define H_SCM_UNBIND_ALL 0x3FC
> #define H_SCM_HEALTH 0x400
> +#define H_SCM_PERFORMANCE_STATS 0x418
>
> -#define MAX_HCALL_OPCODE H_SCM_HEALTH
> +#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
>
> /* The hcalls above are standardized in PAPR and implemented by pHyp
> * as well.
> @@ -793,6 +795,24 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE)
> DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION,
> TYPE_SPAPR_IOMMU_MEMORY_REGION)
>
> +/* Defs and structs exchanged with guest when reporting drc perf stats */
> +#define SCM_STATS_EYECATCHER "SCMSTATS"
> +
> +typedef char perf_stat_id[8];
> +typedef uint64_t perf_stat_val;
> +
> +struct QEMU_PACKED papr_scm_perf_stat {
> + perf_stat_id statistic_id;
> + perf_stat_val statistic_value;
> +};
> +
> +struct QEMU_PACKED papr_scm_perf_stats {
> + uint8_t eye_catcher[8]; /* Should be “SCMSTATS” */
> + uint32_t stats_version; /* Should be 0x01 */
> + uint32_t num_statistics; /* Number of stats following */
> + struct papr_scm_perf_stat scm_statistics[]; /* Performance matrics */
> +};
> +
> struct SpaprTceTable {
> DeviceState parent;
> uint32_t liobn;
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
WARNING: multiple messages have this Message-ID (diff)
From: David Gibson <david@gibson.dropbear.id.au>
To: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: xiaoguangrong.eric@gmail.com, mst@redhat.com,
aneesh.kumar@linux.ibm.com, qemu-devel@nongnu.org,
kvm-ppc@vger.kernel.org, groug@kaod.org,
shivaprasadbhat@gmail.com, qemu-ppc@nongnu.org,
bharata@linux.vnet.ibm.com, imammedo@redhat.com,
ehabkost@redhat.com
Subject: Re: [RFC PATCH v4] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
Date: Thu, 27 May 2021 12:01:30 +1000 [thread overview]
Message-ID: <YK79eipSGbzTLMlp@yekko> (raw)
In-Reply-To: <20210526030057.395092-1-vaibhav@linux.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 16848 bytes --]
On Wed, May 26, 2021 at 08:30:57AM +0530, Vaibhav Jain wrote:
> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
> spapr nvdimms. This enables guest to fetch performance stats[2] like
> expected life of an nvdimm ('MemLife ') etc and display them to the
> user. Linux kernel support for fetching these performance stats and
> exposing them to the user-space was done via [3].
>
> The hcall semantics mandate that each nvdimm performance stats is
> uniquely identied by a 8-byte ascii string encoded as an unsigned
> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a
> 8-byte unsigned integer. These performance-stats are exchanged with
> the guest in via a guest allocated buffer called
> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains
> a header descibed by 'struct papr_scm_perf_stats' followed by an array
> of performance-stats described by 'struct papr_scm_perf_stat'. The
> hypervisor is expected to validate the rr-buffer header and then based
> on the request copy the needed performance-stats to the array of
> 'struct papr_scm_perf_stat' following the header.
>
> The patch proposes a new function h_scm_performance_stats() that
> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
> validity of the rr-buffer header via scm_perf_check_rr_buffer() it
> proceeds to fill the rr-buffer with requested performance-stats. The
> value of individual stats is retrived from individual accessor
> function for the stat which are indexed in the array
> 'nvdimm_perf_stats'.
>
> References:
> [1] "Hypercall Op-codes (hcalls)"
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
> [2] Sysfs attribute documentation for papr_scm
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Looking fairly good now. A few trivial nits, one remaining bug (I
suspect) and of course the fact that you don't yet have any
non-example stats to return.
> ---
> Changelog
>
> Since RFC-v3:
> * Introduced consistent types 'perf_stat_id/val' for handling
> perf-stats in code rather than switching between u64 and char[8]. [
> David Gibson ]
> * Removed a redundant temporary variable 'stat_id' usage from
> nvdimm_stat_getval() [ David Gibson ]
> * Move the tests for buffer sizes around to a a central place in
> h_scm_performance_stats() [ David Gibson ]
> * Update the allocation size/type of rr-buffer in
> h_scm_performance_stats [ David Gibson, Groug ]
> * Reworked the loop for generating canned-response in
> h_scm_performance_stats().
> * Reworked logic for returning an unknown stat-id in h_scm_performance_stats().
>
> Since RFC-v2:
> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the
> minimum size buffer needed to return all supported performance
> stats. Value for this is derived from size of array 'nvdimm_perf_stats'
> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
> rr-buffer size from a guest. Value for this is derived from hard
> limit of SCM_STATS_MAX_STATS.
> * Updated scm_perf_check_rr_buffer() to add a check for max size of
> rr-buffer. [David]
> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for
> min size of rr-buffer [David]
> * Updated h_scm_performance_stats() to have a single allocation for
> rr-buffer and copy it back to guest memory in a single shot. [David]
>
> Since RFC-v1:
> * Removed empty lines from code [ David ]
> * Updated struct papr_scm_perf_stat to use uint64_t for
> statistic_id.
> * Added a hard limit on max number of stats requested to 255 [ David ]
> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header
> size [ David ]
> * Removed a redundant check from nvdimm_stat_getval() [ David ]
> * Removed a redundant call to address_space_access_valid() in
> scm_perf_check_rr_buffer() [ David ]
> * Instead of allocating a minimum size local buffer, allocate a max
> possible size local rr-buffer. [ David ]
> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
> * Updated h_scm_performance_stats() to use a canned-response method
> for simplifying num_stats==0 case [ David ].
> ---
> hw/ppc/spapr_nvdimm.c | 219 +++++++++++++++++++++++++++++++++++++++++
> include/hw/ppc/spapr.h | 22 ++++-
> 2 files changed, 240 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index 252204e25f..81ab2811bc 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -35,6 +35,23 @@
> /* SCM device is unable to persist memory contents */
> #define PAPR_PMEM_UNARMED PPC_BIT(0)
>
> +/* Maximum number of stats that we can return back in a single stat request */
> +#define SCM_STATS_MAX_STATS 255
> +
> +/* Given _numstats_ != 0 calculate the size of RR buffer required */
> +#define SCM_STATS_RR_BUFFER_SIZE(_numstats_) \
> + (sizeof(struct papr_scm_perf_stats) + \
> + sizeof(struct papr_scm_perf_stat) * \
> + (_numstats_))
> +
> +/* Maximum possible output buffer to fulfill a perf-stats request */
> +#define SCM_STATS_MAX_OUTPUT_BUFFER \
> + (SCM_STATS_RR_BUFFER_SIZE(SCM_STATS_MAX_STATS))
> +
> +/* Minimum output buffer size needed to return all perf_stats except noopstat */
> +#define SCM_STATS_MIN_OUTPUT_BUFFER (SCM_STATS_RR_BUFFER_SIZE\
Nit: linebreak before the macro body, like MAX_OUTPUT_BUFFER would be
more readable.
> + (ARRAY_SIZE(nvdimm_perf_stats) - 1))
> +
> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> uint64_t size, Error **errp)
> {
> @@ -502,6 +519,207 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> return H_SUCCESS;
> }
>
> +static int perf_stat_noop(SpaprDrc *drc, perf_stat_id unused,
> + perf_stat_val *val)
> +{
> + *val = 0;
> + return H_SUCCESS;
> +}
> +
> +static int perf_stat_memlife(SpaprDrc *drc, perf_stat_id unused,
> + perf_stat_val *val)
> +{
> + /* Assume full life available of an NVDIMM right now */
> + *val = 100;
> + return H_SUCCESS;
> +}
> +
> +/*
> + * Holds all supported performance stats accessors. Each performance-statistic
> + * is uniquely identified by a 8-byte ascii string for example: 'MemLife '
> + * which indicate in percentage how much usage life of an nvdimm is remaining.
> + * 'NoopStat' which is primarily used to test support for retriving performance
> + * stats and also to replace unknown stats present in the rr-buffer.
> + *
> + */
> +static const struct {
> + perf_stat_id stat_id;
> + int (*stat_getval)(SpaprDrc *drc, perf_stat_id id, perf_stat_val *val);
> +} nvdimm_perf_stats[] = {
> + { "NoopStat", perf_stat_noop},
> + { "MemLife ", perf_stat_memlife},
> +};
> +
> +/*
> + * Given a nvdimm drc and stat-name return its value. In case given stat-name
> + * isnt supported then return H_PARTIAL.
> + */
> +static int nvdimm_stat_getval(SpaprDrc *drc, perf_stat_id id,
> + perf_stat_val *val)
> +{
> + int index;
> +
> + *val = 0;
> +
> + /* Lookup the stats-id in the nvdimm_perf_stats table */
> + for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
> + if (!memcmp(&nvdimm_perf_stats[index].stat_id, id,
> + sizeof(perf_stat_id))) {
> + return nvdimm_perf_stats[index].stat_getval(drc, id, val);
> + }
> + }
> + return H_PARTIAL;
> +}
> +
> +/*
> + * Given a request & result buffer header verify its contents. Also
> + * buffer-size and number of stats requested are within our expected
> + * sane bounds.
> + */
> +static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
> + hwaddr addr, size_t size,
> + uint32_t *num_stats)
> +{
> + size_t expected_buffsize;
> +
> + /* Verify the header eyecather and version */
> + if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
> + sizeof(header->eye_catcher))) {
> + return H_BAD_DATA;
> + }
> + if (be32_to_cpu(header->stats_version) != 0x1) {
> + return H_NOT_AVAILABLE;
> + }
> +
> + /* verify that rr buffer has enough space */
> + *num_stats = be32_to_cpu(header->num_statistics);
> + if (*num_stats > SCM_STATS_MAX_STATS) {
> + /* Too many stats requested */
> + return H_P3;
> + }
> +
> + expected_buffsize = *num_stats ?
> + SCM_STATS_RR_BUFFER_SIZE(*num_stats) : SCM_STATS_MIN_OUTPUT_BUFFER;
> + if (size < expected_buffsize) {
> + return H_P3;
> + }
> +
> + return H_SUCCESS;
> +}
> +
> +/*
> + * For a given DRC index (R3) return one ore more performance stats of an nvdimm
> + * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of
> + * given 'size' (R5). The rr-buffer consists of a header described by
> + * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and
> + * 'num_statistics' fields. This is followed by an array of
> + * 'struct papr_scm_perf_stat'. Based on the request type the writes the
> + * performance into the array of 'struct papr_scm_perf_stat' embedded inside
> + * the rr-buffer provided by the guest.
> + * Special cases handled are:
> + * 'size' == 0 : Return the maximum possible size of rr-buffer
> + * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats
> + *
> + * In case there was an error fetching a specific stats (e.g stat-id unknown or
> + * any other error) then return the stat-id in R4 and also replace its stat
> + * entry in rr-buffer with 'NoopStat'
> + */
> +static target_ulong h_scm_performance_stats(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + SpaprDrc *drc = spapr_drc_by_index(args[0]);
> + const hwaddr addr = args[1];
> + size_t size = args[2];
> + g_autofree struct papr_scm_perf_stats *perfstats = NULL;
> + struct papr_scm_perf_stat *stats;
> + perf_stat_val stat_val;
> + uint32_t num_stats;
> + int index;
> + long rc;
> +
> + /* Ensure that the drc is valid & is valid PMEM dimm and is plugged in */
> + if (!drc || !drc->dev ||
> + spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> + return H_PARAMETER;
> + }
> +
> + /* Guest requested max buffer size for output buffer */
Nit: comment reads confusingly with the name "MIN_OUTPUT_BUFFER",
clarify please.
> + if (size == 0) {
> + args[0] = SCM_STATS_MIN_OUTPUT_BUFFER;
> + return H_SUCCESS;
> + }
> +
> + /* verify size is enough to hold rr-buffer */
> + if (size < sizeof(struct papr_scm_perf_stats)) {
> + return H_BAD_DATA;
> + }
> +
> + if (size > SCM_STATS_MAX_OUTPUT_BUFFER) {
> + return H_P3;
> + }
> +
> + /* allocate enough buffer space locally for holding max stats */
Nit: comment is no longer accurate now that you allocate based on the
given size.
> + perfstats = g_try_malloc0(size);
> + if (!perfstats) {
> + return H_NO_MEM;
> + }
> + stats = &perfstats->scm_statistics[0];
> +
> + /* Read and verify rr-buffer */
> + rc = address_space_read(&address_space_memory, addr, MEMTXATTRS_UNSPECIFIED,
> + perfstats, size);
> + if (rc != MEMTX_OK) {
> + return H_PRIVILEGE;
> + }
> + rc = scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats);
> + if (rc != H_SUCCESS) {
> + return rc;
> + }
> +
> + /* when returning all stats generate a canned response first */
> + if (num_stats == 0) {
> + /* Ignore 'noopstat' when generating canned response */
> + for (num_stats = 0; num_stats < ARRAY_SIZE(nvdimm_perf_stats) - 1;
> + ++num_stats) {
> + memcpy(&stats[num_stats].statistic_id,
> + nvdimm_perf_stats[num_stats + 1].stat_id,
> + sizeof(perf_stat_id));
> + }
> + }
> +
> + /* Populate the rr-buffer with stat values */
> + for (args[0] = 0, index = 0; index < num_stats; ++index) {
> + rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val);
> +
> + /* On error add noop stat to rr buffer & save last inval stat-id */
> + if (rc != H_SUCCESS) {
> + if (!args[0]) {
> + memcpy(&args[0], stats[index].statistic_id,
> + sizeof(perf_stat_id));
args[0] is effectively a return register, stored in *host native*
order, so I think you need a be64_to_cpu() here, though I guess you
need to check how PAPR says you're supposed to load the stat ids into
a register.
> + }
> + memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id,
> + sizeof(perf_stat_id));
> + }
> + /* Caller expects stat values in BE encoding */
> + stats[index].statistic_value = cpu_to_be64(stat_val);
> + }
> +
> + /* Update and copy the local rr-buffer back to guest */
> + perfstats->num_statistics = cpu_to_be32(num_stats);
> + g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER);
> + rc = address_space_write(&address_space_memory, addr,
> + MEMTXATTRS_UNSPECIFIED, perfstats, size);
> +
> + if (rc != MEMTX_OK) {
> + return H_PRIVILEGE;
> + }
> +
> + /* Check if there was a failure in fetching any stat */
> + return args[0] ? H_PARTIAL : H_SUCCESS;
> +}
> +
> static void spapr_scm_register_types(void)
> {
> /* qemu/scm specific hcalls */
> @@ -511,6 +729,7 @@ static void spapr_scm_register_types(void)
> spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
> spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
> + spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats);
> }
>
> type_init(spapr_scm_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index bbf817af46..9e093b3a00 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -326,6 +326,7 @@ struct SpaprMachineState {
> #define H_P8 -61
> #define H_P9 -62
> #define H_OVERLAP -68
> +#define H_BAD_DATA -70
> #define H_UNSUPPORTED_FLAG -256
> #define H_MULTI_THREADS_ACTIVE -9005
>
> @@ -539,8 +540,9 @@ struct SpaprMachineState {
> #define H_SCM_UNBIND_MEM 0x3F0
> #define H_SCM_UNBIND_ALL 0x3FC
> #define H_SCM_HEALTH 0x400
> +#define H_SCM_PERFORMANCE_STATS 0x418
>
> -#define MAX_HCALL_OPCODE H_SCM_HEALTH
> +#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
>
> /* The hcalls above are standardized in PAPR and implemented by pHyp
> * as well.
> @@ -793,6 +795,24 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE)
> DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION,
> TYPE_SPAPR_IOMMU_MEMORY_REGION)
>
> +/* Defs and structs exchanged with guest when reporting drc perf stats */
> +#define SCM_STATS_EYECATCHER "SCMSTATS"
> +
> +typedef char perf_stat_id[8];
> +typedef uint64_t perf_stat_val;
> +
> +struct QEMU_PACKED papr_scm_perf_stat {
> + perf_stat_id statistic_id;
> + perf_stat_val statistic_value;
> +};
> +
> +struct QEMU_PACKED papr_scm_perf_stats {
> + uint8_t eye_catcher[8]; /* Should be “SCMSTATS” */
> + uint32_t stats_version; /* Should be 0x01 */
> + uint32_t num_statistics; /* Number of stats following */
> + struct papr_scm_perf_stat scm_statistics[]; /* Performance matrics */
> +};
> +
> struct SpaprTceTable {
> DeviceState parent;
> uint32_t liobn;
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2021-05-27 2:01 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-26 3:00 [RFC PATCH v4] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall Vaibhav Jain
2021-05-26 3:12 ` Vaibhav Jain
2021-05-27 2:01 ` David Gibson [this message]
2021-05-27 2:01 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YK79eipSGbzTLMlp@yekko \
--to=david@gibson.dropbear.id.au \
--cc=aneesh.kumar@linux.ibm.com \
--cc=bharata@linux.vnet.ibm.com \
--cc=ehabkost@redhat.com \
--cc=groug@kaod.org \
--cc=imammedo@redhat.com \
--cc=kvm-ppc@vger.kernel.org \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
--cc=shivaprasadbhat@gmail.com \
--cc=vaibhav@linux.ibm.com \
--cc=xiaoguangrong.eric@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.