* [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
@ 2021-05-15 7:37 Vaibhav Jain
2021-05-17 6:23 ` David Gibson
0 siblings, 1 reply; 6+ messages in thread
From: Vaibhav Jain @ 2021-05-15 7:37 UTC (permalink / raw)
To: qemu-devel, kvm-ppc, qemu-ppc, david, mst, imammedo,
xiaoguangrong.eric
Cc: ehabkost, aneesh.kumar, groug, shivaprasadbhat, bharata,
Vaibhav Jain
Add support for H_SCM_PERFORMANCE_STATS described at [1] for
spapr nvdimms. This enables the guest to fetch performance stats [2], such
as the expected remaining life of an nvdimm ('MemLife '), and display them
to the user. Linux kernel support for fetching these performance stats and
exposing them to user-space was added via [3].
The hcall semantics mandate that each nvdimm performance stat is
uniquely identified by an 8-byte ascii string encoded as an unsigned
integer (e.g. 'MemLife ' == 0x4D656D4C69666520) and that its value be an
8-byte unsigned integer. These performance-stats are exchanged with
the guest via a guest-allocated buffer called
'requestAndResultBuffer', or rr-buffer for short. This buffer contains
a header described by 'struct papr_scm_perf_stats' followed by an array
of performance-stats described by 'struct papr_scm_perf_stat'. The
hypervisor is expected to validate the rr-buffer header and then, based
on the request, copy the needed performance-stats into the array of
'struct papr_scm_perf_stat' following the header.
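For illustration only (not part of the hcall interface or of this patch),
here is a minimal user-space sketch of how a guest could lay out an
rr-buffer that requests the single stat 'MemLife '. The struct layouts
mirror the definitions this patch adds to include/hw/ppc/spapr.h (which
are QEMU_PACKED there); htonl() is used as a stand-in for cpu_to_be32():

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <arpa/inet.h>

  struct perf_stat {
      uint64_t statistic_id;      /* 8-byte ascii stat name, e.g. 'MemLife ' */
      uint64_t statistic_value;   /* filled in by the hypervisor */
  };

  struct perf_stats_header {
      uint8_t  eye_catcher[8];    /* "SCMSTATS" */
      uint32_t stats_version;     /* 0x1, big-endian */
      uint32_t num_statistics;    /* number of stats requested, big-endian */
  };

  struct rr_buffer_one {          /* header followed by one stat entry */
      struct perf_stats_header header;
      struct perf_stat stat;
  };

  int main(void)
  {
      struct rr_buffer_one buf = { 0 };

      memcpy(buf.header.eye_catcher, "SCMSTATS", 8);
      buf.header.stats_version = htonl(1);
      buf.header.num_statistics = htonl(1);
      /* The stat-id is the raw 8-byte ascii string, not a numeric value */
      memcpy(&buf.stat.statistic_id, "MemLife ", 8);

      /* 'size' to pass in R5; the buffer address itself goes in R4 */
      printf("rr-buffer size: %zu bytes\n", sizeof(buf));
      return 0;
  }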
The patch proposes a new function h_scm_performance_stats() that
services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
validity of the rr-buffer header via scm_perf_check_rr_buffer() it
proceeds to fill the rr-buffer with the requested performance-stats. The
value of each stat is retrieved from a per-stat accessor function;
these accessors are indexed in the array 'nvdimm_perf_stats'.
References:
[1] "Hypercall Op-codes (hcalls)"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
[2] Sysfs attribute documentation for papr_scm
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
[3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog
Since RFC-v2:
* s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ that's the
minimum buffer size needed to return all supported performance
stats. Its value is derived from the size of the array 'nvdimm_perf_stats'.
* Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate the maximum supported
rr-buffer size from a guest. Its value is derived from the hard
limit SCM_STATS_MAX_STATS.
* Updated scm_perf_check_rr_buffer() to add a check for max size of
rr-buffer. [David]
* Updated scm_perf_check_rr_buffer() to remove a redundant check for
the min size of the rr-buffer [David]
* Updated h_scm_performance_stats() to have a single allocation for
rr-buffer and copy it back to guest memory in a single shot. [David]
Since RFC-v1:
* Removed empty lines from code [ David ]
* Updated struct papr_scm_perf_stat to use uint64_t for
statistic_id.
* Added a hard limit on max number of stats requested to 255 [ David ]
* Updated scm_perf_check_rr_buffer() to check for rr-buffer header
size [ David ]
* Removed a redundant check from nvdimm_stat_getval() [ David ]
* Removed a redundant call to address_space_access_valid() in
scm_perf_check_rr_buffer() [ David ]
* Instead of allocating a minimum size local buffer, allocate a max
possible size local rr-buffer. [ David ]
* Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
* Updated h_scm_performance_stats() to use a canned-response method
for simplifying num_stats==0 case [ David ].
---
hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++
include/hw/ppc/spapr.h | 19 +++-
2 files changed, 240 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index 252204e25f..287326b9c0 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -35,6 +35,19 @@
/* SCM device is unable to persist memory contents */
#define PAPR_PMEM_UNARMED PPC_BIT(0)
+/* Minimum output buffer size needed to return all nvdimm_perf_stats */
+#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
+ sizeof(struct papr_scm_perf_stat) * \
+ ARRAY_SIZE(nvdimm_perf_stats))
+
+/* Maximum number of stats that we can return back in a single stat request */
+#define SCM_STATS_MAX_STATS 255
+
+/* Maximum possible output buffer to fulfill a perf-stats request */
+#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
+ sizeof(struct papr_scm_perf_stat) * \
+ SCM_STATS_MAX_STATS)
+
bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
uint64_t size, Error **errp)
{
@@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
return H_SUCCESS;
}
+static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
+{
+ *val = 0;
+ return H_SUCCESS;
+}
+
+static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
+{
+ /* Assume the full life of the NVDIMM is available right now */
+ *val = 100;
+ return H_SUCCESS;
+}
+
+/*
+ * Holds all supported performance stats accessors. Each performance-statistic
+ * is uniquely identified by an 8-byte ascii string, for example: 'MemLife ',
+ * which indicates in percentage how much usage life of an nvdimm is remaining.
+ * 'NoopStat' is primarily used to test support for retrieving performance
+ * stats and also to replace unknown stats present in the rr-buffer.
+ *
+ */
+static const struct {
+ char stat_id[8];
+ int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val);
+} nvdimm_perf_stats[] = {
+ { "NoopStat", perf_stat_noop},
+ { "MemLife ", perf_stat_memlife},
+};
+
+/*
+ * Given an nvdimm drc and a stat-id return its value. In case the given
+ * stat-id isn't supported then return H_PARTIAL.
+ */
+static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val)
+{
+ int index;
+ char stat_id[8];
+
+ /* Copy the stat-id into a byte array so it can be compared as a string */
+ memcpy(&stat_id[0], &id, 8);
+ *val = 0;
+
+ /* Lookup the stats-id in the nvdimm_perf_stats table */
+ for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
+ if (memcmp(nvdimm_perf_stats[index].stat_id, &stat_id[0], 8) == 0) {
+ return nvdimm_perf_stats[index].stat_getval(drc, id, val);
+ }
+ }
+ return H_PARTIAL;
+}
+
+/*
+ * Given a request-and-result buffer header verify its contents. Also verify
+ * that the buffer-size and number of stats requested are within our expected
+ * sane bounds.
+ */
+static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
+ hwaddr addr, size_t size,
+ uint32_t *num_stats)
+{
+ size_t expected_buffsize;
+
+ /* Verify the header eyecatcher and version */
+ if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
+ sizeof(header->eye_catcher))) {
+ return H_BAD_DATA;
+ }
+ if (be32_to_cpu(header->stats_version) != 0x1) {
+ return H_NOT_AVAILABLE;
+ }
+
+ /* verify that rr buffer has enough space */
+ *num_stats = be32_to_cpu(header->num_statistics);
+ if (*num_stats == 0) { /* Return all stats */
+ expected_buffsize = SCM_STATS_MIN_OUTPUT_BUFFER;
+ } else if (*num_stats > SCM_STATS_MAX_STATS) {
+ /* Too many stats requested */
+ return H_P3;
+ } else { /* Return a subset of stats */
+ expected_buffsize = sizeof(struct papr_scm_perf_stats) +
+ (*num_stats) * sizeof(struct papr_scm_perf_stat);
+ }
+
+ if (size < expected_buffsize || size > SCM_STATS_MAX_OUTPUT_BUFFER) {
+ return H_P3;
+ }
+
+ return H_SUCCESS;
+}
+
+/*
+ * For a given DRC index (R3) return one or more performance stats of an nvdimm
+ * device in a guest-allocated request-and-result buffer (rr-buffer) (R4) of
+ * given 'size' (R5). The rr-buffer consists of a header described by
+ * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and
+ * 'num_statistics' fields. This is followed by an array of
+ * 'struct papr_scm_perf_stat'. Based on the request type the hcall writes the
+ * performance stats into the array of 'struct papr_scm_perf_stat' embedded
+ * inside the rr-buffer provided by the guest.
+ * Special cases handled are:
+ * 'size' == 0 : Return the maximum possible size of rr-buffer
+ * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats
+ *
+ * In case there was an error fetching a specific stat (e.g. stat-id unknown or
+ * any other error) then return the stat-id in R4 and also replace its stat
+ * entry in the rr-buffer with 'NoopStat'.
+ */
+static target_ulong h_scm_performance_stats(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+ SpaprDrc *drc = spapr_drc_by_index(args[0]);
+ const hwaddr addr = args[1];
+ size_t size = args[2];
+ struct papr_scm_perf_stats *perfstats;
+ struct papr_scm_perf_stat *stats;
+ uint64_t invalid_stat = 0, stat_val;
+ MemTxResult result;
+ uint32_t num_stats;
+ unsigned long rc;
+ int index;
+
+ /* Ensure that the drc is valid, is a PMEM dimm and is plugged in */
+ if (!drc || !drc->dev ||
+ spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+ return H_PARAMETER;
+ }
+
+ /* Guest requested max buffer size for output buffer */
+ if (size == 0) {
+ args[0] = SCM_STATS_MAX_OUTPUT_BUFFER;
+ return H_SUCCESS;
+ }
+
+ /* verify size is enough to hold rr-buffer header */
+ if (size < sizeof(struct papr_scm_perf_stats)) {
+ return H_BAD_DATA;
+ }
+
+ /* allocate enough buffer space locally for holding max stats */
+ perfstats = g_try_malloc0(SCM_STATS_MAX_OUTPUT_BUFFER);
+ if (!perfstats) {
+ return H_NO_MEM;
+ }
+
+ /* Read and verify rr-buffer header */
+ result = address_space_read(&address_space_memory, addr,
+ MEMTXATTRS_UNSPECIFIED, perfstats,
+ sizeof(struct papr_scm_perf_stats));
+ rc = (result == MEMTX_OK) ?
+ scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) :
+ H_PRIVILEGE;
+ if (rc != H_SUCCESS) {
+ g_free(perfstats);
+ return rc;
+ }
+
+ stats = &perfstats->scm_statistics[0];
+ /* when returning all stats generate a canned response first */
+ if (num_stats == 0) {
+ for (index = 1; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
+ memcpy(&stats[index - 1].statistic_id,
+ &nvdimm_perf_stats[index].stat_id, 8);
+ }
+ num_stats = ARRAY_SIZE(nvdimm_perf_stats) - 1;
+ } else {
+ /* copy the rr-buffer from the guest memory */
+ result = address_space_read(&address_space_memory,
+ addr + sizeof(struct papr_scm_perf_stats),
+ MEMTXATTRS_UNSPECIFIED, stats,
+ size - sizeof(struct papr_scm_perf_stats));
+ if (result != MEMTX_OK) {
+ g_free(perfstats);
+ return H_PRIVILEGE;
+ }
+ }
+
+ for (index = 0; index < num_stats; ++index) {
+ rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val);
+
+ /* On error add noop stat to rr-buffer & save first invalid stat-id */
+ if (rc != H_SUCCESS) {
+ if (!invalid_stat) {
+ invalid_stat = stats[index].statistic_id;
+ }
+ memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id, 8);
+ }
+ /* Caller expects stat values in BE encoding */
+ stats[index].statistic_value = cpu_to_be64(stat_val);
+ }
+
+ /* Update and copy the local rr-buffer back to guest */
+ perfstats->num_statistics = cpu_to_be32(num_stats);
+ g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER);
+ result = address_space_write(&address_space_memory, addr,
+ MEMTXATTRS_UNSPECIFIED, perfstats, size);
+
+ /* Cleanup the stats buffer */
+ g_free(perfstats);
+ if (result) {
+ return H_PRIVILEGE;
+ }
+ /* Check if there was a failure in fetching any stat */
+ args[0] = invalid_stat;
+ return invalid_stat ? H_PARTIAL : H_SUCCESS;
+}
+
static void spapr_scm_register_types(void)
{
/* qemu/scm specific hcalls */
@@ -511,6 +732,7 @@ static void spapr_scm_register_types(void)
spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
+ spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats);
}
type_init(spapr_scm_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index d2b5a9bdf9..6f3353b4ea 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -326,6 +326,7 @@ struct SpaprMachineState {
#define H_P8 -61
#define H_P9 -62
#define H_OVERLAP -68
+#define H_BAD_DATA -70
#define H_UNSUPPORTED_FLAG -256
#define H_MULTI_THREADS_ACTIVE -9005
@@ -539,8 +540,9 @@ struct SpaprMachineState {
#define H_SCM_UNBIND_MEM 0x3F0
#define H_SCM_UNBIND_ALL 0x3FC
#define H_SCM_HEALTH 0x400
+#define H_SCM_PERFORMANCE_STATS 0x418
-#define MAX_HCALL_OPCODE H_SCM_HEALTH
+#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
/* The hcalls above are standardized in PAPR and implemented by pHyp
* as well.
@@ -787,6 +789,21 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE)
DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION,
TYPE_SPAPR_IOMMU_MEMORY_REGION)
+/* Defs and structs exchanged with guest when reporting drc perf stats */
+#define SCM_STATS_EYECATCHER "SCMSTATS"
+
+struct QEMU_PACKED papr_scm_perf_stat {
+ uint64_t statistic_id;
+ uint64_t statistic_value;
+};
+
+struct QEMU_PACKED papr_scm_perf_stats {
+ uint8_t eye_catcher[8]; /* Should be "SCMSTATS" */
+ uint32_t stats_version; /* Should be 0x01 */
+ uint32_t num_statistics; /* Number of stats following */
+ struct papr_scm_perf_stat scm_statistics[]; /* Performance metrics */
+};
+
struct SpaprTceTable {
DeviceState parent;
uint32_t liobn;
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall 2021-05-15 7:37 [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall Vaibhav Jain @ 2021-05-17 6:23 ` David Gibson 2021-05-17 7:55 ` Greg Kurz 2021-05-22 3:31 ` Vaibhav Jain 0 siblings, 2 replies; 6+ messages in thread From: David Gibson @ 2021-05-17 6:23 UTC (permalink / raw) To: Vaibhav Jain Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc, groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost [-- Attachment #1: Type: text/plain, Size: 17531 bytes --] On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote: > Add support for H_SCM_PERFORMANCE_STATS described at [1] for > spapr nvdimms. This enables guest to fetch performance stats[2] like > expected life of an nvdimm ('MemLife ') etc and display them to the > user. Linux kernel support for fetching these performance stats and > exposing them to the user-space was done via [3]. > > The hcall semantics mandate that each nvdimm performance stats is > uniquely identied by a 8-byte ascii string encoded as an unsigned > integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a > 8-byte unsigned integer. These performance-stats are exchanged with > the guest in via a guest allocated buffer called > 'requestAndResultBuffer' or rr-buffer for short. This buffer contains > a header descibed by 'struct papr_scm_perf_stats' followed by an array > of performance-stats described by 'struct papr_scm_perf_stat'. The > hypervisor is expected to validate the rr-buffer header and then based > on the request copy the needed performance-stats to the array of > 'struct papr_scm_perf_stat' following the header. > > The patch proposes a new function h_scm_performance_stats() that > services the H_SCM_PERFORMANCE_STATS hcall. After verifying the > validity of the rr-buffer header via scm_perf_check_rr_buffer() it > proceeds to fill the rr-buffer with requested performance-stats. The > value of individual stats is retrived from individual accessor > function for the stat which are indexed in the array > 'nvdimm_perf_stats'. > > References: > [1] "Hypercall Op-codes (hcalls)" > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269 > [2] Sysfs attribute documentation for papr_scm > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36 > [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP" > https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com > > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > --- > Changelog > > Since RFC-v2: > * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the > minimum size buffer needed to return all supported performance > stats. Value for this is derived from size of array 'nvdimm_perf_stats' > * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported > rr-buffer size from a guest. Value for this is derived from hard > limit of SCM_STATS_MAX_STATS. > * Updated scm_perf_check_rr_buffer() to add a check for max size of > rr-buffer. [David] > * Updated scm_perf_check_rr_buffer() to removed a reduntant check for > min size of rr-buffer [David] > * Updated h_scm_performance_stats() to have a single allocation for > rr-buffer and copy it back to guest memory in a single shot. 
[David] > > Since RFC-v1: > * Removed empty lines from code [ David ] > * Updated struct papr_scm_perf_stat to use uint64_t for > statistic_id. > * Added a hard limit on max number of stats requested to 255 [ David ] > * Updated scm_perf_check_rr_buffer() to check for rr-buffer header > size [ David ] > * Removed a redundant check from nvdimm_stat_getval() [ David ] > * Removed a redundant call to address_space_access_valid() in > scm_perf_check_rr_buffer() [ David ] > * Instead of allocating a minimum size local buffer, allocate a max > possible size local rr-buffer. [ David ] > * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ] > * Updated h_scm_performance_stats() to use a canned-response method > for simplifying num_stats==0 case [ David ]. > --- > hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++ > include/hw/ppc/spapr.h | 19 +++- > 2 files changed, 240 insertions(+), 1 deletion(-) > > diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c > index 252204e25f..287326b9c0 100644 > --- a/hw/ppc/spapr_nvdimm.c > +++ b/hw/ppc/spapr_nvdimm.c > @@ -35,6 +35,19 @@ > /* SCM device is unable to persist memory contents */ > #define PAPR_PMEM_UNARMED PPC_BIT(0) > > +/* Minimum output buffer size needed to return all nvdimm_perf_stats */ > +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \ > + sizeof(struct papr_scm_perf_stat) * \ > + ARRAY_SIZE(nvdimm_perf_stats)) MIN_OUTPUT_BUFFER is a better name, but still not great. I think we can get rid of this define completely in a neat way, though. See below. > +/* Maximum number of stats that we can return back in a single stat request */ > +#define SCM_STATS_MAX_STATS 255 Although it's technically allowed, I'm still not convinced there's actually any reason to allow the user to request the same stat over and over. > +/* Maximum possible output buffer to fulfill a perf-stats request */ > +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \ > + sizeof(struct papr_scm_perf_stat) * \ > + SCM_STATS_MAX_STATS) > + > bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm, > uint64_t size, Error **errp) > { > @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr, > return H_SUCCESS; > } > > +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val) > +{ > + *val = 0; > + return H_SUCCESS; > +} > + > +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val) > +{ > + /* Assume full life available of an NVDIMM right now */ > + *val = 100; > + return H_SUCCESS; > +} > + > +/* > + * Holds all supported performance stats accessors. Each performance-statistic > + * is uniquely identified by a 8-byte ascii string for example: 'MemLife ' > + * which indicate in percentage how much usage life of an nvdimm is remaining. > + * 'NoopStat' which is primarily used to test support for retriving performance > + * stats and also to replace unknown stats present in the rr-buffer. > + * > + */ > +static const struct { > + char stat_id[8]; So using a char[] here, but a uint64_t in the request structure makes it pretty hard to follow if you're doing the right thing w.r.t. endianness, especially since you effectively memcmp() directly between u64s and char[]s. You really want to use a consistent type for the ids. 
> + int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val); > +} nvdimm_perf_stats[] = { > + { "NoopStat", perf_stat_noop}, > + { "MemLife ", perf_stat_memlife}, > +}; > + > +/* > + * Given a nvdimm drc and stat-name return its value. In case given stat-name > + * isnt supported then return H_PARTIAL. > + */ > +static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val) > +{ > + int index; > + char stat_id[8]; > + > + /* since comparision is done */ > + memcpy(&stat_id[0], &id, 8); I don't see why you're making this temporary copy here. > + *val = 0; > + > + /* Lookup the stats-id in the nvdimm_perf_stats table */ > + for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) { > + if (memcmp(nvdimm_perf_stats[index].stat_id, &stat_id[0], 8) == 0) { > + return nvdimm_perf_stats[index].stat_getval(drc, id, val); > + } > + } > + return H_PARTIAL; > +} > + > +/* > + * Given a request & result buffer header verify its contents. Also > + * buffer-size and number of stats requested are within our expected > + * sane bounds. > + */ > +static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header, > + hwaddr addr, size_t size, > + uint32_t *num_stats) > +{ > + size_t expected_buffsize; > + > + /* Verify the header eyecather and version */ > + if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER, > + sizeof(header->eye_catcher))) { > + return H_BAD_DATA; > + } > + if (be32_to_cpu(header->stats_version) != 0x1) { > + return H_NOT_AVAILABLE; > + } > + > + /* verify that rr buffer has enough space */ > + *num_stats = be32_to_cpu(header->num_statistics); > + if (*num_stats == 0) { /* Return all stats */ > + expected_buffsize = SCM_STATS_MIN_OUTPUT_BUFFER; > + } else if (*num_stats > SCM_STATS_MAX_STATS) { > + /* Too many stats requested */ > + return H_P3; I'd recommend testing and exiting on this error case before handling the all stats case. Disposing of error cases early is more idiomatic. You can then combine the all stats and n-stats cases a bit more nicely with something like: actual_numstats = (*num_stats) ? (*num_stats) : ARRAY_SIZE(...); Then use the same logic to compute the expected bufsize (min_bufsize might be a better name) in both cases. > + } else { /* Return a subset of stats */ > + expected_buffsize = sizeof(struct papr_scm_perf_stats) + > + (*num_stats) * sizeof(struct papr_scm_perf_stat); > + } > + > + if (size < expected_buffsize || size > SCM_STATS_MAX_OUTPUT_BUFFER) { > + return H_P3; I think you can avoid the MAX_OUTPUT_BUFFER check here... > + } > + > + return H_SUCCESS; > +} > + > +/* > + * For a given DRC index (R3) return one ore more performance stats of an nvdimm > + * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of > + * given 'size' (R5). The rr-buffer consists of a header described by > + * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and > + * 'num_statistics' fields. This is followed by an array of > + * 'struct papr_scm_perf_stat'. Based on the request type the writes the > + * performance into the array of 'struct papr_scm_perf_stat' embedded inside > + * the rr-buffer provided by the guest. 
> + * Special cases handled are: > + * 'size' == 0 : Return the maximum possible size of rr-buffer > + * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats > + * > + * In case there was an error fetching a specific stats (e.g stat-id unknown or > + * any other error) then return the stat-id in R4 and also replace its stat > + * entry in rr-buffer with 'NoopStat' > + */ > +static target_ulong h_scm_performance_stats(PowerPCCPU *cpu, > + SpaprMachineState *spapr, > + target_ulong opcode, > + target_ulong *args) > +{ > + SpaprDrc *drc = spapr_drc_by_index(args[0]); > + const hwaddr addr = args[1]; > + size_t size = args[2]; > + struct papr_scm_perf_stats *perfstats; > + struct papr_scm_perf_stat *stats; > + uint64_t invalid_stat = 0, stat_val; > + MemTxResult result; > + uint32_t num_stats; > + unsigned long rc; > + int index; > + > + /* Ensure that the drc is valid & is valid PMEM dimm and is plugged in */ > + if (!drc || !drc->dev || > + spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) { > + return H_PARAMETER; > + } > + > + /* Guest requested max buffer size for output buffer */ > + if (size == 0) { > + args[0] = SCM_STATS_MAX_OUTPUT_BUFFER; > + return H_SUCCESS; > + } > + > + /* verify size is enough to hold rr-buffer header */ > + if (size < sizeof(struct papr_scm_perf_stats)) { .. if you put it here instead, then you will have dealt with all obviously bad buffer sizes early. > + return H_BAD_DATA; > + } > + > + /* allocate enough buffer space locally for holding max stats */ > + perfstats = g_try_malloc0(SCM_STATS_MAX_OUTPUT_BUFFER); Then you can safely base this malloc on the given size, rather than always over-allocating. > + if (!perfstats) { > + return H_NO_MEM; > + } > + > + /* Read and verify rr-buffer header */ > + result = address_space_read(&address_space_memory, addr, > + MEMTXATTRS_UNSPECIFIED, perfstats, > + sizeof(struct papr_scm_perf_stats)); And you can also read the entire thing with a single memory read here. > + rc = (result == MEMTX_OK) ? > + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) : > + H_PRIVILEGE; This is a bit cryptic. Just deal with the memtx error first, then run the buffer validation. Actually, you can unify the exit paths for these and the success case by using a goto label near the end which has the g_free() and return rc. 
> + if (rc != H_SUCCESS) { > + g_free(perfstats); > + return rc; > + } > + > + stats = &perfstats->scm_statistics[0]; > + /* when returning all stats generate a canned response first */ > + if (num_stats == 0) { > + for (index = 1; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) { > + memcpy(&stats[index - 1].statistic_id, > + &nvdimm_perf_stats[index].stat_id, 8); > + } > + num_stats = ARRAY_SIZE(nvdimm_perf_stats) - 1; > + } else { > + /* copy the rr-buffer from the guest memory */ > + result = address_space_read(&address_space_memory, > + addr + sizeof(struct papr_scm_perf_stats), > + MEMTXATTRS_UNSPECIFIED, stats, > + size - sizeof(struct papr_scm_perf_stats)); > + if (result != MEMTX_OK) { > + g_free(perfstats); > + return H_PRIVILEGE; > + } > + } > + > + for (index = 0; index < num_stats; ++index) { > + rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val); > + > + /* On error add noop stat to rr buffer & save last inval stat-id */ > + if (rc != H_SUCCESS) { > + if (!invalid_stat) { > + invalid_stat = stats[index].statistic_id; > + } > + memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id, 8); > + } > + /* Caller expects stat values in BE encoding */ > + stats[index].statistic_value = cpu_to_be64(stat_val); > + } > + > + /* Update and copy the local rr-buffer back to guest */ > + perfstats->num_statistics = cpu_to_be32(num_stats); > + g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER); > + result = address_space_write(&address_space_memory, addr, > + MEMTXATTRS_UNSPECIFIED, perfstats, size); > + > + /* Cleanup the stats buffer */ > + g_free(perfstats); > + if (result) { > + return H_PRIVILEGE; > + } > + /* Check if there was a failure in fetching any stat */ > + args[0] = invalid_stat; > + return invalid_stat ? H_PARTIAL : H_SUCCESS; > +} > + > static void spapr_scm_register_types(void) > { > /* qemu/scm specific hcalls */ > @@ -511,6 +732,7 @@ static void spapr_scm_register_types(void) > spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem); > spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all); > spapr_register_hypercall(H_SCM_HEALTH, h_scm_health); > + spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats); > } > > type_init(spapr_scm_register_types) > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index d2b5a9bdf9..6f3353b4ea 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -326,6 +326,7 @@ struct SpaprMachineState { > #define H_P8 -61 > #define H_P9 -62 > #define H_OVERLAP -68 > +#define H_BAD_DATA -70 > #define H_UNSUPPORTED_FLAG -256 > #define H_MULTI_THREADS_ACTIVE -9005 > > @@ -539,8 +540,9 @@ struct SpaprMachineState { > #define H_SCM_UNBIND_MEM 0x3F0 > #define H_SCM_UNBIND_ALL 0x3FC > #define H_SCM_HEALTH 0x400 > +#define H_SCM_PERFORMANCE_STATS 0x418 > > -#define MAX_HCALL_OPCODE H_SCM_HEALTH > +#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS > > /* The hcalls above are standardized in PAPR and implemented by pHyp > * as well. 
> @@ -787,6 +789,21 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE) > DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION, > TYPE_SPAPR_IOMMU_MEMORY_REGION) > > +/* Defs and structs exchanged with guest when reporting drc perf stats */ > +#define SCM_STATS_EYECATCHER "SCMSTATS" > + > +struct QEMU_PACKED papr_scm_perf_stat { > + uint64_t statistic_id; > + uint64_t statistic_value; > +}; > + > +struct QEMU_PACKED papr_scm_perf_stats { > + uint8_t eye_catcher[8]; /* Should be “SCMSTATS” */ > + uint32_t stats_version; /* Should be 0x01 */ > + uint32_t num_statistics; /* Number of stats following */ > + struct papr_scm_perf_stat scm_statistics[]; /* Performance matrics */ > +}; > + > struct SpaprTceTable { > DeviceState parent; > uint32_t liobn; -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall 2021-05-17 6:23 ` David Gibson @ 2021-05-17 7:55 ` Greg Kurz 2021-05-17 8:16 ` David Gibson 2021-05-22 3:31 ` Vaibhav Jain 1 sibling, 1 reply; 6+ messages in thread From: Greg Kurz @ 2021-05-17 7:55 UTC (permalink / raw) To: David Gibson Cc: xiaoguangrong.eric, mst, aneesh.kumar, bharata, qemu-devel, kvm-ppc, shivaprasadbhat, qemu-ppc, imammedo, Vaibhav Jain, ehabkost [-- Attachment #1: Type: text/plain, Size: 657 bytes --] On Mon, 17 May 2021 16:23:56 +1000 David Gibson <david@gibson.dropbear.id.au> wrote: > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote: [...] > > + rc = (result == MEMTX_OK) ? > > + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) : > > + H_PRIVILEGE; > > This is a bit cryptic. Just deal with the memtx error first, then run > the buffer validation. Actually, you can unify the exit paths for > these and the success case by using a goto label near the end which > has the g_free() and return rc. > It seems all the g_free() calls could even be avoided by converting perfstats to g_autofree. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall 2021-05-17 7:55 ` Greg Kurz @ 2021-05-17 8:16 ` David Gibson 0 siblings, 0 replies; 6+ messages in thread From: David Gibson @ 2021-05-17 8:16 UTC (permalink / raw) To: Greg Kurz Cc: xiaoguangrong.eric, mst, aneesh.kumar, bharata, qemu-devel, kvm-ppc, shivaprasadbhat, qemu-ppc, imammedo, Vaibhav Jain, ehabkost [-- Attachment #1: Type: text/plain, Size: 978 bytes --] On Mon, May 17, 2021 at 09:55:31AM +0200, Greg Kurz wrote: > On Mon, 17 May 2021 16:23:56 +1000 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote: > [...] > > > + rc = (result == MEMTX_OK) ? > > > + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) : > > > + H_PRIVILEGE; > > > > This is a bit cryptic. Just deal with the memtx error first, then run > > the buffer validation. Actually, you can unify the exit paths for > > these and the success case by using a goto label near the end which > > has the g_free() and return rc. > > > > It seems all the g_free() calls could even be avoided by > converting perfstats to g_autofree. That's an even better idea. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
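For reference, a minimal self-contained sketch of the g_autofree pattern
suggested above; the fill_stats() helper and its return values are invented
for the example and are not QEMU code, only g_autofree and g_try_malloc0()
are real GLib facilities:

  #include <glib.h>
  #include <stdint.h>

  static int fill_stats(size_t size)
  {
      /* Freed automatically when it goes out of scope, on every return
       * path, so no explicit g_free() calls are needed. */
      g_autofree uint8_t *perfstats = g_try_malloc0(size);

      if (!perfstats) {
          return -1;                 /* stands in for returning H_NO_MEM */
      }
      /* ... read, validate and fill the local rr-buffer here ... */
      return 0;
  }

  int main(void)
  {
      return fill_stats(4096);
  }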
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall 2021-05-17 6:23 ` David Gibson 2021-05-17 7:55 ` Greg Kurz @ 2021-05-22 3:31 ` Vaibhav Jain 2021-05-24 7:32 ` David Gibson 1 sibling, 1 reply; 6+ messages in thread From: Vaibhav Jain @ 2021-05-22 3:31 UTC (permalink / raw) To: David Gibson Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc, groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost Thanks for looking into this patch David and Groug, David Gibson <david@gibson.dropbear.id.au> writes: > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote: >> Add support for H_SCM_PERFORMANCE_STATS described at [1] for >> spapr nvdimms. This enables guest to fetch performance stats[2] like >> expected life of an nvdimm ('MemLife ') etc and display them to the >> user. Linux kernel support for fetching these performance stats and >> exposing them to the user-space was done via [3]. >> >> The hcall semantics mandate that each nvdimm performance stats is >> uniquely identied by a 8-byte ascii string encoded as an unsigned >> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a >> 8-byte unsigned integer. These performance-stats are exchanged with >> the guest in via a guest allocated buffer called >> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains >> a header descibed by 'struct papr_scm_perf_stats' followed by an array >> of performance-stats described by 'struct papr_scm_perf_stat'. The >> hypervisor is expected to validate the rr-buffer header and then based >> on the request copy the needed performance-stats to the array of >> 'struct papr_scm_perf_stat' following the header. >> >> The patch proposes a new function h_scm_performance_stats() that >> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the >> validity of the rr-buffer header via scm_perf_check_rr_buffer() it >> proceeds to fill the rr-buffer with requested performance-stats. The >> value of individual stats is retrived from individual accessor >> function for the stat which are indexed in the array >> 'nvdimm_perf_stats'. >> >> References: >> [1] "Hypercall Op-codes (hcalls)" >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269 >> [2] Sysfs attribute documentation for papr_scm >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36 >> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP" >> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com >> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> >> --- >> Changelog >> >> Since RFC-v2: >> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the >> minimum size buffer needed to return all supported performance >> stats. Value for this is derived from size of array 'nvdimm_perf_stats' >> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported >> rr-buffer size from a guest. Value for this is derived from hard >> limit of SCM_STATS_MAX_STATS. >> * Updated scm_perf_check_rr_buffer() to add a check for max size of >> rr-buffer. [David] >> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for >> min size of rr-buffer [David] >> * Updated h_scm_performance_stats() to have a single allocation for >> rr-buffer and copy it back to guest memory in a single shot. 
[David] >> >> Since RFC-v1: >> * Removed empty lines from code [ David ] >> * Updated struct papr_scm_perf_stat to use uint64_t for >> statistic_id. >> * Added a hard limit on max number of stats requested to 255 [ David ] >> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header >> size [ David ] >> * Removed a redundant check from nvdimm_stat_getval() [ David ] >> * Removed a redundant call to address_space_access_valid() in >> scm_perf_check_rr_buffer() [ David ] >> * Instead of allocating a minimum size local buffer, allocate a max >> possible size local rr-buffer. [ David ] >> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ] >> * Updated h_scm_performance_stats() to use a canned-response method >> for simplifying num_stats==0 case [ David ]. >> --- >> hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++ >> include/hw/ppc/spapr.h | 19 +++- >> 2 files changed, 240 insertions(+), 1 deletion(-) >> >> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c >> index 252204e25f..287326b9c0 100644 >> --- a/hw/ppc/spapr_nvdimm.c >> +++ b/hw/ppc/spapr_nvdimm.c >> @@ -35,6 +35,19 @@ >> /* SCM device is unable to persist memory contents */ >> #define PAPR_PMEM_UNARMED PPC_BIT(0) >> >> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */ >> +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \ >> + sizeof(struct papr_scm_perf_stat) * \ >> + ARRAY_SIZE(nvdimm_perf_stats)) > > MIN_OUTPUT_BUFFER is a better name, but still not great. I think we > can get rid of this define completely in a neat way, though. See below. > > Not sure how we can get rid of it since we still need to advertise to the guest how much rr-buffer size we expect to return all perf-stats. Sorry but I didnt make out of any suggestions below that could get rid of this define. >> +/* Maximum number of stats that we can return back in a single stat request */ >> +#define SCM_STATS_MAX_STATS 255 > > Although it's technically allowed, I'm still not convinced there's > actually any reason to allow the user to request the same stat over > and over. > Matching the PowerVM behaviour here which doesnt enforce any limitations on the how many times a single perf-stat can appear in rr-buffer. >> +/* Maximum possible output buffer to fulfill a perf-stats request */ >> +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \ >> + sizeof(struct papr_scm_perf_stat) * \ >> + SCM_STATS_MAX_STATS) >> + >> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm, >> uint64_t size, Error **errp) >> { >> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr, >> return H_SUCCESS; >> } >> >> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val) >> +{ >> + *val = 0; >> + return H_SUCCESS; >> +} >> + >> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val) >> +{ >> + /* Assume full life available of an NVDIMM right now */ >> + *val = 100; >> + return H_SUCCESS; >> +} >> + >> +/* >> + * Holds all supported performance stats accessors. Each performance-statistic >> + * is uniquely identified by a 8-byte ascii string for example: 'MemLife ' >> + * which indicate in percentage how much usage life of an nvdimm is remaining. >> + * 'NoopStat' which is primarily used to test support for retriving performance >> + * stats and also to replace unknown stats present in the rr-buffer. 
>> + * >> + */ >> +static const struct { >> + char stat_id[8]; > > So using a char[] here, but a uint64_t in the request structure makes > it pretty hard to follow if you're doing the right thing > w.r.t. endianness, especially since you effectively memcmp() directly > between u64s and char[]s. You really want to use a consistent type > for the ids. > Though the PAPR-SCM defines stat-ids as u64 they are essentially 8-byte ascii strings encoded in a u64. The guest kernel and this proposed qemu patch doesnt do any math operations on them which might be effected by their endianess. The switch from u64->char[8] is done only for a more convinent ASCII representation stats-ids in nvdimm_pref_stats[]. >> + int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val); >> +} nvdimm_perf_stats[] = { >> + { "NoopStat", perf_stat_noop}, >> + { "MemLife ", perf_stat_memlife}, >> +}; >> + >> +/* >> + * Given a nvdimm drc and stat-name return its value. In case given stat-name >> + * isnt supported then return H_PARTIAL. >> + */ >> +static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val) >> +{ >> + int index; >> + char stat_id[8]; >> + >> + /* since comparision is done */ >> + memcpy(&stat_id[0], &id, 8); > > I don't see why you're making this temporary copy here. > Agree, removed this in next iteration of this patch. >> + *val = 0; >> + >> + /* Lookup the stats-id in the nvdimm_perf_stats table */ >> + for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) { >> + if (memcmp(nvdimm_perf_stats[index].stat_id, &stat_id[0], 8) == 0) { >> + return nvdimm_perf_stats[index].stat_getval(drc, id, val); >> + } >> + } >> + return H_PARTIAL; >> +} >> + >> +/* >> + * Given a request & result buffer header verify its contents. Also >> + * buffer-size and number of stats requested are within our expected >> + * sane bounds. >> + */ >> +static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header, >> + hwaddr addr, size_t size, >> + uint32_t *num_stats) >> +{ >> + size_t expected_buffsize; >> + >> + /* Verify the header eyecather and version */ >> + if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER, >> + sizeof(header->eye_catcher))) { >> + return H_BAD_DATA; >> + } >> + if (be32_to_cpu(header->stats_version) != 0x1) { >> + return H_NOT_AVAILABLE; >> + } >> + >> + /* verify that rr buffer has enough space */ >> + *num_stats = be32_to_cpu(header->num_statistics); >> + if (*num_stats == 0) { /* Return all stats */ >> + expected_buffsize = SCM_STATS_MIN_OUTPUT_BUFFER; >> + } else if (*num_stats > SCM_STATS_MAX_STATS) { >> + /* Too many stats requested */ >> + return H_P3; > > I'd recommend testing and exiting on this error case before handling > the all stats case. Disposing of error cases early is more idiomatic. > > You can then combine the all stats and n-stats cases a bit more nicely > with something like: > actual_numstats = (*num_stats) ? (*num_stats) : ARRAY_SIZE(...); > > Then use the same logic to compute the expected bufsize (min_bufsize > might be a better name) in both cases. > > Agree, have done the change you suggested in next iteration. >> + } else { /* Return a subset of stats */ >> + expected_buffsize = sizeof(struct papr_scm_perf_stats) + >> + (*num_stats) * sizeof(struct papr_scm_perf_stat); >> + } >> + >> + if (size < expected_buffsize || size > SCM_STATS_MAX_OUTPUT_BUFFER) { >> + return H_P3; > > I think you can avoid the MAX_OUTPUT_BUFFER check here... > Yes, moved the check to h_scm_performance_stats() in next iteration. 
>> + } >> + >> + return H_SUCCESS; >> +} >> + >> +/* >> + * For a given DRC index (R3) return one ore more performance stats of an nvdimm >> + * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of >> + * given 'size' (R5). The rr-buffer consists of a header described by >> + * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and >> + * 'num_statistics' fields. This is followed by an array of >> + * 'struct papr_scm_perf_stat'. Based on the request type the writes the >> + * performance into the array of 'struct papr_scm_perf_stat' embedded inside >> + * the rr-buffer provided by the guest. >> + * Special cases handled are: >> + * 'size' == 0 : Return the maximum possible size of rr-buffer >> + * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats >> + * >> + * In case there was an error fetching a specific stats (e.g stat-id unknown or >> + * any other error) then return the stat-id in R4 and also replace its stat >> + * entry in rr-buffer with 'NoopStat' >> + */ >> +static target_ulong h_scm_performance_stats(PowerPCCPU *cpu, >> + SpaprMachineState *spapr, >> + target_ulong opcode, >> + target_ulong *args) >> +{ >> + SpaprDrc *drc = spapr_drc_by_index(args[0]); >> + const hwaddr addr = args[1]; >> + size_t size = args[2]; >> + struct papr_scm_perf_stats *perfstats; >> + struct papr_scm_perf_stat *stats; >> + uint64_t invalid_stat = 0, stat_val; >> + MemTxResult result; >> + uint32_t num_stats; >> + unsigned long rc; >> + int index; >> + >> + /* Ensure that the drc is valid & is valid PMEM dimm and is plugged in */ >> + if (!drc || !drc->dev || >> + spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) { >> + return H_PARAMETER; >> + } >> + >> + /* Guest requested max buffer size for output buffer */ >> + if (size == 0) { >> + args[0] = SCM_STATS_MAX_OUTPUT_BUFFER; >> + return H_SUCCESS; >> + } >> + >> + /* verify size is enough to hold rr-buffer header */ >> + if (size < sizeof(struct papr_scm_perf_stats)) { > > > .. if you put it here instead, then you will have dealt with all > obviously bad buffer sizes early. > Agree >> + return H_BAD_DATA; >> + } >> + >> + /* allocate enough buffer space locally for holding max stats */ >> + perfstats = g_try_malloc0(SCM_STATS_MAX_OUTPUT_BUFFER); > > Then you can safely base this malloc on the given size, rather than > always over-allocating. > Right, have updated this in v4 >> + if (!perfstats) { >> + return H_NO_MEM; >> + } >> + >> + /* Read and verify rr-buffer header */ >> + result = address_space_read(&address_space_memory, addr, >> + MEMTXATTRS_UNSPECIFIED, perfstats, >> + sizeof(struct papr_scm_perf_stats)); > > And you can also read the entire thing with a single memory read here. > Yes agree. Addressed this in v4. >> + rc = (result == MEMTX_OK) ? >> + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) : >> + H_PRIVILEGE; > > This is a bit cryptic. Just deal with the memtx error first, then run > the buffer validation. Actually, you can unify the exit paths for > these and the success case by using a goto label near the end which > has the g_free() and return rc. 
> Sure, addressed this in v4 by using g_autofree >> + if (rc != H_SUCCESS) { >> + g_free(perfstats); >> + return rc; >> + } >> + >> + stats = &perfstats->scm_statistics[0]; >> + /* when returning all stats generate a canned response first */ >> + if (num_stats == 0) { >> + for (index = 1; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) { >> + memcpy(&stats[index - 1].statistic_id, >> + &nvdimm_perf_stats[index].stat_id, 8); >> + } >> + num_stats = ARRAY_SIZE(nvdimm_perf_stats) - 1; >> + } else { >> + /* copy the rr-buffer from the guest memory */ >> + result = address_space_read(&address_space_memory, >> + addr + sizeof(struct papr_scm_perf_stats), >> + MEMTXATTRS_UNSPECIFIED, stats, >> + size - sizeof(struct papr_scm_perf_stats)); >> + if (result != MEMTX_OK) { >> + g_free(perfstats); >> + return H_PRIVILEGE; >> + } >> + } >> + >> + for (index = 0; index < num_stats; ++index) { >> + rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val); >> + >> + /* On error add noop stat to rr buffer & save last inval stat-id */ >> + if (rc != H_SUCCESS) { >> + if (!invalid_stat) { >> + invalid_stat = stats[index].statistic_id; >> + } >> + memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id, 8); >> + } >> + /* Caller expects stat values in BE encoding */ >> + stats[index].statistic_value = cpu_to_be64(stat_val); >> + } >> + >> + /* Update and copy the local rr-buffer back to guest */ >> + perfstats->num_statistics = cpu_to_be32(num_stats); >> + g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER); >> + result = address_space_write(&address_space_memory, addr, >> + MEMTXATTRS_UNSPECIFIED, perfstats, size); >> + >> + /* Cleanup the stats buffer */ >> + g_free(perfstats); >> + if (result) { >> + return H_PRIVILEGE; >> + } >> + /* Check if there was a failure in fetching any stat */ >> + args[0] = invalid_stat; >> + return invalid_stat ? H_PARTIAL : H_SUCCESS; >> +} >> + >> static void spapr_scm_register_types(void) >> { >> /* qemu/scm specific hcalls */ >> @@ -511,6 +732,7 @@ static void spapr_scm_register_types(void) >> spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem); >> spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all); >> spapr_register_hypercall(H_SCM_HEALTH, h_scm_health); >> + spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats); >> } >> >> type_init(spapr_scm_register_types) >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h >> index d2b5a9bdf9..6f3353b4ea 100644 >> --- a/include/hw/ppc/spapr.h >> +++ b/include/hw/ppc/spapr.h >> @@ -326,6 +326,7 @@ struct SpaprMachineState { >> #define H_P8 -61 >> #define H_P9 -62 >> #define H_OVERLAP -68 >> +#define H_BAD_DATA -70 >> #define H_UNSUPPORTED_FLAG -256 >> #define H_MULTI_THREADS_ACTIVE -9005 >> >> @@ -539,8 +540,9 @@ struct SpaprMachineState { >> #define H_SCM_UNBIND_MEM 0x3F0 >> #define H_SCM_UNBIND_ALL 0x3FC >> #define H_SCM_HEALTH 0x400 >> +#define H_SCM_PERFORMANCE_STATS 0x418 >> >> -#define MAX_HCALL_OPCODE H_SCM_HEALTH >> +#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS >> >> /* The hcalls above are standardized in PAPR and implemented by pHyp >> * as well. 
>> @@ -787,6 +789,21 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE) >> DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION, >> TYPE_SPAPR_IOMMU_MEMORY_REGION) >> >> +/* Defs and structs exchanged with guest when reporting drc perf stats */ >> +#define SCM_STATS_EYECATCHER "SCMSTATS" >> + >> +struct QEMU_PACKED papr_scm_perf_stat { >> + uint64_t statistic_id; >> + uint64_t statistic_value; >> +}; >> + >> +struct QEMU_PACKED papr_scm_perf_stats { >> + uint8_t eye_catcher[8]; /* Should be “SCMSTATS” */ >> + uint32_t stats_version; /* Should be 0x01 */ >> + uint32_t num_statistics; /* Number of stats following */ >> + struct papr_scm_perf_stat scm_statistics[]; /* Performance matrics */ >> +}; >> + >> struct SpaprTceTable { >> DeviceState parent; >> uint32_t liobn; > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson -- Cheers ~ Vaibhav ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall 2021-05-22 3:31 ` Vaibhav Jain @ 2021-05-24 7:32 ` David Gibson 0 siblings, 0 replies; 6+ messages in thread From: David Gibson @ 2021-05-24 7:32 UTC (permalink / raw) To: Vaibhav Jain Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc, groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost [-- Attachment #1: Type: text/plain, Size: 8549 bytes --] On Sat, May 22, 2021 at 09:01:26AM +0530, Vaibhav Jain wrote: > Thanks for looking into this patch David and Groug, > > David Gibson <david@gibson.dropbear.id.au> writes: > > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote: > >> Add support for H_SCM_PERFORMANCE_STATS described at [1] for > >> spapr nvdimms. This enables guest to fetch performance stats[2] like > >> expected life of an nvdimm ('MemLife ') etc and display them to the > >> user. Linux kernel support for fetching these performance stats and > >> exposing them to the user-space was done via [3]. > >> > >> The hcall semantics mandate that each nvdimm performance stats is > >> uniquely identied by a 8-byte ascii string encoded as an unsigned > >> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a > >> 8-byte unsigned integer. These performance-stats are exchanged with > >> the guest in via a guest allocated buffer called > >> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains > >> a header descibed by 'struct papr_scm_perf_stats' followed by an array > >> of performance-stats described by 'struct papr_scm_perf_stat'. The > >> hypervisor is expected to validate the rr-buffer header and then based > >> on the request copy the needed performance-stats to the array of > >> 'struct papr_scm_perf_stat' following the header. > >> > >> The patch proposes a new function h_scm_performance_stats() that > >> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the > >> validity of the rr-buffer header via scm_perf_check_rr_buffer() it > >> proceeds to fill the rr-buffer with requested performance-stats. The > >> value of individual stats is retrived from individual accessor > >> function for the stat which are indexed in the array > >> 'nvdimm_perf_stats'. > >> > >> References: > >> [1] "Hypercall Op-codes (hcalls)" > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269 > >> [2] Sysfs attribute documentation for papr_scm > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36 > >> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP" > >> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com > >> > >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > >> --- > >> Changelog > >> > >> Since RFC-v2: > >> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the > >> minimum size buffer needed to return all supported performance > >> stats. Value for this is derived from size of array 'nvdimm_perf_stats' > >> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported > >> rr-buffer size from a guest. Value for this is derived from hard > >> limit of SCM_STATS_MAX_STATS. > >> * Updated scm_perf_check_rr_buffer() to add a check for max size of > >> rr-buffer. 
[David] > >> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for > >> min size of rr-buffer [David] > >> * Updated h_scm_performance_stats() to have a single allocation for > >> rr-buffer and copy it back to guest memory in a single shot. [David] > >> > >> Since RFC-v1: > >> * Removed empty lines from code [ David ] > >> * Updated struct papr_scm_perf_stat to use uint64_t for > >> statistic_id. > >> * Added a hard limit on max number of stats requested to 255 [ David ] > >> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header > >> size [ David ] > >> * Removed a redundant check from nvdimm_stat_getval() [ David ] > >> * Removed a redundant call to address_space_access_valid() in > >> scm_perf_check_rr_buffer() [ David ] > >> * Instead of allocating a minimum size local buffer, allocate a max > >> possible size local rr-buffer. [ David ] > >> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ] > >> * Updated h_scm_performance_stats() to use a canned-response method > >> for simplifying num_stats==0 case [ David ]. > >> --- > >> hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++ > >> include/hw/ppc/spapr.h | 19 +++- > >> 2 files changed, 240 insertions(+), 1 deletion(-) > >> > >> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c > >> index 252204e25f..287326b9c0 100644 > >> --- a/hw/ppc/spapr_nvdimm.c > >> +++ b/hw/ppc/spapr_nvdimm.c > >> @@ -35,6 +35,19 @@ > >> /* SCM device is unable to persist memory contents */ > >> #define PAPR_PMEM_UNARMED PPC_BIT(0) > >> > >> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */ > >> +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \ > >> + sizeof(struct papr_scm_perf_stat) * \ > >> + ARRAY_SIZE(nvdimm_perf_stats)) > > > > MIN_OUTPUT_BUFFER is a better name, but still not great. I think we > > can get rid of this define completely in a neat way, though. See below. > > > > > Not sure how we can get rid of it since we still need to advertise to > the guest how much rr-buffer size we expect to return all > perf-stats. Sorry but I didnt make out of any suggestions below that > could get rid of this define. > > > >> +/* Maximum number of stats that we can return back in a single stat request */ > >> +#define SCM_STATS_MAX_STATS 255 > > > > Although it's technically allowed, I'm still not convinced there's > > actually any reason to allow the user to request the same stat over > > and over. > > > Matching the PowerVM behaviour here which doesnt enforce any limitations > on the how many times a single perf-stat can appear in rr-buffer. Hm, I guess matching PowerVM is worthwhile. Still can't imagine any case where a client would actually want to do so. 
> > >> +/* Maximum possible output buffer to fulfill a perf-stats request */ > >> +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \ > >> + sizeof(struct papr_scm_perf_stat) * \ > >> + SCM_STATS_MAX_STATS) > >> + > >> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm, > >> uint64_t size, Error **errp) > >> { > >> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr, > >> return H_SUCCESS; > >> } > >> > >> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val) > >> +{ > >> + *val = 0; > >> + return H_SUCCESS; > >> +} > >> + > >> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val) > >> +{ > >> + /* Assume full life available of an NVDIMM right now */ > >> + *val = 100; > >> + return H_SUCCESS; > >> +} > >> + > >> +/* > >> + * Holds all supported performance stats accessors. Each performance-statistic > >> + * is uniquely identified by a 8-byte ascii string for example: 'MemLife ' > >> + * which indicate in percentage how much usage life of an nvdimm is remaining. > >> + * 'NoopStat' which is primarily used to test support for retriving performance > >> + * stats and also to replace unknown stats present in the rr-buffer. > >> + * > >> + */ > >> +static const struct { > >> + char stat_id[8]; > > > > So using a char[] here, but a uint64_t in the request structure makes > > it pretty hard to follow if you're doing the right thing > > w.r.t. endianness, especially since you effectively memcmp() directly > > between u64s and char[]s. You really want to use a consistent type > > for the ids. > > > Though the PAPR-SCM defines stat-ids as u64 they are essentially 8-byte > ascii strings encoded in a u64. Yes, I got that. The typing should still be consistent. > The guest kernel and this proposed qemu > patch doesnt do any math operations on them which might be effected by > their endianess. You do however return it in a register in at least one case, so you need to be careful about how that's loaded or stored. > The switch from u64->char[8] is done only for a more convinent > ASCII representation stats-ids in nvdimm_pref_stats[]. Sounds like it would make more sense to use char[8] everywhere, then. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread