* [PATCH v12 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
From: Vaibhav Jain @ 2020-06-15 12:26 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
Changes since v11 [1]:
* Minor update to 'papr_pdsm.h' fixing a misleading comment about
'possible' padding being added by GCC which doesn't apply in case
structs are marked as __packed.
* Fix the order of initialization of 'struct nd_papr_pdsm_health' in
papr_pdsm_health().
* Added acks from Ira for various patches.
[1] https://lore.kernel.org/linux-nvdimm/20200607131339.476036-1-vaibhav@linux.ibm.com
---
The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patch updates papr_scm_ndctl() to handle a possible error case
and also improve debug logging.
Fifth patch deals with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v12
---
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 125 ++++++
arch/powerpc/platforms/pseries/papr_scm.c | 373 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 562 insertions(+), 11 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
^ permalink raw reply
* [PATCH v12 1/6] powerpc: Document details on H_SCM_HEALTH hcall
From: Vaibhav Jain @ 2020-06-15 12:26 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-1-vaibhav@linux.ibm.com>
Add documentation to 'papr_hcalls.rst' describing the bitmap flags
that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
specification.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v11..v12:
* None
v10..v11:
* None
v9..v10:
* Added ack from Ira.
Resend:
* None
v8..v9:
* s/SCM/PMEM device. [ Dan Williams, Aneesh ]
v7..v8:
* Added a clarification on bit-ordering of Health Bitmap
Resend:
* None
v6..v7:
* None
v5..v6:
* New patch in the series
---
Documentation/powerpc/papr_hcalls.rst | 46 ++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
index 3493631a60f8..48fcf1255a33 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -220,13 +220,51 @@ from the LPAR memory.
**H_SCM_HEALTH**
| Input: drcIndex
-| Out: *health-bitmap, health-bit-valid-bitmap*
+| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*
Given a DRC Index return the info on predictive failure and overall health of
-the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
-failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
-valid.
+the PMEM device. The asserted bits in the health-bitmap indicate one or more states
+(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
+which bits in health-bitmap are valid. The bits are reported in
+reverse bit ordering for example a value of 0xC400000000000000
+indicates bits 0, 1, and 5 are valid.
+
+Health Bitmap Flags:
+
++------+-----------------------------------------------------------------------+
+| Bit | Definition |
++======+=======================================================================+
+| 00 | PMEM device is unable to persist memory contents. |
+| | If the system is powered down, nothing will be saved. |
++------+-----------------------------------------------------------------------+
+| 01 | PMEM device failed to persist memory contents. Either contents were |
+| | not saved successfully on power down or were not restored properly on |
+| | power up. |
++------+-----------------------------------------------------------------------+
+| 02 | PMEM device contents are persisted from previous IPL. The data from |
+| | the last boot were successfully restored. |
++------+-----------------------------------------------------------------------+
+| 03 | PMEM device contents are not persisted from previous IPL. There was no|
+| | data to restore from the last boot. |
++------+-----------------------------------------------------------------------+
+| 04 | PMEM device memory life remaining is critically low |
++------+-----------------------------------------------------------------------+
+| 05 | PMEM device will be garded off next IPL due to failure |
++------+-----------------------------------------------------------------------+
+| 06 | PMEM device contents cannot persist due to current platform health |
+| | status. A hardware failure may prevent data from being saved or |
+| | restored. |
++------+-----------------------------------------------------------------------+
+| 07 | PMEM device is unable to persist memory contents in certain conditions|
++------+-----------------------------------------------------------------------+
+| 08 | PMEM device is encrypted |
++------+-----------------------------------------------------------------------+
+| 09 | PMEM device has successfully completed a requested erase or secure |
+| | erase procedure. |
++------+-----------------------------------------------------------------------+
+|10:63 | Reserved / Unused |
++------+-----------------------------------------------------------------------+
**H_SCM_PERFORMANCE_STATS**
--
2.26.2
^ permalink raw reply related
* [PATCH v12 3/6] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Vaibhav Jain @ 2020-06-15 12:26 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-1-vaibhav@linux.ibm.com>
Implement support for fetching nvdimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit bitmap, bitwise-and of which is then stored in
'struct papr_scm_priv' and subsequently partially exposed to
user-space via newly introduced dimm specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried, 60s after the previous successful hcall.
The patch also adds a documentation text describing flags reported by
the the new sysfs attribute 'papr/flags' is also introduced at
Documentation/ABI/testing/sysfs-bus-papr-pmem.
[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v11..v12:
* None
v10..v11:
* None
v9..v10:
* Removed an avoidable 'goto' in __drc_pmem_query_health. [ Ira ].
Resend:
* Added ack from Aneesh.
v8..v9:
* Rename some variables and defines to reduce usage of term SCM
replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
bitmap/ [ Ira, Aneesh ].
Resend:
* None
v6..v7 :
* Used the exported buf_seq_printf() function to generate content for
'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
and removed the papr_scm.h file [Mpe]
* Some minor consistency issued in sysfs-bus-papr-scm
documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two function one of which takes
care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
READ_ONCE(). [Mpe]
v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
[Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
integer [Mpe]
* Updated patch description to reflect the changes made in this
version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
flags_show() [Dan Williams]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
NVDIMM unarmed [Aneesh]
v1..v2 :
* New patch in the series.
---
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++
arch/powerpc/platforms/pseries/papr_scm.c | 168 +++++++++++++++++-
2 files changed, 193 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index 000000000000..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What: /sys/bus/nd/devices/nmemX/papr/flags
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:
+ (RO) Report flags indicating various states of a
+ papr-pmem NVDIMM device. Each flag maps to a one or
+ more bits set in the dimm-health-bitmap retrieved in
+ response to H_SCM_HEALTH hcall. The details of the bit
+ flags returned in response to this hcall is available
+ at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+ the flags reported in this sysfs file:
+
+ * "not_armed" : Indicates that NVDIMM contents will not
+ survive a power cycle.
+ * "flush_fail" : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+ * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+ * "encrypted" : NVDIMM contents are encrypted.
+ * "smart_notify": There is health event for the NVDIMM.
+ * "scrubbed" : Indicating that contents of the
+ NVDIMM have been scrubbed.
+ * "locked" : Indicating that NVDIMM contents cant
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..0c091622b15e 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -12,6 +12,7 @@
#include <linux/libnvdimm.h>
#include <linux/platform_device.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
@@ -22,6 +23,44 @@
(1ul << ND_CMD_GET_CONFIG_DATA) | \
(1ul << ND_CMD_SET_CONFIG_DATA))
+/* DIMM health bitmap bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_PMEM_UNARMED (1ULL << (63 - 0))
+/* SCM device failed to persist memory contents */
+#define PAPR_PMEM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_PMEM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_PMEM_EMPTY (1ULL << (63 - 3))
+/* SCM device memory life remaining is critically low */
+#define PAPR_PMEM_HEALTH_CRITICAL (1ULL << (63 - 4))
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_PMEM_HEALTH_FATAL (1ULL << (63 - 5))
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_PMEM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_PMEM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
+/* SCM device is encrypted */
+#define PAPR_PMEM_ENCRYPTED (1ULL << (63 - 8))
+/* SCM device has been scrubbed and locked */
+#define PAPR_PMEM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
+
+/* Bits status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* Bits status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY)
+
+/* Bits status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_PMEM_BAD_RESTORE_MASK (PAPR_PMEM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \
+ PAPR_PMEM_HEALTH_FATAL | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* private struct associated with each region */
struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
@@ -39,6 +78,15 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+ /* Protect dimm health data from concurrent read/writes */
+ struct mutex health_mutex;
+
+ /* Last time the health information of the dimm was updated */
+ unsigned long lasthealth_jiffies;
+
+ /* Health information for the dimm */
+ u64 health_bitmap;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +192,61 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
}
+/*
+ * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
+ * health information.
+ */
+static int __drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ long rc;
+
+ /* issue the hcall */
+ rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
+ if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev,
+ "Failed to query health information, Err:%ld\n", rc);
+ return -ENXIO;
+ }
+
+ p->lasthealth_jiffies = jiffies;
+ p->health_bitmap = ret[0] & ret[1];
+
+ dev_dbg(&p->pdev->dev,
+ "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
+ ret[0], ret[1]);
+
+ return 0;
+}
+
+/* Min interval in seconds for assuming stable dimm health */
+#define MIN_HEALTH_QUERY_INTERVAL 60
+
+/* Query cached health info and if needed call drc_pmem_query_health */
+static int drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long cache_timeout;
+ int rc;
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ /* Jiffies offset for which the health data is assumed to be same */
+ cache_timeout = p->lasthealth_jiffies +
+ msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
+
+ /* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
+ if (time_after(jiffies, cache_timeout))
+ rc = __drc_pmem_query_health(p);
+ else
+ /* Assume cached health data is valid */
+ rc = 0;
+
+ mutex_unlock(&p->health_mutex);
+ return rc;
+}
static int papr_scm_meta_get(struct papr_scm_priv *p,
struct nd_cmd_get_config_data_hdr *hdr)
@@ -286,6 +389,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+static ssize_t flags_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *dimm = to_nvdimm(dev);
+ struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+ struct seq_buf s;
+ u64 health;
+ int rc;
+
+ rc = drc_pmem_query_health(p);
+ if (rc)
+ return rc;
+
+ /* Copy health_bitmap locally, check masks & update out buffer */
+ health = READ_ONCE(p->health_bitmap);
+
+ seq_buf_init(&s, buf, PAGE_SIZE);
+ if (health & PAPR_PMEM_UNARMED_MASK)
+ seq_buf_printf(&s, "not_armed ");
+
+ if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ seq_buf_printf(&s, "flush_fail ");
+
+ if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ seq_buf_printf(&s, "restore_fail ");
+
+ if (health & PAPR_PMEM_ENCRYPTED)
+ seq_buf_printf(&s, "encrypted ");
+
+ if (health & PAPR_PMEM_SMART_EVENT_MASK)
+ seq_buf_printf(&s, "smart_notify ");
+
+ if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
+ seq_buf_printf(&s, "scrubbed locked ");
+
+ if (seq_buf_used(&s))
+ seq_buf_printf(&s, "\n");
+
+ return seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(flags);
+
+/* papr_scm specific dimm attributes */
+static struct attribute *papr_nd_attributes[] = {
+ &dev_attr_flags.attr,
+ NULL,
+};
+
+static struct attribute_group papr_nd_attribute_group = {
+ .name = "papr",
+ .attrs = papr_nd_attributes,
+};
+
+static const struct attribute_group *papr_nd_attr_groups[] = {
+ &papr_nd_attribute_group,
+ NULL,
+};
+
static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
{
struct device *dev = &p->pdev->dev;
@@ -312,8 +473,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dimm_flags = 0;
set_bit(NDD_LABELING, &dimm_flags);
- p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
- PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
+ p->nvdimm = nvdimm_create(p->bus, p, papr_nd_attr_groups,
+ dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
if (!p->nvdimm) {
dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
goto err;
@@ -399,6 +560,9 @@ static int papr_scm_probe(struct platform_device *pdev)
if (!p)
return -ENOMEM;
+ /* Initialize the dimm mutex */
+ mutex_init(&p->health_mutex);
+
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
--
2.26.2
^ permalink raw reply related
* [PATCH v12 2/6] seq_buf: Export seq_buf_printf
From: Vaibhav Jain @ 2020-06-15 12:26 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Cezary Rojewski, Piotr Maziarz, Steven Rostedt,
Christoph Hellwig, Oliver O'Halloran, Aneesh Kumar K . V,
Borislav Petkov, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-1-vaibhav@linux.ibm.com>
'seq_buf' provides a very useful abstraction for writing to a string
buffer without needing to worry about it over-flowing. However even
though the API has been stable for couple of years now its still not
exported to kernel loadable modules limiting its usage.
Hence this patch proposes update to 'seq_buf.c' to mark
seq_buf_printf() which is part of the seq_buf API to be exported to
kernel loadable GPL modules. This symbol will be used in later parts
of this patch-set to simplify content creation for a sysfs attribute.
Cc: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
Cc: Cezary Rojewski <cezary.rojewski@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v11..v12:
* None
v10..v11:
* None
v9..v10:
* None
Resend:
* Added ack from Steven Rostedt
v8..v9:
* None
v7..v8:
* Updated the patch title [ Christoph Hellwig ]
* Updated patch description to replace confusing term 'external kernel
modules' to 'kernel lodable modules'.
Resend:
* Added ack from Steven Rostedt
v6..v7:
* New patch in the series
---
lib/seq_buf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
return ret;
}
+EXPORT_SYMBOL_GPL(seq_buf_printf);
#ifdef CONFIG_BINARY_PRINTF
/**
--
2.26.2
^ permalink raw reply related
* [PATCH v12 4/6] powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
From: Vaibhav Jain @ 2020-06-15 12:26 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-1-vaibhav@linux.ibm.com>
Since papr_scm_ndctl() can be called from outside papr_scm, its
exposed to the possibility of receiving NULL as value of 'cmd_rc'
argument. This patch updates papr_scm_ndctl() to protect against such
possibility by assigning it pointer to a local variable in case cmd_rc
== NULL.
Finally the patch also updates the 'default' add a debug log unknown
'cmd' values.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v11..v12:
* Added ack from Ira
v10..v11:
* Instead of returning *cmd_rd just return '0' in case nd_cmd is
handled. In case of unknown nd-cmd return -EINVAL
[ Ira and Dan Williams ]
* Updated patch description.
v9..v10
* New patch in the series
---
arch/powerpc/platforms/pseries/papr_scm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 0c091622b15e..692ad3d79826 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -355,11 +355,16 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
{
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
+ int rc;
/* Only dimm-specific calls are supported atm */
if (!nvdimm)
return -EINVAL;
+ /* Use a local variable in case cmd_rc pointer is NULL */
+ if (!cmd_rc)
+ cmd_rc = &rc;
+
p = nvdimm_provider_data(nvdimm);
switch (cmd) {
@@ -381,6 +386,7 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
break;
default:
+ dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
return -EINVAL;
}
--
2.26.2
^ permalink raw reply related
* [PATCH v12 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-06-15 12:26 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-1-vaibhav@linux.ibm.com>
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the white list
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_pdsm.h' which
defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
to communicate the PDSM request via member
'nd_cmd_pkg.nd_command' and size of payload that need to be
sent/received for servicing the PDSM.
A new function is_cmd_valid() is implemented that reads the args to
papr_scm_ndctl() and performs sanity tests on them. A new function
papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
command from libnvdimm.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v11..v12:
* Updated a misleading comment in 'papr_pdsm.h' regarding payload
size. [ Ira ]
v10..v11:
* Moved in-lines 'nd_pdsm_cmd_pkg()' and 'pdsm_cmd_to_payload()' from
'papr_pdsm.h' header to 'papr_scm.c'. The avoids a potential license
incompatibility issue with non-GPL-2.0 user-space code trying to
include the header in its code. [ Ira ]
* Verified papr_pdsm.h with UAPI_HEADER_TEST config.
* Moved the is_cmd_valid() check in papr_scm_ndctl() before check for
cmd_rc == NULL. This prevents cmd_rc to be updated in case the
nd-cmd is invalid or unknown.
v9..v10:
* Simplified 'struct nd_pdsm_cmd_pkg' by removing the
'payload_version' field.
* Removed the corrosponding documentation on versioning and backward
compatibility from 'papr_pdsm.h'
* Reduced the size of reserved fields to 4-bytes making 'struct
nd_pdsm_cmd_pkg' 64 + 8 bytes long.
* Updated is_cmd_valid() to enforce validation checks on pdsm
commands. [ Dan Williams ]
* Added check for reserved fields being set to '0' in is_cmd_valid()
[ Ira ]
* Moved changes for checking cmd_rc == NULL and logging improvements
to a separate prelim patch [ Ira ].
* Moved pdsm package validation checks from papr_scm_service_pdsm()
to is_cmd_valid().
* Marked papr_scm_service_pdsm() return type as 'void' since errors
are reported in nd_pdsm_cmd_pkg.cmd_status field.
Resend:
* Added ack from Aneesh.
v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
* Minor update to patch description.
v7..v8:
* Removed the 'payload_offset' field from 'struct
nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
'reserved' field of 10-bytes is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
[ Ira ]
Resend:
* None
v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
[Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
[Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
[ Dan Williams ]
* Removed redundant license text from the papr_scm_psdm.h file.
[ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to gaurd
against different compiler adding paddings between the fields.
[ Dan Williams]
* Converted various pr_debug to dev_debug [ Dan Williams ]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]
v1..v2 :
* None
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 82 ++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 126 +++++++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
3 files changed, 205 insertions(+), 4 deletions(-)
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index 000000000000..34d1a41d2406
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include <linux/types.h>
+#include <linux/ndctl.h>
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * envelope which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
+ * There is reserved field that can used to introduce new fields to the
+ * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
+ * lies at a 8-byte boundary.
+ *
+ * +-------------+---------------------+---------------------------+
+ * | 64-Bytes | 8-Bytes | Max 184-Bytes |
+ * +-------------+---------------------+---------------------------+
+ * | nd_pdsm_cmd_pkg | |
+ * |-------------+ | |
+ * | nd_cmd_pkg | | |
+ * +-------------+---------------------+---------------------------+
+ * | nd_family | | |
+ * | nd_size_out | cmd_status | |
+ * | nd_size_in | reserved | payload |
+ * | nd_command | | |
+ * | nd_fw_size | | |
+ * +-------------+---------------------+---------------------------+
+ *
+ * PDSM Header:
+ *
+ * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
+ * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
+ * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
+ * contained in 'struct nd_cmd_pkg', the header also has members following
+ * members:
+ *
+ * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
+ * 'reserved' : Not used, reserved for future and should be set to 0.
+ *
+ * PDSM Payload:
+ *
+ * The layout of the PDSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. During
+ * servicing of a PDSM the papr_scm module will read input args from the payload
+ * field by casting its contents to an appropriate struct pointer based on the
+ * PDSM command. Similarly the output of servicing the PDSM command will be
+ * copied to the payload field using the same struct.
+ *
+ * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
+ * leaves exactly 184 bytes for the envelope payload.
+ */
+
+/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
+struct nd_pdsm_cmd_pkg {
+ struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
+ __s32 cmd_status; /* Out: Sub-cmd status returned back */
+ __u16 reserved[2]; /* Ignored and to be used in future */
+ __u8 payload[]; /* In/Out: Sub-cmd data buffer */
+} __packed;
+
+/*
+ * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
+ * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
+ */
+enum papr_pdsm {
+ PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_MAX,
+};
+
+#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 692ad3d79826..167fcf0e249d 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -15,13 +15,15 @@
#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
+#include <asm/papr_pdsm.h>
#define BIND_ANY_ADDR (~0ul)
#define PAPR_SCM_DIMM_CMD_MASK \
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
(1ul << ND_CMD_GET_CONFIG_DATA) | \
- (1ul << ND_CMD_SET_CONFIG_DATA))
+ (1ul << ND_CMD_SET_CONFIG_DATA) | \
+ (1ul << ND_CMD_CALL))
/* DIMM health bitmap bitmap indicators */
/* SCM device is unable to persist memory contents */
@@ -89,6 +91,21 @@ struct papr_scm_priv {
u64 health_bitmap;
};
+/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
+static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
+{
+ return (struct nd_pdsm_cmd_pkg *)cmd;
+}
+
+/* Return the payload pointer for a given pcmd */
+static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
+{
+ if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
+ return NULL;
+ else
+ return (void *)(pcmd->payload);
+}
+
static int drc_pmem_bind(struct papr_scm_priv *p)
{
unsigned long ret[PLPAR_HCALL_BUFSIZE];
@@ -349,17 +366,113 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
}
+/*
+ * Do a sanity checks on the inputs args to dimm-control function and return
+ * '0' if valid. This also does validation on ND_CMD_CALL sub-command packages.
+ */
+static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
+ struct nd_pdsm_cmd_pkg *pkg;
+ struct nd_cmd_pkg *nd_cmd;
+ struct papr_scm_priv *p;
+ enum papr_pdsm pdsm;
+
+ /* Only dimm-specific calls are supported atm */
+ if (!nvdimm)
+ return -EINVAL;
+
+ /* get the provider data from struct nvdimm */
+ p = nvdimm_provider_data(nvdimm);
+
+ if (!test_bit(cmd, &cmd_mask)) {
+ dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
+ return -EINVAL;
+ }
+
+ /* For CMD_CALL verify pdsm request */
+ if (cmd == ND_CMD_CALL) {
+ /* Verify the envelope package */
+ if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
+ buf_len);
+ return -EINVAL;
+ }
+
+ /* Verify that the nd_cmd_pkg.nd_family is correct */
+ nd_cmd = (struct nd_cmd_pkg *)buf;
+ if (nd_cmd->nd_family != NVDIMM_FAMILY_PAPR) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
+ nd_cmd->nd_family);
+ return -EINVAL;
+ }
+
+ /* Get the pdsm request package and the command */
+ pkg = nd_to_pdsm_cmd_pkg(nd_cmd);
+ pdsm = pkg->hdr.nd_command;
+
+ /* Verify if the psdm command is valid */
+ if (pdsm <= PAPR_PDSM_MIN || pdsm >= PAPR_PDSM_MAX) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Invalid PDSM\n", pdsm);
+ return -EINVAL;
+ }
+
+ /* We except a payload with all PDSM commands */
+ if (!pdsm_cmd_to_payload(pkg)) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Empty payload\n", pdsm);
+ return -EINVAL;
+ }
+
+ /* Ensure reserved fields of the pdsm header are set to 0 */
+ if (pkg->reserved[0] || pkg->reserved[1]) {
+ dev_dbg(&p->pdev->dev,
+ "PDSM[0x%x]: Invalid reserved field usage\n", pdsm);
+ return -EINVAL;
+ }
+ }
+
+ /* Let the command be further processed */
+ return 0;
+}
+
+/*
+ * For a given pdsm request call an appropriate service function.
+ * Returns errors if any while handling the pdsm command package.
+ */
+static int papr_scm_service_pdsm(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *pkg)
+{
+ const enum papr_pdsm pdsm = pkg->hdr.nd_command;
+
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Servicing..\n", pdsm);
+
+ /* Call pdsm service function */
+ switch (pdsm) {
+ default:
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Unsupported PDSM request\n",
+ pdsm);
+ pkg->cmd_status = -ENOENT;
+ break;
+ }
+
+ return pkg->cmd_status;
+}
+
static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
{
struct nd_cmd_get_config_size *get_size_hdr;
+ struct nd_pdsm_cmd_pkg *call_pkg = NULL;
struct papr_scm_priv *p;
int rc;
- /* Only dimm-specific calls are supported atm */
- if (!nvdimm)
- return -EINVAL;
+ rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
+ if (rc) {
+ pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, rc);
+ return rc;
+ }
/* Use a local variable in case cmd_rc pointer is NULL */
if (!cmd_rc)
@@ -385,6 +498,11 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
*cmd_rc = papr_scm_meta_set(p, buf);
break;
+ case ND_CMD_CALL:
+ call_pkg = nd_to_pdsm_cmd_pkg(buf);
+ *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
+ break;
+
default:
dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
return -EINVAL;
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..0e09dc5cec19 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR 5
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.26.2
^ permalink raw reply related
* [PATCH v12 6/6] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
From: Vaibhav Jain @ 2020-06-15 12:26 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-1-vaibhav@linux.ibm.com>
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in newly introduced
papr_pdsm_health() that queries the nvdimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v11..v12:
* Minor: Reodered the initialization of 'struct nd_papr_pdsm_health'
fields to match order present in its definition. [ Ira ]
* Added ack from Ira
v10..v11:
* Changed the definition of 'struct nd_papr_pdsm_health' to a maximal
struct 184 bytes in size [ Dan Williams ]
* Added new field 'extension_flags' to 'struct nd_papr_pdsm_health'
[ Dan Williams ]
* Updated papr_pdsm_health() to set field 'extension_flags' to 0.
* Introduced a define ND_PDSM_PAYLOAD_MAX_SIZE that indicates the
maximum size of a payload.
* Fixed a suspicious conversion from u64 to u8 in papr_pdsm_health
that was preventing correct initialization of 'struct
nd_papr_pdsm_health'. [ Ira ]
v9..v10:
* Removed code in papr_pdsm_health that performed validation on pdsm
payload version and corrosponding struct and defines used for
validation of payload version.
* Dropped usage of struct papr_pdsm_health in 'struct
papr_scm_priv'. Instead papr_psdm_health() now uses
'papr_scm_priv.health_bitmap' to populate the pdsm payload.
* Above change also fixes the problem where this patch was removing
the code that was previously introduced in this patch-series.
[ Ira ]
* Introduced a new def ND_PDSM_ENVELOPE_HDR_SIZE that indicates the
space allocated to 'struct nd_pdsm_cmd_pkg' fields except 'struct
nd_cmd_pkg'. This def is useful in validating payload sizes.
* Reworked papr_pdsm_health() to enforce a specific payload size for
'PAPR_PDSM_HEALTH' pdsm request.
Resend:
* Added ack from Aneesh.
v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
* s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_psdm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.
v7..v8:
* None
Resend:
* None
v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
__drc_pmem_query_health() bypassing the cache [Mpe].
v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
gaurd against possibility of different compilers adding different
paddings to the struct [ Dan Williams ]
* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
'bool' and also updated drc_pmem_query_health() to take this into
account. [ Dan Williams ]
v4..v5:
* None
v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
as its exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
from enum to #defines [Aneesh]
v1..v2:
* New patch in the series
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 43 ++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 71 +++++++++++++++++++++++
2 files changed, 114 insertions(+)
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 34d1a41d2406..d453baea13c4 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -70,13 +70,56 @@ struct nd_pdsm_cmd_pkg {
__u8 payload[]; /* In/Out: Sub-cmd data buffer */
} __packed;
+/* Calculate size used by the pdsm header fields minus 'struct nd_cmd_pkg' */
+#define ND_PDSM_HDR_SIZE \
+ (sizeof(struct nd_pdsm_cmd_pkg) - sizeof(struct nd_cmd_pkg))
+
+/* Max payload size that we can handle */
+#define ND_PDSM_PAYLOAD_MAX_SIZE 184
+
/*
* Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
* via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
*/
enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_HEALTH,
PAPR_PDSM_MAX,
};
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY 0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL 2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * extension_flags : Any extension fields present in the struct.
+ * dimm_unarmed : Dimm not armed. So contents wont persist.
+ * dimm_bad_shutdown : Previous shutdown did not persist contents.
+ * dimm_bad_restore : Contents from previous shutdown werent restored.
+ * dimm_scrubbed : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm cant be modified until CEC reboot
+ * dimm_encrypted : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ */
+struct nd_papr_pdsm_health {
+ union {
+ struct {
+ __u32 extension_flags;
+ __u8 dimm_unarmed;
+ __u8 dimm_bad_shutdown;
+ __u8 dimm_bad_restore;
+ __u8 dimm_scrubbed;
+ __u8 dimm_locked;
+ __u8 dimm_encrypted;
+ __u16 dimm_health;
+ };
+ __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
+ };
+} __packed;
+
#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 167fcf0e249d..c7f1420f835f 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -436,6 +436,73 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
return 0;
}
+/* Fetch the DIMM health info and populate it in provided package. */
+static int papr_pdsm_health(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *pkg)
+{
+ int rc;
+ struct nd_papr_pdsm_health health = { 0 };
+ u16 copysize = sizeof(struct nd_papr_pdsm_health);
+ u16 payload_size = pkg->hdr.nd_size_out - ND_PDSM_HDR_SIZE;
+
+ /* Ensure correct payload size that can hold struct nd_papr_pdsm_health */
+ if (payload_size != copysize) {
+ dev_dbg(&p->pdev->dev,
+ "Unexpected payload-size (%u). Expected (%u)",
+ pkg->hdr.nd_size_out, copysize);
+ rc = -ENOSPC;
+ goto out;
+ }
+
+ /* Ensure dimm health mutex is taken preventing concurrent access */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ goto out;
+
+ /* Always fetch upto date dimm health data ignoring cached values */
+ rc = __drc_pmem_query_health(p);
+ if (rc) {
+ mutex_unlock(&p->health_mutex);
+ goto out;
+ }
+
+ /* update health struct with various flags derived from health bitmap */
+ health = (struct nd_papr_pdsm_health) {
+ .extension_flags = 0,
+ .dimm_unarmed = !!(p->health_bitmap & PAPR_PMEM_UNARMED_MASK),
+ .dimm_bad_shutdown = !!(p->health_bitmap & PAPR_PMEM_BAD_SHUTDOWN_MASK),
+ .dimm_bad_restore = !!(p->health_bitmap & PAPR_PMEM_BAD_RESTORE_MASK),
+ .dimm_scrubbed = !!(p->health_bitmap & PAPR_PMEM_SCRUBBED_AND_LOCKED),
+ .dimm_locked = !!(p->health_bitmap & PAPR_PMEM_SCRUBBED_AND_LOCKED),
+ .dimm_encrypted = !!(p->health_bitmap & PAPR_PMEM_ENCRYPTED),
+ .dimm_health = PAPR_PDSM_DIMM_HEALTHY,
+ };
+
+ /* Update field dimm_health based on health_bitmap flags */
+ if (p->health_bitmap & PAPR_PMEM_HEALTH_FATAL)
+ health.dimm_health = PAPR_PDSM_DIMM_FATAL;
+ else if (p->health_bitmap & PAPR_PMEM_HEALTH_CRITICAL)
+ health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+ else if (p->health_bitmap & PAPR_PMEM_HEALTH_UNHEALTHY)
+ health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+ /* struct populated hence can release the mutex now */
+ mutex_unlock(&p->health_mutex);
+
+ dev_dbg(&p->pdev->dev, "Copying payload size=%u\n", copysize);
+
+ /* Copy the health struct to the payload */
+ memcpy(pdsm_cmd_to_payload(pkg), &health, copysize);
+
+ /* Update fw size including size of struct nd_pdsm_cmd_pkg fields */
+ pkg->hdr.nd_fw_size = copysize + ND_PDSM_HDR_SIZE;
+
+out:
+ dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
+
+ return rc;
+}
+
/*
* For a given pdsm request call an appropriate service function.
* Returns errors if any while handling the pdsm command package.
@@ -449,6 +516,10 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
/* Call pdsm service function */
switch (pdsm) {
+ case PAPR_PDSM_HEALTH:
+ pkg->cmd_status = papr_pdsm_health(p, pkg);
+ break;
+
default:
dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Unsupported PDSM request\n",
pdsm);
--
2.26.2
^ permalink raw reply related
* Re: [PATCH v2] SUNRPC: Add missing definition of ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
From: Chuck Lever @ 2020-06-15 12:33 UTC (permalink / raw)
To: Christophe Leroy, Bruce Fields
Cc: Linux NFS Mailing List, netdev, Linux Kernel Mailing List,
Trond Myklebust, Anna Schumaker, Jakub Kicinski, linuxppc-dev,
David S. Miller
In-Reply-To: <9e9882a2fb57b6f9d98a0a5d8b6bf9cff9fcbd93.1592202173.git.christophe.leroy@csgroup.eu>
> On Jun 15, 2020, at 2:25 AM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
>
> Even if that's only a warning, not including asm/cacheflush.h
> leads to svc_flush_bvec() being empty allthough powerpc defines
> ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.
>
> CC net/sunrpc/svcsock.o
> net/sunrpc/svcsock.c:227:5: warning: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined [-Wundef]
> #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
> ^
>
> Include linux/highmem.h so that asm/cacheflush.h will be included.
>
> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Reported-by: kernel test robot <lkp@intel.com>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
LGTM.
Acked-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> v2: Use linux/highmem.h instead of asm/cacheflush.sh
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> net/sunrpc/svcsock.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 5c4ec9386f81..c537272f9c7e 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -44,6 +44,7 @@
> #include <net/tcp.h>
> #include <net/tcp_states.h>
> #include <linux/uaccess.h>
> +#include <linux/highmem.h>
> #include <asm/ioctls.h>
>
> #include <linux/sunrpc/types.h>
> --
> 2.25.0
>
--
Chuck Lever
^ permalink raw reply
* [PATCH v13 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
From: Vaibhav Jain @ 2020-06-15 12:44 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
Changes since v12 [1]:
* Fixed the clang warning regarding variable length object' being not at
the end of the 'struct nd_pdsm_cmd_pkg' by introducing a new layout.
* Removed instance of 'struct nd_cmd_pkg hdr' from 'struct
nd_pdsm_cmd_pkg' and renamed the struct to 'struct nd_pkg_pdsm' to
match the libndctl naming convention.
* Introduced 'union nd_pdsm_payload' thats a maximal union of all
possible payload structs and use it instead of having flexible
'payload' member 'struct nd_pdsm_cmd_pkg'.
* Introduce pdsm descriptor 'struct pdsm_cmd_desc' and its array
__pdsm_cmd_descriptors[] that holds a payload 'size_[in|out]' and
service function for each pdsm. This is analogues to
'__nd_cmd_dimm_descs[]'
* Introduce function 'pdsm_cmd_desc()' to fetch the corresponding pdsm
descriptor for each valid pdsm.
* Updated papr_scm_service_pdsm() to use 'pdsm_cmd_desc()' and apply
checks on 'nd_cmd_pkg' payload based on psdm descriptor members and
finally service the pdsm using the 'service' member of the
descriptor.
* Updated Patch-5 that to use the updated 'struct nd_pkg_pdsm'
definition.
[1] https://lore.kernel.org/linux-nvdimm/20200608211026.67573-1-vaibhav@linux.ibm.com
---
The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patch updates papr_scm_ndctl() to handle a possible error case
and also improve debug logging.
Fifth patch deals with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v13
---
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 +-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 132 ++++++
arch/powerpc/platforms/pseries/papr_scm.c | 420 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 616 insertions(+), 11 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
^ permalink raw reply
* [PATCH v13 1/6] powerpc: Document details on H_SCM_HEALTH hcall
From: Vaibhav Jain @ 2020-06-15 12:44 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615124407.32596-1-vaibhav@linux.ibm.com>
Add documentation to 'papr_hcalls.rst' describing the bitmap flags
that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
specification.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v12..v13:
* None
v11..v12:
* None
v10..v11:
* None
v9..v10:
* Added ack from Ira.
Resend:
* None
v8..v9:
* s/SCM/PMEM device. [ Dan Williams, Aneesh ]
v7..v8:
* Added a clarification on bit-ordering of Health Bitmap
Resend:
* None
v6..v7:
* None
v5..v6:
* New patch in the series
---
Documentation/powerpc/papr_hcalls.rst | 46 ++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
index 3493631a60f8..48fcf1255a33 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -220,13 +220,51 @@ from the LPAR memory.
**H_SCM_HEALTH**
| Input: drcIndex
-| Out: *health-bitmap, health-bit-valid-bitmap*
+| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*
Given a DRC Index return the info on predictive failure and overall health of
-the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
-failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
-valid.
+the PMEM device. The asserted bits in the health-bitmap indicate one or more states
+(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
+which bits in health-bitmap are valid. The bits are reported in
+reverse bit ordering for example a value of 0xC400000000000000
+indicates bits 0, 1, and 5 are valid.
+
+Health Bitmap Flags:
+
++------+-----------------------------------------------------------------------+
+| Bit | Definition |
++======+=======================================================================+
+| 00 | PMEM device is unable to persist memory contents. |
+| | If the system is powered down, nothing will be saved. |
++------+-----------------------------------------------------------------------+
+| 01 | PMEM device failed to persist memory contents. Either contents were |
+| | not saved successfully on power down or were not restored properly on |
+| | power up. |
++------+-----------------------------------------------------------------------+
+| 02 | PMEM device contents are persisted from previous IPL. The data from |
+| | the last boot were successfully restored. |
++------+-----------------------------------------------------------------------+
+| 03 | PMEM device contents are not persisted from previous IPL. There was no|
+| | data to restore from the last boot. |
++------+-----------------------------------------------------------------------+
+| 04 | PMEM device memory life remaining is critically low |
++------+-----------------------------------------------------------------------+
+| 05 | PMEM device will be garded off next IPL due to failure |
++------+-----------------------------------------------------------------------+
+| 06 | PMEM device contents cannot persist due to current platform health |
+| | status. A hardware failure may prevent data from being saved or |
+| | restored. |
++------+-----------------------------------------------------------------------+
+| 07 | PMEM device is unable to persist memory contents in certain conditions|
++------+-----------------------------------------------------------------------+
+| 08 | PMEM device is encrypted |
++------+-----------------------------------------------------------------------+
+| 09 | PMEM device has successfully completed a requested erase or secure |
+| | erase procedure. |
++------+-----------------------------------------------------------------------+
+|10:63 | Reserved / Unused |
++------+-----------------------------------------------------------------------+
**H_SCM_PERFORMANCE_STATS**
--
2.26.2
^ permalink raw reply related
* [PATCH v13 3/6] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Vaibhav Jain @ 2020-06-15 12:44 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615124407.32596-1-vaibhav@linux.ibm.com>
Implement support for fetching nvdimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit bitmap, bitwise-and of which is then stored in
'struct papr_scm_priv' and subsequently partially exposed to
user-space via newly introduced dimm specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried, 60s after the previous successful hcall.
The patch also adds a documentation text describing flags reported by
the the new sysfs attribute 'papr/flags' is also introduced at
Documentation/ABI/testing/sysfs-bus-papr-pmem.
[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v12..v13:
* None
v11..v12:
* None
v10..v11:
* None
v9..v10:
* Removed an avoidable 'goto' in __drc_pmem_query_health. [ Ira ].
Resend:
* Added ack from Aneesh.
v8..v9:
* Rename some variables and defines to reduce usage of term SCM
replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
bitmap/ [ Ira, Aneesh ].
Resend:
* None
v6..v7 :
* Used the exported buf_seq_printf() function to generate content for
'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
and removed the papr_scm.h file [Mpe]
* Some minor consistency issued in sysfs-bus-papr-scm
documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two function one of which takes
care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
READ_ONCE(). [Mpe]
v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
[Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
integer [Mpe]
* Updated patch description to reflect the changes made in this
version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
flags_show() [Dan Williams]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
NVDIMM unarmed [Aneesh]
v1..v2 :
* New patch in the series.
---
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++
arch/powerpc/platforms/pseries/papr_scm.c | 168 +++++++++++++++++-
2 files changed, 193 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index 000000000000..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What: /sys/bus/nd/devices/nmemX/papr/flags
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:
+ (RO) Report flags indicating various states of a
+ papr-pmem NVDIMM device. Each flag maps to a one or
+ more bits set in the dimm-health-bitmap retrieved in
+ response to H_SCM_HEALTH hcall. The details of the bit
+ flags returned in response to this hcall is available
+ at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+ the flags reported in this sysfs file:
+
+ * "not_armed" : Indicates that NVDIMM contents will not
+ survive a power cycle.
+ * "flush_fail" : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+ * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+ * "encrypted" : NVDIMM contents are encrypted.
+ * "smart_notify": There is health event for the NVDIMM.
+ * "scrubbed" : Indicating that contents of the
+ NVDIMM have been scrubbed.
+ * "locked" : Indicating that NVDIMM contents cant
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..0c091622b15e 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -12,6 +12,7 @@
#include <linux/libnvdimm.h>
#include <linux/platform_device.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
@@ -22,6 +23,44 @@
(1ul << ND_CMD_GET_CONFIG_DATA) | \
(1ul << ND_CMD_SET_CONFIG_DATA))
+/* DIMM health bitmap bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_PMEM_UNARMED (1ULL << (63 - 0))
+/* SCM device failed to persist memory contents */
+#define PAPR_PMEM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_PMEM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_PMEM_EMPTY (1ULL << (63 - 3))
+/* SCM device memory life remaining is critically low */
+#define PAPR_PMEM_HEALTH_CRITICAL (1ULL << (63 - 4))
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_PMEM_HEALTH_FATAL (1ULL << (63 - 5))
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_PMEM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_PMEM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
+/* SCM device is encrypted */
+#define PAPR_PMEM_ENCRYPTED (1ULL << (63 - 8))
+/* SCM device has been scrubbed and locked */
+#define PAPR_PMEM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
+
+/* Bits status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* Bits status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY)
+
+/* Bits status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_PMEM_BAD_RESTORE_MASK (PAPR_PMEM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \
+ PAPR_PMEM_HEALTH_FATAL | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* private struct associated with each region */
struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
@@ -39,6 +78,15 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+ /* Protect dimm health data from concurrent read/writes */
+ struct mutex health_mutex;
+
+ /* Last time the health information of the dimm was updated */
+ unsigned long lasthealth_jiffies;
+
+ /* Health information for the dimm */
+ u64 health_bitmap;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +192,61 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
}
+/*
+ * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
+ * health information.
+ */
+static int __drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ long rc;
+
+ /* issue the hcall */
+ rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
+ if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev,
+ "Failed to query health information, Err:%ld\n", rc);
+ return -ENXIO;
+ }
+
+ p->lasthealth_jiffies = jiffies;
+ p->health_bitmap = ret[0] & ret[1];
+
+ dev_dbg(&p->pdev->dev,
+ "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
+ ret[0], ret[1]);
+
+ return 0;
+}
+
+/* Min interval in seconds for assuming stable dimm health */
+#define MIN_HEALTH_QUERY_INTERVAL 60
+
+/* Query cached health info and if needed call drc_pmem_query_health */
+static int drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long cache_timeout;
+ int rc;
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ /* Jiffies offset for which the health data is assumed to be same */
+ cache_timeout = p->lasthealth_jiffies +
+ msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
+
+ /* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
+ if (time_after(jiffies, cache_timeout))
+ rc = __drc_pmem_query_health(p);
+ else
+ /* Assume cached health data is valid */
+ rc = 0;
+
+ mutex_unlock(&p->health_mutex);
+ return rc;
+}
static int papr_scm_meta_get(struct papr_scm_priv *p,
struct nd_cmd_get_config_data_hdr *hdr)
@@ -286,6 +389,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+static ssize_t flags_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *dimm = to_nvdimm(dev);
+ struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+ struct seq_buf s;
+ u64 health;
+ int rc;
+
+ rc = drc_pmem_query_health(p);
+ if (rc)
+ return rc;
+
+ /* Copy health_bitmap locally, check masks & update out buffer */
+ health = READ_ONCE(p->health_bitmap);
+
+ seq_buf_init(&s, buf, PAGE_SIZE);
+ if (health & PAPR_PMEM_UNARMED_MASK)
+ seq_buf_printf(&s, "not_armed ");
+
+ if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ seq_buf_printf(&s, "flush_fail ");
+
+ if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ seq_buf_printf(&s, "restore_fail ");
+
+ if (health & PAPR_PMEM_ENCRYPTED)
+ seq_buf_printf(&s, "encrypted ");
+
+ if (health & PAPR_PMEM_SMART_EVENT_MASK)
+ seq_buf_printf(&s, "smart_notify ");
+
+ if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
+ seq_buf_printf(&s, "scrubbed locked ");
+
+ if (seq_buf_used(&s))
+ seq_buf_printf(&s, "\n");
+
+ return seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(flags);
+
+/* papr_scm specific dimm attributes */
+static struct attribute *papr_nd_attributes[] = {
+ &dev_attr_flags.attr,
+ NULL,
+};
+
+static struct attribute_group papr_nd_attribute_group = {
+ .name = "papr",
+ .attrs = papr_nd_attributes,
+};
+
+static const struct attribute_group *papr_nd_attr_groups[] = {
+ &papr_nd_attribute_group,
+ NULL,
+};
+
static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
{
struct device *dev = &p->pdev->dev;
@@ -312,8 +473,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dimm_flags = 0;
set_bit(NDD_LABELING, &dimm_flags);
- p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
- PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
+ p->nvdimm = nvdimm_create(p->bus, p, papr_nd_attr_groups,
+ dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
if (!p->nvdimm) {
dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
goto err;
@@ -399,6 +560,9 @@ static int papr_scm_probe(struct platform_device *pdev)
if (!p)
return -ENOMEM;
+ /* Initialize the dimm mutex */
+ mutex_init(&p->health_mutex);
+
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
--
2.26.2
^ permalink raw reply related
* [PATCH v13 6/6] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
From: Vaibhav Jain @ 2020-06-15 12:44 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615124407.32596-1-vaibhav@linux.ibm.com>
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in newly introduced
papr_pdsm_health() that queries the nvdimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v12..v13:
* Added the 'struct nd_papr_pdsm_health' to 'union nd_pdsm_payload'
* Added pdsm descriptor of 'PAPR_PDSM_HEALTH' to
__pdsm_cmd_descriptors[].
* Updated papr_pdsm_health() to copy the dimm health information
directly into PDSM payload rather then copying it to an intermediate
struct and then memcpy it to the PDSM payload area.
v11..v12:
* Minor: Reodered the initialization of 'struct nd_papr_pdsm_health'
fields to match order present in its definition. [ Ira ]
* Added ack from Ira
v10..v11:
* Changed the definition of 'struct nd_papr_pdsm_health' to a maximal
struct 184 bytes in size [ Dan Williams ]
* Added new field 'extension_flags' to 'struct nd_papr_pdsm_health'
[ Dan Williams ]
* Updated papr_pdsm_health() to set field 'extension_flags' to 0.
* Introduced a define ND_PDSM_PAYLOAD_MAX_SIZE that indicates the
maximum size of a payload.
* Fixed a suspicious conversion from u64 to u8 in papr_pdsm_health
that was preventing correct initialization of 'struct
nd_papr_pdsm_health'. [ Ira ]
v9..v10:
* Removed code in papr_pdsm_health that performed validation on pdsm
payload version and corrosponding struct and defines used for
validation of payload version.
* Dropped usage of struct papr_pdsm_health in 'struct
papr_scm_priv'. Instead papr_psdm_health() now uses
'papr_scm_priv.health_bitmap' to populate the pdsm payload.
* Above change also fixes the problem where this patch was removing
the code that was previously introduced in this patch-series.
[ Ira ]
* Introduced a new def ND_PDSM_ENVELOPE_HDR_SIZE that indicates the
space allocated to 'struct nd_pdsm_cmd_pkg' fields except 'struct
nd_cmd_pkg'. This def is useful in validating payload sizes.
* Reworked papr_pdsm_health() to enforce a specific payload size for
'PAPR_PDSM_HEALTH' pdsm request.
Resend:
* Added ack from Aneesh.
v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
* s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_psdm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.
v7..v8:
* None
Resend:
* None
v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
__drc_pmem_query_health() bypassing the cache [Mpe].
v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
gaurd against possibility of different compilers adding different
paddings to the struct [ Dan Williams ]
* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
'bool' and also updated drc_pmem_query_health() to take this into
account. [ Dan Williams ]
v4..v5:
* None
v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
as its exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
from enum to #defines [Aneesh]
v1..v2:
* New patch in the series
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 37 ++++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 51 +++++++++++++++++++++++
2 files changed, 88 insertions(+)
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 28115152aa4e..9ccecc1d6840 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -66,17 +66,54 @@
#define ND_PDSM_HDR_SIZE \
(sizeof(struct nd_pkg_pdsm) - ND_PDSM_PAYLOAD_MAX_SIZE)
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY 0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL 2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * extension_flags : Any extension fields present in the struct.
+ * dimm_unarmed : Dimm not armed. So contents wont persist.
+ * dimm_bad_shutdown : Previous shutdown did not persist contents.
+ * dimm_bad_restore : Contents from previous shutdown werent restored.
+ * dimm_scrubbed : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm cant be modified until CEC reboot
+ * dimm_encrypted : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ */
+struct nd_papr_pdsm_health {
+ union {
+ struct {
+ __u32 extension_flags;
+ __u8 dimm_unarmed;
+ __u8 dimm_bad_shutdown;
+ __u8 dimm_bad_restore;
+ __u8 dimm_scrubbed;
+ __u8 dimm_locked;
+ __u8 dimm_encrypted;
+ __u16 dimm_health;
+ };
+ __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
+ };
+};
+
/*
* Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
* via 'nd_cmd_pkg.nd_command' member of the ioctl struct
*/
enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_HEALTH,
PAPR_PDSM_MAX,
};
/* Maximal union that can hold all possible payload types */
union nd_pdsm_payload {
+ struct nd_papr_pdsm_health health;
__u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
} __packed;
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index d3bbf9940ba4..9c569078a09f 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -416,6 +416,52 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
return 0;
}
+/* Fetch the DIMM health info and populate it in provided package. */
+static int papr_pdsm_health(struct papr_scm_priv *p,
+ union nd_pdsm_payload *payload)
+{
+ int rc;
+
+ /* Ensure dimm health mutex is taken preventing concurrent access */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ goto out;
+
+ /* Always fetch upto date dimm health data ignoring cached values */
+ rc = __drc_pmem_query_health(p);
+ if (rc) {
+ mutex_unlock(&p->health_mutex);
+ goto out;
+ }
+
+ /* update health struct with various flags derived from health bitmap */
+ payload->health = (struct nd_papr_pdsm_health) {
+ .extension_flags = 0,
+ .dimm_unarmed = !!(p->health_bitmap & PAPR_PMEM_UNARMED_MASK),
+ .dimm_bad_shutdown = !!(p->health_bitmap & PAPR_PMEM_BAD_SHUTDOWN_MASK),
+ .dimm_bad_restore = !!(p->health_bitmap & PAPR_PMEM_BAD_RESTORE_MASK),
+ .dimm_scrubbed = !!(p->health_bitmap & PAPR_PMEM_SCRUBBED_AND_LOCKED),
+ .dimm_locked = !!(p->health_bitmap & PAPR_PMEM_SCRUBBED_AND_LOCKED),
+ .dimm_encrypted = !!(p->health_bitmap & PAPR_PMEM_ENCRYPTED),
+ .dimm_health = PAPR_PDSM_DIMM_HEALTHY,
+ };
+
+ /* Update field dimm_health based on health_bitmap flags */
+ if (p->health_bitmap & PAPR_PMEM_HEALTH_FATAL)
+ payload->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
+ else if (p->health_bitmap & PAPR_PMEM_HEALTH_CRITICAL)
+ payload->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+ else if (p->health_bitmap & PAPR_PMEM_HEALTH_UNHEALTHY)
+ payload->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+ /* struct populated hence can release the mutex now */
+ mutex_unlock(&p->health_mutex);
+ rc = sizeof(struct nd_papr_pdsm_health);
+
+out:
+ return rc;
+}
+
/*
* 'struct pdsm_cmd_desc'
* Identifies supported PDSMs' expected length of in/out payloads
@@ -444,6 +490,11 @@ static const struct pdsm_cmd_desc __pdsm_cmd_descriptors[] = {
},
/* New PDSM command descriptors to be added below */
+ [PAPR_PDSM_HEALTH] = {
+ .size_in = 0,
+ .size_out = sizeof(struct nd_papr_pdsm_health),
+ .service = papr_pdsm_health,
+ },
/* Empty */
[PAPR_PDSM_MAX] = {
.size_in = 0,
--
2.26.2
^ permalink raw reply related
* [PATCH v13 4/6] powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
From: Vaibhav Jain @ 2020-06-15 12:44 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615124407.32596-1-vaibhav@linux.ibm.com>
Since papr_scm_ndctl() can be called from outside papr_scm, its
exposed to the possibility of receiving NULL as value of 'cmd_rc'
argument. This patch updates papr_scm_ndctl() to protect against such
possibility by assigning it pointer to a local variable in case cmd_rc
== NULL.
Finally the patch also updates the 'default' add a debug log unknown
'cmd' values.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v12..v13:
* None
v11..v12:
* Added ack from Ira
v10..v11:
* Instead of returning *cmd_rd just return '0' in case nd_cmd is
handled. In case of unknown nd-cmd return -EINVAL
[ Ira and Dan Williams ]
* Updated patch description.
v9..v10
* New patch in the series
---
arch/powerpc/platforms/pseries/papr_scm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 0c091622b15e..692ad3d79826 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -355,11 +355,16 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
{
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
+ int rc;
/* Only dimm-specific calls are supported atm */
if (!nvdimm)
return -EINVAL;
+ /* Use a local variable in case cmd_rc pointer is NULL */
+ if (!cmd_rc)
+ cmd_rc = &rc;
+
p = nvdimm_provider_data(nvdimm);
switch (cmd) {
@@ -381,6 +386,7 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
break;
default:
+ dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
return -EINVAL;
}
--
2.26.2
^ permalink raw reply related
* [PATCH v13 2/6] seq_buf: Export seq_buf_printf
From: Vaibhav Jain @ 2020-06-15 12:44 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Cezary Rojewski, Piotr Maziarz, Steven Rostedt,
Christoph Hellwig, Oliver O'Halloran, Aneesh Kumar K . V,
Borislav Petkov, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615124407.32596-1-vaibhav@linux.ibm.com>
'seq_buf' provides a very useful abstraction for writing to a string
buffer without needing to worry about it over-flowing. However even
though the API has been stable for couple of years now its still not
exported to kernel loadable modules limiting its usage.
Hence this patch proposes update to 'seq_buf.c' to mark
seq_buf_printf() which is part of the seq_buf API to be exported to
kernel loadable GPL modules. This symbol will be used in later parts
of this patch-set to simplify content creation for a sysfs attribute.
Cc: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
Cc: Cezary Rojewski <cezary.rojewski@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v12..v13:
* None
v11..v12:
* None
v10..v11:
* None
v9..v10:
* None
Resend:
* Added ack from Steven Rostedt
v8..v9:
* None
v7..v8:
* Updated the patch title [ Christoph Hellwig ]
* Updated patch description to replace confusing term 'external kernel
modules' to 'kernel lodable modules'.
Resend:
* Added ack from Steven Rostedt
v6..v7:
* New patch in the series
---
lib/seq_buf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
return ret;
}
+EXPORT_SYMBOL_GPL(seq_buf_printf);
#ifdef CONFIG_BINARY_PRINTF
/**
--
2.26.2
^ permalink raw reply related
* [PATCH v13 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-06-15 12:44 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200615124407.32596-1-vaibhav@linux.ibm.com>
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the white list
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_pdsm.h' which
defines a 'struct nd_pkg_pdsm' and a maximal union named
'nd_pdsm_payload'. These new structs together with 'struct nd_cmd_pkg'
for a pdsm envelop thats sent by libndctl to libnvdimm and serviced by
papr_scm in 'papr_scm_service_pdsm()'. The PDSM request is
communicated by member 'struct nd_cmd_pkg.nd_command' together with
other information on the pdsm payload (size-in, size-out).
The patch also introduces 'struct pdsm_cmd_desc' instances of which
are stored in an array __pdsm_cmd_descriptors[] indexed with PDSM cmd
and corresponding access function pdsm_cmd_desc() is
introduced. 'struct pdsm_cdm_desc' holds the service function for a
given PDSM and corresponding payload in/out sizes.
A new function papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
command from libnvdimm. The function performs validation on the PDSM
payload based on info present in corresponding PDSM descriptor and if
valid calls the 'struct pdcm_cmd_desc.service' function to service the
PDSM.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v12..v13:
* s/struct nd_pdsm_cmd_pkg/struct nd_pkg_pdsm/
* Removed instance of 'struct nd_cmd_pkg hdr' from 'struct
nd_pdsm_cmd_pkg'.
* Introduced 'union nd_pdsm_payload' thats a maximal union of all
possible payload structs.
* Instead of having flexible 'payload' member 'struct nd_pdsm_cmd_pkg'
replace it with a 'union nd_pdsm_payload'.
* Introduce pdsm descriptor 'struct pdsm_cmd_desc' and its array
__pdsm_cmd_descriptors[] that holds a payload 'size_[in|out]' and
service function for each pdsm. This is analogus to
'__nd_cmd_dimm_descs[]'
* Introduce function 'pdsm_cmd_desc()' to fetch the corrosponding pdsm
descriptor for each valid pdsm.
* Updated papr_scm_service_pdsm() to use 'pdsm_cmd_desc()' and apply
checks on 'nd_cmd_pkg' payload based on psdm descriptor members and
finally service the pdsm using the 'service' member of the
descriptor.
* Updated 'papr_scm_ndctl()' to send the 'struct nd_cmd_pkg*' instance
to papr_scm_service_pdsm() instead of a 'struct nd_pdsm_cmd_pkg*'.
* Updated documentation in 'papr_psdm.h' to reflect new pdsm envelope
layout.
* Update patch description.
v11..v12:
* Updated a misleading comment in 'papr_pdsm.h' regarding payload
size. [ Ira ]
v10..v11:
* Moved in-lines 'nd_pdsm_cmd_pkg()' and 'pdsm_cmd_to_payload()' from
'papr_pdsm.h' header to 'papr_scm.c'. The avoids a potential license
incompatibility issue with non-GPL-2.0 user-space code trying to
include the header in its code. [ Ira ]
* Verified papr_pdsm.h with UAPI_HEADER_TEST config.
* Moved the is_cmd_valid() check in papr_scm_ndctl() before check for
cmd_rc == NULL. This prevents cmd_rc to be updated in case the
nd-cmd is invalid or unknown.
v9..v10:
* Simplified 'struct nd_pdsm_cmd_pkg' by removing the
'payload_version' field.
* Removed the corrosponding documentation on versioning and backward
compatibility from 'papr_pdsm.h'
* Reduced the size of reserved fields to 4-bytes making 'struct
nd_pdsm_cmd_pkg' 64 + 8 bytes long.
* Updated is_cmd_valid() to enforce validation checks on pdsm
commands. [ Dan Williams ]
* Added check for reserved fields being set to '0' in is_cmd_valid()
[ Ira ]
* Moved changes for checking cmd_rc == NULL and logging improvements
to a separate prelim patch [ Ira ].
* Moved pdsm package validation checks from papr_scm_service_pdsm()
to is_cmd_valid().
* Marked papr_scm_service_pdsm() return type as 'void' since errors
are reported in nd_pdsm_cmd_pkg.cmd_status field.
Resend:
* Added ack from Aneesh.
v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
* Minor update to patch description.
v7..v8:
* Removed the 'payload_offset' field from 'struct
nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
'reserved' field of 10-bytes is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
[ Ira ]
Resend:
* None
v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
[Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
[Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
[ Dan Williams ]
* Removed redundant license text from the papr_scm_psdm.h file.
[ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to gaurd
against different compiler adding paddings between the fields.
[ Dan Williams]
* Converted various pr_debug to dev_debug [ Dan Williams ]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]
v1..v2 :
* None
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 95 +++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 193 +++++++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
3 files changed, 285 insertions(+), 4 deletions(-)
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index 000000000000..28115152aa4e
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include <linux/types.h>
+#include <linux/ndctl.h>
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL exchange data between user-space and kernel via
+ * envelope which consists of 2 headers sections and payload sections as
+ * illustrated below:
+ * +-----------------+---------------+---------------------------+
+ * | 64-Bytes | 8-Bytes | Max 184-Bytes |
+ * +-----------------+---------------+---------------------------+
+ * | ND-HEADER | PDSM-HEADER | PDSM-PAYLOAD |
+ * +-----------------+---------------+---------------------------+
+ * | nd_family | | |
+ * | nd_size_out | cmd_status | |
+ * | nd_size_in | reserved | nd_pdsm_payload |
+ * | nd_command | payload --> | |
+ * | nd_fw_size | | |
+ * | nd_payload ---> | | |
+ * +---------------+-----------------+---------------------------+
+ *
+ * ND Header:
+ * This is the generic libnvdimm header described as 'struct nd_cmd_pkg'
+ * which is interpreted by libnvdimm before passed on to papr_scm. Important
+ * member fields used are:
+ * 'nd_family' : (In) NVDIMM_FAMILY_PAPR_SCM
+ * 'nd_size_in' : (In) PDSM-HEADER + PDSM-IN-PAYLOAD (usually 0)
+ * 'nd_size_out' : (In) PDSM-HEADER + PDSM-RETURN-PAYLOAD
+ * 'nd_command' : (In) One of PAPR_PDSM_XXX
+ * 'nd_fw_size' : (Out) PDSM-HEADER + size of actual payload returned
+ *
+ * PDSM Header:
+ * This is papr-scm specific header that precedes the payload. This is defined
+ * as nd_cmd_pdsm_pkg. Following fields aare available in this header:
+ *
+ * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
+ * 'reserved' : Not used, reserved for future and should be set to 0.
+ * 'payload' : A union of all the possible payload structs
+ *
+ * PDSM Payload:
+ *
+ * The layout of the PDSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. As such
+ * its defined as a union of all possible payload structs as
+ * 'union nd_pdsm_payload'. Based on the value of 'nd_cmd_pkg.nd_command'
+ * appropriate member of the union is accessed.
+ */
+
+/* Max payload size that we can handle */
+#define ND_PDSM_PAYLOAD_MAX_SIZE 184
+
+/* Max payload size that we can handle */
+#define ND_PDSM_HDR_SIZE \
+ (sizeof(struct nd_pkg_pdsm) - ND_PDSM_PAYLOAD_MAX_SIZE)
+
+/*
+ * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
+ * via 'nd_cmd_pkg.nd_command' member of the ioctl struct
+ */
+enum papr_pdsm {
+ PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_MAX,
+};
+
+/* Maximal union that can hold all possible payload types */
+union nd_pdsm_payload {
+ __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
+} __packed;
+
+/*
+ * PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm
+ * Valid member of union 'payload' is identified via 'nd_cmd_pkg.nd_command'
+ * that should always precede this struct when sent to papr_scm via CMD_CALL
+ * interface.
+ */
+struct nd_pkg_pdsm {
+ __s32 cmd_status; /* Out: Sub-cmd status returned back */
+ __u16 reserved[2]; /* Ignored and to be set as '0' */
+ union nd_pdsm_payload payload;
+} __packed;
+
+#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 692ad3d79826..d3bbf9940ba4 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -15,13 +15,15 @@
#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
+#include <asm/papr_pdsm.h>
#define BIND_ANY_ADDR (~0ul)
#define PAPR_SCM_DIMM_CMD_MASK \
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
(1ul << ND_CMD_GET_CONFIG_DATA) | \
- (1ul << ND_CMD_SET_CONFIG_DATA))
+ (1ul << ND_CMD_SET_CONFIG_DATA) | \
+ (1ul << ND_CMD_CALL))
/* DIMM health bitmap bitmap indicators */
/* SCM device is unable to persist memory contents */
@@ -349,17 +351,195 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
}
+/*
+ * Do a sanity checks on the inputs args to dimm-control function and return
+ * '0' if valid. Validation of PDSM payloads happens later in
+ * papr_scm_service_pdsm.
+ */
+static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
+ struct nd_cmd_pkg *nd_cmd;
+ struct papr_scm_priv *p;
+ enum papr_pdsm pdsm;
+
+ /* Only dimm-specific calls are supported atm */
+ if (!nvdimm)
+ return -EINVAL;
+
+ /* get the provider data from struct nvdimm */
+ p = nvdimm_provider_data(nvdimm);
+
+ if (!test_bit(cmd, &cmd_mask)) {
+ dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
+ return -EINVAL;
+ }
+
+ /* For CMD_CALL verify pdsm request */
+ if (cmd == ND_CMD_CALL) {
+ /* Verify the envelope and envelop size */
+ if (!buf ||
+ buf_len < (sizeof(struct nd_cmd_pkg) + ND_PDSM_HDR_SIZE)) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
+ buf_len);
+ return -EINVAL;
+ }
+
+ /* Verify that the nd_cmd_pkg.nd_family is correct */
+ nd_cmd = (struct nd_cmd_pkg *)buf;
+
+ if (nd_cmd->nd_family != NVDIMM_FAMILY_PAPR) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
+ nd_cmd->nd_family);
+ return -EINVAL;
+ }
+
+ pdsm = (enum papr_pdsm)nd_cmd->nd_command;
+
+ /* Verify if the pdsm command is valid */
+ if (pdsm <= PAPR_PDSM_MIN || pdsm >= PAPR_PDSM_MAX) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Invalid PDSM\n",
+ pdsm);
+ return -EINVAL;
+ }
+
+ /* Have enough space to hold returned 'nd_pkg_pdsm' header */
+ if (nd_cmd->nd_size_out < ND_PDSM_HDR_SIZE) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Invalid payload\n",
+ pdsm);
+ return -EINVAL;
+ }
+ }
+
+ /* Let the command be further processed */
+ return 0;
+}
+
+/*
+ * 'struct pdsm_cmd_desc'
+ * Identifies supported PDSMs' expected length of in/out payloads
+ * and pdsm service function.
+ *
+ * size_in : Size of input payload if any in the PDSM request.
+ * size_out : Size of output payload if any in the PDSM request.
+ * service : Service function for the PDSM request. Return semantics:
+ * rc < 0 : Error servicing PDSM and rc indicates the error.
+ * rc >=0 : Serviced successfully and 'rc' indicate number of
+ * bytes written to payload.
+ */
+struct pdsm_cmd_desc {
+ u32 size_in;
+ u32 size_out;
+ int (*service)(struct papr_scm_priv *dimm,
+ union nd_pdsm_payload *payload);
+};
+
+/* Holds all supported PDSMs' command descriptors */
+static const struct pdsm_cmd_desc __pdsm_cmd_descriptors[] = {
+ [PAPR_PDSM_MIN] = {
+ .size_in = 0,
+ .size_out = 0,
+ .service = NULL,
+ },
+ /* New PDSM command descriptors to be added below */
+
+ /* Empty */
+ [PAPR_PDSM_MAX] = {
+ .size_in = 0,
+ .size_out = 0,
+ .service = NULL,
+ },
+};
+
+/* Given a valid pdsm cmd return its command descriptor else return NULL */
+static inline const struct pdsm_cmd_desc *pdsm_cmd_desc(enum papr_pdsm cmd)
+{
+ if (cmd >= 0 || cmd < ARRAY_SIZE(__pdsm_cmd_descriptors))
+ return &__pdsm_cmd_descriptors[cmd];
+
+ return NULL;
+}
+
+/*
+ * For a given pdsm request call an appropriate service function.
+ * Returns errors if any while handling the pdsm command package.
+ */
+static int papr_scm_service_pdsm(struct papr_scm_priv *p,
+ struct nd_cmd_pkg *pkg)
+{
+ /* Get the PDSM header and PDSM command */
+ struct nd_pkg_pdsm *pdsm_pkg = (struct nd_pkg_pdsm *)pkg->nd_payload;
+ enum papr_pdsm pdsm = (enum papr_pdsm)pkg->nd_command;
+ const struct pdsm_cmd_desc *pdsc;
+ int rc;
+
+ /* Fetch corresponding pdsm descriptor for validation and servicing */
+ pdsc = pdsm_cmd_desc(pdsm);
+
+ /* Validate pdsm descriptor */
+ /* Ensure that reserved fields are 0 */
+ if (pdsm_pkg->reserved[0] || pdsm_pkg->reserved[1]) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Invalid reserved field\n",
+ pdsm);
+ return -EINVAL;
+ }
+
+ /* If pdsm expects some input, then ensure that the size_in matches */
+ if (pdsc->size_in &&
+ pkg->nd_size_in != (pdsc->size_in + ND_PDSM_HDR_SIZE)) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Mismatched size_in=%d\n",
+ pdsm, pkg->nd_size_in);
+ return -EINVAL;
+ }
+
+ /* If pdsm wants to return data, then ensure that size_out matches */
+ if (pdsc->size_out &&
+ pkg->nd_size_out != (pdsc->size_out + ND_PDSM_HDR_SIZE)) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Mismatched size_out=%d\n",
+ pdsm, pkg->nd_size_out);
+ return -EINVAL;
+ }
+
+ /* Service the pdsm */
+ if (pdsc->service) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Servicing..\n", pdsm);
+
+ rc = pdsc->service(p, &pdsm_pkg->payload);
+
+ if (rc < 0) {
+ /* error encountered while servicing pdsm */
+ pdsm_pkg->cmd_status = rc;
+ pkg->nd_fw_size = ND_PDSM_HDR_SIZE;
+ } else {
+ /* pdsm serviced and 'rc' bytes written to payload */
+ pdsm_pkg->cmd_status = 0;
+ pkg->nd_fw_size = ND_PDSM_HDR_SIZE + rc;
+ }
+ } else {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Unsupported PDSM request\n",
+ pdsm);
+ pdsm_pkg->cmd_status = -ENOENT;
+ pkg->nd_fw_size = ND_PDSM_HDR_SIZE;
+ }
+
+ return pdsm_pkg->cmd_status;
+}
+
static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
{
struct nd_cmd_get_config_size *get_size_hdr;
+ struct nd_cmd_pkg *call_pkg = NULL;
struct papr_scm_priv *p;
int rc;
- /* Only dimm-specific calls are supported atm */
- if (!nvdimm)
- return -EINVAL;
+ rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
+ if (rc) {
+ pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, rc);
+ return rc;
+ }
/* Use a local variable in case cmd_rc pointer is NULL */
if (!cmd_rc)
@@ -385,6 +565,11 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
*cmd_rc = papr_scm_meta_set(p, buf);
break;
+ case ND_CMD_CALL:
+ call_pkg = (struct nd_cmd_pkg *)buf;
+ *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
+ break;
+
default:
dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
return -EINVAL;
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..0e09dc5cec19 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR 5
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.26.2
^ permalink raw reply related
* Re: [PATCH v12 1/6] powerpc: Document details on H_SCM_HEALTH hcall
From: Vaibhav Jain @ 2020-06-15 12:47 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Aneesh Kumar K . V, Steven Rostedt,
Oliver O'Halloran, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-2-vaibhav@linux.ibm.com>
This accidently got reposted. Please ignore.
Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> Add documentation to 'papr_hcalls.rst' describing the bitmap flags
> that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
> specification.
>
> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Acked-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
>
> v11..v12:
> * None
>
> v10..v11:
> * None
>
> v9..v10:
> * Added ack from Ira.
>
> Resend:
> * None
>
> v8..v9:
> * s/SCM/PMEM device. [ Dan Williams, Aneesh ]
>
> v7..v8:
> * Added a clarification on bit-ordering of Health Bitmap
>
> Resend:
> * None
>
> v6..v7:
> * None
>
> v5..v6:
> * New patch in the series
> ---
> Documentation/powerpc/papr_hcalls.rst | 46 ++++++++++++++++++++++++---
> 1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> index 3493631a60f8..48fcf1255a33 100644
> --- a/Documentation/powerpc/papr_hcalls.rst
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -220,13 +220,51 @@ from the LPAR memory.
> **H_SCM_HEALTH**
>
> | Input: drcIndex
> -| Out: *health-bitmap, health-bit-valid-bitmap*
> +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
> | Return Value: *H_Success, H_Parameter, H_Hardware*
>
> Given a DRC Index return the info on predictive failure and overall health of
> -the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
> -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
> -valid.
> +the PMEM device. The asserted bits in the health-bitmap indicate one or more states
> +(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
> +which bits in health-bitmap are valid. The bits are reported in
> +reverse bit ordering for example a value of 0xC400000000000000
> +indicates bits 0, 1, and 5 are valid.
> +
> +Health Bitmap Flags:
> +
> ++------+-----------------------------------------------------------------------+
> +| Bit | Definition |
> ++======+=======================================================================+
> +| 00 | PMEM device is unable to persist memory contents. |
> +| | If the system is powered down, nothing will be saved. |
> ++------+-----------------------------------------------------------------------+
> +| 01 | PMEM device failed to persist memory contents. Either contents were |
> +| | not saved successfully on power down or were not restored properly on |
> +| | power up. |
> ++------+-----------------------------------------------------------------------+
> +| 02 | PMEM device contents are persisted from previous IPL. The data from |
> +| | the last boot were successfully restored. |
> ++------+-----------------------------------------------------------------------+
> +| 03 | PMEM device contents are not persisted from previous IPL. There was no|
> +| | data to restore from the last boot. |
> ++------+-----------------------------------------------------------------------+
> +| 04 | PMEM device memory life remaining is critically low |
> ++------+-----------------------------------------------------------------------+
> +| 05 | PMEM device will be garded off next IPL due to failure |
> ++------+-----------------------------------------------------------------------+
> +| 06 | PMEM device contents cannot persist due to current platform health |
> +| | status. A hardware failure may prevent data from being saved or |
> +| | restored. |
> ++------+-----------------------------------------------------------------------+
> +| 07 | PMEM device is unable to persist memory contents in certain conditions|
> ++------+-----------------------------------------------------------------------+
> +| 08 | PMEM device is encrypted |
> ++------+-----------------------------------------------------------------------+
> +| 09 | PMEM device has successfully completed a requested erase or secure |
> +| | erase procedure. |
> ++------+-----------------------------------------------------------------------+
> +|10:63 | Reserved / Unused |
> ++------+-----------------------------------------------------------------------+
>
> **H_SCM_PERFORMANCE_STATS**
>
> --
> 2.26.2
>
--
Cheers
~ Vaibhav
^ permalink raw reply
* Re: [PATCH v12 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
From: Vaibhav Jain @ 2020-06-15 12:48 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Aneesh Kumar K . V, Steven Rostedt,
Oliver O'Halloran, Dan Williams, Ira Weiny
In-Reply-To: <20200615122644.31887-1-vaibhav@linux.ibm.com>
This accidently got reposted. Please ignore.
v13 version of the patch series located at
https://lore.kernel.org/linux-nvdimm/20200615124407.32596-1-vaibhav@linux.ibm.com
Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> Changes since v11 [1]:
> * Minor update to 'papr_pdsm.h' fixing a misleading comment about
> 'possible' padding being added by GCC which doesn't apply in case
> structs are marked as __packed.
> * Fix the order of initialization of 'struct nd_papr_pdsm_health' in
> papr_pdsm_health().
> * Added acks from Ira for various patches.
>
> [1] https://lore.kernel.org/linux-nvdimm/20200607131339.476036-1-vaibhav@linux.ibm.com
> ---
>
> The PAPR standard[2][4] provides mechanisms to query the health and
> performance stats of an NVDIMM via various hcalls as described in
> Ref[3]. Until now these stats were never available nor exposed to the
> user-space tools like 'ndctl'. This is partly due to PAPR platform not
> having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
> report the dimm health status and a user had no way to determine the
> current health status of a NDVIMM.
>
> To overcome this limitation, this patch-set updates papr_scm kernel
> module to query and fetch NVDIMM health stats using hcalls described
> in Ref[3]. This health and performance stats are then exposed to
> userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
> libndctl.
>
> These changes coupled with proposed ndtcl changes located at Ref[5]
> should provide a way for the user to retrieve NVDIMM health status
> using ndtcl.
>
> Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
> in a emulation environment:
>
> # ndctl list -DH
> [
> {
> "dev":"nmem0",
> "health":{
> "health_state":"fatal",
> "shutdown_state":"dirty"
> }
> }
> ]
>
> Dimm health report output on a pseries guest lpar with vPMEM or HMS
> based NVDIMMs that are in perfectly healthy conditions:
>
> # ndctl list -d nmem0 -H
> [
> {
> "dev":"nmem0",
> "health":{
> "health_state":"ok",
> "shutdown_state":"clean"
> }
> }
> ]
>
> PAPR NVDIMM-Specific-Methods(PDSM)
> ==================================
>
> PDSM requests are issued by vendor specific code in libndctl to
> execute certain operations or fetch information from NVDIMMS. PDSMs
> requests can be sent to papr_scm module via libndctl(userspace) and
> libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
> handled in the dimm control function papr_scm_ndctl(). Current
> patchset proposes a single PDSM to retrieve NVDIMM health, defined in
> the newly introduced uapi header named 'papr_pdsm.h'. Support for
> more PDSMs will be added in future.
>
> Structure of the patch-set
> ==========================
>
> The patch-set starts with a doc patch documenting details of hcall
> H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
> thats used in subsequent patches to generate sysfs attribute content.
>
> Third patch implements support for fetching NVDIMM health information
> from PHYP and partially exposing it to user-space via a NVDIMM sysfs
> flag.
>
> Fourth patch updates papr_scm_ndctl() to handle a possible error case
> and also improve debug logging.
>
> Fifth patch deals with implementing support for servicing PDSM
> commands in papr_scm module.
>
> Finally the last patch implements support for servicing PDSM
> 'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
> libndctl.
>
> References:
> [2] "Power Architecture Platform Reference"
> https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
> [3] commit 58b278f568f0
> ("powerpc: Provide initial documentation for PAPR hcalls")
> [4] "Linux on Power Architecture Platform Reference"
> https://members.openpowerfoundation.org/document/dl/469
> [5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v12
>
> ---
>
> Vaibhav Jain (6):
> powerpc: Document details on H_SCM_HEALTH hcall
> seq_buf: Export seq_buf_printf
> powerpc/papr_scm: Fetch nvdimm health information from PHYP
> powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
> ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
> powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
>
> Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
> Documentation/powerpc/papr_hcalls.rst | 46 ++-
> arch/powerpc/include/uapi/asm/papr_pdsm.h | 125 ++++++
> arch/powerpc/platforms/pseries/papr_scm.c | 373 +++++++++++++++++-
> include/uapi/linux/ndctl.h | 1 +
> lib/seq_buf.c | 1 +
> 6 files changed, 562 insertions(+), 11 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
> create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
>
> --
> 2.26.2
>
--
Cheers
~ Vaibhav
^ permalink raw reply
* Re: [PATCH v13 2/6] seq_buf: Export seq_buf_printf
From: Borislav Petkov @ 2020-06-15 12:55 UTC (permalink / raw)
To: Vaibhav Jain
Cc: Santosh Sivaraj, Ira Weiny, linux-nvdimm, Cezary Rojewski,
linux-kernel, Steven Rostedt, Piotr Maziarz, Christoph Hellwig,
Oliver O'Halloran, Aneesh Kumar K . V, Dan Williams,
linuxppc-dev
In-Reply-To: <20200615124407.32596-3-vaibhav@linux.ibm.com>
On Mon, Jun 15, 2020 at 06:14:03PM +0530, Vaibhav Jain wrote:
> 'seq_buf' provides a very useful abstraction for writing to a string
> buffer without needing to worry about it over-flowing. However even
> though the API has been stable for couple of years now its still not
> exported to kernel loadable modules limiting its usage.
>
> Hence this patch proposes update to 'seq_buf.c' to mark
> seq_buf_printf() which is part of the seq_buf API to be exported to
> kernel loadable GPL modules. This symbol will be used in later parts
> of this patch-set to simplify content creation for a sysfs attribute.
>
> Cc: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
> Cc: Cezary Rojewski <cezary.rojewski@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
>
> v12..v13:
> * None
>
> v11..v12:
> * None
Can you please resend your patchset once a week like everyone else and
not flood inboxes with it?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply
* [PATCH 2/3] mm: Allow arches to provide ptep_get()
From: Christophe Leroy @ 2020-06-15 12:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Will Deacon, Andrew Morton, Peter Zijlstra (Intel)
Cc: linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592225557.git.christophe.leroy@csgroup.eu>
Since commit 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses"), it is not possible anymore
to use READ_ONCE() to access complex page table entries like
the one defined for powerpc 8xx with 16k size pages.
Define a ptep_get() helper that architectures can override instead
of performing a READ_ONCE() on the page table entry pointer.
Fixes: 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
include/asm-generic/hugetlb.h | 2 +-
include/linux/pgtable.h | 7 +++++++
mm/gup.c | 2 +-
3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 40f85decc2ee..8e1e6244a89d 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -122,7 +122,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
#ifndef __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
- return READ_ONCE(*ptep);
+ return ptep_get(ptep);
}
#endif
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 32b6c52d41b9..56c1e8eb7bb0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -249,6 +249,13 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
}
#endif
+#ifndef __HAVE_ARCH_PTEP_GET
+static inline pte_t ptep_get(pte_t *ptep)
+{
+ return READ_ONCE(*ptep);
+}
+#endif
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
diff --git a/mm/gup.c b/mm/gup.c
index 761df4944ef5..6f47697f8fb0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2196,7 +2196,7 @@ static inline pte_t gup_get_pte(pte_t *ptep)
*/
static inline pte_t gup_get_pte(pte_t *ptep)
{
- return READ_ONCE(*ptep);
+ return ptep_get(ptep);
}
#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
--
2.25.0
^ permalink raw reply related
* [PATCH 1/3] mm/gup: Use huge_ptep_get() in gup_hugepte()
From: Christophe Leroy @ 2020-06-15 12:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Will Deacon, Andrew Morton, Peter Zijlstra (Intel)
Cc: linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592225557.git.christophe.leroy@csgroup.eu>
gup_hugepte() reads hugepage table entries, it can't read
them directly, huge_ptep_get() must be used.
Fixes: 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
mm/gup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c
index de9e36262ccb..761df4944ef5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2425,7 +2425,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
if (pte_end < end)
end = pte_end;
- pte = READ_ONCE(*ptep);
+ pte = huge_ptep_get(ptep);
if (!pte_access_permitted(pte, flags & FOLL_WRITE))
return 0;
--
2.25.0
^ permalink raw reply related
* [PATCH 0/3] Fix build failure with v5.8-rc1
From: Christophe Leroy @ 2020-06-15 12:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Will Deacon, Andrew Morton, Peter Zijlstra (Intel)
Cc: linux-mm, linuxppc-dev, linux-kernel
Commit 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses") leads to following build
failure on powerpc 8xx.
To fix it, this small series introduces a new helper named ptep_get()
to replace the direct access with READ_ONCE(). This new helper
can be overriden by architectures.
CC mm/gup.o
In file included from ./include/linux/kernel.h:11:0,
from mm/gup.c:2:
In function 'gup_hugepte.constprop',
inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
compiletime_assert_rwonce_type(x); \
^
mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
pte = READ_ONCE(*ptep);
^
In function 'gup_get_pte',
inlined from 'gup_pte_range' at mm/gup.c:2228:9,
inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
inlined from 'gup_pud_range' at mm/gup.c:2641:15,
inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
compiletime_assert_rwonce_type(x); \
^
mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE'
return READ_ONCE(*ptep);
^
make[2]: *** [mm/gup.o] Error 1
Christophe Leroy (3):
mm/gup: Use huge_ptep_get() in gup_hugepte()
mm: Allow arches to provide ptep_get()
powerpc/8xx: Provide ptep_get() with 16k pages
arch/powerpc/include/asm/nohash/32/pgtable.h | 10 ++++++++++
include/asm-generic/hugetlb.h | 2 +-
include/linux/pgtable.h | 7 +++++++
mm/gup.c | 4 ++--
4 files changed, 20 insertions(+), 3 deletions(-)
--
2.25.0
^ permalink raw reply
* [PATCH 3/3] powerpc/8xx: Provide ptep_get() with 16k pages
From: Christophe Leroy @ 2020-06-15 12:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Will Deacon, Andrew Morton, Peter Zijlstra (Intel)
Cc: linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592225557.git.christophe.leroy@csgroup.eu>
READ_ONCE() now enforces atomic read, which leads to:
CC mm/gup.o
In file included from ./include/linux/kernel.h:11:0,
from mm/gup.c:2:
In function 'gup_hugepte.constprop',
inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
compiletime_assert_rwonce_type(x); \
^
mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
pte = READ_ONCE(*ptep);
^
In function 'gup_get_pte',
inlined from 'gup_pte_range' at mm/gup.c:2228:9,
inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
inlined from 'gup_pud_range' at mm/gup.c:2641:15,
inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
compiletime_assert_rwonce_type(x); \
^
mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE'
return READ_ONCE(*ptep);
^
make[2]: *** [mm/gup.o] Error 1
Define ptep_get() on 8xx when using 16k pages.
Fixes: 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/nohash/32/pgtable.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index b56f14160ae5..77addb599ce7 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -286,6 +286,16 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
return __pte(pte_update(mm, addr, ptep, ~0, 0, 0));
}
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+#define __HAVE_ARCH_PTEP_GET
+static inline pte_t ptep_get(pte_t *ptep)
+{
+ pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0};
+
+ return pte;
+}
+#endif
+
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
--
2.25.0
^ permalink raw reply related
* [PATCH 1/6] exec: cleanup the execve wrappers
From: Christoph Hellwig @ 2020-06-15 13:00 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, Arnd Bergmann, linux-parisc, x86,
linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200615130032.931285-1-hch@lst.de>
Remove a whole bunch of wrappers that eventually all call
__do_execve_file, and consolidate the execvce helpers to:
(1) __do_execveat, which is the lowest level helper implementing the
actual functionality
(2) do_execvat, which is used by all callers that want native
pointers
(3) do_compat_execve, which is used by all compat syscalls
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/exec.c | 98 +++++++++++------------------------------
include/linux/binfmts.h | 12 ++---
init/main.c | 7 +--
kernel/umh.c | 16 +++----
4 files changed, 41 insertions(+), 92 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index e6e8a9a7032784..354fdaa536ae7d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1815,10 +1815,7 @@ static int exec_binprm(struct linux_binprm *bprm)
return 0;
}
-/*
- * sys_execve() executes a new program.
- */
-static int __do_execve_file(int fd, struct filename *filename,
+static int __do_execveat(int fd, struct filename *filename,
struct user_arg_ptr argv,
struct user_arg_ptr envp,
int flags, struct file *file)
@@ -1972,74 +1969,16 @@ static int __do_execve_file(int fd, struct filename *filename,
return retval;
}
-static int do_execveat_common(int fd, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp,
- int flags)
-{
- return __do_execve_file(fd, filename, argv, envp, flags, NULL);
-}
-
-int do_execve_file(struct file *file, void *__argv, void *__envp)
-{
- struct user_arg_ptr argv = { .ptr.native = __argv };
- struct user_arg_ptr envp = { .ptr.native = __envp };
-
- return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
-}
-
-int do_execve(struct filename *filename,
- const char __user *const __user *__argv,
- const char __user *const __user *__envp)
-{
- struct user_arg_ptr argv = { .ptr.native = __argv };
- struct user_arg_ptr envp = { .ptr.native = __envp };
- return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
-}
-
int do_execveat(int fd, struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp,
- int flags)
+ int flags, struct file *file)
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
- return do_execveat_common(fd, filename, argv, envp, flags);
-}
-
-#ifdef CONFIG_COMPAT
-static int compat_do_execve(struct filename *filename,
- const compat_uptr_t __user *__argv,
- const compat_uptr_t __user *__envp)
-{
- struct user_arg_ptr argv = {
- .is_compat = true,
- .ptr.compat = __argv,
- };
- struct user_arg_ptr envp = {
- .is_compat = true,
- .ptr.compat = __envp,
- };
- return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
-}
-
-static int compat_do_execveat(int fd, struct filename *filename,
- const compat_uptr_t __user *__argv,
- const compat_uptr_t __user *__envp,
- int flags)
-{
- struct user_arg_ptr argv = {
- .is_compat = true,
- .ptr.compat = __argv,
- };
- struct user_arg_ptr envp = {
- .is_compat = true,
- .ptr.compat = __envp,
- };
- return do_execveat_common(fd, filename, argv, envp, flags);
+ return __do_execveat(fd, filename, argv, envp, flags, file);
}
-#endif
void set_binfmt(struct linux_binfmt *new)
{
@@ -2070,7 +2009,7 @@ SYSCALL_DEFINE3(execve,
const char __user *const __user *, argv,
const char __user *const __user *, envp)
{
- return do_execve(getname(filename), argv, envp);
+ return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL);
}
SYSCALL_DEFINE5(execveat,
@@ -2080,18 +2019,34 @@ SYSCALL_DEFINE5(execveat,
int, flags)
{
int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+ struct filename *name = getname_flags(filename, lookup_flags, NULL);
- return do_execveat(fd,
- getname_flags(filename, lookup_flags, NULL),
- argv, envp, flags);
+ return do_execveat(fd, name, argv, envp, flags, NULL);
}
#ifdef CONFIG_COMPAT
+static int do_compat_execve(int fd, struct filename *filename,
+ const compat_uptr_t __user *__argv,
+ const compat_uptr_t __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = {
+ .is_compat = true,
+ .ptr.compat = __argv,
+ };
+ struct user_arg_ptr envp = {
+ .is_compat = true,
+ .ptr.compat = __envp,
+ };
+
+ return __do_execveat(fd, filename, argv, envp, flags, NULL);
+}
+
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
const compat_uptr_t __user *, argv,
const compat_uptr_t __user *, envp)
{
- return compat_do_execve(getname(filename), argv, envp);
+ return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0);
}
COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
@@ -2101,9 +2056,8 @@ COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
int, flags)
{
int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+ struct filename *name = getname_flags(filename, lookup_flags, NULL);
- return compat_do_execveat(fd,
- getname_flags(filename, lookup_flags, NULL),
- argv, envp, flags);
+ return do_compat_execve(fd, name, argv, envp, flags);
}
#endif
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 4a20b7517dd036..bed702e4b1fbd9 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -134,13 +134,9 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm);
extern void set_binfmt(struct linux_binfmt *new);
extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
-extern int do_execve(struct filename *,
- const char __user * const __user *,
- const char __user * const __user *);
-extern int do_execveat(int, struct filename *,
- const char __user * const __user *,
- const char __user * const __user *,
- int);
-int do_execve_file(struct file *file, void *__argv, void *__envp);
+int do_execveat(int fd, struct filename *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ int flags, struct file *file);
#endif /* _LINUX_BINFMTS_H */
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5aa2..838950ea7bca22 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1329,9 +1329,10 @@ static int run_init_process(const char *init_filename)
pr_debug(" with environment:\n");
for (p = envp_init; *p; p++)
pr_debug(" %s\n", *p);
- return do_execve(getname_kernel(init_filename),
- (const char __user *const __user *)argv_init,
- (const char __user *const __user *)envp_init);
+ return do_execveat(AT_FDCWD, getname_kernel(init_filename),
+ (const char __user *const __user *)argv_init,
+ (const char __user *const __user *)envp_init,
+ 0, NULL);
}
static int try_to_run_init_process(const char *init_filename)
diff --git a/kernel/umh.c b/kernel/umh.c
index 79f139a7ca03c6..7aa9a5817582ca 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -103,15 +103,13 @@ static int call_usermodehelper_exec_async(void *data)
commit_creds(new);
sub_info->pid = task_pid_nr(current);
- if (sub_info->file) {
- retval = do_execve_file(sub_info->file,
- sub_info->argv, sub_info->envp);
- if (!retval)
- current->flags |= PF_UMH;
- } else
- retval = do_execve(getname_kernel(sub_info->path),
- (const char __user *const __user *)sub_info->argv,
- (const char __user *const __user *)sub_info->envp);
+ retval = do_execveat(AT_FDCWD,
+ sub_info->path ? getname_kernel(sub_info->path) : NULL,
+ (const char __user *const __user *)sub_info->argv,
+ (const char __user *const __user *)sub_info->envp,
+ 0, sub_info->file);
+ if (sub_info->file && !retval)
+ current->flags |= PF_UMH;
out:
sub_info->retval = retval;
/*
--
2.26.2
^ permalink raw reply related
* properly support exec and wait with kernel pointers
From: Christoph Hellwig @ 2020-06-15 13:00 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, Arnd Bergmann, linux-parisc, x86,
linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
Hi all,
this series first cleans up the exec code and then adds proper
kernel_execveat and kernel_wait callers instead of relying on the fact
that the early init code and kernel threads implicitly run with
the address limit set to KERNEL_DS.
Note that the cleanup removes the compat execve(at) handlers (almost)
entirely, as we can handle the compat difference very nicely in a
unified codebase. The only exception is x86 where this would list the
handlers twice in the same syscall table due to the messed up x32
design. I had to add an extra compat handler just for that case, but
maybe someone has a better idea.
^ permalink raw reply
* [PATCH 2/6] exec: simplify the compat syscall handling
From: Christoph Hellwig @ 2020-06-15 13:00 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, Arnd Bergmann, linux-parisc, x86,
linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200615130032.931285-1-hch@lst.de>
The only differenence betweeen the compat exec* syscalls and their
native versions is that compat_ptr sign extension, and the fact that
the pointer arithmetics for the two dimensional arrays needs to use
the compat pointer size. Instead of the compat wrappers and the
struct user_arg_ptr machinery just use in_compat_syscall() to do the
right thing for the compat case deep inside get_user_arg_ptr().
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/arm64/include/asm/unistd32.h | 4 +-
arch/mips/kernel/syscalls/syscall_n32.tbl | 4 +-
arch/mips/kernel/syscalls/syscall_o32.tbl | 4 +-
arch/parisc/kernel/syscalls/syscall.tbl | 4 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 4 +-
arch/s390/kernel/syscalls/syscall.tbl | 4 +-
arch/sparc/kernel/syscalls.S | 4 +-
arch/x86/entry/syscalls/syscall_32.tbl | 4 +-
fs/exec.c | 89 ++++++-------------
include/linux/compat.h | 7 --
include/uapi/asm-generic/unistd.h | 4 +-
tools/include/uapi/asm-generic/unistd.h | 4 +-
.../arch/powerpc/entry/syscalls/syscall.tbl | 4 +-
.../perf/arch/s390/entry/syscalls/syscall.tbl | 4 +-
14 files changed, 53 insertions(+), 91 deletions(-)
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f47..141f5d2ff1c34f 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -33,7 +33,7 @@ __SYSCALL(__NR_link, sys_link)
#define __NR_unlink 10
__SYSCALL(__NR_unlink, sys_unlink)
#define __NR_execve 11
-__SYSCALL(__NR_execve, compat_sys_execve)
+__SYSCALL(__NR_execve, sys_execve)
#define __NR_chdir 12
__SYSCALL(__NR_chdir, sys_chdir)
/* 13 was sys_time */
@@ -785,7 +785,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 386
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 387
-__SYSCALL(__NR_execveat, compat_sys_execveat)
+__SYSCALL(__NR_execveat, sys_execveat)
#define __NR_userfaultfd 388
__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
#define __NR_membarrier 389
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f777141f52568f..e861b5ab7179c9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -64,7 +64,7 @@
54 n32 getsockopt compat_sys_getsockopt
55 n32 clone __sys_clone
56 n32 fork __sys_fork
-57 n32 execve compat_sys_execve
+57 n32 execve sys_execve
58 n32 exit sys_exit
59 n32 wait4 compat_sys_wait4
60 n32 kill sys_kill
@@ -328,7 +328,7 @@
317 n32 getrandom sys_getrandom
318 n32 memfd_create sys_memfd_create
319 n32 bpf sys_bpf
-320 n32 execveat compat_sys_execveat
+320 n32 execveat sys_execveat
321 n32 userfaultfd sys_userfaultfd
322 n32 membarrier sys_membarrier
323 n32 mlock2 sys_mlock2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 13280625d312e9..bba80f74e9968e 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -18,7 +18,7 @@
8 o32 creat sys_creat
9 o32 link sys_link
10 o32 unlink sys_unlink
-11 o32 execve sys_execve compat_sys_execve
+11 o32 execve sys_execve
12 o32 chdir sys_chdir
13 o32 time sys_time32
14 o32 mknod sys_mknod
@@ -367,7 +367,7 @@
353 o32 getrandom sys_getrandom
354 o32 memfd_create sys_memfd_create
355 o32 bpf sys_bpf
-356 o32 execveat sys_execveat compat_sys_execveat
+356 o32 execveat sys_execveat
357 o32 userfaultfd sys_userfaultfd
358 o32 membarrier sys_membarrier
359 o32 mlock2 sys_mlock2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 5a758fa6ec5242..23fa0d0edf3384 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat
9 common link sys_link
10 common unlink sys_unlink
-11 common execve sys_execve compat_sys_execve
+11 common execve sys_execve
12 common chdir sys_chdir
13 32 time sys_time32
13 64 time sys_time
@@ -385,7 +385,7 @@
339 common getrandom sys_getrandom
340 common memfd_create sys_memfd_create
341 common bpf sys_bpf
-342 common execveat sys_execveat compat_sys_execveat
+342 common execveat sys_execveat
343 common membarrier sys_membarrier
344 common userfaultfd sys_userfaultfd
345 common mlock2 sys_mlock2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index f833a319082247..c52cdab89dc0ae 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -20,7 +20,7 @@
8 common creat sys_creat
9 common link sys_link
10 common unlink sys_unlink
-11 nospu execve sys_execve compat_sys_execve
+11 nospu execve sys_execve
12 common chdir sys_chdir
13 32 time sys_time32
13 64 time sys_time
@@ -460,7 +460,7 @@
359 common getrandom sys_getrandom
360 common memfd_create sys_memfd_create
361 common bpf sys_bpf
-362 nospu execveat sys_execveat compat_sys_execveat
+362 nospu execveat sys_execveat
363 32 switch_endian sys_ni_syscall
363 64 switch_endian sys_switch_endian
363 spu switch_endian sys_ni_syscall
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index bfdcb763395735..bd2275db2026ea 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat sys_creat
9 common link sys_link sys_link
10 common unlink sys_unlink sys_unlink
-11 common execve sys_execve compat_sys_execve
+11 common execve sys_execve sys_execve
12 common chdir sys_chdir sys_chdir
13 32 time - sys_time32
14 common mknod sys_mknod sys_mknod
@@ -361,7 +361,7 @@
351 common bpf sys_bpf sys_bpf
352 common s390_pci_mmio_write sys_s390_pci_mmio_write sys_s390_pci_mmio_write
353 common s390_pci_mmio_read sys_s390_pci_mmio_read sys_s390_pci_mmio_read
-354 common execveat sys_execveat compat_sys_execveat
+354 common execveat sys_execveat sys_execveat
355 common userfaultfd sys_userfaultfd sys_userfaultfd
356 common membarrier sys_membarrier sys_membarrier
357 common recvmmsg sys_recvmmsg compat_sys_recvmmsg_time32
diff --git a/arch/sparc/kernel/syscalls.S b/arch/sparc/kernel/syscalls.S
index db42b4fb370844..70463972152a92 100644
--- a/arch/sparc/kernel/syscalls.S
+++ b/arch/sparc/kernel/syscalls.S
@@ -16,12 +16,12 @@ sys64_execveat:
sunos_execv:
mov %g0, %o2
sys32_execve:
- set compat_sys_execve, %g1
+ set sys_execve, %g1
jmpl %g1, %g0
flushw
sys32_execveat:
- set compat_sys_execveat, %g1
+ set sys_execveat, %g1
jmpl %g1, %g0
flushw
#endif
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed11f..2b1eccd3f8f697 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -22,7 +22,7 @@
8 i386 creat sys_creat
9 i386 link sys_link
10 i386 unlink sys_unlink
-11 i386 execve sys_execve compat_sys_execve
+11 i386 execve sys_execve
12 i386 chdir sys_chdir
13 i386 time sys_time32
14 i386 mknod sys_mknod
@@ -369,7 +369,7 @@
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
-358 i386 execveat sys_execveat compat_sys_execveat
+358 i386 execveat sys_execveat
359 i386 socket sys_socket
360 i386 socketpair sys_socketpair
361 i386 bind sys_bind
diff --git a/fs/exec.c b/fs/exec.c
index 354fdaa536ae7d..94107eceda8a67 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -386,34 +386,25 @@ static int bprm_mm_init(struct linux_binprm *bprm)
return err;
}
-struct user_arg_ptr {
-#ifdef CONFIG_COMPAT
- bool is_compat;
-#endif
- union {
- const char __user *const __user *native;
-#ifdef CONFIG_COMPAT
- const compat_uptr_t __user *compat;
-#endif
- } ptr;
-};
-
-static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
+static const char __user *
+get_user_arg_ptr(const char __user *const __user *argv, int nr)
{
const char __user *native;
#ifdef CONFIG_COMPAT
- if (unlikely(argv.is_compat)) {
+ if (in_compat_syscall()) {
+ const compat_uptr_t __user *compat_argv =
+ compat_ptr((unsigned long)argv);
compat_uptr_t compat;
- if (get_user(compat, argv.ptr.compat + nr))
+ if (get_user(compat, compat_argv + nr))
return ERR_PTR(-EFAULT);
return compat_ptr(compat);
}
#endif
- if (get_user(native, argv.ptr.native + nr))
+ if (get_user(native, argv + nr))
return ERR_PTR(-EFAULT);
return native;
@@ -422,11 +413,11 @@ static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(struct user_arg_ptr argv, int max)
+static int count(const char __user *const __user *argv, int max)
{
int i = 0;
- if (argv.ptr.native != NULL) {
+ if (argv) {
for (;;) {
const char __user *p = get_user_arg_ptr(argv, i);
@@ -449,7 +440,8 @@ static int count(struct user_arg_ptr argv, int max)
}
static int prepare_arg_pages(struct linux_binprm *bprm,
- struct user_arg_ptr argv, struct user_arg_ptr envp)
+ const char __user *const __user *argv,
+ const char __user *const __user *envp)
{
unsigned long limit, ptr_size;
@@ -497,7 +489,7 @@ static int prepare_arg_pages(struct linux_binprm *bprm,
* processes's memory to the new process's stack. The call to get_user_pages()
* ensures the destination page is created and not swapped out.
*/
-static int copy_strings(int argc, struct user_arg_ptr argv,
+static int copy_strings(int argc, const char __user *const __user *argv,
struct linux_binprm *bprm)
{
struct page *kmapped_page = NULL;
@@ -1815,10 +1807,10 @@ static int exec_binprm(struct linux_binprm *bprm)
return 0;
}
-static int __do_execveat(int fd, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp,
- int flags, struct file *file)
+int do_execveat(int fd, struct filename *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp,
+ int flags, struct file *file)
{
char *pathbuf = NULL;
struct linux_binprm *bprm;
@@ -1969,17 +1961,6 @@ static int __do_execveat(int fd, struct filename *filename,
return retval;
}
-int do_execveat(int fd, struct filename *filename,
- const char __user *const __user *__argv,
- const char __user *const __user *__envp,
- int flags, struct file *file)
-{
- struct user_arg_ptr argv = { .ptr.native = __argv };
- struct user_arg_ptr envp = { .ptr.native = __envp };
-
- return __do_execveat(fd, filename, argv, envp, flags, file);
-}
-
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
@@ -2024,40 +2005,28 @@ SYSCALL_DEFINE5(execveat,
return do_execveat(fd, name, argv, envp, flags, NULL);
}
-#ifdef CONFIG_COMPAT
-static int do_compat_execve(int fd, struct filename *filename,
- const compat_uptr_t __user *__argv,
- const compat_uptr_t __user *__envp,
- int flags)
-{
- struct user_arg_ptr argv = {
- .is_compat = true,
- .ptr.compat = __argv,
- };
- struct user_arg_ptr envp = {
- .is_compat = true,
- .ptr.compat = __envp,
- };
-
- return __do_execveat(fd, filename, argv, envp, flags, NULL);
-}
-
+/*
+ * x32 syscalls are listed in the same table as x86_64 ones, so we need to
+ * define compat syscalls that are exactly the same as the native version for
+ * the syscall table machinery to work. Sigh..
+ */
+#ifdef CONFIG_X86_X32
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
- const compat_uptr_t __user *, argv,
- const compat_uptr_t __user *, envp)
+ const char __user *const __user *, argv,
+ const char __user *const __user *, envp)
{
- return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0);
+ return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL);
}
COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
const char __user *, filename,
- const compat_uptr_t __user *, argv,
- const compat_uptr_t __user *, envp,
+ const char __user *const __user *, argv,
+ const char __user *const __user *, envp,
int, flags)
{
int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
struct filename *name = getname_flags(filename, lookup_flags, NULL);
- return do_compat_execve(fd, name, argv, envp, flags);
+ return do_execveat(fd, name, argv, envp, flags, NULL);
}
-#endif
+#endif /* CONFIG_X86_X32 */
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e90100c0de72e4..5e8f6a588e0d43 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -752,10 +752,6 @@ asmlinkage long compat_sys_recvmsg(int fd, struct compat_msghdr __user *msg,
asmlinkage long compat_sys_keyctl(u32 option,
u32 arg2, u32 arg3, u32 arg4, u32 arg5);
-/* arch/example/kernel/sys_example.c */
-asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
- const compat_uptr_t __user *envp);
-
/* mm/fadvise.c: No generic prototype for fadvise64_64 */
/* mm/, CONFIG_MMU only */
@@ -806,9 +802,6 @@ asmlinkage ssize_t compat_sys_process_vm_writev(compat_pid_t pid,
const struct compat_iovec __user *lvec,
compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
compat_ulong_t riovcnt, compat_ulong_t flags);
-asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
- const compat_uptr_t __user *argv,
- const compat_uptr_t __user *envp, int flags);
asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f4a01305d9a65c..c639d04a094b8b 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -640,7 +640,7 @@ __SC_COMP(__NR_keyctl, sys_keyctl, compat_sys_keyctl)
#define __NR_clone 220
__SYSCALL(__NR_clone, sys_clone)
#define __NR_execve 221
-__SC_COMP(__NR_execve, sys_execve, compat_sys_execve)
+__SYSCALL(__NR_execve, sys_execve)
#define __NR3264_mmap 222
__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)
@@ -751,7 +751,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
-__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+__SYSCALL(__NR_execveat, sys_execveat)
#define __NR_userfaultfd 282
__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
#define __NR_membarrier 283
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 3a3201e4618ef8..a6dbc8af8bd577 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -640,7 +640,7 @@ __SC_COMP(__NR_keyctl, sys_keyctl, compat_sys_keyctl)
#define __NR_clone 220
__SYSCALL(__NR_clone, sys_clone)
#define __NR_execve 221
-__SC_COMP(__NR_execve, sys_execve, compat_sys_execve)
+__SYSCALL(__NR_execve, sys_execve)
#define __NR3264_mmap 222
__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)
@@ -751,7 +751,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
-__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+__SYSCALL(__NR_execveat, sys_execveat)
#define __NR_userfaultfd 282
__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
#define __NR_membarrier 283
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 35b61bfc1b1ae9..42bf8b461a0ed6 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat
9 common link sys_link
10 common unlink sys_unlink
-11 nospu execve sys_execve compat_sys_execve
+11 nospu execve sys_execve sys_execve
12 common chdir sys_chdir
13 32 time sys_time32
13 64 time sys_time
@@ -454,7 +454,7 @@
359 common getrandom sys_getrandom
360 common memfd_create sys_memfd_create
361 common bpf sys_bpf
-362 nospu execveat sys_execveat compat_sys_execveat
+362 nospu execveat sys_execveat sys_execveat
363 32 switch_endian sys_ni_syscall
363 64 switch_endian ppc_switch_endian
363 spu switch_endian sys_ni_syscall
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index b38d48464368dc..f3c16f2d9746ac 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat compat_sys_creat
9 common link sys_link compat_sys_link
10 common unlink sys_unlink compat_sys_unlink
-11 common execve sys_execve compat_sys_execve
+11 common execve sys_execve sys_execve
12 common chdir sys_chdir compat_sys_chdir
13 32 time - compat_sys_time
14 common mknod sys_mknod compat_sys_mknod
@@ -361,7 +361,7 @@
351 common bpf sys_bpf compat_sys_bpf
352 common s390_pci_mmio_write sys_s390_pci_mmio_write compat_sys_s390_pci_mmio_write
353 common s390_pci_mmio_read sys_s390_pci_mmio_read compat_sys_s390_pci_mmio_read
-354 common execveat sys_execveat compat_sys_execveat
+354 common execveat sys_execveat sys_execveat
355 common userfaultfd sys_userfaultfd sys_userfaultfd
356 common membarrier sys_membarrier sys_membarrier
357 common recvmmsg sys_recvmmsg compat_sys_recvmmsg
--
2.26.2
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox