LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* RE: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
From: Williams, Dan J @ 2020-06-02 17:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-nvdimm, Aneesh Kumar K.V, jack@suse.de, Jeff Moyer,
	Oliver O'Halloran, Michal Suchánek, linuxppc-dev
In-Reply-To: <20200601095049.GB3960@quack2.suse.cz>

[ forgive formatting, a series of unfortunate events has me using Outlook for the moment ]

> From: Jan Kara <jack@suse.cz>
> > > > These flags are device properties that affect the kernel and
> > > > userspace's handling of persistence.
> > > >
> > >
> > > That will not handle the scenario with multiple applications using
> > > the same fsdax mount point where one is updated to use the new
> > > instruction and the other is not.
> >
> > Right, it needs to be a global setting / flag day to switch from one
> > regime to another. Per-process control is a recipe for disaster.
> 
> First I'd like to mention that hopefully the concern is mostly theoretical since
> as Aneesh wrote above, real persistent memory never shipped for PPC and
> so there are very few apps (if any) using the old way to ensure cache
> flushing.
> 
> But I'd like to understand why do you think per-process control is a recipe for
> disaster? Because from my POV the sysfs interface you propose is actually
> difficult to use in practice. As a distributor, you have hard time picking the
> default because you have a choice between picking safe option which is
> going to confuse users because of failing MAP_SYNC and unsafe option
> where everyone will be happy until someone looses data because of some
> ancient application using wrong instructions to persist data. Poor experience
> for users in either way. And when distro defaults to "safe option", then the
> burden is on the sysadmin to toggle the switch but how is he supposed to
> decide when that is safe? First he has to understand what the problem
> actually is, then he has to audit all the applications using pmem whether they
> use the new instruction - which is IMO a lot of effort if you have a couple of
> applications and practically infeasible if you have more of them.
> So IMO the burden should be *on the application* to declare that it is aware
> of the new instructions to flush pmem on the platform and only to such
> application the kernel should give the trust to use MAP_SYNC mappings.

The "disaster" in my mind is this need to globally change the ABI for persistence semantics for all of Linux because one CPU wants a do over. What does a generic "MAP_SYNC_ENABLE" knob even mean to the existing deployed base of persistent memory applications? Yes, sysfs is awkward, but it's trying to provide some relief without imposing unexplainable semantics on everyone else. I think a comprehensive (overengineered) solution would involve not introducing another "I know what I'm doing" flag to the interface, but maybe requiring applications to call a pmem sync API in something like a vsyscall. Or, also overengineered, some binary translation / interpretation to actively detect and kill applications that deploy the old instructions. Something horrid like on first write fault to a MAP_SYNC try to look ahead in the binary for the correct sync sequence and kill the application otherwise. That would at least provide some enforcement and safety without requiring other architectures to consider what MAP_SYNC_ENABLE means to them.

^ permalink raw reply

* Re: [RESEND PATCH v9 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Ira Weiny @ 2020-06-02 18:45 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Santosh Sivaraj, linux-nvdimm, linux-kernel, Steven Rostedt,
	Oliver O'Halloran, Aneesh Kumar K . V, Dan Williams,
	linuxppc-dev
In-Reply-To: <20200602101438.73929-4-vaibhav@linux.ibm.com>

On Tue, Jun 02, 2020 at 03:44:36PM +0530, Vaibhav Jain wrote:
> Implement support for fetching nvdimm health information via
> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
> of 64-bit bitmap, bitwise-and of which is then stored in
> 'struct papr_scm_priv' and subsequently partially exposed to
> user-space via newly introduced dimm specific attribute
> 'papr/flags'. Since the hcall is costly, the health information is
> cached and only re-queried, 60s after the previous successful hcall.
> 
> The patch also adds a  documentation text describing flags reported by
> the the new sysfs attribute 'papr/flags' is also introduced at
> Documentation/ABI/testing/sysfs-bus-papr-pmem.
> 
> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
> PAPR hcalls")
> 
> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
> 
> Resend:
> * Added ack from Aneesh.
> 
> v8..v9:
> * Rename some variables and defines to reduce usage of term SCM
>   replacing it with PMEM [Dan Williams, Aneesh]
> * s/PAPR_SCM_DIMM/PAPR_PMEM/g
> * s/papr_scm_nd_attributes/papr_nd_attributes/g
> * s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
> * s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
> * Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
> 
> v7..v8:
> * Update type of variable 'rc' in __drc_pmem_query_health() and
>   drc_pmem_query_health() to long and int respectively. [ Ira ]
> * Updated the patch description to s/64 bit Big Endian Number/64-bit
>   bitmap/ [ Ira, Aneesh ].
> 
> Resend:
> * None
> 
> v6..v7 :
> * Used the exported buf_seq_printf() function to generate content for
>   'papr/flags'
> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
>   and removed the papr_scm.h file [Mpe]
> * Some minor consistency issued in sysfs-bus-papr-scm
>   documentation. [Mpe]
> * s/dimm_mutex/health_mutex/g [Mpe]
> * Split drc_pmem_query_health() into two function one of which takes
>   care of caching and locking. [Mpe]
> * Fixed a local copy creation of dimm health information using
>   READ_ONCE(). [Mpe]
> 
> v5..v6 :
> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
>   [Dan Williams]
> * Include documentation for 'papr/flags' attr [Dan Williams]
> * Change flag 'save_fail' to 'flush_fail' [Dan Williams]
> * Caching of health bitmap to reduce expensive hcalls [Dan Williams]
> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
> * Replaced two __be64 integers from papr_scm_priv to a single u64
>   integer [Mpe]
> * Updated patch description to reflect the changes made in this
>   version.
> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
>   flags_show() [Dan Williams]
> 
> v4..v5 :
> * None
> 
> v3..v4 :
> * None
> 
> v2..v3 :
> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
>        	 NVDIMM unarmed [Aneesh]
> 
> v1..v2 :
> * New patch in the series.
> ---
>  Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 +++
>  arch/powerpc/platforms/pseries/papr_scm.c     | 169 +++++++++++++++++-
>  2 files changed, 194 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> new file mode 100644
> index 000000000000..5b10d036a8d4
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> @@ -0,0 +1,27 @@
> +What:		/sys/bus/nd/devices/nmemX/papr/flags
> +Date:		Apr, 2020
> +KernelVersion:	v5.8
> +Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
> +Description:
> +		(RO) Report flags indicating various states of a
> +		papr-pmem NVDIMM device. Each flag maps to a one or
> +		more bits set in the dimm-health-bitmap retrieved in
> +		response to H_SCM_HEALTH hcall. The details of the bit
> +		flags returned in response to this hcall is available
> +		at 'Documentation/powerpc/papr_hcalls.rst' . Below are
> +		the flags reported in this sysfs file:
> +
> +		* "not_armed"	: Indicates that NVDIMM contents will not
> +				  survive a power cycle.
> +		* "flush_fail"	: Indicates that NVDIMM contents
> +				  couldn't be flushed during last
> +				  shut-down event.
> +		* "restore_fail": Indicates that NVDIMM contents
> +				  couldn't be restored during NVDIMM
> +				  initialization.
> +		* "encrypted"	: NVDIMM contents are encrypted.
> +		* "smart_notify": There is health event for the NVDIMM.
> +		* "scrubbed"	: Indicating that contents of the
> +				  NVDIMM have been scrubbed.
> +		* "locked"	: Indicating that NVDIMM contents cant
> +				  be modified until next power cycle.
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index f35592423380..149431594839 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -12,6 +12,7 @@
>  #include <linux/libnvdimm.h>
>  #include <linux/platform_device.h>
>  #include <linux/delay.h>
> +#include <linux/seq_buf.h>
>  
>  #include <asm/plpar_wrappers.h>
>  
> @@ -22,6 +23,44 @@
>  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
>  	 (1ul << ND_CMD_SET_CONFIG_DATA))
>  
> +/* DIMM health bitmap bitmap indicators */
> +/* SCM device is unable to persist memory contents */
> +#define PAPR_PMEM_UNARMED                   (1ULL << (63 - 0))
> +/* SCM device failed to persist memory contents */
> +#define PAPR_PMEM_SHUTDOWN_DIRTY            (1ULL << (63 - 1))
> +/* SCM device contents are persisted from previous IPL */
> +#define PAPR_PMEM_SHUTDOWN_CLEAN            (1ULL << (63 - 2))
> +/* SCM device contents are not persisted from previous IPL */
> +#define PAPR_PMEM_EMPTY                     (1ULL << (63 - 3))
> +/* SCM device memory life remaining is critically low */
> +#define PAPR_PMEM_HEALTH_CRITICAL           (1ULL << (63 - 4))
> +/* SCM device will be garded off next IPL due to failure */
> +#define PAPR_PMEM_HEALTH_FATAL              (1ULL << (63 - 5))
> +/* SCM contents cannot persist due to current platform health status */
> +#define PAPR_PMEM_HEALTH_UNHEALTHY          (1ULL << (63 - 6))
> +/* SCM device is unable to persist memory contents in certain conditions */
> +#define PAPR_PMEM_HEALTH_NON_CRITICAL       (1ULL << (63 - 7))
> +/* SCM device is encrypted */
> +#define PAPR_PMEM_ENCRYPTED                 (1ULL << (63 - 8))
> +/* SCM device has been scrubbed and locked */
> +#define PAPR_PMEM_SCRUBBED_AND_LOCKED       (1ULL << (63 - 9))
> +
> +/* Bits status indicators for health bitmap indicating unarmed dimm */
> +#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED |		\
> +				PAPR_PMEM_HEALTH_UNHEALTHY)
> +
> +/* Bits status indicators for health bitmap indicating unflushed dimm */
> +#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY)
> +
> +/* Bits status indicators for health bitmap indicating unrestored dimm */
> +#define PAPR_PMEM_BAD_RESTORE_MASK  (PAPR_PMEM_EMPTY)
> +
> +/* Bit status indicators for smart event notification */
> +#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \
> +				    PAPR_PMEM_HEALTH_FATAL |	\
> +				    PAPR_PMEM_HEALTH_UNHEALTHY)
> +
> +/* private struct associated with each region */
>  struct papr_scm_priv {
>  	struct platform_device *pdev;
>  	struct device_node *dn;
> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>  	struct resource res;
>  	struct nd_region *region;
>  	struct nd_interleave_set nd_set;
> +
> +	/* Protect dimm health data from concurrent read/writes */
> +	struct mutex health_mutex;

I question if this really needs protection.  But I don't think it hurts.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> +
> +	/* Last time the health information of the dimm was updated */
> +	unsigned long lasthealth_jiffies;
> +
> +	/* Health information for the dimm */
> +	u64 health_bitmap;
>  };
>  
>  static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>  	return drc_pmem_bind(p);
>  }
>  
> +/*
> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> + * health information.
> + */
> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> +	long rc;
> +
> +	/* issue the hcall */
> +	rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
> +	if (rc != H_SUCCESS) {
> +		dev_err(&p->pdev->dev,
> +			 "Failed to query health information, Err:%ld\n", rc);
> +		rc = -ENXIO;
> +		goto out;
> +	}
> +
> +	p->lasthealth_jiffies = jiffies;
> +	p->health_bitmap = ret[0] & ret[1];
> +
> +	dev_dbg(&p->pdev->dev,
> +		"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> +		ret[0], ret[1]);
> +out:
> +	return rc;
> +}
> +
> +/* Min interval in seconds for assuming stable dimm health */
> +#define MIN_HEALTH_QUERY_INTERVAL 60
> +
> +/* Query cached health info and if needed call drc_pmem_query_health */
> +static int drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> +	unsigned long cache_timeout;
> +	int rc;
> +
> +	/* Protect concurrent modifications to papr_scm_priv */
> +	rc = mutex_lock_interruptible(&p->health_mutex);
> +	if (rc)
> +		return rc;
> +
> +	/* Jiffies offset for which the health data is assumed to be same */
> +	cache_timeout = p->lasthealth_jiffies +
> +		msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
> +
> +	/* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
> +	if (time_after(jiffies, cache_timeout))
> +		rc = __drc_pmem_query_health(p);
> +	else
> +		/* Assume cached health data is valid */
> +		rc = 0;
> +
> +	mutex_unlock(&p->health_mutex);
> +	return rc;
> +}
>  
>  static int papr_scm_meta_get(struct papr_scm_priv *p,
>  			     struct nd_cmd_get_config_data_hdr *hdr)
> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  	return 0;
>  }
>  
> +static ssize_t flags_show(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct nvdimm *dimm = to_nvdimm(dev);
> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> +	struct seq_buf s;
> +	u64 health;
> +	int rc;
> +
> +	rc = drc_pmem_query_health(p);
> +	if (rc)
> +		return rc;
> +
> +	/* Copy health_bitmap locally, check masks & update out buffer */
> +	health = READ_ONCE(p->health_bitmap);
> +
> +	seq_buf_init(&s, buf, PAGE_SIZE);
> +	if (health & PAPR_PMEM_UNARMED_MASK)
> +		seq_buf_printf(&s, "not_armed ");
> +
> +	if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> +		seq_buf_printf(&s, "flush_fail ");
> +
> +	if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> +		seq_buf_printf(&s, "restore_fail ");
> +
> +	if (health & PAPR_PMEM_ENCRYPTED)
> +		seq_buf_printf(&s, "encrypted ");
> +
> +	if (health & PAPR_PMEM_SMART_EVENT_MASK)
> +		seq_buf_printf(&s, "smart_notify ");
> +
> +	if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
> +		seq_buf_printf(&s, "scrubbed locked ");
> +
> +	if (seq_buf_used(&s))
> +		seq_buf_printf(&s, "\n");
> +
> +	return seq_buf_used(&s);
> +}
> +DEVICE_ATTR_RO(flags);
> +
> +/* papr_scm specific dimm attributes */
> +static struct attribute *papr_nd_attributes[] = {
> +	&dev_attr_flags.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group papr_nd_attribute_group = {
> +	.name = "papr",
> +	.attrs = papr_nd_attributes,
> +};
> +
> +static const struct attribute_group *papr_nd_attr_groups[] = {
> +	&papr_nd_attribute_group,
> +	NULL,
> +};
> +
>  static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  {
>  	struct device *dev = &p->pdev->dev;
> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  	dimm_flags = 0;
>  	set_bit(NDD_LABELING, &dimm_flags);
>  
> -	p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
> -				  PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> +	p->nvdimm = nvdimm_create(p->bus, p, papr_nd_attr_groups,
> +				  dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
>  	if (!p->nvdimm) {
>  		dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
>  		goto err;
> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
>  	if (!p)
>  		return -ENOMEM;
>  
> +	/* Initialize the dimm mutex */
> +	mutex_init(&p->health_mutex);
> +
>  	/* optional DT properties */
>  	of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
>  
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [PATCH 11/14] x86/entry: Clarify irq_{enter,exit}_rcu()
From: Qian Cai @ 2020-06-02 18:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: daniel.thompson, andrew.cooper3, bigeasy, x86, linux-kernel,
	sean.j.christopherson, luto, Lai Jiangshan, rostedt, a.darwish,
	tglx, linuxppc-dev
In-Reply-To: <20200602150511.GH706478@hirez.programming.kicks-ass.net>

On Tue, Jun 02, 2020 at 05:05:11PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 02, 2020 at 10:42:35AM -0400, Qian Cai wrote:
> 
> > Reverted this commit fixed the POWER9 boot warning,
> 
> ARGH, I'm an idiot. Please try this instead:
>
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index a3eb6eba8c41..c4201b7f42b1 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -438,7 +438,7 @@ void irq_exit_rcu(void)
>   */
>  void irq_exit(void)
>  {
> -	irq_exit_rcu();
> +	__irq_exit_rcu();
>  	rcu_irq_exit();
>  	 /* must be last! */
>  	lockdep_hardirq_exit();

This works fine.

^ permalink raw reply

* Re: [PATCH 1/4] powerpc: Add a ppc_inst_as_str() helper
From: Segher Boessenkool @ 2020-06-02 19:41 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: linuxppc-dev, alistair
In-Reply-To: <20200602052728.18227-1-jniethe5@gmail.com>

On Tue, Jun 02, 2020 at 03:27:25PM +1000, Jordan Niethe wrote:
> There are quite a few places where instructions are printed, this is
> done using a '%x' format specifier. With the introduction of prefixed
> instructions, this does not work well. Currently in these places,
> ppc_inst_val() is used for the value for %x so only the first word of
> prefixed instructions are printed.
> 
> When the instructions are word instructions, only a single word should
> be printed. For prefixed instructions both the prefix and suffix should
> be printed. To accommodate both of these situations, instead of a '%x'
> specifier use '%s' and introduce a helper, __ppc_inst_as_str() which
> returns a char *. The char * __ppc_inst_as_str() returns is buffer that
> is passed to it by the caller.
> 
> It is cumbersome to require every caller of __ppc_inst_as_str() to now
> declare a buffer. To make it more convenient to use __ppc_inst_as_str(),
> wrap it in a macro that uses a compound statement to allocate a buffer
> on the caller's stack before calling it.
> 
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>

> +#define PPC_INST_STR_LEN sizeof("0x00000000 0x00000000")

sizeof is not a function or anything function-like; it's just an
operator, like (unary) "+" or "-" or dereference "*".  The parentheses
are completely redundant here.  And it kind of matters, as written a
reader might think that a string object is actually constructed here.

But, so very many people do this, I'm not going to fight this fight ;-)


Segher

^ permalink raw reply

* [PATCH v5 2/3] arch: wire-up close_range()
From: Christian Brauner @ 2020-06-02 20:42 UTC (permalink / raw)
  To: torvalds, linux-kernel, Kyle Evans, Victor Stinner
  Cc: linux-ia64, linux-sh, ldv, dhowells, sparclinux,
	Christian Brauner, shuah, linux-arch, linux-s390, x86, linux-mips,
	linux-xtensa, arnd, jannh, linux-m68k, viro, linux-arm-kernel,
	fweimer, linux-parisc, linux-api, oleg, linux-alpha,
	linux-fsdevel, linuxppc-dev
In-Reply-To: <20200602204219.186620-1-christian.brauner@ubuntu.com>

This wires up the close_range() syscall into all arches at once.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
---
/* v2 */
not present

/* v3 */
not present

/* v4 */
introduced
- Arnd Bergmann <arnd@arndb.de>:
  - split into two patches:
    1. add close_range()
    2. add syscall to all arches at once
  - bump __NR_compat_syscalls in arch/arm64/include/asm/unistd.h

/* v5 */
unchanged
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    | 1 +
 arch/s390/kernel/syscalls/syscall.tbl       | 1 +
 arch/sh/kernel/syscalls/syscall.tbl         | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl      | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     | 1 +
 include/uapi/asm-generic/unistd.h           | 4 +++-
 19 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 36d42da7466a..67ef02ead4da 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -475,5 +475,6 @@
 543	common	fspick				sys_fspick
 544	common	pidfd_open			sys_pidfd_open
 # 545 reserved for clone3
+546	common	close_range			sys_close_range
 547	common	openat2				sys_openat2
 548	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 4d1cf74a2caa..13c5652137fb 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -449,5 +449,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 803039d504de..3b859596840d 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		439
+#define __NR_compat_syscalls		440
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index c1c61635f89c..902bfb136002 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick)
 __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
+#define __NR_close_range 436
+__SYSCALL(__NR_close_range, sys_close_range)
 #define __NR_openat2 437
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 042911e670b8..df2e14da6a29 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -356,5 +356,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 # 435 reserved for clone3
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index f4f49fcb76d0..553b8858e667 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -435,5 +435,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				__sys_clone3
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 4c67b11f9c9e..4467e2211d3c 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -441,5 +441,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 1f9e8ad636cc..82ad4cce163a 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -374,5 +374,6 @@
 433	n32	fspick				sys_fspick
 434	n32	pidfd_open			sys_pidfd_open
 435	n32	clone3				__sys_clone3
+436	n32	close_range			sys_close_range
 437	n32	openat2				sys_openat2
 438	n32	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index c0b9d802dbf6..232934c26d07 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -350,5 +350,6 @@
 433	n64	fspick				sys_fspick
 434	n64	pidfd_open			sys_pidfd_open
 435	n64	clone3				__sys_clone3
+436	n64	close_range			sys_close_range
 437	n64	openat2				sys_openat2
 438	n64	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index ac586774c980..d63000c8e769 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -423,5 +423,6 @@
 433	o32	fspick				sys_fspick
 434	o32	pidfd_open			sys_pidfd_open
 435	o32	clone3				__sys_clone3
+436	o32	close_range			sys_close_range
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 52a15f5cd130..8612458afda6 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -433,5 +433,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3_wrapper
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 220ae11555f2..ac92f5d7279d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -525,5 +525,6 @@
 435	32	clone3				ppc_clone3			sys_clone3
 435	64	clone3				sys_clone3
 435	spu	clone3				sys_ni_syscall
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index bd7bd3581a0f..371bb1f2bfc3 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -438,5 +438,6 @@
 433  common	fspick			sys_fspick			sys_fspick
 434  common	pidfd_open		sys_pidfd_open			sys_pidfd_open
 435  common	clone3			sys_clone3			sys_clone3
+436  common	close_range		sys_close_range			sys_close_range
 437  common	openat2			sys_openat2			sys_openat2
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index c7a30fcd135f..4db428e7acdd 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -438,5 +438,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 # 435 reserved for clone3
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index f13615ecdecc..c9233f79a11b 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -481,5 +481,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 # 435 reserved for clone3
+436	common	close_range			sys_close_range
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54581ac671b4..49ea7190351a 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -440,5 +440,6 @@
 433	i386	fspick			sys_fspick
 434	i386	pidfd_open		sys_pidfd_open
 435	i386	clone3			sys_clone3
+436	i386	close_range		sys_close_range
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 37b844f839bc..c2b50b16a24c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -357,6 +357,7 @@
 433	common	fspick			sys_fspick
 434	common	pidfd_open		sys_pidfd_open
 435	common	clone3			sys_clone3
+436	common	close_range		sys_close_range
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 85a9ab1bc04d..d5dd7e506893 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -406,5 +406,6 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3
+436	common	close_range			sys_close_range
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..ed4e7c2a557f 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -850,6 +850,8 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
 #endif
+#define __NR_close_range 436
+__SYSCALL(__NR_close_range, sys_close_range)
 
 #define __NR_openat2 437
 __SYSCALL(__NR_openat2, sys_openat2)
@@ -857,7 +859,7 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 
 #undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 440
 
 /*
  * 32 bit systems traditionally used different
-- 
2.26.2


^ permalink raw reply related

* Re: [RESEND PATCH v9 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
From: Ira Weiny @ 2020-06-02 20:51 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Santosh Sivaraj, linux-nvdimm, linux-kernel, Steven Rostedt,
	Oliver O'Halloran, Aneesh Kumar K . V, Dan Williams,
	linuxppc-dev
In-Reply-To: <20200602101438.73929-5-vaibhav@linux.ibm.com>

On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:
> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
> module and add the command family NVDIMM_FAMILY_PAPR to the white list
> of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
> nvdimm command mask and implement necessary scaffolding in the module
> to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
> 
> The layout of the PDSM request as we expect from libnvdimm/libndctl is
> described in newly introduced uapi header 'papr_pdsm.h' which
> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
> to communicate the PDSM request via member
> 'nd_cmd_pkg.nd_command' and size of payload that need to be
> sent/received for servicing the PDSM.
> 
> A new function is_cmd_valid() is implemented that reads the args to
> papr_scm_ndctl() and performs sanity tests on them. A new function
> papr_scm_service_pdsm() is introduced and is called from
> papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
> command from libnvdimm.
> 
> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
> 
> Resend:
> * Added ack from Aneesh.
> 
> v8..v9:
> * Reduced the usage of term SCM replacing it with appropriate
>   replacement [ Dan Williams, Aneesh ]
> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
> * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
> * Minor update to patch description.
> 
> v7..v8:
> * Removed the 'payload_offset' field from 'struct
>   nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
>   at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
>   'reserved' field of 10-bytes is introduced. [ Aneesh ]
> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
>   [ Ira ]
> 
> Resend:
> * None
> 
> v6..v7 :
> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>   [Mpe].
> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
>   [Mpe].
> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
> 
> v5..v6 :
> * Changed the usage of the term DSM to PDSM to distinguish it from the
>   ACPI term [ Dan Williams ]
> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
>   to reflect the new terminology.
> * Updated the patch description and title to reflect the new terminology.
> * Squashed patch to introduce new command family in 'ndctl.h' with
>   this patch [ Dan Williams ]
> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
>   [ Dan Williams ]
> * Removed redundant license text from the papr_scm_psdm.h file.
>   [ Dan Williams ]
> * s/envelop/envelope/ at various places [ Dan Williams ]
> * Added '__packed' attribute to command package header to gaurd
>   against different compiler adding paddings between the fields.
>   [ Dan Williams]
> * Converted various pr_debug to dev_debug [ Dan Williams ]
> 
> v4..v5 :
> * None
> 
> v3..v4 :
> * None
> 
> v2..v3 :
> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
> 
> v1..v2 :
> * None
> ---
>  arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++++++++++++++++++++++
>  arch/powerpc/platforms/pseries/papr_scm.c | 101 +++++++++++++++-
>  include/uapi/linux/ndctl.h                |   1 +
>  3 files changed, 232 insertions(+), 6 deletions(-)
>  create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
> 
> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> new file mode 100644
> index 000000000000..6407fefcc007
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> @@ -0,0 +1,136 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
> + *
> + * (C) Copyright IBM 2020
> + *
> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
> +#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
> +
> +#include <linux/types.h>
> +
> +/*
> + * PDSM Envelope:
> + *
> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> + * envelope which consists of a header and user-defined payload sections.
> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
> + * There is reserved field that can used to introduce new fields to the
> + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
> + * lies at a 8-byte boundary.
> + *
> + *  +-------------+---------------------+---------------------------+
> + *  |   64-Bytes  |       16-Bytes      |       Max 176-Bytes       |
> + *  +-------------+---------------------+---------------------------+
> + *  |               nd_pdsm_cmd_pkg     |                           |
> + *  |-------------+                     |                           |
> + *  |  nd_cmd_pkg |                     |                           |
> + *  +-------------+---------------------+---------------------------+
> + *  | nd_family   |                     |                           |
> + *  | nd_size_out | cmd_status          |                           |
> + *  | nd_size_in  | payload_version     |     payload               |
> + *  | nd_command  | reserved            |                           |
> + *  | nd_fw_size  |                     |                           |
> + *  +-------------+---------------------+---------------------------+
> + *
> + * PDSM Header:
> + *
> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
> + * contained in 'struct nd_cmd_pkg', the header also has members following 
                                                             ^^^^^
                                                        ...  the  ...

> + * members:
> + *
> + * 'cmd_status'		: (Out) Errors if any encountered while servicing PDSM.
> + * 'payload_version'	: (In/Out) Version number associated with the payload.
> + * 'reserved'		: Not used and reserved for future.
> + *
> + * PDSM Payload:
> + *
> + * The layout of the PDSM Payload is defined by various structs shared between
> + * papr_scm and libndctl so that contents of payload can be interpreted. During
> + * servicing of a PDSM the papr_scm module will read input args from the payload
> + * field by casting its contents to an appropriate struct pointer based on the
> + * PDSM command. Similarly the output of servicing the PDSM command will be
> + * copied to the payload field using the same struct.
> + *
> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
> + * leaves around 176 bytes for the envelope payload (ignoring any padding that
> + * the compiler may silently introduce).
> + *
> + * Payload Version:
> + *
> + * A 'payload_version' field is present in PDSM header that indicates a specific
> + * version of the structure present in PDSM Payload for a given PDSM command.
> + * This provides backward compatibility in case the PDSM Payload structure
> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> + *
> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> + * module when servicing the PDSM envelope checks the 'payload_version' and then
> + * uses 'payload struct version' == MIN('payload_version field',
> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
> + * struct in returned 'payload_version' field.
> + *
> + * Libndctl on receiving the envelope back from papr_scm again checks the
> + * 'payload_version' field and based on it use the appropriate version dsm
> + * struct to parse the results.
> + *
> + * Backward Compatibility:
> + *
> + * Above scheme of exchanging different versioned PDSM struct between libndctl
> + * and papr_scm should provide backward compatibility until following two
> + * assumptions/conditions when defining new PDSM structs hold:
> + *
> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> + *
> + * 1. T(X) is a proper subset of T(Y) if Y > X.
> + *    i.e Each new version of PDSM struct should retain existing struct
> + *    attributes from previous version
> + *
> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> + *    it should also support T(1), T(2)...T(X - 1).
> + *    i.e When adding support for new version of a PDSM struct, libndctl
> + *    and papr_scm should retain support of the existing PDSM struct
> + *    version they support.

Please see this thread for an example why versions are a bad idea in UAPIs:

https://lkml.org/lkml/2020/3/26/213

While the use of version is different in that thread the fundamental issues are
the same.  You end up with some weird matrix of supported features and
structure definitions.  For example, you are opening up the possibility of
changing structures with a different version for no good reason.

Also having the user query with version Z and get back version X (older) is
odd.  Generally if the kernel does not know about a feature (ie version Z of
the structure) it should return -EINVAL and let the user figure out what to do.
The user may just give up or they could try a different query.

> + */
> +
> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> +struct nd_pdsm_cmd_pkg {
> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
> +	__u16 reserved[5];	/* Ignored and to be used in future */

How do you know when reserved is used for something else in the future?  Is
reserved guaranteed (and checked by the code) to be 0?

> +	__u16 payload_version;	/* In/Out: version of the payload */

Why is payload_version after reserved?

> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
> +} __packed;
> +
> +/*
> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> + */
> +enum papr_pdsm {
> +	PAPR_PDSM_MIN = 0x0,
> +	PAPR_PDSM_MAX,
> +};
> +
> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> +{
> +	return (struct nd_pdsm_cmd_pkg *) cmd;
> +}
> +
> +/* Return the payload pointer for a given pcmd */
> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> +{
> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> +		return NULL;
> +	else
> +		return (void *)(pcmd->payload);
> +}
> +
> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 149431594839..5e2237e7ec08 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -15,13 +15,15 @@
>  #include <linux/seq_buf.h>
>  
>  #include <asm/plpar_wrappers.h>
> +#include <asm/papr_pdsm.h>
>  
>  #define BIND_ANY_ADDR (~0ul)
>  
>  #define PAPR_SCM_DIMM_CMD_MASK \
>  	((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
> -	 (1ul << ND_CMD_SET_CONFIG_DATA))
> +	 (1ul << ND_CMD_SET_CONFIG_DATA) | \
> +	 (1ul << ND_CMD_CALL))
>  
>  /* DIMM health bitmap bitmap indicators */
>  /* SCM device is unable to persist memory contents */
> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>  	return 0;
>  }
>  
> +/*
> + * Validate the inputs args to dimm-control function and return '0' if valid.
> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> + */
> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> +		       unsigned int buf_len)
> +{
> +	unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> +	struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> +	struct papr_scm_priv *p;
> +
> +	/* Only dimm-specific calls are supported atm */
> +	if (!nvdimm)
> +		return -EINVAL;
> +
> +	/* get the provider date from struct nvdimm */

s/date/data

> +	p = nvdimm_provider_data(nvdimm);
> +
> +	if (!test_bit(cmd, &cmd_mask)) {
> +		dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> +		return -EINVAL;
> +	} else if (cmd == ND_CMD_CALL) {
> +
> +		/* Verify the envelope package */
> +		if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> +			dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> +				buf_len);
> +			return -EINVAL;
> +		}
> +
> +		/* Verify that the PDSM family is valid */
> +		if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
> +			dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> +				pkg->hdr.nd_family);
> +			return -EINVAL;
> +
> +		}
> +
> +		/* We except a payload with all PDSM commands */
> +		if (pdsm_cmd_to_payload(pkg) == NULL) {
> +			dev_dbg(&p->pdev->dev,
> +				"Empty payload for sub-command=0x%llx\n",
> +				pkg->hdr.nd_command);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	/* Command looks valid */

I assume the first command to be implemented also checks the { nd_command,
payload_version, payload length } for correctness?

> +	return 0;
> +}
> +
> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> +				struct nd_pdsm_cmd_pkg *call_pkg)
> +{
> +	/* unknown subcommands return error in packages */
> +	if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
> +	    call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
> +		dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> +			call_pkg->hdr.nd_command);
> +		call_pkg->cmd_status = -EINVAL;
> +		return 0;
> +	}
> +
> +	/* Depending on the DSM command call appropriate service routine */
> +	switch (call_pkg->hdr.nd_command) {
> +	default:
> +		dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> +			call_pkg->hdr.nd_command);
> +		call_pkg->cmd_status = -ENOENT;
> +		return 0;
> +	}
> +}
> +
>  static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  			  struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>  			  unsigned int buf_len, int *cmd_rc)
>  {
>  	struct nd_cmd_get_config_size *get_size_hdr;
>  	struct papr_scm_priv *p;
> +	struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> +	int rc;
>  
> -	/* Only dimm-specific calls are supported atm */
> -	if (!nvdimm)
> -		return -EINVAL;
> +	/* Use a local variable in case cmd_rc pointer is NULL */
> +	if (cmd_rc == NULL)
> +		cmd_rc = &rc;

Why is this needed?  AFAICT The caller of papr_scm_ndctl does not specify null
and you did not change it.

> +
> +	*cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> +	if (*cmd_rc) {
> +		pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> +		return *cmd_rc;
> +	}
>  
>  	p = nvdimm_provider_data(nvdimm);
>  
> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  		*cmd_rc = papr_scm_meta_set(p, buf);
>  		break;
>  
> +	case ND_CMD_CALL:
> +		call_pkg = nd_to_pdsm_cmd_pkg(buf);
> +		*cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> +		break;
> +
>  	default:
> -		return -EINVAL;
> +		dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> +		*cmd_rc = -EINVAL;

Is this change related?  If there is a bug where there is a caller of
papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
that issue.

Ira

>  	}
>  
>  	dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>  
> -	return 0;
> +	return *cmd_rc;
>  }
>  
>  static ssize_t flags_show(struct device *dev,
> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> index de5d90212409..0e09dc5cec19 100644
> --- a/include/uapi/linux/ndctl.h
> +++ b/include/uapi/linux/ndctl.h
> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>  #define NVDIMM_FAMILY_HPE2 2
>  #define NVDIMM_FAMILY_MSFT 3
>  #define NVDIMM_FAMILY_HYPERV 4
> +#define NVDIMM_FAMILY_PAPR 5
>  
>  #define ND_IOCTL_CALL			_IOWR(ND_IOCTL, ND_CMD_CALL,\
>  					struct nd_cmd_pkg)
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [RESEND PATCH v9 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
From: Ira Weiny @ 2020-06-02 21:05 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: linux-nvdimm, linux-kernel, Steven Rostedt, Aneesh Kumar K . V,
	linuxppc-dev
In-Reply-To: <20200602205148.GF1505637@iweiny-DESK2.sc.intel.com>

On Tue, Jun 02, 2020 at 01:51:49PM -0700, 'Ira Weiny' wrote:
> On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:

...

> > +
> > +/*
> > + * PDSM Envelope:
> > + *
> > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> > + * envelope which consists of a header and user-defined payload sections.
> > + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> > + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
> > + * There is reserved field that can used to introduce new fields to the
> > + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
> > + * lies at a 8-byte boundary.
> > + *
> > + *  +-------------+---------------------+---------------------------+
> > + *  |   64-Bytes  |       16-Bytes      |       Max 176-Bytes       |
> > + *  +-------------+---------------------+---------------------------+
> > + *  |               nd_pdsm_cmd_pkg     |                           |
> > + *  |-------------+                     |                           |
> > + *  |  nd_cmd_pkg |                     |                           |
> > + *  +-------------+---------------------+---------------------------+
> > + *  | nd_family   |                     |                           |
> > + *  | nd_size_out | cmd_status          |                           |
> > + *  | nd_size_in  | payload_version     |     payload               |
> > + *  | nd_command  | reserved            |                           |
> > + *  | nd_fw_size  |                     |                           |
> > + *  +-------------+---------------------+---------------------------+

One more comment WRT nd_size_[in|out].  I know that it is defined as the size
of the FW payload but normally when you nest headers 'size' in Header A
represents everything after Header A, including Header B.  In this case that
would be including nd_pdsm_cmd_pkg...

It looks like that is not what you have done?  Or perhaps I missed it?

Ira

> > + *
> > + * PDSM Header:
> > + *
> > + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> > + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> > + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
> > + * contained in 'struct nd_cmd_pkg', the header also has members following 
>                                                              ^^^^^
>                                                         ...  the  ...
> 
> > + * members:
> > + *
> > + * 'cmd_status'		: (Out) Errors if any encountered while servicing PDSM.
> > + * 'payload_version'	: (In/Out) Version number associated with the payload.
> > + * 'reserved'		: Not used and reserved for future.
> > + *
> > + * PDSM Payload:
> > + *
> > + * The layout of the PDSM Payload is defined by various structs shared between
> > + * papr_scm and libndctl so that contents of payload can be interpreted. During
> > + * servicing of a PDSM the papr_scm module will read input args from the payload
> > + * field by casting its contents to an appropriate struct pointer based on the
> > + * PDSM command. Similarly the output of servicing the PDSM command will be
> > + * copied to the payload field using the same struct.
> > + *
> > + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
> > + * leaves around 176 bytes for the envelope payload (ignoring any padding that
> > + * the compiler may silently introduce).
> > + *
> > + * Payload Version:
> > + *
> > + * A 'payload_version' field is present in PDSM header that indicates a specific
> > + * version of the structure present in PDSM Payload for a given PDSM command.
> > + * This provides backward compatibility in case the PDSM Payload structure
> > + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> > + *
> > + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> > + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> > + * module when servicing the PDSM envelope checks the 'payload_version' and then
> > + * uses 'payload struct version' == MIN('payload_version field',
> > + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> > + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
> > + * struct in returned 'payload_version' field.
> > + *
> > + * Libndctl on receiving the envelope back from papr_scm again checks the
> > + * 'payload_version' field and based on it use the appropriate version dsm
> > + * struct to parse the results.
> > + *
> > + * Backward Compatibility:
> > + *
> > + * Above scheme of exchanging different versioned PDSM struct between libndctl
> > + * and papr_scm should provide backward compatibility until following two
> > + * assumptions/conditions when defining new PDSM structs hold:
> > + *
> > + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> > + *
> > + * 1. T(X) is a proper subset of T(Y) if Y > X.
> > + *    i.e Each new version of PDSM struct should retain existing struct
> > + *    attributes from previous version
> > + *
> > + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> > + *    it should also support T(1), T(2)...T(X - 1).
> > + *    i.e When adding support for new version of a PDSM struct, libndctl
> > + *    and papr_scm should retain support of the existing PDSM struct
> > + *    version they support.
> 
> Please see this thread for an example why versions are a bad idea in UAPIs:
> 
> https://lkml.org/lkml/2020/3/26/213
> 
> While the use of version is different in that thread the fundamental issues are
> the same.  You end up with some weird matrix of supported features and
> structure definitions.  For example, you are opening up the possibility of
> changing structures with a different version for no good reason.
> 
> Also having the user query with version Z and get back version X (older) is
> odd.  Generally if the kernel does not know about a feature (ie version Z of
> the structure) it should return -EINVAL and let the user figure out what to do.
> The user may just give up or they could try a different query.
> 
> > + */
> > +
> > +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> > +struct nd_pdsm_cmd_pkg {
> > +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
> > +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
> > +	__u16 reserved[5];	/* Ignored and to be used in future */
> 
> How do you know when reserved is used for something else in the future?  Is
> reserved guaranteed (and checked by the code) to be 0?
> 
> > +	__u16 payload_version;	/* In/Out: version of the payload */
> 
> Why is payload_version after reserved?
> 
> > +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
> > +} __packed;
> > +
> > +/*
> > + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> > + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> > + */
> > +enum papr_pdsm {
> > +	PAPR_PDSM_MIN = 0x0,
> > +	PAPR_PDSM_MAX,
> > +};
> > +
> > +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> > +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> > +{
> > +	return (struct nd_pdsm_cmd_pkg *) cmd;
> > +}
> > +
> > +/* Return the payload pointer for a given pcmd */
> > +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> > +{
> > +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> > +		return NULL;
> > +	else
> > +		return (void *)(pcmd->payload);
> > +}
> > +
> > +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> > index 149431594839..5e2237e7ec08 100644
> > --- a/arch/powerpc/platforms/pseries/papr_scm.c
> > +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> > @@ -15,13 +15,15 @@
> >  #include <linux/seq_buf.h>
> >  
> >  #include <asm/plpar_wrappers.h>
> > +#include <asm/papr_pdsm.h>
> >  
> >  #define BIND_ANY_ADDR (~0ul)
> >  
> >  #define PAPR_SCM_DIMM_CMD_MASK \
> >  	((1ul << ND_CMD_GET_CONFIG_SIZE) | \
> >  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
> > -	 (1ul << ND_CMD_SET_CONFIG_DATA))
> > +	 (1ul << ND_CMD_SET_CONFIG_DATA) | \
> > +	 (1ul << ND_CMD_CALL))
> >  
> >  /* DIMM health bitmap bitmap indicators */
> >  /* SCM device is unable to persist memory contents */
> > @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
> >  	return 0;
> >  }
> >  
> > +/*
> > + * Validate the inputs args to dimm-control function and return '0' if valid.
> > + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> > + */
> > +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> > +		       unsigned int buf_len)
> > +{
> > +	unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> > +	struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> > +	struct papr_scm_priv *p;
> > +
> > +	/* Only dimm-specific calls are supported atm */
> > +	if (!nvdimm)
> > +		return -EINVAL;
> > +
> > +	/* get the provider date from struct nvdimm */
> 
> s/date/data
> 
> > +	p = nvdimm_provider_data(nvdimm);
> > +
> > +	if (!test_bit(cmd, &cmd_mask)) {
> > +		dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> > +		return -EINVAL;
> > +	} else if (cmd == ND_CMD_CALL) {
> > +
> > +		/* Verify the envelope package */
> > +		if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> > +			dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> > +				buf_len);
> > +			return -EINVAL;
> > +		}
> > +
> > +		/* Verify that the PDSM family is valid */
> > +		if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
> > +			dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> > +				pkg->hdr.nd_family);
> > +			return -EINVAL;
> > +
> > +		}
> > +
> > +		/* We except a payload with all PDSM commands */
> > +		if (pdsm_cmd_to_payload(pkg) == NULL) {
> > +			dev_dbg(&p->pdev->dev,
> > +				"Empty payload for sub-command=0x%llx\n",
> > +				pkg->hdr.nd_command);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +
> > +	/* Command looks valid */
> 
> I assume the first command to be implemented also checks the { nd_command,
> payload_version, payload length } for correctness?
> 
> > +	return 0;
> > +}
> > +
> > +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> > +				struct nd_pdsm_cmd_pkg *call_pkg)
> > +{
> > +	/* unknown subcommands return error in packages */
> > +	if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
> > +	    call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
> > +		dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> > +			call_pkg->hdr.nd_command);
> > +		call_pkg->cmd_status = -EINVAL;
> > +		return 0;
> > +	}
> > +
> > +	/* Depending on the DSM command call appropriate service routine */
> > +	switch (call_pkg->hdr.nd_command) {
> > +	default:
> > +		dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> > +			call_pkg->hdr.nd_command);
> > +		call_pkg->cmd_status = -ENOENT;
> > +		return 0;
> > +	}
> > +}
> > +
> >  static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >  			  struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> >  			  unsigned int buf_len, int *cmd_rc)
> >  {
> >  	struct nd_cmd_get_config_size *get_size_hdr;
> >  	struct papr_scm_priv *p;
> > +	struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> > +	int rc;
> >  
> > -	/* Only dimm-specific calls are supported atm */
> > -	if (!nvdimm)
> > -		return -EINVAL;
> > +	/* Use a local variable in case cmd_rc pointer is NULL */
> > +	if (cmd_rc == NULL)
> > +		cmd_rc = &rc;
> 
> Why is this needed?  AFAICT The caller of papr_scm_ndctl does not specify null
> and you did not change it.
> 
> > +
> > +	*cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> > +	if (*cmd_rc) {
> > +		pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> > +		return *cmd_rc;
> > +	}
> >  
> >  	p = nvdimm_provider_data(nvdimm);
> >  
> > @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >  		*cmd_rc = papr_scm_meta_set(p, buf);
> >  		break;
> >  
> > +	case ND_CMD_CALL:
> > +		call_pkg = nd_to_pdsm_cmd_pkg(buf);
> > +		*cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> > +		break;
> > +
> >  	default:
> > -		return -EINVAL;
> > +		dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> > +		*cmd_rc = -EINVAL;
> 
> Is this change related?  If there is a bug where there is a caller of
> papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
> that issue.
> 
> Ira
> 
> >  	}
> >  
> >  	dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
> >  
> > -	return 0;
> > +	return *cmd_rc;
> >  }
> >  
> >  static ssize_t flags_show(struct device *dev,
> > diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> > index de5d90212409..0e09dc5cec19 100644
> > --- a/include/uapi/linux/ndctl.h
> > +++ b/include/uapi/linux/ndctl.h
> > @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
> >  #define NVDIMM_FAMILY_HPE2 2
> >  #define NVDIMM_FAMILY_MSFT 3
> >  #define NVDIMM_FAMILY_HYPERV 4
> > +#define NVDIMM_FAMILY_PAPR 5
> >  
> >  #define ND_IOCTL_CALL			_IOWR(ND_IOCTL, ND_CMD_CALL,\
> >  					struct nd_cmd_pkg)
> > -- 
> > 2.26.2
> > 
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
> To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply

* Re: [RESEND PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
From: Ira Weiny @ 2020-06-02 21:19 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Santosh Sivaraj, linux-nvdimm, linux-kernel, Steven Rostedt,
	Oliver O'Halloran, Aneesh Kumar K . V, Dan Williams,
	linuxppc-dev
In-Reply-To: <20200602101438.73929-6-vaibhav@linux.ibm.com>

On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
> containing dimm health information back to user space in response to
> ND_CMD_CALL. This functionality is implemented in newly introduced
> papr_pdsm_health() that queries the nvdimm health information and
> then copies this information to the package payload whose layout is
> defined by 'struct nd_papr_pdsm_health'.
> 
> The patch also introduces a new member 'struct papr_scm_priv.health'
> thats an instance of 'struct nd_papr_pdsm_health' to cache the health
> information of a nvdimm. As a result functions drc_pmem_query_health()
> and flags_show() are updated to populate and use this new struct
> instead of a u64 integer that was earlier used.
> 
> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
> 
> Resend:
> * Added ack from Aneesh.
> 
> v8..v9:
> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g  [ Dan , Aneesh ]
> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
> * Renamed papr_scm_get_health() to papr_psdm_health()
> * Updated patch description to replace papr-scm dimm with nvdimm.
> 
> v7..v8:
> * None
> 
> Resend:
> * None
> 
> v6..v7:
> * Updated flags_show() to use seq_buf_printf(). [Mpe]
> * Updated papr_scm_get_health() to use newly introduced
>   __drc_pmem_query_health() bypassing the cache [Mpe].
> 
> v5..v6:
> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
>   gaurd against possibility of different compilers adding different
>   paddings to the struct [ Dan Williams ]
> 
> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
>   'bool' and also updated drc_pmem_query_health() to take this into
>   account. [ Dan Williams ]
> 
> v4..v5:
> * None
> 
> v3..v4:
> * Call the DSM_PAPR_SCM_HEALTH service function from
>   papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
> 
> v2..v3:
> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
>   as its exported to the userspace [Aneesh]
> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
>   from enum to #defines [Aneesh]
> 
> v1..v2:
> * New patch in the series
> ---
>  arch/powerpc/include/uapi/asm/papr_pdsm.h |  39 +++++++
>  arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
>  2 files changed, 147 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> index 6407fefcc007..411725a91591 100644
> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
>   */
>  enum papr_pdsm {
>  	PAPR_PDSM_MIN = 0x0,
> +	PAPR_PDSM_HEALTH,
>  	PAPR_PDSM_MAX,
>  };
>  
> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>  		return (void *)(pcmd->payload);
>  }
>  
> +/* Various nvdimm health indicators */
> +#define PAPR_PDSM_DIMM_HEALTHY       0
> +#define PAPR_PDSM_DIMM_UNHEALTHY     1
> +#define PAPR_PDSM_DIMM_CRITICAL      2
> +#define PAPR_PDSM_DIMM_FATAL         3
> +
> +/*
> + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
> + * Various flags indicate the health status of the dimm.
> + *
> + * dimm_unarmed		: Dimm not armed. So contents wont persist.
> + * dimm_bad_shutdown	: Previous shutdown did not persist contents.
> + * dimm_bad_restore	: Contents from previous shutdown werent restored.
> + * dimm_scrubbed	: Contents of the dimm have been scrubbed.
> + * dimm_locked		: Contents of the dimm cant be modified until CEC reboot
> + * dimm_encrypted	: Contents of dimm are encrypted.
> + * dimm_health		: Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
> + */
> +struct nd_papr_pdsm_health_v1 {
> +	__u8 dimm_unarmed;
> +	__u8 dimm_bad_shutdown;
> +	__u8 dimm_bad_restore;
> +	__u8 dimm_scrubbed;
> +	__u8 dimm_locked;
> +	__u8 dimm_encrypted;
> +	__u16 dimm_health;
> +} __packed;
> +
> +/*
> + * Typedef the current struct for dimm_health so that any application
> + * or kernel recompiled after introducing a new version automatically
> + * supports the new version.
> + */
> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
> +
> +/* Current version number for the dimm health struct */

This can't be the 'current' version.  You will need a list of versions you
support.  Because if the user passes in an old version you need to be able to
respond with that old version.  Also if you plan to support 'return X for a Y
query' then the user will need both X and Y defined to interpret X.

> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
> +
>  #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 5e2237e7ec08..c0606c0c659c 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -88,7 +88,7 @@ struct papr_scm_priv {
>  	unsigned long lasthealth_jiffies;
>  
>  	/* Health information for the dimm */
> -	u64 health_bitmap;
> +	struct nd_papr_pdsm_health health;

ok so we are throwing away all the #defs from patch 1?  Are they still valid?

I'm confused that patch 3 added this and we are throwing it away here...

>  };
>  
>  static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>  static int __drc_pmem_query_health(struct papr_scm_priv *p)
>  {
>  	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> +	u64 health;
>  	long rc;
>  
>  	/* issue the hcall */
> @@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
>  	if (rc != H_SUCCESS) {
>  		dev_err(&p->pdev->dev,
>  			 "Failed to query health information, Err:%ld\n", rc);
> -		rc = -ENXIO;
> -		goto out;
> +		return -ENXIO;

I missed this...  probably did not need the goto in the first patch?

>  	}
>  
>  	p->lasthealth_jiffies = jiffies;
> -	p->health_bitmap = ret[0] & ret[1];
> +	health = ret[0] & ret[1];
>  
>  	dev_dbg(&p->pdev->dev,
>  		"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
>  		ret[0], ret[1]);
> -out:
> -	return rc;
> +
> +	memset(&p->health, 0, sizeof(p->health));
> +
> +	/* Check for various masks in bitmap and set the buffer */
> +	if (health & PAPR_PMEM_UNARMED_MASK)

Oh ok...  odd.  (don't add code then just take it away in a series)
You could have lead with the user structure and put this code in patch 3.

Why does the user need u8 to represent a single bit?  Does this help protect
against endian issues?


> +		p->health.dimm_unarmed = 1;
> +
> +	if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> +		p->health.dimm_bad_shutdown = 1;
> +
> +	if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> +		p->health.dimm_bad_restore = 1;
> +
> +	if (health & PAPR_PMEM_ENCRYPTED)
> +		p->health.dimm_encrypted = 1;
> +
> +	if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
> +		p->health.dimm_locked = 1;
> +		p->health.dimm_scrubbed = 1;
> +	}
> +
> +	if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
> +		p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
> +
> +	if (health & PAPR_PMEM_HEALTH_CRITICAL)
> +		p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
> +
> +	if (health & PAPR_PMEM_HEALTH_FATAL)
> +		p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
> +
> +	return 0;
>  }
>  
>  /* Min interval in seconds for assuming stable dimm health */
> @@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>  	return 0;
>  }
>  
> +/* Fetch the DIMM health info and populate it in provided package. */
> +static int papr_pdsm_health(struct papr_scm_priv *p,
> +			       struct nd_pdsm_cmd_pkg *pkg)
> +{
> +	int rc;
> +	size_t copysize = sizeof(p->health);
> +
> +	/* Ensure dimm health mutex is taken preventing concurrent access */
> +	rc = mutex_lock_interruptible(&p->health_mutex);
> +	if (rc)
> +		goto out;
> +
> +	/* Always fetch upto date dimm health data ignoring cached values */
> +	rc = __drc_pmem_query_health(p);
> +	if (rc)
> +		goto out_unlock;
> +	/*
> +	 * If the requested payload version is greater than one we know
> +	 * about, return the payload version we know about and let
> +	 * caller/userspace handle.
> +	 */
> +	if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
> +		pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;

I know this seems easy now but I do think you will run into trouble later.

Ira

> +
> +	if (pkg->hdr.nd_size_out < copysize) {
> +		dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
> +			pkg->hdr.nd_size_out, copysize);
> +		rc = -ENOSPC;
> +		goto out_unlock;
> +	}
> +
> +	dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
> +		copysize, pkg->payload_version);
> +
> +	/* Copy the health struct to the payload */
> +	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> +	pkg->hdr.nd_fw_size = copysize;
> +
> +out_unlock:
> +	mutex_unlock(&p->health_mutex);
> +
> +out:
> +	/*
> +	 * Put the error in out package and return success from function
> +	 * so that errors if any are propogated back to userspace.
> +	 */
> +	pkg->cmd_status = rc;
> +	dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
> +
> +	return 0;
> +}
> +
>  static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>  				struct nd_pdsm_cmd_pkg *call_pkg)
>  {
> @@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>  
>  	/* Depending on the DSM command call appropriate service routine */
>  	switch (call_pkg->hdr.nd_command) {
> +	case PAPR_PDSM_HEALTH:
> +		return papr_pdsm_health(p, call_pkg);
> +
>  	default:
>  		dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>  			call_pkg->hdr.nd_command);
> @@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
>  	struct nvdimm *dimm = to_nvdimm(dev);
>  	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>  	struct seq_buf s;
> -	u64 health;
>  	int rc;
>  
>  	rc = drc_pmem_query_health(p);
>  	if (rc)
>  		return rc;
>  
> -	/* Copy health_bitmap locally, check masks & update out buffer */
> -	health = READ_ONCE(p->health_bitmap);
> -
>  	seq_buf_init(&s, buf, PAGE_SIZE);
> -	if (health & PAPR_PMEM_UNARMED_MASK)
> +
> +	/* Protect concurrent modifications to papr_scm_priv */
> +	rc = mutex_lock_interruptible(&p->health_mutex);
> +	if (rc)
> +		return rc;
> +
> +	if (p->health.dimm_unarmed)
>  		seq_buf_printf(&s, "not_armed ");
>  
> -	if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> +	if (p->health.dimm_bad_shutdown)
>  		seq_buf_printf(&s, "flush_fail ");
>  
> -	if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> +	if (p->health.dimm_bad_restore)
>  		seq_buf_printf(&s, "restore_fail ");
>  
> -	if (health & PAPR_PMEM_ENCRYPTED)
> +	if (p->health.dimm_encrypted)
>  		seq_buf_printf(&s, "encrypted ");
>  
> -	if (health & PAPR_PMEM_SMART_EVENT_MASK)
> +	if (p->health.dimm_health)
>  		seq_buf_printf(&s, "smart_notify ");
>  
> -	if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
> -		seq_buf_printf(&s, "scrubbed locked ");
> +	if (p->health.dimm_scrubbed)
> +		seq_buf_printf(&s, "scrubbed ");
> +
> +	if (p->health.dimm_locked)
> +		seq_buf_printf(&s, "locked ");
> +
> +	mutex_unlock(&p->health_mutex);
>  
>  	if (seq_buf_used(&s))
>  		seq_buf_printf(&s, "\n");
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [RESEND PATCH v9 1/5] powerpc: Document details on H_SCM_HEALTH hcall
From: Ira Weiny @ 2020-06-02 21:21 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Santosh Sivaraj, linux-nvdimm, linux-kernel, Steven Rostedt,
	Oliver O'Halloran, Aneesh Kumar K . V, Dan Williams,
	linuxppc-dev
In-Reply-To: <20200602101438.73929-2-vaibhav@linux.ibm.com>

On Tue, Jun 02, 2020 at 03:44:34PM +0530, Vaibhav Jain wrote:
> Add documentation to 'papr_hcalls.rst' describing the bitmap flags
> that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
> specification.
> 
> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Ira Weiny <ira.weiny@intel.com>

Acked-by: Ira Weiny <ira.weiny@intel.com>

> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
> 
> Resend:
> * None
> 
> v8..v9:
> * s/SCM/PMEM device. [ Dan Williams, Aneesh ]
> 
> v7..v8:
> * Added a clarification on bit-ordering of Health Bitmap
> 
> Resend:
> * None
> 
> v6..v7:
> * None
> 
> v5..v6:
> * New patch in the series
> ---
>  Documentation/powerpc/papr_hcalls.rst | 46 ++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> index 3493631a60f8..48fcf1255a33 100644
> --- a/Documentation/powerpc/papr_hcalls.rst
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -220,13 +220,51 @@ from the LPAR memory.
>  **H_SCM_HEALTH**
>  
>  | Input: drcIndex
> -| Out: *health-bitmap, health-bit-valid-bitmap*
> +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
>  | Return Value: *H_Success, H_Parameter, H_Hardware*
>  
>  Given a DRC Index return the info on predictive failure and overall health of
> -the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
> -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
> -valid.
> +the PMEM device. The asserted bits in the health-bitmap indicate one or more states
> +(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
> +which bits in health-bitmap are valid. The bits are reported in
> +reverse bit ordering for example a value of 0xC400000000000000
> +indicates bits 0, 1, and 5 are valid.
> +
> +Health Bitmap Flags:
> +
> ++------+-----------------------------------------------------------------------+
> +|  Bit |               Definition                                              |
> ++======+=======================================================================+
> +|  00  | PMEM device is unable to persist memory contents.                     |
> +|      | If the system is powered down, nothing will be saved.                 |
> ++------+-----------------------------------------------------------------------+
> +|  01  | PMEM device failed to persist memory contents. Either contents were   |
> +|      | not saved successfully on power down or were not restored properly on |
> +|      | power up.                                                             |
> ++------+-----------------------------------------------------------------------+
> +|  02  | PMEM device contents are persisted from previous IPL. The data from   |
> +|      | the last boot were successfully restored.                             |
> ++------+-----------------------------------------------------------------------+
> +|  03  | PMEM device contents are not persisted from previous IPL. There was no|
> +|      | data to restore from the last boot.                                   |
> ++------+-----------------------------------------------------------------------+
> +|  04  | PMEM device memory life remaining is critically low                   |
> ++------+-----------------------------------------------------------------------+
> +|  05  | PMEM device will be garded off next IPL due to failure                |
> ++------+-----------------------------------------------------------------------+
> +|  06  | PMEM device contents cannot persist due to current platform health    |
> +|      | status. A hardware failure may prevent data from being saved or       |
> +|      | restored.                                                             |
> ++------+-----------------------------------------------------------------------+
> +|  07  | PMEM device is unable to persist memory contents in certain conditions|
> ++------+-----------------------------------------------------------------------+
> +|  08  | PMEM device is encrypted                                              |
> ++------+-----------------------------------------------------------------------+
> +|  09  | PMEM device has successfully completed a requested erase or secure    |
> +|      | erase procedure.                                                      |
> ++------+-----------------------------------------------------------------------+
> +|10:63 | Reserved / Unused                                                     |
> ++------+-----------------------------------------------------------------------+
>  
>  **H_SCM_PERFORMANCE_STATS**
>  
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [PATCH] cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()
From: wanghai (M) @ 2020-06-03  1:42 UTC (permalink / raw)
  To: Markus Elfring, linuxppc-dev
  Cc: Andrew Donnellan, Arnd Bergmann, Greg Kroah-Hartman,
	kernel-janitors, linux-kernel, Ian Munsie, Frederic Barrat
In-Reply-To: <b9791ff3-8397-f6e9-ca88-59c9bbe8c78f@web.de>


在 2020/6/3 1:20, Markus Elfring 写道:
>> Fix it by adding a call to kobject_put() in the error path of
>> kobject_init_and_add().
> Thanks for another completion of the exception handling.
>
> Would an other patch subject be a bit nicer?
Thanks for the guidance, I will perfect this description and send a v2
>
> …
>> +++ b/drivers/misc/cxl/sysfs.c
>> @@ -624,7 +624,7 @@ static struct afu_config_record *cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
>>   	rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
>>   				  &afu->dev.kobj, "cr%i", cr->cr);
>>   	if (rc)
>> -		goto err;
>> +		goto err1;
> …
>
> Can an other label be more reasonable here?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465
I just used the original author's label, should I replace all his labels 
like'err','err1' with reasonable one.


^ permalink raw reply

* Re: linux-next: manual merge of the rcu tree with the powerpc tree
From: Stephen Rothwell @ 2020-06-03  2:02 UTC (permalink / raw)
  To: Michael Ellerman, PowerPC
  Cc: Paul E. McKenney, Peter Zijlstra, Linux Kernel Mailing List,
	Nicholas Piggin, Linux Next Mailing List, Thomas Gleixner
In-Reply-To: <20200519172316.3b37cbae@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

Hi all,

On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi all,
> 
> Today's linux-next merge of the rcu tree got a conflict in:
> 
>   arch/powerpc/kernel/traps.c
> 
> between commit:
> 
>   116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
> 
> from the powerpc tree and commit:
> 
>   187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()")
> 
> from the rcu tree.

This is now a conflict between commit

  69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")

From Linus tree and the above powerpc tree commit.
-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH] cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()
From: Michael Ellerman @ 2020-06-03  3:02 UTC (permalink / raw)
  To: wanghai (M), Markus Elfring, linuxppc-dev
  Cc: Andrew Donnellan, Arnd Bergmann, Greg Kroah-Hartman,
	kernel-janitors, linux-kernel, Ian Munsie, Frederic Barrat
In-Reply-To: <25ad528b-beaf-820f-9738-ea304dcbc0d7@huawei.com>

"wanghai (M)" <wanghai38@huawei.com> writes:
> 在 2020/6/3 1:20, Markus Elfring 写道:
>>> Fix it by adding a call to kobject_put() in the error path of
>>> kobject_init_and_add().
>> Thanks for another completion of the exception handling.
>>
>> Would an other patch subject be a bit nicer?
> Thanks for the guidance, I will perfect this description and send a v2
>>
>> …
>>> +++ b/drivers/misc/cxl/sysfs.c
>>> @@ -624,7 +624,7 @@ static struct afu_config_record *cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
>>>   	rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
>>>   				  &afu->dev.kobj, "cr%i", cr->cr);
>>>   	if (rc)
>>> -		goto err;
>>> +		goto err1;
>> …
>>
>> Can an other label be more reasonable here?
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465
> I just used the original author's label, should I replace all his labels 
> like'err','err1' with reasonable one.

No.

cheers

^ permalink raw reply

* Re: [PATCH v2] selftests: powerpc: Add test for execute-disabled pkeys
From: Michael Ellerman @ 2020-06-03  3:17 UTC (permalink / raw)
  To: Sandipan Das
  Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
	linuxppc-dev, bauerman
In-Reply-To: <1eb388dc-0fde-64f3-9c05-7f9f2a398543@linux.ibm.com>

Sandipan Das <sandipan@linux.ibm.com> writes:
> Hi Michael,
>
> Thanks for your suggestions. I had a few questions regarding some
> of them.
>
> On 29/05/20 7:18 am, Michael Ellerman wrote:
>>> [...]
>>> +
>>> +static void pkeyreg_set(unsigned long uamr)
>>> +{
>>> +	asm volatile("isync; mtspr	0xd, %0; isync;" : : "r"(uamr));
>>> +}
>> 
>> You can use mtspr() there, but you'll need to add the isync's yourself.
>> 
>
> Would it make sense to add a new macro that adds the CSI instructions?
> Something like this.

I guess. I'm not sure there's that many places that need it, it's just
the pkey tests I think.

I'd be more inclined to have a set_amr() helper that includes the
isyncs, rather than a generic mtspr() version.

> diff --git a/tools/testing/selftests/powerpc/include/reg.h b/tools/testing/selftests/powerpc/include/reg.h
> index 022c5076b2c5..d60f66380cad 100644
> --- a/tools/testing/selftests/powerpc/include/reg.h
> +++ b/tools/testing/selftests/powerpc/include/reg.h
> @@ -15,6 +15,10 @@
>  #define mtspr(rn, v)   asm volatile("mtspr " _str(rn) ",%0" : \
>                                     : "r" ((unsigned long)(v)) \
>                                     : "memory")
> +#define mtspr_csi(rn, v)       ({ \
> +                       asm volatile("isync; mtspr " _str(rn) ",%0; isync;" : \
> +                                   : "r" ((unsigned long)(v)) \
> +                                   : "memory"); })
>  
>  #define mb()           asm volatile("sync" : : : "memory");
>  #define barrier()      asm volatile("" : : : "memory");
>
>
> I also noticed that two of the ptrace-related pkey tests were also not
> using CSIs. I will fix those too.
>
>>> [...]
>>> +	/* The following two cases will avoid SEGV_PKUERR */
>>> +	ftype = -1;
>>> +	fpkey = -1;
>>> +
>>> +	/*
>>> +	 * Read an instruction word from the address when AMR bits
>>> +	 * are not set.
>> 
>> You should explain for people who aren't familiar with the ISA that "AMR
>> bits not set" means "read/write access allowed".
>> 
>>> +	 *
>>> +	 * This should not generate a fault as having PROT_EXEC
>>> +	 * implicitly allows reads. The pkey currently restricts
>> 
>> Whether PROT_EXEC implies read is not well defined (see the man page).
>> If you want to test this case I think you'd be better off specifying
>> PROT_EXEC | PROT_READ explicitly.
>> 
>
> But I guess specifying PROT_EXEC | PROT_READ defeats the purpose? I can
> tweak the passing condition though based on whether READ_IMPLIES_EXEC is
> set in the personality.
>
>> [...]
>>> +	FAIL_IF(faults != 0 || fcode != SEGV_ACCERR);
>>> +
>>> +	/* The following three cases will generate SEGV_PKUERR */
>>> +	ftype = PKEY_DISABLE_ACCESS;
>>> +	fpkey = pkey;
>>> +
>>> +	/*
>>> +	 * Read an instruction word from the address when AMR bits
>>> +	 * are set.
>>> +	 *
>>> +	 * This should generate a pkey fault based on AMR bits only
>>> +	 * as having PROT_EXEC implicitly allows reads.
>> 
>> Again would be better to specify PROT_READ IMHO.
>> 
>
> I can use a personality check here too.
>
>>> +	 */
>>> +	faults = 1;
>>> +	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
>>> +	printf("read from %p, pkey is execute-disabled, access-disabled\n",
>>> +	       (void *) faddr);
>>> +	pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
>>> +	i = *faddr;
>>> +	FAIL_IF(faults != 0 || fcode != SEGV_PKUERR);
>>> +
>>> +	/*
>>> +	 * Write an instruction word to the address when AMR bits
>>> +	 * are set.
>>> +	 *
>>> +	 * This should generate two faults. First, a pkey fault based
>>> +	 * on AMR bits and then an access fault based on PROT_EXEC.
>>> +	 */
>>> +	faults = 2;
>> 
>> Setting faults to the expected value and decrementing it in the fault
>> handler is kind of weird.
>> 
>> I think it would be clearer if faults was just a zero-based counter of
>> how many faults we've taken, and then you test that it's == 2 below.
>> 
>>> +	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
>>> +	printf("write to %p, pkey is execute-disabled, access-disabled\n",
>>> +	       (void *) faddr);
>>> +	pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
>>> +	*faddr = 0x60000000;	/* nop */
>>> +	FAIL_IF(faults != 0 || fcode != SEGV_ACCERR);
>> 
>> ie. FAIL_IF(faults != 2 || ... )
>> 
>
> Agreed, it is weird. IIRC, I did this to make sure that if the test
> kept getting repeated faults at the same address and exceeded the
> maximum number of expected faults i.e. it gets another fault when
> 'faults' is already zero, then the signal handler will attempt to
> let the program continue by giving all permissions to the page and
> also the pkey. Would it make sense to just rename 'faults' to
> something like 'remaining_faults'?

It seems like you've tried to make the code cope with a situation that
should not happen, and would indicate a bug if it did happen, in which
case I think it would be fine if the test just timed out.

But if you want to handle it that's up to you, renaming the variable
might help a bit.

cheers

^ permalink raw reply

* [PATCH 1/3] ASoC: fsl_easrc: Fix -Wmissing-prototypes warning
From: Shengjiu Wang @ 2020-06-03  3:39 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, alsa-devel,
	lgirdwood, perex, tiwai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1591155860.git.shengjiu.wang@nxp.com>

Obtained with:
$ make W=1

sound/soc/fsl/fsl_easrc.c:967:5: warning: no previous prototype for function 'fsl_easrc_config_context' [-Wmissing-prototypes]
int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
    ^
sound/soc/fsl/fsl_easrc.c:967:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
^
static
sound/soc/fsl/fsl_easrc.c:1128:5: warning: no previous prototype for function 'fsl_easrc_set_ctx_format' [-Wmissing-prototypes]
int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
    ^
sound/soc/fsl/fsl_easrc.c:1128:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
^
static
sound/soc/fsl/fsl_easrc.c:1201:5: warning: no previous prototype for function 'fsl_easrc_set_ctx_organziation' [-Wmissing-prototypes]
int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1201:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1245:5: warning: no previous prototype for function 'fsl_easrc_request_context' [-Wmissing-prototypes]
int fsl_easrc_request_context(int channels, struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1245:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_request_context(int channels, struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1290:6: warning: no previous prototype for function 'fsl_easrc_release_context' [-Wmissing-prototypes]
void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
     ^
sound/soc/fsl/fsl_easrc.c:1290:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1317:5: warning: no previous prototype for function 'fsl_easrc_start_context' [-Wmissing-prototypes]
int fsl_easrc_start_context(struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1317:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_start_context(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1335:5: warning: no previous prototype for function 'fsl_easrc_stop_context' [-Wmissing-prototypes]
int fsl_easrc_stop_context(struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1335:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_stop_context(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1382:18: warning: no previous prototype for function 'fsl_easrc_get_dma_channel' [-Wmissing-prototypes]
struct dma_chan *fsl_easrc_get_dma_channel(struct fsl_asrc_pair *ctx,
                 ^
sound/soc/fsl/fsl_easrc.c:1382:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct dma_chan *fsl_easrc_get_dma_channel(struct fsl_asrc_pair *ctx,
^
static

Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reported-by: kbuild test robot <lkp@intel.com>
---
 sound/soc/fsl/fsl_easrc.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index c6b5eb2d2af7..f227308a50e2 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -964,7 +964,7 @@ static int fsl_easrc_release_slot(struct fsl_asrc *easrc, unsigned int ctx_id)
  *
  * Configure the register relate with context.
  */
-int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
+static int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
 {
 	struct fsl_easrc_ctx_priv *ctx_priv;
 	struct fsl_asrc_pair *ctx;
@@ -1125,9 +1125,9 @@ static int fsl_easrc_process_format(struct fsl_asrc_pair *ctx,
 	return 0;
 }
 
-int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
-			     snd_pcm_format_t *in_raw_format,
-			     snd_pcm_format_t *out_raw_format)
+static int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
+				    snd_pcm_format_t *in_raw_format,
+				    snd_pcm_format_t *out_raw_format)
 {
 	struct fsl_asrc *easrc = ctx->asrc;
 	struct fsl_easrc_ctx_priv *ctx_priv = ctx->private;
@@ -1198,7 +1198,7 @@ int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
  * to conform with this format. Interleaving parameters are accessed
  * through the ASRC_CTRL_IN_ACCESSa and ASRC_CTRL_OUT_ACCESSa registers
  */
-int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
+static int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
 {
 	struct fsl_easrc_ctx_priv *ctx_priv;
 	struct device *dev;
@@ -1242,7 +1242,7 @@ int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
  * Returns a negative number on error and >=0 as context id
  * on success
  */
-int fsl_easrc_request_context(int channels, struct fsl_asrc_pair *ctx)
+static int fsl_easrc_request_context(int channels, struct fsl_asrc_pair *ctx)
 {
 	enum asrc_pair_index index = ASRC_INVALID_PAIR;
 	struct fsl_asrc *easrc = ctx->asrc;
@@ -1287,7 +1287,7 @@ int fsl_easrc_request_context(int channels, struct fsl_asrc_pair *ctx)
  *
  * This funciton is mainly doing the revert thing in request context
  */
-void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
+static void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
 {
 	unsigned long lock_flags;
 	struct fsl_asrc *easrc;
@@ -1314,7 +1314,7 @@ void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
  *
  * Enable the DMA request and context
  */
-int fsl_easrc_start_context(struct fsl_asrc_pair *ctx)
+static int fsl_easrc_start_context(struct fsl_asrc_pair *ctx)
 {
 	struct fsl_asrc *easrc = ctx->asrc;
 
@@ -1332,7 +1332,7 @@ int fsl_easrc_start_context(struct fsl_asrc_pair *ctx)
  *
  * Disable the DMA request and context
  */
-int fsl_easrc_stop_context(struct fsl_asrc_pair *ctx)
+static int fsl_easrc_stop_context(struct fsl_asrc_pair *ctx)
 {
 	struct fsl_asrc *easrc = ctx->asrc;
 	int val, i;
@@ -1379,8 +1379,8 @@ int fsl_easrc_stop_context(struct fsl_asrc_pair *ctx)
 	return 0;
 }
 
-struct dma_chan *fsl_easrc_get_dma_channel(struct fsl_asrc_pair *ctx,
-					   bool dir)
+static struct dma_chan *fsl_easrc_get_dma_channel(struct fsl_asrc_pair *ctx,
+						  bool dir)
 {
 	struct fsl_asrc *easrc = ctx->asrc;
 	enum asrc_pair_index index = ctx->index;
@@ -1391,7 +1391,6 @@ struct dma_chan *fsl_easrc_get_dma_channel(struct fsl_asrc_pair *ctx,
 
 	return dma_request_slave_channel(&easrc->pdev->dev, name);
 };
-EXPORT_SYMBOL_GPL(fsl_easrc_get_dma_channel);
 
 static const unsigned int easrc_rates[] = {
 	8000, 11025, 12000, 16000,
-- 
2.21.0


^ permalink raw reply related

* [PATCH 0/3] ASoC: fsl_easrc: Fix several warnings
From: Shengjiu Wang @ 2020-06-03  3:39 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, alsa-devel,
	lgirdwood, perex, tiwai
  Cc: linuxppc-dev, linux-kernel

Fix several warnings with "make W=1"

Shengjiu Wang (3):
  ASoC: fsl_easrc: Fix -Wmissing-prototypes warning
  ASoC: fsl_easrc: Fix -Wunused-but-set-variable
  ASoC: fsl_easrc: Fix "Function parameter not described" warnings

 sound/soc/fsl/fsl_easrc.c | 42 +++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [PATCH 2/3] ASoC: fsl_easrc: Fix -Wunused-but-set-variable
From: Shengjiu Wang @ 2020-06-03  3:39 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, alsa-devel,
	lgirdwood, perex, tiwai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1591155860.git.shengjiu.wang@nxp.com>

Obtained with:
$ make W=1

sound/soc/fsl/fsl_easrc.c: In function 'fsl_easrc_set_rs_ratio':
sound/soc/fsl/fsl_easrc.c:182:15: warning: variable 'int_bits' set but not used [-Wunused-but-set-variable]
  unsigned int int_bits;
               ^
sound/soc/fsl/fsl_easrc.c: In function 'fsl_easrc_set_ctx_organziation':
sound/soc/fsl/fsl_easrc.c:1204:17: warning: variable 'dev' set but not used [-Wunused-but-set-variable]
  struct device *dev;
                 ^
sound/soc/fsl/fsl_easrc.c: In function 'fsl_easrc_release_context':
sound/soc/fsl/fsl_easrc.c:1294:17: warning: variable 'dev' set but not used [-Wunused-but-set-variable]
  struct device *dev;
                 ^
Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reported-by: kbuild test robot <lkp@intel.com>
---
 sound/soc/fsl/fsl_easrc.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index f227308a50e2..7d8bf9d47842 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -179,22 +179,21 @@ static int fsl_easrc_set_rs_ratio(struct fsl_asrc_pair *ctx)
 	struct fsl_easrc_ctx_priv *ctx_priv = ctx->private;
 	unsigned int in_rate = ctx_priv->in_params.norm_rate;
 	unsigned int out_rate = ctx_priv->out_params.norm_rate;
-	unsigned int int_bits;
 	unsigned int frac_bits;
 	u64 val;
 	u32 *r;
 
 	switch (easrc_priv->rs_num_taps) {
 	case EASRC_RS_32_TAPS:
-		int_bits = 5;
+		/* integer bits = 5; */
 		frac_bits = 39;
 		break;
 	case EASRC_RS_64_TAPS:
-		int_bits = 6;
+		/* integer bits = 6; */
 		frac_bits = 38;
 		break;
 	case EASRC_RS_128_TAPS:
-		int_bits = 7;
+		/* integer bits = 7; */
 		frac_bits = 37;
 		break;
 	default:
@@ -1201,7 +1200,6 @@ static int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
 static int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
 {
 	struct fsl_easrc_ctx_priv *ctx_priv;
-	struct device *dev;
 	struct fsl_asrc *easrc;
 
 	if (!ctx)
@@ -1209,7 +1207,6 @@ static int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
 
 	easrc = ctx->asrc;
 	ctx_priv = ctx->private;
-	dev = &easrc->pdev->dev;
 
 	/* input interleaving parameters */
 	regmap_update_bits(easrc->regmap, REG_EASRC_CIA(ctx->index),
@@ -1291,13 +1288,11 @@ static void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
 {
 	unsigned long lock_flags;
 	struct fsl_asrc *easrc;
-	struct device *dev;
 
 	if (!ctx)
 		return;
 
 	easrc = ctx->asrc;
-	dev = &easrc->pdev->dev;
 
 	spin_lock_irqsave(&easrc->lock, lock_flags);
 
-- 
2.21.0


^ permalink raw reply related

* [PATCH 3/3] ASoC: fsl_easrc: Fix "Function parameter not described" warnings
From: Shengjiu Wang @ 2020-06-03  3:39 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, alsa-devel,
	lgirdwood, perex, tiwai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1591155860.git.shengjiu.wang@nxp.com>

Obtained with:
$ make W=1

sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 'easrc' not described in 'fsl_easrc_normalize_filter'
sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 'infilter' not described in 'fsl_easrc_normalize_filter'
sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 'outfilter' not described in 'fsl_easrc_normalize_filter'
sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 'shift' not described in 'fsl_easrc_normalize_filter'

Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reported-by: kbuild test robot <lkp@intel.com>
---
 sound/soc/fsl/fsl_easrc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index 7d8bf9d47842..2f6b3d8bfcfc 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -389,11 +389,11 @@ static int fsl_easrc_resampler_config(struct fsl_asrc *easrc)
  *  For input int[16, 24, 32] -> output float32
  *      scale it by multiplying filter coefficients by 2^-15, 2^-23, 2^-31
  *  input:
- *      asrc:  Structure pointer of fsl_asrc
- *      infilter : Pointer to non-scaled input filter
- *      shift:  The multiply factor
+ *      @easrc:  Structure pointer of fsl_asrc
+ *      @infilter : Pointer to non-scaled input filter
+ *      @shift:  The multiply factor
  *  output:
- *      outfilter: scaled filter
+ *      @outfilter: scaled filter
  */
 static int fsl_easrc_normalize_filter(struct fsl_asrc *easrc,
 				      u64 *infilter,
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line
From: Alexey Kardashevskiy @ 2020-06-03  4:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Greg Kroah-Hartman, Joerg Roedel, linuxppc-dev, linux-kernel,
	iommu, Robin Murphy, Lu Baolu
In-Reply-To: <20200509081946.GA21834@lst.de>



On 09/05/2020 18:19, Christoph Hellwig wrote:
> On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 17/04/2020 17:58, Christoph Hellwig wrote:
>>> On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote:
>>>> And the fact they were exported leaves possibility that there is a
>>>> driver somewhere relying on these symbols or distro kernel won't build
>>>> because the symbol disappeared from exports (I do not know what KABI
>>>> guarantees or if mainline kernel cares).
>>>
>>> We absolutely do not care.  In fact for abuses of APIs that drivers
>>> should not use we almost care to make them private and break people
>>> abusing them.
>>
>> ok :)
>>
>>>> I do not care in particular but
>>>> some might, a line separated with empty lines in the commit log would do.
>>>
>>> I'll add a blurb for the next version.
>>
>>
>> Has it gone anywhere? Thanks,
> 
> I've been hoping for the sg_buf helpers to land first, as they need
> backporting and would conflict.  Do you urgently need the series?

Any progress with sg_buf helpers stuff? Thanks,



-- 
Alexey

^ permalink raw reply

* [powerpc:next] BUILD SUCCESS 4336b9337824a60a0b10013c622caeee99460db5
From: kbuild test robot @ 2020-06-03  4:34 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  next
branch HEAD: 4336b9337824a60a0b10013c622caeee99460db5  powerpc/pseries: Make vio and ibmebus initcalls pseries specific

elapsed time: 919m

configs tested: 118
configs skipped: 12

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm                              allyesconfig
arm                              allmodconfig
arm                               allnoconfig
arm64                            allyesconfig
arm64                               defconfig
arm64                            allmodconfig
arm64                             allnoconfig
arm                     davinci_all_defconfig
m68k                            q40_defconfig
sh                           se7343_defconfig
mips                        nlm_xlp_defconfig
arm                           corgi_defconfig
sparc                            allyesconfig
mips                      bmips_stb_defconfig
arm                        realview_defconfig
alpha                               defconfig
arm                          prima2_defconfig
s390                              allnoconfig
mips                              allnoconfig
mips                            gpr_defconfig
sh                     sh7710voipgw_defconfig
ia64                             allmodconfig
powerpc                  storcenter_defconfig
mips                  decstation_64_defconfig
sh                   rts7751r2dplus_defconfig
sh                     magicpanelr2_defconfig
parisc                            allnoconfig
mips                           ci20_defconfig
mips                           ip22_defconfig
powerpc                          alldefconfig
mips                        maltaup_defconfig
mips                        jmr3927_defconfig
s390                             alldefconfig
c6x                        evmc6472_defconfig
i386                              allnoconfig
i386                             allyesconfig
i386                                defconfig
i386                              debian-10.3
ia64                                defconfig
ia64                              allnoconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                              allnoconfig
m68k                           sun3_defconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nios2                            allyesconfig
openrisc                            defconfig
c6x                              allyesconfig
c6x                               allnoconfig
openrisc                         allyesconfig
nds32                               defconfig
nds32                             allnoconfig
csky                             allyesconfig
csky                                defconfig
alpha                            allyesconfig
xtensa                              defconfig
xtensa                           allyesconfig
h8300                            allyesconfig
h8300                            allmodconfig
arc                                 defconfig
arc                              allyesconfig
sh                               allmodconfig
sh                                allnoconfig
microblaze                        allnoconfig
mips                             allyesconfig
mips                             allmodconfig
parisc                              defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          rhel-kconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
i386                 randconfig-a001-20200602
i386                 randconfig-a006-20200602
i386                 randconfig-a002-20200602
i386                 randconfig-a005-20200602
i386                 randconfig-a003-20200602
i386                 randconfig-a004-20200602
x86_64               randconfig-a011-20200602
x86_64               randconfig-a016-20200602
x86_64               randconfig-a013-20200602
x86_64               randconfig-a012-20200602
x86_64               randconfig-a014-20200602
x86_64               randconfig-a015-20200602
i386                 randconfig-a014-20200602
i386                 randconfig-a015-20200602
i386                 randconfig-a011-20200602
i386                 randconfig-a016-20200602
i386                 randconfig-a012-20200602
i386                 randconfig-a013-20200602
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
s390                             allyesconfig
s390                             allmodconfig
s390                                defconfig
sparc                               defconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
um                                  defconfig
um                                allnoconfig
um                               allyesconfig
um                               allmodconfig
x86_64                                   rhel
x86_64                               rhel-7.6
x86_64                    rhel-7.6-kselftests
x86_64                         rhel-7.2-clear
x86_64                                    lkp
x86_64                              fedora-25
x86_64                                  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [powerpc:merge] BUILD SUCCESS 4d8244d005b2a92c872a0a993c3aa94b5842e56b
From: kbuild test robot @ 2020-06-03  4:33 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  merge
branch HEAD: 4d8244d005b2a92c872a0a993c3aa94b5842e56b  Automatic merge of 'master', 'next' and 'fixes' (2020-06-02 11:51)

elapsed time: 922m

configs tested: 119
configs skipped: 10

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm                              allyesconfig
arm                              allmodconfig
arm                               allnoconfig
arm64                            allyesconfig
arm64                            allmodconfig
arm64                             allnoconfig
arm64                               defconfig
arm                       imx_v4_v5_defconfig
h8300                     edosk2674_defconfig
parisc                generic-64bit_defconfig
m68k                         amcore_defconfig
arm                     davinci_all_defconfig
m68k                            q40_defconfig
sh                           se7343_defconfig
mips                        nlm_xlp_defconfig
arm                           corgi_defconfig
mips                      bmips_stb_defconfig
arm                        realview_defconfig
sparc                            allyesconfig
alpha                               defconfig
arm                          prima2_defconfig
s390                              allnoconfig
mips                              allnoconfig
mips                            gpr_defconfig
sh                     sh7710voipgw_defconfig
ia64                             allmodconfig
powerpc                  storcenter_defconfig
mips                  decstation_64_defconfig
sh                   rts7751r2dplus_defconfig
sh                     magicpanelr2_defconfig
parisc                            allnoconfig
mips                           ci20_defconfig
mips                           ip22_defconfig
s390                             alldefconfig
c6x                        evmc6472_defconfig
i386                              allnoconfig
i386                             allyesconfig
i386                                defconfig
i386                              debian-10.3
ia64                                defconfig
ia64                              allnoconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                              allnoconfig
m68k                           sun3_defconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nios2                            allyesconfig
openrisc                            defconfig
c6x                              allyesconfig
c6x                               allnoconfig
openrisc                         allyesconfig
nds32                               defconfig
nds32                             allnoconfig
csky                             allyesconfig
csky                                defconfig
alpha                            allyesconfig
xtensa                              defconfig
xtensa                           allyesconfig
h8300                            allyesconfig
h8300                            allmodconfig
arc                                 defconfig
arc                              allyesconfig
sh                               allmodconfig
sh                                allnoconfig
microblaze                        allnoconfig
mips                             allyesconfig
mips                             allmodconfig
parisc                              defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          rhel-kconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
i386                 randconfig-a001-20200602
i386                 randconfig-a006-20200602
i386                 randconfig-a002-20200602
i386                 randconfig-a005-20200602
i386                 randconfig-a003-20200602
i386                 randconfig-a004-20200602
x86_64               randconfig-a011-20200602
x86_64               randconfig-a016-20200602
x86_64               randconfig-a013-20200602
x86_64               randconfig-a012-20200602
x86_64               randconfig-a014-20200602
x86_64               randconfig-a015-20200602
i386                 randconfig-a014-20200602
i386                 randconfig-a015-20200602
i386                 randconfig-a011-20200602
i386                 randconfig-a016-20200602
i386                 randconfig-a012-20200602
i386                 randconfig-a013-20200602
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
s390                             allyesconfig
s390                             allmodconfig
s390                                defconfig
sparc                               defconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
um                                  defconfig
um                                allnoconfig
um                               allyesconfig
um                               allmodconfig
x86_64                                   rhel
x86_64                               rhel-7.6
x86_64                    rhel-7.6-kselftests
x86_64                         rhel-7.2-clear
x86_64                                    lkp
x86_64                              fedora-25
x86_64                                  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()
From: Markus Elfring @ 2020-06-03  5:07 UTC (permalink / raw)
  To: Michael Ellerman, Wang Hai, linuxppc-dev
  Cc: Andrew Donnellan, Arnd Bergmann, Greg Kroah-Hartman,
	kernel-janitors, linux-kernel, Ian Munsie, Frederic Barrat
In-Reply-To: <877dwoj1hz.fsf@mpe.ellerman.id.au>

>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465
>> I just used the original author's label, should I replace all his labels
>> like'err','err1' with reasonable one.
>
> No.

Do you insist to deviate from the current Linux coding style?

Regards,
Markus

^ permalink raw reply

* [PATCH 1/5] powerpc/mm: Introduce temporary mm
From: Christopher M. Riedl @ 2020-06-03  5:19 UTC (permalink / raw)
  To: linuxppc-dev, kernel-hardening
In-Reply-To: <20200603051912.23296-1-cmr@informatik.wtf>

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.

Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
---
 arch/powerpc/include/asm/debug.h       |  1 +
 arch/powerpc/include/asm/mmu_context.h | 64 ++++++++++++++++++++++++++
 arch/powerpc/kernel/process.c          |  5 ++
 3 files changed, 70 insertions(+)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index ec57daf87f40..827350c9bcf3 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
 #endif
 
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
 bool ppc_breakpoint_available(void);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 extern void do_send_trap(struct pt_regs *regs, unsigned long address,
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 1a474f6b1992..9269c7c7b04e 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
 #include <asm/mmu.h>	
 #include <asm/cputable.h>
 #include <asm/cputhreads.h>
+#include <asm/debug.h>
 
 /*
  * Most if the context management is out of line
@@ -300,5 +301,68 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
 	return 0;
 }
 
+struct temp_mm {
+	struct mm_struct *temp;
+	struct mm_struct *prev;
+	bool is_kernel_thread;
+	struct arch_hw_breakpoint brk[HBP_NUM_MAX];
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+	temp_mm->temp = mm;
+	temp_mm->prev = NULL;
+	temp_mm->is_kernel_thread = false;
+	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	temp_mm->is_kernel_thread = current->mm == NULL;
+	if (temp_mm->is_kernel_thread)
+		temp_mm->prev = current->active_mm;
+	else
+		temp_mm->prev = current->mm;
+
+	/*
+	 * Hash requires a non-NULL current->mm to allocate a userspace address
+	 * when handling a page fault. Does not appear to hurt in Radix either.
+	 */
+	current->mm = temp_mm->temp;
+	switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+	if (ppc_breakpoint_available()) {
+		struct arch_hw_breakpoint null_brk = {0};
+		int i = 0;
+
+		for (; i < nr_wp_slots(); ++i) {
+			__get_breakpoint(i, &temp_mm->brk[i]);
+			if (temp_mm->brk[i].type != 0)
+				__set_breakpoint(i, &null_brk);
+		}
+	}
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (temp_mm->is_kernel_thread)
+		current->mm = NULL;
+	else
+		current->mm = temp_mm->prev;
+	switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+	if (ppc_breakpoint_available()) {
+		int i = 0;
+
+		for (; i < nr_wp_slots(); ++i)
+			if (temp_mm->brk[i].type != 0)
+				__set_breakpoint(i, &temp_mm->brk[i]);
+	}
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 048d64c4e115..3973144f6980 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -825,6 +825,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
 	return 0;
 }
 
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
+{
+	memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
+}
+
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
 {
 	memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
-- 
2.26.2


^ permalink raw reply related

* [PATCH 4/5] powerpc/lib: Add LKDTM accessor for patching addr
From: Christopher M. Riedl @ 2020-06-03  5:19 UTC (permalink / raw)
  To: linuxppc-dev, kernel-hardening
In-Reply-To: <20200603051912.23296-1-cmr@informatik.wtf>

When live patching a STRICT_RWX kernel, a mapping is installed at a
"patching address" with temporary write permissions. Provide a
LKDTM-only accessor function for this address in preparation for a LKDTM
test which attempts to "hijack" this mapping by writing to it from
another CPU.

Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
---
 arch/powerpc/lib/code-patching.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index df0765845204..c23453049116 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -52,6 +52,13 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 static struct mm_struct *patching_mm __ro_after_init;
 static unsigned long patching_addr __ro_after_init;
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+	return patching_addr;
+}
+#endif
+
 void __init poking_init(void)
 {
 	spinlock_t *ptl; /* for protecting pte table */
-- 
2.26.2


^ permalink raw reply related

* [PATCH 3/5] powerpc/lib: Use a temporary mm for code patching
From: Christopher M. Riedl @ 2020-06-03  5:19 UTC (permalink / raw)
  To: linuxppc-dev, kernel-hardening
In-Reply-To: <20200603051912.23296-1-cmr@informatik.wtf>

Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
mappings to other CPUs. These mappings should be kept local to the CPU
doing the patching. Use the pre-initialized temporary mm and patching
address for this purpose. Also add a check after patching to ensure the
patch succeeded.

Use the KUAP functions on non-BOOKS3_64 platforms since the temporary
mapping for patching uses a userspace address (to keep the mapping
local). On BOOKS3_64 platforms hash does not implement KUAP and on radix
the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
(KUAP) protection is ignored (see PowerISA v3.0b, Fig, 35).

Based on x86 implementation:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
---
 arch/powerpc/lib/code-patching.c | 148 ++++++++++++-------------------
 1 file changed, 55 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 599114f63b44..df0765845204 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -20,6 +20,7 @@
 #include <asm/code-patching.h>
 #include <asm/setup.h>
 #include <asm/inst.h>
+#include <asm/mmu_context.h>
 
 static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
 			       struct ppc_inst *patch_addr)
@@ -78,101 +79,58 @@ void __init poking_init(void)
 	pte_unmap_unlock(ptep, ptl);
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
-
-static int text_area_cpu_up(unsigned int cpu)
-{
-	struct vm_struct *area;
-
-	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-	if (!area) {
-		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-			cpu);
-		return -1;
-	}
-	this_cpu_write(text_poke_area, area);
-
-	return 0;
-}
-
-static int text_area_cpu_down(unsigned int cpu)
-{
-	free_vm_area(this_cpu_read(text_poke_area));
-	return 0;
-}
-
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-		"powerpc/text_poke:online", text_area_cpu_up,
-		text_area_cpu_down));
-
-	return 0;
-}
-late_initcall(setup_text_poke_area);
+struct patch_mapping {
+	spinlock_t *ptl; /* for protecting pte table */
+	pte_t *ptep;
+	struct temp_mm temp_mm;
+};
 
 /*
  * This can be called for kernel text or a module.
  */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 {
-	unsigned long pfn;
-	int err;
+	struct page *page;
+	pte_t pte;
+	pgprot_t pgprot;
 
 	if (is_vmalloc_addr(addr))
-		pfn = vmalloc_to_pfn(addr);
+		page = vmalloc_to_page(addr);
 	else
-		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+		page = virt_to_page(addr);
 
-	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
+	if (radix_enabled())
+		pgprot = PAGE_KERNEL;
+	else
+		pgprot = PAGE_SHARED;
 
-	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
-	if (err)
+	patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
+					     &patch_mapping->ptl);
+	if (unlikely(!patch_mapping->ptep)) {
+		pr_warn("map patch: failed to allocate pte for patching\n");
 		return -1;
+	}
+
+	pte = mk_pte(page, pgprot);
+	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+		pte = pte_mkdirty(pte);
+	set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
+
+	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+	use_temporary_mm(&patch_mapping->temp_mm);
 
 	return 0;
 }
 
-static inline int unmap_patch_area(unsigned long addr)
+static void unmap_patch(struct patch_mapping *patch_mapping)
 {
-	pte_t *ptep;
-	pmd_t *pmdp;
-	pud_t *pudp;
-	pgd_t *pgdp;
-
-	pgdp = pgd_offset_k(addr);
-	if (unlikely(!pgdp))
-		return -EINVAL;
-
-	pudp = pud_offset(pgdp, addr);
-	if (unlikely(!pudp))
-		return -EINVAL;
-
-	pmdp = pmd_offset(pudp, addr);
-	if (unlikely(!pmdp))
-		return -EINVAL;
-
-	ptep = pte_offset_kernel(pmdp, addr);
-	if (unlikely(!ptep))
-		return -EINVAL;
+	/* In hash, pte_clear flushes the tlb */
+	pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
+	unuse_temporary_mm(&patch_mapping->temp_mm);
 
-	pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
-
-	/*
-	 * In hash, pte_clear flushes the tlb, in radix, we have to
-	 */
-	pte_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-
-	return 0;
+	/* In radix, we have to explicitly flush the tlb (no-op in hash) */
+	local_flush_tlb_mm(patching_mm);
+	pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
 }
 
 static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
@@ -180,32 +138,36 @@ static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 	int err;
 	struct ppc_inst *patch_addr = NULL;
 	unsigned long flags;
-	unsigned long text_poke_addr;
-	unsigned long kaddr = (unsigned long)addr;
+	struct patch_mapping patch_mapping;
 
 	/*
-	 * During early early boot patch_instruction is called
-	 * when text_poke_area is not ready, but we still need
-	 * to allow patching. We just do the plain old patching
+	 * The patching_mm is initialized before calling mark_rodata_ro. Prior
+	 * to this, patch_instruction is called when we don't have (and don't
+	 * need) the patching_mm so just do plain old patching.
 	 */
-	if (!this_cpu_read(text_poke_area))
+	if (!patching_mm)
 		return raw_patch_instruction(addr, instr);
 
 	local_irq_save(flags);
 
-	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
-	if (map_patch_area(addr, text_poke_addr)) {
-		err = -1;
+	err = map_patch(addr, &patch_mapping);
+	if (err)
 		goto out;
-	}
 
-	patch_addr = (struct ppc_inst *)(text_poke_addr + (kaddr & ~PAGE_MASK));
+	patch_addr = (struct ppc_inst *)(patching_addr | offset_in_page(addr));
 
-	__patch_instruction(addr, instr, patch_addr);
+	if (!radix_enabled())
+		allow_write_to_user(patch_addr, sizeof(instr));
+	err = __patch_instruction(addr, instr, patch_addr);
+	if (!radix_enabled())
+		prevent_write_to_user(patch_addr, sizeof(instr));
 
-	err = unmap_patch_area(text_poke_addr);
-	if (err)
-		pr_warn("failed to unmap %lx\n", text_poke_addr);
+	unmap_patch(&patch_mapping);
+	/*
+	 * Something is wrong if what we just wrote doesn't match what we
+	 * think we just wrote.
+	 */
+	WARN_ON(!ppc_inst_equal(ppc_inst_read(addr), instr));
 
 out:
 	local_irq_restore(flags);
-- 
2.26.2


^ permalink raw reply related

* [PATCH 2/5] powerpc/lib: Initialize a temporary mm for code patching
From: Christopher M. Riedl @ 2020-06-03  5:19 UTC (permalink / raw)
  To: linuxppc-dev, kernel-hardening
In-Reply-To: <20200603051912.23296-1-cmr@informatik.wtf>

When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
---
 arch/powerpc/lib/code-patching.c | 33 ++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 5ecf0d635a8d..599114f63b44 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include <linux/cpuhotplug.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/sched/task.h>
+#include <linux/random.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -45,6 +47,37 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+	spinlock_t *ptl; /* for protecting pte table */
+	pte_t *ptep;
+
+	/*
+	 * Some parts of the kernel (static keys for example) depend on
+	 * successful code patching. Code patching under STRICT_KERNEL_RWX
+	 * requires this setup - otherwise we cannot patch at all. We use
+	 * BUG_ON() here and later since an early failure is preferred to
+	 * buggy behavior and/or strange crashes later.
+	 */
+	patching_mm = copy_init_mm();
+	BUG_ON(!patching_mm);
+
+	/*
+	 * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
+	 * XXX: Do we want additional bits of entropy for radix?
+	 */
+	patching_addr = (get_random_long() & PAGE_MASK) %
+		(DEFAULT_MAP_WINDOW - PAGE_SIZE);
+
+	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+	BUG_ON(!ptep);
+	pte_unmap_unlock(ptep, ptl);
+}
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
 static int text_area_cpu_up(unsigned int cpu)
-- 
2.26.2


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox