LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API
From: Andy Shevchenko @ 2021-06-16 14:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-kernel, Oliver O'Halloran, Paul Mackerras, linuxppc-dev
In-Reply-To: <6f895afd-3469-c330-a4da-72db89dba6b3@linux.ibm.com>

On Wed, Jun 16, 2021 at 07:17:03PM +0530, Aneesh Kumar K.V wrote:
> On 6/16/21 7:13 PM, Andy Shevchenko wrote:
> > Parse to and export from UUID own type, before dereferencing.
> > This also fixes wrong comment (Little Endian UUID is something else)
> > and should eliminate the direct strict types assignments.
> > 
> > Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
> > Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing uuid from the device tree")
> 
> Do we need the Fixes: there? It didn't change any functionality right? The
> format with which we stored cookie1 remains the same with older and newer
> code. The newer one is better?

Depends if you are okay with Sparse warnings.

> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Thanks for review!

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH 00/21] Rid W=1 warnings from IDE
From: Christoph Hellwig @ 2021-06-16 13:52 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-ide, Mike Waychison, Paul Mackerras, Christopher J. Reimer,
	Tim Hockin, Erik Andersen, Alan Cox, CJ, Sergei Shtylyov,
	Christoph Hellwig, Duncan Laurie, Scott Snyder, Gadi Oxman,
	Andre Hedrick, Christian Brunner, Jens Axboe, or,
	Benoit Poulot-Cazajous, Robert Bringman, linux-kernel,
	Clear Zhang, Mark Lord, Adrian Sun, Frank Tiernan, linuxppc-dev,
	David S. Miller
In-Reply-To: <20210614091228.GB5285@dell>

On Mon, Jun 14, 2021 at 10:12:28AM +0100, Lee Jones wrote:
> On Mon, 07 Jun 2021, Christoph Hellwig wrote:
> 
> > Please don't touch this code as it is about to be removed entirely.
> 
> Do you have an ETA for this work?

I just resent the series.

^ permalink raw reply

* Re: [PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks
From: Jessica Yu @ 2021-06-16 13:49 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Michal Suchánek, linuxppc-dev, linux-kernel, Nicholas Piggin
In-Reply-To: <87v96esffr.fsf@mpe.ellerman.id.au>

+++ Michael Ellerman [16/06/21 12:37 +1000]:
>Jessica Yu <jeyu@kernel.org> writes:
>> +++ Nicholas Piggin [15/06/21 12:05 +1000]:
>>>Excerpts from Jessica Yu's message of June 14, 2021 10:06 pm:
>>>> +++ Nicholas Piggin [11/06/21 19:39 +1000]:
>>>>>The elf_check_arch() function is used to test usermode binaries, but
>>>>>kernel modules may have more specific requirements. powerpc would like
>>>>>to test for ABI version compatibility.
>>>>>
>>>>>Add an arch-overridable function elf_check_module_arch() that defaults
>>>>>to elf_check_arch() and use it in elf_validity_check().
>>>>>
>>>>>Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>>>>>[np: split patch, added changelog]
>>>>>Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>---
>>>>> include/linux/moduleloader.h | 5 +++++
>>>>> kernel/module.c              | 2 +-
>>>>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>>diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
>>>>>index 9e09d11ffe5b..fdc042a84562 100644
>>>>>--- a/include/linux/moduleloader.h
>>>>>+++ b/include/linux/moduleloader.h
>>>>>@@ -13,6 +13,11 @@
>>>>>  * must be implemented by each architecture.
>>>>>  */
>>>>>
>>>>>+// Allow arch to optionally do additional checking of module ELF header
>>>>>+#ifndef elf_check_module_arch
>>>>>+#define elf_check_module_arch elf_check_arch
>>>>>+#endif
>>>>
>>>> Hi Nicholas,
>>>>
>>>> Why not make elf_check_module_arch() consistent with the other
>>>> arch-specific functions? Please see module_frob_arch_sections(),
>>>> module_{init,exit}_section(), etc in moduleloader.h. That is, they are
>>>> all __weak functions that are overridable by arches. We can maybe make
>>>> elf_check_module_arch() a weak symbol, available for arches to
>>>> override if they want to perform additional elf checks. Then we don't
>>>> have to have this one-off #define.
>
>>>Like this? I like it. Good idea.
>>
>> Yeah! Also, maybe we can alternatively make elf_check_module_arch() a
>> separate check entirely so that the powerpc implementation doesn't
>> have to include that extra elf_check_arch() call. Something like this maybe?
>
>My thinking for making elf_check_module_arch() the only hook was that
>conceivably you might not want/need to call elf_check_arch() from
>elf_check_module_arch().
>
>So having a single module specific hook allows arch code to decide
>how to implement the check, which may or may not involve calling
>elf_check_arch(), but that becomes an arch implementation detail.

Thanks for the feedback! Yeah, that's fair too. Well, I ended up doing
it this way mostly to create less churn/change of behavior, since in
its current state elf_check_arch() is already being called for each
arch. Additionally I wanted to save the powerpc implementation of
elf_check_module_arch() an extra elf_check_arch() call. In any case I
have a slight preference for having a second hook to allow arches add
any extra checks in addition to elf_check_arch(). Thanks!

^ permalink raw reply

* Re: [PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API
From: Aneesh Kumar K.V @ 2021-06-16 13:47 UTC (permalink / raw)
  To: Andy Shevchenko, Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: Oliver O'Halloran, Paul Mackerras
In-Reply-To: <20210616134303.58185-1-andriy.shevchenko@linux.intel.com>

On 6/16/21 7:13 PM, Andy Shevchenko wrote:
> Parse to and export from UUID own type, before dereferencing.
> This also fixes wrong comment (Little Endian UUID is something else)
> and should eliminate the direct strict types assignments.
> 
> Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
> Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing uuid from the device tree")


Do we need the Fixes: there? It didn't change any functionality right? 
The format with which we stored cookie1 remains the same with older and 
newer code. The newer one is better?

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

> Cc: Oliver O'Halloran <oohall@gmail.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
> v2: added missed header (Vaibhav), updated comment (Aneesh),
>      rewrite part of the commit message to avoid mentioning the Sparse
>   arch/powerpc/platforms/pseries/papr_scm.c | 27 +++++++++++++++--------
>   1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index e2b69cc3beaf..b43be41e8ff7 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -18,6 +18,7 @@
>   #include <asm/plpar_wrappers.h>
>   #include <asm/papr_pdsm.h>
>   #include <asm/mce.h>
> +#include <asm/unaligned.h>
>   
>   #define BIND_ANY_ADDR (~0ul)
>   
> @@ -1101,8 +1102,9 @@ static int papr_scm_probe(struct platform_device *pdev)
>   	u32 drc_index, metadata_size;
>   	u64 blocks, block_size;
>   	struct papr_scm_priv *p;
> +	u8 uuid_raw[UUID_SIZE];
>   	const char *uuid_str;
> -	u64 uuid[2];
> +	uuid_t uuid;
>   	int rc;
>   
>   	/* check we have all the required DT properties */
> @@ -1145,16 +1147,23 @@ static int papr_scm_probe(struct platform_device *pdev)
>   	p->hcall_flush_required = of_property_read_bool(dn, "ibm,hcall-flush-required");
>   
>   	/* We just need to ensure that set cookies are unique across */
> -	uuid_parse(uuid_str, (uuid_t *) uuid);
> +	uuid_parse(uuid_str, &uuid);
> +
>   	/*
> -	 * cookie1 and cookie2 are not really little endian
> -	 * we store a little endian representation of the
> -	 * uuid str so that we can compare this with the label
> -	 * area cookie irrespective of the endian config with which
> -	 * the kernel is built.
> +	 * The cookie1 and cookie2 are not really little endian.
> +	 * We store a raw buffer representation of the
> +	 * uuid string so that we can compare this with the label
> +	 * area cookie irrespective of the endian configuration
> +	 * with which the kernel is built.
> +	 *
> +	 * Historically we stored the cookie in the below format.
> +	 * for a uuid string 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
> +	 *	cookie1 was 0xfd423b0b671b5172
> +	 *	cookie2 was 0xaabce8cae35b1d8d
>   	 */
> -	p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
> -	p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
> +	export_uuid(uuid_raw, &uuid);
> +	p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
> +	p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
>   
>   	/* might be zero */
>   	p->metadata_size = metadata_size;
> 


^ permalink raw reply

* [PATCH 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Pratik R. Sampat @ 2021-06-16 13:42 UTC (permalink / raw)
  To: mpe, benh, paulus, linuxppc-dev, kvm-ppc, linux-kernel, psampat,
	pratik.r.sampat
In-Reply-To: <20210616134240.62195-1-psampat@linux.ibm.com>

Adds a generic interface to represent the energy and frequency related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".

H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg, however the H_GET_EM_PARMS H_CALL
will be deprecated P10 onwards.

The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
hcall(
  uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
  uint64 flags,           // Per the flag request
  uint64 firstAttributeId,// The attribute id
  uint64 bufferAddress,   // Guest physical address of the output buffer
  uint64 bufferSize       // The size in bytes of the output buffer
);

This H_CALL can query either all the attributes at once with
firstAttributeId = 0, flags = 0 as well as query only one attribute
at a time with firstAttributeId = id

The output buffer consists of the following
1. number of attributes              - 8 bytes
2. array offset to the data location - 8 bytes
3. version info                      - 1 byte
4. A data array of size num attributes, which contains the following:
  a. attribute ID              - 8 bytes
  b. attribute value in number - 8 bytes
  c. attribute name in string  - 64 bytes
  d. attribute value in string - 64 bytes

The new H_CALL exports information in direct string value format, hence
a new interface has been introduced in
/sys/firmware/papr/energy_scale_info to export this information to
userspace in an extensible pass-through format.

The H_CALL returns the name, numeric value and string value (if exists)

The format of exposing the sysfs information is as follows:
/sys/firmware/papr/energy_scale_info/
   |-- <id>/
     |-- desc
     |-- value
     |-- value_desc (if exists)
   |-- <id>/
     |-- desc
     |-- value
     |-- value_desc (if exists)
...

The energy information that is exported is useful for userspace tools
such as powerpc-utils. Currently these tools infer the
"power_mode_data" value in the lparcfg, which in turn is obtained from
the to be deprecated H_GET_EM_PARMS H_CALL.
On future platforms, such userspace utilities will have to look at the
data returned from the new H_CALL being populated in this new sysfs
interface and report this information directly without the need of
interpretation.

Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
---
 .../sysfs-firmware-papr-energy-scale-info     |  26 ++
 arch/powerpc/include/asm/hvcall.h             |  21 +-
 arch/powerpc/kvm/trace_hv.h                   |   1 +
 arch/powerpc/platforms/pseries/Makefile       |   3 +-
 .../pseries/papr_platform_attributes.c        | 292 ++++++++++++++++++
 5 files changed, 341 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
new file mode 100644
index 000000000000..499bc1ae173a
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
@@ -0,0 +1,26 @@
+What:		/sys/firmware/papr/energy_scale_info
+Date:		June 2021
+Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	Director hosting a set of platform attributes on Linux
+		running as a PAPR guest.
+
+		Each file in a directory contains a platform
+		attribute hierarchy pertaining to performance/
+		energy-savings mode and processor frequency.
+
+What:		/sys/firmware/papr/energy_scale_info/<id>
+		/sys/firmware/papr/energy_scale_info/<id>/desc
+		/sys/firmware/papr/energy_scale_info/<id>/value
+		/sys/firmware/papr/energy_scale_info/<id>/value_desc
+Date:		June 2021
+Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	PAPR attributes directory for POWERVM servers
+
+		This directory provides PAPR information. It
+		contains below sysfs attributes:
+
+		- desc: File contains the name of attribute <id>
+
+		- value: Numeric value of attribute <id>
+
+		- value_desc: String value of attribute <id>
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e3b29eda8074..19a2a8c77a49 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -316,7 +316,8 @@
 #define H_SCM_PERFORMANCE_STATS 0x418
 #define H_RPT_INVALIDATE	0x448
 #define H_SCM_FLUSH		0x44C
-#define MAX_HCALL_OPCODE	H_SCM_FLUSH
+#define H_GET_ENERGY_SCALE_INFO	0x450
+#define MAX_HCALL_OPCODE	H_GET_ENERGY_SCALE_INFO
 
 /* Scope args for H_SCM_UNBIND_ALL */
 #define H_UNBIND_SCOPE_ALL (0x1)
@@ -631,6 +632,24 @@ struct hv_gpci_request_buffer {
 	uint8_t bytes[HGPCI_MAX_DATA_BYTES];
 } __packed;
 
+#define MAX_EM_ATTRS	10
+#define MAX_EM_DATA_BYTES \
+	(sizeof(struct energy_scale_attributes) * MAX_EM_ATTRS)
+struct energy_scale_attributes {
+	__be64 attr_id;
+	__be64 attr_value;
+	unsigned char attr_desc[64];
+	unsigned char attr_value_desc[64];
+} __packed;
+
+struct hv_energy_scale_buffer {
+	__be64 num_attr;
+	__be64 array_offset;
+	__u8 data_header_version;
+	unsigned char data[MAX_EM_DATA_BYTES];
+} __packed;
+
+
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 830a126e095d..38cd0ed0a617 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -115,6 +115,7 @@
 	{H_VASI_STATE,			"H_VASI_STATE"}, \
 	{H_ENABLE_CRQ,			"H_ENABLE_CRQ"}, \
 	{H_GET_EM_PARMS,		"H_GET_EM_PARMS"}, \
+	{H_GET_ENERGY_SCALE_INFO,	"H_GET_ENERGY_SCALE_INFO"}, \
 	{H_SET_MPP,			"H_SET_MPP"}, \
 	{H_GET_MPP,			"H_GET_MPP"}, \
 	{H_HOME_NODE_ASSOCIATIVITY,	"H_HOME_NODE_ASSOCIATIVITY"}, \
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index c8a2b0b05ac0..d14fca89ac25 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,8 @@ obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
 			   of_helpers.o \
 			   setup.o iommu.o event_sources.o ras.o \
 			   firmware.o power.o dlpar.o mobility.o rng.o \
-			   pci.o pci_dlpar.o eeh_pseries.o msi.o
+			   pci.o pci_dlpar.o eeh_pseries.o msi.o \
+			   papr_platform_attributes.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_SCANLOG)	+= scanlog.o
 obj-$(CONFIG_KEXEC_CORE)	+= kexec.o
diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
new file mode 100644
index 000000000000..498c74a5e9ab
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
@@ -0,0 +1,292 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * PAPR platform energy attributes driver
+ *
+ * This driver creates a sys file at /sys/firmware/papr/ which contains
+ * files keyword - value pairs that specify energy configuration of the system.
+ *
+ * Copyright 2021 IBM Corp.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/hugetlb.h>
+#include <asm/lppaca.h>
+#include <asm/hvcall.h>
+#include <asm/firmware.h>
+#include <asm/time.h>
+#include <asm/prom.h>
+#include <asm/vdso_datapage.h>
+#include <asm/vio.h>
+#include <asm/mmu.h>
+#include <asm/machdep.h>
+#include <asm/drmem.h>
+
+#include "pseries.h"
+
+#define MAX_ATTRS	3
+#define MAX_NAME_LEN	16
+
+struct papr_attr {
+	u64 id;
+	struct kobj_attribute attr;
+};
+struct papr_group {
+	char name[MAX_NAME_LEN];
+	struct attribute_group pg;
+	struct papr_attr *pgattrs;
+} *pgs;
+
+struct kobject *papr_kobj;
+struct kobject *escale_kobj;
+struct hv_energy_scale_buffer *em_buf;
+struct energy_scale_attributes *ea;
+
+static ssize_t papr_show_desc(struct kobject *kobj,
+			       struct kobj_attribute *attr,
+			       char *buf)
+{
+	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
+	int idx, ret = 0;
+
+	/*
+	 * We do not expect the name to change, hence use the old value
+	 * and save a HCALL
+	 */
+	for (idx = 0; idx < be64_to_cpu(em_buf->num_attr); idx++) {
+		if (pattr->id == be64_to_cpu(ea[idx].attr_id)) {
+			ret = sprintf(buf, "%s\n", ea[idx].attr_desc);
+			if (ret < 0)
+				ret = -EIO;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static ssize_t papr_show_value(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				char *buf)
+{
+	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
+	struct hv_energy_scale_buffer *t_buf;
+	struct energy_scale_attributes *t_ea;
+	int data_offset, ret = 0;
+
+	t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
+	if (t_buf == NULL)
+		return -ENOMEM;
+
+	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
+				 pattr->id, virt_to_phys(t_buf),
+				 sizeof(*t_buf));
+
+	if (ret != H_SUCCESS) {
+		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
+		goto out;
+	}
+
+	data_offset = be64_to_cpu(t_buf->array_offset) -
+			(sizeof(t_buf->num_attr) +
+			sizeof(t_buf->array_offset) +
+			sizeof(t_buf->data_header_version));
+
+	t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
+
+	ret = sprintf(buf, "%llu\n", be64_to_cpu(t_ea->attr_value));
+	if (ret < 0)
+		ret = -EIO;
+out:
+	kfree(t_buf);
+
+	return ret;
+}
+
+static ssize_t papr_show_value_desc(struct kobject *kobj,
+				     struct kobj_attribute *attr,
+				     char *buf)
+{
+	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
+	struct hv_energy_scale_buffer *t_buf;
+	struct energy_scale_attributes *t_ea;
+	int data_offset, ret = 0;
+
+	t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
+	if (t_buf == NULL)
+		return -ENOMEM;
+
+	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
+				 pattr->id, virt_to_phys(t_buf),
+				 sizeof(*t_buf));
+
+	if (ret != H_SUCCESS) {
+		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
+		goto out;
+	}
+
+	data_offset = be64_to_cpu(t_buf->array_offset) -
+			(sizeof(t_buf->num_attr) +
+			sizeof(t_buf->array_offset) +
+			sizeof(t_buf->data_header_version));
+
+	t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
+
+	ret = sprintf(buf, "%s\n", t_ea->attr_value_desc);
+	if (ret < 0)
+		ret = -EIO;
+out:
+	kfree(t_buf);
+
+	return ret;
+}
+
+static struct papr_ops_info {
+	const char *attr_name;
+	ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
+			char *buf);
+} ops_info[] = {
+	{ "desc", papr_show_desc },
+	{ "value", papr_show_value },
+	{ "value_desc", papr_show_value_desc },
+};
+
+static void add_attr(u64 id, int index, struct papr_attr *attr)
+{
+	attr->id = id;
+	sysfs_attr_init(&attr->attr.attr);
+	attr->attr.attr.name = ops_info[index].attr_name;
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = ops_info[index].show;
+}
+
+static int add_attr_group(u64 id, int len, struct papr_group *pg,
+			  bool show_val_desc)
+{
+	int i;
+
+	for (i = 0; i < len; i++) {
+		if (!strcmp(ops_info[i].attr_name, "value_desc") &&
+		    !show_val_desc) {
+			continue;
+		}
+		add_attr(id, i, &pg->pgattrs[i]);
+		pg->pg.attrs[i] = &pg->pgattrs[i].attr.attr;
+	}
+
+	return sysfs_create_group(escale_kobj, &pg->pg);
+}
+
+
+static int __init papr_init(void)
+{
+	uint64_t num_attr;
+	int ret, idx, i, data_offset;
+
+	em_buf = kmalloc(sizeof(*em_buf), GFP_KERNEL);
+	if (em_buf == NULL)
+		return -ENOMEM;
+	/*
+	 * hcall(
+	 * uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
+	 * uint64 flags,            // Per the flag request
+	 * uint64 firstAttributeId, // The attribute id
+	 * uint64 bufferAddress,    // Guest physical address of the output buffer
+	 * uint64 bufferSize);      // The size in bytes of the output buffer
+	 */
+	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0, 0,
+				 virt_to_phys(em_buf), sizeof(*em_buf));
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR) || ret != H_SUCCESS ||
+	    em_buf->data_header_version != 0x1) {
+		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
+		goto out;
+	}
+
+	num_attr = be64_to_cpu(em_buf->num_attr);
+
+	/*
+	 * Typecast the energy buffer to the attribute structure at the offset
+	 * specified in the buffer
+	 */
+	data_offset = be64_to_cpu(em_buf->array_offset) -
+			(sizeof(em_buf->num_attr) +
+			sizeof(em_buf->array_offset) +
+			sizeof(em_buf->data_header_version));
+
+	ea = (struct energy_scale_attributes *) &em_buf->data[data_offset];
+
+	pgs = kcalloc(num_attr, sizeof(*pgs), GFP_KERNEL);
+	if (!pgs)
+		goto out_pgs;
+
+	papr_kobj = kobject_create_and_add("papr", firmware_kobj);
+	if (!papr_kobj) {
+		pr_warn("kobject_create_and_add papr failed\n");
+		goto out_kobj;
+	}
+
+	escale_kobj = kobject_create_and_add("energy_scale_info", papr_kobj);
+	if (!escale_kobj) {
+		pr_warn("kobject_create_and_add energy_scale_info failed\n");
+		goto out_ekobj;
+	}
+
+	for (idx = 0; idx < num_attr; idx++) {
+		char buf[4];
+		bool show_val_desc = true;
+
+		pgs[idx].pgattrs = kcalloc(MAX_ATTRS,
+					   sizeof(*pgs[idx].pgattrs),
+					   GFP_KERNEL);
+		if (!pgs[idx].pgattrs)
+			goto out_kobj;
+
+		pgs[idx].pg.attrs = kcalloc(MAX_ATTRS + 1,
+					    sizeof(*pgs[idx].pg.attrs),
+					    GFP_KERNEL);
+		if (!pgs[idx].pg.attrs) {
+			kfree(pgs[idx].pgattrs);
+			goto out_kobj;
+		}
+
+		sprintf(buf, "%lld", be64_to_cpu(ea[idx].attr_id));
+		pgs[idx].pg.name = buf;
+
+		/* Do not add the value description if it does not exist */
+		if (strlen(ea[idx].attr_value_desc) == 0)
+			show_val_desc = false;
+
+		if (add_attr_group(be64_to_cpu(ea[idx].attr_id),
+				   MAX_ATTRS, &pgs[idx], show_val_desc)) {
+			pr_warn("Failed to create papr attribute group %s\n",
+				pgs[idx].pg.name);
+			goto out_pgattrs;
+		}
+	}
+
+	return 0;
+
+out_pgattrs:
+	for (i = 0; i < MAX_ATTRS; i++) {
+		kfree(pgs[i].pgattrs);
+		kfree(pgs[i].pg.attrs);
+	}
+out_ekobj:
+	kobject_put(escale_kobj);
+out_kobj:
+	kobject_put(papr_kobj);
+out_pgs:
+	kfree(pgs);
+out:
+	kfree(em_buf);
+
+	return -ENOMEM;
+}
+
+machine_device_initcall(pseries, papr_init);
-- 
2.30.2


^ permalink raw reply related

* [PATCH 0/1] Interface to represent PAPR firmware attributes
From: Pratik R. Sampat @ 2021-06-16 13:42 UTC (permalink / raw)
  To: mpe, benh, paulus, linuxppc-dev, kvm-ppc, linux-kernel, psampat,
	pratik.r.sampat

RFC --> v1
RFC: https://lkml.org/lkml/2021/6/4/791

Changelog:
Overhaul in interface design to the following:
/sys/firmware/papr/energy_scale_info/
   |-- <id>/
     |-- desc
     |-- value
     |-- value_desc (if exists)
   |-- <id>/
     |-- desc
     |-- value
     |-- value_desc (if exists)

Also implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.
The POC for the new interface has been hosted here:
https://github.com/pratiksampat/powerpc-utils/tree/H_GET_ENERGY_SCALE_INFO_v2

Sample output from the powerpc-utils tool is as follows:

# ppc64_cpu --frequency
Power and Performance Mode: XXXX
Idle Power Saver Status   : XXXX
Processor Folding Status  : XXXX --> Printed if Idle power save status is supported

Platform reported frequencies --> Frequencies reported from the platform's H_CALL i.e PAPR interface
min        :    NNNN GHz
max        :    NNNN GHz
static     :    NNNN GHz

Tool Computed frequencies
min        :    NNNN GHz (cpu XX)
max        :    NNNN GHz (cpu XX)
avg        :    NNNN GHz

Pratik R. Sampat (1):
  powerpc/pseries: Interface to represent PAPR firmware attributes

 .../sysfs-firmware-papr-energy-scale-info     |  26 ++
 arch/powerpc/include/asm/hvcall.h             |  21 +-
 arch/powerpc/kvm/trace_hv.h                   |   1 +
 arch/powerpc/platforms/pseries/Makefile       |   3 +-
 .../pseries/papr_platform_attributes.c        | 292 ++++++++++++++++++
 5 files changed, 341 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

-- 
2.30.2


^ permalink raw reply

* [PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API
From: Andy Shevchenko @ 2021-06-16 13:43 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: Oliver O'Halloran, Andy Shevchenko, Paul Mackerras,
	Aneesh Kumar K . V

Parse to and export from UUID own type, before dereferencing.
This also fixes wrong comment (Little Endian UUID is something else)
and should eliminate the direct strict types assignments.

Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing uuid from the device tree")
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
v2: added missed header (Vaibhav), updated comment (Aneesh),
    rewrite part of the commit message to avoid mentioning the Sparse
 arch/powerpc/platforms/pseries/papr_scm.c | 27 +++++++++++++++--------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index e2b69cc3beaf..b43be41e8ff7 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -18,6 +18,7 @@
 #include <asm/plpar_wrappers.h>
 #include <asm/papr_pdsm.h>
 #include <asm/mce.h>
+#include <asm/unaligned.h>
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -1101,8 +1102,9 @@ static int papr_scm_probe(struct platform_device *pdev)
 	u32 drc_index, metadata_size;
 	u64 blocks, block_size;
 	struct papr_scm_priv *p;
+	u8 uuid_raw[UUID_SIZE];
 	const char *uuid_str;
-	u64 uuid[2];
+	uuid_t uuid;
 	int rc;
 
 	/* check we have all the required DT properties */
@@ -1145,16 +1147,23 @@ static int papr_scm_probe(struct platform_device *pdev)
 	p->hcall_flush_required = of_property_read_bool(dn, "ibm,hcall-flush-required");
 
 	/* We just need to ensure that set cookies are unique across */
-	uuid_parse(uuid_str, (uuid_t *) uuid);
+	uuid_parse(uuid_str, &uuid);
+
 	/*
-	 * cookie1 and cookie2 are not really little endian
-	 * we store a little endian representation of the
-	 * uuid str so that we can compare this with the label
-	 * area cookie irrespective of the endian config with which
-	 * the kernel is built.
+	 * The cookie1 and cookie2 are not really little endian.
+	 * We store a raw buffer representation of the
+	 * uuid string so that we can compare this with the label
+	 * area cookie irrespective of the endian configuration
+	 * with which the kernel is built.
+	 *
+	 * Historically we stored the cookie in the below format.
+	 * for a uuid string 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
+	 *	cookie1 was 0xfd423b0b671b5172
+	 *	cookie2 was 0xaabce8cae35b1d8d
 	 */
-	p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
-	p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
+	export_uuid(uuid_raw, &uuid);
+	p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
+	p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
 
 	/* might be zero */
 	p->metadata_size = metadata_size;
-- 
2.30.2


^ permalink raw reply related

* Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API
From: Andy Shevchenko @ 2021-06-16 13:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Oliver O'Halloran, linux-kernel, Paul Mackerras, Vaibhav Jain,
	linuxppc-dev
In-Reply-To: <8e724a87-da78-9fc9-073e-cbbfea0ff97e@linux.ibm.com>

On Fri, Apr 16, 2021 at 03:05:31PM +0530, Aneesh Kumar K.V wrote:
> On 4/16/21 2:39 PM, Andy Shevchenko wrote:
> > On Fri, Apr 16, 2021 at 01:28:21PM +0530, Aneesh Kumar K.V wrote:
> > > On 4/15/21 7:16 PM, Andy Shevchenko wrote:
> > > > Parse to and export from UUID own type, before dereferencing.
> > > > This also fixes wrong comment (Little Endian UUID is something else)
> > > > and should fix Sparse warnings about assigning strict types to POD.
> > > > 
> > > > Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
> > > > Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing uuid from the device tree")
> > > > Cc: Oliver O'Halloran <oohall@gmail.com>
> > > > Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > ---
> > > > Not tested
> > > >    arch/powerpc/platforms/pseries/papr_scm.c | 13 ++++++++-----
> > > >    1 file changed, 8 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> > > > index ae6f5d80d5ce..4366e1902890 100644
> > > > --- a/arch/powerpc/platforms/pseries/papr_scm.c
> > > > +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> > > > @@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device *pdev)
> > > >    	u32 drc_index, metadata_size;
> > > >    	u64 blocks, block_size;
> > > >    	struct papr_scm_priv *p;
> > > > +	u8 uuid_raw[UUID_SIZE];
> > > >    	const char *uuid_str;
> > > > -	u64 uuid[2];
> > > > +	uuid_t uuid;
> > > >    	int rc;
> > > >    	/* check we have all the required DT properties */
> > > > @@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct platform_device *pdev)
> > > >    	p->hcall_flush_required = of_property_read_bool(dn, "ibm,hcall-flush-required");
> > > >    	/* We just need to ensure that set cookies are unique across */
> > > > -	uuid_parse(uuid_str, (uuid_t *) uuid);
> > > > +	uuid_parse(uuid_str, &uuid);
> > > > +
> > > >    	/*
> > > >    	 * cookie1 and cookie2 are not really little endian
> > > > -	 * we store a little endian representation of the
> > > > +	 * we store a raw buffer representation of the
> > > >    	 * uuid str so that we can compare this with the label
> > > >    	 * area cookie irrespective of the endian config with which
> > > >    	 * the kernel is built.
> > > >    	 */
> > > > -	p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
> > > > -	p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
> > > > +	export_uuid(uuid_raw, &uuid);
> > > > +	p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
> > > > +	p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
> > > 
> > > ok that does the equivalent of cpu_to_le64 there. So we are good. But the
> > > comment update is missing the details why we did that get_unaligned_le64.
> > > Maybe raw buffer representation is the correct term?
> > > Should we add an example in the comment. ie,
> > 
> > > /*
> > >   * Historically we stored the cookie in the below format.
> > > for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
> > > cookie1 was  0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d
> > > */
> > 
> > I'm fine with the comment. At least it will shed a light on the byte ordering
> > we are expecting.
> > 
> 
> Will you be sending an update? Also it will be good to list the sparse
> warning in the commit message?

I'll send an update but I rephrase to remove mention of Sparse. I have no
Sparse build for this architecture.

If you have one, try to build with `make W=1 C=1 CF=-D__CHECK_ENDIAN__ ...`
which will enable warnings about restricted types assignment.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH 07/11] powerpc: Add support for microwatt's hardware random number generator
From: Michael Ellerman @ 2021-06-16 13:16 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, Paul Mackerras
In-Reply-To: <1623720368.eqmkro3mgw.astroid@bobo.none>

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Paul Mackerras's message of June 15, 2021 9:02 am:
>> This is accessed using the DARN instruction and should probably be
>> done more generically.
>> 
>> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
>> ---
>>  arch/powerpc/include/asm/archrandom.h     | 12 +++++-
>>  arch/powerpc/platforms/microwatt/Kconfig  |  1 +
>>  arch/powerpc/platforms/microwatt/Makefile |  2 +-
>>  arch/powerpc/platforms/microwatt/rng.c    | 48 +++++++++++++++++++++++
>>  4 files changed, 61 insertions(+), 2 deletions(-)
>>  create mode 100644 arch/powerpc/platforms/microwatt/rng.c
>> 
>> diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h
>> index 9a53e29680f4..e8ae0f7740f9 100644
>> --- a/arch/powerpc/include/asm/archrandom.h
>> +++ b/arch/powerpc/include/asm/archrandom.h
>> @@ -8,12 +8,22 @@
>>  
>>  static inline bool __must_check arch_get_random_long(unsigned long *v)
>>  {
>> +	if (ppc_md.get_random_seed)
>> +		return ppc_md.get_random_seed(v);
>> +
>>  	return false;
>>  }
>>  
>>  static inline bool __must_check arch_get_random_int(unsigned int *v)
>>  {
>> -	return false;
>> +	unsigned long val;
>> +	bool rc;
>> +
>> +	rc = arch_get_random_long(&val);
>> +	if (rc)
>> +		*v = val;
>> +
>> +	return rc;
>>  }
>>  
>
> I would be happier if you didn't change this (or at least put it in its 
> own patch explaining why it's not going to slow down other platforms).

It would essentially be a revert of 01c9348c7620 ("powerpc: Use hardware
RNG for arch_get_random_seed_* not arch_get_random_*")

Which would be ironic :)

cheers

^ permalink raw reply

* Re: [PATCH v2 00/12] powerpc: Cleanup use of 'struct ppc_inst'
From: Michael Ellerman @ 2021-06-16 13:05 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	naveen.n.rao, jniethe5
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0c2b2eb4-f58d-9ec3-4b98-af22cef188e2@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 15/06/2021 à 09:18, Michael Ellerman a écrit :
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> This series is a cleanup of the use of 'struct ppc_inst'.
>>>
>>> A confusion is made between internal representation of powerpc
>>> instructions with 'struct ppc_inst' and in-memory code which is
>>> and will always be an array of 'unsigned int'.
>> 
>> Why don't we use u32 *, to make it even more explicit what the expected
>> size is?
>> 
>
> I guess that's historical, we could use u32 *

Yeah I think it is historical, we just never thought about it much.

> We can convert it incrementaly maybe ?

I've still got this series in next-test, so I'll go through it and
change any uses of unsigned int * to u32 *, and then we can do another
pass later to change the remaining cases.

cheers

^ permalink raw reply

* Re: [PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks
From: Jessica Yu @ 2021-06-16 12:54 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Michal Suchánek, linuxppc-dev, linux-kernel
In-Reply-To: <1623805495.qdikm5ks8v.astroid@bobo.none>

+++ Nicholas Piggin [16/06/21 11:18 +1000]:
>Excerpts from Jessica Yu's message of June 15, 2021 10:17 pm:
>> +++ Nicholas Piggin [15/06/21 12:05 +1000]:
>>>Excerpts from Jessica Yu's message of June 14, 2021 10:06 pm:
>>>> +++ Nicholas Piggin [11/06/21 19:39 +1000]:
>>>>>The elf_check_arch() function is used to test usermode binaries, but
>>>>>kernel modules may have more specific requirements. powerpc would like
>>>>>to test for ABI version compatibility.
>>>>>
>>>>>Add an arch-overridable function elf_check_module_arch() that defaults
>>>>>to elf_check_arch() and use it in elf_validity_check().
>>>>>
>>>>>Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>>>>>[np: split patch, added changelog]
>>>>>Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>---
>>>>> include/linux/moduleloader.h | 5 +++++
>>>>> kernel/module.c              | 2 +-
>>>>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>>diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
>>>>>index 9e09d11ffe5b..fdc042a84562 100644
>>>>>--- a/include/linux/moduleloader.h
>>>>>+++ b/include/linux/moduleloader.h
>>>>>@@ -13,6 +13,11 @@
>>>>>  * must be implemented by each architecture.
>>>>>  */
>>>>>
>>>>>+// Allow arch to optionally do additional checking of module ELF header
>>>>>+#ifndef elf_check_module_arch
>>>>>+#define elf_check_module_arch elf_check_arch
>>>>>+#endif
>>>>
>>>> Hi Nicholas,
>>>>
>>>> Why not make elf_check_module_arch() consistent with the other
>>>> arch-specific functions? Please see module_frob_arch_sections(),
>>>> module_{init,exit}_section(), etc in moduleloader.h. That is, they are
>>>> all __weak functions that are overridable by arches. We can maybe make
>>>> elf_check_module_arch() a weak symbol, available for arches to
>>>> override if they want to perform additional elf checks. Then we don't
>>>> have to have this one-off #define.
>>>
>>>
>>>Like this? I like it. Good idea.
>>
>> Yeah! Also, maybe we can alternatively make elf_check_module_arch() a
>> separate check entirely so that the powerpc implementation doesn't
>> have to include that extra elf_check_arch() call. Something like this maybe?
>
>Yeah we can do that. Would you be okay if it goes via powerpc tree? If
>yes, then we should get your Ack (or SOB because it seems to be entirely
>your patch now :D)

This can go through the powerpc tree. Will you do another respin
of this patch? And yes, feel free to take my SOB for this one -

     Signed-off-by: Jessica Yu <jeyu@kernel.org>

Thanks!

Jessica

^ permalink raw reply

* Re: [PATCH v12 00/12] Restricted DMA
From: Will Deacon @ 2021-06-16 12:08 UTC (permalink / raw)
  To: Claire Chang
  Cc: heikki.krogerus, thomas.hellstrom, peterz, joonas.lahtinen,
	dri-devel, chris, grant.likely, paulus, Frank Rowand, mingo,
	Marek Szyprowski, sstabellini, Saravana Kannan, Joerg Roedel,
	Rafael J . Wysocki, Christoph Hellwig, Bartosz Golaszewski,
	bskeggs, linux-pci, xen-devel, Thierry Reding, intel-gfx,
	matthew.auld, linux-devicetree, jxgao, daniel,
	Konrad Rzeszutek Wilk, maarten.lankhorst, airlied, Dan Williams,
	linuxppc-dev, jani.nikula, Rob Herring, rodrigo.vivi, bhelgaas,
	boris.ostrovsky, Andy Shevchenko, jgross, Nicolas Boichat,
	Greg KH, Randy Dunlap, lkml, tfiga, list@263.net:IOMMU DRIVERS,
	Jim Quinlan, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210616062157.953777-1-tientzu@chromium.org>

Hi Claire,

On Wed, Jun 16, 2021 at 02:21:45PM +0800, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
> 
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
> 
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> 
> [1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> 
> v12:
> Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and
> is_swiotlb_for_alloc (patch 09/12)

I took this for a spin in an arm64 KVM guest with virtio devices using the
DMA API and it works as expected on top of swiotlb devel/for-linus-5.14, so:

Tested-by: Will Deacon <will@kernel.org>

Thanks!

Will

^ permalink raw reply

* Re: [PATCH v2 0/4] Add perf interface to expose nvdimm
From: Nageswara Sastry @ 2021-06-16 10:55 UTC (permalink / raw)
  To: Kajol Jain
  Cc: nvdimm, santosh, maddy, ira.weiny, peterz, linux-kernel,
	Athira Rajeev, aneesh.kumar, vaibhav, dan.j.williams,
	linuxppc-dev, tglx
In-Reply-To: <20210614052326.285710-1-kjain@linux.ibm.com>



> On 14-Jun-2021, at 10:53 AM, Kajol Jain <kjain@linux.ibm.com> wrote:
> 
> Patchset adds performance stats reporting support for nvdimm.
> Added interface includes support for pmu register/unregister
> functions. A structure is added called nvdimm_pmu to be used for
> adding arch/platform specific data such as supported events, cpumask
> pmu event functions like event_init/add/read/del.
> User could use the standard perf tool to access perf
> events exposed via pmu.
> 
> Added implementation to expose IBM pseries platform nmem*
> device performance stats using this interface.
> ...
> 
> Patch1:
>        Introduces the nvdimm_pmu structure
> Patch2:
> 	Adds common interface to add arch/platform specific data
> 	includes supported events, pmu event functions. It also
> 	adds code for cpu hotplug support.
> Patch3:
>        Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
>        nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
>        cpumask andevent functions and then registers the pmu by adding
>        callbacks to register_nvdimm_pmu.
> Patch4:
>        Sysfs documentation patch

Tested with the following scenarios:
1. Check dmesg for nmem PMU registered messages.
2. Listed nmem events using 'perf list and perf list nmem'
3. Ran 'perf stat' with single event, grouping events, events from same pmu,
   different pmu and invalid events
4. Read from sysfs files, Writing in to sysfs files
5. While running nmem events with perf stat, offline cpu from the nmem?/cpumask

While running the above functionality worked as expected, no error messages seen
in dmesg.

Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>

> 
> Changelog
> ---
> PATCH v1 -> PATCH v2
> - Fix hotplug code by adding pmu migration call
>  incase current designated cpu got offline. As
>  pointed by Peter Zijlstra.
> 
> - Removed the retun -1 part from cpu hotplug offline
>  function.
> 
> - Link to the previous patchset : https://lkml.org/lkml/2021/6/8/500
> ---
> Kajol Jain (4):
>  drivers/nvdimm: Add nvdimm pmu structure
>  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
>  powerpc/papr_scm: Add perf interface support
>  powerpc/papr_scm: Document papr_scm sysfs event format entries
> 
> Documentation/ABI/testing/sysfs-bus-papr-pmem |  31 ++
> arch/powerpc/include/asm/device.h             |   5 +
> arch/powerpc/platforms/pseries/papr_scm.c     | 365 ++++++++++++++++++
> drivers/nvdimm/Makefile                       |   1 +
> drivers/nvdimm/nd_perf.c                      | 230 +++++++++++
> include/linux/nd.h                            |  46 +++
> 6 files changed, 678 insertions(+)
> create mode 100644 drivers/nvdimm/nd_perf.c
> 
Thanks and Regards,
R.Nageswara Sastry

> 


^ permalink raw reply

* Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation
From: Will Deacon @ 2021-06-16 10:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Catalin Marinas, linux-mm, Peter Zijlstra, x86, LKML,
	Nicholas Piggin, Dave Hansen, Paul Mackerras, stable,
	Mathieu Desnoyers, Andrew Morton, linuxppc-dev, linux-arm-kernel
In-Reply-To: <07a8b963002cb955b7516e61bad19514a3acaa82.1623813516.git.luto@kernel.org>

On Tue, Jun 15, 2021 at 08:21:13PM -0700, Andy Lutomirski wrote:
> The old sync_core_before_usermode() comments suggested that a non-icache-syncing
> return-to-usermode instruction is x86-specific and that all other
> architectures automatically notice cross-modified code on return to
> userspace.
> 
> This is misleading.  The incantation needed to modify code from one
> CPU and execute it on another CPU is highly architecture dependent.
> On x86, according to the SDM, one must modify the code, issue SFENCE
> if the modification was WC or nontemporal, and then issue a "serializing
> instruction" on the CPU that will execute the code.  membarrier() can do
> the latter.
> 
> On arm64 and powerpc, one must flush the icache and then flush the pipeline
> on the target CPU, although the CPU manuals don't necessarily use this
> language.
> 
> So let's drop any pretense that we can have a generic way to define or
> implement membarrier's SYNC_CORE operation and instead require all
> architectures to define the helper and supply their own documentation as to
> how to use it.  This means x86, arm64, and powerpc for now.  Let's also
> rename the function from sync_core_before_usermode() to
> membarrier_sync_core_before_usermode() because the precise flushing details
> may very well be specific to membarrier, and even the concept of
> "sync_core" in the kernel is mostly an x86-ism.
> 
> (It may well be the case that, on real x86 processors, synchronizing the
>  icache (which requires no action at all) and "flushing the pipeline" is
>  sufficient, but trying to use this language would be confusing at best.
>  LFENCE does something awfully like "flushing the pipeline", but the SDM
>  does not permit LFENCE as an alternative to a "serializing instruction"
>  for this purpose.)
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: x86@kernel.org
> Cc: stable@vger.kernel.org
> Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  .../membarrier-sync-core/arch-support.txt     | 68 ++++++-------------
>  arch/arm64/include/asm/sync_core.h            | 19 ++++++
>  arch/powerpc/include/asm/sync_core.h          | 14 ++++
>  arch/x86/Kconfig                              |  1 -
>  arch/x86/include/asm/sync_core.h              |  7 +-
>  arch/x86/kernel/alternative.c                 |  2 +-
>  arch/x86/kernel/cpu/mce/core.c                |  2 +-
>  arch/x86/mm/tlb.c                             |  3 +-
>  drivers/misc/sgi-gru/grufault.c               |  2 +-
>  drivers/misc/sgi-gru/gruhandles.c             |  2 +-
>  drivers/misc/sgi-gru/grukservices.c           |  2 +-
>  include/linux/sched/mm.h                      |  1 -
>  include/linux/sync_core.h                     | 21 ------
>  init/Kconfig                                  |  3 -
>  kernel/sched/membarrier.c                     | 15 ++--
>  15 files changed, 75 insertions(+), 87 deletions(-)
>  create mode 100644 arch/arm64/include/asm/sync_core.h
>  create mode 100644 arch/powerpc/include/asm/sync_core.h
>  delete mode 100644 include/linux/sync_core.h

For the arm64 bits (docs and asm/sync_core.h):

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply

* Re: [PATCH v5 13/17] powerpc/pseries/vas: Setup IRQ and fault handling
From: Nicholas Piggin @ 2021-06-16  9:40 UTC (permalink / raw)
  To: Haren Myneni, herbert, linux-crypto, linuxppc-dev, mpe
In-Reply-To: <09992f14d25b9212686acfa85e8e4cab93b6b1fc.camel@linux.ibm.com>

Excerpts from Haren Myneni's message of June 15, 2021 7:01 pm:
> On Mon, 2021-06-14 at 13:07 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 13, 2021 9:02 pm:
>> > NX generates an interrupt when sees a fault on the user space
>> > buffer and the hypervisor forwards that interrupt to OS. Then
>> > the kernel handles the interrupt by issuing H_GET_NX_FAULT hcall
>> > to retrieve the fault CRB information.
>> > 
>> > This patch also adds changes to setup and free IRQ per each
>> > window and also handles the fault by updating the CSB.
>> > 
>> > Signed-off-by: Haren Myneni <haren@linux.ibm.com>
>> > ---
>> >  arch/powerpc/platforms/pseries/vas.c | 108
>> > +++++++++++++++++++++++++++
>> >  1 file changed, 108 insertions(+)
>> > 
>> > diff --git a/arch/powerpc/platforms/pseries/vas.c
>> > b/arch/powerpc/platforms/pseries/vas.c
>> > index fe375f7a7029..55185bdd3776 100644
>> > --- a/arch/powerpc/platforms/pseries/vas.c
>> > +++ b/arch/powerpc/platforms/pseries/vas.c
>> > @@ -11,6 +11,7 @@
>> >  #include <linux/types.h>
>> >  #include <linux/delay.h>
>> >  #include <linux/slab.h>
>> > +#include <linux/interrupt.h>
>> >  #include <asm/machdep.h>
>> >  #include <asm/hvcall.h>
>> >  #include <asm/plpar_wrappers.h>
>> > @@ -190,6 +191,58 @@ int h_query_vas_capabilities(const u64 hcall,
>> > u8 query_type, u64 result)
>> >  }
>> >  EXPORT_SYMBOL_GPL(h_query_vas_capabilities);
>> >  
>> > +/*
>> > + * hcall to get fault CRB from pHyp.
>> > + */
>> > +static int h_get_nx_fault(u32 winid, u64 buffer)
>> > +{
>> > +	long rc;
>> > +
>> > +	rc = plpar_hcall_norets(H_GET_NX_FAULT, winid, buffer);
>> > +
>> > +	switch (rc) {
>> > +	case H_SUCCESS:
>> > +		return 0;
>> > +	case H_PARAMETER:
>> > +		pr_err("HCALL(%x): Invalid window ID %u\n",
>> > H_GET_NX_FAULT,
>> > +		       winid);
>> > +		return -EINVAL;
>> > +	case H_PRIVILEGE:
>> > +		pr_err("HCALL(%x): Window(%u): Invalid fault buffer
>> > 0x%llx\n",
>> > +		       H_GET_NX_FAULT, winid, buffer);
>> > +		return -EACCES;
>> > +	default:
>> > +		pr_err("HCALL(%x): Failed with error %ld for
>> > window(%u)\n",
>> > +		       H_GET_NX_FAULT, rc, winid);
>> > +		return -EIO;
>> 
>> 3 error messages have 3 different formats for window ID.
>> 
>> I agree with Michael you could just have one error message that
>> reports 
>> the return value. Also "H_GET_NX_FAULT: " would be nicer than
>> "HCALL(380): "
> 
> yes, Added just one printk for all error codes except for errors which
> depend on arguments to HCALL (Ex: WinID).
> 
> Sure, I will add just one error message and print all arguments passed
> to HCALL. 
> 
> pr_err("H_GET_NX_FAULT: window(%u), fault buffer(0x%llx) Failed with
> error %ld\n", rc, winid, buffer);

Thanks.

>> 
>> Check how some other hcall failures are reported, "hcall failed: 
>> H_CALL_NAME" seems to have a few takers.
>> 
>> > +	}
>> > +}
>> > +
>> > +/*
>> > + * Handle the fault interrupt.
>> > + * When the fault interrupt is received for each window, query
>> > pHyp to get
>> > + * the fault CRB on the specific fault. Then process the CRB by
>> > updating
>> > + * CSB or send signal if the user space CSB is invalid.
>> > + * Note: pHyp forwards an interrupt for each fault request. So one
>> > fault
>> > + *	CRB to process for each H_GET_NX_FAULT HCALL.
>> > + */
>> > +irqreturn_t pseries_vas_fault_thread_fn(int irq, void *data)
>> > +{
>> > +	struct pseries_vas_window *txwin = data;
>> > +	struct coprocessor_request_block crb;
>> > +	struct vas_user_win_ref *tsk_ref;
>> > +	int rc;
>> > +
>> > +	rc = h_get_nx_fault(txwin->vas_win.winid,
>> > (u64)virt_to_phys(&crb));
>> > +	if (!rc) {
>> > +		tsk_ref = &txwin->vas_win.task_ref;
>> > +		vas_dump_crb(&crb);
>> > +		vas_update_csb(&crb, tsk_ref);
>> > +	}
>> > +
>> > +	return IRQ_HANDLED;
>> > +}
>> > +
>> >  /*
>> >   * Allocate window and setup IRQ mapping.
>> >   */
>> > @@ -201,10 +254,51 @@ static int allocate_setup_window(struct
>> > pseries_vas_window *txwin,
>> >  	rc = h_allocate_vas_window(txwin, domain, wintype,
>> > DEF_WIN_CREDS);
>> >  	if (rc)
>> >  		return rc;
>> > +	/*
>> > +	 * On powerVM, pHyp setup and forwards the fault interrupt per
>> 
>>            The hypervisor forwards the fault interrupt per-window...
>> 
>> > +	 * window. So the IRQ setup and fault handling will be done for
>> > +	 * each open window separately.
>> > +	 */
>> > +	txwin->fault_virq = irq_create_mapping(NULL, txwin->fault_irq);
>> > +	if (!txwin->fault_virq) {
>> > +		pr_err("Failed irq mapping %d\n", txwin->fault_irq);
>> > +		rc = -EINVAL;
>> > +		goto out_win;
>> > +	}
>> > +
>> > +	txwin->name = kasprintf(GFP_KERNEL, "vas-win-%d",
>> > +				txwin->vas_win.winid);
>> > +	if (!txwin->name) {
>> > +		rc = -ENOMEM;
>> > +		goto out_irq;
>> > +	}
>> > +
>> > +	rc = request_threaded_irq(txwin->fault_virq, NULL,
>> > +				  pseries_vas_fault_thread_fn,
>> > IRQF_ONESHOT,
>> > +				  txwin->name, txwin);
>> > +	if (rc) {
>> > +		pr_err("VAS-Window[%d]: Request IRQ(%u) failed with
>> > %d\n",
>> > +		       txwin->vas_win.winid, txwin->fault_virq, rc);
>> > +		goto out_free;
>> > +	}
>> >  
>> >  	txwin->vas_win.wcreds_max = DEF_WIN_CREDS;
>> >  
>> >  	return 0;
>> > +out_free:
>> > +	kfree(txwin->name);
>> > +out_irq:
>> > +	irq_dispose_mapping(txwin->fault_virq);
>> > +out_win:
>> > +	h_deallocate_vas_window(txwin->vas_win.winid);
>> > +	return rc;
>> > +}
>> > +
>> > +static inline void free_irq_setup(struct pseries_vas_window
>> > *txwin)
>> > +{
>> > +	free_irq(txwin->fault_virq, txwin);
>> > +	irq_dispose_mapping(txwin->fault_virq);
>> > +	kfree(txwin->name);
>> 
>> Nit, but this freeing is in a different order than the error handling
>> in 
>> the function does. I'd just keep it the same unless there is a reason
>> to 
>> be different, in which case it could use a comment.
> 
> shouldn't matter, I can change it to:
> static inline void free_irq_setup(struct pseries_vas_window *txwin)
> {
>         free_irq(txwin->fault_virq, txwin);
>         kfree(txwin->name);
>         irq_dispose_mapping(txwin->fault_virq);
> }

Okay, it wasn't a big deal I was just wondering. Given the changes,

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 12/17] powerpc/pseries/vas: Integrate API with open/close windows
From: Nicholas Piggin @ 2021-06-16  9:38 UTC (permalink / raw)
  To: Haren Myneni, herbert, linux-crypto, linuxppc-dev, mpe
In-Reply-To: <91d139adcbd9e4851fdb11d30888f5f5a923b764.camel@linux.ibm.com>

Excerpts from Haren Myneni's message of June 15, 2021 5:26 pm:
> On Mon, 2021-06-14 at 12:55 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 13, 2021 9:02 pm:
>> > This patch adds VAS window allocatioa/close with the corresponding
>> > hcalls. Also changes to integrate with the existing user space VAS
>> > API and provide register/unregister functions to NX pseries driver.
>> > 
>> > The driver register function is used to create the user space
>> > interface (/dev/crypto/nx-gzip) and unregister to remove this
>> > entry.
>> > 
>> > The user space process opens this device node and makes an ioctl
>> > to allocate VAS window. The close interface is used to deallocate
>> > window.
>> > 
>> > Signed-off-by: Haren Myneni <haren@linux.ibm.com>
>> > ---
>> >  arch/powerpc/include/asm/vas.h          |   4 +
>> >  arch/powerpc/platforms/pseries/Makefile |   1 +
>> >  arch/powerpc/platforms/pseries/vas.c    | 223
>> > ++++++++++++++++++++++++
>> >  3 files changed, 228 insertions(+)
>> > 
>> > diff --git a/arch/powerpc/include/asm/vas.h
>> > b/arch/powerpc/include/asm/vas.h
>> > index eefc758d8cd4..9d5646d721c4 100644
>> > --- a/arch/powerpc/include/asm/vas.h
>> > +++ b/arch/powerpc/include/asm/vas.h
>> > @@ -254,6 +254,10 @@ struct vas_all_caps {
>> >  	u64     feat_type;
>> >  };
>> >  
>> > +int h_query_vas_capabilities(const u64 hcall, u8 query_type, u64
>> > result);
>> > +int vas_register_api_pseries(struct module *mod,
>> > +			     enum vas_cop_type cop_type, const char
>> > *name);
>> > +void vas_unregister_api_pseries(void);
>> >  #endif
>> >  
>> >  /*
>> > diff --git a/arch/powerpc/platforms/pseries/Makefile
>> > b/arch/powerpc/platforms/pseries/Makefile
>> > index c8a2b0b05ac0..4cda0ef87be0 100644
>> > --- a/arch/powerpc/platforms/pseries/Makefile
>> > +++ b/arch/powerpc/platforms/pseries/Makefile
>> > @@ -30,3 +30,4 @@ obj-$(CONFIG_PPC_SVM)		+= svm.o
>> >  obj-$(CONFIG_FA_DUMP)		+= rtas-fadump.o
>> >  
>> >  obj-$(CONFIG_SUSPEND)		+= suspend.o
>> > +obj-$(CONFIG_PPC_VAS)		+= vas.o
>> > diff --git a/arch/powerpc/platforms/pseries/vas.c
>> > b/arch/powerpc/platforms/pseries/vas.c
>> > index 98109a13f1c2..fe375f7a7029 100644
>> > --- a/arch/powerpc/platforms/pseries/vas.c
>> > +++ b/arch/powerpc/platforms/pseries/vas.c
>> > @@ -10,6 +10,7 @@
>> >  #include <linux/export.h>
>> >  #include <linux/types.h>
>> >  #include <linux/delay.h>
>> > +#include <linux/slab.h>
>> >  #include <asm/machdep.h>
>> >  #include <asm/hvcall.h>
>> >  #include <asm/plpar_wrappers.h>
>> > @@ -187,6 +188,228 @@ int h_query_vas_capabilities(const u64 hcall,
>> > u8 query_type, u64 result)
>> >  		return -EIO;
>> >  	}
>> >  }
>> > +EXPORT_SYMBOL_GPL(h_query_vas_capabilities);
>> > +
>> > +/*
>> > + * Allocate window and setup IRQ mapping.
>> > + */
>> > +static int allocate_setup_window(struct pseries_vas_window *txwin,
>> > +				 u64 *domain, u8 wintype)
>> > +{
>> > +	int rc;
>> > +
>> > +	rc = h_allocate_vas_window(txwin, domain, wintype,
>> > DEF_WIN_CREDS);
>> > +	if (rc)
>> > +		return rc;
>> > +
>> > +	txwin->vas_win.wcreds_max = DEF_WIN_CREDS;
>> > +
>> > +	return 0;
>> > +}
>> > +
>> > +static struct vas_window *vas_allocate_window(struct
>> > vas_tx_win_open_attr *uattr,
>> > +					      enum vas_cop_type
>> > cop_type)
>> > +{
>> > +	long domain[PLPAR_HCALL9_BUFSIZE] = {VAS_DEFAULT_DOMAIN_ID};
>> > +	struct vas_ct_caps *ct_caps;
>> > +	struct vas_caps *caps;
>> > +	struct pseries_vas_window *txwin;
>> > +	int rc;
>> > +
>> > +	txwin = kzalloc(sizeof(*txwin), GFP_KERNEL);
>> > +	if (!txwin)
>> > +		return ERR_PTR(-ENOMEM);
>> > +
>> > +	/*
>> > +	 * A VAS window can have many credits which means that many
>> > +	 * requests can be issued simultaneously. But phyp restricts
>> > +	 * one credit per window.
>> > +	 * phyp introduces 2 different types of credits:
>> > +	 * Default credit type (Uses normal priority FIFO):
>> > +	 *	A limited number of credits are assigned to partitions
>> > +	 *	based on processor entitlement. But these credits may be
>> > +	 *	over-committed on a system depends on whether the CPUs
>> > +	 *	are in shared or dedicated modes - that is, more requests
>> > +	 *	may be issued across the system than NX can service at
>> > +	 *	once which can result in paste command failure (RMA_busy).
>> > +	 *	Then the process has to resend requests or fall-back to
>> > +	 *	SW compression.
>> > +	 * Quality of Service (QoS) credit type (Uses high priority
>> > FIFO):
>> > +	 *	To avoid NX HW contention, the system admins can assign
>> > +	 *	QoS credits for each LPAR so that this partition is
>> > +	 *	guaranteed access to NX resources. These credits are
>> > +	 *	assigned to partitions via the HMC.
>> > +	 *	Refer PAPR for more information.
>> > +	 *
>> > +	 * Allocate window with QoS credits if user requested.
>> > Otherwise
>> > +	 * default credits are used.
>> > +	 */
>> > +	if (uattr->flags & VAS_TX_WIN_FLAG_QOS_CREDIT)
>> > +		caps = &vascaps[VAS_GZIP_QOS_FEAT_TYPE];
>> > +	else
>> > +		caps = &vascaps[VAS_GZIP_DEF_FEAT_TYPE];
>> > +
>> > +	ct_caps = &caps->caps;
>> > +
>> > +	if (atomic_inc_return(&ct_caps->used_lpar_creds) >
>> > +			atomic_read(&ct_caps->target_lpar_creds)) {
>> > +		pr_err("Credits are not available to allocate
>> > window\n");
>> > +		rc = -EINVAL;
>> > +		goto out;
>> > +	}
>> > +
>> > +	/*
>> > +	 * The user space is requesting to allocate a window on a VAS
>> > +	 * instance (or chip) where the process is executing.
>> > +	 * On powerVM, domain values are passed to pHyp to select chip
>> > /
>> > +	 * VAS instance. Useful if the process is affinity to NUMA
>> > node.
>> > +	 * pHyp selects VAS instance if VAS_DEFAULT_DOMAIN_ID (-1) is
>> > +	 * passed for domain values.
>> > +	 */
>> 
>> s/powerVM/PowerVM
>> s/pHyp/PowerVM
>> 
>> You could also just call it the hypervisor. KVM may not implement
>> the 
>> hcalls now, but in future it could.
>> 
>> > +	if (uattr->vas_id == -1) {
>> 
>> Should the above comment fit under here? vas_id == -1 means
>> userspace 
>> asks for any VAS but preferably a local one?
>> 
>> > +		/*
>> > +		 * To allocate VAS window, pass same domain values
>> > returned
>> > +		 * from this HCALL.
>> > +		 */
>> 
>> Then you could merge it with this comment and make it a bit clearer:
>> the h_allocate_vas_window hcall is defined to take a domain as
>> specified by h_home_node_associativity, so no conversions or
>> unpacking
>> needs to be done.
>> 
>> > +		rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, domain,
>> > +				  VPHN_FLAG_VCPU, smp_processor_id());
>> > +		if (rc != H_SUCCESS) {
>> > +			pr_err("HCALL(%x): failed with ret(%d)\n",
>> > +			       H_HOME_NODE_ASSOCIATIVITY, rc);
>> > +			goto out;
>> > +		}
>> > +	}
>> > +
>> > +	/*
>> > +	 * Allocate / Deallocate window HCALLs and setup / free IRQs
>> > +	 * have to be protected with mutex.
>> > +	 * Open VAS window: Allocate window HCALL and setup IRQ
>> > +	 * Close VAS window: Deallocate window HCALL and free IRQ
>> > +	 *	The hypervisor waits until all NX requests are
>> > +	 *	completed before closing the window. So expects OS
>> > +	 *	to handle NX faults, means IRQ can be freed only
>> > +	 *	after the deallocate window HCALL is returned.
>> > +	 * So once the window is closed with deallocate HCALL before
>> > +	 * the IRQ is freed, it can be assigned to new allocate
>> > +	 * HCALL with the same fault IRQ by the hypervisor. It can
>> > +	 * result in setup IRQ fail for the new window since the
>> > +	 * same fault IRQ is not freed by the OS.
>> > +	 */
>> > +	mutex_lock(&vas_pseries_mutex);
>> 
>> Why? What's the mutex protecting here?
> This mutex is protecting Allocate/ Deallocate HCAlls and setup/free
> IRQ.

But this lock is not protecting both, it's only around the hcall
as far as I can see... Oh, you add the irq setup in the next patch.
Okay that makes more sense.

I would possibly add the mutex lock in that patch, and move it into
allocate_setup_window?


> Basically once the window is opened and setup IRQ successfully,
> makes sure same IRQ is not assigned to otherwindow before the IRQ is
> freed in OS. 
> 
> Allocate window: 
> 	Allocate window HCALL
> 	setup IRQ
> 
> Deallocate window:
> 	Deallocate window HCALL - The hypervisor waits all credits are
> returned including any faults. Disable window on this VAS so that do
> not take new requests and wait for all pending requests (faults are
> processed) 
> 	free IRQ (We can not free IRQ before the HCALL so that pending
> faults can be processed and credits returned)
> 
> So if we do not jave mutex, possible that same fault IRQ may be
> assigned to different window open HCALL before it is actually freed in
> OS.
> 
> Process A:
> Allocate window HCALL (winID 123)
> Setup IRQ (IRQ# 321)
> Process requests
> Deallocate HCALL(winID 123)       Process B:
>                                     Allocate window HCALL (winID 123)
>                                     setup IRQ (IRQ# 321) -- will fail.
> Free IRQ (IRQ#321)

Thanks. To educate me, what is the significance of #321 and 123 here?
Why could it not just allocate IRQ #322 here, or is it fixed to the
window ID?

> 
>                         
>> 
>> > +	rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>> > +				   ct_caps->win_type);
>> 
>> If you define the types to be the same, can you avoid this casting?
>> allocate_setup_window specifically needs an array of 
>> PLPAR_HCALL9_BUFSIZE longs.
> 
> Yes H_HOME_NODE_ASSOCIATIVITY returns longs (PLPAR_HCALL9_BUFSIZE) but
> H_ALLOCATE_VAS_WINDOW expects u64.

Oh, ignore me then I guess that's okay.

> 
> Syntax:
> int64 /* H_Success, H_Parameter, H_Resource, H_Constrained */
> /* H_Cop_Hw, H_Unsupported, H_Busy */
> /* H_LongBusyOrder1mSec, LongBusyOrder10mSec */
> hcall(const uint64 H_ALLOCATE_VAS_WINDOW /* Allocate a VAS window */
> uint8 vasWindowType, /* The type of VAS window to allocate */
> uint16 numCredits, /* Number of credits to assign to the window */
> uint64 desiredAssociativityDomainIdentifier1, /* Associativity Domain
> Identifier Doubleword 1 */
> uint64 desiredAssociativityDomainIdentifier2, /* Associativity Domain
> Identifier Doubleword 2 */
> uint64 desiredAssociativityDomainIdentifier3, /* Associativity Domain
> Identifier Doubleword 3 */
> uint64 desiredAssociativityDomainIdentifier4, /* Associativity Domain
> Identifier Doubleword 4 */
> uint64 desiredAssociativityDomainIdentifier5, /* Associativity Domain
> Identifier Doubleword 5 */
> uint64 desiredAssociativityDomainIdentifier6) /* Associativity Domain
> Identifier Doubleword 6 */
> 
>> 
>> > +	mutex_unlock(&vas_pseries_mutex);
>> > +	if (rc)
>> > +		goto out;
>> > +
>> > +	/*
>> > +	 * Modify window and it is ready to use.
>> > +	 */
>> > +	rc = h_modify_vas_window(txwin);
>> > +	if (!rc)
>> > +		rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>> > +	if (rc)
>> > +		goto out_free;
>> > +
>> > +	vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>> > +	txwin->win_type = ct_caps->win_type;
>> > +	mutex_lock(&vas_pseries_mutex);
>> > +	list_add(&txwin->win_list, &caps->list);
>> > +	mutex_unlock(&vas_pseries_mutex);
>> > +
>> > +	return &txwin->vas_win;
>> > +
>> > +out_free:
>> > +	h_deallocate_vas_window(txwin->vas_win.winid);
>> 
>> No mutex here in this deallocate hcall.
> 
> We do not need mutex just for this hcall which is executing during
> failure. Need mutex only for the code patch as explained above. 
> 
> If the window open is not returned to user space successfully, window
> close path will be:
> 
>  free_irq_setup(txwin);
>  h_deallocate_vas_window(txwin->vas_win.winid);

Right, it's clearer in context of the next patch.

>> 
>> I suspect you don't actually need the mutex for the hcalls
>> themselves, 
>> but the list manipulations. I would possibly consider putting 
>> used_lpar_creds under that same lock rather than making it atomic and
>> playing lock free games, unless you really need to.
>> 
>> Also... "creds". credentials? credits, right? Don't go through and 
>> change everything now, but not skimping on naming helps a lot with
>> reading code that you're not familiar with. All the vas/nx stuff
>> could probably do with a pass to make the names a bit easier.
> 
> Yes creds is for credits. We have names like credits and capabilities
> for VAS/NX. Using creds for credits and caps for capabilities. creds is
> already used in the existing code.
> 
>> 
>> (creds isn't so bad, "ct" for "coprocessor type" is pretty obscure 
>> though).
> 
> So can I use 'copt' coprocessor type?

Sorry I don't want to get into nitpicking so much, naming is hard and 
I'm far from perfect at it. The existing stuff is not bad, I'm just not
used to reading it but all code has its own conventions. Go with what
you have for now.

I think that takes care of my questions, I'm happy with whatever you 
decide with the comments and mutex, they were only suggestions. So

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 04/17] powerpc/vas: Add platform specific user window operations
From: Nicholas Piggin @ 2021-06-16  9:11 UTC (permalink / raw)
  To: Haren Myneni, herbert, linux-crypto, linuxppc-dev, mpe
In-Reply-To: <e409750a3bd5f8410d7a8a290c69375486420b93.camel@linux.ibm.com>

Excerpts from Haren Myneni's message of June 15, 2021 4:37 pm:
> On Mon, 2021-06-14 at 12:24 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 13, 2021 8:57 pm:
>> > PowerNV uses registers to open/close VAS windows, and getting the
>> > paste address. Whereas the hypervisor calls are used on PowerVM.
>> > 
>> > This patch adds the platform specific user space window operations
>> > and register with the common VAS user space interface.
>> > 
>> > Signed-off-by: Haren Myneni <haren@linux.ibm.com>
>> > ---
>> >  arch/powerpc/include/asm/vas.h              | 14 +++++-
>> >  arch/powerpc/platforms/book3s/vas-api.c     | 53 +++++++++++++--
>> > ------
>> >  arch/powerpc/platforms/powernv/vas-window.c | 45 ++++++++++++++++-
>> >  3 files changed, 89 insertions(+), 23 deletions(-)
>> > 
>> > diff --git a/arch/powerpc/include/asm/vas.h
>> > b/arch/powerpc/include/asm/vas.h
>> > index bab7891d43f5..85318d7446c7 100644
>> > --- a/arch/powerpc/include/asm/vas.h
>> > +++ b/arch/powerpc/include/asm/vas.h
>> > @@ -5,6 +5,7 @@
>> >  
>> >  #ifndef _ASM_POWERPC_VAS_H
>> >  #define _ASM_POWERPC_VAS_H
>> > +#include <uapi/asm/vas-api.h>
>> >  
>> >  struct vas_window;
>> >  
>> > @@ -48,6 +49,16 @@ enum vas_cop_type {
>> >  	VAS_COP_TYPE_MAX,
>> >  };
>> >  
>> > +/*
>> > + * User space window operations used for powernv and powerVM
>> > + */
>> > +struct vas_user_win_ops {
>> > +	struct vas_window * (*open_win)(struct vas_tx_win_open_attr *,
>> > +				enum vas_cop_type);
>> > +	u64 (*paste_addr)(struct vas_window *);
>> > +	int (*close_win)(struct vas_window *);
>> > +};
>> 
>> This looks better, but rather than pull in uapi and the user API 
>> structure here, could you just pass in vas_id and flags after the
>> common 
>> code does the user copy and verifies the version and other details?
>> 
>> I think it's generally good practice to limit the data that the usre
>> can influence as much as possible. Sorry for not picking up on that
>> earlier.
> 
> The user space pass vas_tx_win_open_attr struct - use only vas_id and
> flags right now but it can be extended in future with reserve elements.
> So passing the same struct to platform specific API.
> 
> do you prefer "struct vas_window * (*open_win)(vas_id, flags, cop)" and
> extend later when more elments are used?

Yes I think so. The reason being so you don't sending data under the
control of user very far into the kernel. Better safe than sorry.

Thanks,
Nick


^ permalink raw reply

* Re: [PATCH v13 1/3] kasan: allow an architecture to disable inline instrumentation
From: Marco Elver @ 2021-06-16  9:09 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: aneesh.kumar, LKML, kasan-dev, Linux Memory Management List,
	Andrew Morton, linuxppc-dev, Andrey Konovalov
In-Reply-To: <20210616080244.51236-2-dja@axtens.net>

On Wed, 16 Jun 2021 at 10:02, Daniel Axtens <dja@axtens.net> wrote:
>
> For annoying architectural reasons, it's very difficult to support inline
> instrumentation on powerpc64.*
>
> Add a Kconfig flag to allow an arch to disable inline. (It's a bit
> annoying to be 'backwards', but I'm not aware of any way to have
> an arch force a symbol to be 'n', rather than 'y'.)
>
> We also disable stack instrumentation in this case as it does things that
> are functionally equivalent to inline instrumentation, namely adding
> code that touches the shadow directly without going through a C helper.
>
> * on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
> real mode. However, before we turn on virtual memory, we parse the device
> tree to determine which platform and MMU we're running under. That calls
> generic DT code, which is instrumented. Inline instrumentation in DT would
> unconditionally attempt to touch the shadow region, which we won't have
> set up yet, and would crash. We can make outline mode wait for the arch to
> be ready, but we can't change what the compiler inserts for inline mode.
>
> Signed-off-by: Daniel Axtens <dja@axtens.net>

Reviewed-by: Marco Elver <elver@google.com>

Thank you.

> ---
>  lib/Kconfig.kasan | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> index cffc2ebbf185..cb5e02d09e11 100644
> --- a/lib/Kconfig.kasan
> +++ b/lib/Kconfig.kasan
> @@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
>  config HAVE_ARCH_KASAN_VMALLOC
>         bool
>
> +config ARCH_DISABLE_KASAN_INLINE
> +       bool
> +       help
> +         Sometimes an architecture might not be able to support inline
> +         instrumentation but might be able to support outline instrumentation.
> +         This option allows an architecture to prevent inline and stack
> +         instrumentation from being enabled.
> +
> +
>  config CC_HAS_KASAN_GENERIC
>         def_bool $(cc-option, -fsanitize=kernel-address)
>
> @@ -130,6 +139,7 @@ config KASAN_OUTLINE
>
>  config KASAN_INLINE
>         bool "Inline instrumentation"
> +       depends on !ARCH_DISABLE_KASAN_INLINE
>         help
>           Compiler directly inserts code checking shadow memory before
>           memory accesses. This is faster than outline (in some workloads
> @@ -141,6 +151,7 @@ endchoice
>  config KASAN_STACK
>         bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && !COMPILE_TEST
>         depends on KASAN_GENERIC || KASAN_SW_TAGS
> +       depends on !ARCH_DISABLE_KASAN_INLINE
>         default y if CC_IS_GCC
>         help
>           The LLVM stack address sanitizer has a know problem that
> @@ -154,6 +165,9 @@ config KASAN_STACK
>           but clang users can still enable it for builds without
>           CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
>           to use and enabled by default.
> +         If the architecture disables inline instrumentation, this is
> +         also disabled as it adds inline-style instrumentation that
> +         is run unconditionally.
>
>  config KASAN_SW_TAGS_IDENTIFY
>         bool "Enable memory corruption identification"
> --
> 2.30.2
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20210616080244.51236-2-dja%40axtens.net.

^ permalink raw reply

* Re: [PATCH v13 3/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables
From: Marco Elver @ 2021-06-16  9:07 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: aneesh.kumar, LKML, kasan-dev, Linux Memory Management List,
	Andrew Morton, linuxppc-dev, Andrey Konovalov
In-Reply-To: <20210616080244.51236-4-dja@axtens.net>

On Wed, 16 Jun 2021 at 10:03, Daniel Axtens <dja@axtens.net> wrote:
[...]
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index 768d7d342757..fd65f477ac92 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
>  #define PTE_HWTABLE_PTRS 0
>  #endif
>
> +#ifndef MAX_PTRS_PER_PTE
> +#define MAX_PTRS_PER_PTE PTRS_PER_PTE
> +#endif
> +
> +#ifndef MAX_PTRS_PER_PMD
> +#define MAX_PTRS_PER_PMD PTRS_PER_PMD
> +#endif
> +
> +#ifndef MAX_PTRS_PER_PUD
> +#define MAX_PTRS_PER_PUD PTRS_PER_PUD
> +#endif

This is introducing new global constants in a <linux/..> header. It
feels like this should be in <linux/pgtable.h> together with a
comment. Because <linux/kasan.h> is actually included in
<linux/slab.h>, most of the kernel will get these new definitions.
That in itself is fine, but it feels wrong that the KASAN header
introduces these.

Thoughts?

Sorry for only realizing this now.

Thanks,
-- Marco

>  extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
> -extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
> -extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
> -extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
> +extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
> +extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
> +extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
>  extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
>
>  int kasan_populate_early_shadow(const void *shadow_start,
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index 348f31d15a97..cc64ed6858c6 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
>  }
>  #endif
>  #if CONFIG_PGTABLE_LEVELS > 3
> -pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
> +pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
>  static inline bool kasan_pud_table(p4d_t p4d)
>  {
>         return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
> @@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
>  }
>  #endif
>  #if CONFIG_PGTABLE_LEVELS > 2
> -pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
> +pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
>  static inline bool kasan_pmd_table(pud_t pud)
>  {
>         return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
> @@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
>         return false;
>  }
>  #endif
> -pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
> +pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
>         __page_aligned_bss;
>
>  static inline bool kasan_pte_table(pmd_t pmd)
> --
> 2.30.2
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20210616080244.51236-4-dja%40axtens.net.

^ permalink raw reply

* Re: [PATCH v13 2/3] kasan: allow architectures to provide an outline readiness check
From: Marco Elver @ 2021-06-16  8:56 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Aneesh Kumar K . V, aneesh.kumar, LKML, kasan-dev,
	Linux Memory Management List, Andrew Morton, linuxppc-dev,
	Andrey Konovalov
In-Reply-To: <20210616080244.51236-3-dja@axtens.net>

On Wed, 16 Jun 2021 at 10:02, Daniel Axtens <dja@axtens.net> wrote:
> Allow architectures to define a kasan_arch_is_ready() hook that bails
> out of any function that's about to touch the shadow unless the arch
> says that it is ready for the memory to be accessed. This is fairly
> uninvasive and should have a negligible performance penalty.
>
> This will only work in outline mode, so an arch must specify
> ARCH_DISABLE_KASAN_INLINE if it requires this.
>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Signed-off-by: Daniel Axtens <dja@axtens.net>

Reviewed-by: Marco Elver <elver@google.com>

but also check if an assertion that this is only used with
KASAN_GENERIC might make sense (below). Depends on how much we want to
make sure kasan_arch_is_ready() could be useful for other modes (which
I don't think it makes sense).

> --
>
> I discuss the justfication for this later in the series. Also,
> both previous RFCs for ppc64 - by 2 different people - have
> needed this trick! See:
>  - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
>  - https://patchwork.ozlabs.org/patch/795211/      # ppc radix series
> ---
>  mm/kasan/common.c  | 4 ++++
>  mm/kasan/generic.c | 3 +++
>  mm/kasan/kasan.h   | 4 ++++
>  mm/kasan/shadow.c  | 8 ++++++++
>  4 files changed, 19 insertions(+)
>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 10177cc26d06..0ad615f3801d 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -331,6 +331,10 @@ static inline bool ____kasan_slab_free(struct kmem_cache *cache, void *object,
>         u8 tag;
>         void *tagged_object;
>
> +       /* Bail if the arch isn't ready */
> +       if (!kasan_arch_is_ready())
> +               return false;
> +
>         tag = get_tag(object);
>         tagged_object = object;
>         object = kasan_reset_tag(object);
> diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> index 53cbf28859b5..c3f5ba7a294a 100644
> --- a/mm/kasan/generic.c
> +++ b/mm/kasan/generic.c
> @@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned long addr,
>                                                 size_t size, bool write,
>                                                 unsigned long ret_ip)
>  {
> +       if (!kasan_arch_is_ready())
> +               return true;
> +
>         if (unlikely(size == 0))
>                 return true;
>
> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
> index 8f450bc28045..19323a3d5975 100644
> --- a/mm/kasan/kasan.h
> +++ b/mm/kasan/kasan.h
> @@ -449,6 +449,10 @@ static inline void kasan_poison_last_granule(const void *address, size_t size) {
>
>  #endif /* CONFIG_KASAN_GENERIC */
>
> +#ifndef kasan_arch_is_ready
> +static inline bool kasan_arch_is_ready(void)   { return true; }
> +#endif
> +

I've been trying to think of a way to make it clear this is only for
KASAN_GENERIC mode, and not the others. An arch can always define this
function, but of course it might not be used. One way would be to add
an '#ifndef CONFIG_KASAN_GENERIC' in the #else case and #error if it's
not generic mode.

I think trying to make this do anything useful for SW_TAGS or HW_TAGS
modes does not make sense (at least right now).

>  /*
>   * Exported functions for interfaces called from assembly or from generated
>   * code. Declarations here to avoid warning about missing declarations.
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 082ee5b6d9a1..3c7f7efe6f68 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, bool init)
>  {
>         void *shadow_start, *shadow_end;
>
> +       /* Don't touch the shadow memory if arch isn't ready */
> +       if (!kasan_arch_is_ready())
> +               return;
> +
>         /*
>          * Perform shadow offset calculation based on untagged address, as
>          * some of the callers (e.g. kasan_poison_object_data) pass tagged
> @@ -99,6 +103,10 @@ EXPORT_SYMBOL(kasan_poison);
>  #ifdef CONFIG_KASAN_GENERIC
>  void kasan_poison_last_granule(const void *addr, size_t size)
>  {
> +       /* Don't touch the shadow memory if arch isn't ready */
> +       if (!kasan_arch_is_ready())
> +               return;
> +
>         if (size & KASAN_GRANULE_MASK) {
>                 u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
>                 *shadow = size & KASAN_GRANULE_MASK;
> --
> 2.30.2
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20210616080244.51236-3-dja%40axtens.net.

^ permalink raw reply

* [PATCH v3] lockdown, selinux: fix wrong subject in some SELinux lockdown checks
From: Ondrej Mosnacek @ 2021-06-16  8:51 UTC (permalink / raw)
  To: linux-security-module, James Morris
  Cc: linux-efi, linux-pci, linux-cxl, Steffen Klassert, Paul Moore,
	x86, linux-acpi, Ingo Molnar, linux-serial, linux-pm, selinux,
	Steven Rostedt, linux-fsdevel, Herbert Xu, netdev,
	Stephen Smalley, kexec, linux-kernel, Casey Schaufler, bpf,
	linuxppc-dev, David S . Miller

Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
lockdown") added an implementation of the locked_down LSM hook to
SELinux, with the aim to restrict which domains are allowed to perform
operations that would breach lockdown.

However, in several places the security_locked_down() hook is called in
situations where the current task isn't doing any action that would
directly breach lockdown, leading to SELinux checks that are basically
bogus.

To fix this, add an explicit struct cred pointer argument to
security_lockdown() and define NULL as a special value to pass instead
of current_cred() in such situations. LSMs that take the subject
credentials into account can then fall back to some default or ignore
such calls altogether. In the SELinux lockdown hook implementation, use
SECINITSID_KERNEL in case the cred argument is NULL.

Most of the callers are updated to pass current_cred() as the cred
pointer, thus maintaining the same behavior. The following callers are
modified to pass NULL as the cred pointer instead:
1. arch/powerpc/xmon/xmon.c
     Seems to be some interactive debugging facility. It appears that
     the lockdown hook is called from interrupt context here, so it
     should be more appropriate to request a global lockdown decision.
2. fs/tracefs/inode.c:tracefs_create_file()
     Here the call is used to prevent creating new tracefs entries when
     the kernel is locked down. Assumes that locking down is one-way -
     i.e. if the hook returns non-zero once, it will never return zero
     again, thus no point in creating these files. Also, the hook is
     often called by a module's init function when it is loaded by
     userspace, where it doesn't make much sense to do a check against
     the current task's creds, since the task itself doesn't actually
     use the tracing functionality (i.e. doesn't breach lockdown), just
     indirectly makes some new tracepoints available to whoever is
     authorized to use them.
3. net/xfrm/xfrm_user.c:copy_to_user_*()
     Here a cryptographic secret is redacted based on the value returned
     from the hook. There are two possible actions that may lead here:
     a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
        task context is relevant, since the dumped data is sent back to
        the current task.
     b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
        dumped SA is broadcasted to tasks subscribed to XFRM events -
        here the current task context is not relevant as it doesn't
        represent the tasks that could potentially see the secret.
     It doesn't seem worth it to try to keep using the current task's
     context in the a) case, since the eventual data leak can be
     circumvented anyway via b), plus there is no way for the task to
     indicate that it doesn't care about the actual key value, so the
     check could generate a lot of "false alert" denials with SELinux.
     Thus, let's pass NULL instead of current_cred() here faute de
     mieux.

Improvements-suggested-by: Casey Schaufler <casey@schaufler-ca.com>
Improvements-suggested-by: Paul Moore <paul@paul-moore.com>
Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---

v3:
- add the cred argument to security_locked_down() and adapt all callers
- keep using current_cred() in BPF, as the hook calls have been shifted
  to program load time (commit ff40e51043af ("bpf, lockdown, audit: Fix
  buggy SELinux lockdown permission checks"))
- in SELinux, don't ignore hook calls where cred == NULL, but use
  SECINITSID_KERNEL as the subject instead
- update explanations in the commit message

v2: https://lore.kernel.org/lkml/20210517092006.803332-1-omosnace@redhat.com/
- change to a single hook based on suggestions by Casey Schaufler

v1: https://lore.kernel.org/lkml/20210507114048.138933-1-omosnace@redhat.com/

 arch/powerpc/xmon/xmon.c             |  4 ++--
 arch/x86/kernel/ioport.c             |  4 ++--
 arch/x86/kernel/msr.c                |  4 ++--
 arch/x86/mm/testmmiotrace.c          |  2 +-
 drivers/acpi/acpi_configfs.c         |  2 +-
 drivers/acpi/custom_method.c         |  2 +-
 drivers/acpi/osl.c                   |  3 ++-
 drivers/acpi/tables.c                |  2 +-
 drivers/char/mem.c                   |  2 +-
 drivers/cxl/mem.c                    |  2 +-
 drivers/firmware/efi/efi.c           |  2 +-
 drivers/firmware/efi/test/efi_test.c |  2 +-
 drivers/pci/pci-sysfs.c              |  6 +++---
 drivers/pci/proc.c                   |  6 +++---
 drivers/pci/syscall.c                |  2 +-
 drivers/pcmcia/cistpl.c              |  2 +-
 drivers/tty/serial/serial_core.c     |  2 +-
 fs/debugfs/file.c                    |  2 +-
 fs/debugfs/inode.c                   |  2 +-
 fs/proc/kcore.c                      |  2 +-
 fs/tracefs/inode.c                   |  2 +-
 include/linux/lsm_hook_defs.h        |  2 +-
 include/linux/lsm_hooks.h            |  1 +
 include/linux/security.h             |  4 ++--
 kernel/bpf/helpers.c                 | 10 ++++++----
 kernel/events/core.c                 |  2 +-
 kernel/kexec.c                       |  2 +-
 kernel/kexec_file.c                  |  2 +-
 kernel/module.c                      |  2 +-
 kernel/params.c                      |  2 +-
 kernel/power/hibernate.c             |  3 ++-
 kernel/trace/bpf_trace.c             | 20 ++++++++++++--------
 kernel/trace/ftrace.c                |  4 ++--
 kernel/trace/ring_buffer.c           |  2 +-
 kernel/trace/trace.c                 | 10 +++++-----
 kernel/trace/trace_events.c          |  2 +-
 kernel/trace/trace_events_hist.c     |  4 ++--
 kernel/trace/trace_events_synth.c    |  2 +-
 kernel/trace/trace_events_trigger.c  |  2 +-
 kernel/trace/trace_kprobe.c          |  6 +++---
 kernel/trace/trace_printk.c          |  2 +-
 kernel/trace/trace_stack.c           |  2 +-
 kernel/trace/trace_stat.c            |  2 +-
 kernel/trace/trace_uprobe.c          |  4 ++--
 net/xfrm/xfrm_user.c                 | 11 +++++++++--
 security/lockdown/lockdown.c         |  3 ++-
 security/security.c                  |  4 ++--
 security/selinux/hooks.c             |  7 +++++--
 48 files changed, 97 insertions(+), 77 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index c8173e92f19d..63176798f235 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -299,7 +299,7 @@ static bool xmon_is_locked_down(void)
 	static bool lockdown;
 
 	if (!lockdown) {
-		lockdown = !!security_locked_down(LOCKDOWN_XMON_RW);
+		lockdown = !!security_locked_down(NULL, LOCKDOWN_XMON_RW);
 		if (lockdown) {
 			printf("xmon: Disabled due to kernel lockdown\n");
 			xmon_is_ro = true;
@@ -307,7 +307,7 @@ static bool xmon_is_locked_down(void)
 	}
 
 	if (!xmon_is_ro) {
-		xmon_is_ro = !!security_locked_down(LOCKDOWN_XMON_WR);
+		xmon_is_ro = !!security_locked_down(NULL, LOCKDOWN_XMON_WR);
 		if (xmon_is_ro)
 			printf("xmon: Read-only due to kernel lockdown\n");
 	}
diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index e2fab3ceb09f..838ba45ecc71 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -71,7 +71,7 @@ long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
 	if (turn_on && (!capable(CAP_SYS_RAWIO) ||
-			security_locked_down(LOCKDOWN_IOPORT)))
+			security_locked_down(current_cred(), LOCKDOWN_IOPORT)))
 		return -EPERM;
 
 	/*
@@ -187,7 +187,7 @@ SYSCALL_DEFINE1(iopl, unsigned int, level)
 	/* Trying to gain more privileges? */
 	if (level > old) {
 		if (!capable(CAP_SYS_RAWIO) ||
-		    security_locked_down(LOCKDOWN_IOPORT))
+		    security_locked_down(current_cred(), LOCKDOWN_IOPORT))
 			return -EPERM;
 	}
 
diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index ed8ac6bcbafb..6a687d91515b 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -116,7 +116,7 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
 	int err = 0;
 	ssize_t bytes = 0;
 
-	err = security_locked_down(LOCKDOWN_MSR);
+	err = security_locked_down(current_cred(), LOCKDOWN_MSR);
 	if (err)
 		return err;
 
@@ -179,7 +179,7 @@ static long msr_ioctl(struct file *file, unsigned int ioc, unsigned long arg)
 			err = -EFAULT;
 			break;
 		}
-		err = security_locked_down(LOCKDOWN_MSR);
+		err = security_locked_down(current_cred(), LOCKDOWN_MSR);
 		if (err)
 			break;
 
diff --git a/arch/x86/mm/testmmiotrace.c b/arch/x86/mm/testmmiotrace.c
index bda73cb7a044..c43a13241ae8 100644
--- a/arch/x86/mm/testmmiotrace.c
+++ b/arch/x86/mm/testmmiotrace.c
@@ -116,7 +116,7 @@ static void do_test_bulk_ioremapping(void)
 static int __init init(void)
 {
 	unsigned long size = (read_far) ? (8 << 20) : (16 << 10);
-	int ret = security_locked_down(LOCKDOWN_MMIOTRACE);
+	int ret = security_locked_down(current_cred(), LOCKDOWN_MMIOTRACE);
 
 	if (ret)
 		return ret;
diff --git a/drivers/acpi/acpi_configfs.c b/drivers/acpi/acpi_configfs.c
index 3a14859dbb75..2221a63d57e1 100644
--- a/drivers/acpi/acpi_configfs.c
+++ b/drivers/acpi/acpi_configfs.c
@@ -29,7 +29,7 @@ static ssize_t acpi_table_aml_write(struct config_item *cfg,
 {
 	const struct acpi_table_header *header = data;
 	struct acpi_table *table;
-	int ret = security_locked_down(LOCKDOWN_ACPI_TABLES);
+	int ret = security_locked_down(current_cred(), LOCKDOWN_ACPI_TABLES);
 
 	if (ret)
 		return ret;
diff --git a/drivers/acpi/custom_method.c b/drivers/acpi/custom_method.c
index d39a9b474727..8cac7f683245 100644
--- a/drivers/acpi/custom_method.c
+++ b/drivers/acpi/custom_method.c
@@ -30,7 +30,7 @@ static ssize_t cm_write(struct file *file, const char __user *user_buf,
 	acpi_status status;
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_ACPI_TABLES);
+	ret = security_locked_down(current_cred(), LOCKDOWN_ACPI_TABLES);
 	if (ret)
 		return ret;
 
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 327e1b4eb6b0..b55364f50164 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -198,7 +198,8 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
 	 * specific location (if appropriate) so it can be carried
 	 * over further kexec()s.
 	 */
-	if (acpi_rsdp && !security_locked_down(LOCKDOWN_ACPI_TABLES)) {
+	if (acpi_rsdp && !security_locked_down(current_cred(),
+					       LOCKDOWN_ACPI_TABLES)) {
 		acpi_arch_set_root_pointer(acpi_rsdp);
 		return acpi_rsdp;
 	}
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 9d581045acff..e9d18d0a69b6 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -568,7 +568,7 @@ void __init acpi_table_upgrade(void)
 	if (table_nr == 0)
 		return;
 
-	if (security_locked_down(LOCKDOWN_ACPI_TABLES)) {
+	if (security_locked_down(current_cred(), LOCKDOWN_ACPI_TABLES)) {
 		pr_notice("kernel is locked down, ignoring table override\n");
 		return;
 	}
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 15dc54fa1d47..aa1e6c542e90 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -618,7 +618,7 @@ static int open_port(struct inode *inode, struct file *filp)
 	if (!capable(CAP_SYS_RAWIO))
 		return -EPERM;
 
-	rc = security_locked_down(LOCKDOWN_DEV_MEM);
+	rc = security_locked_down(current_cred(), LOCKDOWN_DEV_MEM);
 	if (rc)
 		return rc;
 
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 2acc6173da36..c1747b6555c7 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
 	if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
 		return false;
 
-	if (security_locked_down(LOCKDOWN_NONE))
+	if (security_locked_down(current_cred(), LOCKDOWN_NONE))
 		return false;
 
 	if (cxl_raw_allow_all)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 4b7ee3fa9224..b2121e190ffe 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -200,7 +200,7 @@ static void generic_ops_unregister(void)
 static char efivar_ssdt[EFIVAR_SSDT_NAME_MAX] __initdata;
 static int __init efivar_ssdt_setup(char *str)
 {
-	int ret = security_locked_down(LOCKDOWN_ACPI_TABLES);
+	int ret = security_locked_down(current_cred(), LOCKDOWN_ACPI_TABLES);
 
 	if (ret)
 		return ret;
diff --git a/drivers/firmware/efi/test/efi_test.c b/drivers/firmware/efi/test/efi_test.c
index 47d67bb0a516..942c25843665 100644
--- a/drivers/firmware/efi/test/efi_test.c
+++ b/drivers/firmware/efi/test/efi_test.c
@@ -722,7 +722,7 @@ static long efi_test_ioctl(struct file *file, unsigned int cmd,
 
 static int efi_test_open(struct inode *inode, struct file *file)
 {
-	int ret = security_locked_down(LOCKDOWN_EFI_TEST);
+	int ret = security_locked_down(current_cred(), LOCKDOWN_EFI_TEST);
 
 	if (ret)
 		return ret;
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index beb8d1f4fafe..7fcfdddfd586 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -753,7 +753,7 @@ static ssize_t pci_write_config(struct file *filp, struct kobject *kobj,
 	u8 *data = (u8 *) buf;
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS);
 	if (ret)
 		return ret;
 
@@ -1047,7 +1047,7 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	struct resource *res = &pdev->resource[bar];
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS);
 	if (ret)
 		return ret;
 
@@ -1128,7 +1128,7 @@ static ssize_t pci_write_resource_io(struct file *filp, struct kobject *kobj,
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS);
 	if (ret)
 		return ret;
 
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 9bab07302bbf..0cc69bba4c4a 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -118,7 +118,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf,
 	int size = dev->cfg_size;
 	int cnt, ret;
 
-	ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS);
 	if (ret)
 		return ret;
 
@@ -201,7 +201,7 @@ static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd,
 #endif /* HAVE_PCI_MMAP */
 	int ret = 0;
 
-	ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS);
 	if (ret)
 		return ret;
 
@@ -248,7 +248,7 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 	int i, ret, write_combine = 0, res_bit = IORESOURCE_MEM;
 
 	if (!capable(CAP_SYS_RAWIO) ||
-	    security_locked_down(LOCKDOWN_PCI_ACCESS))
+	    security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS))
 		return -EPERM;
 
 	if (fpriv->mmap_state == pci_mmap_io) {
diff --git a/drivers/pci/syscall.c b/drivers/pci/syscall.c
index 8b003c890b87..29da701bcc25 100644
--- a/drivers/pci/syscall.c
+++ b/drivers/pci/syscall.c
@@ -92,7 +92,7 @@ SYSCALL_DEFINE5(pciconfig_write, unsigned long, bus, unsigned long, dfn,
 	int err = 0;
 
 	if (!capable(CAP_SYS_ADMIN) ||
-	    security_locked_down(LOCKDOWN_PCI_ACCESS))
+	    security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS))
 		return -EPERM;
 
 	dev = pci_get_domain_bus_and_slot(0, bus, dfn);
diff --git a/drivers/pcmcia/cistpl.c b/drivers/pcmcia/cistpl.c
index 948b763dc451..96c96c1cd6da 100644
--- a/drivers/pcmcia/cistpl.c
+++ b/drivers/pcmcia/cistpl.c
@@ -1577,7 +1577,7 @@ static ssize_t pccard_store_cis(struct file *filp, struct kobject *kobj,
 	struct pcmcia_socket *s;
 	int error;
 
-	error = security_locked_down(LOCKDOWN_PCMCIA_CIS);
+	error = security_locked_down(current_cred(), LOCKDOWN_PCMCIA_CIS);
 	if (error)
 		return error;
 
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 18ff85a83f80..23dcfbb9cbdb 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -864,7 +864,7 @@ static int uart_set_info(struct tty_struct *tty, struct tty_port *port,
 	}
 
 	if (change_irq || change_port) {
-		retval = security_locked_down(LOCKDOWN_TIOCSSERIAL);
+		retval = security_locked_down(current_cred(), LOCKDOWN_TIOCSSERIAL);
 		if (retval)
 			goto exit;
 	}
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index e813acfaa6e8..92b5eda0722e 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -154,7 +154,7 @@ static int debugfs_locked_down(struct inode *inode,
 	    !real_fops->mmap)
 		return 0;
 
-	if (security_locked_down(LOCKDOWN_DEBUGFS))
+	if (security_locked_down(current_cred(), LOCKDOWN_DEBUGFS))
 		return -EPERM;
 
 	return 0;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 8129a430d789..17f6438cc1b5 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -48,7 +48,7 @@ static int debugfs_setattr(struct user_namespace *mnt_userns,
 	int ret;
 
 	if (ia->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) {
-		ret = security_locked_down(LOCKDOWN_DEBUGFS);
+		ret = security_locked_down(current_cred(), LOCKDOWN_DEBUGFS);
 		if (ret)
 			return ret;
 	}
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 4d2e64e9016c..3747ac4097b8 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -545,7 +545,7 @@ out:
 
 static int open_kcore(struct inode *inode, struct file *filp)
 {
-	int ret = security_locked_down(LOCKDOWN_KCORE);
+	int ret = security_locked_down(current_cred(), LOCKDOWN_KCORE);
 
 	if (!capable(CAP_SYS_RAWIO))
 		return -EPERM;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 1261e8b41edb..9db8dd52d429 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -396,7 +396,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 	struct dentry *dentry;
 	struct inode *inode;
 
-	if (security_locked_down(LOCKDOWN_TRACEFS))
+	if (security_locked_down(NULL, LOCKDOWN_TRACEFS))
 		return NULL;
 
 	if (!(mode & S_IFMT))
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 04c01794de83..a763e373ce0d 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -394,7 +394,7 @@ LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
 LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free_security, struct bpf_prog_aux *aux)
 #endif /* CONFIG_BPF_SYSCALL */
 
-LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)
+LSM_HOOK(int, 0, locked_down, const struct cred *cred, enum lockdown_reason what)
 
 #ifdef CONFIG_PERF_EVENTS
 LSM_HOOK(int, 0, perf_event_open, struct perf_event_attr *attr, int type)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 5c4c5c0602cb..8156f2dbaab7 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1543,6 +1543,7 @@
  *     Determine whether a kernel feature that potentially enables arbitrary
  *     code execution in kernel space should be permitted.
  *
+ *     @cred: credential asociated with the operation, or NULL if not applicable
  *     @what: kernel feature being accessed
  *
  * Security hooks for perf events
diff --git a/include/linux/security.h b/include/linux/security.h
index 06f7c50ce77f..ed01b1425ce7 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -470,7 +470,7 @@ void security_inode_invalidate_secctx(struct inode *inode);
 int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
 int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
-int security_locked_down(enum lockdown_reason what);
+int security_locked_down(const struct cred *cred, enum lockdown_reason what);
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1343,7 +1343,7 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32
 {
 	return -EOPNOTSUPP;
 }
-static inline int security_locked_down(enum lockdown_reason what)
+static inline int security_locked_down(struct cred *cred, enum lockdown_reason what)
 {
 	return 0;
 }
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a2f1f15ce432..69b9b72f1b5f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1070,13 +1070,15 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_probe_read_user:
 		return &bpf_probe_read_user_proto;
 	case BPF_FUNC_probe_read_kernel:
-		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
-		       NULL : &bpf_probe_read_kernel_proto;
+		if (security_locked_down(current_cred(), LOCKDOWN_BPF_READ) < 0)
+			return NULL;
+		return &bpf_probe_read_kernel_proto;
 	case BPF_FUNC_probe_read_user_str:
 		return &bpf_probe_read_user_str_proto;
 	case BPF_FUNC_probe_read_kernel_str:
-		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
-		       NULL : &bpf_probe_read_kernel_str_proto;
+		if (security_locked_down(current_cred(), LOCKDOWN_BPF_READ) < 0)
+			return NULL;
+		return &bpf_probe_read_kernel_str_proto;
 	case BPF_FUNC_snprintf_btf:
 		return &bpf_snprintf_btf_proto;
 	case BPF_FUNC_snprintf:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6fee4a7e88d7..188c38f470c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11977,7 +11977,7 @@ SYSCALL_DEFINE5(perf_event_open,
 
 	/* REGS_INTR can leak data, lockdown must prevent this */
 	if (attr.sample_type & PERF_SAMPLE_REGS_INTR) {
-		err = security_locked_down(LOCKDOWN_PERF);
+		err = security_locked_down(current_cred(), LOCKDOWN_PERF);
 		if (err)
 			return err;
 	}
diff --git a/kernel/kexec.c b/kernel/kexec.c
index c82c6c06f051..abbed3eeb631 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -213,7 +213,7 @@ static inline int kexec_load_check(unsigned long nr_segments,
 	 * kexec can be used to circumvent module loading restrictions, so
 	 * prevent loading in that case
 	 */
-	result = security_locked_down(LOCKDOWN_KEXEC);
+	result = security_locked_down(current_cred(), LOCKDOWN_KEXEC);
 	if (result)
 		return result;
 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 33400ff051a8..add00b325f4f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -204,7 +204,7 @@ kimage_validate_signature(struct kimage *image)
 		 * down.
 		 */
 		if (!ima_appraise_signature(READING_KEXEC_IMAGE) &&
-		    security_locked_down(LOCKDOWN_KEXEC))
+		    security_locked_down(current_cred(), LOCKDOWN_KEXEC))
 			return -EPERM;
 
 		pr_debug("kernel signature verification failed (%d).\n", ret);
diff --git a/kernel/module.c b/kernel/module.c
index 7e78dfabca97..9351ea9d394b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2905,7 +2905,7 @@ static int module_sig_check(struct load_info *info, int flags)
 		return -EKEYREJECTED;
 	}
 
-	return security_locked_down(LOCKDOWN_MODULE_SIGNATURE);
+	return security_locked_down(current_cred(), LOCKDOWN_MODULE_SIGNATURE);
 }
 #else /* !CONFIG_MODULE_SIG */
 static int module_sig_check(struct load_info *info, int flags)
diff --git a/kernel/params.c b/kernel/params.c
index 2daa2780a92c..42c8f28f6071 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -100,7 +100,7 @@ bool parameq(const char *a, const char *b)
 static bool param_check_unsafe(const struct kernel_param *kp)
 {
 	if (kp->flags & KERNEL_PARAM_FL_HWPARAM &&
-	    security_locked_down(LOCKDOWN_MODULE_PARAMETERS))
+	    security_locked_down(current_cred(), LOCKDOWN_MODULE_PARAMETERS))
 		return false;
 
 	if (kp->flags & KERNEL_PARAM_FL_UNSAFE) {
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index da0b41914177..a980e587b93a 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -81,7 +81,8 @@ void hibernate_release(void)
 
 bool hibernation_available(void)
 {
-	return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
+	return nohibernate == 0 &&
+		!security_locked_down(current_cred(), LOCKDOWN_HIBERNATION);
 }
 
 /**
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7a52bc172841..9b374e077e96 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -999,20 +999,24 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_probe_read_user:
 		return &bpf_probe_read_user_proto;
 	case BPF_FUNC_probe_read_kernel:
-		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
-		       NULL : &bpf_probe_read_kernel_proto;
+		if (security_locked_down(current_cred(), LOCKDOWN_BPF_READ) < 0)
+			return NULL;
+		return &bpf_probe_read_kernel_proto;
 	case BPF_FUNC_probe_read_user_str:
 		return &bpf_probe_read_user_str_proto;
 	case BPF_FUNC_probe_read_kernel_str:
-		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
-		       NULL : &bpf_probe_read_kernel_str_proto;
+		if (security_locked_down(current_cred(), LOCKDOWN_BPF_READ) < 0)
+			return NULL;
+		return &bpf_probe_read_kernel_str_proto;
 #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	case BPF_FUNC_probe_read:
-		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
-		       NULL : &bpf_probe_read_compat_proto;
+		if (security_locked_down(current_cred(), LOCKDOWN_BPF_READ) < 0)
+			return NULL;
+		return &bpf_probe_read_compat_proto;
 	case BPF_FUNC_probe_read_str:
-		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
-		       NULL : &bpf_probe_read_compat_str_proto;
+		if (security_locked_down(current_cred(), LOCKDOWN_BPF_READ) < 0)
+			return NULL;
+		return &bpf_probe_read_compat_str_proto;
 #endif
 #ifdef CONFIG_CGROUPS
 	case BPF_FUNC_get_current_cgroup_id:
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2e8a3fde7104..73cae115a523 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3688,7 +3688,7 @@ ftrace_avail_open(struct inode *inode, struct file *file)
 	struct ftrace_iterator *iter;
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
@@ -5817,7 +5817,7 @@ __ftrace_graph_open(struct inode *inode, struct file *file,
 	int ret;
 	struct ftrace_hash *new_hash = NULL;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2c0ee6484990..e9fa3bd389f7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5860,7 +5860,7 @@ static __init int test_ringbuffer(void)
 	int cpu;
 	int ret = 0;
 
-	if (security_locked_down(LOCKDOWN_TRACEFS)) {
+	if (security_locked_down(current_cred(), LOCKDOWN_TRACEFS)) {
 		pr_warn("Lockdown is enabled, skipping ring buffer tests\n");
 		return 0;
 	}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a21ef9cd2aae..8f5915aabae1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -477,7 +477,7 @@ int tracing_check_open_get_tr(struct trace_array *tr)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
@@ -2063,7 +2063,7 @@ int __init register_tracer(struct tracer *type)
 		return -1;
 	}
 
-	if (security_locked_down(LOCKDOWN_TRACEFS)) {
+	if (security_locked_down(current_cred(), LOCKDOWN_TRACEFS)) {
 		pr_warn("Can not register tracer %s due to lockdown\n",
 			   type->name);
 		return -EPERM;
@@ -9356,7 +9356,7 @@ int tracing_init_dentry(void)
 {
 	struct trace_array *tr = &global_trace;
 
-	if (security_locked_down(LOCKDOWN_TRACEFS)) {
+	if (security_locked_down(current_cred(), LOCKDOWN_TRACEFS)) {
 		pr_warn("Tracing disabled due to lockdown\n");
 		return -EPERM;
 	}
@@ -9818,7 +9818,7 @@ __init static int tracer_alloc_buffers(void)
 	int ret = -ENOMEM;
 
 
-	if (security_locked_down(LOCKDOWN_TRACEFS)) {
+	if (security_locked_down(current_cred(), LOCKDOWN_TRACEFS)) {
 		pr_warn("Tracing disabled due to lockdown\n");
 		return -EPERM;
 	}
@@ -9989,7 +9989,7 @@ __init static int tracing_set_default_clock(void)
 {
 	/* sched_clock_stable() is determined in late_initcall */
 	if (!trace_boot_clock && !sched_clock_stable()) {
-		if (security_locked_down(LOCKDOWN_TRACEFS)) {
+		if (security_locked_down(current_cred(), LOCKDOWN_TRACEFS)) {
 			pr_warn("Can not set tracing clock due to lockdown\n");
 			return -EPERM;
 		}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 80e96989770e..3703afc687f6 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2129,7 +2129,7 @@ ftrace_event_open(struct inode *inode, struct file *file,
 	struct seq_file *m;
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c1abd63f1d6c..9fccd239abae 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -4783,7 +4783,7 @@ static int event_hist_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
@@ -5055,7 +5055,7 @@ static int event_hist_debug_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 2ac75eb6aa86..7157a6bd64b7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -2171,7 +2171,7 @@ static int synth_events_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index b8bfa8505b7b..0db73069d8fa 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -178,7 +178,7 @@ static int event_trigger_regex_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index ea6178cb5e33..370fa23d7f0f 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -483,7 +483,7 @@ static int __register_trace_kprobe(struct trace_kprobe *tk)
 {
 	int i, ret;
 
-	ret = security_locked_down(LOCKDOWN_KPROBES);
+	ret = security_locked_down(current_cred(), LOCKDOWN_KPROBES);
 	if (ret)
 		return ret;
 
@@ -1150,7 +1150,7 @@ static int probes_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
@@ -1208,7 +1208,7 @@ static int profile_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
index 4b320fe7df70..47c808484cb2 100644
--- a/kernel/trace/trace_printk.c
+++ b/kernel/trace/trace_printk.c
@@ -362,7 +362,7 @@ ftrace_formats_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 63c285042051..63b6ebe7bdce 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -477,7 +477,7 @@ static int stack_trace_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_stat.c b/kernel/trace/trace_stat.c
index 8d141c3825a9..2f6ae81ee67e 100644
--- a/kernel/trace/trace_stat.c
+++ b/kernel/trace/trace_stat.c
@@ -236,7 +236,7 @@ static int tracing_stat_open(struct inode *inode, struct file *file)
 	struct seq_file *m;
 	struct stat_session *session = inode->i_private;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 9b50869a5ddb..a5a71096d363 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -781,7 +781,7 @@ static int probes_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
@@ -836,7 +836,7 @@ static int profile_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	ret = security_locked_down(LOCKDOWN_TRACEFS);
+	ret = security_locked_down(current_cred(), LOCKDOWN_TRACEFS);
 	if (ret)
 		return ret;
 
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index f0aecee4d539..e3a8c66bead0 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -850,8 +850,15 @@ static int copy_user_offload(struct xfrm_state_offload *xso, struct sk_buff *skb
 
 static bool xfrm_redact(void)
 {
-	return IS_ENABLED(CONFIG_SECURITY) &&
-		security_locked_down(LOCKDOWN_XFRM_SECRET);
+	/* Don't use current_cred() here, since this may be called when
+	 * broadcasting a notification that an SA has been created/deleted.
+	 * In that case current task is the one triggering the notification,
+	 * but the SA key is actually leaked to the event subscribers.
+	 * Since we can't easily do the redact decision per-subscriber,
+	 * just pass NULL here, indicating to the LSMs that a global lockdown
+	 * decision should be made instead.
+	 */
+	return security_locked_down(NULL, LOCKDOWN_XFRM_SECRET);
 }
 
 static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
index 87cbdc64d272..2abe92157e82 100644
--- a/security/lockdown/lockdown.c
+++ b/security/lockdown/lockdown.c
@@ -55,7 +55,8 @@ early_param("lockdown", lockdown_param);
  * lockdown_is_locked_down - Find out if the kernel is locked down
  * @what: Tag to use in notice generated if lockdown is in effect
  */
-static int lockdown_is_locked_down(enum lockdown_reason what)
+static int lockdown_is_locked_down(const struct cred *cred,
+				   enum lockdown_reason what)
 {
 	if (WARN(what >= LOCKDOWN_CONFIDENTIALITY_MAX,
 		 "Invalid lockdown reason"))
diff --git a/security/security.c b/security/security.c
index b38155b2de83..9719327b5d70 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2592,9 +2592,9 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
 }
 #endif /* CONFIG_BPF_SYSCALL */
 
-int security_locked_down(enum lockdown_reason what)
+int security_locked_down(const struct cred *cred, enum lockdown_reason what)
 {
-	return call_int_hook(locked_down, 0, what);
+	return call_int_hook(locked_down, 0, cred, what);
 }
 EXPORT_SYMBOL(security_locked_down);
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index eaea837d89d1..f7cb702598b9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -7017,10 +7017,10 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
 }
 #endif
 
-static int selinux_lockdown(enum lockdown_reason what)
+static int selinux_lockdown(const struct cred *cred, enum lockdown_reason what)
 {
 	struct common_audit_data ad;
-	u32 sid = current_sid();
+	u32 sid;
 	int invalid_reason = (what <= LOCKDOWN_NONE) ||
 			     (what == LOCKDOWN_INTEGRITY_MAX) ||
 			     (what >= LOCKDOWN_CONFIDENTIALITY_MAX);
@@ -7032,6 +7032,9 @@ static int selinux_lockdown(enum lockdown_reason what)
 		return -EINVAL;
 	}
 
+	/* Use SECINITSID_KERNEL if there is no relevant cred to check against */
+	sid = cred ? cred_sid(cred) : SECINITSID_KERNEL;
+
 	ad.type = LSM_AUDIT_DATA_LOCKDOWN;
 	ad.u.reason = what;
 
-- 
2.31.1


^ permalink raw reply related

* [PATCH v13 3/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables
From: Daniel Axtens @ 2021-06-16  8:02 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kasan-dev, elver, akpm, andreyknvl
  Cc: aneesh.kumar, linuxppc-dev, Daniel Axtens
In-Reply-To: <20210616080244.51236-1-dja@axtens.net>

powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, and therefore
breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 include/linux/kasan.h | 18 +++++++++++++++---
 mm/kasan/init.c       |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..fd65f477ac92 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
 #define PTE_HWTABLE_PTRS 0
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
 	return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
 	return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
 	return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
 	__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.30.2


^ permalink raw reply related

* [PATCH v13 2/3] kasan: allow architectures to provide an outline readiness check
From: Daniel Axtens @ 2021-06-16  8:02 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kasan-dev, elver, akpm, andreyknvl
  Cc: aneesh.kumar, Aneesh Kumar K . V, linuxppc-dev, Daniel Axtens
In-Reply-To: <20210616080244.51236-1-dja@axtens.net>

Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
uninvasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Daniel Axtens <dja@axtens.net>

--

I discuss the justfication for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/      # ppc radix series
---
 mm/kasan/common.c  | 4 ++++
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 4 ++++
 mm/kasan/shadow.c  | 8 ++++++++
 4 files changed, 19 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..0ad615f3801d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,10 @@ static inline bool ____kasan_slab_free(struct kmem_cache *cache, void *object,
 	u8 tag;
 	void *tagged_object;
 
+	/* Bail if the arch isn't ready */
+	if (!kasan_arch_is_ready())
+		return false;
+
 	tag = get_tag(object);
 	tagged_object = object;
 	object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned long addr,
 						size_t size, bool write,
 						unsigned long ret_ip)
 {
+	if (!kasan_arch_is_ready())
+		return true;
+
 	if (unlikely(size == 0))
 		return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..19323a3d5975 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,10 @@ static inline void kasan_poison_last_granule(const void *address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)	{ return true; }
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..3c7f7efe6f68 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, bool init)
 {
 	void *shadow_start, *shadow_end;
 
+	/* Don't touch the shadow memory if arch isn't ready */
+	if (!kasan_arch_is_ready())
+		return;
+
 	/*
 	 * Perform shadow offset calculation based on untagged address, as
 	 * some of the callers (e.g. kasan_poison_object_data) pass tagged
@@ -99,6 +103,10 @@ EXPORT_SYMBOL(kasan_poison);
 #ifdef CONFIG_KASAN_GENERIC
 void kasan_poison_last_granule(const void *addr, size_t size)
 {
+	/* Don't touch the shadow memory if arch isn't ready */
+	if (!kasan_arch_is_ready())
+		return;
+
 	if (size & KASAN_GRANULE_MASK) {
 		u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
 		*shadow = size & KASAN_GRANULE_MASK;
-- 
2.30.2


^ permalink raw reply related

* [PATCH v13 1/3] kasan: allow an architecture to disable inline instrumentation
From: Daniel Axtens @ 2021-06-16  8:02 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kasan-dev, elver, akpm, andreyknvl
  Cc: aneesh.kumar, linuxppc-dev, Daniel Axtens
In-Reply-To: <20210616080244.51236-1-dja@axtens.net>

For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.*

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

* on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
real mode. However, before we turn on virtual memory, we parse the device
tree to determine which platform and MMU we're running under. That calls
generic DT code, which is instrumented. Inline instrumentation in DT would
unconditionally attempt to touch the shadow region, which we won't have
set up yet, and would crash. We can make outline mode wait for the arch to
be ready, but we can't change what the compiler inserts for inline mode.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 lib/Kconfig.kasan | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..cb5e02d09e11 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
 	bool
 
+config ARCH_DISABLE_KASAN_INLINE
+	bool
+	help
+	  Sometimes an architecture might not be able to support inline
+	  instrumentation but might be able to support outline instrumentation.
+	  This option allows an architecture to prevent inline and stack
+	  instrumentation from being enabled.
+
+
 config CC_HAS_KASAN_GENERIC
 	def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +139,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
 	bool "Inline instrumentation"
+	depends on !ARCH_DISABLE_KASAN_INLINE
 	help
 	  Compiler directly inserts code checking shadow memory before
 	  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +151,7 @@ endchoice
 config KASAN_STACK
 	bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && !COMPILE_TEST
 	depends on KASAN_GENERIC || KASAN_SW_TAGS
+	depends on !ARCH_DISABLE_KASAN_INLINE
 	default y if CC_IS_GCC
 	help
 	  The LLVM stack address sanitizer has a know problem that
@@ -154,6 +165,9 @@ config KASAN_STACK
 	  but clang users can still enable it for builds without
 	  CONFIG_COMPILE_TEST.	On gcc it is assumed to always be safe
 	  to use and enabled by default.
+	  If the architecture disables inline instrumentation, this is
+	  also disabled as it adds inline-style instrumentation that
+	  is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
 	bool "Enable memory corruption identification"
-- 
2.30.2


^ permalink raw reply related

* [PATCH v13 0/3] KASAN core changes for ppc64 radix KASAN
From: Daniel Axtens @ 2021-06-16  8:02 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kasan-dev, elver, akpm, andreyknvl
  Cc: aneesh.kumar, linuxppc-dev, Daniel Axtens

Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU. I've been
trying this for a while, but we keep having collisions between the
kasan code in the mm tree and the code I want to put in to the ppc
tree.

So this series just contains the kasan core changes that we
need. These can go in via the mm tree. I will then propose the powerpc
changes for a later cycle. (The most recent RFC for the powerpc
changes is in the last series at
https://lore.kernel.org/linux-mm/20210615014705.2234866-1-dja@axtens.net/
)

v13 applies to next-20210611. There should be no noticeable changes to
other platforms.

Changes since v12: respond to Marco's review comments - clean up the
help for ARCH_DISABLE_KASAN_INLINE, and add an arch readiness check to
the new granule poisioning function. Thanks Marco.

Kind regards,
Daniel

Daniel Axtens (3):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  kasan: define and use MAX_PTRS_PER_* for early shadow tables

 include/linux/kasan.h | 18 +++++++++++++++---
 lib/Kconfig.kasan     | 14 ++++++++++++++
 mm/kasan/common.c     |  4 ++++
 mm/kasan/generic.c    |  3 +++
 mm/kasan/init.c       |  6 +++---
 mm/kasan/kasan.h      |  4 ++++
 mm/kasan/shadow.c     |  8 ++++++++
 7 files changed, 51 insertions(+), 6 deletions(-)

-- 
2.30.2

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox