LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity
From: Aneesh Kumar K.V @ 2021-08-12 13:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: Nathan Lynch, Aneesh Kumar K.V, Daniel Henrique Barboza,
	David Gibson
In-Reply-To: <20210812132223.225214-1-aneesh.kumar@linux.ibm.com>

PAPR interface currently supports two different ways of communicating resource
grouping details to the OS. These are referred to as Form 0 and Form 1
associativity grouping. Form 0 is the older format and is now considered
deprecated. This patch adds another resource grouping named FORM2.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 Documentation/powerpc/associativity.rst   | 104 ++++++++++++
 arch/powerpc/include/asm/firmware.h       |   3 +-
 arch/powerpc/include/asm/prom.h           |   1 +
 arch/powerpc/kernel/prom_init.c           |   3 +-
 arch/powerpc/mm/numa.c                    | 187 ++++++++++++++++++----
 arch/powerpc/platforms/pseries/firmware.c |   1 +
 6 files changed, 262 insertions(+), 37 deletions(-)
 create mode 100644 Documentation/powerpc/associativity.rst

diff --git a/Documentation/powerpc/associativity.rst b/Documentation/powerpc/associativity.rst
new file mode 100644
index 000000000000..07e7dd3d6c87
--- /dev/null
+++ b/Documentation/powerpc/associativity.rst
@@ -0,0 +1,104 @@
+============================
+NUMA resource associativity
+=============================
+
+Associativity represents the groupings of the various platform resources into
+domains of substantially similar mean performance relative to resources outside
+of that domain. Resources subsets of a given domain that exhibit better
+performance relative to each other than relative to other resources subsets
+are represented as being members of a sub-grouping domain. This performance
+characteristic is presented in terms of NUMA node distance within the Linux kernel.
+From the platform view, these groups are also referred to as domains.
+
+PAPR interface currently supports different ways of communicating these resource
+grouping details to the OS. These are referred to as Form 0, Form 1 and Form2
+associativity grouping. Form 0 is the oldest format and is now considered deprecated.
+
+Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5 property".
+Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1.
+A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity
+bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
+
+Form 0
+-----
+Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).
+
+Form 1
+-----
+With Form 1 a combination of ibm,associativity-reference-points, and ibm,associativity
+device tree properties are used to determine the NUMA distance between resource groups/domains.
+
+The “ibm,associativity” property contains a list of one or more numbers (domainID)
+representing the resource’s platform grouping domains.
+
+The “ibm,associativity-reference-points” property contains a list of one or more numbers
+(domainID index) that represents the 1 based ordinal in the associativity lists.
+The list of domainID indexes represents an increasing hierarchy of resource grouping.
+
+ex:
+{ primary domainID index, secondary domainID index, tertiary domainID index.. }
+
+Linux kernel uses the domainID at the primary domainID index as the NUMA node id.
+Linux kernel computes NUMA distance between two domains by recursively comparing
+if they belong to the same higher-level domains. For mismatch at every higher
+level of the resource group, the kernel doubles the NUMA distance between the
+comparing domains.
+
+Form 2
+-------
+Form 2 associativity format adds separate device tree properties representing NUMA node distance
+thereby making the node distance computation flexible. Form 2 also allows flexible primary
+domain numbering. With numa distance computation now detached from the index value in
+"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain
+ids at the same domainID index representing resource groups of different performance/latency
+characteristics.
+
+Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the
+"ibm,architecture-vec-5" property.
+
+"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing
+the domainIDs present in the system. The offset of the domainID in this property is
+used as an index while computing numa distance information via "ibm,numa-distance-table".
+
+prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
+N domainID encoded as with encode-int
+
+For ex:
+"ibm,numa-lookup-index-table" =  {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when
+computing the distance of domain 8 from other domains present in the system. For the rest of
+this document, this offset will be referred to as domain distance offset.
+
+"ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA
+distance between resource groups/domains present in the system.
+
+prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
+N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
+The number N must be equal to the square of m where m is the number of domainIDs in the
+numa-lookup-index-table.
+
+For ex:
+ibm,numa-lookup-index-table = <3 0 8 40>;
+ibm,numa-distace-table = <9>, /bits/ 8 < 10  20  80
+					 20  10 160
+					 80 160  10>;
+  | 0    8   40
+--|------------
+  |
+0 | 10   20  80
+  |
+8 | 20   10  160
+  |
+40| 80   160  10
+
+A possible "ibm,associativity" property for resources in node 0, 8 and 40
+
+{ 3, 6, 7, 0 }
+{ 3, 6, 9, 8 }
+{ 3, 6, 7, 40}
+
+With "ibm,associativity-reference-points"  { 0x3 }
+
+"ibm,lookup-index-table" helps in having a compact representation of distance matrix.
+Since domainID can be sparse, the matrix of distances can also be effectively sparse.
+With "ibm,lookup-index-table" we can achieve a compact representation of
+distance information.
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 60b631161360..97a3bd9ffeb9 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -53,6 +53,7 @@
 #define FW_FEATURE_ULTRAVISOR	ASM_CONST(0x0000004000000000)
 #define FW_FEATURE_STUFF_TCE	ASM_CONST(0x0000008000000000)
 #define FW_FEATURE_RPT_INVALIDATE ASM_CONST(0x0000010000000000)
+#define FW_FEATURE_FORM2_AFFINITY ASM_CONST(0x0000020000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -73,7 +74,7 @@ enum {
 		FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
 		FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
 		FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
-		FW_FEATURE_RPT_INVALIDATE,
+		FW_FEATURE_RPT_INVALIDATE | FW_FEATURE_FORM2_AFFINITY,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index df9fec9d232c..5c80152e8f18 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -149,6 +149,7 @@ extern int of_read_drc_info_cell(struct property **prop,
 #define OV5_XCMO		0x0440	/* Page Coalescing */
 #define OV5_FORM1_AFFINITY	0x0580	/* FORM1 NUMA affinity */
 #define OV5_PRRN		0x0540	/* Platform Resource Reassignment */
+#define OV5_FORM2_AFFINITY	0x0520	/* Form2 NUMA affinity */
 #define OV5_HP_EVT		0x0604	/* Hot Plug Event support */
 #define OV5_RESIZE_HPT		0x0601	/* Hash Page Table resizing */
 #define OV5_PFO_HW_RNG		0x1180	/* PFO Random Number Generator */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 57db605ad33a..95a42d49e291 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1096,7 +1096,8 @@ static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = {
 #else
 		0,
 #endif
-		.associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN),
+		.associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN) |
+		OV5_FEAT(OV5_FORM2_AFFINITY),
 		.bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT),
 		.micro_checkpoint = 0,
 		.reserved0 = 0,
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index fdb2472befa4..32acef4da845 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -56,12 +56,17 @@ static int n_mem_addr_cells, n_mem_size_cells;
 
 #define FORM0_AFFINITY 0
 #define FORM1_AFFINITY 1
+#define FORM2_AFFINITY 2
 static int affinity_form;
 
 #define MAX_DISTANCE_REF_POINTS 4
 static int distance_ref_points_depth;
 static const __be32 *distance_ref_points;
 static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
+static int numa_distance_table[MAX_NUMNODES][MAX_NUMNODES] = {
+	[0 ... MAX_NUMNODES - 1] = { [0 ... MAX_NUMNODES - 1] = -1 }
+};
+static int numa_id_index_table[MAX_NUMNODES] = { [0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE };
 
 /*
  * Allocate node_to_cpumask_map based on number of available nodes
@@ -166,6 +171,54 @@ static void unmap_cpu_from_node(unsigned long cpu)
 }
 #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */
 
+static int __associativity_to_nid(const __be32 *associativity,
+				  int max_array_sz)
+{
+	int nid;
+	/*
+	 * primary_domain_index is 1 based array index.
+	 */
+	int index = primary_domain_index  - 1;
+
+	if (!numa_enabled || index >= max_array_sz)
+		return NUMA_NO_NODE;
+
+	nid = of_read_number(&associativity[index], 1);
+
+	/* POWER4 LPAR uses 0xffff as invalid node */
+	if (nid == 0xffff || nid >= nr_node_ids)
+		nid = NUMA_NO_NODE;
+	return nid;
+}
+/*
+ * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
+ * info is found.
+ */
+static int associativity_to_nid(const __be32 *associativity)
+{
+	int array_sz = of_read_number(associativity, 1);
+
+	/* Skip the first element in the associativity array */
+	return __associativity_to_nid((associativity + 1), array_sz);
+}
+
+static int __cpu_form2_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
+{
+	int dist;
+	int node1, node2;
+
+	node1 = associativity_to_nid(cpu1_assoc);
+	node2 = associativity_to_nid(cpu2_assoc);
+
+	dist = numa_distance_table[node1][node2];
+	if (dist <= LOCAL_DISTANCE)
+		return 0;
+	else if (dist <= REMOTE_DISTANCE)
+		return 1;
+	else
+		return 2;
+}
+
 static int __cpu_form1_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 {
 	int dist = 0;
@@ -186,8 +239,9 @@ int cpu_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 {
 	/* We should not get called with FORM0 */
 	VM_WARN_ON(affinity_form == FORM0_AFFINITY);
-
-	return __cpu_form1_relative_distance(cpu1_assoc, cpu2_assoc);
+	if (affinity_form == FORM1_AFFINITY)
+		return __cpu_form1_relative_distance(cpu1_assoc, cpu2_assoc);
+	return __cpu_form2_relative_distance(cpu1_assoc, cpu2_assoc);
 }
 
 /* must hold reference to node during call */
@@ -201,7 +255,9 @@ int __node_distance(int a, int b)
 	int i;
 	int distance = LOCAL_DISTANCE;
 
-	if (affinity_form == FORM0_AFFINITY)
+	if (affinity_form == FORM2_AFFINITY)
+		return numa_distance_table[a][b];
+	else if (affinity_form == FORM0_AFFINITY)
 		return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
 
 	for (i = 0; i < distance_ref_points_depth; i++) {
@@ -216,37 +272,6 @@ int __node_distance(int a, int b)
 }
 EXPORT_SYMBOL(__node_distance);
 
-static int __associativity_to_nid(const __be32 *associativity,
-				  int max_array_sz)
-{
-	int nid;
-	/*
-	 * primary_domain_index is 1 based array index.
-	 */
-	int index = primary_domain_index  - 1;
-
-	if (!numa_enabled || index >= max_array_sz)
-		return NUMA_NO_NODE;
-
-	nid = of_read_number(&associativity[index], 1);
-
-	/* POWER4 LPAR uses 0xffff as invalid node */
-	if (nid == 0xffff || nid >= nr_node_ids)
-		nid = NUMA_NO_NODE;
-	return nid;
-}
-/*
- * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
- * info is found.
- */
-static int associativity_to_nid(const __be32 *associativity)
-{
-	int array_sz = of_read_number(associativity, 1);
-
-	/* Skip the first element in the associativity array */
-	return __associativity_to_nid((associativity + 1), array_sz);
-}
-
 /* Returns the nid associated with the given device tree node,
  * or -1 if not found.
  */
@@ -320,6 +345,8 @@ static void initialize_form1_numa_distance(const __be32 *associativity)
  */
 void update_numa_distance(struct device_node *node)
 {
+	int nid;
+
 	if (affinity_form == FORM0_AFFINITY)
 		return;
 	else if (affinity_form == FORM1_AFFINITY) {
@@ -332,6 +359,84 @@ void update_numa_distance(struct device_node *node)
 		initialize_form1_numa_distance(associativity);
 		return;
 	}
+
+	/* FORM2 affinity  */
+	nid = of_node_to_nid_single(node);
+	if (nid == NUMA_NO_NODE)
+		return;
+
+	/*
+	 * With FORM2 we expect NUMA distance of all possible NUMA
+	 * nodes to be provided during boot.
+	 */
+	WARN(numa_distance_table[nid][nid] == -1,
+	     "NUMA distance details for node %d not provided\n", nid);
+}
+
+/*
+ * ibm,numa-lookup-index-table= {N, domainid1, domainid2, ..... domainidN}
+ * ibm,numa-distance-table = { N, 1, 2, 4, 5, 1, 6, .... N elements}
+ */
+static void initialize_form2_numa_distance_lookup_table(void)
+{
+	int i, j;
+	struct device_node *root;
+	const __u8 *numa_dist_table;
+	const __be32 *numa_lookup_index;
+	int numa_dist_table_length;
+	int max_numa_index, distance_index;
+
+	if (firmware_has_feature(FW_FEATURE_OPAL))
+		root = of_find_node_by_path("/ibm,opal");
+	else
+		root = of_find_node_by_path("/rtas");
+	if (!root)
+		root = of_find_node_by_path("/");
+
+	numa_lookup_index = of_get_property(root, "ibm,numa-lookup-index-table", NULL);
+	max_numa_index = of_read_number(&numa_lookup_index[0], 1);
+
+	/* first element of the array is the size and is encode-int */
+	numa_dist_table = of_get_property(root, "ibm,numa-distance-table", NULL);
+	numa_dist_table_length = of_read_number((const __be32 *)&numa_dist_table[0], 1);
+	/* Skip the size which is encoded int */
+	numa_dist_table += sizeof(__be32);
+
+	pr_debug("numa_dist_table_len = %d, numa_dist_indexes_len = %d\n",
+		 numa_dist_table_length, max_numa_index);
+
+	for (i = 0; i < max_numa_index; i++)
+		/* +1 skip the max_numa_index in the property */
+		numa_id_index_table[i] = of_read_number(&numa_lookup_index[i + 1], 1);
+
+
+	if (numa_dist_table_length != max_numa_index * max_numa_index) {
+		WARN(1, "Wrong NUMA distance information\n");
+		/* consider everybody else just remote. */
+		for (i = 0;  i < max_numa_index; i++) {
+			for (j = 0; j < max_numa_index; j++) {
+				int nodeA = numa_id_index_table[i];
+				int nodeB = numa_id_index_table[j];
+
+				if (nodeA == nodeB)
+					numa_distance_table[nodeA][nodeB] = LOCAL_DISTANCE;
+				else
+					numa_distance_table[nodeA][nodeB] = REMOTE_DISTANCE;
+			}
+		}
+	}
+
+	distance_index = 0;
+	for (i = 0;  i < max_numa_index; i++) {
+		for (j = 0; j < max_numa_index; j++) {
+			int nodeA = numa_id_index_table[i];
+			int nodeB = numa_id_index_table[j];
+
+			numa_distance_table[nodeA][nodeB] = numa_dist_table[distance_index++];
+			pr_debug("dist[%d][%d]=%d ", nodeA, nodeB, numa_distance_table[nodeA][nodeB]);
+		}
+	}
+	of_node_put(root);
 }
 
 static int __init find_primary_domain_index(void)
@@ -344,6 +449,9 @@ static int __init find_primary_domain_index(void)
 	 */
 	if (firmware_has_feature(FW_FEATURE_OPAL)) {
 		affinity_form = FORM1_AFFINITY;
+	} else if (firmware_has_feature(FW_FEATURE_FORM2_AFFINITY)) {
+		dbg("Using form 2 affinity\n");
+		affinity_form = FORM2_AFFINITY;
 	} else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) {
 		dbg("Using form 1 affinity\n");
 		affinity_form = FORM1_AFFINITY;
@@ -388,9 +496,12 @@ static int __init find_primary_domain_index(void)
 
 		index = of_read_number(&distance_ref_points[1], 1);
 	} else {
+		/*
+		 * Both FORM1 and FORM2 affinity find the primary domain details
+		 * at the same offset.
+		 */
 		index = of_read_number(distance_ref_points, 1);
 	}
-
 	/*
 	 * Warn and cap if the hardware supports more than
 	 * MAX_DISTANCE_REF_POINTS domains.
@@ -819,6 +930,12 @@ static int __init parse_numa_properties(void)
 
 	dbg("NUMA associativity depth for CPU/Memory: %d\n", primary_domain_index);
 
+	/*
+	 * If it is FORM2 initialize the distance table here.
+	 */
+	if (affinity_form == FORM2_AFFINITY)
+		initialize_form2_numa_distance_lookup_table();
+
 	/*
 	 * Even though we connect cpus to numa domains later in SMP
 	 * init, we need to know the node ids now. This is because
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 5d4c2bc20bba..f162156b7b68 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -123,6 +123,7 @@ vec5_fw_features_table[] = {
 	{FW_FEATURE_PRRN,		OV5_PRRN},
 	{FW_FEATURE_DRMEM_V2,		OV5_DRMEM_V2},
 	{FW_FEATURE_DRC_INFO,		OV5_DRC_INFO},
+	{FW_FEATURE_FORM2_AFFINITY,	OV5_FORM2_AFFINITY},
 };
 
 static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 2/2] powerpc: rename powerpc_debugfs_root to arch_debugfs_dir
From: Aneesh Kumar K.V @ 2021-08-12 13:28 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V
In-Reply-To: <20210812132831.233794-1-aneesh.kumar@linux.ibm.com>

No functional change in this patch. arch_debugfs_dir is the generic kernel
name declared in linux/debugfs.h for arch-specific debugfs directory.
Architectures like x86/s390 already use the name. Rename powerpc
specific powerpc_debugfs_root to arch_debugfs_dir.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/debugfs.h          | 13 -------------
 arch/powerpc/kernel/Makefile                |  3 ++-
 arch/powerpc/kernel/dawr.c                  |  3 +--
 arch/powerpc/kernel/eeh.c                   | 16 ++++++++--------
 arch/powerpc/kernel/eeh_cache.c             |  4 ++--
 arch/powerpc/kernel/fadump.c                |  4 ++--
 arch/powerpc/kernel/hw_breakpoint.c         |  1 -
 arch/powerpc/kernel/kdebugfs.c              | 14 ++++++++++++++
 arch/powerpc/kernel/security.c              | 16 ++++++++--------
 arch/powerpc/kernel/setup-common.c          | 13 -------------
 arch/powerpc/kernel/setup_64.c              |  1 -
 arch/powerpc/kernel/traps.c                 |  4 ++--
 arch/powerpc/kvm/book3s_xics.c              |  6 +++---
 arch/powerpc/kvm/book3s_xive.c              |  3 +--
 arch/powerpc/kvm/book3s_xive_native.c       |  3 +--
 arch/powerpc/mm/book3s64/hash_utils.c       |  4 ++--
 arch/powerpc/mm/book3s64/pgtable.c          |  4 ++--
 arch/powerpc/mm/book3s64/radix_tlb.c        |  6 +++---
 arch/powerpc/mm/ptdump/bats.c               |  4 ++--
 arch/powerpc/mm/ptdump/segment_regs.c       |  4 ++--
 arch/powerpc/platforms/cell/axon_msi.c      |  4 ++--
 arch/powerpc/platforms/powernv/memtrace.c   |  3 +--
 arch/powerpc/platforms/powernv/opal-imc.c   |  4 ++--
 arch/powerpc/platforms/powernv/opal-lpc.c   |  4 ++--
 arch/powerpc/platforms/powernv/opal-xscom.c |  4 ++--
 arch/powerpc/platforms/powernv/pci-ioda.c   |  4 ++--
 arch/powerpc/platforms/pseries/dtl.c        |  4 ++--
 arch/powerpc/platforms/pseries/lpar.c       |  5 +++--
 arch/powerpc/sysdev/xive/common.c           |  3 +--
 arch/powerpc/xmon/xmon.c                    |  6 +++---
 30 files changed, 75 insertions(+), 92 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/debugfs.h
 create mode 100644 arch/powerpc/kernel/kdebugfs.c

diff --git a/arch/powerpc/include/asm/debugfs.h b/arch/powerpc/include/asm/debugfs.h
deleted file mode 100644
index 2c5c48571d75..000000000000
--- a/arch/powerpc/include/asm/debugfs.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-#ifndef _ASM_POWERPC_DEBUGFS_H
-#define _ASM_POWERPC_DEBUGFS_H
-
-/*
- * Copyright 2017, Michael Ellerman, IBM Corporation.
- */
-
-#include <linux/debugfs.h>
-
-extern struct dentry *powerpc_debugfs_root;
-
-#endif /* _ASM_POWERPC_DEBUGFS_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index f66b63e81c3b..7be36c1e1db6 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -46,7 +46,8 @@ obj-y				:= cputable.o syscalls.o \
 				   prom.o traps.o setup-common.o \
 				   udbg.o misc.o io.o misc_$(BITS).o \
 				   of_platform.o prom_parse.o firmware.o \
-				   hw_breakpoint_constraints.o interrupt.o
+				   hw_breakpoint_constraints.o interrupt.o \
+				   kdebugfs.o
 obj-y				+= ptrace/
 obj-$(CONFIG_PPC64)		+= setup_64.o \
 				   paca.o nvram_64.o note.o
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index cdc2dccb987d..64e423d2fe0f 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -9,7 +9,6 @@
 #include <linux/export.h>
 #include <linux/fs.h>
 #include <linux/debugfs.h>
-#include <asm/debugfs.h>
 #include <asm/machdep.h>
 #include <asm/hvcall.h>
 
@@ -101,7 +100,7 @@ static int __init dawr_force_setup(void)
 	if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9) {
 		/* Turn DAWR off by default, but allow admin to turn it on */
 		debugfs_create_file_unsafe("dawr_enable_dangerous", 0600,
-					   powerpc_debugfs_root,
+					   arch_debugfs_dir,
 					   &dawr_force_enable,
 					   &dawr_enable_fops);
 	}
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3bbdcc86d01b..e9b597ed423c 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -21,9 +21,9 @@
 #include <linux/spinlock.h>
 #include <linux/export.h>
 #include <linux/of.h>
+#include <linux/debugfs.h>
 
 #include <linux/atomic.h>
-#include <asm/debugfs.h>
 #include <asm/eeh.h>
 #include <asm/eeh_event.h>
 #include <asm/io.h>
@@ -1901,24 +1901,24 @@ static int __init eeh_init_proc(void)
 		proc_create_single("powerpc/eeh", 0, NULL, proc_eeh_show);
 #ifdef CONFIG_DEBUG_FS
 		debugfs_create_file_unsafe("eeh_enable", 0600,
-					   powerpc_debugfs_root, NULL,
+					   arch_debugfs_dir, NULL,
 					   &eeh_enable_dbgfs_ops);
 		debugfs_create_u32("eeh_max_freezes", 0600,
-				powerpc_debugfs_root, &eeh_max_freezes);
+				arch_debugfs_dir, &eeh_max_freezes);
 		debugfs_create_bool("eeh_disable_recovery", 0600,
-				powerpc_debugfs_root,
+				arch_debugfs_dir,
 				&eeh_debugfs_no_recover);
 		debugfs_create_file_unsafe("eeh_dev_check", 0600,
-				powerpc_debugfs_root, NULL,
+				arch_debugfs_dir, NULL,
 				&eeh_dev_check_fops);
 		debugfs_create_file_unsafe("eeh_dev_break", 0600,
-				powerpc_debugfs_root, NULL,
+				arch_debugfs_dir, NULL,
 				&eeh_dev_break_fops);
 		debugfs_create_file_unsafe("eeh_force_recover", 0600,
-				powerpc_debugfs_root, NULL,
+				arch_debugfs_dir, NULL,
 				&eeh_force_recover_fops);
 		debugfs_create_file_unsafe("eeh_dev_can_recover", 0600,
-				powerpc_debugfs_root, NULL,
+				arch_debugfs_dir, NULL,
 				&eeh_dev_can_recover_fops);
 		eeh_cache_debugfs_init();
 #endif
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index bf3270426d82..9bdaaf7fddc9 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -12,8 +12,8 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
+#include <linux/debugfs.h>
 #include <asm/pci-bridge.h>
-#include <asm/debugfs.h>
 #include <asm/ppc-pci.h>
 
 
@@ -283,6 +283,6 @@ DEFINE_SHOW_ATTRIBUTE(eeh_addr_cache);
 void eeh_cache_debugfs_init(void)
 {
 	debugfs_create_file_unsafe("eeh_address_cache", 0400,
-			powerpc_debugfs_root, NULL,
+			arch_debugfs_dir, NULL,
 			&eeh_addr_cache_fops);
 }
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b990075285f5..b7ceb041743c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -24,8 +24,8 @@
 #include <linux/slab.h>
 #include <linux/cma.h>
 #include <linux/hugetlb.h>
+#include <linux/debugfs.h>
 
-#include <asm/debugfs.h>
 #include <asm/page.h>
 #include <asm/prom.h>
 #include <asm/fadump.h>
@@ -1557,7 +1557,7 @@ static void fadump_init_files(void)
 		return;
 	}
 
-	debugfs_create_file("fadump_region", 0444, powerpc_debugfs_root, NULL,
+	debugfs_create_file("fadump_region", 0444, arch_debugfs_dir, NULL,
 			    &fadump_region_fops);
 
 	if (fw_dump.dump_active) {
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 21a638aff72f..91a3be14808b 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -22,7 +22,6 @@
 #include <asm/processor.h>
 #include <asm/sstep.h>
 #include <asm/debug.h>
-#include <asm/debugfs.h>
 #include <asm/hvcall.h>
 #include <asm/inst.h>
 #include <linux/uaccess.h>
diff --git a/arch/powerpc/kernel/kdebugfs.c b/arch/powerpc/kernel/kdebugfs.c
new file mode 100644
index 000000000000..36d3124d5a8b
--- /dev/null
+++ b/arch/powerpc/kernel/kdebugfs.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/debugfs.h>
+#include <linux/export.h>
+#include <linux/init.h>
+
+struct dentry *arch_debugfs_dir;
+EXPORT_SYMBOL(arch_debugfs_dir);
+
+static int __init arch_kdebugfs_init(void)
+{
+	arch_debugfs_dir = debugfs_create_dir("powerpc", NULL);
+	return 0;
+}
+arch_initcall(arch_kdebugfs_init);
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index cc51fa52e783..1a998490fe60 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -11,10 +11,10 @@
 #include <linux/nospec.h>
 #include <linux/prctl.h>
 #include <linux/seq_buf.h>
+#include <linux/debugfs.h>
 
 #include <asm/asm-prototypes.h>
 #include <asm/code-patching.h>
-#include <asm/debugfs.h>
 #include <asm/security_features.h>
 #include <asm/setup.h>
 #include <asm/inst.h>
@@ -106,7 +106,7 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_barrier_nospec, barrier_nospec_get,
 static __init int barrier_nospec_debugfs_init(void)
 {
 	debugfs_create_file_unsafe("barrier_nospec", 0600,
-				   powerpc_debugfs_root, NULL,
+				   arch_debugfs_dir, NULL,
 				   &fops_barrier_nospec);
 	return 0;
 }
@@ -114,7 +114,7 @@ device_initcall(barrier_nospec_debugfs_init);
 
 static __init int security_feature_debugfs_init(void)
 {
-	debugfs_create_x64("security_features", 0400, powerpc_debugfs_root,
+	debugfs_create_x64("security_features", 0400, arch_debugfs_dir,
 			   &powerpc_security_features);
 	return 0;
 }
@@ -420,7 +420,7 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_stf_barrier, stf_barrier_get, stf_barrier_set,
 
 static __init int stf_barrier_debugfs_init(void)
 {
-	debugfs_create_file_unsafe("stf_barrier", 0600, powerpc_debugfs_root,
+	debugfs_create_file_unsafe("stf_barrier", 0600, arch_debugfs_dir,
 				   NULL, &fops_stf_barrier);
 	return 0;
 }
@@ -748,7 +748,7 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_count_cache_flush, count_cache_flush_get,
 static __init int count_cache_flush_debugfs_init(void)
 {
 	debugfs_create_file_unsafe("count_cache_flush", 0600,
-				   powerpc_debugfs_root, NULL,
+				   arch_debugfs_dir, NULL,
 				   &fops_count_cache_flush);
 	return 0;
 }
@@ -834,9 +834,9 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_uaccess_flush, uaccess_flush_get, uaccess_flush_set
 
 static __init int rfi_flush_debugfs_init(void)
 {
-	debugfs_create_file("rfi_flush", 0600, powerpc_debugfs_root, NULL, &fops_rfi_flush);
-	debugfs_create_file("entry_flush", 0600, powerpc_debugfs_root, NULL, &fops_entry_flush);
-	debugfs_create_file("uaccess_flush", 0600, powerpc_debugfs_root, NULL, &fops_uaccess_flush);
+	debugfs_create_file("rfi_flush", 0600, arch_debugfs_dir, NULL, &fops_rfi_flush);
+	debugfs_create_file("entry_flush", 0600, arch_debugfs_dir, NULL, &fops_entry_flush);
+	debugfs_create_file("uaccess_flush", 0600, arch_debugfs_dir, NULL, &fops_uaccess_flush);
 	return 0;
 }
 device_initcall(rfi_flush_debugfs_init);
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index aa9c2d01424a..b1e43b69a559 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -33,7 +33,6 @@
 #include <linux/of_platform.h>
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>
-#include <asm/debugfs.h>
 #include <asm/io.h>
 #include <asm/paca.h>
 #include <asm/prom.h>
@@ -773,18 +772,6 @@ static int __init check_cache_coherency(void)
 late_initcall(check_cache_coherency);
 #endif /* CONFIG_CHECK_CACHE_COHERENCY */
 
-#ifdef CONFIG_DEBUG_FS
-struct dentry *powerpc_debugfs_root;
-EXPORT_SYMBOL(powerpc_debugfs_root);
-
-static int powerpc_debugfs_init(void)
-{
-	powerpc_debugfs_root = debugfs_create_dir("powerpc", NULL);
-	return 0;
-}
-arch_initcall(powerpc_debugfs_init);
-#endif
-
 void ppc_printk_progress(char *s, unsigned short hex)
 {
 	pr_info("%s\n", s);
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 1ff258f6c76c..eaa79a0996d1 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -32,7 +32,6 @@
 #include <linux/nmi.h>
 #include <linux/pgtable.h>
 
-#include <asm/debugfs.h>
 #include <asm/kvm_guest.h>
 #include <asm/io.h>
 #include <asm/kdump.h>
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index dfbce527c98e..c8f648727d36 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -37,10 +37,10 @@
 #include <linux/smp.h>
 #include <linux/console.h>
 #include <linux/kmsg_dump.h>
+#include <linux/debugfs.h>
 
 #include <asm/emulated_ops.h>
 #include <linux/uaccess.h>
-#include <asm/debugfs.h>
 #include <asm/interrupt.h>
 #include <asm/io.h>
 #include <asm/machdep.h>
@@ -2267,7 +2267,7 @@ static int __init ppc_warn_emulated_init(void)
 	struct ppc_emulated_entry *entries = (void *)&ppc_emulated;
 
 	dir = debugfs_create_dir("emulated_instructions",
-				 powerpc_debugfs_root);
+				 arch_debugfs_dir);
 
 	debugfs_create_u32("do_warn", 0644, dir, &ppc_warn_emulated);
 
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 303e3cb096db..ebd5d920de8c 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -10,13 +10,13 @@
 #include <linux/gfp.h>
 #include <linux/anon_inodes.h>
 #include <linux/spinlock.h>
-
+#include <linux/debugfs.h>
 #include <linux/uaccess.h>
+
 #include <asm/kvm_book3s.h>
 #include <asm/kvm_ppc.h>
 #include <asm/hvcall.h>
 #include <asm/xics.h>
-#include <asm/debugfs.h>
 #include <asm/time.h>
 
 #include <linux/seq_file.h>
@@ -1024,7 +1024,7 @@ static void xics_debugfs_init(struct kvmppc_xics *xics)
 		return;
 	}
 
-	xics->dentry = debugfs_create_file(name, 0444, powerpc_debugfs_root,
+	xics->dentry = debugfs_create_file(name, 0444, arch_debugfs_dir,
 					   xics, &xics_debug_fops);
 
 	pr_debug("%s: created %s\n", __func__, name);
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 8cfab3547494..19adbb1dda04 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -22,7 +22,6 @@
 #include <asm/xive.h>
 #include <asm/xive-regs.h>
 #include <asm/debug.h>
-#include <asm/debugfs.h>
 #include <asm/time.h>
 #include <asm/opal.h>
 
@@ -2329,7 +2328,7 @@ static void xive_debugfs_init(struct kvmppc_xive *xive)
 		return;
 	}
 
-	xive->dentry = debugfs_create_file(name, S_IRUGO, powerpc_debugfs_root,
+	xive->dentry = debugfs_create_file(name, S_IRUGO, arch_debugfs_dir,
 					   xive, &xive_debug_fops);
 
 	pr_debug("%s: created %s\n", __func__, name);
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 573ecaab3597..4ce800c7454f 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -20,7 +20,6 @@
 #include <asm/xive.h>
 #include <asm/xive-regs.h>
 #include <asm/debug.h>
-#include <asm/debugfs.h>
 #include <asm/opal.h>
 
 #include <linux/debugfs.h>
@@ -1257,7 +1256,7 @@ static void xive_native_debugfs_init(struct kvmppc_xive *xive)
 		return;
 	}
 
-	xive->dentry = debugfs_create_file(name, 0444, powerpc_debugfs_root,
+	xive->dentry = debugfs_create_file(name, 0444, arch_debugfs_dir,
 					   xive, &xive_native_debug_fops);
 
 	pr_debug("%s: created %s\n", __func__, name);
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index ac5720371c0d..c145776d3ae5 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -36,8 +36,8 @@
 #include <linux/hugetlb.h>
 #include <linux/cpu.h>
 #include <linux/pgtable.h>
+#include <linux/debugfs.h>
 
-#include <asm/debugfs.h>
 #include <asm/interrupt.h>
 #include <asm/processor.h>
 #include <asm/mmu.h>
@@ -2072,7 +2072,7 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n")
 
 static int __init hash64_debugfs(void)
 {
-	debugfs_create_file("hpt_order", 0600, powerpc_debugfs_root, NULL,
+	debugfs_create_file("hpt_order", 0600, arch_debugfs_dir, NULL,
 			    &fops_hpt_order);
 	return 0;
 }
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 9ffa65074cb0..7011a8f16d7a 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -6,9 +6,9 @@
 #include <linux/sched.h>
 #include <linux/mm_types.h>
 #include <linux/memblock.h>
+#include <linux/debugfs.h>
 #include <misc/cxl-base.h>
 
-#include <asm/debugfs.h>
 #include <asm/pgalloc.h>
 #include <asm/tlb.h>
 #include <asm/trace.h>
@@ -520,7 +520,7 @@ static int __init pgtable_debugfs_setup(void)
 	 * invalidated as expected.
 	 */
 	debugfs_create_bool("tlbie_enabled", 0600,
-			powerpc_debugfs_root,
+			arch_debugfs_dir,
 			&tlbie_enabled);
 
 	return 0;
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 1fa2bc6a969e..7724af19ed7e 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -10,6 +10,7 @@
 #include <linux/memblock.h>
 #include <linux/mmu_context.h>
 #include <linux/sched/mm.h>
+#include <linux/debugfs.h>
 
 #include <asm/ppc-opcode.h>
 #include <asm/tlb.h>
@@ -17,7 +18,6 @@
 #include <asm/trace.h>
 #include <asm/cputhreads.h>
 #include <asm/plpar_wrappers.h>
-#include <asm/debugfs.h>
 
 #include "internal.h"
 
@@ -1529,9 +1529,9 @@ EXPORT_SYMBOL_GPL(do_h_rpt_invalidate_prt);
 static int __init create_tlb_single_page_flush_ceiling(void)
 {
 	debugfs_create_u32("tlb_single_page_flush_ceiling", 0600,
-			   powerpc_debugfs_root, &tlb_single_page_flush_ceiling);
+			   arch_debugfs_dir, &tlb_single_page_flush_ceiling);
 	debugfs_create_u32("tlb_local_single_page_flush_ceiling", 0600,
-			   powerpc_debugfs_root, &tlb_local_single_page_flush_ceiling);
+			   arch_debugfs_dir, &tlb_local_single_page_flush_ceiling);
 	return 0;
 }
 late_initcall(create_tlb_single_page_flush_ceiling);
diff --git a/arch/powerpc/mm/ptdump/bats.c b/arch/powerpc/mm/ptdump/bats.c
index c4c628b03cf8..8bf7383fb26c 100644
--- a/arch/powerpc/mm/ptdump/bats.c
+++ b/arch/powerpc/mm/ptdump/bats.c
@@ -7,7 +7,7 @@
  */
 
 #include <linux/pgtable.h>
-#include <asm/debugfs.h>
+#include <linux/debugfs.h>
 #include <asm/cpu_has_feature.h>
 
 #include "ptdump.h"
@@ -103,7 +103,7 @@ static const struct file_operations bats_fops = {
 static int __init bats_init(void)
 {
 	debugfs_create_file("block_address_translation", 0400,
-			    powerpc_debugfs_root, NULL, &bats_fops);
+			    arch_debugfs_dir, NULL, &bats_fops);
 	return 0;
 }
 device_initcall(bats_init);
diff --git a/arch/powerpc/mm/ptdump/segment_regs.c b/arch/powerpc/mm/ptdump/segment_regs.c
index 565048a0c9be..9223dfb85c51 100644
--- a/arch/powerpc/mm/ptdump/segment_regs.c
+++ b/arch/powerpc/mm/ptdump/segment_regs.c
@@ -6,7 +6,7 @@
  * This dumps the content of Segment Registers
  */
 
-#include <asm/debugfs.h>
+#include <linux/debugfs.h>
 
 static void seg_show(struct seq_file *m, int i)
 {
@@ -55,7 +55,7 @@ static const struct file_operations sr_fops = {
 
 static int __init sr_init(void)
 {
-	debugfs_create_file("segment_registers", 0400, powerpc_debugfs_root,
+	debugfs_create_file("segment_registers", 0400, arch_debugfs_dir,
 			    NULL, &sr_fops);
 	return 0;
 }
diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c
index ca2555b8a0c2..82335e364c44 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -12,8 +12,8 @@
 #include <linux/export.h>
 #include <linux/of_platform.h>
 #include <linux/slab.h>
+#include <linux/debugfs.h>
 
-#include <asm/debugfs.h>
 #include <asm/dcr.h>
 #include <asm/machdep.h>
 #include <asm/prom.h>
@@ -480,6 +480,6 @@ void axon_msi_debug_setup(struct device_node *dn, struct axon_msic *msic)
 
 	snprintf(name, sizeof(name), "msic_%d", of_node_to_nid(dn));
 
-	debugfs_create_file(name, 0600, powerpc_debugfs_root, msic, &fops_msic);
+	debugfs_create_file(name, 0600, arch_debugfs_dir, msic, &fops_msic);
 }
 #endif /* DEBUG */
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 537a4daed614..877720c64515 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -18,7 +18,6 @@
 #include <linux/memory_hotplug.h>
 #include <linux/numa.h>
 #include <asm/machdep.h>
-#include <asm/debugfs.h>
 #include <asm/cacheflush.h>
 
 /* This enables us to keep track of the memory removed from each node. */
@@ -330,7 +329,7 @@ DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
 static int memtrace_init(void)
 {
 	memtrace_debugfs_dir = debugfs_create_dir("memtrace",
-						  powerpc_debugfs_root);
+						  arch_debugfs_dir);
 
 	debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
 			    NULL, &memtrace_init_fops);
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 7824cc364bc4..63e4738baf33 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -13,11 +13,11 @@
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
 #include <linux/crash_dump.h>
+#include <linux/debugfs.h>
 #include <asm/opal.h>
 #include <asm/io.h>
 #include <asm/imc-pmu.h>
 #include <asm/cputhreads.h>
-#include <asm/debugfs.h>
 
 static struct dentry *imc_debugfs_parent;
 
@@ -56,7 +56,7 @@ static void export_imc_mode_and_cmd(struct device_node *node,
 	u32 cb_offset;
 	struct imc_mem_info *ptr = pmu_ptr->mem_info;
 
-	imc_debugfs_parent = debugfs_create_dir("imc", powerpc_debugfs_root);
+	imc_debugfs_parent = debugfs_create_dir("imc", arch_debugfs_dir);
 
 	if (of_property_read_u32(node, "cb_offset", &cb_offset))
 		cb_offset = IMC_CNTL_BLK_OFFSET;
diff --git a/arch/powerpc/platforms/powernv/opal-lpc.c b/arch/powerpc/platforms/powernv/opal-lpc.c
index 608569082ba0..1e5d51db40f8 100644
--- a/arch/powerpc/platforms/powernv/opal-lpc.c
+++ b/arch/powerpc/platforms/powernv/opal-lpc.c
@@ -10,13 +10,13 @@
 #include <linux/bug.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/debugfs.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 #include <asm/opal.h>
 #include <asm/prom.h>
 #include <linux/uaccess.h>
-#include <asm/debugfs.h>
 #include <asm/isa-bridge.h>
 
 static int opal_lpc_chip_id = -1;
@@ -371,7 +371,7 @@ static int opal_lpc_init_debugfs(void)
 	if (opal_lpc_chip_id < 0)
 		return -ENODEV;
 
-	root = debugfs_create_dir("lpc", powerpc_debugfs_root);
+	root = debugfs_create_dir("lpc", arch_debugfs_dir);
 
 	rc |= opal_lpc_debugfs_create_type(root, "io", OPAL_LPC_IO);
 	rc |= opal_lpc_debugfs_create_type(root, "mem", OPAL_LPC_MEM);
diff --git a/arch/powerpc/platforms/powernv/opal-xscom.c b/arch/powerpc/platforms/powernv/opal-xscom.c
index fd510d961b8c..6b4eed2ef4fa 100644
--- a/arch/powerpc/platforms/powernv/opal-xscom.c
+++ b/arch/powerpc/platforms/powernv/opal-xscom.c
@@ -14,11 +14,11 @@
 #include <linux/gfp.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/debugfs.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 #include <asm/opal.h>
-#include <asm/debugfs.h>
 #include <asm/prom.h>
 
 static u64 opal_scom_unmangle(u64 addr)
@@ -189,7 +189,7 @@ static int scom_debug_init(void)
 	if (!firmware_has_feature(FW_FEATURE_OPAL))
 		return 0;
 
-	root = debugfs_create_dir("scom", powerpc_debugfs_root);
+	root = debugfs_create_dir("scom", arch_debugfs_dir);
 	if (!root)
 		return -1;
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7de464679292..eb4897f499ab 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -20,6 +20,7 @@
 #include <linux/iommu.h>
 #include <linux/rculist.h>
 #include <linux/sizes.h>
+#include <linux/debugfs.h>
 
 #include <asm/sections.h>
 #include <asm/io.h>
@@ -32,7 +33,6 @@
 #include <asm/iommu.h>
 #include <asm/tce.h>
 #include <asm/xics.h>
-#include <asm/debugfs.h>
 #include <asm/firmware.h>
 #include <asm/pnv-pci.h>
 #include <asm/mmzone.h>
@@ -2259,7 +2259,7 @@ static void pnv_pci_ioda_create_dbgfs(void)
 		phb = hose->private_data;
 
 		sprintf(name, "PCI%04x", hose->global_number);
-		phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
+		phb->dbgfs = debugfs_create_dir(name, arch_debugfs_dir);
 
 		debugfs_create_file_unsafe("dump_diag_regs", 0200, phb->dbgfs,
 					   phb, &pnv_pci_diag_data_fops);
diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
index 982f069e4c31..352af5b14a0f 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -11,10 +11,10 @@
 #include <linux/spinlock.h>
 #include <asm/smp.h>
 #include <linux/uaccess.h>
+#include <linux/debugfs.h>
 #include <asm/firmware.h>
 #include <asm/dtl.h>
 #include <asm/lppaca.h>
-#include <asm/debugfs.h>
 #include <asm/plpar_wrappers.h>
 #include <asm/machdep.h>
 
@@ -338,7 +338,7 @@ static int dtl_init(void)
 
 	/* set up common debugfs structure */
 
-	dtl_dir = debugfs_create_dir("dtl", powerpc_debugfs_root);
+	dtl_dir = debugfs_create_dir("dtl", arch_debugfs_dir);
 
 	debugfs_create_x8("dtl_event_mask", 0600, dtl_dir, &dtl_event_mask);
 	debugfs_create_u32("dtl_buf_entries", 0400, dtl_dir, &dtl_buf_entries);
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index dab356e3ff87..2c1d66b5fde3 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -22,6 +22,8 @@
 #include <linux/workqueue.h>
 #include <linux/proc_fs.h>
 #include <linux/pgtable.h>
+#include <linux/debugfs.h>
+
 #include <asm/processor.h>
 #include <asm/mmu.h>
 #include <asm/page.h>
@@ -39,7 +41,6 @@
 #include <asm/kexec.h>
 #include <asm/fadump.h>
 #include <asm/asm-prototypes.h>
-#include <asm/debugfs.h>
 #include <asm/dtl.h>
 
 #include "pseries.h"
@@ -2016,7 +2017,7 @@ static int __init vpa_debugfs_init(void)
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
 		return 0;
 
-	vpa_dir = debugfs_create_dir("vpa", powerpc_debugfs_root);
+	vpa_dir = debugfs_create_dir("vpa", arch_debugfs_dir);
 
 	/* set up the per-cpu vpa file*/
 	for_each_possible_cpu(i) {
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index dbdbbc2f1dc5..f85915ca0f98 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -21,7 +21,6 @@
 #include <linux/msi.h>
 #include <linux/vmalloc.h>
 
-#include <asm/debugfs.h>
 #include <asm/prom.h>
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -1716,7 +1715,7 @@ DEFINE_SHOW_ATTRIBUTE(xive_core_debug);
 int xive_core_debug_init(void)
 {
 	if (xive_enabled())
-		debugfs_create_file("xive", 0400, powerpc_debugfs_root,
+		debugfs_create_file("xive", 0400, arch_debugfs_dir,
 				    NULL, &xive_core_debug_fops);
 	return 0;
 }
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index da4d7f225a40..ead460b80905 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -26,8 +26,8 @@
 #include <linux/ctype.h>
 #include <linux/highmem.h>
 #include <linux/security.h>
+#include <linux/debugfs.h>
 
-#include <asm/debugfs.h>
 #include <asm/ptrace.h>
 #include <asm/smp.h>
 #include <asm/string.h>
@@ -4077,8 +4077,8 @@ DEFINE_SIMPLE_ATTRIBUTE(xmon_dbgfs_ops, xmon_dbgfs_get,
 
 static int __init setup_xmon_dbgfs(void)
 {
-	debugfs_create_file("xmon", 0600, powerpc_debugfs_root, NULL,
-				&xmon_dbgfs_ops);
+	debugfs_create_file("xmon", 0600, arch_debugfs_dir, NULL,
+			    &xmon_dbgfs_ops);
 	return 0;
 }
 device_initcall(setup_xmon_dbgfs);
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 1/2] powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry
From: Aneesh Kumar K.V @ 2021-08-12 13:28 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V

Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling.
Also add a debugfs entry for tlb_local_single_page_flush_ceiling.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
Changes from v1:
* switch to debugfs_create_u32

 arch/powerpc/mm/book3s64/radix_tlb.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index aefc100d79a7..1fa2bc6a969e 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -17,6 +17,7 @@
 #include <asm/trace.h>
 #include <asm/cputhreads.h>
 #include <asm/plpar_wrappers.h>
+#include <asm/debugfs.h>
 
 #include "internal.h"
 
@@ -1106,8 +1107,8 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
  * invalidating a full PID, so it has a far lower threshold to change from
  * individual page flushes to full-pid flushes.
  */
-static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
-static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
+static u32 tlb_single_page_flush_ceiling __read_mostly = 33;
+static u32 tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
 static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 					    unsigned long start, unsigned long end)
@@ -1524,3 +1525,14 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
 EXPORT_SYMBOL_GPL(do_h_rpt_invalidate_prt);
 
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
+
+static int __init create_tlb_single_page_flush_ceiling(void)
+{
+	debugfs_create_u32("tlb_single_page_flush_ceiling", 0600,
+			   powerpc_debugfs_root, &tlb_single_page_flush_ceiling);
+	debugfs_create_u32("tlb_local_single_page_flush_ceiling", 0600,
+			   powerpc_debugfs_root, &tlb_local_single_page_flush_ceiling);
+	return 0;
+}
+late_initcall(create_tlb_single_page_flush_ceiling);
+
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH v2 1/2] KVM: Refactor kvm_arch_vcpu_fault() to return a struct page pointer
From: Paolo Bonzini @ 2021-08-12 15:41 UTC (permalink / raw)
  To: David Hildenbrand, Hou Wenlong, kvm
  Cc: x86, Wanpeng Li, linux-mips, H. Peter Anvin, Claudio Imbrenda,
	Will Deacon, kvmarm, linux-s390, Janosch Frank, Marc Zyngier,
	Joerg Roedel, Huacai Chen, Christian Borntraeger,
	Aleksandar Markovic, Ingo Molnar, Catalin Marinas, Vasily Gorbik,
	Suzuki K Poulose, Heiko Carstens, kvm-ppc, Borislav Petkov,
	Thomas Gleixner, Alexandru Elisei, linux-arm-kernel, Jim Mattson,
	Thomas Bogendoerfer, Sean Christopherson, Cornelia Huck,
	linux-kernel, James Morse, Vitaly Kuznetsov, linuxppc-dev
In-Reply-To: <98adbd3c-ec6f-5689-1686-2a8a7909951a@redhat.com>

On 12/08/21 11:04, David Hildenbrand wrote:
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> But at the same time I wonder if we should just get rid of 
> CONFIG_KVM_S390_UCONTROL and consequently kvm_arch_vcpu_fault().
> 
> In practice CONFIG_KVM_S390_UCONTROL, is never enabled in any reasonable 
> kernel build and consequently it's never tested; further, exposing the 
> sie_block to user space allows user space to generate random SIE 
> validity intercepts.
> 
> CONFIG_KVM_S390_UCONTROL feels like something that should just be 
> maintained out of tree by someone who really needs to hack deep into hw 
> virtualization for testing purposes etc.

I have no preference either way.  It should definitely have selftests, 
but in x86 land there are some features that are not covered by QEMU and 
were nevertheless accepted upstream with selftests.

Paolo


^ permalink raw reply

* Re: [PATCH v2 5/9] powerpc/microwatt: Use standard 16550 UART for console
From: Segher Boessenkool @ 2021-08-12 16:09 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev
In-Reply-To: <ec000575-7f7a-57b7-50d2-9444e17f2a7e@csgroup.eu>

On Thu, Aug 12, 2021 at 03:14:44PM +0200, Christophe Leroy wrote:
> Le 18/06/2021 à 05:46, Paul Mackerras a écrit :
> >+static u8 udbg_uart_in_isa300_rm(unsigned int reg)
> >+{
> >+	uint64_t msr = mfmsr();
> >+	uint8_t  c;
> >+
> >+	mtmsr(msr & ~(MSR_EE|MSR_DR));
> >+	isync();
> >+	eieio();
> >+	c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
> >+	mtmsr(msr);
> >+	isync();
> >+	return c;
> >+}
> 
> How do you make sure that GCC won't emit any access to the stack between 
> the two mtmsr() ?

The mtmsr are asm with a memory clobber so nothing will be moved between
these, and it is very unlikely anything will sprout up here out of
nothing.  But yes, this whole thing should be written as real asm (or as
one huge inline asm, but ugh).


Segher

^ permalink raw reply

* [PATCH] ppc: add "-z notext" flag to disable diagnostic
From: Bill Wendling @ 2021-08-12 20:49 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Nathan Chancellor, Nick Desaulniers, linuxppc-dev, linux-kernel,
	clang-built-linux
  Cc: Bill Wendling

The "-z notext" flag disables reporting an error if DT_TEXTREL is set on
PPC with CONFIG=kdump:

  ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
    local symbol in readonly segment; recompile object files with -fPIC
    or pass '-Wl,-z,notext' to allow text relocations in the output
  >>> defined in built-in.a(arch/powerpc/kernel/misc.o)
  >>> referenced by arch/powerpc/kernel/misc.o:(.text+0x20) in archive
      built-in.a

The BFD linker disables this by default (though it's configurable in
current versions). LLD enables this by default. So we add the flag to
keep LLD from emitting the error.

Signed-off-by: Bill Wendling <morbo@google.com>

---
The output of the "get_maintainer.pl" run just in case no one believes me. :-)

$ ./scripts/get_maintainer.pl arch/powerpc/Makefile
Michael Ellerman <mpe@ellerman.id.au> (supporter:LINUX FOR POWERPC (32-BIT AND 64-BIT))
Benjamin Herrenschmidt <benh@kernel.crashing.org> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT))
Paul Mackerras <paulus@samba.org> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT))
Nathan Chancellor <nathan@kernel.org> (supporter:CLANG/LLVM BUILD SUPPORT)
Nick Desaulniers <ndesaulniers@google.com> (supporter:CLANG/LLVM BUILD SUPPORT)
linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT))
linux-kernel@vger.kernel.org (open list)
clang-built-linux@googlegroups.com (open list:CLANG/LLVM BUILD SUPPORT)
---
 arch/powerpc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 6505d66f1193..17a9fbf9b789 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -122,6 +122,7 @@ endif
 
 LDFLAGS_vmlinux-y := -Bstatic
 LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
+LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext
 LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-y)
 
 ifdef CONFIG_PPC64
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related

* Re: [PATCH] ppc: add "-z notext" flag to disable diagnostic
From: Nick Desaulniers @ 2021-08-12 20:53 UTC (permalink / raw)
  To: Bill Wendling
  Cc: itaru.kitayama, linux-kernel, Nathan Chancellor,
	clang-built-linux, Paul Mackerras, linuxppc-dev
In-Reply-To: <20210812204951.1551782-1-morbo@google.com>

On Thu, Aug 12, 2021 at 1:49 PM Bill Wendling <morbo@google.com> wrote:
>
> The "-z notext" flag disables reporting an error if DT_TEXTREL is set on
> PPC with CONFIG=kdump:
>
>   ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
>     local symbol in readonly segment; recompile object files with -fPIC
>     or pass '-Wl,-z,notext' to allow text relocations in the output
>   >>> defined in built-in.a(arch/powerpc/kernel/misc.o)
>   >>> referenced by arch/powerpc/kernel/misc.o:(.text+0x20) in archive
>       built-in.a
>
> The BFD linker disables this by default (though it's configurable in
> current versions). LLD enables this by default. So we add the flag to
> keep LLD from emitting the error.
>
> Signed-off-by: Bill Wendling <morbo@google.com>

Link: https://github.com/ClangBuiltLinux/linux/issues/811
Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

>
> ---
> The output of the "get_maintainer.pl" run just in case no one believes me. :-)

LOL!

>
> $ ./scripts/get_maintainer.pl arch/powerpc/Makefile
> Michael Ellerman <mpe@ellerman.id.au> (supporter:LINUX FOR POWERPC (32-BIT AND 64-BIT))
> Benjamin Herrenschmidt <benh@kernel.crashing.org> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT))
> Paul Mackerras <paulus@samba.org> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT))
> Nathan Chancellor <nathan@kernel.org> (supporter:CLANG/LLVM BUILD SUPPORT)
> Nick Desaulniers <ndesaulniers@google.com> (supporter:CLANG/LLVM BUILD SUPPORT)
> linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT))
> linux-kernel@vger.kernel.org (open list)
> clang-built-linux@googlegroups.com (open list:CLANG/LLVM BUILD SUPPORT)
> ---
>  arch/powerpc/Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 6505d66f1193..17a9fbf9b789 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -122,6 +122,7 @@ endif
>
>  LDFLAGS_vmlinux-y := -Bstatic
>  LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
> +LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext
>  LDFLAGS_vmlinux        := $(LDFLAGS_vmlinux-y)
>
>  ifdef CONFIG_PPC64
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: [PATCH -next] trap: Cleanup trap_init()
From: Vineet Gupta @ 2021-08-12 18:57 UTC (permalink / raw)
  To: Kefeng Wang, linux-snps-arc, linux-kernel, linux-arm-kernel,
	uclinux-h8-devel, linux-hexagon, openrisc, linux-parisc,
	linuxppc-dev, linux-riscv, linux-um, linux-mm
  Cc: Jonas Bonn, Andrew Morton, Yoshinori Sato, Helge Deller,
	Paul Walmsley, Russell King, Ley Foon Tan, Stefan Kristiansson,
	James E . J . Bottomley, Richard Weinberger, Paul Mackerras,
	Vineet Gupta, Stafford Horne, Jeff Dike, Anton Ivanov
In-Reply-To: <20210812123602.76356-1-wangkefeng.wang@huawei.com>

On 8/12/21 5:36 AM, Kefeng Wang wrote:
> There are some empty trap_init() in different ARCHs, introduce
> a new weak trap_init() function to cleanup them.
>
> Cc: Vineet Gupta<vgupta@kernel.org>
> Cc: Russell King<linux@armlinux.org.uk>
> Cc: Yoshinori Sato<ysato@users.sourceforge.jp>
> Cc: Ley Foon Tan<ley.foon.tan@intel.com>
> Cc: Jonas Bonn<jonas@southpole.se>
> Cc: Stefan Kristiansson<stefan.kristiansson@saunalahti.fi>
> Cc: Stafford Horne<shorne@gmail.com>
> Cc: James E.J. Bottomley<James.Bottomley@HansenPartnership.com>
> Cc: Helge Deller<deller@gmx.de>
> Cc: Michael Ellerman<mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt<benh@kernel.crashing.org>
> Cc: Paul Mackerras<paulus@samba.org>
> Cc: Paul Walmsley<palmerdabbelt@google.com>
> Cc: Jeff Dike<jdike@addtoit.com>
> Cc: Richard Weinberger<richard@nod.at>
> Cc: Anton Ivanov<anton.ivanov@cambridgegreys.com>
> Cc: Andrew Morton<akpm@linux-foundation.org>
> Signed-off-by: Kefeng Wang<wangkefeng.wang@huawei.com>
> ---
>   arch/arc/kernel/traps.c      | 5 -----

Acked-by: Vineet Gupta <vgupt@kernel.org>  #arch/arc

^ permalink raw reply

* Re: [PATCH -next] trap: Cleanup trap_init()
From: Michael Ellerman @ 2021-08-13  2:27 UTC (permalink / raw)
  To: Kefeng Wang, linux-snps-arc, linux-kernel, linux-arm-kernel,
	uclinux-h8-devel, linux-hexagon, openrisc, linux-parisc,
	linuxppc-dev, linux-riscv, linux-um, linux-mm
  Cc: Jonas Bonn, Kefeng Wang, Andrew Morton, Yoshinori Sato,
	Helge Deller, Paul Walmsley, Russell King, Ley Foon Tan,
	Stefan Kristiansson, James E . J . Bottomley, Richard Weinberger,
	Paul Mackerras, Vineet Gupta, Stafford Horne, Jeff Dike,
	Anton Ivanov
In-Reply-To: <20210812123602.76356-1-wangkefeng.wang@huawei.com>

Kefeng Wang <wangkefeng.wang@huawei.com> writes:
> There are some empty trap_init() in different ARCHs, introduce
> a new weak trap_init() function to cleanup them.
>
> Cc: Vineet Gupta <vgupta@kernel.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Ley Foon Tan <ley.foon.tan@intel.com>
> Cc: Jonas Bonn <jonas@southpole.se>
> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
> Cc: Stafford Horne <shorne@gmail.com>
> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Paul Walmsley <palmerdabbelt@google.com>
> Cc: Jeff Dike <jdike@addtoit.com>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  arch/arc/kernel/traps.c      | 5 -----
>  arch/arm/kernel/traps.c      | 5 -----
>  arch/h8300/kernel/traps.c    | 4 ----
>  arch/hexagon/kernel/traps.c  | 4 ----
>  arch/nds32/kernel/traps.c    | 5 -----
>  arch/nios2/kernel/traps.c    | 5 -----
>  arch/openrisc/kernel/traps.c | 5 -----
>  arch/parisc/kernel/traps.c   | 4 ----
>  arch/powerpc/kernel/traps.c  | 5 -----

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

^ permalink raw reply

* [powerpc:fixes-test] BUILD SUCCESS cbc06f051c524dcfe52ef0d1f30647828e226d30
From: kernel test robot @ 2021-08-13  2:30 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: cbc06f051c524dcfe52ef0d1f30647828e226d30  powerpc/xive: Do not skip CPU-less nodes when creating the IPIs

elapsed time: 721m

configs tested: 134
configs skipped: 108

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
i386                 randconfig-c001-20210812
i386                 randconfig-c001-20210813
sh                          polaris_defconfig
powerpc                      bamboo_defconfig
powerpc                       holly_defconfig
powerpc                mpc7448_hpc2_defconfig
powerpc                        cell_defconfig
powerpc                 mpc837x_rdb_defconfig
powerpc                     redwood_defconfig
arm                         at91_dt_defconfig
sh                   secureedge5410_defconfig
powerpc                     pseries_defconfig
powerpc                      makalu_defconfig
arm                      jornada720_defconfig
m68k                             alldefconfig
sh                          rsk7264_defconfig
sh                            titan_defconfig
powerpc                     asp8347_defconfig
arc                          axs103_defconfig
powerpc                     mpc5200_defconfig
arm                     eseries_pxa_defconfig
powerpc                 mpc8540_ads_defconfig
arm                         vf610m4_defconfig
powerpc                      ppc40x_defconfig
powerpc                      ppc44x_defconfig
m68k                       m5208evb_defconfig
x86_64                           alldefconfig
mips                            gpr_defconfig
mips                      maltaaprp_defconfig
xtensa                generic_kc705_defconfig
arm                           stm32_defconfig
arm                           omap1_defconfig
arc                           tb10x_defconfig
arm                          exynos_defconfig
mips                        workpad_defconfig
mips                     decstation_defconfig
mips                          ath25_defconfig
powerpc                     ppa8548_defconfig
arc                        nsimosci_defconfig
arm                     davinci_all_defconfig
mips                           gcw0_defconfig
arm                            xcep_defconfig
riscv                             allnoconfig
powerpc64                        alldefconfig
powerpc                     ksi8560_defconfig
powerpc                      pcm030_defconfig
arm                         s5pv210_defconfig
powerpc                        icon_defconfig
sh                         ap325rxa_defconfig
arm                          gemini_defconfig
sh                               alldefconfig
sh                         apsh4a3a_defconfig
powerpc                      cm5200_defconfig
powerpc                           allnoconfig
arm                            zeus_defconfig
sh                             sh03_defconfig
arm                       aspeed_g5_defconfig
mips                        maltaup_defconfig
arc                     nsimosci_hs_defconfig
x86_64                            allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
s390                             allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
x86_64               randconfig-a006-20210812
x86_64               randconfig-a004-20210812
x86_64               randconfig-a003-20210812
x86_64               randconfig-a005-20210812
x86_64               randconfig-a002-20210812
x86_64               randconfig-a001-20210812
i386                 randconfig-a004-20210812
i386                 randconfig-a003-20210812
i386                 randconfig-a002-20210812
i386                 randconfig-a001-20210812
i386                 randconfig-a006-20210812
i386                 randconfig-a005-20210812
i386                 randconfig-a011-20210812
i386                 randconfig-a015-20210812
i386                 randconfig-a013-20210812
i386                 randconfig-a014-20210812
i386                 randconfig-a016-20210812
i386                 randconfig-a012-20210812
riscv                    nommu_k210_defconfig
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
x86_64                    rhel-8.3-kselftests
um                           x86_64_defconfig
um                             i386_defconfig
x86_64                           allyesconfig
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
x86_64               randconfig-c001-20210812
x86_64               randconfig-a011-20210812
x86_64               randconfig-a013-20210812
x86_64               randconfig-a012-20210812
x86_64               randconfig-a016-20210812
x86_64               randconfig-a015-20210812
x86_64               randconfig-a014-20210812

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [powerpc:next-test] BUILD REGRESSION aaf01a9f507c108cc8e86f420b0a81e21354be1b
From: kernel test robot @ 2021-08-13  2:38 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: aaf01a9f507c108cc8e86f420b0a81e21354be1b  cpufreq: powernv: Fix init_chip_info initialization in numa=off

Error/Warning reports:

https://lore.kernel.org/lkml/202108130432.0md3hWU2-lkp@intel.com

Error/Warning in current branch:

arch/powerpc/platforms/pseries/hotplug-cpu.c:1022:8: error: 'node_to_cpumask_map' undeclared (first use in this function)

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
`-- powerpc-randconfig-r005-20210812
    `-- arch-powerpc-platforms-pseries-hotplug-cpu.c:error:node_to_cpumask_map-undeclared-(first-use-in-this-function)

elapsed time: 727m

configs tested: 115
configs skipped: 3

gcc tested configs:
arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
i386                 randconfig-c001-20210812
i386                 randconfig-c001-20210813
powerpc                     mpc5200_defconfig
parisc                              defconfig
powerpc                         wii_defconfig
arm                      integrator_defconfig
arm                        trizeps4_defconfig
sh                          polaris_defconfig
powerpc                      bamboo_defconfig
powerpc                       holly_defconfig
powerpc                mpc7448_hpc2_defconfig
powerpc                        cell_defconfig
powerpc                 mpc837x_rdb_defconfig
m68k                             alldefconfig
sh                          rsk7264_defconfig
sh                            titan_defconfig
powerpc                     asp8347_defconfig
powerpc                        icon_defconfig
arc                          axs103_defconfig
riscv                    nommu_k210_defconfig
m68k                       m5208evb_defconfig
x86_64                           alldefconfig
powerpc                     pseries_defconfig
mips                            gpr_defconfig
mips                      maltaaprp_defconfig
xtensa                generic_kc705_defconfig
arm                           stm32_defconfig
arm                           omap1_defconfig
arc                           tb10x_defconfig
arm                          exynos_defconfig
mips                          ath25_defconfig
mips                        workpad_defconfig
mips                     decstation_defconfig
powerpc                     ppa8548_defconfig
arc                        nsimosci_defconfig
arm                            zeus_defconfig
sh                             sh03_defconfig
arm                       aspeed_g5_defconfig
mips                        maltaup_defconfig
arc                     nsimosci_hs_defconfig
x86_64                            allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
s390                             allyesconfig
s390                             allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
x86_64               randconfig-a006-20210812
x86_64               randconfig-a004-20210812
x86_64               randconfig-a003-20210812
x86_64               randconfig-a005-20210812
x86_64               randconfig-a002-20210812
x86_64               randconfig-a001-20210812
i386                 randconfig-a004-20210812
i386                 randconfig-a003-20210812
i386                 randconfig-a002-20210812
i386                 randconfig-a001-20210812
i386                 randconfig-a006-20210812
i386                 randconfig-a005-20210812
i386                 randconfig-a011-20210812
i386                 randconfig-a015-20210812
i386                 randconfig-a013-20210812
i386                 randconfig-a014-20210812
i386                 randconfig-a016-20210812
i386                 randconfig-a012-20210812
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
x86_64                    rhel-8.3-kselftests
um                           x86_64_defconfig
um                             i386_defconfig
x86_64                           allyesconfig
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
x86_64               randconfig-c001-20210812
x86_64               randconfig-a011-20210812
x86_64               randconfig-a013-20210812
x86_64               randconfig-a012-20210812
x86_64               randconfig-a016-20210812
x86_64               randconfig-a015-20210812
x86_64               randconfig-a014-20210812

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
From: Nicholas Piggin @ 2021-08-13  4:06 UTC (permalink / raw)
  To: Bharata B Rao, kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, kvm, bharata.rao
In-Reply-To: <20210805072439.501481-6-bharata@linux.ibm.com>

Excerpts from Bharata B Rao's message of August 5, 2021 5:24 pm:
> Add asynchronous page fault support for pseries guests.
> 
> 1. Setup the guest to handle async-pf
>    - Issue H_REG_SNS hcall to register the SNS region.
>    - Setup the subvention interrupt irq.
>    - Enable async-pf by updating the byte_b9 of VPA for each
>      CPU.
> 2. Check if the page fault is an expropriation notification
>    (SRR1_PROGTRAP set in SRR1) and if so put the task on
>    wait queue based on the expropriation correlation number
>    read from the VPA.
> 3. Handle subvention interrupt to wake any waiting tasks.
>    The wait and wakeup mechanism from x86 async-pf implementation
>    is being reused here.

I don't know too much about the background of this.

How much benefit does this give? What situations? Does PowerVM implement 
it? Do other architectures KVM have something similar?

The SRR1 setting for the DSI is in PAPR? In that case it should be okay,
it might be good to add a small comment in exceptions-64s.S.

[...]

> @@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>  	vm_fault_t fault, major = 0;
>  	bool kprobe_fault = kprobe_page_fault(regs, 11);
>  
> +#ifdef CONFIG_PPC_PSERIES
> +	if (handle_async_page_fault(regs, address))
> +		return 0;
> +#endif
> +
>  	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
>  		return 0;

[...]

> +int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
> +{
> +	struct async_pf_sleep_node n;
> +	DECLARE_SWAITQUEUE(wait);
> +	unsigned long exp_corr_nr;
> +
> +	/* Is this Expropriation notification? */
> +	if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
> +		return 0;

Yep this should be an inline that is guarded by a static key, and then 
probably have an inline check for SRR1_PROGTRAP. You shouldn't need to
mfspr here, but just use regs->msr.

> +
> +	if (unlikely(!user_mode(regs)))
> +		panic("Host injected async PF in kernel mode\n");

Hmm. Is there anything in the PAPR interface that specifies that the
OS can only deal with problem state access faults here? Or is that
inherent in the expropriation feature?

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v1 17/55] KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
From: Nicholas Piggin @ 2021-08-13  4:24 UTC (permalink / raw)
  To: Athira Rajeev; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <1A47BBEF-FC8C-4C4D-8393-9DE66B7FF96C@linux.vnet.ibm.com>

Excerpts from Athira Rajeev's message of August 9, 2021 1:03 pm:
> 
> 
>> On 26-Jul-2021, at 9:19 AM, Nicholas Piggin <npiggin@gmail.com> wrote:


>> +static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra)
>> +{
>> +	if (!(mmcr0 & MMCR0_FC))
>> +		goto do_freeze;
>> +	if (mmcra & MMCRA_SAMPLE_ENABLE)
>> +		goto do_freeze;
>> +	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +		if (!(mmcr0 & MMCR0_PMCCEXT))
>> +			goto do_freeze;
>> +		if (!(mmcra & MMCRA_BHRB_DISABLE))
>> +			goto do_freeze;
>> +	}
>> +	return;
>> +
>> +do_freeze:
>> +	mmcr0 = MMCR0_FC;
>> +	mmcra = 0;
>> +	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +		mmcr0 |= MMCR0_PMCCEXT;
>> +		mmcra = MMCRA_BHRB_DISABLE;
>> +	}
>> +
>> +	mtspr(SPRN_MMCR0, mmcr0);
>> +	mtspr(SPRN_MMCRA, mmcra);
>> +	isync();
>> +}
>> +
> Hi Nick,
> 
> After feezing pmu, do we need to clear “pmcregs_in_use” as well?

Not until we save the values out of the registers. pmcregs_in_use = 0 
means our hypervisor is free to clear our PMU register contents.

> Also can’t we unconditionally do the MMCR0/MMCRA/ freeze settings in here ? do we need the if conditions for FC/PMCCEXT/BHRB ?

I think it's possible, but pretty minimal advantage. I would prefer to 
set them the way perf does for now. If we can move this code into perf/
it should become easier for you to tweak things.

Thanks,
Nick

^ permalink raw reply

* Re: [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
From: Bharata B Rao @ 2021-08-13  4:54 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: aneesh.kumar, linuxppc-dev, kvm, kvm-ppc, bharata.rao
In-Reply-To: <1628825941.uhcogyrzjc.astroid@bobo.none>

On Fri, Aug 13, 2021 at 02:06:40PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of August 5, 2021 5:24 pm:
> > Add asynchronous page fault support for pseries guests.
> > 
> > 1. Setup the guest to handle async-pf
> >    - Issue H_REG_SNS hcall to register the SNS region.
> >    - Setup the subvention interrupt irq.
> >    - Enable async-pf by updating the byte_b9 of VPA for each
> >      CPU.
> > 2. Check if the page fault is an expropriation notification
> >    (SRR1_PROGTRAP set in SRR1) and if so put the task on
> >    wait queue based on the expropriation correlation number
> >    read from the VPA.
> > 3. Handle subvention interrupt to wake any waiting tasks.
> >    The wait and wakeup mechanism from x86 async-pf implementation
> >    is being reused here.
> 
> I don't know too much about the background of this.
> 
> How much benefit does this give? What situations?

I haven't yet gotten into measuring the benefit of this. Once
the patches are bit more stable than what they are currently,
we need to measure and evaluate the benefits.

> Does PowerVM implement it?

I suppose so, need to check though.

> Do other architectures KVM have something similar?

Yes, x86 and s390 KVM have had this feature for a while now
and generic KVM interfaces exist to support it.

> 
> The SRR1 setting for the DSI is in PAPR? In that case it should be okay,
> it might be good to add a small comment in exceptions-64s.S.

Yes, SRR1 setting is part of PAPR.

> 
> [...]
> 
> > @@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
> >  	vm_fault_t fault, major = 0;
> >  	bool kprobe_fault = kprobe_page_fault(regs, 11);
> >  
> > +#ifdef CONFIG_PPC_PSERIES
> > +	if (handle_async_page_fault(regs, address))
> > +		return 0;
> > +#endif
> > +
> >  	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
> >  		return 0;
> 
> [...]
> 
> > +int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
> > +{
> > +	struct async_pf_sleep_node n;
> > +	DECLARE_SWAITQUEUE(wait);
> > +	unsigned long exp_corr_nr;
> > +
> > +	/* Is this Expropriation notification? */
> > +	if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
> > +		return 0;
> 
> Yep this should be an inline that is guarded by a static key, and then 
> probably have an inline check for SRR1_PROGTRAP. You shouldn't need to
> mfspr here, but just use regs->msr.

Right.

> 
> > +
> > +	if (unlikely(!user_mode(regs)))
> > +		panic("Host injected async PF in kernel mode\n");
> 
> Hmm. Is there anything in the PAPR interface that specifies that the
> OS can only deal with problem state access faults here? Or is that
> inherent in the expropriation feature?

Didn't see anything specific to that effect in PAPR. However since
this puts the faulting guest process to sleep until the page
becomes ready in the host, I have limited it to guest user space
faults.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH v6 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: Jiri Slaby @ 2021-08-13  5:53 UTC (permalink / raw)
  To: kernel test robot, Xianting Tian, gregkh, amit, arnd, osandov
  Cc: clang-built-linux, kbuild-all, linuxppc-dev, linux-kernel,
	virtualization
In-Reply-To: <202108122040.lBf24DNp-lkp@intel.com>

Hi,

On 12. 08. 21, 14:26, kernel test robot wrote:
>>> drivers/tty/hvc/hvc_console.c:190:26: warning: variable 'hp' is uninitialized when used here [-Wuninitialized]
>             spin_unlock_irqrestore(&hp->c_lock, flags);
>                                     ^~
>     drivers/tty/hvc/hvc_console.c:149:23: note: initialize the variable 'hp' to silence this warning
>             struct hvc_struct *hp;
>                                  ^
>                                   = NULL

So you clearly didn't test your change as it would crash quite 
instantly. I wonder, where do you intend to get hp from in the 
console::print() hook?

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: [PATCH v6 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: Xianting TIan @ 2021-08-13  6:04 UTC (permalink / raw)
  To: Jiri Slaby, kernel test robot, gregkh, amit, arnd, osandov
  Cc: clang-built-linux, kbuild-all, linuxppc-dev, linux-kernel,
	virtualization
In-Reply-To: <0c808001-7643-fdcf-66ba-738654ec0c21@kernel.org>


在 2021/8/13 下午1:53, Jiri Slaby 写道:
> Hi,
>
> On 12. 08. 21, 14:26, kernel test robot wrote:
>>>> drivers/tty/hvc/hvc_console.c:190:26: warning: variable 'hp' is 
>>>> uninitialized when used here [-Wuninitialized]
>>             spin_unlock_irqrestore(&hp->c_lock, flags);
>>                                     ^~
>>     drivers/tty/hvc/hvc_console.c:149:23: note: initialize the 
>> variable 'hp' to silence this warning
>>             struct hvc_struct *hp;
>>                                  ^
>>                                   = NULL
>
> So you clearly didn't test your change as it would crash quite 
> instantly. I wonder, where do you intend to get hp from in the 
> console::print() hook?

I am very sorry for the inconvenience caused.

This is caused by my carelessness:(

I take it for granted that there is no problem when I just switch to use 
array(cons_outbuf[]).

sorry agin.

>
> thanks,

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
From: Nicholas Piggin @ 2021-08-13  6:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <b286e07fb771a664b631cd07a40b09c06f26e64b.1618331881.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
> 
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
> 
> The most efficient solution would be to use __builtin_trap() because
> GCC is able to optimise the use of the different trap instructions
> based on the requested condition, but this is complex if not
> impossible for the following reasons:
> - __builtin_trap() is a non-recoverable instruction, so it can't be
> used for WARN_ON
> - Knowing which line of code generated the trap would require the
> analysis of DWARF information. This is not a feature we have today.
> 
> As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
> the way WARN_ON() is implemented is suboptimal. That commit also
> mentions an issue with 'long long' condition. It fixed it for
> WARN_ON() but the same problem still exists today with BUG_ON() on
> PPC32. It will be fixed by using the generic implementation.
> 
> By using the generic implementation, gcc will naturally generate a
> branch to the unconditional trap generated by BUG().
> 
> As modern powerpc implement zero-cycle branch,
> that's even more efficient.
> 
> And for the functions using WARN_ON() and its return, the test
> on return from WARN_ON() is now also used for the WARN_ON() itself.
> 
> On PPC64 we don't want it because we want to be able to use CFAR
> register to track how we entered the code that trapped. The CFAR
> register would be clobbered by the branch.
> 
> A simple test function:
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	0000046c <test9w>:
> 	 46c:	7c 89 00 34 	cntlzw  r9,r4
> 	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
> 	 474:	0f 09 00 00 	twnei   r9,0
> 	 478:	2c 04 00 00 	cmpwi   r4,0
> 	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
> 	 480:	7c 63 23 96 	divwu   r3,r3,r4
> 	 484:	4e 80 00 20 	blr
> 
> 	 488:	38 60 00 00 	li      r3,0
> 	 48c:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	00000468 <test9w>:
> 	 468:	2c 04 00 00 	cmpwi   r4,0
> 	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
> 	 470:	7c 63 23 96 	divwu   r3,r3,r4
> 	 474:	4e 80 00 20 	blr
> 
> 	 478:	0f e0 00 00 	twui    r0,0
> 	 47c:	38 60 00 00 	li      r3,0
> 	 480:	4e 80 00 20 	blr

That's clearly better because we have a branch anyway.

> 
> So we see before the patch we need 3 instructions on the likely path
> to handle the WARN_ON(). With the patch the trap goes on the unlikely
> path.
> 
> See below the difference at the entry of system_call_exception where
> we have several BUG_ON(), allthough less impressing.
> 
> With the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	81 6a 00 84 	lwz     r11,132(r10)
> 	   4:	90 6a 00 88 	stw     r3,136(r10)
> 	   8:	71 60 00 02 	andi.   r0,r11,2
> 	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
> 	  10:	71 60 40 00 	andi.   r0,r11,16384
> 	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
> 	  18:	71 6b 80 00 	andi.   r11,r11,32768
> 	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
> 	  20:	94 21 ff e0 	stwu    r1,-32(r1)
> 	  24:	93 e1 00 1c 	stw     r31,28(r1)
> 	  28:	7d 8c 42 e6 	mftb    r12
> 	...
> 	  7c:	0f e0 00 00 	twui    r0,0
> 	  80:	0f e0 00 00 	twui    r0,0
> 	  84:	0f e0 00 00 	twui    r0,0
> 
> Without the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	94 21 ff e0 	stwu    r1,-32(r1)
> 	   4:	93 e1 00 1c 	stw     r31,28(r1)
> 	   8:	90 6a 00 88 	stw     r3,136(r10)
> 	   c:	81 6a 00 84 	lwz     r11,132(r10)
> 	  10:	69 60 00 02 	xori    r0,r11,2
> 	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
> 	  18:	0f 00 00 00 	twnei   r0,0
> 	  1c:	69 60 40 00 	xori    r0,r11,16384
> 	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
> 	  24:	0f 00 00 00 	twnei   r0,0
> 	  28:	69 6b 80 00 	xori    r11,r11,32768
> 	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
> 	  30:	0f 0b 00 00 	twnei   r11,0
> 	  34:	7d 8c 42 e6 	mftb    r12

This one possibly the branches end up in predictors, whereas conditional 
trap is always just speculated not to hit. Branches may also have a
throughput limit on execution whereas trap could be more (1 per cycle
vs 4 per cycle on POWER9).

On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
is the CFAR issue as well which makes it a problem for 64s. It would
have been nice if it could use the same code though.

Maybe one day gcc's __builtin_trap() will become smart enough around
conditional statements that it it generates better code and tries to
avoid branches.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
From: Nicholas Piggin @ 2021-08-13  6:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <389962b1b702e3c78d169e59bcfac56282889173.1618331882.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> flexibility to GCC.
> 
> For that add an entry to the exception table so that
> program_check_exception() knowns where to resume execution
> after a WARNING.

Nice idea. How much does it bloat the exception table?
How easy would it be to make a different exception table for
unimportant stuff like WARN_ON faults?

> 
> Here are two exemples. The first one is done on PPC32 (which
> benefits from the previous patch), the second is on PPC64.
> 
> 	unsigned long test(struct pt_regs *regs)
> 	{
> 		int ret;
> 
> 		WARN_ON(regs->msr & MSR_PR);
> 
> 		return regs->gpr[3];
> 	}
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3c4:	4e 80 00 20 	blr
> 
> 	0000000000000bf0 <.test9w>:
> 	 bf0:	7c 89 00 74 	cntlzd  r9,r4
> 	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 bf8:	0b 09 00 00 	tdnei   r9,0
> 	 bfc:	2c 24 00 00 	cmpdi   r4,0
> 	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
> 	 c04:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c08:	4e 80 00 20 	blr
> 
> 	 c0c:	38 60 00 00 	li      r3,0
> 	 c10:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 
> 	0000000000000c50 <.test9w>:
> 	 c50:	7c 89 00 74 	cntlzd  r9,r4
> 	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 c58:	0b 09 00 00 	tdnei   r9,0
> 	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c60:	4e 80 00 20 	blr
> 
> 	 c70:	38 60 00 00 	li      r3,0
> 	 c74:	4e 80 00 20 	blr

One thing that would be nice for WARN_ON_ONCE is to patch out the trap
after the program interrupt. I've seen sometimes the WARN_ON_ONCE starts
firing a lot and slows down the kernel. That gets harder to do if the
trap is now part of the control flow.

I guess that's less important case for performance though. And in theory
you might be able to replace the trap with an equivalent cmp and branch
(but that's probably going too over engineering it).

Thanks,
Nick


^ permalink raw reply

* Re: [PATCH v6 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: Greg KH @ 2021-08-13  7:27 UTC (permalink / raw)
  To: Xianting Tian
  Cc: arnd, amit, jirislaby, linux-kernel, virtualization, linuxppc-dev,
	osandov
In-Reply-To: <20210812094532.145497-2-xianting.tian@linux.alibaba.com>

On Thu, Aug 12, 2021 at 05:45:31PM +0800, Xianting Tian wrote:
> As well known, hvc backend can register its opertions to hvc backend.
> the opertions contain put_chars(), get_chars() and so on.
> 
> Some hvc backend may do dma in its opertions. eg, put_chars() of
> virtio-console. But in the code of hvc framework, it may pass DMA
> incapable memory to put_chars() under a specific configuration, which
> is explained in commit c4baad5029(virtio-console: avoid DMA from stack):
> 1, c[] is on stack,
>    hvc_console_print():
> 	char c[N_OUTBUF] __ALIGNED__;
> 	cons_ops[index]->put_chars(vtermnos[index], c, i);
> 2, ch is on stack,
>    static void hvc_poll_put_char(,,char ch)
>    {
> 	struct tty_struct *tty = driver->ttys[0];
> 	struct hvc_struct *hp = tty->driver_data;
> 	int n;
> 
> 	do {
> 		n = hp->ops->put_chars(hp->vtermno, &ch, 1);
> 	} while (n <= 0);
>    }
> 
> Commit c4baad5029 is just the fix to avoid DMA from stack memory, which
> is passed to virtio-console by hvc framework in above code. But I think
> the fix is aggressive, it directly uses kmemdup() to alloc new buffer
> from kmalloc area and do memcpy no matter the memory is in kmalloc area
> or not. But most importantly, it should better be fixed in the hvc
> framework, by changing it to never pass stack memory to the put_chars()
> function in the first place. Otherwise, we still face the same issue if
> a new hvc backend using dma added in the furture.
> 
> We make 'char c[N_OUTBUF]' part of 'struct hvc_struct', so hp->c is no
> longer the stack memory. we can use it in above two cases.
> 
> Other fix is use L1_CACHE_BYTES as the alignment, use 'sizeof(long)' as
> dma alignment is wrong. And use struct_size() to calculate size of
> hvc_struct.
> 
> Introduce another array(cons_outbuf[]) for the hp->c pointers next to
> the cons_ops[] and vtermnos[] arrays.
> 
> With the patch, we can remove the fix c4baad5029.
> 
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> Tested-by: Xianting Tian <xianting.tian@linux.alibaba.com>

As the build shows, you obviously did not test this code :(

Also, no need to add a tested-by line as that should be implicit if you
wrote and signed off on it.

I am going to ask you to get some help from some other developers at
your company, and get them to test and sign off on this series before
sending it out again, as there seems to be a bit of a disconnect as to
what is actually needed to do when sending a patch for us to review.

That is now a requirement for us to be able to take your changes here.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry
From: Daniel Axtens @ 2021-08-13  7:39 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V
In-Reply-To: <20210812132831.233794-1-aneesh.kumar@linux.ibm.com>

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling.
> Also add a debugfs entry for tlb_local_single_page_flush_ceiling.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> Changes from v1:
> * switch to debugfs_create_u32
>
>  arch/powerpc/mm/book3s64/radix_tlb.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index aefc100d79a7..1fa2bc6a969e 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -17,6 +17,7 @@
>  #include <asm/trace.h>
>  #include <asm/cputhreads.h>
>  #include <asm/plpar_wrappers.h>
> +#include <asm/debugfs.h>
>  
>  #include "internal.h"
>  
> @@ -1106,8 +1107,8 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>   * invalidating a full PID, so it has a far lower threshold to change from
>   * individual page flushes to full-pid flushes.
>   */
> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> -static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
> +static u32 tlb_single_page_flush_ceiling __read_mostly = 33;
> +static u32 tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>  
>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>  					    unsigned long start, unsigned long end)
> @@ -1524,3 +1525,14 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
>  EXPORT_SYMBOL_GPL(do_h_rpt_invalidate_prt);
>  
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> +
> +static int __init create_tlb_single_page_flush_ceiling(void)
> +{
> +	debugfs_create_u32("tlb_single_page_flush_ceiling", 0600,
> +			   powerpc_debugfs_root, &tlb_single_page_flush_ceiling);
> +	debugfs_create_u32("tlb_local_single_page_flush_ceiling", 0600,
> +			   powerpc_debugfs_root, &tlb_local_single_page_flush_ceiling);
> +	return 0;
> +}
> +late_initcall(create_tlb_single_page_flush_ceiling);

This patch seems to do what the commit message says, and it does seem to
make sense to have these parameters as tunables.

I was briefly concerned that switching from an unsigned long to a u32
might lead to suboptimal code generation in older gcc versions, but it
doesn't seem to be a case where a single instruction is going to make a
huge impact.

I also wondered what the C integer promotion rules would do with a the
nr_pages > tlb*flush_ceiling comparisons, but if we are trying to flush
more than 4 billion pages we might have other, bigger problems! (Also,
if I understand the C integer rules correctly the u32 will get promoted
to an unsigned long anyway.)

All in all this seems good to me.

Reviewed-by: Daniel Axtens <dja@axtens.net>

Kind regards,
Daniel



> +
> -- 
> 2.31.1

^ permalink raw reply

* [PATCH] powerpc/perf/hv-gpci: Fix the logic to compute counter value from the hcall result buffer.
From: Kajol Jain @ 2021-08-13  8:21 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: kjain, suka, maddy, rnsastry, atrajeev

H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in the
result buffer. Result buffer has specific format defined in the PAPR
specification. One of the field is counter offset and width of the counter
data returned.

Counter data are returned in a unsigned char array. To
get the final counter data, these values should be left shifted
byte at a time. But commit 220a0c609ad17 ("powerpc/perf: Add support 
for the hv gpci (get performance counter info) interface") made the
shifting bitwise. Because of this, hcall counters values could end up
in lower side, which messes the counter prev vs now calculation. This
lead to huge counter value reporting

[command]#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
           -C 0 -I 1000
        time             counts unit events
     1.000078854 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     2.000213293                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     3.000320107                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     4.000428392                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     5.000537864                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     6.000649087                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     7.000760312                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     8.000865218             16,448      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     9.000978985 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    10.001088891             16,384      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    11.001201435                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    12.001307937 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/

Patch here fixes the shifting logic to make is byte-wise with which no more the issue seen. 

Fixes: e4f226b1580b3 ("powerpc/perf/hv-gpci: Increase request buffer size")
Reported-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/hv-gpci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index d48413e28c39..c756228a081f 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -175,7 +175,7 @@ static unsigned long single_gpci_request(u32 req, u32 starting_index,
 	 */
 	count = 0;
 	for (i = offset; i < offset + length; i++)
-		count |= arg->bytes[i] << (i - offset);
+		count |= (u64)(arg->bytes[i]) << ((length - 1 - (i - offset)) * 8);
 
 	*value = count;
 out:
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 1/2] powerpc/perf: Use stack siar instead of mfspr
From: Kajol Jain @ 2021-08-13  8:24 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: kjain, atrajeev, maddy, rnsastry

Minor optimization in the 'perf_instruction_pointer' function code by
making use of stack siar instead of mfspr.

Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs
into perf_read_regs")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee716de91..1b464aad29c4 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 		else
 			return regs->nip;
 	} else if (use_siar && siar_valid(regs))
-		return mfspr(SPRN_SIAR) + perf_ip_adjust(regs);
+		return siar + perf_ip_adjust(regs);
 	else if (use_siar)
 		return 0;		// no valid instruction pointer
 	else
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 2/2] powerpc/perf: Return regs->nip as instruction pointer value when SIAR is 0
From: Kajol Jain @ 2021-08-13  8:24 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: kjain, atrajeev, maddy, rnsastry
In-Reply-To: <20210813082450.429320-1-kjain@linux.ibm.com>

Incase of random sampling, there can be scenarios where SIAR is not
latching sample address and results in 0 value. Since current code
directly returning the siar value, we could see multiple instruction
pointer values as 0 in perf report.
Patch resolves this issue by adding a ternary condition to return
regs->nip incase SIAR is 0.

Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs
into perf_read_regs")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1b464aad29c4..aeecaaf6810f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 		else
 			return regs->nip;
 	} else if (use_siar && siar_valid(regs))
-		return siar + perf_ip_adjust(regs);
+		return siar ? siar + perf_ip_adjust(regs) : regs->nip;
 	else if (use_siar)
 		return 0;		// no valid instruction pointer
 	else
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v2 1/2] powerpc/perf: Use stack siar instead of mfspr
From: kajoljain @ 2021-08-13  8:29 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: atrajeev, maddy, rnsastry
In-Reply-To: <20210813082450.429320-1-kjain@linux.ibm.com>



On 8/13/21 1:54 PM, Kajol Jain wrote:
> Minor optimization in the 'perf_instruction_pointer' function code by
> making use of stack siar instead of mfspr.
> 
> Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs
> into perf_read_regs")
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>

Please ignore this patch-set as I mentioned wrong version number. I will resend
this patch-set again with correct version. Sorry for the confusion.

Thanks,
Kajol Jain
> ---
>  arch/powerpc/perf/core-book3s.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index bb0ee716de91..1b464aad29c4 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
>  		else
>  			return regs->nip;
>  	} else if (use_siar && siar_valid(regs))
> -		return mfspr(SPRN_SIAR) + perf_ip_adjust(regs);
> +		return siar + perf_ip_adjust(regs);
>  	else if (use_siar)
>  		return 0;		// no valid instruction pointer
>  	else
> 

^ permalink raw reply

* [PATCH 1/2] powerpc/perf: Use stack siar instead of mfspr
From: Kajol Jain @ 2021-08-13  8:30 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: kjain, atrajeev, maddy, rnsastry

Minor optimization in the 'perf_instruction_pointer' function code by
making use of stack siar instead of mfspr.

Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs
into perf_read_regs")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee716de91..1b464aad29c4 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 		else
 			return regs->nip;
 	} else if (use_siar && siar_valid(regs))
-		return mfspr(SPRN_SIAR) + perf_ip_adjust(regs);
+		return siar + perf_ip_adjust(regs);
 	else if (use_siar)
 		return 0;		// no valid instruction pointer
 	else
-- 
2.26.2


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox