LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] cpufreq: Specify default governor on command line
From: Quentin Perret @ 2020-06-15 16:55 UTC (permalink / raw)
  To: rjw, rafael, viresh.kumar
  Cc: juri.lelli, kernel-team, vincent.guittot, arnd, linux-pm, peterz,
	adharmap, qperret, linux-kernel, mingo, paulus, linuxppc-dev,
	tkjos
In-Reply-To: <20200615165554.228063-1-qperret@google.com>

Currently, the only way to specify the default CPUfreq governor is via
Kconfig options, which suits users who can build the kernel themselves
perfectly.

However, for those who use a distro-like kernel (such as Android, with
the Generic Kernel Image project), the only way to use a different
default is to boot to userspace, and to then switch using the sysfs
interface. Being able to specify the default governor on the command
line, like is the case for cpuidle, would enable those users to specify
their governor of choice earlier on, and to simplify slighlty the
userspace boot procedure.

To support this use-case, add a kernel command line parameter enabling
to specify a default governor for CPUfreq, which takes precedence over
the builtin default.

This implementation has one notable limitation: the default governor
must be registered before the driver. This is solved for builtin
governors and drivers using appropriate *_initcall() functions. And in
the modular case, this must be reflected as a constraint on the module
loading order.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 Documentation/admin-guide/pm/cpufreq.rst      |  6 ++--
 drivers/cpufreq/cpufreq.c                     | 34 ++++++++++++++++---
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad81c79..5fd3c9f187eb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -703,6 +703,11 @@
 	cpufreq.off=1	[CPU_FREQ]
 			disable the cpufreq sub-system
 
+	cpufreq.default_governor=
+			[CPU_FREQ] Name of the default cpufreq governor to use.
+			This governor must be registered in the kernel before
+			the cpufreq driver probes.
+
 	cpu_init_udelay=N
 			[X86] Delay for N microsec between assert and de-assert
 			of APIC INIT to start processors.  This delay occurs
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 0c74a7784964..368e612145d2 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -147,9 +147,9 @@ CPUs in it.
 
 The next major initialization step for a new policy object is to attach a
 scaling governor to it (to begin with, that is the default scaling governor
-determined by the kernel configuration, but it may be changed later
-via ``sysfs``).  First, a pointer to the new policy object is passed to the
-governor's ``->init()`` callback which is expected to initialize all of the
+determined by the kernel command line or configuration, but it may be changed
+later via ``sysfs``).  First, a pointer to the new policy object is passed to
+the governor's ``->init()`` callback which is expected to initialize all of the
 data structures necessary to handle the given policy and, possibly, to add
 a governor ``sysfs`` interface to it.  Next, the governor is started by
 invoking its ``->start()`` callback.
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 0128de3603df..0f05caedc320 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -50,6 +50,9 @@ static LIST_HEAD(cpufreq_governor_list);
 #define for_each_governor(__governor)				\
 	list_for_each_entry(__governor, &cpufreq_governor_list, governor_list)
 
+static char cpufreq_param_governor[CPUFREQ_NAME_LEN];
+static struct cpufreq_governor *default_governor;
+
 /**
  * The "cpufreq driver" - the arch- or hardware-dependent low
  * level driver of CPUFreq support, and its spinlock. This lock
@@ -1055,7 +1058,6 @@ __weak struct cpufreq_governor *cpufreq_default_governor(void)
 
 static int cpufreq_init_policy(struct cpufreq_policy *policy)
 {
-	struct cpufreq_governor *def_gov = cpufreq_default_governor();
 	struct cpufreq_governor *gov = NULL;
 	unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
 
@@ -1065,8 +1067,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
 		if (gov) {
 			pr_debug("Restoring governor %s for cpu %d\n",
 				 policy->governor->name, policy->cpu);
-		} else if (def_gov) {
-			gov = def_gov;
+		} else if (default_governor) {
+			gov = default_governor;
 		} else {
 			return -ENODATA;
 		}
@@ -1074,8 +1076,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
 		/* Use the default policy if there is no last_policy. */
 		if (policy->last_policy) {
 			pol = policy->last_policy;
-		} else if (def_gov) {
-			pol = cpufreq_parse_policy(def_gov->name);
+		} else if (default_governor) {
+			pol = cpufreq_parse_policy(default_governor->name);
 			/*
 			 * In case the default governor is neiter "performance"
 			 * nor "powersave", fall back to the initial policy
@@ -2196,6 +2198,24 @@ __weak struct cpufreq_governor *cpufreq_fallback_governor(void)
 	return NULL;
 }
 
+static void cpufreq_get_default_governor(void)
+{
+	default_governor = cpufreq_parse_governor(cpufreq_param_governor);
+	if (!default_governor) {
+		if (*cpufreq_param_governor)
+			pr_warn("Failed to find %s\n", cpufreq_param_governor);
+		default_governor = cpufreq_default_governor();
+	}
+}
+
+static void cpufreq_put_default_governor(void)
+{
+	if (!default_governor)
+		return;
+	module_put(default_governor->owner);
+	default_governor = NULL;
+}
+
 static int cpufreq_init_governor(struct cpufreq_policy *policy)
 {
 	int ret;
@@ -2701,6 +2721,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
 
 	if (driver_data->setpolicy)
 		driver_data->flags |= CPUFREQ_CONST_LOOPS;
+	else
+		cpufreq_get_default_governor();
 
 	if (cpufreq_boost_supported()) {
 		ret = create_boost_sysfs_file();
@@ -2769,6 +2791,7 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)
 	subsys_interface_unregister(&cpufreq_interface);
 	remove_boost_sysfs_file();
 	cpuhp_remove_state_nocalls_cpuslocked(hp_online);
+	cpufreq_put_default_governor();
 
 	write_lock_irqsave(&cpufreq_driver_lock, flags);
 
@@ -2792,4 +2815,5 @@ static int __init cpufreq_core_init(void)
 	return 0;
 }
 module_param(off, int, 0444);
+module_param_string(default_governor, cpufreq_param_governor, CPUFREQ_NAME_LEN, 0444);
 core_initcall(cpufreq_core_init);
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH 1/2] cpufreq: Register governors at core_initcall
From: Quentin Perret @ 2020-06-15 16:55 UTC (permalink / raw)
  To: rjw, rafael, viresh.kumar
  Cc: juri.lelli, kernel-team, vincent.guittot, arnd, linux-pm, peterz,
	adharmap, qperret, linux-kernel, mingo, paulus, linuxppc-dev,
	tkjos
In-Reply-To: <20200615165554.228063-1-qperret@google.com>

Currently, most CPUFreq governors are registered at core_initcall time
when used as default, and module_init otherwise. In preparation for
letting users specify the default governor on the kernel command line,
change all of them to use core_initcall unconditionally, as is already
the case for schedutil and performance. This will enable us to assume
builtin governors have been registered before the builtin CPUFreq
drivers probe.

And since all governors now have similar init/exit patterns, introduce
two new macros cpufreq_governor_{init,exit}() to factorize the code.

Signed-off-by: Quentin Perret <qperret@google.com>
---
Note: I couldn't boot-test the change to spudemand, by lack of hardware.
But I can confirm cell_defconfig compiles just fine.
---
 .../platforms/cell/cpufreq_spudemand.c        | 26 ++-----------------
 drivers/cpufreq/cpufreq_conservative.c        | 22 ++++------------
 drivers/cpufreq/cpufreq_ondemand.c            | 24 +++++------------
 drivers/cpufreq/cpufreq_performance.c         | 14 ++--------
 drivers/cpufreq/cpufreq_powersave.c           | 18 +++----------
 drivers/cpufreq/cpufreq_userspace.c           | 18 +++----------
 include/linux/cpufreq.h                       | 14 ++++++++++
 kernel/sched/cpufreq_schedutil.c              |  6 +----
 8 files changed, 36 insertions(+), 106 deletions(-)

diff --git a/arch/powerpc/platforms/cell/cpufreq_spudemand.c b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
index 55b31eadb3c8..ca7849e113d7 100644
--- a/arch/powerpc/platforms/cell/cpufreq_spudemand.c
+++ b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
@@ -126,30 +126,8 @@ static struct cpufreq_governor spu_governor = {
 	.stop = spu_gov_stop,
 	.owner = THIS_MODULE,
 };
-
-/*
- * module init and destoy
- */
-
-static int __init spu_gov_init(void)
-{
-	int ret;
-
-	ret = cpufreq_register_governor(&spu_governor);
-	if (ret)
-		printk(KERN_ERR "registration of governor failed\n");
-	return ret;
-}
-
-static void __exit spu_gov_exit(void)
-{
-	cpufreq_unregister_governor(&spu_governor);
-}
-
-
-module_init(spu_gov_init);
-module_exit(spu_gov_exit);
+cpufreq_governor_init(spu_governor);
+cpufreq_governor_exit(spu_governor);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Christian Krafft <krafft@de.ibm.com>");
-
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 737ff3b9c2c0..aa39ff31ec9f 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -322,17 +322,7 @@ static struct dbs_governor cs_governor = {
 	.start = cs_start,
 };
 
-#define CPU_FREQ_GOV_CONSERVATIVE	(&cs_governor.gov)
-
-static int __init cpufreq_gov_dbs_init(void)
-{
-	return cpufreq_register_governor(CPU_FREQ_GOV_CONSERVATIVE);
-}
-
-static void __exit cpufreq_gov_dbs_exit(void)
-{
-	cpufreq_unregister_governor(CPU_FREQ_GOV_CONSERVATIVE);
-}
+#define CPU_FREQ_GOV_CONSERVATIVE	(cs_governor.gov)
 
 MODULE_AUTHOR("Alexander Clouter <alex@digriz.org.uk>");
 MODULE_DESCRIPTION("'cpufreq_conservative' - A dynamic cpufreq governor for "
@@ -343,11 +333,9 @@ MODULE_LICENSE("GPL");
 #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
 struct cpufreq_governor *cpufreq_default_governor(void)
 {
-	return CPU_FREQ_GOV_CONSERVATIVE;
+	return &CPU_FREQ_GOV_CONSERVATIVE;
 }
-
-core_initcall(cpufreq_gov_dbs_init);
-#else
-module_init(cpufreq_gov_dbs_init);
 #endif
-module_exit(cpufreq_gov_dbs_exit);
+
+cpufreq_governor_init(CPU_FREQ_GOV_CONSERVATIVE);
+cpufreq_governor_exit(CPU_FREQ_GOV_CONSERVATIVE);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 82a4d37ddecb..ac361a8b1d3b 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -408,7 +408,7 @@ static struct dbs_governor od_dbs_gov = {
 	.start = od_start,
 };
 
-#define CPU_FREQ_GOV_ONDEMAND	(&od_dbs_gov.gov)
+#define CPU_FREQ_GOV_ONDEMAND	(od_dbs_gov.gov)
 
 static void od_set_powersave_bias(unsigned int powersave_bias)
 {
@@ -429,7 +429,7 @@ static void od_set_powersave_bias(unsigned int powersave_bias)
 			continue;
 
 		policy = cpufreq_cpu_get_raw(cpu);
-		if (!policy || policy->governor != CPU_FREQ_GOV_ONDEMAND)
+		if (!policy || policy->governor != &CPU_FREQ_GOV_ONDEMAND)
 			continue;
 
 		policy_dbs = policy->governor_data;
@@ -461,16 +461,6 @@ void od_unregister_powersave_bias_handler(void)
 }
 EXPORT_SYMBOL_GPL(od_unregister_powersave_bias_handler);
 
-static int __init cpufreq_gov_dbs_init(void)
-{
-	return cpufreq_register_governor(CPU_FREQ_GOV_ONDEMAND);
-}
-
-static void __exit cpufreq_gov_dbs_exit(void)
-{
-	cpufreq_unregister_governor(CPU_FREQ_GOV_ONDEMAND);
-}
-
 MODULE_AUTHOR("Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>");
 MODULE_AUTHOR("Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>");
 MODULE_DESCRIPTION("'cpufreq_ondemand' - A dynamic cpufreq governor for "
@@ -480,11 +470,9 @@ MODULE_LICENSE("GPL");
 #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
 struct cpufreq_governor *cpufreq_default_governor(void)
 {
-	return CPU_FREQ_GOV_ONDEMAND;
+	return &CPU_FREQ_GOV_ONDEMAND;
 }
-
-core_initcall(cpufreq_gov_dbs_init);
-#else
-module_init(cpufreq_gov_dbs_init);
 #endif
-module_exit(cpufreq_gov_dbs_exit);
+
+cpufreq_governor_init(CPU_FREQ_GOV_ONDEMAND);
+cpufreq_governor_exit(CPU_FREQ_GOV_ONDEMAND);
diff --git a/drivers/cpufreq/cpufreq_performance.c b/drivers/cpufreq/cpufreq_performance.c
index def9afe0f5b8..71c1d9aba772 100644
--- a/drivers/cpufreq/cpufreq_performance.c
+++ b/drivers/cpufreq/cpufreq_performance.c
@@ -23,16 +23,6 @@ static struct cpufreq_governor cpufreq_gov_performance = {
 	.limits		= cpufreq_gov_performance_limits,
 };
 
-static int __init cpufreq_gov_performance_init(void)
-{
-	return cpufreq_register_governor(&cpufreq_gov_performance);
-}
-
-static void __exit cpufreq_gov_performance_exit(void)
-{
-	cpufreq_unregister_governor(&cpufreq_gov_performance);
-}
-
 #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE
 struct cpufreq_governor *cpufreq_default_governor(void)
 {
@@ -50,5 +40,5 @@ MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
 MODULE_DESCRIPTION("CPUfreq policy governor 'performance'");
 MODULE_LICENSE("GPL");
 
-core_initcall(cpufreq_gov_performance_init);
-module_exit(cpufreq_gov_performance_exit);
+cpufreq_governor_init(cpufreq_gov_performance);
+cpufreq_governor_exit(cpufreq_gov_performance);
diff --git a/drivers/cpufreq/cpufreq_powersave.c b/drivers/cpufreq/cpufreq_powersave.c
index 1ae66019eb83..7749522355b5 100644
--- a/drivers/cpufreq/cpufreq_powersave.c
+++ b/drivers/cpufreq/cpufreq_powersave.c
@@ -23,16 +23,6 @@ static struct cpufreq_governor cpufreq_gov_powersave = {
 	.owner		= THIS_MODULE,
 };
 
-static int __init cpufreq_gov_powersave_init(void)
-{
-	return cpufreq_register_governor(&cpufreq_gov_powersave);
-}
-
-static void __exit cpufreq_gov_powersave_exit(void)
-{
-	cpufreq_unregister_governor(&cpufreq_gov_powersave);
-}
-
 MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
 MODULE_DESCRIPTION("CPUfreq policy governor 'powersave'");
 MODULE_LICENSE("GPL");
@@ -42,9 +32,7 @@ struct cpufreq_governor *cpufreq_default_governor(void)
 {
 	return &cpufreq_gov_powersave;
 }
-
-core_initcall(cpufreq_gov_powersave_init);
-#else
-module_init(cpufreq_gov_powersave_init);
 #endif
-module_exit(cpufreq_gov_powersave_exit);
+
+cpufreq_governor_init(cpufreq_gov_powersave);
+cpufreq_governor_exit(cpufreq_gov_powersave);
diff --git a/drivers/cpufreq/cpufreq_userspace.c b/drivers/cpufreq/cpufreq_userspace.c
index b43e7cd502c5..50a4d7846580 100644
--- a/drivers/cpufreq/cpufreq_userspace.c
+++ b/drivers/cpufreq/cpufreq_userspace.c
@@ -126,16 +126,6 @@ static struct cpufreq_governor cpufreq_gov_userspace = {
 	.owner		= THIS_MODULE,
 };
 
-static int __init cpufreq_gov_userspace_init(void)
-{
-	return cpufreq_register_governor(&cpufreq_gov_userspace);
-}
-
-static void __exit cpufreq_gov_userspace_exit(void)
-{
-	cpufreq_unregister_governor(&cpufreq_gov_userspace);
-}
-
 MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>, "
 		"Russell King <rmk@arm.linux.org.uk>");
 MODULE_DESCRIPTION("CPUfreq policy governor 'userspace'");
@@ -146,9 +136,7 @@ struct cpufreq_governor *cpufreq_default_governor(void)
 {
 	return &cpufreq_gov_userspace;
 }
-
-core_initcall(cpufreq_gov_userspace_init);
-#else
-module_init(cpufreq_gov_userspace_init);
 #endif
-module_exit(cpufreq_gov_userspace_exit);
+
+cpufreq_governor_init(cpufreq_gov_userspace);
+cpufreq_governor_exit(cpufreq_gov_userspace);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3494f6763597..e62b022cb07e 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -577,6 +577,20 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy);
 int cpufreq_register_governor(struct cpufreq_governor *governor);
 void cpufreq_unregister_governor(struct cpufreq_governor *governor);
 
+#define cpufreq_governor_init(__governor)			\
+static int __init __governor##_init(void)			\
+{								\
+	return cpufreq_register_governor(&__governor);	\
+}								\
+core_initcall(__governor##_init)
+
+#define cpufreq_governor_exit(__governor)			\
+static void __exit __governor##_exit(void)			\
+{								\
+	return cpufreq_unregister_governor(&__governor);	\
+}								\
+module_exit(__governor##_exit)
+
 struct cpufreq_governor *cpufreq_default_governor(void);
 struct cpufreq_governor *cpufreq_fallback_governor(void);
 
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 7fbaee24c824..402a09af9f43 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -909,11 +909,7 @@ struct cpufreq_governor *cpufreq_default_governor(void)
 }
 #endif
 
-static int __init sugov_register(void)
-{
-	return cpufreq_register_governor(&schedutil_gov);
-}
-core_initcall(sugov_register);
+cpufreq_governor_init(schedutil_gov);
 
 #ifdef CONFIG_ENERGY_MODEL
 extern bool sched_energy_update;
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH 0/2] cpufreq: Specify the default governor on command line
From: Quentin Perret @ 2020-06-15 16:55 UTC (permalink / raw)
  To: rjw, rafael, viresh.kumar
  Cc: juri.lelli, kernel-team, vincent.guittot, arnd, linux-pm, peterz,
	adharmap, qperret, linux-kernel, mingo, paulus, linuxppc-dev,
	tkjos

This series enables users of prebuilt kernels (e.g. distro kernels) to
specify their CPUfreq governor of choice using the kernel command line,
instead of having to wait for the system to fully boot to userspace to
switch using the sysfs interface. This is helpful for 2 reasons:
  1. users get to choose the governor that runs during the actual boot;
  2. it simplifies the userspace boot procedure a bit (one less thing to
     worry about).

To enable this, the first patch moves all governor init calls to
core_initcall, to make sure they are registered by the time the drivers
probe. This should be relatively low impact as registering a governor
is a simple procedure (it gets added to a llist), and all governors
already load at core_initcall anyway when they're set as the default
in Kconfig. This also allows to clean-up the governors' init/exit code,
and reduces boilerplate.

The second patch introduces the new command line parameter, inspired by
its cpuidle counterpart. More details can be found in the respective
patch headers.

Feedback is very much welcome :-)

Thanks,
Quentin

Quentin Perret (2):
  cpufreq: Register governors at core_initcall
  cpufreq: Specify default governor on command line

 .../admin-guide/kernel-parameters.txt         |  5 +++
 Documentation/admin-guide/pm/cpufreq.rst      |  6 ++--
 .../platforms/cell/cpufreq_spudemand.c        | 26 ++------------
 drivers/cpufreq/cpufreq.c                     | 34 ++++++++++++++++---
 drivers/cpufreq/cpufreq_conservative.c        | 22 +++---------
 drivers/cpufreq/cpufreq_ondemand.c            | 24 ++++---------
 drivers/cpufreq/cpufreq_performance.c         | 14 ++------
 drivers/cpufreq/cpufreq_powersave.c           | 18 ++--------
 drivers/cpufreq/cpufreq_userspace.c           | 18 ++--------
 include/linux/cpufreq.h                       | 14 ++++++++
 kernel/sched/cpufreq_schedutil.c              |  6 +---
 11 files changed, 73 insertions(+), 114 deletions(-)

-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply

* [PATCH 17/25] mm/powerpc: Use mm_fault_accounting()
From: Peter Xu @ 2020-06-15 22:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrea Arcangeli, linuxppc-dev, peterx, Paul Mackerras,
	Andrew Morton, Linus Torvalds, Gerald Schaefer
In-Reply-To: <20200615221607.7764-1-peterx@redhat.com>

Use the new mm_fault_accounting() helper for page fault accounting.

cmo_account_page_fault() is special.  Keep that.

CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/powerpc/mm/fault.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 84af6c8eecf7..6043b639ae42 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -481,8 +481,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
 
-	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
-
 	if (error_code & DSISR_KEYFAULT)
 		return bad_key_fault_exception(regs, address,
 					       get_mm_addr_key(mm, address));
@@ -604,14 +602,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	/*
 	 * Major/minor page fault accounting.
 	 */
-	if (major) {
-		current->maj_flt++;
-		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+	if (major)
 		cmo_account_page_fault();
-	} else {
-		current->min_flt++;
-		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-	}
+
+	mm_fault_accounting(current, regs, address, major);
+
 	return 0;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
-- 
2.26.2


^ permalink raw reply related

* [Bug 208197] OF: /pci@f2000000/mac-io@17/gpio@50/...: could not find phandle
From: bugzilla-daemon @ 2020-06-15 22:03 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-208197-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=208197

--- Comment #1 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 289687
  --> https://bugzilla.kernel.org/attachment.cgi?id=289687&action=edit
kernel .config (5.8-rc1, PowerMac G4 DP)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* [Bug 208197] New: OF: /pci@f2000000/mac-io@17/gpio@50/...: could not find phandle
From: bugzilla-daemon @ 2020-06-15 22:02 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=208197

            Bug ID: 208197
           Summary: OF: /pci@f2000000/mac-io@17/gpio@50/...: could not
                    find phandle
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 5.8-rc1
          Hardware: PPC-32
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-32
          Assignee: platform_ppc-32@kernel-bugs.osdl.org
          Reporter: erhard_f@mailbox.org
        Regression: Yes

Created attachment 289685
  --> https://bugzilla.kernel.org/attachment.cgi?id=289685&action=edit
dmesg (5.8-rc1, PowerMac G4 DP)

Since 5.8-rc1 I get these OF errors on my PowerMac G4 DP:

[...]
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/gpio5@6f: could not find
phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/extint-gpio15@67: could not
find phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/gpio6@70: could not find
phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/extint-gpio16@68: could not
find phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/extint-gpio14@66: could not
find phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/gpio12@76: could not find
phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/gpio11@75: could not find
phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/gpio5@6f: could not find
phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/gpio6@70: could not find
phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/extint-gpio4@5c: could not
find phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/gpio11@75: could not find
phandle
T600 kernel: OF: /pci@f2000000/mac-io@17/gpio@50/extint-gpio15@67: could not
find phandle[...]

This has not been the case in kernel 5.7 or earlier.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Peter Zijlstra @ 2020-06-15 21:53 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615201735.GE13792@romley-ivt3.sc.intel.com>

On Mon, Jun 15, 2020 at 01:17:35PM -0700, Fenghua Yu wrote:
> Hi, Peter,
> 
> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> > 
> > > Or do you suggest to add a random new flag in struct thread_info instead
> > > of a TIF flag?
> > 
> > Why thread_info? What's wrong with something simple like the below. It
> > takes a bit from the 'strictly current' flags word.
> > 
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b62e6aaf28f0..fca830b97055 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -801,6 +801,9 @@ struct task_struct {
> >  	/* Stalled due to lack of memory */
> >  	unsigned			in_memstall:1;
> >  #endif
> > +#ifdef CONFIG_PCI_PASID
> > +	unsigned			has_valid_pasid:1;
> > +#endif
> >  
> >  	unsigned long			atomic_flags; /* Flags requiring atomic access. */
> >  
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 142b23645d82..10b3891be99e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> >  	tsk->use_memdelay = 0;
> >  #endif
> >  
> > +#ifdef CONFIG_PCI_PASID
> > +	tsk->has_valid_pasid = 0;
> > +#endif
> > +
> >  #ifdef CONFIG_MEMCG
> >  	tsk->active_memcg = NULL;
> >  #endif
> 
> The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
> Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
> The flag should be cleared in cloned()/forked() and is only set and
> read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
> of x86.
> 
> That's why we think the flag should be in x86 struct thread_info instead
> of in generice struct task_struct.

I don't think anybody really cares, it's just one bit, we have plenty
left.

On x86_64 there's a u32 sized alignment hole in thread_info, also we
don't use the upper 32bit of thread_info::flags, however using those
would still mean you have to use atomic ops, which you really don't
need.



^ permalink raw reply

* RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Luck, Tony @ 2020-06-15 21:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Shankar, Ravi V, Peter Zijlstra, Hansen, Dave, H Peter Anvin,
	Jiang, Dave, Raj, Ashok, Joerg Roedel, x86, amd-gfx, Ingo Molnar,
	Yu, Fenghua, Yu,  Yu-cheng, Andrew Donnellan, Borislav Petkov,
	Mehta, Sohil, Thomas Gleixner, linuxppc-dev, Felix Kuehling,
	linux-kernel, iommu@lists.linux-foundation.org, Pan, Jacob jun,
	Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <E39A5DE2-5615-41FF-9953-4F4C4E8499D8@amacapital.net>

> So what’s the RDMSR for?  Surely you
> have some state somewhere that says “this task has a PASID.”
> Can’t you just make sure that stays in sync with the MSR?  Then, on #GP, if the task already has a PASID, you know the MSR is set.

We have state that says the process ("mm") has been allocated a PASID. But not for each task.

E.g. a process may clone a bunch of tasks, then one of them opens a device that needs
a PASID.   We did internally have patches to go hunt down all those other tasks and
force a PASID onto each. But the code was big and ugly. Also maybe the wrong thing
to do if those threads didn't ever need to access the device.

PeterZ suggested that we can add a bit to the task structure for this purpose.

Fenghua is hesitant about adding an x86 only bit there.

-Tony

^ permalink raw reply

* Re: [PATCH v13 2/6] seq_buf: Export seq_buf_printf
From: Steven Rostedt @ 2020-06-15 21:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Santosh Sivaraj, Ira Weiny, linux-nvdimm, Aneesh Kumar K . V,
	Cezary Rojewski, linux-kernel, Piotr Maziarz, Christoph Hellwig,
	Oliver O'Halloran, Vaibhav Jain, Dan Williams, linuxppc-dev
In-Reply-To: <20200615125552.GI14668@zn.tnic>

On Mon, 15 Jun 2020 14:55:52 +0200
Borislav Petkov <bp@alien8.de> wrote:

> Can you please resend your patchset once a week like everyone else and
> not flood inboxes with it?

Boris,

Haven't you automated your inbox yet? I have patchwork reading my INBOX
and it's smart enough to understand new series, and the old one simply
disappears as "superseded".

Email is just the wires to our infrastructure. Get with the times man! ;-)

-- Steve

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Andy Lutomirski @ 2020-06-15 21:18 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Shankar, Ravi V, Peter Zijlstra, Hansen, Dave, H Peter Anvin,
	Jiang, Dave, Raj, Ashok, Joerg Roedel, x86, amd-gfx, Ingo Molnar,
	Yu, Fenghua, Yu, Yu-cheng, Andrew Donnellan, Borislav Petkov,
	Mehta, Sohil, Thomas Gleixner, linuxppc-dev, Felix Kuehling,
	linux-kernel, iommu@lists.linux-foundation.org, Pan, Jacob jun,
	Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F7F66C849@ORSMSX115.amr.corp.intel.com>



> On Jun 15, 2020, at 1:56 PM, Luck, Tony <tony.luck@intel.com> wrote:
> 
> 
>> 
>> Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily?  If the latter, a percpu variable might be better.
> 
> Current plan is "touch it once and the task owns it until exit(2)"
> 
> Maybe someday in the future when we have data on how applications
> actually use accelerators we could look at something more complex
> if usage patterns look like it would be beneficial.
> 
> 

So what’s the RDMSR for?  Surely you
have some state somewhere that says “this task has a PASID.”  Can’t you just make sure that stays in sync with the MSR?  Then, on #GP, if the task already has a PASID, you know the MSR is set.

^ permalink raw reply

* RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Luck, Tony @ 2020-06-15 20:56 UTC (permalink / raw)
  To: Andy Lutomirski, Yu, Fenghua
  Cc: Peter Zijlstra, Hansen, Dave, H Peter Anvin, Jiang, Dave,
	Raj, Ashok, Joerg Roedel, x86, amd-gfx, Ingo Molnar,
	Shankar, Ravi V, Yu,  Yu-cheng, Andrew Donnellan, Borislav Petkov,
	Mehta, Sohil, Thomas Gleixner, linuxppc-dev, Felix Kuehling,
	linux-kernel, iommu@lists.linux-foundation.org, Pan, Jacob jun,
	Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <C41D099B-BE2E-4D0E-A7B5-7CE587E11930@amacapital.net>

> Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily?  If the latter, a percpu variable might be better.

Current plan is "touch it once and the task owns it until exit(2)"

Maybe someday in the future when we have data on how applications
actually use accelerators we could look at something more complex
if usage patterns look like it would be beneficial.

-Tony

^ permalink raw reply

* [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
From: bugzilla-daemon @ 2020-06-15 20:53 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-206203-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=206203

Erhard F. (erhard_f@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #287675|0                           |1
        is obsolete|                            |

--- Comment #15 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 289679
  --> https://bugzilla.kernel.org/attachment.cgi?id=289679&action=edit
kernel .config (kernel 5.8-rc1, PowerMac G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Andy Lutomirski @ 2020-06-15 20:51 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Peter Zijlstra, Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj,
	Joerg Roedel, x86, amd-gfx, Ingo Molnar, Ravi V Shankar,
	Yu-cheng Yu, Andrew Donnellan, Borislav Petkov, Sohil Mehta,
	Thomas Gleixner, Tony Luck, linuxppc-dev, Felix Kuehling,
	linux-kernel, iommu, Jacob Jun Pan, Frederic Barrat,
	David Woodhouse, Lu Baolu
In-Reply-To: <20200615201735.GE13792@romley-ivt3.sc.intel.com>


> On Jun 15, 2020, at 1:17 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> Hi, Peter,
> 
>> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
>>> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
>>> 
>>> Or do you suggest to add a random new flag in struct thread_info instead
>>> of a TIF flag?
>> 
>> Why thread_info? What's wrong with something simple like the below. It
>> takes a bit from the 'strictly current' flags word.
>> 
>> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index b62e6aaf28f0..fca830b97055 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -801,6 +801,9 @@ struct task_struct {
>>    /* Stalled due to lack of memory */
>>    unsigned            in_memstall:1;
>> #endif
>> +#ifdef CONFIG_PCI_PASID
>> +    unsigned            has_valid_pasid:1;
>> +#endif
>> 
>>    unsigned long            atomic_flags; /* Flags requiring atomic access. */
>> 
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 142b23645d82..10b3891be99e 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>>    tsk->use_memdelay = 0;
>> #endif
>> 
>> +#ifdef CONFIG_PCI_PASID
>> +    tsk->has_valid_pasid = 0;
>> +#endif
>> +
>> #ifdef CONFIG_MEMCG
>>    tsk->active_memcg = NULL;
>> #endif
> 
> The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
> Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
> The flag should be cleared in cloned()/forked() and is only set and
> read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
> of x86.
> 
> That's why we think the flag should be in x86 struct thread_info instead
> of in generice struct task_struct.
> 

Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily?  If the latter, a percpu variable might be better.

> Please advice.
> 
> Thanks.
> 
> -Fenghua

^ permalink raw reply

* [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
From: bugzilla-daemon @ 2020-06-15 20:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-206203-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=206203

Erhard F. (erhard_f@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #287673|0                           |1
        is obsolete|                            |
 Attachment #288567|0                           |1
        is obsolete|                            |

--- Comment #14 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 289677
  --> https://bugzilla.kernel.org/attachment.cgi?id=289677&action=edit
dmesg (kernel 5.8-rc1, PowerMac G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
From: bugzilla-daemon @ 2020-06-15 20:43 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-206203-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=206203

Erhard F. (erhard_f@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #287671|0                           |1
        is obsolete|                            |
 Attachment #288565|0                           |1
        is obsolete|                            |

--- Comment #13 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 289675
  --> https://bugzilla.kernel.org/attachment.cgi?id=289675&action=edit
kmemleak output (kernel 5.8-rc1, PowerMac G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [PATCH net v2] ibmvnic: Harden device login requests
From: David Miller @ 2020-06-15 20:18 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev, linuxppc-dev, danymadden
In-Reply-To: <1592234963-9535-1-git-send-email-tlfalcon@linux.ibm.com>

From: Thomas Falcon <tlfalcon@linux.ibm.com>
Date: Mon, 15 Jun 2020 10:29:23 -0500

> The VNIC driver's "login" command sequence is the final step
> in the driver's initialization process with device firmware,
> confirming the available device queue resources to be utilized
> by the driver. Under high system load, firmware may not respond
> to the request in a timely manner or may abort the request. In
> such cases, the driver should reattempt the login command
> sequence. In case of a device error, the number of retries
> is bounded.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
> ---
> v2: declare variables in Reverse Christmas tree format

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Fenghua Yu @ 2020-06-15 20:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615190928.GJ2531@hirez.programming.kicks-ass.net>

Hi, Peter,

On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> 
> > Or do you suggest to add a random new flag in struct thread_info instead
> > of a TIF flag?
> 
> Why thread_info? What's wrong with something simple like the below. It
> takes a bit from the 'strictly current' flags word.
> 
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..fca830b97055 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -801,6 +801,9 @@ struct task_struct {
>  	/* Stalled due to lack of memory */
>  	unsigned			in_memstall:1;
>  #endif
> +#ifdef CONFIG_PCI_PASID
> +	unsigned			has_valid_pasid:1;
> +#endif
>  
>  	unsigned long			atomic_flags; /* Flags requiring atomic access. */
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 142b23645d82..10b3891be99e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	tsk->use_memdelay = 0;
>  #endif
>  
> +#ifdef CONFIG_PCI_PASID
> +	tsk->has_valid_pasid = 0;
> +#endif
> +
>  #ifdef CONFIG_MEMCG
>  	tsk->active_memcg = NULL;
>  #endif

The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
The flag should be cleared in cloned()/forked() and is only set and
read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
of x86.

That's why we think the flag should be in x86 struct thread_info instead
of in generice struct task_struct.

Please advice.

Thanks.

-Fenghua

^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Brian Gerst @ 2020-06-15 19:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Christoph Hellwig, Linux ARM
In-Reply-To: <CAK8P3a3q-hC7kZwLPbc+E5VYcqthQS4J4Rqo+rKkBRU2n071XA@mail.gmail.com>

On Mon, Jun 15, 2020 at 2:47 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Jun 15, 2020 at 4:48 PM Brian Gerst <brgerst@gmail.com> wrote:
> > On Mon, Jun 15, 2020 at 10:13 AM Christoph Hellwig <hch@lst.de> wrote:
> > > On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote:
>
> > >
> > > I'd rather keep it in common code as that allows all the low-level
> > > exec stuff to be marked static, and avoid us growing new pointless
> > > compat variants through copy and paste.
> > > smart compiler to d
> > >
> > > > I don't really understand
> > > > the comment, why can't this just use this?
> > >
> > > That errors out with:
> > >
> > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to
> > > `__x32_sys_execve'
> > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to
> > > `__x32_sys_execveat'
> > > make: *** [Makefile:1139: vmlinux] Error 1
> >
> > I think I have a fix for this, by modifying the syscall wrappers to
> > add an alias for the __x32 variant to the native __x64_sys_foo().
> > I'll get back to you with a patch.
>
> Do we actually need the __x32 prefix any more, or could we just
> change all x32 specific calls to use __x64_compat_sys_foo()?

I suppose that would work too.  The prefix really describes the
register mapping.

--
Brian Gerst

^ permalink raw reply

* Re: [PATCH v13 2/6] seq_buf: Export seq_buf_printf
From: Dan Williams @ 2020-06-15 19:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Santosh Sivaraj, Ira Weiny, linux-nvdimm, Cezary Rojewski,
	Linux Kernel Mailing List, Steven Rostedt, Piotr Maziarz,
	Christoph Hellwig, Oliver O'Halloran, Aneesh Kumar K . V,
	Vaibhav Jain, linuxppc-dev
In-Reply-To: <20200615125552.GI14668@zn.tnic>

On Mon, Jun 15, 2020 at 5:56 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Jun 15, 2020 at 06:14:03PM +0530, Vaibhav Jain wrote:
> > 'seq_buf' provides a very useful abstraction for writing to a string
> > buffer without needing to worry about it over-flowing. However even
> > though the API has been stable for couple of years now its still not
> > exported to kernel loadable modules limiting its usage.
> >
> > Hence this patch proposes update to 'seq_buf.c' to mark
> > seq_buf_printf() which is part of the seq_buf API to be exported to
> > kernel loadable GPL modules. This symbol will be used in later parts
> > of this patch-set to simplify content creation for a sysfs attribute.
> >
> > Cc: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
> > Cc: Cezary Rojewski <cezary.rojewski@intel.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> > ---
> > Changelog:
> >
> > v12..v13:
> > * None
> >
> > v11..v12:
> > * None
>
> Can you please resend your patchset once a week like everyone else and
> not flood inboxes with it?

Hi Boris,

I gave Vaibhav some long shot hope that his series could be included
in my libnvdimm pull request for -rc1. Save for a last minute clang
report that I misread as a gcc warning, I likely would have included.
This spin is looking to address the last of the comments I had and
something I would consider for -rc2. So, in this case the resends were
requested by me and I'll take the grumbles on Vaibhav's behalf.

^ permalink raw reply

* Re: [PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)
From: Bhupesh Sharma @ 2020-06-15 19:11 UTC (permalink / raw)
  To: linux-arm-kernel, x86, Catalin Marinas, Will Deacon
  Cc: Mark Rutland, John Donnelly, Ard Biesheuvel, Jonathan Corbet,
	Steve Capper, Linux Doc Mailing List, linuxppc-dev,
	Linux Kernel Mailing List, Scott Branden, Kazuhito Hagio,
	James Morse, kexec mailing list, Amit Kachhap, Boris Petkov,
	Thomas Gleixner, Bhupesh SHARMA, Dave Anderson, Ingo Molnar,
	Paul Mackerras
In-Reply-To: <CACi5LpMKSNz=_OQWmEQ2kaswbjAONjn2pXQiu=jCA=wMt3wGCQ@mail.gmail.com>

Hello Catalin, Will,

On Tue, Jun 2, 2020 at 10:54 AM Bhupesh Sharma <bhsharma@redhat.com> wrote:
>
> Hello,
>
> On Thu, May 14, 2020 at 12:22 AM Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >
> > Apologies for the delayed update. Its been quite some time since I
> > posted the last version (v5), but I have been really caught up in some
> > other critical issues.
> >
> > Changes since v5:
> > ----------------
> > - v5 can be viewed here:
> >   http://lists.infradead.org/pipermail/kexec/2019-November/024055.html
> > - Addressed review comments from James Morse and Boris.
> > - Added Tested-by received from John on v5 patchset.
> > - Rebased against arm64 (for-next/ptr-auth) branch which has Amit's
> >   patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo
> >   applied.
> >
> > Changes since v4:
> > ----------------
> > - v4 can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-November/023961.html
> > - Addressed comments from Dave and added patches for documenting
> >   new variables appended to vmcoreinfo documentation.
> > - Added testing report shared by Akashi for PATCH 2/5.
> >
> > Changes since v3:
> > ----------------
> > - v3 can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-March/022590.html
> > - Addressed comments from James and exported TCR_EL1.T1SZ in vmcoreinfo
> >   instead of PTRS_PER_PGD.
> > - Added a new patch (via [PATCH 3/3]), which fixes a simple typo in
> >   'Documentation/arm64/memory.rst'
> >
> > Changes since v2:
> > ----------------
> > - v2 can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-March/022531.html
> > - Protected 'MAX_PHYSMEM_BITS' vmcoreinfo variable under CONFIG_SPARSEMEM
> >   ifdef sections, as suggested by Kazu.
> > - Updated vmcoreinfo documentation to add description about
> >   'MAX_PHYSMEM_BITS' variable (via [PATCH 3/3]).
> >
> > Changes since v1:
> > ----------------
> > - v1 was sent out as a single patch which can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-February/022411.html
> >
> > - v2 breaks the single patch into two independent patches:
> >   [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas
> >   [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code (all archs)
> >
> > This patchset primarily fixes the regression reported in user-space
> > utilities like 'makedumpfile' and 'crash-utility' on arm64 architecture
> > with the availability of 52-bit address space feature in underlying
> > kernel. These regressions have been reported both on CPUs which don't
> > support ARMv8.2 extensions (i.e. LVA, LPA) and are running newer kernels
> > and also on prototype platforms (like ARMv8 FVP simulator model) which
> > support ARMv8.2 extensions and are running newer kernels.
> >
> > The reason for these regressions is that right now user-space tools
> > have no direct access to these values (since these are not exported
> > from the kernel) and hence need to rely on a best-guess method of
> > determining value of 'vabits_actual' and 'MAX_PHYSMEM_BITS' supported
> > by underlying kernel.
> >
> > Exporting these values via vmcoreinfo will help user-land in such cases.
> > In addition, as per suggestion from makedumpfile maintainer (Kazu),
> > it makes more sense to append 'MAX_PHYSMEM_BITS' to
> > vmcoreinfo in the core code itself rather than in arm64 arch-specific
> > code, so that the user-space code for other archs can also benefit from
> > this addition to the vmcoreinfo and use it as a standard way of
> > determining 'SECTIONS_SHIFT' value in user-land.
> >
> > Cc: Boris Petkov <bp@alien8.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Steve Capper <steve.capper@arm.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Dave Anderson <anderson@redhat.com>
> > Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> > Cc: John Donnelly <john.p.donnelly@oracle.com>
> > Cc: scott.branden@broadcom.com
> > Cc: Amit Kachhap <amit.kachhap@arm.com>
> > Cc: x86@kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: kexec@lists.infradead.org
> >
> > Bhupesh Sharma (2):
> >   crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
> >   arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
> >
> >  Documentation/admin-guide/kdump/vmcoreinfo.rst | 16 ++++++++++++++++
> >  arch/arm64/include/asm/pgtable-hwdef.h         |  1 +
> >  arch/arm64/kernel/crash_core.c                 | 10 ++++++++++
> >  kernel/crash_core.c                            |  1 +
> >  4 files changed, 28 insertions(+)
>
> Ping. @James Morse , Others
>
> Please share if you have some comments regarding this patchset.

Ping. I think we have two Tested-by available from Oracle and Marvell
folks on this patchset and no further review-comments.
Can you please help review/pick this patchset via the arm64 tree?

User-space utilities like makedumpfile and crash have been broken
since 52-bit VA addressing was enabled on arm64 kernel, so distros are
obliged to carry downstream-only fixes for these user-space utilities
to make them work with the kernel which support 52-bit VA addressing
on arm64.

Thanks,
Bhupesh


^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Peter Zijlstra @ 2020-06-15 19:09 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615185529.GD13792@romley-ivt3.sc.intel.com>

On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:

> Or do you suggest to add a random new flag in struct thread_info instead
> of a TIF flag?

Why thread_info? What's wrong with something simple like the below. It
takes a bit from the 'strictly current' flags word.


diff --git a/include/linux/sched.h b/include/linux/sched.h
index b62e6aaf28f0..fca830b97055 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -801,6 +801,9 @@ struct task_struct {
 	/* Stalled due to lack of memory */
 	unsigned			in_memstall:1;
 #endif
+#ifdef CONFIG_PCI_PASID
+	unsigned			has_valid_pasid:1;
+#endif
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..10b3891be99e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_PCI_PASID
+	tsk->has_valid_pasid = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
 	tsk->active_memcg = NULL;
 #endif

^ permalink raw reply related

* Re: [PATCH 00/17] spelling.txt: /decriptors/descriptors/
From: Jason Gunthorpe @ 2020-06-15 18:58 UTC (permalink / raw)
  To: Kieran Bingham
  Cc: linux-scsi, linux-pm, linux-rdma, netdev, linux-usb,
	linux-wireless, Kieran Bingham, dri-devel, linux-kernel,
	linux-renesas-soc, linux-gpio, linux-mtd, ath10k, linux-input,
	virtualization, linuxppc-dev, linux-mm, linux-arm-kernel
In-Reply-To: <20200609124610.3445662-1-kieran.bingham+renesas@ideasonboard.com>

On Tue, Jun 09, 2020 at 01:45:53PM +0100, Kieran Bingham wrote:
> I wouldn't normally go through spelling fixes, but I caught sight of
> this typo twice, and then foolishly grepped the tree for it, and saw how
> pervasive it was.
> 
> so here I am ... fixing a typo globally... but with an addition in
> scripts/spelling.txt so it shouldn't re-appear ;-)
> 
> Cc: linux-arm-kernel@lists.infradead.org (moderated list:TI DAVINCI MACHINE SUPPORT)
> Cc: linux-kernel@vger.kernel.org (open list)
> Cc: linux-pm@vger.kernel.org (open list:DEVICE FREQUENCY EVENT (DEVFREQ-EVENT))
> Cc: linux-gpio@vger.kernel.org (open list:GPIO SUBSYSTEM)
> Cc: dri-devel@lists.freedesktop.org (open list:DRM DRIVERS)
> Cc: linux-rdma@vger.kernel.org (open list:HFI1 DRIVER)
> Cc: linux-input@vger.kernel.org (open list:INPUT (KEYBOARD, MOUSE, JOYSTICK, TOUCHSCREEN)...)
> Cc: linux-mtd@lists.infradead.org (open list:NAND FLASH SUBSYSTEM)
> Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
> Cc: ath10k@lists.infradead.org (open list:QUALCOMM ATHEROS ATH10K WIRELESS DRIVER)
> Cc: linux-wireless@vger.kernel.org (open list:NETWORKING DRIVERS (WIRELESS))
> Cc: linux-scsi@vger.kernel.org (open list:IBM Power Virtual FC Device Drivers)
> Cc: linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT))
> Cc: linux-usb@vger.kernel.org (open list:USB SUBSYSTEM)
> Cc: virtualization@lists.linux-foundation.org (open list:VIRTIO CORE AND NET DRIVERS)
> Cc: linux-mm@kvack.org (open list:MEMORY MANAGEMENT)
> 
> 
> Kieran Bingham (17):
>   arch: arm: mach-davinci: Fix trivial spelling
>   drivers: infiniband: Fix trivial spelling
>   drivers: infiniband: Fix trivial spelling

I took these two RDMA patches and merged them, thanks

Jason

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Fenghua Yu @ 2020-06-15 18:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615183116.GD2531@hirez.programming.kicks-ass.net>

Hi, Peter,

On Mon, Jun 15, 2020 at 08:31:16PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote:
> > > I don't get why you need a rdmsr here, or why not having one would
> > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> > 
> > My concern is TIF flags are precious (only 3 slots available). Defining
> > a new TIF flag may be not worth it while rdmsr() can check if PASID
> > is valid in the MSR. And performance here might not be a big issue
> > in #GP.
> > 
> > But if you think using TIF flag is better, I can define a new TIF flag
> > and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
> > Then we can avoid using rdmsr() to check valid PASID in the MSR.
> 
> WHY ?!?! What do you need a TIF flag for?

We need "a way" to check if the per thread MSR has a valid PASID. If yes,
no need to fix up the MSR (wrmsr()), and let other handler to handle the #GP.
Otherwise, apply the heuristics and fix up the MSR and exit the #GP.

The way to check the valid PASID in the MSR is rdmsr() in this series.
A TIF flag will be much faster than rdmsr() and seems a sutiable way
to check valid PASID status per thread. That's why it could replace
rdmsr() to check PASID in the MSR.

Or do you suggest to add a random new flag in struct thread_info instead
of a TIF flag?

Thanks.

-Fenghua


^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Arnd Bergmann @ 2020-06-15 18:47 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Christoph Hellwig, Linux ARM
In-Reply-To: <CAMzpN2heSzZzg16ws3yQkd7YZwmPPx_4RFCpb9JYfFWJ9gfPhA@mail.gmail.com>

On Mon, Jun 15, 2020 at 4:48 PM Brian Gerst <brgerst@gmail.com> wrote:
> On Mon, Jun 15, 2020 at 10:13 AM Christoph Hellwig <hch@lst.de> wrote:
> > On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote:

> >
> > I'd rather keep it in common code as that allows all the low-level
> > exec stuff to be marked static, and avoid us growing new pointless
> > compat variants through copy and paste.
> > smart compiler to d
> >
> > > I don't really understand
> > > the comment, why can't this just use this?
> >
> > That errors out with:
> >
> > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to
> > `__x32_sys_execve'
> > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to
> > `__x32_sys_execveat'
> > make: *** [Makefile:1139: vmlinux] Error 1
>
> I think I have a fix for this, by modifying the syscall wrappers to
> add an alias for the __x32 variant to the native __x64_sys_foo().
> I'll get back to you with a patch.

Do we actually need the __x32 prefix any more, or could we just
change all x32 specific calls to use __x64_compat_sys_foo()?

      Arnd

^ permalink raw reply

* Re: [PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Waiman Long @ 2020-06-15 18:39 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: linux-cifs, linux-wireless, Jarkko Sakkinen, virtualization,
	David Howells, linux-mm, linux-sctp, target-devel, kasan-dev,
	cocci, devel, linux-s390, linux-scsi, x86, James Morris,
	Matthew Wilcox, linux-stm32, intel-wired-lan, David Rientjes,
	Serge E. Hallyn, linux-pm, ecryptfs, linuxppc-dev, linux-fscrypt,
	linux-mediatek, linux-amlogic, linux-btrfs, linux-integrity,
	linux-arm-kernel, linux-nfs, Linus Torvalds, samba-technical,
	linux-kernel, linux-bluetooth, linux-security-module, keyrings,
	tipc-discussion, linux-crypto, netdev, Joe Perches, Andrew Morton,
	linux-wpan, wireguard, linux-ppp
In-Reply-To: <20200615180753.GJ4151@kadam>

On 6/15/20 2:07 PM, Dan Carpenter wrote:
> On Mon, Apr 13, 2020 at 05:15:49PM -0400, Waiman Long wrote:
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 23c7500eea7d..c08bc7eb20bd 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -1707,17 +1707,17 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags)
>>   EXPORT_SYMBOL(krealloc);
>>   
>>   /**
>> - * kzfree - like kfree but zero memory
>> + * kfree_sensitive - Clear sensitive information in memory before freeing
>>    * @p: object to free memory of
>>    *
>>    * The memory of the object @p points to is zeroed before freed.
>> - * If @p is %NULL, kzfree() does nothing.
>> + * If @p is %NULL, kfree_sensitive() does nothing.
>>    *
>>    * Note: this function zeroes the whole allocated buffer which can be a good
>>    * deal bigger than the requested buffer size passed to kmalloc(). So be
>>    * careful when using this function in performance sensitive code.
>>    */
>> -void kzfree(const void *p)
>> +void kfree_sensitive(const void *p)
>>   {
>>   	size_t ks;
>>   	void *mem = (void *)p;
>> @@ -1725,10 +1725,10 @@ void kzfree(const void *p)
>>   	if (unlikely(ZERO_OR_NULL_PTR(mem)))
>>   		return;
>>   	ks = ksize(mem);
>> -	memset(mem, 0, ks);
>> +	memzero_explicit(mem, ks);
>          ^^^^^^^^^^^^^^^^^^^^^^^^^
> This is an unrelated bug fix.  It really needs to be pulled into a
> separate patch by itself and back ported to stable kernels.
>
>>   	kfree(mem);
>>   }
>> -EXPORT_SYMBOL(kzfree);
>> +EXPORT_SYMBOL(kfree_sensitive);
>>   
>>   /**
>>    * ksize - get the actual amount of memory allocated for a given object
> regards,
> dan carpenter
>
Thanks for the suggestion. I will break it out and post a version soon.

Cheers,
Longman


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox