* Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line
From: Rafael J. Wysocki @ 2020-06-25 11:44 UTC (permalink / raw)
To: Viresh Kumar
Cc: Juri Lelli, Cc: Android Kernel, Vincent Guittot, Arnd Bergmann,
Rafael J. Wysocki, Peter Zijlstra, Linux PM, Quentin Perret,
Rafael J. Wysocki, Linux Kernel Mailing List, Ingo Molnar,
Paul Mackerras, linuxppc-dev, adharmap, Todd Kjos
In-Reply-To: <20200625113602.z2xrwebd2gngbww3@vireshk-i7>
On Thu, Jun 25, 2020 at 1:36 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> After your last email (reply to my patch), I noticed a change which
> isn't required. :)
>
> On 23-06-20, 15:21, Quentin Perret wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 0128de3603df..4b1a5c0173cf 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -50,6 +50,9 @@ static LIST_HEAD(cpufreq_governor_list);
> > #define for_each_governor(__governor) \
> > list_for_each_entry(__governor, &cpufreq_governor_list, governor_list)
> >
> > +static char cpufreq_param_governor[CPUFREQ_NAME_LEN];
> > +static struct cpufreq_governor *default_governor;
> > +
> > /**
> > * The "cpufreq driver" - the arch- or hardware-dependent low
> > * level driver of CPUFreq support, and its spinlock. This lock
> > @@ -1055,7 +1058,6 @@ __weak struct cpufreq_governor *cpufreq_default_governor(void)
> >
> > static int cpufreq_init_policy(struct cpufreq_policy *policy)
> > {
> > - struct cpufreq_governor *def_gov = cpufreq_default_governor();
> > struct cpufreq_governor *gov = NULL;
> > unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
> >
> > @@ -1065,8 +1067,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
> > if (gov) {
> > pr_debug("Restoring governor %s for cpu %d\n",
> > policy->governor->name, policy->cpu);
> > - } else if (def_gov) {
> > - gov = def_gov;
> > + } else if (default_governor) {
> > + gov = default_governor;
> > } else {
> > return -ENODATA;
> > }
>
>
> > @@ -1074,8 +1076,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
> > /* Use the default policy if there is no last_policy. */
> > if (policy->last_policy) {
> > pol = policy->last_policy;
> > - } else if (def_gov) {
> > - pol = cpufreq_parse_policy(def_gov->name);
> > + } else if (default_governor) {
> > + pol = cpufreq_parse_policy(default_governor->name);
>
> This change is not right IMO. This part handles the set-policy case,
> where there are no governors. Right now this code, for some reasons
> unknown to me, forcefully uses the default governor set to indicate
> the policy, which is not a great idea in my opinion TBH. This doesn't
> and shouldn't care about governor modules and should only be looking
> at strings instead of governor pointer.
Sounds right.
> Rafael, I even think we should remove this code completely and just
> rely on what the driver has sent to us. Using the selected governor
> for set policy drivers is very confusing and also we shouldn't be
> forced to compiling any governor for the set-policy case.
Well, AFAICS the idea was to use the default governor as a kind of
default policy proxy, but I agree that strings should be sufficient
for that.
I'll have a look at what to do with that code.
^ permalink raw reply
* Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line
From: Viresh Kumar @ 2020-06-25 11:36 UTC (permalink / raw)
To: Quentin Perret
Cc: juri.lelli, kernel-team, vincent.guittot, arnd, rafael, peterz,
adharmap, linux-pm, rjw, linux-kernel, mingo, paulus,
linuxppc-dev, tkjos
In-Reply-To: <20200623142138.209513-3-qperret@google.com>
After your last email (reply to my patch), I noticed a change which
isn't required. :)
On 23-06-20, 15:21, Quentin Perret wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 0128de3603df..4b1a5c0173cf 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -50,6 +50,9 @@ static LIST_HEAD(cpufreq_governor_list);
> #define for_each_governor(__governor) \
> list_for_each_entry(__governor, &cpufreq_governor_list, governor_list)
>
> +static char cpufreq_param_governor[CPUFREQ_NAME_LEN];
> +static struct cpufreq_governor *default_governor;
> +
> /**
> * The "cpufreq driver" - the arch- or hardware-dependent low
> * level driver of CPUFreq support, and its spinlock. This lock
> @@ -1055,7 +1058,6 @@ __weak struct cpufreq_governor *cpufreq_default_governor(void)
>
> static int cpufreq_init_policy(struct cpufreq_policy *policy)
> {
> - struct cpufreq_governor *def_gov = cpufreq_default_governor();
> struct cpufreq_governor *gov = NULL;
> unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
>
> @@ -1065,8 +1067,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
> if (gov) {
> pr_debug("Restoring governor %s for cpu %d\n",
> policy->governor->name, policy->cpu);
> - } else if (def_gov) {
> - gov = def_gov;
> + } else if (default_governor) {
> + gov = default_governor;
> } else {
> return -ENODATA;
> }
> @@ -1074,8 +1076,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
> /* Use the default policy if there is no last_policy. */
> if (policy->last_policy) {
> pol = policy->last_policy;
> - } else if (def_gov) {
> - pol = cpufreq_parse_policy(def_gov->name);
> + } else if (default_governor) {
> + pol = cpufreq_parse_policy(default_governor->name);
This change is not right IMO. This part handles the set-policy case,
where there are no governors. Right now this code, for some reasons
unknown to me, forcefully uses the default governor set to indicate
the policy, which is not a great idea in my opinion TBH. This doesn't
and shouldn't care about governor modules and should only be looking
at strings instead of governor pointer.
Rafael, I even think we should remove this code completely and just
rely on what the driver has sent to us. Using the selected governor
for set policy drivers is very confusing and also we shouldn't be
forced to compiling any governor for the set-policy case.
--
viresh
^ permalink raw reply
* Re: [PATCH v2 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
From: Aneesh Kumar K.V @ 2020-06-25 11:30 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Bharata B Rao
In-Reply-To: <20200625064547.228448-2-aneesh.kumar@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> Fixing this includes 3 parts:
>
> - Re-walk the init_mm page tables from mem_init() and initialize
> the PMD and PTE fragment count to 1.
> - When freeing PUD, PMD and PTE page table pages, check explicitly
> if they come from memblock and if so free then appropriately.
> - When we do early memblock based allocation of PMD and PUD pages,
> allocate in PAGE_SIZE granularity so that we are sure the
> complete page is used as pagetable page.
>
> Since we now do PAGE_SIZE allocations for both PUD table and
> PMD table (Note that PTE table allocation is already of PAGE_SIZE),
> we end up allocating more memory for the same amount of system RAM.
> Here is a comparision of how much more we need for a 64T and 2G
> system after this patch:
>
Missed updating the commit message w.r.t page table fragments. Updated
one below.
powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
We can hit the following BUG_ON during memory unplug:
kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
NIP [c000000000093308] pmd_fragment_free+0x48/0xc0
LR [c00000000147bfec] remove_pagetable+0x578/0x60c
Call Trace:
0xc000008050000000 (unreliable)
remove_pagetable+0x384/0x60c
radix__remove_section_mapping+0x18/0x2c
remove_section_mapping+0x1c/0x3c
arch_remove_memory+0x11c/0x180
try_remove_memory+0x120/0x1b0
__remove_memory+0x20/0x40
dlpar_remove_lmb+0xc0/0x114
dlpar_memory+0x8b0/0xb20
handle_dlpar_errorlog+0xc0/0x190
pseries_hp_work_fn+0x2c/0x60
process_one_work+0x30c/0x810
worker_thread+0x98/0x540
kthread+0x1c4/0x1d0
ret_from_kernel_thread+0x5c/0x74
This occurs when unplug is attempted for such memory which has
been mapped using memblock pages as part of early kernel page
table setup. We wouldn't have initialized the PMD or PTE fragment
count for those PMD or PTE pages.
This can be fixed by allocating memory in PAGE_SIZE granularity
during early page table allocation. This makes sure a specific
page is not shared for another memblock allocation and we can
free them correctly on removing page-table pages.
Since we now do PAGE_SIZE allocations for both PUD table and
PMD table (Note that PTE table allocation is already of PAGE_SIZE),
we end up allocating more memory for the same amount of system RAM.
Here is a comparision of how much more we need for a 64T and 2G
system after this patch:
1. 64T system
-------------
64T RAM would need 64G for vmemmap with struct page size being 64B.
128 PUD tables for 64T memory (1G mappings)
1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)
With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K
2. 2G system
------------
2G RAM would need 2M for vmemmap with struct page size being 64B.
1 PUD table for 2G memory (1G mapping)
1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)
With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
-aneesh
^ permalink raw reply
* Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line
From: Rafael J. Wysocki @ 2020-06-25 10:52 UTC (permalink / raw)
To: Viresh Kumar
Cc: Juri Lelli, Cc: Android Kernel, Vincent Guittot, Arnd Bergmann,
Rafael J. Wysocki, Peter Zijlstra, Linux PM, Quentin Perret,
Rafael J. Wysocki, Linux Kernel Mailing List, Ingo Molnar,
Paul Mackerras, linuxppc-dev, adharmap, Todd Kjos
In-Reply-To: <20200625085052.4ah4wbog3guj74v4@vireshk-i7>
On Thu, Jun 25, 2020 at 10:50 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 24-06-20, 16:32, Quentin Perret wrote:
> > Right, but I must admit that, looking at this more, I'm getting a bit
> > confused with the overall locking for governors :/
> >
> > When in cpufreq_init_policy() we find a governor using
> > find_governor(policy->last_governor), what guarantees this governor is
> > not concurrently unregistered? That is, what guarantees this governor
> > doesn't go away between that find_governor() call, and the subsequent
> > call to try_module_get() in cpufreq_set_policy() down the line?
> >
> > Can we somewhat assume that whatever governor is referred to by
> > policy->last_governor will have a non-null refcount? Or are the
> > cpufreq_online() and cpufreq_unregister_governor() path mutually
> > exclusive? Or is there something else?
>
> This should be sufficient to fix pending issues I believe. Based over your
> patches.
LGTM, but can you post it in a new thread to let Patchwork pick it up?
> -------------------------8<-------------------------
> From: Viresh Kumar <viresh.kumar@linaro.org>
> Date: Thu, 25 Jun 2020 13:15:23 +0530
> Subject: [PATCH] cpufreq: Fix locking issues with governors
>
> The locking around governors handling isn't adequate currently. The list
> of governors should never be traversed without locking in place. Also we
> must make sure the governor isn't removed while it is still referenced
> by code.
>
> Reported-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> drivers/cpufreq/cpufreq.c | 59 ++++++++++++++++++++++++---------------
> 1 file changed, 36 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4b1a5c0173cf..dad6b85f4c89 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -624,6 +624,24 @@ static struct cpufreq_governor *find_governor(const char *str_governor)
> return NULL;
> }
>
> +static struct cpufreq_governor *get_governor(const char *str_governor)
> +{
> + struct cpufreq_governor *t;
> +
> + mutex_lock(&cpufreq_governor_mutex);
> + t = find_governor(str_governor);
> + if (!t)
> + goto unlock;
> +
> + if (!try_module_get(t->owner))
> + t = NULL;
> +
> +unlock:
> + mutex_unlock(&cpufreq_governor_mutex);
> +
> + return t;
> +}
> +
> static unsigned int cpufreq_parse_policy(char *str_governor)
> {
> if (!strncasecmp(str_governor, "performance", CPUFREQ_NAME_LEN))
> @@ -643,28 +661,14 @@ static struct cpufreq_governor *cpufreq_parse_governor(char *str_governor)
> {
> struct cpufreq_governor *t;
>
> - mutex_lock(&cpufreq_governor_mutex);
> -
> - t = find_governor(str_governor);
> - if (!t) {
> - int ret;
> -
> - mutex_unlock(&cpufreq_governor_mutex);
> -
> - ret = request_module("cpufreq_%s", str_governor);
> - if (ret)
> - return NULL;
> -
> - mutex_lock(&cpufreq_governor_mutex);
> + t = get_governor(str_governor);
> + if (t)
> + return t;
>
> - t = find_governor(str_governor);
> - }
> - if (t && !try_module_get(t->owner))
> - t = NULL;
> -
> - mutex_unlock(&cpufreq_governor_mutex);
> + if (request_module("cpufreq_%s", str_governor))
> + return NULL;
>
> - return t;
> + return get_governor(str_governor);
> }
>
> /**
> @@ -818,12 +822,14 @@ static ssize_t show_scaling_available_governors(struct cpufreq_policy *policy,
> goto out;
> }
>
> + mutex_lock(&cpufreq_governor_mutex);
> for_each_governor(t) {
> if (i >= (ssize_t) ((PAGE_SIZE / sizeof(char))
> - (CPUFREQ_NAME_LEN + 2)))
> - goto out;
> + break;
> i += scnprintf(&buf[i], CPUFREQ_NAME_PLEN, "%s ", t->name);
> }
> + mutex_unlock(&cpufreq_governor_mutex);
> out:
> i += sprintf(&buf[i], "\n");
> return i;
> @@ -1060,11 +1066,14 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
> {
> struct cpufreq_governor *gov = NULL;
> unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
> + bool put_governor = false;
> + int ret;
>
> if (has_target()) {
> /* Update policy governor to the one used before hotplug. */
> - gov = find_governor(policy->last_governor);
> + gov = get_governor(policy->last_governor);
> if (gov) {
> + put_governor = true;
> pr_debug("Restoring governor %s for cpu %d\n",
> policy->governor->name, policy->cpu);
> } else if (default_governor) {
> @@ -1091,7 +1100,11 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
> return -ENODATA;
> }
>
> - return cpufreq_set_policy(policy, gov, pol);
> + ret = cpufreq_set_policy(policy, gov, pol);
> + if (put_governor)
> + module_put(gov->owner);
> +
> + return ret;
> }
>
> static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
^ permalink raw reply
* Re: [PATCH] powerpc: Warn about use of smt_snooze_delay
From: Christophe Leroy @ 2020-06-25 10:29 UTC (permalink / raw)
To: Joel Stanley, linuxppc-dev; +Cc: ego
In-Reply-To: <20200625100349.2408899-1-joel@jms.id.au>
Le 25/06/2020 à 12:03, Joel Stanley a écrit :
> It's not done anything for a long time. Save the percpu variable, and
> emit a warning to remind users to not expect it to do anything.
Why not just drop the file entirely if it is useless ?
Christophe
>
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
> arch/powerpc/kernel/sysfs.c | 41 +++++++++++++------------------------
> 1 file changed, 14 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b3259697e..530ae92bc46d 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -32,29 +32,25 @@
>
> static DEFINE_PER_CPU(struct cpu, cpu_devices);
>
> -/*
> - * SMT snooze delay stuff, 64-bit only for now
> - */
> -
> #ifdef CONFIG_PPC64
>
> -/* Time in microseconds we delay before sleeping in the idle loop */
> -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
> +/*
> + * Snooze delay has not been hooked up since 3fa8cad82b94 ("powerpc/pseries/cpuidle:
> + * smt-snooze-delay cleanup.") and has been broken even longer. As was foretold in
> + * 2014:
> + *
> + * "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
> + * up the kernel code."
> + *
> + * At some point in the future this code should be removed.
> + */
>
> static ssize_t store_smt_snooze_delay(struct device *dev,
> struct device_attribute *attr,
> const char *buf,
> size_t count)
> {
> - struct cpu *cpu = container_of(dev, struct cpu, dev);
> - ssize_t ret;
> - long snooze;
> -
> - ret = sscanf(buf, "%ld", &snooze);
> - if (ret != 1)
> - return -EINVAL;
> -
> - per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
> + WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
> return count;
> }
>
> @@ -62,9 +58,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
> struct device_attribute *attr,
> char *buf)
> {
> - struct cpu *cpu = container_of(dev, struct cpu, dev);
> + WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
>
> - return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
> + return sprintf(buf, "100\n");
> }
>
> static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
> @@ -72,16 +68,7 @@ static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
>
> static int __init setup_smt_snooze_delay(char *str)
> {
> - unsigned int cpu;
> - long snooze;
> -
> - if (!cpu_has_feature(CPU_FTR_SMT))
> - return 1;
> -
> - snooze = simple_strtol(str, NULL, 10);
> - for_each_possible_cpu(cpu)
> - per_cpu(smt_snooze_delay, cpu) = snooze;
> -
> + WARN_ON_ONCE("smt-snooze-delay command line option has no effect\n");
> return 1;
> }
> __setup("smt-snooze-delay=", setup_smt_snooze_delay);
>
^ permalink raw reply
* Re: FSL P5020/P5040: DPAA Ethernet issue with the latest Git kernel
From: Alexander Gordeev @ 2020-06-25 10:22 UTC (permalink / raw)
To: Christian Zigotzky
Cc: Darren Stevens, mad skateman, Madalin Bucur, Sascha Hauer,
R.T.Dickinson, linuxppc-dev, Christian Zigotzky
In-Reply-To: <004794fb-370c-c165-38e6-a451dc50c294@xenosoft.de>
On Thu, Jun 25, 2020 at 01:01:52AM +0200, Christian Zigotzky wrote:
[...]
> I compiled a test kernel with the option "CONFIG_TEST_BITMAP=y"
> yesterday. After that Skateman and I booted it and looked for the
> bitmap tests with "dmesg | grep -i bitmap".
>
> Results:
>
> FSL P5020:
>
> [ 0.297756] test_bitmap: loaded.
> [ 0.298113] test_bitmap: parselist: 14: input is '0-2047:128/256'
> OK, Time: 562
> [ 0.298142] test_bitmap: parselist_user: 14: input is
> '0-2047:128/256' OK, Time: 761
> [ 0.301553] test_bitmap: all 1663 tests passed
>
> FSL P5040:
>
> [ 0.296563] test_bitmap: loaded.
> [ 0.296894] test_bitmap: parselist: 14: input is '0-2047:128/256'
> OK, Time: 540
> [ 0.296920] test_bitmap: parselist_user: 14: input is
> '0-2047:128/256' OK, Time: 680
> [ 0.299994] test_bitmap: all 1663 tests passed
Thanks for the test! So it works as expected.
I would suggest to compare what is going on on the device probing
with and without the bisected commit.
There seems to be MAC and PHY mode initialization issue that might
resulted from the bitmap format change.
I put Madalin and Sascha on CC as they have done some works on
this part recently.
Thanks!
> Thanks,
> Christian
^ permalink raw reply
* [PATCH] powerpc: Warn about use of smt_snooze_delay
From: Joel Stanley @ 2020-06-25 10:03 UTC (permalink / raw)
To: linuxppc-dev; +Cc: ego
It's not done anything for a long time. Save the percpu variable, and
emit a warning to remind users to not expect it to do anything.
Signed-off-by: Joel Stanley <joel@jms.id.au>
---
arch/powerpc/kernel/sysfs.c | 41 +++++++++++++------------------------
1 file changed, 14 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 571b3259697e..530ae92bc46d 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -32,29 +32,25 @@
static DEFINE_PER_CPU(struct cpu, cpu_devices);
-/*
- * SMT snooze delay stuff, 64-bit only for now
- */
-
#ifdef CONFIG_PPC64
-/* Time in microseconds we delay before sleeping in the idle loop */
-static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
+/*
+ * Snooze delay has not been hooked up since 3fa8cad82b94 ("powerpc/pseries/cpuidle:
+ * smt-snooze-delay cleanup.") and has been broken even longer. As was foretold in
+ * 2014:
+ *
+ * "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
+ * up the kernel code."
+ *
+ * At some point in the future this code should be removed.
+ */
static ssize_t store_smt_snooze_delay(struct device *dev,
struct device_attribute *attr,
const char *buf,
size_t count)
{
- struct cpu *cpu = container_of(dev, struct cpu, dev);
- ssize_t ret;
- long snooze;
-
- ret = sscanf(buf, "%ld", &snooze);
- if (ret != 1)
- return -EINVAL;
-
- per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
+ WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
return count;
}
@@ -62,9 +58,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
struct device_attribute *attr,
char *buf)
{
- struct cpu *cpu = container_of(dev, struct cpu, dev);
+ WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
- return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
+ return sprintf(buf, "100\n");
}
static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
@@ -72,16 +68,7 @@ static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
static int __init setup_smt_snooze_delay(char *str)
{
- unsigned int cpu;
- long snooze;
-
- if (!cpu_has_feature(CPU_FTR_SMT))
- return 1;
-
- snooze = simple_strtol(str, NULL, 10);
- for_each_possible_cpu(cpu)
- per_cpu(smt_snooze_delay, cpu) = snooze;
-
+ WARN_ON_ONCE("smt-snooze-delay command line option has no effect\n");
return 1;
}
__setup("smt-snooze-delay=", setup_smt_snooze_delay);
--
2.27.0
^ permalink raw reply related
* Re: PowerPC KVM-PR issue
From: Christian Zigotzky @ 2020-06-25 9:38 UTC (permalink / raw)
To: Nicholas Piggin, kvm-ppc@vger.kernel.org, linuxppc-dev
Cc: Darren Stevens, mad skateman, R.T.Dickinson, Christian Zigotzky
In-Reply-To: <292cba7f-ca2b-efb0-db3d-ecd7ee5f1fad@xenosoft.de>
On 15 June 2020 at 01:39 pm, Christian Zigotzky wrote:
> On 14 June 2020 at 04:52 pm, Christian Zigotzky wrote:
>> On 14 June 2020 at 02:53 pm, Nicholas Piggin wrote:
>>> Excerpts from Christian Zigotzky's message of June 12, 2020 11:01 pm:
>>>> On 11 June 2020 at 04:47 pm, Christian Zigotzky wrote:
>>>>> On 10 June 2020 at 01:23 pm, Christian Zigotzky wrote:
>>>>>> On 10 June 2020 at 11:06 am, Christian Zigotzky wrote:
>>>>>>> On 10 June 2020 at 00:18 am, Christian Zigotzky wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> KVM-PR doesn't work anymore on my Nemo board [1]. I figured out
>>>>>>>> that the Git kernels and the kernel 5.7 are affected.
>>>>>>>>
>>>>>>>> Error message: Fienix kernel: kvmppc_exit_pr_progint: emulation at
>>>>>>>> 700 failed (00000000)
>>>>>>>>
>>>>>>>> I can boot virtual QEMU PowerPC machines with KVM-PR with the
>>>>>>>> kernel 5.6 without any problems on my Nemo board.
>>>>>>>>
>>>>>>>> I tested it with QEMU 2.5.0 and QEMU 5.0.0 today.
>>>>>>>>
>>>>>>>> Could you please check KVM-PR on your PowerPC machine?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
>>>>>>> I figured out that the PowerPC updates 5.7-1 [1] are responsible
>>>>>>> for
>>>>>>> the KVM-PR issue. Please test KVM-PR on your PowerPC machines and
>>>>>>> check the PowerPC updates 5.7-1 [1].
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> [1]
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> I tested the latest Git kernel with Mac-on-Linux/KVM-PR today.
>>>>>> Unfortunately I can't use KVM-PR with MoL anymore because of this
>>>>>> issue (see screenshots [1]). Please check the PowerPC updates 5.7-1.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> [1]
>>>>>> -
>>>>>> https://i.pinimg.com/originals/0c/b3/64/0cb364a40241fa2b7f297d4272bbb8b7.png
>>>>>>
>>>>>> -
>>>>>> https://i.pinimg.com/originals/9a/61/d1/9a61d170b1c9f514f7a78a3014ffd18f.png
>>>>>>
>>>>>>
>>>>> Hi All,
>>>>>
>>>>> I bisected today because of the KVM-PR issue.
>>>>>
>>>>> Result:
>>>>>
>>>>> 9600f261acaaabd476d7833cec2dd20f2919f1a0 is the first bad commit
>>>>> commit 9600f261acaaabd476d7833cec2dd20f2919f1a0
>>>>> Author: Nicholas Piggin <npiggin@gmail.com>
>>>>> Date: Wed Feb 26 03:35:21 2020 +1000
>>>>>
>>>>> powerpc/64s/exception: Move KVM test to common code
>>>>>
>>>>> This allows more code to be moved out of unrelocated regions.
>>>>> The
>>>>> system call KVMTEST is changed to be open-coded and remain in
>>>>> the
>>>>> tramp area to avoid having to move it to entry_64.S. The custom
>>>>> nature
>>>>> of the system call entry code means the hcall case can be
>>>>> made more
>>>>> streamlined than regular interrupt handlers.
>>>>>
>>>>> mpe: Incorporate fix from Nick:
>>>>>
>>>>> Moving KVM test to the common entry code missed the case of
>>>>> HMI and
>>>>> MCE, which do not do __GEN_COMMON_ENTRY (because they don't
>>>>> want to
>>>>> switch to virt mode).
>>>>>
>>>>> This means a MCE or HMI exception that is taken while KVM is
>>>>> running a
>>>>> guest context will not be switched out of that context, and
>>>>> KVM won't
>>>>> be notified. Found by running sigfuz in guest with patched
>>>>> host on
>>>>> POWER9 DD2.3, which causes some TM related HMI interrupts
>>>>> (which are
>>>>> expected and supposed to be handled by KVM).
>>>>>
>>>>> This fix adds a __GEN_REALMODE_COMMON_ENTRY for those
>>>>> handlers to add
>>>>> the KVM test. This makes them look a little more like other
>>>>> handlers
>>>>> that all use __GEN_COMMON_ENTRY.
>>>>>
>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>>>>> Link:
>>>>> https://lore.kernel.org/r/20200225173541.1549955-13-npiggin@gmail.com
>>>>>
>>>>> :040000 040000 ec21cec22d165f8696d69532734cb2985d532cb0
>>>>> 87dd49a9cd7202ec79350e8ca26cea01f1dbd93d M arch
>>>>>
>>>>> -----
>>>>>
>>>>> The following commit is the problem: powerpc/64s/exception: Move KVM
>>>>> test to common code [1]
>>>>>
>>>>> These changes were included in the PowerPC updates 5.7-1. [2]
>>>>>
>>>>> Another test:
>>>>>
>>>>> git checkout d38c07afc356ddebaa3ed8ecb3f553340e05c969 (PowerPC
>>>>> updates
>>>>> 5.7-1 [2] ) -> KVM-PR doesn't work.
>>>>>
>>>>> After that: git revert d38c07afc356ddebaa3ed8ecb3f553340e05c969 -m 1
>>>>> -> KVM-PR works.
>>>>>
>>>>> Could you please check the first bad commit? [1]
>>>>>
>>>>> Thanks,
>>>>> Christian
>>>>>
>>>>>
>>>>> [1]
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9600f261acaaabd476d7833cec2dd20f2919f1a0
>>>>>
>>>>> [2]
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969
>>>>>
>>>> Hi All,
>>>>
>>>> I tried to revert the __GEN_REALMODE_COMMON_ENTRY fix for the
>>>> latest Git
>>>> kernel and for the stable kernel 5.7.2 but without any success. There
>>>> was lot of restructuring work during the kernel 5.7 development
>>>> time in
>>>> the PowerPC area so it isn't possible reactivate the old code. That
>>>> means we have lost the whole KVM-PR support. I also reported this
>>>> issue
>>>> to Alexander Graf two days ago. He wrote: "Howdy :). It looks pretty
>>>> broken. Have you ever made a bisect to see where the problem comes
>>>> from?"
>>>>
>>>> Please check the KVM-PR code.
>>> Does this patch fix it for you?
>>>
>>> The CTR register reload in the KVM interrupt path used the wrong save
>>> area for SLB (and NMI) interrupts.
>>>
>>> Fixes: 9600f261acaaa ("powerpc/64s/exception: Move KVM test to
>>> common code")
>>> Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>> arch/powerpc/kernel/exceptions-64s.S | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S
>>> b/arch/powerpc/kernel/exceptions-64s.S
>>> index e70ebb5c318c..fa080694e581 100644
>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>> @@ -270,7 +270,7 @@ BEGIN_FTR_SECTION
>>> END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>>> .endif
>>> - ld r10,PACA_EXGEN+EX_CTR(r13)
>>> + ld r10,IAREA+EX_CTR(r13)
>>> mtctr r10
>>> BEGIN_FTR_SECTION
>>> ld r10,IAREA+EX_PPR(r13)
>>> @@ -298,7 +298,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>>> .if IKVM_SKIP
>>> 89: mtocrf 0x80,r9
>>> - ld r10,PACA_EXGEN+EX_CTR(r13)
>>> + ld r10,IAREA+EX_CTR(r13)
>>> mtctr r10
>>> ld r9,IAREA+EX_R9(r13)
>>> ld r10,IAREA+EX_R10(r13)
>> Many thanks for the fix! I will test it with the RC1 tomorrow.
>>
>> -- Christian
>
> It works! :-) Thanks a lot! Screenshot:
> https://i.pinimg.com/originals/5d/5f/e5/5d5fe584db474dc88bcc641450b2a7e0.png
>
> -- Christian
Just for info: I successfully tested KVM-PR with the stable kernel 5.7.6
and with the RC2 of kernel 5.8 today. Thanks a lot for fixing the issue.
-- Christian
^ permalink raw reply
* Re: [PATCH 16/17] arch: remove HAVE_COPY_THREAD_TLS
From: Thomas Bogendoerfer @ 2020-06-25 8:25 UTC (permalink / raw)
To: Christian Brauner
Cc: Rich Felker, linux-sh, Peter Zijlstra, Catalin Marinas,
Heiko Carstens, linux-mips, James E.J. Bottomley, Guo Ren,
linux-csky, sparclinux, linux-hexagon, linux-riscv, Vincent Chen,
Will Deacon, Thomas Gleixner, Anton Ivanov, Jonas Bonn,
linux-s390, linux-ia64, linux-c6x-dev, Brian Cain, linux-xtensa,
Helge Deller, x86, Russell King, Ley Foon Tan, Mike Rapoport,
Christian Borntraeger, Ingo Molnar, Geert Uytterhoeven,
linux-parisc, Mark Salter, Matt Turner, linux-snps-arc,
uclinux-h8-devel, Fenghua Yu, Albert Ou, Kees Cook, Vasily Gorbik,
Jeff Dike, linux-alpha, linux-um, linuxppc-dev, Aurelien Jacquiot,
linux-m68k, openrisc, Ivan Kokshaysky, Greentime Hu,
Paul Walmsley, Stafford Horne, Stefan Kristiansson, Guan Xuetao,
linux-arm-kernel, Richard Henderson, Michal Simek, Tony Luck,
Yoshinori Sato, Nick Hu, Vineet Gupta, linux-kernel,
Palmer Dabbelt, Richard Weinberger, Paul Mackerras,
Linus Torvalds, David S. Miller, Al Viro
In-Reply-To: <20200622234326.906346-17-christian.brauner@ubuntu.com>
On Tue, Jun 23, 2020 at 01:43:25AM +0200, Christian Brauner wrote:
> All architectures support copy_thread_tls() now, so remove the legacy
> copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
> uses the same process creation calling convention based on
> copy_thread_tls() and struct kernel_clone_args. This will make it easier to
> maintain the core process creation code under kernel/, simplifies the
> callpaths and makes the identical for all architectures.
> [..]
> arch/mips/Kconfig | 1 -
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Thomas.
--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply
* Re: [PATCH 17/17] arch: rename copy_thread_tls() back to copy_thread()
From: Thomas Bogendoerfer @ 2020-06-25 8:26 UTC (permalink / raw)
To: Christian Brauner
Cc: Rich Felker, linux-sh, Peter Zijlstra (Intel), Catalin Marinas,
linux-mips, James E.J. Bottomley, Max Filippov, Guo Ren,
Matthew Wilcox (Oracle), H. Peter Anvin, sparclinux,
linux-hexagon, linux-riscv, Vincent Chen, Will Deacon,
Thomas Gleixner, Anton Ivanov, Jonas Bonn, linux-s390, linux-ia64,
linux-c6x-dev, Brian Cain, linux-xtensa, Helge Deller, x86,
Russell King, Ley Foon Tan, Christian Borntraeger, Ingo Molnar,
Geert Uytterhoeven, linux-parisc, Mark Salter, Matt Turner,
linux-snps-arc, uclinux-h8-devel, Fenghua Yu, Albert Ou,
Kees Cook, Jeff Dike, linux-alpha, linux-um, linuxppc-dev,
Aurelien Jacquiot, linux-m68k, linux-csky, Ivan Kokshaysky,
Greentime Hu, Paul Walmsley, Stafford Horne, Stefan Kristiansson,
Guan Xuetao, linux-arm-kernel, Richard Henderson, Chris Zankel,
Michal Simek, Tony Luck, Yoshinori Sato, Nick Hu, Vineet Gupta,
linux-kernel, openrisc, Palmer Dabbelt, Richard Weinberger,
Paul Mackerras, Linus Torvalds, David S. Miller, Al Viro
In-Reply-To: <20200622234326.906346-18-christian.brauner@ubuntu.com>
On Tue, Jun 23, 2020 at 01:43:26AM +0200, Christian Brauner wrote:
> Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls()
> back simply copy_thread(). It's a simpler name, and doesn't imply that only
> tls is copied here. This finishes an outstanding chunk of internal process
> creation work since we've added clone3().
> [..]
> arch/mips/kernel/process.c | 2 +-
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Thomas.
--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply
* Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line
From: Viresh Kumar @ 2020-06-25 8:50 UTC (permalink / raw)
To: Quentin Perret
Cc: Juri Lelli, Cc: Android Kernel, Vincent Guittot, Arnd Bergmann,
Rafael J. Wysocki, Peter Zijlstra, adharmap, Linux PM,
Rafael J. Wysocki, Linux Kernel Mailing List, Ingo Molnar,
Paul Mackerras, linuxppc-dev, Todd Kjos
In-Reply-To: <20200624153259.GA2844@google.com>
On 24-06-20, 16:32, Quentin Perret wrote:
> Right, but I must admit that, looking at this more, I'm getting a bit
> confused with the overall locking for governors :/
>
> When in cpufreq_init_policy() we find a governor using
> find_governor(policy->last_governor), what guarantees this governor is
> not concurrently unregistered? That is, what guarantees this governor
> doesn't go away between that find_governor() call, and the subsequent
> call to try_module_get() in cpufreq_set_policy() down the line?
>
> Can we somewhat assume that whatever governor is referred to by
> policy->last_governor will have a non-null refcount? Or are the
> cpufreq_online() and cpufreq_unregister_governor() path mutually
> exclusive? Or is there something else?
This should be sufficient to fix pending issues I believe. Based over your
patches.
--
viresh
-------------------------8<-------------------------
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Thu, 25 Jun 2020 13:15:23 +0530
Subject: [PATCH] cpufreq: Fix locking issues with governors
The locking around governors handling isn't adequate currently. The list
of governors should never be traversed without locking in place. Also we
must make sure the governor isn't removed while it is still referenced
by code.
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
drivers/cpufreq/cpufreq.c | 59 ++++++++++++++++++++++++---------------
1 file changed, 36 insertions(+), 23 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4b1a5c0173cf..dad6b85f4c89 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -624,6 +624,24 @@ static struct cpufreq_governor *find_governor(const char *str_governor)
return NULL;
}
+static struct cpufreq_governor *get_governor(const char *str_governor)
+{
+ struct cpufreq_governor *t;
+
+ mutex_lock(&cpufreq_governor_mutex);
+ t = find_governor(str_governor);
+ if (!t)
+ goto unlock;
+
+ if (!try_module_get(t->owner))
+ t = NULL;
+
+unlock:
+ mutex_unlock(&cpufreq_governor_mutex);
+
+ return t;
+}
+
static unsigned int cpufreq_parse_policy(char *str_governor)
{
if (!strncasecmp(str_governor, "performance", CPUFREQ_NAME_LEN))
@@ -643,28 +661,14 @@ static struct cpufreq_governor *cpufreq_parse_governor(char *str_governor)
{
struct cpufreq_governor *t;
- mutex_lock(&cpufreq_governor_mutex);
-
- t = find_governor(str_governor);
- if (!t) {
- int ret;
-
- mutex_unlock(&cpufreq_governor_mutex);
-
- ret = request_module("cpufreq_%s", str_governor);
- if (ret)
- return NULL;
-
- mutex_lock(&cpufreq_governor_mutex);
+ t = get_governor(str_governor);
+ if (t)
+ return t;
- t = find_governor(str_governor);
- }
- if (t && !try_module_get(t->owner))
- t = NULL;
-
- mutex_unlock(&cpufreq_governor_mutex);
+ if (request_module("cpufreq_%s", str_governor))
+ return NULL;
- return t;
+ return get_governor(str_governor);
}
/**
@@ -818,12 +822,14 @@ static ssize_t show_scaling_available_governors(struct cpufreq_policy *policy,
goto out;
}
+ mutex_lock(&cpufreq_governor_mutex);
for_each_governor(t) {
if (i >= (ssize_t) ((PAGE_SIZE / sizeof(char))
- (CPUFREQ_NAME_LEN + 2)))
- goto out;
+ break;
i += scnprintf(&buf[i], CPUFREQ_NAME_PLEN, "%s ", t->name);
}
+ mutex_unlock(&cpufreq_governor_mutex);
out:
i += sprintf(&buf[i], "\n");
return i;
@@ -1060,11 +1066,14 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
{
struct cpufreq_governor *gov = NULL;
unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
+ bool put_governor = false;
+ int ret;
if (has_target()) {
/* Update policy governor to the one used before hotplug. */
- gov = find_governor(policy->last_governor);
+ gov = get_governor(policy->last_governor);
if (gov) {
+ put_governor = true;
pr_debug("Restoring governor %s for cpu %d\n",
policy->governor->name, policy->cpu);
} else if (default_governor) {
@@ -1091,7 +1100,11 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy)
return -ENODATA;
}
- return cpufreq_set_policy(policy, gov, pol);
+ ret = cpufreq_set_policy(policy, gov, pol);
+ if (put_governor)
+ module_put(gov->owner);
+
+ return ret;
}
static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
^ permalink raw reply related
* Re: [PATCH v5 01/13] powerpc: Remove Xilinx PPC405/PPC440 support
From: Joel Stanley @ 2020-06-25 8:39 UTC (permalink / raw)
To: Michael Ellerman
Cc: Arnd Bergmann, Nick Desaulniers, Michal Simek, LKML,
clang-built-linux, Paul Mackerras, Nathan Chancellor,
linuxppc-dev
In-Reply-To: <87eeqbco82.fsf@mpe.ellerman.id.au>
On Fri, 19 Jun 2020 at 11:02, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Nathan Chancellor <natechancellor@gmail.com> writes:
> >> It's kind of nuts that the zImage points to some arbitrary image
> >> depending on what's configured and the order of things in the Makefile.
> >> But I'm not sure how we make it less nuts without risking breaking
> >> people's existing setups.
> >
> > Hi Michael,
> >
> > For what it's worth, this is squared this away in terms of our CI by
> > just building and booting the uImage directly, rather than implicitly
> > using the zImage:
> >
> > https://github.com/ClangBuiltLinux/continuous-integration/pull/282
> > https://github.com/ClangBuiltLinux/boot-utils/pull/22
>
> Great.
>
> > We were only using the zImage because that is what Joel Stanley intially
> > set us up with when PowerPC 32-bit was added to our CI:
> >
> > https://github.com/ClangBuiltLinux/continuous-integration/pull/100
>
> Ah, so Joel owes us all beers then ;)
Hey, you owe me beers for finding broken machines!
This machine was picked from a vague discussion on an internal chat.
The two requirements were that it would build, and boot in qemu.
If there's a better supported 32 bit machine then we should switch the
CI over. We don't want the Clang CI to be the only user and give the
false impression that someone out there is still booting upstream
kernels on it.
> > Admittedly, we really do not have many PowerPC experts in our
> > organization so we are supporting it on a "best effort" basis, which
> > often involves using whatever knowledge is floating around or can be
> > gained from interactions such as this :) so thank you for that!
>
> No worries. I definitely don't expect you folks to invest much effort in
> powerpc, especially the old 32-bit stuff, so always happy to help debug
> things, and really appreciate the testing you do.
+1
Cheers,
Joel
^ permalink raw reply
* Re: [PATCH v4 6/8] arm: Break cyclic percpu include
From: Will Deacon @ 2020-06-25 7:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-s390, bigeasy, x86, heiko.carstens, linux-kernel, rostedt,
davem, a.darwish, sparclinux, linux, tglx, linuxppc-dev, mingo
In-Reply-To: <20200624175320.GN4781@hirez.programming.kicks-ass.net>
On Wed, Jun 24, 2020 at 07:53:20PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 23, 2020 at 10:02:57AM +0100, Will Deacon wrote:
> > On Tue, Jun 23, 2020 at 10:36:51AM +0200, Peter Zijlstra wrote:
> > > In order to use <asm/percpu.h> in irqflags.h, we need to make sure
> > > asm/percpu.h does not itself depend on irqflags.h.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > > arch/arm/include/asm/percpu.h | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > --- a/arch/arm/include/asm/percpu.h
> > > +++ b/arch/arm/include/asm/percpu.h
> > > @@ -10,6 +10,8 @@
> > > * in the TPIDRPRW. TPIDRPRW only exists on V6K and V7
> > > */
> > > #if defined(CONFIG_SMP) && !defined(CONFIG_CPU_V6)
> > > +register unsigned long current_stack_pointer asm ("sp");
> >
> > If you define this unconditionally, then we can probably get rid of the
> > copy in asm/thread_info.h, rather than duplicate the same #define.
>
> The below delta seems to build arm-allnoconfig, arm-defconfig and
> arm-allmodconfig.
>
> Although please don't ask me how asm/thread_info.h includes asm/percpu.h
>
> Does that work for you?
Yes, thanks! I can't believe you removed the helpful comment.
> -/*
> - * how to get the current stack pointer in C
> - */
Will
^ permalink raw reply
* [PATCH v2 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory
From: Aneesh Kumar K.V @ 2020-06-25 6:45 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Bharata B Rao
In-Reply-To: <20200625064547.228448-1-aneesh.kumar@linux.ibm.com>
To enable memory unplug without splitting kernel page table
mapping, we force the max mapping size to the LMB size. LMB
size is the unit in which hypervisor will do memory add/remove
operation.
This implies on pseries system, we now end up mapping
memory with 2M page size instead of 1G. To improve
that we want hypervisor to hint the kernel about the hotplug
memory range. This was added that as part of
commit b6eca183e23e ("powerpc/kernel: Enables memory
hot-remove after reboot on pseries guests")
But we still don't do that on PowerVM. Once we get PowerVM
updated, we can then force the 2M mapping only to hot-pluggable
memory region using memblock_is_hotpluggable(). Till then
let's depend on LMB size for finding the mapping page size
for linear range.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/mm/book3s64/radix_pgtable.c | 83 ++++++++++++++++++++----
arch/powerpc/platforms/powernv/setup.c | 10 ++-
2 files changed, 81 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 78ad11812e0d..a4179e4da49d 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -15,6 +15,7 @@
#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/string_helpers.h>
+#include <linux/memory.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -34,6 +35,7 @@
unsigned int mmu_pid_bits;
unsigned int mmu_base_pid;
+unsigned int radix_mem_block_size;
static __ref void *early_alloc_pgtable(unsigned long size, int nid,
unsigned long region_start, unsigned long region_end)
@@ -266,6 +268,7 @@ static unsigned long next_boundary(unsigned long addr, unsigned long end)
static int __meminit create_physical_mapping(unsigned long start,
unsigned long end,
+ unsigned long max_mapping_size,
int nid, pgprot_t _prot)
{
unsigned long vaddr, addr, mapping_size = 0;
@@ -279,6 +282,8 @@ static int __meminit create_physical_mapping(unsigned long start,
int rc;
gap = next_boundary(addr, end) - addr;
+ if (gap > max_mapping_size)
+ gap = max_mapping_size;
previous_size = mapping_size;
prev_exec = exec;
@@ -329,8 +334,9 @@ static void __init radix_init_pgtable(void)
/* We don't support slb for radix */
mmu_slb_size = 0;
+
/*
- * Create the linear mapping, using standard page size for now
+ * Create the linear mapping
*/
for_each_memblock(memory, reg) {
/*
@@ -346,6 +352,7 @@ static void __init radix_init_pgtable(void)
WARN_ON(create_physical_mapping(reg->base,
reg->base + reg->size,
+ radix_mem_block_size,
-1, PAGE_KERNEL));
}
@@ -486,6 +493,49 @@ static int __init radix_dt_scan_page_sizes(unsigned long node,
return 1;
}
+static int __init probe_memory_block_size(unsigned long node, const char *uname, int
+ depth, void *data)
+{
+ const __be32 *block_size;
+ int len;
+
+ if (depth != 1)
+ return 0;
+
+ if (!strcmp(uname, "ibm,dynamic-reconfiguration-memory")) {
+
+ block_size = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
+ if (!block_size || len < dt_root_size_cells * sizeof(__be32))
+ /*
+ * Nothing in the device tree
+ */
+ return MIN_MEMORY_BLOCK_SIZE;
+
+ return dt_mem_next_cell(dt_root_size_cells, &block_size);
+
+ }
+
+ return 0;
+}
+
+static unsigned long radix_memory_block_size(void)
+{
+ unsigned long mem_block_size = MIN_MEMORY_BLOCK_SIZE;
+
+ if (firmware_has_feature(FW_FEATURE_OPAL)) {
+
+ mem_block_size = 1UL * 1024 * 1024 * 1024;
+
+ } else if (firmware_has_feature(FW_FEATURE_LPAR)) {
+ mem_block_size = of_scan_flat_dt(probe_memory_block_size, NULL);
+ if (!mem_block_size)
+ mem_block_size = MIN_MEMORY_BLOCK_SIZE;
+ }
+
+ return mem_block_size;
+}
+
+
void __init radix__early_init_devtree(void)
{
int rc;
@@ -494,17 +544,27 @@ void __init radix__early_init_devtree(void)
* Try to find the available page sizes in the device-tree
*/
rc = of_scan_flat_dt(radix_dt_scan_page_sizes, NULL);
- if (rc != 0) /* Found */
- goto found;
+ if (rc == 0) {
+ /*
+ * no page size details found in device tree
+ * let's assume we have page 4k and 64k support
+ */
+ mmu_psize_defs[MMU_PAGE_4K].shift = 12;
+ mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
+
+ mmu_psize_defs[MMU_PAGE_64K].shift = 16;
+ mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
+ }
+
/*
- * let's assume we have page 4k and 64k support
+ * Max mapping size used when mapping pages. We don't use
+ * ppc_md.memory_block_size() here because this get called
+ * early and we don't have machine probe called yet. Also
+ * the pseries implementation only check for ibm,lmb-size.
+ * All hypervisor supporting radix do expose that device
+ * tree node.
*/
- mmu_psize_defs[MMU_PAGE_4K].shift = 12;
- mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
-
- mmu_psize_defs[MMU_PAGE_64K].shift = 16;
- mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
-found:
+ radix_mem_block_size = radix_memory_block_size();
return;
}
@@ -856,7 +916,8 @@ int __meminit radix__create_section_mapping(unsigned long start,
return -1;
}
- return create_physical_mapping(__pa(start), __pa(end), nid, prot);
+ return create_physical_mapping(__pa(start), __pa(end),
+ radix_mem_block_size, nid, prot);
}
int __meminit radix__remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 3bc188da82ba..6e2f614590a3 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -399,7 +399,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
static unsigned long pnv_memory_block_size(void)
{
- return 256UL * 1024 * 1024;
+ /*
+ * We map the kernel linear region with 1GB large pages on radix. For
+ * memory hot unplug to work our memory block size must be at least
+ * this size.
+ */
+ if (radix_enabled())
+ return 1UL * 1024 * 1024 * 1024;
+ else
+ return 256UL * 1024 * 1024;
}
#endif
--
2.26.2
^ permalink raw reply related
* [PATCH v2 3/4] powerpc/mm/radix: Remove split_kernel_mapping()
From: Aneesh Kumar K.V @ 2020-06-25 6:45 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K . V, Bharata B Rao
In-Reply-To: <20200625064547.228448-1-aneesh.kumar@linux.ibm.com>
From: Bharata B Rao <bharata@linux.ibm.com>
We split the page table mapping on memory unplug if the
linear range was mapped with huge page mapping (for ex: 1G)
The page table splitting code has a few issues:
1. Recursive locking
--------------------
Memory unplug path takes cpu_hotplug_lock and calls stop_machine()
for splitting the mappings. However stop_machine() takes
cpu_hotplug_lock again causing deadlock.
2. BUG: sleeping function called from in_atomic() context
---------------------------------------------------------
Memory unplug path (remove_pagetable) takes init_mm.page_table_lock
spinlock and later calls stop_machine() which does wait_for_completion()
3. Bad unlock unbalance
-----------------------
Memory unplug path takes init_mm.page_table_lock spinlock and calls
stop_machine(). The stop_machine thread function runs in a different
thread context (migration thread) which tries to release and reaquire
ptl. Releasing ptl from a different thread than which acquired it
causes bad unlock unbalance.
These problems can be avoided if we avoid mapping hot-plugged memory
with 1G mapping, thereby removing the need for splitting them during
unplug. The kernel always make sure the minimum unplug request is
SUBSECTION_SIZE for device memory and SECTION_SIZE for regular memory.
In preparation for such a change remove page table splitting support.
This essentially is a revert of
commit 4dd5f8a99e791 ("powerpc/mm/radix: Split linear mapping on hot-unplug")
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/mm/book3s64/radix_pgtable.c | 95 +++++-------------------
1 file changed, 19 insertions(+), 76 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index af57604f295f..78ad11812e0d 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -15,7 +15,6 @@
#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/string_helpers.h>
-#include <linux/stop_machine.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -723,32 +722,6 @@ static void free_pud_table(pud_t *pud_start, p4d_t *p4d)
p4d_clear(p4d);
}
-struct change_mapping_params {
- pte_t *pte;
- unsigned long start;
- unsigned long end;
- unsigned long aligned_start;
- unsigned long aligned_end;
-};
-
-static int __meminit stop_machine_change_mapping(void *data)
-{
- struct change_mapping_params *params =
- (struct change_mapping_params *)data;
-
- if (!data)
- return -1;
-
- spin_unlock(&init_mm.page_table_lock);
- pte_clear(&init_mm, params->aligned_start, params->pte);
- create_physical_mapping(__pa(params->aligned_start),
- __pa(params->start), -1, PAGE_KERNEL);
- create_physical_mapping(__pa(params->end), __pa(params->aligned_end),
- -1, PAGE_KERNEL);
- spin_lock(&init_mm.page_table_lock);
- return 0;
-}
-
static void remove_pte_table(pte_t *pte_start, unsigned long addr,
unsigned long end)
{
@@ -777,52 +750,6 @@ static void remove_pte_table(pte_t *pte_start, unsigned long addr,
}
}
-/*
- * clear the pte and potentially split the mapping helper
- */
-static void __meminit split_kernel_mapping(unsigned long addr, unsigned long end,
- unsigned long size, pte_t *pte)
-{
- unsigned long mask = ~(size - 1);
- unsigned long aligned_start = addr & mask;
- unsigned long aligned_end = addr + size;
- struct change_mapping_params params;
- bool split_region = false;
-
- if ((end - addr) < size) {
- /*
- * We're going to clear the PTE, but not flushed
- * the mapping, time to remap and flush. The
- * effects if visible outside the processor or
- * if we are running in code close to the
- * mapping we cleared, we are in trouble.
- */
- if (overlaps_kernel_text(aligned_start, addr) ||
- overlaps_kernel_text(end, aligned_end)) {
- /*
- * Hack, just return, don't pte_clear
- */
- WARN_ONCE(1, "Linear mapping %lx->%lx overlaps kernel "
- "text, not splitting\n", addr, end);
- return;
- }
- split_region = true;
- }
-
- if (split_region) {
- params.pte = pte;
- params.start = addr;
- params.end = end;
- params.aligned_start = addr & ~(size - 1);
- params.aligned_end = min_t(unsigned long, aligned_end,
- (unsigned long)__va(memblock_end_of_DRAM()));
- stop_machine(stop_machine_change_mapping, ¶ms, NULL);
- return;
- }
-
- pte_clear(&init_mm, addr, pte);
-}
-
static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
unsigned long end)
{
@@ -838,7 +765,12 @@ static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
continue;
if (pmd_is_leaf(*pmd)) {
- split_kernel_mapping(addr, end, PMD_SIZE, (pte_t *)pmd);
+ if (!IS_ALIGNED(addr, PMD_SIZE) ||
+ !IS_ALIGNED(next, PMD_SIZE)) {
+ WARN_ONCE(1, "%s: unaligned range\n", __func__);
+ continue;
+ }
+ pte_clear(&init_mm, addr, (pte_t *)pmd);
continue;
}
@@ -863,7 +795,12 @@ static void remove_pud_table(pud_t *pud_start, unsigned long addr,
continue;
if (pud_is_leaf(*pud)) {
- split_kernel_mapping(addr, end, PUD_SIZE, (pte_t *)pud);
+ if (!IS_ALIGNED(addr, PUD_SIZE) ||
+ !IS_ALIGNED(next, PUD_SIZE)) {
+ WARN_ONCE(1, "%s: unaligned range\n", __func__);
+ continue;
+ }
+ pte_clear(&init_mm, addr, (pte_t *)pud);
continue;
}
@@ -891,7 +828,13 @@ static void __meminit remove_pagetable(unsigned long start, unsigned long end)
continue;
if (p4d_is_leaf(*p4d)) {
- split_kernel_mapping(addr, end, P4D_SIZE, (pte_t *)p4d);
+ if (!IS_ALIGNED(addr, P4D_SIZE) ||
+ !IS_ALIGNED(next, P4D_SIZE)) {
+ WARN_ONCE(1, "%s: unaligned range\n", __func__);
+ continue;
+ }
+
+ pte_clear(&init_mm, addr, (pte_t *)pgd);
continue;
}
--
2.26.2
^ permalink raw reply related
* [PATCH v2 2/4] powerpc/mm/radix: Free PUD table when freeing pagetable
From: Aneesh Kumar K.V @ 2020-06-25 6:45 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K . V, Bharata B Rao
In-Reply-To: <20200625064547.228448-1-aneesh.kumar@linux.ibm.com>
From: Bharata B Rao <bharata@linux.ibm.com>
remove_pagetable() isn't freeing PUD table. This causes memory
leak during memory unplug. Fix this.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/mm/book3s64/radix_pgtable.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 80be96d3018f..af57604f295f 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -708,6 +708,21 @@ static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
pud_clear(pud);
}
+static void free_pud_table(pud_t *pud_start, p4d_t *p4d)
+{
+ pud_t *pud;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PUD; i++) {
+ pud = pud_start + i;
+ if (!pud_none(*pud))
+ return;
+ }
+
+ pud_free(&init_mm, pud_start);
+ p4d_clear(p4d);
+}
+
struct change_mapping_params {
pte_t *pte;
unsigned long start;
@@ -882,6 +897,7 @@ static void __meminit remove_pagetable(unsigned long start, unsigned long end)
pud_base = (pud_t *)p4d_page_vaddr(*p4d);
remove_pud_table(pud_base, addr, next);
+ free_pud_table(pud_base, p4d);
}
spin_unlock(&init_mm.page_table_lock);
--
2.26.2
^ permalink raw reply related
* [PATCH v2 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
From: Aneesh Kumar K.V @ 2020-06-25 6:45 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Bharata B Rao
In-Reply-To: <20200625064547.228448-1-aneesh.kumar@linux.ibm.com>
We can hit the following BUG_ON during memory unplug:
kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
NIP [c000000000093308] pmd_fragment_free+0x48/0xc0
LR [c00000000147bfec] remove_pagetable+0x578/0x60c
Call Trace:
0xc000008050000000 (unreliable)
remove_pagetable+0x384/0x60c
radix__remove_section_mapping+0x18/0x2c
remove_section_mapping+0x1c/0x3c
arch_remove_memory+0x11c/0x180
try_remove_memory+0x120/0x1b0
__remove_memory+0x20/0x40
dlpar_remove_lmb+0xc0/0x114
dlpar_memory+0x8b0/0xb20
handle_dlpar_errorlog+0xc0/0x190
pseries_hp_work_fn+0x2c/0x60
process_one_work+0x30c/0x810
worker_thread+0x98/0x540
kthread+0x1c4/0x1d0
ret_from_kernel_thread+0x5c/0x74
This occurs when unplug is attempted for such memory which has
been mapped using memblock pages as part of early kernel page
table setup. We wouldn't have initialized the PMD or PTE fragment
count for those PMD or PTE pages.
Fixing this includes 3 parts:
- Re-walk the init_mm page tables from mem_init() and initialize
the PMD and PTE fragment count to 1.
- When freeing PUD, PMD and PTE page table pages, check explicitly
if they come from memblock and if so free then appropriately.
- When we do early memblock based allocation of PMD and PUD pages,
allocate in PAGE_SIZE granularity so that we are sure the
complete page is used as pagetable page.
Since we now do PAGE_SIZE allocations for both PUD table and
PMD table (Note that PTE table allocation is already of PAGE_SIZE),
we end up allocating more memory for the same amount of system RAM.
Here is a comparision of how much more we need for a 64T and 2G
system after this patch:
1. 64T system
-------------
64T RAM would need 64G for vmemmap with struct page size being 64B.
128 PUD tables for 64T memory (1G mappings)
1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)
With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K
2. 2G system
------------
2G RAM would need 2M for vmemmap with struct page size being 64B.
1 PUD table for 2G memory (1G mapping)
1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)
With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/pgalloc.h | 16 +++++++++++++++-
arch/powerpc/mm/book3s64/pgtable.c | 5 ++++-
arch/powerpc/mm/book3s64/radix_pgtable.c | 15 +++++++++++----
arch/powerpc/mm/pgtable-frag.c | 3 +++
4 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 69c5b051734f..e1af0b394ceb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -107,9 +107,23 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
return pud;
}
+static inline void __pud_free(pud_t *pud)
+{
+ struct page *page = virt_to_page(pud);
+
+ /*
+ * Early pud pages allocated via memblock allocator
+ * can't be directly freed to slab
+ */
+ if (PageReserved(page))
+ free_reserved_page(page);
+ else
+ kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
+}
+
static inline void pud_free(struct mm_struct *mm, pud_t *pud)
{
- kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
+ return __pud_free(pud);
}
static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index c58ad1049909..85de5b574dd7 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -339,6 +339,9 @@ void pmd_fragment_free(unsigned long *pmd)
{
struct page *page = virt_to_page(pmd);
+ if (PageReserved(page))
+ return free_reserved_page(page);
+
BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
if (atomic_dec_and_test(&page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
@@ -356,7 +359,7 @@ static inline void pgtable_free(void *table, int index)
pmd_fragment_free(table);
break;
case PUD_INDEX:
- kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
+ __pud_free(table);
break;
#if defined(CONFIG_PPC_4K_PAGES) && defined(CONFIG_HUGETLB_PAGE)
/* 16M hugepd directory at pud level */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 8acb96de0e48..80be96d3018f 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -57,6 +57,13 @@ static __ref void *early_alloc_pgtable(unsigned long size, int nid,
return ptr;
}
+/*
+ * When allocating pud or pmd pointers, we allocate a complete page
+ * of PAGE_SIZE rather than PUD_TABLE_SIZE or PMD_TABLE_SIZE. This
+ * is to ensure that the page obtained from the memblock allocator
+ * can be completely used as page table page and can be freed
+ * correctly when the page table entries are removed.
+ */
static int early_map_kernel_page(unsigned long ea, unsigned long pa,
pgprot_t flags,
unsigned int map_page_size,
@@ -73,8 +80,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa,
pgdp = pgd_offset_k(ea);
p4dp = p4d_offset(pgdp, ea);
if (p4d_none(*p4dp)) {
- pudp = early_alloc_pgtable(PUD_TABLE_SIZE, nid,
- region_start, region_end);
+ pudp = early_alloc_pgtable(PAGE_SIZE, nid,
+ region_start, region_end);
p4d_populate(&init_mm, p4dp, pudp);
}
pudp = pud_offset(p4dp, ea);
@@ -83,8 +90,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa,
goto set_the_pte;
}
if (pud_none(*pudp)) {
- pmdp = early_alloc_pgtable(PMD_TABLE_SIZE, nid,
- region_start, region_end);
+ pmdp = early_alloc_pgtable(PAGE_SIZE, nid, region_start,
+ region_end);
pud_populate(&init_mm, pudp, pmdp);
}
pmdp = pmd_offset(pudp, ea);
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index ee4bd6d38602..97ae4935da79 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -110,6 +110,9 @@ void pte_fragment_free(unsigned long *table, int kernel)
{
struct page *page = virt_to_page(table);
+ if (PageReserved(page))
+ return free_reserved_page(page);
+
BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
if (atomic_dec_and_test(&page->pt_frag_refcount)) {
if (!kernel)
--
2.26.2
^ permalink raw reply related
* [PATCH v2 0/4] powerpc/mm/radix: Memory unplug fixes
From: Aneesh Kumar K.V @ 2020-06-25 6:45 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Bharata B Rao
This is the next version of the fixes for memory unplug on radix.
The issues and the fix are described in the actual patches.
Changes from v1:
- Added back patch to drop split_kernel_mapping
- Most of the split_kernel_mapping related issues are now described
in the removal patch
- drop pte fragment change
- use lmb size as the max mapping size.
- Radix baremetal now use memory block size of 1G.
Changes from v0:
==============
- Rebased to latest kernel.
- Took care of p4d changes.
- Addressed Aneesh's review feedback:
- Added comments.
- Indentation fixed.
- Dropped the 1st patch (setting DRCONF_MEM_HOTREMOVABLE lmb flags) as
it is debatable if this flag should be set in the device tree by OS
and not by platform in case of hotplug. This can be looked at separately.
(The fixes in this patchset remain valid without the dropped patch)
- Dropped the last patch that removed split_kernel_mapping() to ensure
that spilitting code is available for any radix guest running on
platforms that don't set DRCONF_MEM_HOTREMOVABLE.
Aneesh Kumar K.V (2):
powerpc/mm/radix: Fix PTE/PMD fragment count for early page table
mappings
powerpc/mm/radix: Create separate mappings for hot-plugged memory
Bharata B Rao (2):
powerpc/mm/radix: Free PUD table when freeing pagetable
powerpc/mm/radix: Remove split_kernel_mapping()
arch/powerpc/include/asm/book3s/64/pgalloc.h | 16 +-
arch/powerpc/mm/book3s64/pgtable.c | 5 +-
arch/powerpc/mm/book3s64/radix_pgtable.c | 199 +++++++++++--------
arch/powerpc/mm/pgtable-frag.c | 3 +
arch/powerpc/platforms/powernv/setup.c | 10 +-
5 files changed, 144 insertions(+), 89 deletions(-)
--
2.26.2
^ permalink raw reply
* Re: [PATCH v4 00/11] kunit: create a centralized executor to dispatch all KUnit tests
From: David Gow @ 2020-06-25 1:47 UTC (permalink / raw)
To: Brendan Higgins
Cc: open list:DOCUMENTATION, catalin.marinas, jcmvbkbc, will, paulus,
open list:KERNEL SELFTEST FRAMEWORK, Frank Rowand, Anton Ivanov,
linux-arch, Richard Weinberger, rppt, Iurii Zaikin, linux-xtensa,
Kees Cook, Arnd Bergmann, Jeff Dike, linux-um, linuxppc-dev,
Shuah Khan, linux-arm-kernel, KUnit Development, chris, monstr,
Stephen Boyd, Greg Kroah-Hartman, Linux Kernel Mailing List,
Luis Chamberlain, Alan Maguire, Andrew Morton, Logan Gunthorpe
In-Reply-To: <20200624205550.215599-1-brendanhiggins@google.com>
Glad this is back out there: a couple of minor nitpicks below:
On Thu, Jun 25, 2020 at 4:58 AM Brendan Higgins
<brendanhiggins@google.com> wrote:
>
> ## TL;DR
>
> This patchset adds a centralized executor to dispatch tests rather than
> relying on late_initcall to schedule each test suite separately along
> with a couple of new features that depend on it.
>
> Also, sorry for the extreme delay in getting this out. Part of the delay
> came from finding that there were actually several architectures that
> the previous revision of this patchset didn't work on, so I went through
> and attempted to test this patchset on every architecture - more on that
> later.
>
> ## What am I trying to do?
>
> Conceptually, I am trying to provide a mechanism by which test suites
> can be grouped together so that they can be reasoned about collectively.
> The last two of three patches in this series add features which depend
> on this:
>
> PATCH 8/11 Prints out a test plan[1] right before KUnit tests are run;
> this is valuable because it makes it possible for a test
> harness to detect whether the number of tests run matches the
> number of tests expected to be run, ensuring that no tests
> silently failed. The test plan includes a count of tests that
> will run. With the centralized executor, the tests are
> located in a single data structure and thus can be counted.
>
This appears to actually be patch 9/11.
> PATCH 9/11 Add a new kernel command-line option which allows the user to
> specify that the kernel poweroff, halt, or reboot after
> completing all KUnit tests; this is very handy for running
> KUnit tests on UML or a VM so that the UML/VM process exits
> cleanly immediately after running all tests without needing a
> special initramfs. The centralized executor provides a
> definitive point when all tests have completed and the
> poweroff, halt, or reboot could occur.
This seems to have been merged into the above patch (9/11).
> In addition, by dispatching tests from a single location, we can
> guarantee that all KUnit tests run after late_init is complete, which
> was a concern during the initial KUnit patchset review (this has not
> been a problem in practice, but resolving with certainty is nevertheless
> desirable).
>
> Other use cases for this exist, but the above features should provide an
> idea of the value that this could provide.
>
> ## Changes since last revision:
> - On the last revision I got some messages from 0day that showed that
> this patchset didn't work on several architectures, one issue that
> this patchset addresses is that we were aligning both memory segments
> as well as structures in the segments to specific byte boundaries
> which was incorrect.
> - The issue mentioned above also caused me to test on additional
> architectures which revealed that some architectures other than UML
> do not use the default init linker section macro that most
> architectures use. There are now several new patches (2, 3, 4, and
> 6).
> - Fixed a formatting consistency issue in the kernel params
> documentation patch (9/9).
> - Add a brief blurb on how and when the kunit_test_suite macro works.
>
> ## Remaining work to be done:
>
> The only architecture for which I was able to get a compiler, but was
> apparently unable to get KUnit into a section that the executor to see
> was m68k - not sure why.
>
> Alan Maguire (1):
> kunit: test: create a single centralized executor for all tests
>
> Brendan Higgins (10):
> vmlinux.lds.h: add linker section for KUnit test suites
> arch: arm64: add linker section for KUnit test suites
> arch: microblaze: add linker section for KUnit test suites
> arch: powerpc: add linker section for KUnit test suites
> arch: um: add linker section for KUnit test suites
> arch: xtensa: add linker section for KUnit test suites
> init: main: add KUnit to kernel init
> kunit: test: add test plan to KUnit TAP format
> Documentation: Add kunit_shutdown to kernel-parameters.txt
> Documentation: kunit: add a brief blurb about kunit_test_suite
>
> .../admin-guide/kernel-parameters.txt | 8 ++
> Documentation/dev-tools/kunit/usage.rst | 5 ++
> arch/arm64/kernel/vmlinux.lds.S | 3 +
> arch/microblaze/kernel/vmlinux.lds.S | 4 +
> arch/powerpc/kernel/vmlinux.lds.S | 4 +
> arch/um/include/asm/common.lds.S | 4 +
> arch/xtensa/kernel/vmlinux.lds.S | 4 +
> include/asm-generic/vmlinux.lds.h | 8 ++
> include/kunit/test.h | 73 ++++++++++++-----
> init/main.c | 4 +
> lib/kunit/Makefile | 3 +-
> lib/kunit/executor.c | 63 +++++++++++++++
> lib/kunit/test.c | 13 +--
> tools/testing/kunit/kunit_kernel.py | 2 +-
> tools/testing/kunit/kunit_parser.py | 74 +++++++++++++++---
> .../test_is_test_passed-all_passed.log | Bin 1562 -> 1567 bytes
> .../test_data/test_is_test_passed-crash.log | Bin 3016 -> 3021 bytes
> .../test_data/test_is_test_passed-failure.log | Bin 1700 -> 1705 bytes
> 18 files changed, 226 insertions(+), 46 deletions(-)
> create mode 100644 lib/kunit/executor.c
>
>
> base-commit: 4333a9b0b67bb4e8bcd91bdd80da80b0ec151162
> prerequisite-patch-id: 2d4b5aa9fa8ada9ae04c8584b47c299a822b9455
> prerequisite-patch-id: 582b6d9d28ce4b71628890ec832df6522ca68de0
>
> These patches are available for download with dependencies here:
>
> https://kunit-review.googlesource.com/c/linux/+/3829
>
> [1] https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md#the-plan
> [2] https://patchwork.kernel.org/patch/11383635/
>
> --
> 2.27.0.212.ge8ba1cc988-goog
>
^ permalink raw reply
* Re: [PATCH] powerpc/boot: Use address-of operator on section symbols
From: Nathan Chancellor @ 2020-06-25 2:22 UTC (permalink / raw)
To: Geoff Levand
Cc: linux-kernel, clang-built-linux, Paul Mackerras, Joel Stanley,
linuxppc-dev
In-Reply-To: <1bbb6956-d9de-e0c8-5b45-20b6fecc2189@infradead.org>
Hi Geoff,
On Wed, Jun 24, 2020 at 06:18:48PM -0700, Geoff Levand wrote:
> Hi Nathan,
>
> On 6/23/20 8:59 PM, Nathan Chancellor wrote:
> > These are not true arrays, they are linker defined symbols, which are
> > just addresses. Using the address of operator silences the warning
> > and does not change the resulting assembly with either clang/ld.lld
> > or gcc/ld (tested with diff + objdump -Dr).
>
> Thanks for your patch. I tested this patch applied to v5.8-rc2 on a
> PS3 and it seems OK.
>
> Tested-by: Geoff Levand <geoff@infradead.org>
Thanks a lot for the quick response and testing, I really appreciate it!
Cheers,
Nathan
^ permalink raw reply
* Re: [PATCH] powerpc/boot: Use address-of operator on section symbols
From: Geoff Levand @ 2020-06-25 1:18 UTC (permalink / raw)
To: Nathan Chancellor, Michael Ellerman
Cc: linux-kernel, clang-built-linux, Paul Mackerras, Joel Stanley,
linuxppc-dev
In-Reply-To: <20200624035920.835571-1-natechancellor@gmail.com>
Hi Nathan,
On 6/23/20 8:59 PM, Nathan Chancellor wrote:
> These are not true arrays, they are linker defined symbols, which are
> just addresses. Using the address of operator silences the warning
> and does not change the resulting assembly with either clang/ld.lld
> or gcc/ld (tested with diff + objdump -Dr).
Thanks for your patch. I tested this patch applied to v5.8-rc2 on a
PS3 and it seems OK.
Tested-by: Geoff Levand <geoff@infradead.org>
^ permalink raw reply
* Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available
From: Michael Ellerman @ 2020-06-25 1:11 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Anton Blanchard, Nicholas Piggin, kvm-ppc
In-Reply-To: <20200624134724.2343007-1-npiggin@gmail.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> KVM supports msgsndp in guests by trapping and emulating the
> instruction, so it was decided to always use XIVE for IPIs if it is
> available. However on PowerVM systems, msgsndp can be used and gives
> better performance. On large systems, high XIVE interrupt rates can
> have sub-linear scaling, and using msgsndp can reduce the load on
> the interrupt controller.
>
> So switch to using core local doorbells even if XIVE is available.
> This reduces performance for KVM guests with an SMT topology by
> about 50% for ping-pong context switching between SMT vCPUs.
You have to take explicit steps to configure KVM in that way with qemu.
eg. "qemu .. -smp 8" will give you 8 SMT1 CPUs by default.
> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
> to get KVM performance back.
Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
that. Though adding PowerVM/KVM specific hacks is obviously a very
slippery slope.
> diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
> index 6891710833be..a737a2f87c67 100644
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -188,13 +188,14 @@ static int pseries_smp_prepare_cpu(int cpu)
> return 0;
> }
>
> +static void (*cause_ipi_offcore)(int cpu) __ro_after_init;
> +
> static void smp_pseries_cause_ipi(int cpu)
This is static so the name could be more descriptive, it doesn't need
the "smp_pseries" prefix.
> {
> - /* POWER9 should not use this handler */
> if (doorbell_try_core_ipi(cpu))
> return;
Seems like it would be worth making that static inline so we can avoid
the function call overhead.
> - icp_ops->cause_ipi(cpu);
> + cause_ipi_offcore(cpu);
> }
>
> static int pseries_cause_nmi_ipi(int cpu)
> @@ -222,10 +223,7 @@ static __init void pSeries_smp_probe_xics(void)
> {
> xics_smp_probe();
>
> - if (cpu_has_feature(CPU_FTR_DBELL) && !is_secure_guest())
> - smp_ops->cause_ipi = smp_pseries_cause_ipi;
> - else
> - smp_ops->cause_ipi = icp_ops->cause_ipi;
> + smp_ops->cause_ipi = icp_ops->cause_ipi;
> }
>
> static __init void pSeries_smp_probe(void)
> @@ -238,6 +236,18 @@ static __init void pSeries_smp_probe(void)
The comment just above here says:
/*
* Don't use P9 doorbells when XIVE is enabled. IPIs
* using MMIOs should be faster
*/
> xive_smp_probe();
Which is no longer true.
> else
> pSeries_smp_probe_xics();
I think you should just fold this in, it would make the logic slightly
easier to follow.
> + /*
> + * KVM emulates doorbells by reading the instruction, which
> + * can't be done if the guest is secure. If a secure guest
> + * runs under PowerVM, it could use msgsndp but would need a
> + * way to distinguish.
> + */
It's not clear what it needs to distinguish: That it's running under
PowerVM and therefore *can* use msgsndp even though it's secure.
Also the comment just talks about the is_secure_guest() test, which is
not obvious on first reading.
> + if (cpu_has_feature(CPU_FTR_DBELL) &&
> + cpu_has_feature(CPU_FTR_SMT) && !is_secure_guest()) {
> + cause_ipi_offcore = smp_ops->cause_ipi;
> + smp_ops->cause_ipi = smp_pseries_cause_ipi;
> + }
Because we're at the tail of the function I think this would be clearer
if it used early returns, eg:
// If the CPU doesn't have doorbells then we must use xics/xive
if (!cpu_has_feature(CPU_FTR_DBELL))
return;
// If the CPU doesn't have SMT then doorbells don't help us
if (!cpu_has_feature(CPU_FTR_SMT))
return;
// Secure guests can't use doorbells because ...
if (!is_secure_guest()
return;
/*
* Otherwise we want to use doorbells for sibling threads and
* xics/xive for IPIs off the core, because it performs better
* on large systems ...
*/
cause_ipi_offcore = smp_ops->cause_ipi;
smp_ops->cause_ipi = smp_pseries_cause_ipi;
}
cheers
^ permalink raw reply
* Re: FSL P5020/P5040: DPAA Ethernet issue with the latest Git kernel
From: Christian Zigotzky @ 2020-06-24 23:01 UTC (permalink / raw)
To: Alexander Gordeev
Cc: Darren Stevens, linuxppc-dev, R.T.Dickinson, mad skateman,
Christian Zigotzky
In-Reply-To: <20200624082352.GA24934@oc3871087118.ibm.com>
On 24 June 2020, at 10:23 am, Alexander Gordeev wrote:
> On Sun, Jun 21, 2020 at 08:40:14AM +0200, Christian Zigotzky wrote:
>> Hello Alexander,
>>
>> The DPAA Ethernet doesn’t work anymore on our FSL P5020/P5040 boards [1] since the RC1 of kernel 5.8 [2].
>> We bisected last days [3] and found the problematic commit [4]. I was able to revert it [5]. After that the DPAA Ethernet works again. I created a patch for reverting the commit [5]. This patch works and I will use it for the RC2.
>> Could you please check your commit? [4]
> Hi Christian,
>
> Could you please check if the kernel passes CONFIG_TEST_BITMAP self-test?
>
> Thanks!
>
>> Thanks,
>> Christian
>>
>> [1] http://wiki.amiga.org/index.php?title=X5000
>> [2] https://forum.hyperion-entertainment.com/viewtopic.php?p=50885#p50885
>> [3] https://forum.hyperion-entertainment.com/viewtopic.php?p=50892#p50892
>> [4] lib: fix bitmap_parse() on 64-bit big endian archs: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=81c4f4d924d5d009b5ed785a3e22b18d0f7b831f
>> [5] https://forum.hyperion-entertainment.com/viewtopic.php?p=50982#p50982
>>
>>
Hi Alexander,
Thanks for your reply!
I compiled a test kernel with the option "CONFIG_TEST_BITMAP=y"
yesterday. After that Skateman and I booted it and looked for the bitmap
tests with "dmesg | grep -i bitmap".
Results:
FSL P5020:
[ 0.297756] test_bitmap: loaded.
[ 0.298113] test_bitmap: parselist: 14: input is '0-2047:128/256' OK,
Time: 562
[ 0.298142] test_bitmap: parselist_user: 14: input is
'0-2047:128/256' OK, Time: 761
[ 0.301553] test_bitmap: all 1663 tests passed
FSL P5040:
[ 0.296563] test_bitmap: loaded.
[ 0.296894] test_bitmap: parselist: 14: input is '0-2047:128/256' OK,
Time: 540
[ 0.296920] test_bitmap: parselist_user: 14: input is
'0-2047:128/256' OK, Time: 680
[ 0.299994] test_bitmap: all 1663 tests passed
Thanks,
Christian
^ permalink raw reply
* [PATCH v4 11/11] Documentation: kunit: add a brief blurb about kunit_test_suite
From: Brendan Higgins @ 2020-06-24 20:55 UTC (permalink / raw)
To: jdike, richard, anton.ivanov, arnd, keescook, skhan, alan.maguire,
yzaikin, davidgow, akpm, rppt, frowand.list, catalin.marinas,
will, monstr, mpe, benh, paulus, chris, jcmvbkbc
Cc: linux-arch, linux-xtensa, linux-doc, sboyd, gregkh, linuxppc-dev,
linux-um, linux-kernel, Brendan Higgins, mcgrof, linux-kselftest,
logang, linux-arm-kernel, kunit-dev
In-Reply-To: <20200624205550.215599-1-brendanhiggins@google.com>
Add a brief blurb saying how and when the kunit_test_suite() macro
works to the usage documentation.
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
---
Documentation/dev-tools/kunit/usage.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 3c3fe8b5feccf..961d3ea3ca19a 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -211,6 +211,11 @@ KUnit test framework.
.. note::
A test case will only be run if it is associated with a test suite.
+``kunit_test_suite(...)`` is a macro which tells the linker to put the specified
+test suite in a special linker section so that it can be run by KUnit either
+after late_init, or when the test module is loaded (depending on whether the
+test was built in or not).
+
For more information on these types of things see the :doc:`api/test`.
Isolating Behavior
--
2.27.0.212.ge8ba1cc988-goog
^ permalink raw reply related
* [PATCH v4 10/11] Documentation: Add kunit_shutdown to kernel-parameters.txt
From: Brendan Higgins @ 2020-06-24 20:55 UTC (permalink / raw)
To: jdike, richard, anton.ivanov, arnd, keescook, skhan, alan.maguire,
yzaikin, davidgow, akpm, rppt, frowand.list, catalin.marinas,
will, monstr, mpe, benh, paulus, chris, jcmvbkbc
Cc: linux-arch, linux-xtensa, linux-doc, sboyd, gregkh, linuxppc-dev,
linux-um, linux-kernel, Brendan Higgins, mcgrof, linux-kselftest,
logang, linux-arm-kernel, kunit-dev
In-Reply-To: <20200624205550.215599-1-brendanhiggins@google.com>
Add kunit_shutdown, an option to specify that the kernel shutsdown after
running KUnit tests, to the kernel-parameters.txt documentation.
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad81c79a..e7d5eb7249e7f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2183,6 +2183,14 @@
0: force disabled
1: force enabled
+ kunit_shutdown=[KERNEL UNIT TESTING FRAMEWORK] Shutdown kernel after
+ running built-in tests. Tests configured as modules will
+ not be run.
+ Default: (flag not present) don't shutdown
+ poweroff: poweroff the kernel after running tests
+ halt: halt the kernel after running tests
+ reboot: reboot the kernel after running tests
+
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
--
2.27.0.212.ge8ba1cc988-goog
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox