LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2] powerpc: Warn about use of smt_snooze_delay
From: Michael Ellerman @ 2020-08-04 11:59 UTC (permalink / raw)
  To: Joel Stanley, linuxppc-dev; +Cc: Tyrel Datwyler, Gautham R Shenoy
In-Reply-To: <20200630015935.2675676-1-joel@jms.id.au>

Joel Stanley <joel@jms.id.au> writes:
> It's not done anything for a long time. Save the percpu variable, and
> emit a warning to remind users to not expect it to do anything.
>
> Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
> Cc: stable@vger.kernel.org # v3.14
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> --
> v2:
>  Use pr_warn instead of WARN
>  Reword and print proccess name with pid in message
>  Leave CPU_FTR_SMT test in
>  Add Fixes line
>
> mpe, if you don't agree then feel free to drop the cc stable.
>
> Testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system it's quite noisy
> as the online/offline loop that ppc64_cpu runs is slow.

Hmm, that is pretty spammy.

I know I suggested the ratelimit, but I was thinking it would print once
for each ppc64_cpu invocation, not ~30 times.

How about pr_warn_once(), that should still be sufficient for people to
notice if they're looking for it.

...
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b3259697e..ba6d4cee19ef 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -32,29 +32,26 @@
>  
>  static DEFINE_PER_CPU(struct cpu, cpu_devices);
>  
> -/*
> - * SMT snooze delay stuff, 64-bit only for now
> - */
> -
>  #ifdef CONFIG_PPC64
>  
> -/* Time in microseconds we delay before sleeping in the idle loop */
> -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
> +/*
> + * Snooze delay has not been hooked up since 3fa8cad82b94 ("powerpc/pseries/cpuidle:
> + * smt-snooze-delay cleanup.") and has been broken even longer. As was foretold in
> + * 2014:
> + *
> + *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
> + *  up the kernel code."
> + *
> + * At some point in the future this code should be removed.
> + */
>  
>  static ssize_t store_smt_snooze_delay(struct device *dev,
>  				      struct device_attribute *attr,
>  				      const char *buf,
>  				      size_t count)
>  {
> -	struct cpu *cpu = container_of(dev, struct cpu, dev);
> -	ssize_t ret;
> -	long snooze;
> -
> -	ret = sscanf(buf, "%ld", &snooze);
> -	if (ret != 1)
> -		return -EINVAL;
> -
> -	per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
> +	pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> +			    current->comm, current->pid);

Can we make this:

	"%s (%d) stored to unsupported smt_snooze_delay, which has no effect.\n",


>  	return count;
>  }
>  
> @@ -62,9 +59,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>  				     struct device_attribute *attr,
>  				     char *buf)
>  {
> -	struct cpu *cpu = container_of(dev, struct cpu, dev);
> -
> -	return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
> +	pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> +			    current->comm, current->pid);

It has as much effect as it ever did :)

So maybe:

	"%s (%d) read from unsupported smt_snooze_delay.\n",


I can do those changes when applying if you like rather than making you
do a v3.

cheers

^ permalink raw reply

* Re: [PATCH 1/6] powerpc/powernv/smp: Fix spurious DBG() warning
From: Michael Ellerman @ 2020-08-04 12:05 UTC (permalink / raw)
  To: Joel Stanley, Oliver O'Halloran; +Cc: linuxppc-dev
In-Reply-To: <CACPK8XfoZ8+SUG6cuWuEJqdTfmxePsBGFGgqyrPvmn1WyRVyjA@mail.gmail.com>

Joel Stanley <joel@jms.id.au> writes:
> On Tue, 4 Aug 2020 at 00:57, Oliver O'Halloran <oohall@gmail.com> wrote:
>>
>> When building with W=1 we get the following warning:
>>
>>  arch/powerpc/platforms/powernv/smp.c: In function ‘pnv_smp_cpu_kill_self’:
>>  arch/powerpc/platforms/powernv/smp.c:276:16: error: suggest braces around
>>         empty body in an ‘if’ statement [-Werror=empty-body]
>>    276 |      cpu, srr1);
>>        |                ^
>>  cc1: all warnings being treated as errors
>>
>> The full context is this block:
>>
>>  if (srr1 && !generic_check_cpu_restart(cpu))
>>         DBG("CPU%d Unexpected exit while offline srr1=%lx!\n",
>>                         cpu, srr1);
>>
>> When building with DEBUG undefined DBG() expands to nothing and GCC emits
>> the warning due to the lack of braces around an empty statement.
>>
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>> ---
>> We could add the braces too. That might even be better since it's a multi-line
>> if block even though it's only a single statement.
>
> Or you could put it all on one line, now that our 120 line overlords
> have taken over.

Yeah I think that one may as well be one line.

> Reviewed-by: Joel Stanley <joel@jms.id.au>
>
> Messy:
>
> $ git grep "define DBG(" arch/powerpc/ |grep -v print
> arch/powerpc/kernel/crash_dump.c:#define DBG(fmt...)
> arch/powerpc/kernel/iommu.c:#define DBG(...)
> arch/powerpc/kernel/legacy_serial.c:#define DBG(fmt...) do { } while(0)
> arch/powerpc/kernel/prom.c:#define DBG(fmt...)
..

Yeah, gross old cruft.

The vast majority of those should just be replaced with pr_devel()
and/or pr_debug().

The pnv_smp_cpu_kill_self() case is one where we probably do want to
stick with udbg_printf(), because I don't think it's kosher to call
printk() from an offline CPU.

cheers

^ permalink raw reply

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
From: Srikar Dronamraju @ 2020-08-04 12:10 UTC (permalink / raw)
  To: peterz
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot, Rik van Riel,
	linuxppc-dev, LKML, Ingo Molnar, Thomas Gleixner, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann
In-Reply-To: <20200804104520.GB2657@hirez.programming.kicks-ass.net>

* peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:

> On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > most architectures. One of the users of cpu_smt_mask(), would be to
> > identify idle-cores. On Power9, a pair of cores can be presented by the
> > firmware as a big-core for backward compatibility reasons.
> > 
> > In order to maintain userspace backward compatibility with previous
> > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > this pair shares the L2 cache as well. However, from the scheduler's point
> > of view, a core should be determined by SMT4. The load-balancer already
> > does this. Hence allow PowerPc architecture to override the default
> > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> 
> I'm utterly confused.
> 
> Why can't you set your regular siblings mask to the smt4 thing? Who
> cares about the compat stuff, I thought that was an LPAR/AIX thing.

There are no technical challenges to set the sibling mask to SMT4.
This is for Linux running on PowerVM. When these Power9 boxes are sold /
marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
world, everything is in SMT8 mode, the device tree properties still mark
the system to be running on 8 thread core. There are a number of utilities
like ppc64_cpu that directly read from device-tree. They would get core
count and thread count which is SMT8 based.

If the sibling_mask is set to small core, then same user when looking at
output from lscpu and other utilities that look at sysfs will start seeing
2x number of cores to what he had provisioned and what the utilities from
the device-tree show. This can gets users confused.

So to keep the device-tree properties, utilities depending on device-tree,
sysfs and utilities depending on sysfs on the same page, userspace are only
exposed as SMT8.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [merge] Build failure selftest/powerpc/mm/pkey_exec_prot
From: Michael Ellerman @ 2020-08-04 12:23 UTC (permalink / raw)
  To: Sandipan Das; +Cc: Sachin Sant, linuxppc-dev
In-Reply-To: <185c2277-91fd-74eb-3c04-75caeb90ed9e@linux.ibm.com>

Sandipan Das <sandipan@linux.ibm.com> writes:
> On 04/08/20 6:38 am, Michael Ellerman wrote:
>> Sandipan Das <sandipan@linux.ibm.com> writes:
>>> On 03/08/20 4:32 pm, Michael Ellerman wrote:
>>>> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
>>>>>> On 02-Aug-2020, at 10:58 PM, Sandipan Das <sandipan@linux.ibm.com> wrote:
>>>>>> On 02/08/20 4:45 pm, Sachin Sant wrote:
>>>>>>> pkey_exec_prot test from linuxppc merge branch (3f68564f1f5a) fails to
>>>>>>> build due to following error:
>>>>>>>
>>>>>>> gcc -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v5.8-rc7-1276-g3f68564f1f5a"' -I/home/sachin/linux/tools/testing/selftests/powerpc/include  -m64    pkey_exec_prot.c /home/sachin/linux/tools/testing/selftests/kselftest_harness.h /home/sachin/linux/tools/testing/selftests/kselftest.h ../harness.c ../utils.c  -o /home/sachin/linux/tools/testing/selftests/powerpc/mm/pkey_exec_prot
>>>>>>> In file included from pkey_exec_prot.c:18:
>>>>>>> /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>>>>>>> #define SYS_pkey_mprotect 386
>>>>>>>
>>>>>>> In file included from /usr/include/sys/syscall.h:31,
>>>>>>>                 from /home/sachin/linux/tools/testing/selftests/powerpc/include/utils.h:47,
>>>>>>>                 from /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:12,
>>>>>>>                 from pkey_exec_prot.c:18:
>>>>>>> /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>>>>>>> # define SYS_pkey_mprotect __NR_pkey_mprotect
>>>>>>>
>>>>>>> commit 128d3d021007 introduced this error.
>>>>>>> selftests/powerpc: Move pkey helpers to headers
>>>>>>>
>>>>>>> Possibly the # defines for sys calls can be retained in pkey_exec_prot.c or
>>>>>>>
>>>>>>
>>>>>> I am unable to reproduce this on the latest merge branch (HEAD at f59195f7faa4).
>>>>>> I don't see any redefinitions in pkey_exec_prot.c either.
>>>>>>
>>>>>
>>>>> I can still see this problem on latest merge branch.
>>>>> I have following gcc version
>>>>>
>>>>> gcc version 8.3.1 20191121
>>>>
>>>> What libc version? Or just the distro & version?
>>>
>>> Sachin observed this on RHEL 8.2 with glibc-2.28.
>>> I couldn't reproduce it on Ubuntu 20.04 and Fedora 32 and both these distros
>>> are using glibc-2.31.
>> 
>> OK odd. Usually it's newer glibc that hits this problem.
>> 
>> I guess on RHEL 8.2 we're getting the asm-generic version? But that
>> would be quite wrong if that's what's happening.
>
> If I let GCC dump all the headers that are being used for the source file, I always
> see syscall.h being included on the RHEL 8.2 system. That is the header with the
> conflicting definition.
>
>   $ cd tools/testing/selftests/powerpc/mm
>   $ gcc -H -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v5.8-rc7-1456-gf59195f7faa4-dirty"' \
>         -I../include -m64 pkey_exec_prot.c ../../kselftest_harness.h ../../kselftest.h ../harness.c ../utils.c \
>         -o pkey_exec_prot 2>&1 | grep syscall
>
> On Ubuntu 20.04 and Fedora 32, grep doesn't find any matching text.
> On RHEL 8.2, it shows the following.
>   ... /usr/include/sys/syscall.h
>   .... /usr/include/bits/syscall.h
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1575: note: this is the location of the previous definition
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1579: note: this is the location of the previous definition
>   /usr/include/bits/syscall.h
>   .. /usr/include/sys/syscall.h
>   ... /usr/include/bits/syscall.h
>   /usr/include/bits/syscall.h
>   .. /usr/include/sys/syscall.h
>   ... /usr/include/bits/syscall.h
>   /usr/include/bits/syscall.h
>
> So utils.h is also including /usr/include/sys/syscall.h for glibc versions older than 2.30
> because of commit 743f3544fffb ("selftests/powerpc: Add wrapper for gettid") :)

Haha, of course. :facepalm_emoji:

> [...]
> . ../include/pkeys.h
> [...]
> .. ../include/utils.h
> [...]
> ... /usr/include/sys/syscall.h
> .... /usr/include/asm/unistd.h
> .... /usr/include/bits/syscall.h
> In file included from pkey_exec_prot.c:18:
> ../include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>  #define SYS_pkey_mprotect 386
>
> In file included from /usr/include/sys/syscall.h:31,
>                  from ../include/utils.h:47,
>                  from ../include/pkeys.h:12,
>                  from pkey_exec_prot.c:18:
> /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>  # define SYS_pkey_mprotect __NR_pkey_mprotect

Aha, that explains why redefining gives us an error, because we're
defining it to the literal 386 whereas the system header is defining it
to the __NR value.

Is there a reason to use the SYS_ name?

Typically we just use the __NR value directly, and that would avoid any
problems with redefinition I think, as long as we're using the same
value as the system header (which we always should be).

eg:

diff --git a/tools/testing/selftests/powerpc/include/pkeys.h b/tools/testing/selftests/powerpc/include/pkeys.h
index 6ba95039a034..3312cb1b058d 100644
--- a/tools/testing/selftests/powerpc/include/pkeys.h
+++ b/tools/testing/selftests/powerpc/include/pkeys.h
@@ -31,9 +31,9 @@
 
 #define SI_PKEY_OFFSET	0x20
 
-#define SYS_pkey_mprotect	386
-#define SYS_pkey_alloc		384
-#define SYS_pkey_free		385
+#define __NR_pkey_mprotect	386
+#define __NR_pkey_alloc		384
+#define __NR_pkey_free		385
 
 #define PKEY_BITS_PER_PKEY	2
 #define NR_PKEYS		32
@@ -62,17 +62,17 @@ void pkey_set_rights(int pkey, unsigned long rights)
 
 int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
 {
-	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
+	return syscall(__NR_pkey_mprotect, addr, len, prot, pkey);
 }
 
 int sys_pkey_alloc(unsigned long flags, unsigned long rights)
 {
-	return syscall(SYS_pkey_alloc, flags, rights);
+	return syscall(__NR_pkey_alloc, flags, rights);
 }
 
 int sys_pkey_free(int pkey)
 {
-	return syscall(SYS_pkey_free, pkey);
+	return syscall(__NR_pkey_free, pkey);
 }
 
 int pkeys_unsupported(void)


cheers

^ permalink raw reply related

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
From: peterz @ 2020-08-04 12:47 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot, Rik van Riel,
	linuxppc-dev, LKML, Ingo Molnar, Thomas Gleixner, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann
In-Reply-To: <20200804121007.GJ24375@linux.vnet.ibm.com>

On Tue, Aug 04, 2020 at 05:40:07PM +0530, Srikar Dronamraju wrote:
> * peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:
> 
> > On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > > most architectures. One of the users of cpu_smt_mask(), would be to
> > > identify idle-cores. On Power9, a pair of cores can be presented by the
> > > firmware as a big-core for backward compatibility reasons.
> > > 
> > > In order to maintain userspace backward compatibility with previous
> > > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > > this pair shares the L2 cache as well. However, from the scheduler's point
> > > of view, a core should be determined by SMT4. The load-balancer already
> > > does this. Hence allow PowerPc architecture to override the default
> > > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> > 
> > I'm utterly confused.
> > 
> > Why can't you set your regular siblings mask to the smt4 thing? Who
> > cares about the compat stuff, I thought that was an LPAR/AIX thing.
> 
> There are no technical challenges to set the sibling mask to SMT4.
> This is for Linux running on PowerVM. When these Power9 boxes are sold /
> marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
> world, everything is in SMT8 mode, the device tree properties still mark
> the system to be running on 8 thread core. There are a number of utilities
> like ppc64_cpu that directly read from device-tree. They would get core
> count and thread count which is SMT8 based.
> 
> If the sibling_mask is set to small core, then same user when looking at

FWIW, I find the small/big core naming utterly confusing vs the
big/little naming from ARM. When you say small, I'm thinking of
asymmetric cores, not SMT4/SMT8.

> output from lscpu and other utilities that look at sysfs will start seeing
> 2x number of cores to what he had provisioned and what the utilities from
> the device-tree show. This can gets users confused.

One will report SMT8 and the other SMT4, right? So only users that
cannot read will be confused, but if you can't read, confusion is
guaranteed anyway.

Also, by exposing the true (SMT4) topology to userspace, userspace
applications could behave better -- for those few that actually parse
the topology information.

> So to keep the device-tree properties, utilities depending on device-tree,
> sysfs and utilities depending on sysfs on the same page, userspace are only
> exposed as SMT8.

I'm not convinced it makes sense to lie to userspace just to accomodate
a few users that cannot read.

^ permalink raw reply

* [PATCH] powerpc: Fix circular dependency between percpu.h and mmu.h
From: Michael Ellerman @ 2020-08-04 13:05 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sfr, w

Recently random.h started including percpu.h (see commit
f227e3ec3b5c ("random32: update the net random state on interrupt and
activity")), which broke corenet64_smp_defconfig:

  In file included from /linux/arch/powerpc/include/asm/paca.h:18,
                   from /linux/arch/powerpc/include/asm/percpu.h:13,
                   from /linux/include/linux/random.h:14,
                   from /linux/lib/uuid.c:14:
  /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx'
    139 | DECLARE_PER_CPU(int, next_tlbcam_idx);

This is due to a circular header dependency:
  asm/mmu.h includes asm/percpu.h, which includes asm/paca.h, which
  includes asm/mmu.h

Which means DECLARE_PER_CPU() isn't defined when mmu.h needs it.

We can fix it by moving the include of paca.h below the include of
asm-generic/percpu.h.

This moves the include of paca.h out of the #ifdef __powerpc64__, but
that is OK because paca.h is almost entirely inside #ifdef
CONFIG_PPC64 anyway.

It also moves the include of paca.h out of the #ifdef CONFIG_SMP,
which could possibly break something, but seems to have no ill
effects.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/percpu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
index dce863a7635c..8e5b7d0b851c 100644
--- a/arch/powerpc/include/asm/percpu.h
+++ b/arch/powerpc/include/asm/percpu.h
@@ -10,8 +10,6 @@
 
 #ifdef CONFIG_SMP
 
-#include <asm/paca.h>
-
 #define __my_cpu_offset local_paca->data_offset
 
 #endif /* CONFIG_SMP */
@@ -19,4 +17,6 @@
 
 #include <asm-generic/percpu.h>
 
+#include <asm/paca.h>
+
 #endif /* _ASM_POWERPC_PERCPU_H_ */
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCHv4 1/2] powerpc/pseries: group lmb operation and memblock's
From: Pingfan Liu @ 2020-08-04 13:33 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Nathan Lynch, Kexec Mailing List, linuxppc-dev, Hari Bathini,
	Nathan Fontenot
In-Reply-To: <ee585548-aefc-ff3f-6a79-e616b9e04f12@linux.ibm.com>

On Mon, Aug 3, 2020 at 9:52 PM Laurent Dufour <ldufour@linux.ibm.com> wrote:
>
> > @@ -603,6 +606,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >         }
> >
> >         lmb_set_nid(lmb);
> > +       lmb->flags |= DRCONF_MEM_ASSIGNED;
> > +
> >         block_sz = memory_block_size_bytes();
> >
> >         /* Add the memory */
>
> Since the lmb->flags is now set earlier, you should unset it in the case the
> call to __add_memory() fails, something like:
>
> @@ -614,6 +614,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>         rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
>         if (rc) {
>                 invalidate_lmb_associativity_index(lmb);
> +               lmb->flags &= ~DRCONF_MEM_ASSIGNED;
You are right. I will fix it in V5.

Thanks,
Pingfan

^ permalink raw reply

* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Ellerman @ 2020-08-04 13:35 UTC (permalink / raw)
  To: Michael Roth, linuxppc-dev
  Cc: Nathan Lynch, Cedric Le Goater, Thiago Jung Bauermann, Greg Kurz
In-Reply-To: <20200804032937.7235-1-mdroth@linux.vnet.ibm.com>

Hi Mike,

There is a bit of history to this code, but not in a good way :)

Michael Roth <mdroth@linux.vnet.ibm.com> writes:
> For a power9 KVM guest with XIVE enabled, running a test loop
> where we hotplug 384 vcpus and then unplug them, the following traces
> can be seen (generally within a few loops) either from the unplugged
> vcpu:
>
>   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>   [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
...
>
> At that point the worker thread assumes the unplugged CPU is in some
> unknown/dead state and procedes with the cleanup, causing the race with
> the XIVE cleanup code executed by the unplugged CPU.
>
> Fix this by inserting an msleep() after each RTAS call to avoid

We previously had an msleep(), but it was removed:

  b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")

> pseries_cpu_die() returning prematurely, and double the number of
> attempts so we wait at least a total of 5 seconds. While this isn't an
> ideal solution, it is similar to how we dealt with a similar issue for
> cede_offline mode in the past (940ce422a3).

Thiago tried to fix this previously but there was a bit of discussion
that didn't quite resolve:

  https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/


Spinning forever seems like a bad idea, but as has been demonstrated at
least twice now, continuing when we don't know the state of the other
CPU can lead to straight up crashes.

So I think I'm persuaded that it's preferable to have the kernel stuck
spinning rather than oopsing.

I'm 50/50 on whether we should have a cond_resched() in the loop. My
first instinct is no, if we're stuck here for 20s a stack trace would be
good. But then we will probably hit that on some big and/or heavily
loaded machine.

So possibly we should call cond_resched() but have some custom logic in
the loop to print a warning if we are stuck for more than some
sufficiently long amount of time.


> Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588

This is not public.

I tend to trim Bugzilla links from the change log, because I'm not
convinced they will last forever, but it is good to have them in the
mail archive.

cheers

> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Cedric Le Goater <clg@kaod.org>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index c6e0d8abf75e..3cb172758052 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -111,13 +111,12 @@ static void pseries_cpu_die(unsigned int cpu)
>  	int cpu_status = 1;
>  	unsigned int pcpu = get_hard_smp_processor_id(cpu);
>  
> -	for (tries = 0; tries < 25; tries++) {
> +	for (tries = 0; tries < 50; tries++) {
>  		cpu_status = smp_query_cpu_stopped(pcpu);
>  		if (cpu_status == QCSS_STOPPED ||
>  		    cpu_status == QCSS_HARDWARE_ERROR)
>  			break;
> -		cpu_relax();
> -
> +		msleep(100);
>  	}
>  
>  	if (cpu_status != 0) {
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCH] powerpc: Fix circular dependency between percpu.h and mmu.h
From: Michael Ellerman @ 2020-08-04 13:37 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman; +Cc: sfr, w
In-Reply-To: <20200804130558.292328-1-mpe@ellerman.id.au>

On Tue, 4 Aug 2020 23:05:58 +1000, Michael Ellerman wrote:
> Recently random.h started including percpu.h (see commit
> f227e3ec3b5c ("random32: update the net random state on interrupt and
> activity")), which broke corenet64_smp_defconfig:
> 
>   In file included from /linux/arch/powerpc/include/asm/paca.h:18,
>                    from /linux/arch/powerpc/include/asm/percpu.h:13,
>                    from /linux/include/linux/random.h:14,
>                    from /linux/lib/uuid.c:14:
>   /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx'
>     139 | DECLARE_PER_CPU(int, next_tlbcam_idx);
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Fix circular dependency between percpu.h and mmu.h
      https://git.kernel.org/powerpc/c/0c83b277ada72b585e6a3e52b067669df15bcedb

cheers

^ permalink raw reply

* Re: [PATCHv4 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
From: Pingfan Liu @ 2020-08-04 13:38 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Nathan Lynch, Kexec Mailing List, linuxppc-dev, Hari Bathini,
	Nathan Fontenot
In-Reply-To: <951e4406-1172-09a3-a8db-6f0a92aea2ed@linux.ibm.com>

On Tue, Aug 4, 2020 at 12:29 AM Laurent Dufour <ldufour@linux.ibm.com> wrote:
>
[...]
> >       lmb_set_nid(lmb);
> >       lmb->flags |= DRCONF_MEM_ASSIGNED;
> > +     if (dt_update) {
> > +             ret = drmem_update_dt();
> > +             if (ret)
> > +                     pr_warn("%s fail to update dt, but continue\n", __func__);
> > +     }
> >
> >       block_sz = memory_block_size_bytes();
>
> In the case the call to __add_memory is failing, the flag DRCONF_MEM_ASSIGNED
> should be cleared as I mentioned in your previous patch. In addition to this,
Yes.
> the DT should be updated, or the caller should manage that but that will be hard
> since it doesn't know where the error happened in this function.
Yeah, it is hard to manage it by caller, so just updating dt  is a
easier method.
>
> >
> > @@ -625,7 +653,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >               invalidate_lmb_associativity_index(lmb);
> >               lmb_clear_nid(lmb);
> >               lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> > -
> > +             if (dt_update) {
> > +                     ret = drmem_update_dt();
> > +                     if (ret)
> > +                             pr_warn("%s fail to update dt during rollback, but continue\n", __func__);
> > +             }
> >               __remove_memory(nid, base_addr, block_sz);
> >       }
> >
> > @@ -638,6 +670,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >       int lmbs_available = 0;
> >       int lmbs_added = 0;
> >       int rc;
> > +     bool dt_update = false;
> >
> >       pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
> >
> > @@ -664,7 +697,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >               if (rc)
> >                       continue;
> >
> > -             rc = dlpar_add_lmb(lmb);
> > +             rc = dlpar_add_lmb(lmb, dt_update);
> >               if (rc) {
> >                       dlpar_release_drc(lmb->drc_index);
> >                       continue;
> > @@ -678,16 +711,23 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >               lmbs_added++;
> >               if (lmbs_added == lmbs_to_add)
> >                       break;
> > +             else if (lmbs_added == lmbs_to_add - 1)
> > +                     dt_update = true;
>
> In the case the number of LMB to add is 1, dt_update is never set to true, and
> the device tree is never updated. You need to set dt_update to true earlier in
> the loop block.
Oh, I will fix it in V5
>
> >       }
> >
> >       if (lmbs_added != lmbs_to_add) {
> > +             bool rollback_dt_update = false;
> > +
> >               pr_err("Memory hot-add failed, removing any added LMBs\n");
> >
> >               for_each_drmem_lmb(lmb) {
> >                       if (!drmem_lmb_reserved(lmb))
> >                               continue;
> >
> > -                     rc = dlpar_remove_lmb(lmb);
> > +                     if (--lmbs_added == 0 && dt_update)
> > +                             rollback_dt_update = true;
>
> That test may have to be review to deal with error during the last LMB addition,
> the DT may have been updated before __add_memory() is failing in
> dlpar_add_lmb(). In that case, lmbs_added == 0 and that branch is not covered.
> That's not an issue if dlpar_add_lmb() is handling that case (see my comment above).
I will fix it in next version.

Thanks for your review.

Regards,
Pingfan

^ permalink raw reply

* Re: [PATCH v2 01/17] KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
From: Daniel Axtens @ 2020-08-04 13:53 UTC (permalink / raw)
  To: Mike Rapoport, Andrew Morton
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He, x86,
	Russell King, Mike Rapoport, clang-built-linux, Ingo Molnar,
	linux-arm-kernel, Catalin Marinas, uclinux-h8-devel, linux-xtensa,
	openrisc, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Stafford Horne, Hari Bathini, Michal Simek, Yoshinori Sato,
	linux-mm, linux-kernel, iommu, Palmer Dabbelt, linuxppc-dev,
	Mike Rapoport
In-Reply-To: <20200802163601.8189-2-rppt@kernel.org>

Hi Mike,

>
> The memory size calculation in kvm_cma_reserve() traverses memblock.memory
> rather than simply call memblock_phys_mem_size(). The comment in that
> function suggests that at some point there should have been call to
> memblock_analyze() before memblock_phys_mem_size() could be used.
> As of now, there is no memblock_analyze() at all and
> memblock_phys_mem_size() can be used as soon as cold-plug memory is
> registerd with memblock.
>
> Replace loop over memblock.memory with a call to memblock_phys_mem_size().
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 7cd3cf3d366b..56ab0d28de2a 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -95,22 +95,15 @@ EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
>  void __init kvm_cma_reserve(void)
>  {
>  	unsigned long align_size;
> -	struct memblock_region *reg;
> -	phys_addr_t selected_size = 0;
> +	phys_addr_t selected_size;
>  
>  	/*
>  	 * We need CMA reservation only when we are in HV mode
>  	 */
>  	if (!cpu_has_feature(CPU_FTR_HVMODE))
>  		return;
> -	/*
> -	 * We cannot use memblock_phys_mem_size() here, because
> -	 * memblock_analyze() has not been called yet.
> -	 */
> -	for_each_memblock(memory, reg)
> -		selected_size += memblock_region_memory_end_pfn(reg) -
> -				 memblock_region_memory_base_pfn(reg);
>  
> +	selected_size = PHYS_PFN(memblock_phys_mem_size());
>  	selected_size = (selected_size * kvm_cma_resv_ratio / 100) << PAGE_SHIFT;

I think this is correct, but PHYS_PFN does x >> PAGE_SHIFT and then the
next line does x << PAGE_SHIFT, so I think we could combine those two
lines as:

selected_size = PAGE_ALIGN(memblock_phys_mem_size() * kvm_cma_resv_ratio / 100);

(I think that might technically change it from aligning down to aligning
up but I don't think 1 page matters here.)

Kind regards,
Daniel


>  	if (selected_size) {
>  		pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> -- 
> 2.26.2

^ permalink raw reply

* [PATCH -next] soc: fsl: qe: Remove unnessesary check in ucc_set_tdm_rxtx_clk
From: Wang Hai @ 2020-08-04 13:56 UTC (permalink / raw)
  To: qiang.zhao, leoyang.li; +Cc: linuxppc-dev, linux-kernel, linux-arm-kernel

Fix smatch warning:

drivers/soc/fsl/qe/ucc.c:526
 ucc_set_tdm_rxtx_clk() warn: unsigned 'tdm_num' is never less than zero.

'tdm_num' is u32 type, never less than zero.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
 drivers/soc/fsl/qe/ucc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qe/ucc.c b/drivers/soc/fsl/qe/ucc.c
index cac0fb7693a0..21dbcd787cd5 100644
--- a/drivers/soc/fsl/qe/ucc.c
+++ b/drivers/soc/fsl/qe/ucc.c
@@ -523,7 +523,7 @@ int ucc_set_tdm_rxtx_clk(u32 tdm_num, enum qe_clock clock,
 
 	qe_mux_reg = &qe_immr->qmx;
 
-	if (tdm_num > 7 || tdm_num < 0)
+	if (tdm_num > 7)
 		return -EINVAL;
 
 	/* The communications direction must be RX or TX */
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Greg Kurz @ 2020-08-04 14:16 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, linuxppc-dev, Michael Roth, Thiago Jung Bauermann,
	Cedric Le Goater
In-Reply-To: <873652zg8h.fsf@mpe.ellerman.id.au>

On Tue, 04 Aug 2020 23:35:10 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Hi Mike,
> 
> There is a bit of history to this code, but not in a good way :)
> 
> Michael Roth <mdroth@linux.vnet.ibm.com> writes:
> > For a power9 KVM guest with XIVE enabled, running a test loop
> > where we hotplug 384 vcpus and then unplug them, the following traces
> > can be seen (generally within a few loops) either from the unplugged
> > vcpu:
> >
> >   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
> >   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
> >   [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
> ...
> >
> > At that point the worker thread assumes the unplugged CPU is in some
> > unknown/dead state and procedes with the cleanup, causing the race with
> > the XIVE cleanup code executed by the unplugged CPU.
> >
> > Fix this by inserting an msleep() after each RTAS call to avoid
> 
> We previously had an msleep(), but it was removed:
> 
>   b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
> 

Ah, I hadn't seen that one...

> > pseries_cpu_die() returning prematurely, and double the number of
> > attempts so we wait at least a total of 5 seconds. While this isn't an
> > ideal solution, it is similar to how we dealt with a similar issue for
> > cede_offline mode in the past (940ce422a3).
> 
> Thiago tried to fix this previously but there was a bit of discussion
> that didn't quite resolve:
> 
>   https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/
> 

Yeah it appears that the motivation at the time was to make the "Querying DEAD?"
messages to disappear and to avoid potentially concurrent calls to rtas-stop-self
which is prohibited by PAPR... not fixing actual crashes.

> 
> Spinning forever seems like a bad idea, but as has been demonstrated at
> least twice now, continuing when we don't know the state of the other
> CPU can lead to straight up crashes.
> 
> So I think I'm persuaded that it's preferable to have the kernel stuck
> spinning rather than oopsing.
> 

+1

> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> first instinct is no, if we're stuck here for 20s a stack trace would be
> good. But then we will probably hit that on some big and/or heavily
> loaded machine.
> 
> So possibly we should call cond_resched() but have some custom logic in
> the loop to print a warning if we are stuck for more than some
> sufficiently long amount of time.
> 

How long should that be ?

> 
> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> 
> This is not public.
> 

I'll have a look at changing that.

> I tend to trim Bugzilla links from the change log, because I'm not
> convinced they will last forever, but it is good to have them in the
> mail archive.
> 
> cheers
> 

Cheers,

--
Greg

> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Cedric Le Goater <clg@kaod.org>
> > Cc: Greg Kurz <groug@kaod.org>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > index c6e0d8abf75e..3cb172758052 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -111,13 +111,12 @@ static void pseries_cpu_die(unsigned int cpu)
> >  	int cpu_status = 1;
> >  	unsigned int pcpu = get_hard_smp_processor_id(cpu);
> >  
> > -	for (tries = 0; tries < 25; tries++) {
> > +	for (tries = 0; tries < 50; tries++) {
> >  		cpu_status = smp_query_cpu_stopped(pcpu);
> >  		if (cpu_status == QCSS_STOPPED ||
> >  		    cpu_status == QCSS_HARDWARE_ERROR)
> >  			break;
> > -		cpu_relax();
> > -
> > +		msleep(100);
> >  	}
> >  
> >  	if (cpu_status != 0) {
> > -- 
> > 2.17.1


^ permalink raw reply

* Re: [merge] Build failure selftest/powerpc/mm/pkey_exec_prot
From: Sandipan Das @ 2020-08-04 16:45 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Sachin Sant, linuxppc-dev
In-Reply-To: <877duezjk3.fsf@mpe.ellerman.id.au>


On 04/08/20 5:53 pm, Michael Ellerman wrote:
> Sandipan Das <sandipan@linux.ibm.com> writes:
>> On 04/08/20 6:38 am, Michael Ellerman wrote:
>>> Sandipan Das <sandipan@linux.ibm.com> writes:
>>>> On 03/08/20 4:32 pm, Michael Ellerman wrote:
>>>>> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
>>>>>>> On 02-Aug-2020, at 10:58 PM, Sandipan Das <sandipan@linux.ibm.com> wrote:
>>>>>>> On 02/08/20 4:45 pm, Sachin Sant wrote:
>>>>>>>> pkey_exec_prot test from linuxppc merge branch (3f68564f1f5a) fails to
>>>>>>>> build due to following error:
>>>>>>>>
>>>>>>>> gcc -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v5.8-rc7-1276-g3f68564f1f5a"' -I/home/sachin/linux/tools/testing/selftests/powerpc/include  -m64    pkey_exec_prot.c /home/sachin/linux/tools/testing/selftests/kselftest_harness.h /home/sachin/linux/tools/testing/selftests/kselftest.h ../harness.c ../utils.c  -o /home/sachin/linux/tools/testing/selftests/powerpc/mm/pkey_exec_prot
>>>>>>>> In file included from pkey_exec_prot.c:18:
>>>>>>>> /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>>>>>>>> #define SYS_pkey_mprotect 386
>>>>>>>>
>>>>>>>> In file included from /usr/include/sys/syscall.h:31,
>>>>>>>>                 from /home/sachin/linux/tools/testing/selftests/powerpc/include/utils.h:47,
>>>>>>>>                 from /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:12,
>>>>>>>>                 from pkey_exec_prot.c:18:
>>>>>>>> /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>>>>>>>> # define SYS_pkey_mprotect __NR_pkey_mprotect
>>>>>>>>
>>>>>>>> commit 128d3d021007 introduced this error.
>>>>>>>> selftests/powerpc: Move pkey helpers to headers
>>>>>>>>
>>>>>>>> Possibly the # defines for sys calls can be retained in pkey_exec_prot.c or
>>>>>>>>
>>>>>>>
>>>>>>> I am unable to reproduce this on the latest merge branch (HEAD at f59195f7faa4).
>>>>>>> I don't see any redefinitions in pkey_exec_prot.c either.
>>>>>>>
>>>>>>
>>>>>> I can still see this problem on latest merge branch.
>>>>>> I have following gcc version
>>>>>>
>>>>>> gcc version 8.3.1 20191121
>>>>>
>>>>> What libc version? Or just the distro & version?
>>>>
>>>> Sachin observed this on RHEL 8.2 with glibc-2.28.
>>>> I couldn't reproduce it on Ubuntu 20.04 and Fedora 32 and both these distros
>>>> are using glibc-2.31.
>>>
>>> OK odd. Usually it's newer glibc that hits this problem.
>>>
>>> I guess on RHEL 8.2 we're getting the asm-generic version? But that
>>> would be quite wrong if that's what's happening.
>>
>> If I let GCC dump all the headers that are being used for the source file, I always
>> see syscall.h being included on the RHEL 8.2 system. That is the header with the
>> conflicting definition.
>>
>>   $ cd tools/testing/selftests/powerpc/mm
>>   $ gcc -H -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v5.8-rc7-1456-gf59195f7faa4-dirty"' \
>>         -I../include -m64 pkey_exec_prot.c ../../kselftest_harness.h ../../kselftest.h ../harness.c ../utils.c \
>>         -o pkey_exec_prot 2>&1 | grep syscall
>>
>> On Ubuntu 20.04 and Fedora 32, grep doesn't find any matching text.
>> On RHEL 8.2, it shows the following.
>>   ... /usr/include/sys/syscall.h
>>   .... /usr/include/bits/syscall.h
>>   In file included from /usr/include/sys/syscall.h:31,
>>   /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>>   In file included from /usr/include/sys/syscall.h:31,
>>   /usr/include/bits/syscall.h:1575: note: this is the location of the previous definition
>>   In file included from /usr/include/sys/syscall.h:31,
>>   /usr/include/bits/syscall.h:1579: note: this is the location of the previous definition
>>   /usr/include/bits/syscall.h
>>   .. /usr/include/sys/syscall.h
>>   ... /usr/include/bits/syscall.h
>>   /usr/include/bits/syscall.h
>>   .. /usr/include/sys/syscall.h
>>   ... /usr/include/bits/syscall.h
>>   /usr/include/bits/syscall.h
>>
>> So utils.h is also including /usr/include/sys/syscall.h for glibc versions older than 2.30
>> because of commit 743f3544fffb ("selftests/powerpc: Add wrapper for gettid") :)
> 
> Haha, of course. :facepalm_emoji:
> 
>> [...]
>> . ../include/pkeys.h
>> [...]
>> .. ../include/utils.h
>> [...]
>> ... /usr/include/sys/syscall.h
>> .... /usr/include/asm/unistd.h
>> .... /usr/include/bits/syscall.h
>> In file included from pkey_exec_prot.c:18:
>> ../include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>>  #define SYS_pkey_mprotect 386
>>
>> In file included from /usr/include/sys/syscall.h:31,
>>                  from ../include/utils.h:47,
>>                  from ../include/pkeys.h:12,
>>                  from pkey_exec_prot.c:18:
>> /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>>  # define SYS_pkey_mprotect __NR_pkey_mprotect
> 
> Aha, that explains why redefining gives us an error, because we're
> defining it to the literal 386 whereas the system header is defining it
> to the __NR value.
> 
> Is there a reason to use the SYS_ name?
> 

That's just something I borrowed from the pkey tests under selftests/vm
... but without the #ifndefs

> Typically we just use the __NR value directly, and that would avoid any
> problems with redefinition I think, as long as we're using the same
> value as the system header (which we always should be).
> 

Agreed. David Laight suggested this too. Will send v4 with these changes.

- Sandipan

> eg:
> 
> diff --git a/tools/testing/selftests/powerpc/include/pkeys.h b/tools/testing/selftests/powerpc/include/pkeys.h
> index 6ba95039a034..3312cb1b058d 100644
> --- a/tools/testing/selftests/powerpc/include/pkeys.h
> +++ b/tools/testing/selftests/powerpc/include/pkeys.h
> @@ -31,9 +31,9 @@
>  
>  #define SI_PKEY_OFFSET	0x20
>  
> -#define SYS_pkey_mprotect	386
> -#define SYS_pkey_alloc		384
> -#define SYS_pkey_free		385
> +#define __NR_pkey_mprotect	386
> +#define __NR_pkey_alloc		384
> +#define __NR_pkey_free		385
>  
>  #define PKEY_BITS_PER_PKEY	2
>  #define NR_PKEYS		32
> @@ -62,17 +62,17 @@ void pkey_set_rights(int pkey, unsigned long rights)
>  
>  int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
>  {
> -	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
> +	return syscall(__NR_pkey_mprotect, addr, len, prot, pkey);
>  }
>  
>  int sys_pkey_alloc(unsigned long flags, unsigned long rights)
>  {
> -	return syscall(SYS_pkey_alloc, flags, rights);
> +	return syscall(__NR_pkey_alloc, flags, rights);
>  }
>  
>  int sys_pkey_free(int pkey)
>  {
> -	return syscall(SYS_pkey_free, pkey);
> +	return syscall(__NR_pkey_free, pkey);
>  }
>  
>  int pkeys_unsupported(void)
> 

^ permalink raw reply

* [PATCH v4] selftests/powerpc: Fix pkey syscall redefinitions
From: Sandipan Das @ 2020-08-04 17:31 UTC (permalink / raw)
  To: mpe; +Cc: sachinp, aneesh.kumar, David Laight, linuxppc-dev

On distros using older glibc versions, the pkey tests
encounter build failures due to redefinition of the
pkey syscall numbers.

For compatibility, commit 743f3544fffb added a wrapper
for the gettid() syscall and included syscall.h if the
version of glibc used is older than 2.30. This leads
to different definitions of SYS_pkey_* as the ones in
the pkey test header set numeric constants where as the
ones from syscall.h reuse __NR_pkey_*. The compiler
complains about redefinitions since they are different.

This replaces SYS_pkey_* definitions with __NR_pkey_*
such that the definitions in both syscall.h and pkeys.h
are alike. This way, if syscall.h has to be included
for compatibility reasons, builds will still succeed.

Fixes: 743f3544fffb ("selftests/powerpc: Add wrapper for gettid")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Suggested-by: David Laight <david.laight@aculab.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
Previous versions can be found at:
v3: https://lore.kernel.org/linuxppc-dev/1bb744b0c7ed3985a5b73289f4de629ac0aeaf7c.1596453627.git.sandipan@linux.ibm.com/
v2: https://lore.kernel.org/linuxppc-dev/566dde119ce71f00f9642807ba30ceb7f54c9bfa.1596441105.git.sandipan@linux.ibm.com/
v1: https://lore.kernel.org/linuxppc-dev/20200803074043.466809-1-sandipan@linux.ibm.com/

Changes in v4:
- Replace SYS_pkey_* with __NR_pkey_* based on suggestions
  from David and Michael.
- Update commit message and add fixes tag.

Changes in v3:
- Use ifndef...endif instead of undef as suggested by
  Michael.

Changes in v2:
- Fix incorrect commit message.

---
 tools/testing/selftests/powerpc/include/pkeys.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/pkeys.h b/tools/testing/selftests/powerpc/include/pkeys.h
index 6ba95039a034..3312cb1b058d 100644
--- a/tools/testing/selftests/powerpc/include/pkeys.h
+++ b/tools/testing/selftests/powerpc/include/pkeys.h
@@ -31,9 +31,9 @@
 
 #define SI_PKEY_OFFSET	0x20
 
-#define SYS_pkey_mprotect	386
-#define SYS_pkey_alloc		384
-#define SYS_pkey_free		385
+#define __NR_pkey_mprotect	386
+#define __NR_pkey_alloc		384
+#define __NR_pkey_free		385
 
 #define PKEY_BITS_PER_PKEY	2
 #define NR_PKEYS		32
@@ -62,17 +62,17 @@ void pkey_set_rights(int pkey, unsigned long rights)
 
 int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
 {
-	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
+	return syscall(__NR_pkey_mprotect, addr, len, prot, pkey);
 }
 
 int sys_pkey_alloc(unsigned long flags, unsigned long rights)
 {
-	return syscall(SYS_pkey_alloc, flags, rights);
+	return syscall(__NR_pkey_alloc, flags, rights);
 }
 
 int sys_pkey_free(int pkey)
 {
-	return syscall(SYS_pkey_free, pkey);
+	return syscall(__NR_pkey_free, pkey);
 }
 
 int pkeys_unsupported(void)
-- 
2.25.1


^ permalink raw reply related

* [PATCH] powerpc/oprofile: fix spelling mistake "contex" -> "context"
From: Colin King @ 2020-08-04 17:43 UTC (permalink / raw)
  To: Robert Richter, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, oprofile-list, linuxppc-dev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There is a spelling mistake in a pr_debug message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 arch/powerpc/oprofile/cell/spu_task_sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/oprofile/cell/spu_task_sync.c b/arch/powerpc/oprofile/cell/spu_task_sync.c
index df59d0bb121f..489f993100d5 100644
--- a/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ b/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -572,7 +572,7 @@ void spu_sync_buffer(int spu_num, unsigned int *samples,
 		 * samples are recorded.
 		 * No big deal -- so we just drop a few samples.
 		 */
-		pr_debug("SPU_PROF: No cached SPU contex "
+		pr_debug("SPU_PROF: No cached SPU context "
 			  "for SPU #%d. Dropping samples.\n", spu_num);
 		goto out;
 	}
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH v5 00/12] kunit: create a centralized executor to dispatch all KUnit tests
From: Brendan Higgins @ 2020-08-04 20:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: open list:DOCUMENTATION, catalin.marinas, jcmvbkbc, will,
	Paul Mackerras, open list:KERNEL SELFTEST FRAMEWORK, Frank Rowand,
	Anton Ivanov, linux-arch, Richard Weinberger, rppt, Iurii Zaikin,
	linux-xtensa, Arnd Bergmann, Jeff Dike, linux-um, linuxppc-dev,
	David Gow, Shuah Khan, Linux ARM, KUnit Development, Chris Zankel,
	Michal Simek, Stephen Boyd, Greg KH, Linux Kernel Mailing List,
	Luis Chamberlain, Alan Maguire, Andrew Morton, Logan Gunthorpe
In-Reply-To: <202006261442.5C245709@keescook>

On Fri, Jun 26, 2020 at 2:52 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Jun 26, 2020 at 02:09:05PM -0700, Brendan Higgins wrote:
> > This patchset adds a centralized executor to dispatch tests rather than
> > relying on late_initcall to schedule each test suite separately along
> > with a couple of new features that depend on it.

Sorry it took so long to reply. I got sucked into some other stuff again.

> So, the new section looks fine to me (modulo the INIT_DATA change). The
> plumbing to start the tests, though, I think is redundant. Why not just
> add a sysctl that starts all known tests?

We already have that; however, we use debugfs to start the tests -
same difference. I just find it convenient to not have to build and
then maintain a userland for each architecture. It's also really nice
that KUnit "just works out of the box" - you don't have to download
anything other than the kernel source, and you don't need to do any
steps outside of just run "kuit.py run". That seems like a big
advantage to me.

> That way you don't need the plumbing into init/main.c, and you can have
> a mode where builtin tests can be started on a fully booted system too.
>
> i.e. boot with "sysctl.kernel.kunit=start" or when fully booted with
> "echo start > /proc/sys/kernel/kunit"
>
> And instead of the kunit-specific halt/reboot stuff, how about moving
> /proc/sysrq-trigger into /proc/sys instead? Then you (or anything) could
> do:
>
> sysctl.kernel.kunit=start sysctl.kernel.sysrq-trigger=b

I think it might be harder to make a case for the reboot stuff without
the stuff I am working on outside of this patchset. I think I will
probably drop that patch from this patchset and reintroduce it later.

^ permalink raw reply

* Re: [PATCH v5 01/12] vmlinux.lds.h: add linker section for KUnit test suites
From: Brendan Higgins @ 2020-08-04 20:03 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: open list:DOCUMENTATION, catalin.marinas, jcmvbkbc, will,
	Paul Mackerras, open list:KERNEL SELFTEST FRAMEWORK, Frank Rowand,
	Anton Ivanov, linux-arch, Richard Weinberger, rppt, Iurii Zaikin,
	linux-xtensa, Kees Cook, Arnd Bergmann, Jeff Dike, linux-um,
	linuxppc-dev, David Gow, Shuah Khan, Linux ARM, KUnit Development,
	Chris Zankel, Michal Simek, Stephen Boyd, Greg KH,
	Linux Kernel Mailing List, Logan Gunthorpe, Andrew Morton,
	Alan Maguire
In-Reply-To: <20200708043128.GY4332@42.do-not-panic.com>

On Tue, Jul 7, 2020 at 9:31 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, Jun 26, 2020 at 02:22:11PM -0700, Brendan Higgins wrote:
> > On Fri, Jun 26, 2020 at 2:20 PM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Fri, Jun 26, 2020 at 02:09:06PM -0700, Brendan Higgins wrote:
> > > > Add a linker section where KUnit can put references to its test suites.
> > > > This patch is the first step in transitioning to dispatching all KUnit
> > > > tests from a centralized executor rather than having each as its own
> > > > separate late_initcall.
> > > >
> > > > Co-developed-by: Iurii Zaikin <yzaikin@google.com>
> > > > Signed-off-by: Iurii Zaikin <yzaikin@google.com>
> > > > Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
> > > > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
> > > > ---
> > > >  include/asm-generic/vmlinux.lds.h | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > > index db600ef218d7d..4f9b036fc9616 100644
> > > > --- a/include/asm-generic/vmlinux.lds.h
> > > > +++ b/include/asm-generic/vmlinux.lds.h
> > > > @@ -881,6 +881,13 @@
> > > >               KEEP(*(.con_initcall.init))                             \
> > > >               __con_initcall_end = .;
> > > >
> > > > +/* Alignment must be consistent with (kunit_suite *) in include/kunit/test.h */
> > >
> > > Nit on naming:
> > >
> > > > +#define KUNIT_TEST_SUITES                                            \
> > >
> > > I would call this KUNIT_TABLE to maintain the same names as other things
> > > of this nature.
> > >
> > > > +             . = ALIGN(8);                                           \
> > > > +             __kunit_suites_start = .;                               \
> > > > +             KEEP(*(.kunit_test_suites))                             \
> > > > +             __kunit_suites_end = .;
> > > > +
> > > >  #ifdef CONFIG_BLK_DEV_INITRD
> > > >  #define INIT_RAM_FS                                                  \
> > > >       . = ALIGN(4);                                                   \
> > > > @@ -1056,6 +1063,7 @@
> > > >               INIT_CALLS                                              \
> > > >               CON_INITCALL                                            \
> > > >               INIT_RAM_FS                                             \
> > > > +             KUNIT_TEST_SUITES                                       \
> > > >       }
> > >
> > > Nack: this must be in INIT_DATA, not in INIT_DATA_SECTION. Not all
> > > architectures use the INIT_DATA_SECTION macro (e.g. arm64), but everything
> > > uses INIT_DATA.
> >
> > Oh, maybe that would eliminate the need for the other linkerscript
> > patches? That would be nice.

Sorry for the delayed response. I got pulled into some other things.

> Curious, did changing it as Kees suggest fix it for m68k?

It did! There are still some architectures I cannot test due to a lack
of GCC or QEMU support, but it seems to work on everything else now.

^ permalink raw reply

* Re: [PATCH v5 07/12] kunit: test: create a single centralized executor for all tests
From: Brendan Higgins @ 2020-08-04 20:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: open list:DOCUMENTATION, catalin.marinas, jcmvbkbc, will,
	Paul Mackerras, open list:KERNEL SELFTEST FRAMEWORK, Frank Rowand,
	Anton Ivanov, linux-arch, Richard Weinberger, rppt, Iurii Zaikin,
	linux-xtensa, Arnd Bergmann, Jeff Dike, linux-um, linuxppc-dev,
	David Gow, Shuah Khan, Linux ARM, KUnit Development, Chris Zankel,
	Michal Simek, Stephen Boyd, Greg KH, Linux Kernel Mailing List,
	Luis Chamberlain, Alan Maguire, Andrew Morton, Logan Gunthorpe
In-Reply-To: <202006261423.0BC9D830@keescook>

On Fri, Jun 26, 2020 at 2:29 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Jun 26, 2020 at 02:09:12PM -0700, Brendan Higgins wrote:
> > From: Alan Maguire <alan.maguire@oracle.com>
> >
> > Add a centralized executor to dispatch tests rather than relying on
> > late_initcall to schedule each test suite separately. Centralized
> > execution is for built-in tests only; modules will execute tests when
> > loaded.
> >
> > Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > Co-developed-by: Iurii Zaikin <yzaikin@google.com>
> > Signed-off-by: Iurii Zaikin <yzaikin@google.com>
> > Co-developed-by: Brendan Higgins <brendanhiggins@google.com>
> > Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
> > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
> > ---
> >  include/kunit/test.h | 67 +++++++++++++++++++++++++++++---------------
> >  lib/kunit/Makefile   |  3 +-
> >  lib/kunit/executor.c | 28 ++++++++++++++++++
> >  lib/kunit/test.c     |  2 +-
> >  4 files changed, 76 insertions(+), 24 deletions(-)
> >  create mode 100644 lib/kunit/executor.c
> >
> > diff --git a/include/kunit/test.h b/include/kunit/test.h
> > index 47e61e1d53370..f3e86c3953a2b 100644
> > --- a/include/kunit/test.h
> > +++ b/include/kunit/test.h
> > @@ -224,7 +224,7 @@ size_t kunit_suite_num_test_cases(struct kunit_suite *suite);
> >  unsigned int kunit_test_case_num(struct kunit_suite *suite,
> >                                struct kunit_case *test_case);
> >
> > -int __kunit_test_suites_init(struct kunit_suite **suites);
> > +int __kunit_test_suites_init(struct kunit_suite * const * const suites);
> >
> >  void __kunit_test_suites_exit(struct kunit_suite **suites);
> >
> > @@ -237,34 +237,57 @@ void __kunit_test_suites_exit(struct kunit_suite **suites);
> >   * Registers @suites_list with the test framework. See &struct kunit_suite for
> >   * more information.
> >   *
> > - * When builtin, KUnit tests are all run as late_initcalls; this means
> > - * that they cannot test anything where tests must run at a different init
> > - * phase. One significant restriction resulting from this is that KUnit
> > - * cannot reliably test anything that is initialize in the late_init phase;
> > - * another is that KUnit is useless to test things that need to be run in
> > - * an earlier init phase.
> > - *
> > - * An alternative is to build the tests as a module.  Because modules
> > - * do not support multiple late_initcall()s, we need to initialize an
> > - * array of suites for a module.
> > - *
> > - * TODO(brendanhiggins@google.com): Don't run all KUnit tests as
> > - * late_initcalls.  I have some future work planned to dispatch all KUnit
> > - * tests from the same place, and at the very least to do so after
> > - * everything else is definitely initialized.
> > + * If a test suite is built-in, module_init() gets translated into
> > + * an initcall which we don't want as the idea is that for builtins
> > + * the executor will manage execution.  So ensure we do not define
> > + * module_{init|exit} functions for the builtin case when registering
> > + * suites via kunit_test_suites() below.
> >   */
> > -#define kunit_test_suites(suites_list...)                            \
> > -     static struct kunit_suite *suites[] = {suites_list, NULL};      \
> > -     static int kunit_test_suites_init(void)                         \
> > +#ifdef MODULE
> > +#define kunit_test_suites_for_module(__suites)                               \
> > +     static int __init kunit_test_suites_init(void)                  \
> >       {                                                               \
> > -             return __kunit_test_suites_init(suites);                \
> > +             return __kunit_test_suites_init(__suites);              \
> >       }                                                               \
> > -     late_initcall(kunit_test_suites_init);                          \
> > +     module_init(kunit_test_suites_init);                            \
> > +                                                                     \
> >       static void __exit kunit_test_suites_exit(void)                 \
> >       {                                                               \
> > -             return __kunit_test_suites_exit(suites);                \
> > +             return __kunit_test_suites_exit(__suites);              \
> >       }                                                               \
> >       module_exit(kunit_test_suites_exit)
> > +#else
> > +#define kunit_test_suites_for_module(__suites)
> > +#endif /* MODULE */
> > +
> > +#define __kunit_test_suites(unique_array, unique_suites, ...)                       \
> > +     static struct kunit_suite *unique_array[] = { __VA_ARGS__, NULL };     \
> > +     kunit_test_suites_for_module(unique_array);                            \
> > +     static struct kunit_suite **unique_suites                              \
> > +     __used __section(.kunit_test_suites) = unique_array
> > +
> > +/**
> > + * kunit_test_suites() - used to register one or more &struct kunit_suite
> > + *                    with KUnit.
> > + *
> > + * @suites: a statically allocated list of &struct kunit_suite.
> > + *
> > + * Registers @suites with the test framework. See &struct kunit_suite for
> > + * more information.
> > + *
> > + * When builtin,  KUnit tests are all run via executor; this is done
> > + * by placing the array of struct kunit_suite * in the .kunit_test_suites
> > + * ELF section.
> > + *
> > + * An alternative is to build the tests as a module.  Because modules do not
> > + * support multiple initcall()s, we need to initialize an array of suites for a
> > + * module.
> > + *
> > + */
> > +#define kunit_test_suites(...)                                               \
> > +     __kunit_test_suites(__UNIQUE_ID(array),                         \
> > +                         __UNIQUE_ID(suites),                        \
> > +                         __VA_ARGS__)
> >
> >  #define kunit_test_suite(suite)      kunit_test_suites(&suite)
> >
> > diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile
> > index 724b94311ca36..c49f4ffb6273a 100644
> > --- a/lib/kunit/Makefile
> > +++ b/lib/kunit/Makefile
> > @@ -3,7 +3,8 @@ obj-$(CONFIG_KUNIT) +=                        kunit.o
> >  kunit-objs +=                                test.o \
> >                                       string-stream.o \
> >                                       assert.o \
> > -                                     try-catch.o
> > +                                     try-catch.o \
> > +                                     executor.o
> >
> >  ifeq ($(CONFIG_KUNIT_DEBUGFS),y)
> >  kunit-objs +=                                debugfs.o
> > diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
> > new file mode 100644
> > index 0000000000000..7015e7328dce7
> > --- /dev/null
> > +++ b/lib/kunit/executor.c
> > @@ -0,0 +1,28 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <kunit/test.h>
> > +
> > +/*
> > + * These symbols point to the .kunit_test_suites section and are defined in
> > + * include/asm-generic/vmlinux.lds.h, and consequently must be extern.
> > + */
> > +extern struct kunit_suite * const * const __kunit_suites_start[];
> > +extern struct kunit_suite * const * const __kunit_suites_end[];
>
> I would expect these to be in include/asm-generic/sections.h but I guess
> it's not required.

I don't have strong opinions either way, but I think this is less
clutter since KUnit is the only one that uses it.

> Reviewed-by: Kees Cook <keescook@chromium.org>

Thanks!

^ permalink raw reply

* Re: [PATCH v5 09/12] kunit: test: add test plan to KUnit TAP format
From: Brendan Higgins @ 2020-08-04 20:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: open list:DOCUMENTATION, catalin.marinas, jcmvbkbc, will,
	Paul Mackerras, open list:KERNEL SELFTEST FRAMEWORK, Frank Rowand,
	Anton Ivanov, linux-arch, Richard Weinberger, rppt, Iurii Zaikin,
	linux-xtensa, Arnd Bergmann, Jeff Dike, linux-um, linuxppc-dev,
	David Gow, Shuah Khan, Linux ARM, KUnit Development, Chris Zankel,
	Michal Simek, Stephen Boyd, Greg KH, Linux Kernel Mailing List,
	Luis Chamberlain, Alan Maguire, Andrew Morton, Logan Gunthorpe
In-Reply-To: <202006261434.119AE33DBB@keescook>

On Fri, Jun 26, 2020 at 2:35 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Jun 26, 2020 at 02:09:14PM -0700, Brendan Higgins wrote:
> > TAP 14 allows an optional test plan to be emitted before the start of
> > the start of testing[1]; this is valuable because it makes it possible
> > for a test harness to detect whether the number of tests run matches the
> > number of tests expected to be run, ensuring that no tests silently
> > failed.
> >
> > Link[1]: https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md#the-plan
> > Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
> > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
>
> Look good, except...
>
> > diff --git a/tools/testing/kunit/test_data/test_is_test_passed-all_passed.log b/tools/testing/kunit/test_data/test_is_test_passed-all_passed.log
> > index 62ebc0288355c4b122ccc18ae2505f971efa57bc..bc0dc8fe35b760b1feb74ec419818dbfae1adb5c 100644
> > GIT binary patch
> > delta 28
> > jcmbQmGoME|#4$jjEVZaOGe1wk(1goSPtRy09}gP<dC~`u
> >
> > delta 23
> > ecmbQwGmD2W#4$jjEVZaOGe1wk&}5@94;uhhkp{*9
> >
> > diff --git a/tools/testing/kunit/test_data/test_is_test_passed-crash.log b/tools/testing/kunit/test_data/test_is_test_passed-crash.log
> > index 0b249870c8be417a5865bd40a24c8597bb7f5ab1..4d97f6708c4a5ad5bb2ac879e12afca6e816d83d 100644
> > GIT binary patch
> > delta 15
> > WcmX>hepY;fFN>j`p3z318g2k9Uj*m?
> >
> > delta 10
> > RcmX>renNbL@5Z2NZU7lr1S$Xk
> >
> > diff --git a/tools/testing/kunit/test_data/test_is_test_passed-failure.log b/tools/testing/kunit/test_data/test_is_test_passed-failure.log
> > index 9e89d32d5667a59d137f8adacf3a88fdb7f88baf..7a416497e3bec044eefc1535f7d84ee85703ba97 100644
> > GIT binary patch
> > delta 28
> > jcmZ3&yOLKp#4$jjEVZaOGe1wk(1goSPtRy0-!wJ=eKrU$
> >
> > delta 23
> > ecmZ3<yM&i7#4$jjEVZaOGe1wk&}5_VG&TTPhX-Z=
>
> What is happening here?? Those logs appear as text to me. Why did git
> freak out?

That's because this is all test data; it's all plaintext, but out of
necessity some of the test data is kind of munged up and causes
checkpatch to complain, so Shuah asked us to mark it as binary since
it isn't actually code and so checkpatch will stop flagging it.

^ permalink raw reply

* Re: [PATCH v5 10/12] kunit: Add 'kunit_shutdown' option
From: Brendan Higgins @ 2020-08-04 20:18 UTC (permalink / raw)
  To: Kees Cook
  Cc: open list:DOCUMENTATION, catalin.marinas, jcmvbkbc, will,
	Paul Mackerras, open list:KERNEL SELFTEST FRAMEWORK, Frank Rowand,
	Anton Ivanov, linux-arch, Richard Weinberger, rppt, Iurii Zaikin,
	linux-xtensa, Arnd Bergmann, Jeff Dike, linux-um, linuxppc-dev,
	David Gow, Shuah Khan, Linux ARM, KUnit Development, Chris Zankel,
	Michal Simek, Stephen Boyd, Greg KH, Linux Kernel Mailing List,
	Luis Chamberlain, Alan Maguire, Andrew Morton, Logan Gunthorpe
In-Reply-To: <202006261436.DEF4906A5@keescook>

On Fri, Jun 26, 2020 at 2:40 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Jun 26, 2020 at 02:09:15PM -0700, Brendan Higgins wrote:
> > From: David Gow <davidgow@google.com>
> >
> > Add a new kernel command-line option, 'kunit_shutdown', which allows the
> > user to specify that the kernel poweroff, halt, or reboot after
> > completing all KUnit tests; this is very handy for running KUnit tests
> > on UML or a VM so that the UML/VM process exits cleanly immediately
> > after running all tests without needing a special initramfs.
> >
> > Signed-off-by: David Gow <davidgow@google.com>
> > Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
> > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
> > ---
> >  lib/kunit/executor.c                | 20 ++++++++++++++++++++
> >  tools/testing/kunit/kunit_kernel.py |  2 +-
> >  tools/testing/kunit/kunit_parser.py |  2 +-
> >  3 files changed, 22 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
> > index a95742a4ece73..38061d456afb2 100644
> > --- a/lib/kunit/executor.c
> > +++ b/lib/kunit/executor.c
> > @@ -1,5 +1,6 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >
> > +#include <linux/reboot.h>
> >  #include <kunit/test.h>
> >
> >  /*
> > @@ -11,6 +12,23 @@ extern struct kunit_suite * const * const __kunit_suites_end[];
> >
> >  #if IS_BUILTIN(CONFIG_KUNIT)
> >
> > +static char *kunit_shutdown;
> > +core_param(kunit_shutdown, kunit_shutdown, charp, 0644);
> > +
> > +static void kunit_handle_shutdown(void)
> > +{
> > +     if (!kunit_shutdown)
> > +             return;
> > +
> > +     if (!strcmp(kunit_shutdown, "poweroff"))
> > +             kernel_power_off();
> > +     else if (!strcmp(kunit_shutdown, "halt"))
> > +             kernel_halt();
> > +     else if (!strcmp(kunit_shutdown, "reboot"))
> > +             kernel_restart(NULL);
> > +
> > +}
>
> If you have patches that do something just before the initrd, and then
> you add more patches to shut down immediately after an initrd, people
> may ask you to just use an initrd instead of filling the kernel with
> these changes...
>
> I mean, I get it, but it's not hard to make an initrd that poke a sysctl
> to start the tests...
>
> In fact, you don't even need a initrd to poke sysctls these days.

True, but it is nice to not have to maintain an initramfs for each
architecture that you want to test. Still, I see your point. If we can
find a convenient way to distribute the needed dependencies for
running KUnit on each non-UML architecture then I think I can abandon
this change.

So how about this: I will drop this patch from this patchset and move
it up to the follow up patchset that adds multiarchitecture support to
kunit_tool. There we can address the problem of how to best track the
necessary dependencies including possibly intitramfss.

^ permalink raw reply

* Re: [PATCH v4 1/7] powerpc/pseries/iommu: Create defines for operations in ibm, ddw-applicable
From: David Dai @ 2020-08-04 21:27 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, Joel Stanley,
	Christophe Leroy, Thiago Jung Bauermann, Ram Pai, Brian King
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200716071658.467820-2-leobras.c@gmail.com>

On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> Create defines to help handling ibm,ddw-applicable values, avoiding
> confusion about the index of given operations.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 43 ++++++++++++++++------
> ----
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..ac0d6376bdad 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -39,6 +39,14 @@
>  
>  #include "pseries.h"
>  
> +enum {
> +	DDW_QUERY_PE_DMA_WIN  = 0,
> +	DDW_CREATE_PE_DMA_WIN = 1,
> +	DDW_REMOVE_PE_DMA_WIN = 2,
> +
> +	DDW_APPLICABLE_SIZE
> +};
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>  	struct iommu_table_group *table_group;
> @@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>  {
>  	struct dynamic_dma_window_prop *dwp;
>  	struct property *win64;
> -	u32 ddw_avail[3];
> +	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>  	u64 liobn;
>  	int ret = 0;
>  
>  	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> -					 &ddw_avail[0], 3);
> +					 &ddw_avail[0],
> DDW_APPLICABLE_SIZE);
>  
>  	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
>  	if (!win64)
> @@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>  		pr_debug("%pOF successfully cleared tces in window.\n",
>  			 np);
>  
> -	ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> +	ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL,
> liobn);
>  	if (ret)
>  		pr_warn("%pOF: failed to remove direct window: rtas
> returned "
>  			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
> -			np, ret, ddw_avail[2], liobn);
> +			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
>  	else
>  		pr_debug("%pOF: successfully removed direct window:
> rtas returned "
>  			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
> -			np, ret, ddw_avail[2], liobn);
> +			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
>  
>  delprop:
>  	if (remove_prop)
> @@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const
> u32 *ddw_avail,
>  	buid = pdn->phb->buid;
>  	cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>  
> -	ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> -		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
> +	ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32
> *)query,
> +			cfg_addr, BUID_HI(buid), BUID_LO(buid));
>  	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> -		" returned %d\n", ddw_avail[0], cfg_addr,
> BUID_HI(buid),
> -		BUID_LO(buid), ret);
> +		" returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN],
> cfg_addr,
> +		 BUID_HI(buid), BUID_LO(buid), ret);
>  	return ret;
>  }
>  
> @@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev,
> const u32 *ddw_avail,
>  
>  	do {
>  		/* extra outputs are LIOBN and dma-addr (hi, lo) */
> -		ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
> -				cfg_addr, BUID_HI(buid), BUID_LO(buid),
> -				page_shift, window_shift);
> +		ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
> +				(u32 *)create, cfg_addr, BUID_HI(buid),
> +				BUID_LO(buid), page_shift,
> window_shift);
>  	} while (rtas_busy_delay(ret));
>  	dev_info(&dev->dev,
>  		"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned
> %d "
> -		"(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
> -		 cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
> -		 window_shift, ret, create->liobn, create->addr_hi,
> create->addr_lo);
> +		"(liobn = 0x%x starting addr = %x %x)\n",
> +		 ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr,
> BUID_HI(buid),
> +		 BUID_LO(buid), page_shift, window_shift, ret, create-
> >liobn,
> +		 create->addr_hi, create->addr_lo);
>  
>  	return ret;
>  }
> @@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  	int page_shift;
>  	u64 dma_addr, max_addr;
>  	struct device_node *dn;
> -	u32 ddw_avail[3];
> +	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>  	struct direct_window *window;
>  	struct property *win64;
>  	struct dynamic_dma_window_prop *ddwprop;
> @@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  	 * the property is actually in the parent, not the PE
>  	 */
>  	ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable",
> -					 &ddw_avail[0], 3);
> +					 &ddw_avail[0],
> DDW_APPLICABLE_SIZE);
>  	if (ret)
>  		goto out_failed;
>  
Tested-by: David Dai <zdai@linux.vnet.ibm.com>


^ permalink raw reply

* Re: [PATCH v4 2/7] powerpc/pseries/iommu: Update call to ibm, query-pe-dma-windows
From: David Dai @ 2020-08-04 21:31 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, Joel Stanley,
	Christophe Leroy, Thiago Jung Bauermann, Ram Pai, Brian King
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200716071658.467820-3-leobras.c@gmail.com>

On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> > From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the
> > number of
> 
> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
> 
> This change of output size is meant to expand the address size of
> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
> shifting page_size and migration_capable.
> 
> This ends up requiring the update of
> ddw_query_response->largest_available_block from u32 to u64, and
> manually
> assigning the values from the buffer into this struct, according to
> output size.
> 
> Also, a routine was created for helping reading the ddw extensions as
> suggested by LoPAR: First reading the size of the extension array
> from
> index 0, checking if the property exists, and then returning it's
> value.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 91 +++++++++++++++++++++++-
> --
>  1 file changed, 81 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index ac0d6376bdad..1a933c4e8bba 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -47,6 +47,12 @@ enum {
>  	DDW_APPLICABLE_SIZE
>  };
>  
> +enum {
> +	DDW_EXT_SIZE = 0,
> +	DDW_EXT_RESET_DMA_WIN = 1,
> +	DDW_EXT_QUERY_OUT_SIZE = 2
> +};
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>  	struct iommu_table_group *table_group;
> @@ -342,7 +348,7 @@ struct direct_window {
>  /* Dynamic DMA Window support */
>  struct ddw_query_response {
>  	u32 windows_available;
> -	u32 largest_available_block;
> +	u64 largest_available_block;
>  	u32 page_size;
>  	u32 migration_capable;
>  };
> @@ -877,14 +883,62 @@ static int find_existing_ddw_windows(void)
>  }
>  machine_arch_initcall(pseries, find_existing_ddw_windows);
>  
> +/**
> + * ddw_read_ext - Get the value of an DDW extension
> + * @np:		device node from which the extension value is
> to be read.
> + * @extnum:	index number of the extension.
> + * @value:	pointer to return value, modified when extension is
> available.
> + *
> + * Checks if "ibm,ddw-extensions" exists for this node, and get the
> value
> + * on index 'extnum'.
> + * It can be used only to check if a property exists, passing value
> == NULL.
> + *
> + * Returns:
> + *	0 if extension successfully read
> + *	-EINVAL if the "ibm,ddw-extensions" does not exist,
> + *	-ENODATA if "ibm,ddw-extensions" does not have a value, and
> + *	-EOVERFLOW if "ibm,ddw-extensions" does not contain this
> extension.
> + */
> +static inline int ddw_read_ext(const struct device_node *np, int
> extnum,
> +			       u32 *value)
> +{
> +	static const char propname[] = "ibm,ddw-extensions";
> +	u32 count;
> +	int ret;
> +
> +	ret = of_property_read_u32_index(np, propname, DDW_EXT_SIZE,
> &count);
> +	if (ret)
> +		return ret;
> +
> +	if (count < extnum)
> +		return -EOVERFLOW;
> +
> +	if (!value)
> +		value = &count;
> +
> +	return of_property_read_u32_index(np, propname, extnum, value);
> +}
> +
>  static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> -			struct ddw_query_response *query)
> +		     struct ddw_query_response *query,
> +		     struct device_node *parent)
>  {
>  	struct device_node *dn;
>  	struct pci_dn *pdn;
> -	u32 cfg_addr;
> +	u32 cfg_addr, ext_query, query_out[5];
>  	u64 buid;
> -	int ret;
> +	int ret, out_sz;
> +
> +	/*
> +	 * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule
> how many
> +	 * output parameters ibm,query-pe-dma-windows will have,
> ranging from
> +	 * 5 to 6.
> +	 */
> +	ret = ddw_read_ext(parent, DDW_EXT_QUERY_OUT_SIZE, &ext_query);
> +	if (!ret && ext_query == 1)
> +		out_sz = 6;
> +	else
> +		out_sz = 5;
>  
>  	/*
>  	 * Get the config address and phb buid of the PE window.
> @@ -897,11 +951,28 @@ static int query_ddw(struct pci_dev *dev, const
> u32 *ddw_avail,
>  	buid = pdn->phb->buid;
>  	cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>  
> -	ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32
> *)query,
> +	ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, out_sz,
> query_out,
>  			cfg_addr, BUID_HI(buid), BUID_LO(buid));
> -	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> -		" returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN],
> cfg_addr,
> -		 BUID_HI(buid), BUID_LO(buid), ret);
> +	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x
> returned %d\n",
> +		 ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
> BUID_HI(buid),
> +		 BUID_LO(buid), ret);
> +
> +	switch (out_sz) {
> +	case 5:
> +		query->windows_available = query_out[0];
> +		query->largest_available_block = query_out[1];
> +		query->page_size = query_out[2];
> +		query->migration_capable = query_out[3];
> +		break;
> +	case 6:
> +		query->windows_available = query_out[0];
> +		query->largest_available_block = ((u64)query_out[1] <<
> 32) |
> +						 query_out[2];
> +		query->page_size = query_out[3];
> +		query->migration_capable = query_out[4];
> +		break;
> +	}
> +
>  	return ret;
>  }
>  
> @@ -1049,7 +1120,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  	 * of page sizes: supported and supported for migrate-dma.
>  	 */
>  	dn = pci_device_to_OF_node(dev);
> -	ret = query_ddw(dev, ddw_avail, &query);
> +	ret = query_ddw(dev, ddw_avail, &query, pdn);
>  	if (ret != 0)
>  		goto out_failed;
>  
> @@ -1077,7 +1148,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  	/* check largest block * page size > max memory hotplug addr */
>  	max_addr = ddw_memory_hotplug_max();
>  	if (query.largest_available_block < (max_addr >> page_shift)) {
> -		dev_dbg(&dev->dev, "can't map partition max 0x%llx with
> %u "
> +		dev_dbg(&dev->dev, "can't map partition max 0x%llx with
> %llu "
>  			  "%llu-sized pages\n",
> max_addr,  query.largest_available_block,
>  			  1ULL << page_shift);
>  		goto out_failed;
Tested-by: David Dai <zdai@linux.vnet.ibm.com>


^ permalink raw reply

* Re: [PATCH v4 3/7] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window
From: David Dai @ 2020-08-04 21:32 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, Joel Stanley,
	Christophe Leroy, Thiago Jung Bauermann, Ram Pai, Brian King
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200716071658.467820-4-leobras.c@gmail.com>

On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> Move the window-removing part of remove_ddw into a new function
> (remove_dma_window), so it can be used to remove other DMA windows.
> 
> It's useful for removing DMA windows that don't create
> DIRECT64_PROPNAME
> property, like the default DMA window from the device, which uses
> "ibm,dma-window".
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 45 +++++++++++++++---------
> --
>  1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index 1a933c4e8bba..4e33147825cc 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -781,25 +781,14 @@ static int __init disable_ddw_setup(char *str)
>  
>  early_param("disable_ddw", disable_ddw_setup);
>  
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static void remove_dma_window(struct device_node *np, u32
> *ddw_avail,
> +			      struct property *win)
>  {
>  	struct dynamic_dma_window_prop *dwp;
> -	struct property *win64;
> -	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>  	u64 liobn;
> -	int ret = 0;
> -
> -	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> -					 &ddw_avail[0],
> DDW_APPLICABLE_SIZE);
> -
> -	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> -	if (!win64)
> -		return;
> -
> -	if (ret || win64->length < sizeof(*dwp))
> -		goto delprop;
> +	int ret;
>  
> -	dwp = win64->value;
> +	dwp = win->value;
>  	liobn = (u64)be32_to_cpu(dwp->liobn);
>  
>  	/* clear the whole window, note the arg is in kernel pages */
> @@ -821,10 +810,30 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>  		pr_debug("%pOF: successfully removed direct window:
> rtas returned "
>  			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
>  			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
> +}
> +
> +static void remove_ddw(struct device_node *np, bool remove_prop)
> +{
> +	struct property *win;
> +	u32 ddw_avail[DDW_APPLICABLE_SIZE];
> +	int ret = 0;
> +
> +	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> +					 &ddw_avail[0],
> DDW_APPLICABLE_SIZE);
> +	if (ret)
> +		return;
> +
> +	win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> +	if (!win)
> +		return;
> +
> +	if (win->length >= sizeof(struct dynamic_dma_window_prop))
> +		remove_dma_window(np, ddw_avail, win);
> +
> +	if (!remove_prop)
> +		return;
>  
> -delprop:
> -	if (remove_prop)
> -		ret = of_remove_property(np, win64);
> +	ret = of_remove_property(np, win);
>  	if (ret)
>  		pr_warn("%pOF: failed to remove direct window property:
> %d\n",
>  			np, ret);
Tested-by: David Dai <zdai@linux.vnet.ibm.com>


^ permalink raw reply

* Re: [PATCH v4 4/7] powerpc/pseries/iommu: Remove default DMA window before creating DDW
From: David Dai @ 2020-08-04 21:33 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, Joel Stanley,
	Christophe Leroy, Thiago Jung Bauermann, Ram Pai, Brian King
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200716071658.467820-5-leobras.c@gmail.com>

On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove
> the
> default DMA window for the device, before attempting to configure a
> DDW,
> in order to make the maximum resources available for the next DDW to
> be
> created.
> 
> This is a requirement for using DDW on devices in which hypervisor
> allows only one DMA window.
> 
> If setting up a new DDW fails anywhere after the removal of this
> default DMA window, it's needed to restore the default DMA window.
> For this, an implementation of ibm,reset-pe-dma-windows rtas call is
> needed:
> 
> Platforms supporting the DDW option starting with LoPAR level 2.7
> implement
> ibm,ddw-extensions. The first extension available (index 2) carries
> the
> token for ibm,reset-pe-dma-windows rtas call, which is used to
> restore
> the default DMA window for a device, if it has been deleted.
> 
> It does so by resetting the TCE table allocation for the PE to it's
> boot time value, available in "ibm,dma-window" device tree node.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 73 +++++++++++++++++++++++-
> --
>  1 file changed, 66 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index 4e33147825cc..fc8d0555e2e9 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1066,6 +1066,38 @@ static phys_addr_t
> ddw_memory_hotplug_max(void)
>  	return max_addr;
>  }
>  
> +/*
> + * Platforms supporting the DDW option starting with LoPAR level 2.7
> implement
> + * ibm,ddw-extensions, which carries the rtas token for
> + * ibm,reset-pe-dma-windows.
> + * That rtas-call can be used to restore the default DMA window for
> the device.
> + */
> +static void reset_dma_window(struct pci_dev *dev, struct device_node
> *par_dn)
> +{
> +	int ret;
> +	u32 cfg_addr, reset_dma_win;
> +	u64 buid;
> +	struct device_node *dn;
> +	struct pci_dn *pdn;
> +
> +	ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN,
> &reset_dma_win);
> +	if (ret)
> +		return;
> +
> +	dn = pci_device_to_OF_node(dev);
> +	pdn = PCI_DN(dn);
> +	buid = pdn->phb->buid;
> +	cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> +
> +	ret = rtas_call(reset_dma_win, 3, 1, NULL, cfg_addr,
> BUID_HI(buid),
> +			BUID_LO(buid));
> +	if (ret)
> +		dev_info(&dev->dev,
> +			 "ibm,reset-pe-dma-windows(%x) %x %x %x
> returned %d ",
> +			 reset_dma_win, cfg_addr, BUID_HI(buid),
> BUID_LO(buid),
> +			 ret);
> +}
> +
>  /*
>   * If the PE supports dynamic dma windows, and there is space for a
> table
>   * that can map all pages in a linear offset, then setup such a
> table,
> @@ -1090,6 +1122,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  	struct property *win64;
>  	struct dynamic_dma_window_prop *ddwprop;
>  	struct failed_ddw_pdn *fpdn;
> +	bool default_win_removed = false;
>  
>  	mutex_lock(&direct_window_init_mutex);
>  
> @@ -1133,14 +1166,38 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  	if (ret != 0)
>  		goto out_failed;
>  
> +	/*
> +	 * If there is no window available, remove the default DMA
> window,
> +	 * if it's present. This will make all the resources available
> to the
> +	 * new DDW window.
> +	 * If anything fails after this, we need to restore it, so also
> check
> +	 * for extensions presence.
> +	 */
>  	if (query.windows_available == 0) {
> -		/*
> -		 * no additional windows are available for this device.
> -		 * We might be able to reallocate the existing window,
> -		 * trading in for a larger page size.
> -		 */
> -		dev_dbg(&dev->dev, "no free dynamic windows");
> -		goto out_failed;
> +		struct property *default_win;
> +		int reset_win_ext;
> +
> +		default_win = of_find_property(pdn, "ibm,dma-window",
> NULL);
> +		if (!default_win)
> +			goto out_failed;
> +
> +		reset_win_ext = ddw_read_ext(pdn,
> DDW_EXT_RESET_DMA_WIN, NULL);
> +		if (reset_win_ext)
> +			goto out_failed;
> +
> +		remove_dma_window(pdn, ddw_avail, default_win);
> +		default_win_removed = true;
> +
> +		/* Query again, to check if the window is available */
> +		ret = query_ddw(dev, ddw_avail, &query, pdn);
> +		if (ret != 0)
> +			goto out_failed;
> +
> +		if (query.windows_available == 0) {
> +			/* no windows are available for this device. */
> +			dev_dbg(&dev->dev, "no free dynamic windows");
> +			goto out_failed;
> +		}
>  	}
>  	if (query.page_size & 4) {
>  		page_shift = 24; /* 16MB */
> @@ -1231,6 +1288,8 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  	kfree(win64);
>  
>  out_failed:
> +	if (default_win_removed)
> +		reset_dma_window(dev, pdn);
>  
>  	fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
>  	if (!fpdn)
Tested-by: David Dai <zdai@linux.vnet.ibm.com>


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox