LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 01/17] KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
From: Daniel Axtens @ 2020-08-04 13:53 UTC (permalink / raw)
  To: Mike Rapoport, Andrew Morton
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He, x86,
	Russell King, Mike Rapoport, clang-built-linux, Ingo Molnar,
	linux-arm-kernel, Catalin Marinas, uclinux-h8-devel, linux-xtensa,
	openrisc, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Stafford Horne, Hari Bathini, Michal Simek, Yoshinori Sato,
	linux-mm, linux-kernel, iommu, Palmer Dabbelt, linuxppc-dev,
	Mike Rapoport
In-Reply-To: <20200802163601.8189-2-rppt@kernel.org>

Hi Mike,

>
> The memory size calculation in kvm_cma_reserve() traverses memblock.memory
> rather than simply call memblock_phys_mem_size(). The comment in that
> function suggests that at some point there should have been call to
> memblock_analyze() before memblock_phys_mem_size() could be used.
> As of now, there is no memblock_analyze() at all and
> memblock_phys_mem_size() can be used as soon as cold-plug memory is
> registerd with memblock.
>
> Replace loop over memblock.memory with a call to memblock_phys_mem_size().
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 7cd3cf3d366b..56ab0d28de2a 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -95,22 +95,15 @@ EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
>  void __init kvm_cma_reserve(void)
>  {
>  	unsigned long align_size;
> -	struct memblock_region *reg;
> -	phys_addr_t selected_size = 0;
> +	phys_addr_t selected_size;
>  
>  	/*
>  	 * We need CMA reservation only when we are in HV mode
>  	 */
>  	if (!cpu_has_feature(CPU_FTR_HVMODE))
>  		return;
> -	/*
> -	 * We cannot use memblock_phys_mem_size() here, because
> -	 * memblock_analyze() has not been called yet.
> -	 */
> -	for_each_memblock(memory, reg)
> -		selected_size += memblock_region_memory_end_pfn(reg) -
> -				 memblock_region_memory_base_pfn(reg);
>  
> +	selected_size = PHYS_PFN(memblock_phys_mem_size());
>  	selected_size = (selected_size * kvm_cma_resv_ratio / 100) << PAGE_SHIFT;

I think this is correct, but PHYS_PFN does x >> PAGE_SHIFT and then the
next line does x << PAGE_SHIFT, so I think we could combine those two
lines as:

selected_size = PAGE_ALIGN(memblock_phys_mem_size() * kvm_cma_resv_ratio / 100);

(I think that might technically change it from aligning down to aligning
up but I don't think 1 page matters here.)

Kind regards,
Daniel


>  	if (selected_size) {
>  		pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> -- 
> 2.26.2

^ permalink raw reply

* Re: [PATCHv4 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
From: Pingfan Liu @ 2020-08-04 13:38 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Nathan Lynch, Kexec Mailing List, linuxppc-dev, Hari Bathini,
	Nathan Fontenot
In-Reply-To: <951e4406-1172-09a3-a8db-6f0a92aea2ed@linux.ibm.com>

On Tue, Aug 4, 2020 at 12:29 AM Laurent Dufour <ldufour@linux.ibm.com> wrote:
>
[...]
> >       lmb_set_nid(lmb);
> >       lmb->flags |= DRCONF_MEM_ASSIGNED;
> > +     if (dt_update) {
> > +             ret = drmem_update_dt();
> > +             if (ret)
> > +                     pr_warn("%s fail to update dt, but continue\n", __func__);
> > +     }
> >
> >       block_sz = memory_block_size_bytes();
>
> In the case the call to __add_memory is failing, the flag DRCONF_MEM_ASSIGNED
> should be cleared as I mentioned in your previous patch. In addition to this,
Yes.
> the DT should be updated, or the caller should manage that but that will be hard
> since it doesn't know where the error happened in this function.
Yeah, it is hard to manage it by caller, so just updating dt  is a
easier method.
>
> >
> > @@ -625,7 +653,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >               invalidate_lmb_associativity_index(lmb);
> >               lmb_clear_nid(lmb);
> >               lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> > -
> > +             if (dt_update) {
> > +                     ret = drmem_update_dt();
> > +                     if (ret)
> > +                             pr_warn("%s fail to update dt during rollback, but continue\n", __func__);
> > +             }
> >               __remove_memory(nid, base_addr, block_sz);
> >       }
> >
> > @@ -638,6 +670,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >       int lmbs_available = 0;
> >       int lmbs_added = 0;
> >       int rc;
> > +     bool dt_update = false;
> >
> >       pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
> >
> > @@ -664,7 +697,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >               if (rc)
> >                       continue;
> >
> > -             rc = dlpar_add_lmb(lmb);
> > +             rc = dlpar_add_lmb(lmb, dt_update);
> >               if (rc) {
> >                       dlpar_release_drc(lmb->drc_index);
> >                       continue;
> > @@ -678,16 +711,23 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >               lmbs_added++;
> >               if (lmbs_added == lmbs_to_add)
> >                       break;
> > +             else if (lmbs_added == lmbs_to_add - 1)
> > +                     dt_update = true;
>
> In the case the number of LMB to add is 1, dt_update is never set to true, and
> the device tree is never updated. You need to set dt_update to true earlier in
> the loop block.
Oh, I will fix it in V5
>
> >       }
> >
> >       if (lmbs_added != lmbs_to_add) {
> > +             bool rollback_dt_update = false;
> > +
> >               pr_err("Memory hot-add failed, removing any added LMBs\n");
> >
> >               for_each_drmem_lmb(lmb) {
> >                       if (!drmem_lmb_reserved(lmb))
> >                               continue;
> >
> > -                     rc = dlpar_remove_lmb(lmb);
> > +                     if (--lmbs_added == 0 && dt_update)
> > +                             rollback_dt_update = true;
>
> That test may have to be review to deal with error during the last LMB addition,
> the DT may have been updated before __add_memory() is failing in
> dlpar_add_lmb(). In that case, lmbs_added == 0 and that branch is not covered.
> That's not an issue if dlpar_add_lmb() is handling that case (see my comment above).
I will fix it in next version.

Thanks for your review.

Regards,
Pingfan

^ permalink raw reply

* Re: [PATCH] powerpc: Fix circular dependency between percpu.h and mmu.h
From: Michael Ellerman @ 2020-08-04 13:37 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman; +Cc: sfr, w
In-Reply-To: <20200804130558.292328-1-mpe@ellerman.id.au>

On Tue, 4 Aug 2020 23:05:58 +1000, Michael Ellerman wrote:
> Recently random.h started including percpu.h (see commit
> f227e3ec3b5c ("random32: update the net random state on interrupt and
> activity")), which broke corenet64_smp_defconfig:
> 
>   In file included from /linux/arch/powerpc/include/asm/paca.h:18,
>                    from /linux/arch/powerpc/include/asm/percpu.h:13,
>                    from /linux/include/linux/random.h:14,
>                    from /linux/lib/uuid.c:14:
>   /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx'
>     139 | DECLARE_PER_CPU(int, next_tlbcam_idx);
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Fix circular dependency between percpu.h and mmu.h
      https://git.kernel.org/powerpc/c/0c83b277ada72b585e6a3e52b067669df15bcedb

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Ellerman @ 2020-08-04 13:35 UTC (permalink / raw)
  To: Michael Roth, linuxppc-dev
  Cc: Nathan Lynch, Cedric Le Goater, Thiago Jung Bauermann, Greg Kurz
In-Reply-To: <20200804032937.7235-1-mdroth@linux.vnet.ibm.com>

Hi Mike,

There is a bit of history to this code, but not in a good way :)

Michael Roth <mdroth@linux.vnet.ibm.com> writes:
> For a power9 KVM guest with XIVE enabled, running a test loop
> where we hotplug 384 vcpus and then unplug them, the following traces
> can be seen (generally within a few loops) either from the unplugged
> vcpu:
>
>   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>   [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
...
>
> At that point the worker thread assumes the unplugged CPU is in some
> unknown/dead state and procedes with the cleanup, causing the race with
> the XIVE cleanup code executed by the unplugged CPU.
>
> Fix this by inserting an msleep() after each RTAS call to avoid

We previously had an msleep(), but it was removed:

  b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")

> pseries_cpu_die() returning prematurely, and double the number of
> attempts so we wait at least a total of 5 seconds. While this isn't an
> ideal solution, it is similar to how we dealt with a similar issue for
> cede_offline mode in the past (940ce422a3).

Thiago tried to fix this previously but there was a bit of discussion
that didn't quite resolve:

  https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/


Spinning forever seems like a bad idea, but as has been demonstrated at
least twice now, continuing when we don't know the state of the other
CPU can lead to straight up crashes.

So I think I'm persuaded that it's preferable to have the kernel stuck
spinning rather than oopsing.

I'm 50/50 on whether we should have a cond_resched() in the loop. My
first instinct is no, if we're stuck here for 20s a stack trace would be
good. But then we will probably hit that on some big and/or heavily
loaded machine.

So possibly we should call cond_resched() but have some custom logic in
the loop to print a warning if we are stuck for more than some
sufficiently long amount of time.


> Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588

This is not public.

I tend to trim Bugzilla links from the change log, because I'm not
convinced they will last forever, but it is good to have them in the
mail archive.

cheers

> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Cedric Le Goater <clg@kaod.org>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index c6e0d8abf75e..3cb172758052 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -111,13 +111,12 @@ static void pseries_cpu_die(unsigned int cpu)
>  	int cpu_status = 1;
>  	unsigned int pcpu = get_hard_smp_processor_id(cpu);
>  
> -	for (tries = 0; tries < 25; tries++) {
> +	for (tries = 0; tries < 50; tries++) {
>  		cpu_status = smp_query_cpu_stopped(pcpu);
>  		if (cpu_status == QCSS_STOPPED ||
>  		    cpu_status == QCSS_HARDWARE_ERROR)
>  			break;
> -		cpu_relax();
> -
> +		msleep(100);
>  	}
>  
>  	if (cpu_status != 0) {
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCHv4 1/2] powerpc/pseries: group lmb operation and memblock's
From: Pingfan Liu @ 2020-08-04 13:33 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Nathan Lynch, Kexec Mailing List, linuxppc-dev, Hari Bathini,
	Nathan Fontenot
In-Reply-To: <ee585548-aefc-ff3f-6a79-e616b9e04f12@linux.ibm.com>

On Mon, Aug 3, 2020 at 9:52 PM Laurent Dufour <ldufour@linux.ibm.com> wrote:
>
> > @@ -603,6 +606,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >         }
> >
> >         lmb_set_nid(lmb);
> > +       lmb->flags |= DRCONF_MEM_ASSIGNED;
> > +
> >         block_sz = memory_block_size_bytes();
> >
> >         /* Add the memory */
>
> Since the lmb->flags is now set earlier, you should unset it in the case the
> call to __add_memory() fails, something like:
>
> @@ -614,6 +614,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>         rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
>         if (rc) {
>                 invalidate_lmb_associativity_index(lmb);
> +               lmb->flags &= ~DRCONF_MEM_ASSIGNED;
You are right. I will fix it in V5.

Thanks,
Pingfan

^ permalink raw reply

* [PATCH] powerpc: Fix circular dependency between percpu.h and mmu.h
From: Michael Ellerman @ 2020-08-04 13:05 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sfr, w

Recently random.h started including percpu.h (see commit
f227e3ec3b5c ("random32: update the net random state on interrupt and
activity")), which broke corenet64_smp_defconfig:

  In file included from /linux/arch/powerpc/include/asm/paca.h:18,
                   from /linux/arch/powerpc/include/asm/percpu.h:13,
                   from /linux/include/linux/random.h:14,
                   from /linux/lib/uuid.c:14:
  /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx'
    139 | DECLARE_PER_CPU(int, next_tlbcam_idx);

This is due to a circular header dependency:
  asm/mmu.h includes asm/percpu.h, which includes asm/paca.h, which
  includes asm/mmu.h

Which means DECLARE_PER_CPU() isn't defined when mmu.h needs it.

We can fix it by moving the include of paca.h below the include of
asm-generic/percpu.h.

This moves the include of paca.h out of the #ifdef __powerpc64__, but
that is OK because paca.h is almost entirely inside #ifdef
CONFIG_PPC64 anyway.

It also moves the include of paca.h out of the #ifdef CONFIG_SMP,
which could possibly break something, but seems to have no ill
effects.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/percpu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
index dce863a7635c..8e5b7d0b851c 100644
--- a/arch/powerpc/include/asm/percpu.h
+++ b/arch/powerpc/include/asm/percpu.h
@@ -10,8 +10,6 @@
 
 #ifdef CONFIG_SMP
 
-#include <asm/paca.h>
-
 #define __my_cpu_offset local_paca->data_offset
 
 #endif /* CONFIG_SMP */
@@ -19,4 +17,6 @@
 
 #include <asm-generic/percpu.h>
 
+#include <asm/paca.h>
+
 #endif /* _ASM_POWERPC_PERCPU_H_ */
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
From: peterz @ 2020-08-04 12:47 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot, Rik van Riel,
	linuxppc-dev, LKML, Ingo Molnar, Thomas Gleixner, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann
In-Reply-To: <20200804121007.GJ24375@linux.vnet.ibm.com>

On Tue, Aug 04, 2020 at 05:40:07PM +0530, Srikar Dronamraju wrote:
> * peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:
> 
> > On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > > most architectures. One of the users of cpu_smt_mask(), would be to
> > > identify idle-cores. On Power9, a pair of cores can be presented by the
> > > firmware as a big-core for backward compatibility reasons.
> > > 
> > > In order to maintain userspace backward compatibility with previous
> > > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > > this pair shares the L2 cache as well. However, from the scheduler's point
> > > of view, a core should be determined by SMT4. The load-balancer already
> > > does this. Hence allow PowerPc architecture to override the default
> > > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> > 
> > I'm utterly confused.
> > 
> > Why can't you set your regular siblings mask to the smt4 thing? Who
> > cares about the compat stuff, I thought that was an LPAR/AIX thing.
> 
> There are no technical challenges to set the sibling mask to SMT4.
> This is for Linux running on PowerVM. When these Power9 boxes are sold /
> marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
> world, everything is in SMT8 mode, the device tree properties still mark
> the system to be running on 8 thread core. There are a number of utilities
> like ppc64_cpu that directly read from device-tree. They would get core
> count and thread count which is SMT8 based.
> 
> If the sibling_mask is set to small core, then same user when looking at

FWIW, I find the small/big core naming utterly confusing vs the
big/little naming from ARM. When you say small, I'm thinking of
asymmetric cores, not SMT4/SMT8.

> output from lscpu and other utilities that look at sysfs will start seeing
> 2x number of cores to what he had provisioned and what the utilities from
> the device-tree show. This can gets users confused.

One will report SMT8 and the other SMT4, right? So only users that
cannot read will be confused, but if you can't read, confusion is
guaranteed anyway.

Also, by exposing the true (SMT4) topology to userspace, userspace
applications could behave better -- for those few that actually parse
the topology information.

> So to keep the device-tree properties, utilities depending on device-tree,
> sysfs and utilities depending on sysfs on the same page, userspace are only
> exposed as SMT8.

I'm not convinced it makes sense to lie to userspace just to accomodate
a few users that cannot read.

^ permalink raw reply

* Re: [merge] Build failure selftest/powerpc/mm/pkey_exec_prot
From: Michael Ellerman @ 2020-08-04 12:23 UTC (permalink / raw)
  To: Sandipan Das; +Cc: Sachin Sant, linuxppc-dev
In-Reply-To: <185c2277-91fd-74eb-3c04-75caeb90ed9e@linux.ibm.com>

Sandipan Das <sandipan@linux.ibm.com> writes:
> On 04/08/20 6:38 am, Michael Ellerman wrote:
>> Sandipan Das <sandipan@linux.ibm.com> writes:
>>> On 03/08/20 4:32 pm, Michael Ellerman wrote:
>>>> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
>>>>>> On 02-Aug-2020, at 10:58 PM, Sandipan Das <sandipan@linux.ibm.com> wrote:
>>>>>> On 02/08/20 4:45 pm, Sachin Sant wrote:
>>>>>>> pkey_exec_prot test from linuxppc merge branch (3f68564f1f5a) fails to
>>>>>>> build due to following error:
>>>>>>>
>>>>>>> gcc -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v5.8-rc7-1276-g3f68564f1f5a"' -I/home/sachin/linux/tools/testing/selftests/powerpc/include  -m64    pkey_exec_prot.c /home/sachin/linux/tools/testing/selftests/kselftest_harness.h /home/sachin/linux/tools/testing/selftests/kselftest.h ../harness.c ../utils.c  -o /home/sachin/linux/tools/testing/selftests/powerpc/mm/pkey_exec_prot
>>>>>>> In file included from pkey_exec_prot.c:18:
>>>>>>> /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>>>>>>> #define SYS_pkey_mprotect 386
>>>>>>>
>>>>>>> In file included from /usr/include/sys/syscall.h:31,
>>>>>>>                 from /home/sachin/linux/tools/testing/selftests/powerpc/include/utils.h:47,
>>>>>>>                 from /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:12,
>>>>>>>                 from pkey_exec_prot.c:18:
>>>>>>> /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>>>>>>> # define SYS_pkey_mprotect __NR_pkey_mprotect
>>>>>>>
>>>>>>> commit 128d3d021007 introduced this error.
>>>>>>> selftests/powerpc: Move pkey helpers to headers
>>>>>>>
>>>>>>> Possibly the # defines for sys calls can be retained in pkey_exec_prot.c or
>>>>>>>
>>>>>>
>>>>>> I am unable to reproduce this on the latest merge branch (HEAD at f59195f7faa4).
>>>>>> I don't see any redefinitions in pkey_exec_prot.c either.
>>>>>>
>>>>>
>>>>> I can still see this problem on latest merge branch.
>>>>> I have following gcc version
>>>>>
>>>>> gcc version 8.3.1 20191121
>>>>
>>>> What libc version? Or just the distro & version?
>>>
>>> Sachin observed this on RHEL 8.2 with glibc-2.28.
>>> I couldn't reproduce it on Ubuntu 20.04 and Fedora 32 and both these distros
>>> are using glibc-2.31.
>> 
>> OK odd. Usually it's newer glibc that hits this problem.
>> 
>> I guess on RHEL 8.2 we're getting the asm-generic version? But that
>> would be quite wrong if that's what's happening.
>
> If I let GCC dump all the headers that are being used for the source file, I always
> see syscall.h being included on the RHEL 8.2 system. That is the header with the
> conflicting definition.
>
>   $ cd tools/testing/selftests/powerpc/mm
>   $ gcc -H -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v5.8-rc7-1456-gf59195f7faa4-dirty"' \
>         -I../include -m64 pkey_exec_prot.c ../../kselftest_harness.h ../../kselftest.h ../harness.c ../utils.c \
>         -o pkey_exec_prot 2>&1 | grep syscall
>
> On Ubuntu 20.04 and Fedora 32, grep doesn't find any matching text.
> On RHEL 8.2, it shows the following.
>   ... /usr/include/sys/syscall.h
>   .... /usr/include/bits/syscall.h
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1575: note: this is the location of the previous definition
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1579: note: this is the location of the previous definition
>   /usr/include/bits/syscall.h
>   .. /usr/include/sys/syscall.h
>   ... /usr/include/bits/syscall.h
>   /usr/include/bits/syscall.h
>   .. /usr/include/sys/syscall.h
>   ... /usr/include/bits/syscall.h
>   /usr/include/bits/syscall.h
>
> So utils.h is also including /usr/include/sys/syscall.h for glibc versions older than 2.30
> because of commit 743f3544fffb ("selftests/powerpc: Add wrapper for gettid") :)

Haha, of course. :facepalm_emoji:

> [...]
> . ../include/pkeys.h
> [...]
> .. ../include/utils.h
> [...]
> ... /usr/include/sys/syscall.h
> .... /usr/include/asm/unistd.h
> .... /usr/include/bits/syscall.h
> In file included from pkey_exec_prot.c:18:
> ../include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>  #define SYS_pkey_mprotect 386
>
> In file included from /usr/include/sys/syscall.h:31,
>                  from ../include/utils.h:47,
>                  from ../include/pkeys.h:12,
>                  from pkey_exec_prot.c:18:
> /usr/include/bits/syscall.h:1583: note: this is the location of the previous definition
>  # define SYS_pkey_mprotect __NR_pkey_mprotect

Aha, that explains why redefining gives us an error, because we're
defining it to the literal 386 whereas the system header is defining it
to the __NR value.

Is there a reason to use the SYS_ name?

Typically we just use the __NR value directly, and that would avoid any
problems with redefinition I think, as long as we're using the same
value as the system header (which we always should be).

eg:

diff --git a/tools/testing/selftests/powerpc/include/pkeys.h b/tools/testing/selftests/powerpc/include/pkeys.h
index 6ba95039a034..3312cb1b058d 100644
--- a/tools/testing/selftests/powerpc/include/pkeys.h
+++ b/tools/testing/selftests/powerpc/include/pkeys.h
@@ -31,9 +31,9 @@
 
 #define SI_PKEY_OFFSET	0x20
 
-#define SYS_pkey_mprotect	386
-#define SYS_pkey_alloc		384
-#define SYS_pkey_free		385
+#define __NR_pkey_mprotect	386
+#define __NR_pkey_alloc		384
+#define __NR_pkey_free		385
 
 #define PKEY_BITS_PER_PKEY	2
 #define NR_PKEYS		32
@@ -62,17 +62,17 @@ void pkey_set_rights(int pkey, unsigned long rights)
 
 int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
 {
-	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
+	return syscall(__NR_pkey_mprotect, addr, len, prot, pkey);
 }
 
 int sys_pkey_alloc(unsigned long flags, unsigned long rights)
 {
-	return syscall(SYS_pkey_alloc, flags, rights);
+	return syscall(__NR_pkey_alloc, flags, rights);
 }
 
 int sys_pkey_free(int pkey)
 {
-	return syscall(SYS_pkey_free, pkey);
+	return syscall(__NR_pkey_free, pkey);
 }
 
 int pkeys_unsupported(void)


cheers

^ permalink raw reply related

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
From: Srikar Dronamraju @ 2020-08-04 12:10 UTC (permalink / raw)
  To: peterz
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot, Rik van Riel,
	linuxppc-dev, LKML, Ingo Molnar, Thomas Gleixner, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann
In-Reply-To: <20200804104520.GB2657@hirez.programming.kicks-ass.net>

* peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:

> On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > most architectures. One of the users of cpu_smt_mask(), would be to
> > identify idle-cores. On Power9, a pair of cores can be presented by the
> > firmware as a big-core for backward compatibility reasons.
> > 
> > In order to maintain userspace backward compatibility with previous
> > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > this pair shares the L2 cache as well. However, from the scheduler's point
> > of view, a core should be determined by SMT4. The load-balancer already
> > does this. Hence allow PowerPc architecture to override the default
> > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> 
> I'm utterly confused.
> 
> Why can't you set your regular siblings mask to the smt4 thing? Who
> cares about the compat stuff, I thought that was an LPAR/AIX thing.

There are no technical challenges to set the sibling mask to SMT4.
This is for Linux running on PowerVM. When these Power9 boxes are sold /
marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
world, everything is in SMT8 mode, the device tree properties still mark
the system to be running on 8 thread core. There are a number of utilities
like ppc64_cpu that directly read from device-tree. They would get core
count and thread count which is SMT8 based.

If the sibling_mask is set to small core, then same user when looking at
output from lscpu and other utilities that look at sysfs will start seeing
2x number of cores to what he had provisioned and what the utilities from
the device-tree show. This can gets users confused.

So to keep the device-tree properties, utilities depending on device-tree,
sysfs and utilities depending on sysfs on the same page, userspace are only
exposed as SMT8.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH 1/6] powerpc/powernv/smp: Fix spurious DBG() warning
From: Michael Ellerman @ 2020-08-04 12:05 UTC (permalink / raw)
  To: Joel Stanley, Oliver O'Halloran; +Cc: linuxppc-dev
In-Reply-To: <CACPK8XfoZ8+SUG6cuWuEJqdTfmxePsBGFGgqyrPvmn1WyRVyjA@mail.gmail.com>

Joel Stanley <joel@jms.id.au> writes:
> On Tue, 4 Aug 2020 at 00:57, Oliver O'Halloran <oohall@gmail.com> wrote:
>>
>> When building with W=1 we get the following warning:
>>
>>  arch/powerpc/platforms/powernv/smp.c: In function ‘pnv_smp_cpu_kill_self’:
>>  arch/powerpc/platforms/powernv/smp.c:276:16: error: suggest braces around
>>         empty body in an ‘if’ statement [-Werror=empty-body]
>>    276 |      cpu, srr1);
>>        |                ^
>>  cc1: all warnings being treated as errors
>>
>> The full context is this block:
>>
>>  if (srr1 && !generic_check_cpu_restart(cpu))
>>         DBG("CPU%d Unexpected exit while offline srr1=%lx!\n",
>>                         cpu, srr1);
>>
>> When building with DEBUG undefined DBG() expands to nothing and GCC emits
>> the warning due to the lack of braces around an empty statement.
>>
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>> ---
>> We could add the braces too. That might even be better since it's a multi-line
>> if block even though it's only a single statement.
>
> Or you could put it all on one line, now that our 120 line overlords
> have taken over.

Yeah I think that one may as well be one line.

> Reviewed-by: Joel Stanley <joel@jms.id.au>
>
> Messy:
>
> $ git grep "define DBG(" arch/powerpc/ |grep -v print
> arch/powerpc/kernel/crash_dump.c:#define DBG(fmt...)
> arch/powerpc/kernel/iommu.c:#define DBG(...)
> arch/powerpc/kernel/legacy_serial.c:#define DBG(fmt...) do { } while(0)
> arch/powerpc/kernel/prom.c:#define DBG(fmt...)
..

Yeah, gross old cruft.

The vast majority of those should just be replaced with pr_devel()
and/or pr_debug().

The pnv_smp_cpu_kill_self() case is one where we probably do want to
stick with udbg_printf(), because I don't think it's kosher to call
printk() from an offline CPU.

cheers

^ permalink raw reply

* Re: [PATCH v2] powerpc: Warn about use of smt_snooze_delay
From: Michael Ellerman @ 2020-08-04 11:59 UTC (permalink / raw)
  To: Joel Stanley, linuxppc-dev; +Cc: Tyrel Datwyler, Gautham R Shenoy
In-Reply-To: <20200630015935.2675676-1-joel@jms.id.au>

Joel Stanley <joel@jms.id.au> writes:
> It's not done anything for a long time. Save the percpu variable, and
> emit a warning to remind users to not expect it to do anything.
>
> Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
> Cc: stable@vger.kernel.org # v3.14
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> --
> v2:
>  Use pr_warn instead of WARN
>  Reword and print proccess name with pid in message
>  Leave CPU_FTR_SMT test in
>  Add Fixes line
>
> mpe, if you don't agree then feel free to drop the cc stable.
>
> Testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system it's quite noisy
> as the online/offline loop that ppc64_cpu runs is slow.

Hmm, that is pretty spammy.

I know I suggested the ratelimit, but I was thinking it would print once
for each ppc64_cpu invocation, not ~30 times.

How about pr_warn_once(), that should still be sufficient for people to
notice if they're looking for it.

...
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b3259697e..ba6d4cee19ef 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -32,29 +32,26 @@
>  
>  static DEFINE_PER_CPU(struct cpu, cpu_devices);
>  
> -/*
> - * SMT snooze delay stuff, 64-bit only for now
> - */
> -
>  #ifdef CONFIG_PPC64
>  
> -/* Time in microseconds we delay before sleeping in the idle loop */
> -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
> +/*
> + * Snooze delay has not been hooked up since 3fa8cad82b94 ("powerpc/pseries/cpuidle:
> + * smt-snooze-delay cleanup.") and has been broken even longer. As was foretold in
> + * 2014:
> + *
> + *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
> + *  up the kernel code."
> + *
> + * At some point in the future this code should be removed.
> + */
>  
>  static ssize_t store_smt_snooze_delay(struct device *dev,
>  				      struct device_attribute *attr,
>  				      const char *buf,
>  				      size_t count)
>  {
> -	struct cpu *cpu = container_of(dev, struct cpu, dev);
> -	ssize_t ret;
> -	long snooze;
> -
> -	ret = sscanf(buf, "%ld", &snooze);
> -	if (ret != 1)
> -		return -EINVAL;
> -
> -	per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
> +	pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> +			    current->comm, current->pid);

Can we make this:

	"%s (%d) stored to unsupported smt_snooze_delay, which has no effect.\n",


>  	return count;
>  }
>  
> @@ -62,9 +59,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>  				     struct device_attribute *attr,
>  				     char *buf)
>  {
> -	struct cpu *cpu = container_of(dev, struct cpu, dev);
> -
> -	return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
> +	pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> +			    current->comm, current->pid);

It has as much effect as it ever did :)

So maybe:

	"%s (%d) read from unsupported smt_snooze_delay.\n",


I can do those changes when applying if you like rather than making you
do a v3.

cheers

^ permalink raw reply

* [PATCH v9 5/5] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
From: Christophe Leroy @ 2020-08-04 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
	anton
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev
In-Reply-To: <cover.1596541334.git.christophe.leroy@csgroup.eu>

Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 9 +++++++++
 arch/powerpc/kernel/vdso32/vdso32.lds.S    | 1 +
 arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index fd7b01c51281..a6e29f880e0e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
 	cvdso_call __c_kernel_clock_gettime
 V_FUNCTION_END(__kernel_clock_gettime)
 
+/*
+ * Exact prototype of clock_gettime64()
+ *
+ * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_clock_gettime64)
+	cvdso_call __c_kernel_clock_gettime64
+V_FUNCTION_END(__kernel_clock_gettime64)
 
 /*
  * Exact prototype of clock_getres()
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 4c985467a668..582c5b046cc9 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -148,6 +148,7 @@ VERSION
 #ifndef CONFIG_PPC_BOOK3S_601
 		__kernel_gettimeofday;
 		__kernel_clock_gettime;
+		__kernel_clock_gettime64;
 		__kernel_clock_getres;
 		__kernel_time;
 		__kernel_get_tbfreq;
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c b/arch/powerpc/kernel/vdso32/vgettimeofday.c
index 0b9ab4c22ef2..f7f71fecf4ed 100644
--- a/arch/powerpc/kernel/vdso32/vgettimeofday.c
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -11,6 +11,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
 	return __cvdso_clock_gettime32_data(vd, clock, ts);
 }
 
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
+			       const struct vdso_data *vd)
+{
+	return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
 int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
 			    const struct vdso_data *vd)
 {
-- 
2.25.0


^ permalink raw reply related

* [PATCH v9 4/5] powerpc/vdso: Switch VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-04 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
	anton
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev
In-Reply-To: <cover.1596541334.git.christophe.leroy@csgroup.eu>

For VDSO32 on PPC64, we create a fake 32 bits config, on the same
principle as MIPS architecture, in order to get the correct parts of
the different asm header files.

With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
gettimeofday:    vdso: 828 nsec/call
clock-getres-realtime-coarse:    vdso: 391 nsec/call
clock-gettime-realtime-coarse:    vdso: 614 nsec/call
clock-getres-realtime:    vdso: 460 nsec/call
clock-gettime-realtime:    vdso: 876 nsec/call
clock-getres-monotonic-coarse:    vdso: 399 nsec/call
clock-gettime-monotonic-coarse:    vdso: 691 nsec/call
clock-getres-monotonic:    vdso: 460 nsec/call
clock-gettime-monotonic:    vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:
gettimeofday:    vdso: 955 nsec/call
clock-getres-realtime-coarse:    vdso: 545 nsec/call
clock-gettime-realtime-coarse:    vdso: 592 nsec/call
clock-getres-realtime:    vdso: 545 nsec/call
clock-gettime-realtime:    vdso: 941 nsec/call
clock-getres-monotonic-coarse:    vdso: 545 nsec/call
clock-gettime-monotonic-coarse:    vdso: 591 nsec/call
clock-getres-monotonic:    vdso: 545 nsec/call
clock-gettime-monotonic:    vdso: 940 nsec/call

It is even better for gettime with monotonic clocks.

Unsupported clocks with ASM VDSO:
clock-gettime-boottime:    vdso: 3851 nsec/call
clock-gettime-tai:    vdso: 3852 nsec/call
clock-gettime-monotonic-raw:    vdso: 3396 nsec/call

Same clocks with C VDSO:
clock-gettime-tai:    vdso: 941 nsec/call
clock-gettime-monotonic-raw:    vdso: 1001 nsec/call
clock-gettime-monotonic-coarse:    vdso: 591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
gettimeofday:    vdso: 220 nsec/call
clock-getres-realtime-coarse:    vdso: 102 nsec/call
clock-gettime-realtime-coarse:    vdso: 178 nsec/call
clock-getres-realtime:    vdso: 129 nsec/call
clock-gettime-realtime:    vdso: 235 nsec/call
clock-getres-monotonic-coarse:    vdso: 105 nsec/call
clock-gettime-monotonic-coarse:    vdso: 208 nsec/call
clock-getres-monotonic:    vdso: 129 nsec/call
clock-gettime-monotonic:    vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:
gettimeofday:    vdso: 272 nsec/call
clock-getres-realtime-coarse:    vdso: 160 nsec/call
clock-gettime-realtime-coarse:    vdso: 184 nsec/call
clock-getres-realtime:    vdso: 166 nsec/call
clock-gettime-realtime:    vdso: 281 nsec/call
clock-getres-monotonic-coarse:    vdso: 160 nsec/call
clock-gettime-monotonic-coarse:    vdso: 184 nsec/call
clock-getres-monotonic:    vdso: 169 nsec/call
clock-gettime-monotonic:    vdso: 275 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v9:
- Rebased (Impact on arch/powerpc/kernel/vdso??/Makefile

v7:
- Split out preparatory changes in a new preceding patch
- Added -fasynchronous-unwind-tables to CC flags.

v6:
- Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions.
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE
- Rebased on powerpc/merge as of 7 Apr 2020
- Fixed build failure with gcc 9
- Added a patch to create asm/vdso/processor.h and more cpu_relax() in it

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/Kconfig                       |   2 +
 arch/powerpc/include/asm/vdso/vsyscall.h   |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h   |  40 +--
 arch/powerpc/kernel/asm-offsets.c          |  49 +---
 arch/powerpc/kernel/time.c                 |  91 +------
 arch/powerpc/kernel/vdso.c                 |   5 +-
 arch/powerpc/kernel/vdso32/Makefile        |  32 ++-
 arch/powerpc/kernel/vdso32/config-fake32.h |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 291 +--------------------
 arch/powerpc/kernel/vdso64/Makefile        |  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S  | 242 +----------------
 11 files changed, 143 insertions(+), 691 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c7656cc10eb..9977ab939b42 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -171,6 +171,7 @@ config PPC
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 	select GENERIC_TIME_VSYSCALL
+	select GENERIC_GETTIMEOFDAY
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
 	select HAVE_ARCH_JUMP_LABEL
@@ -202,6 +203,7 @@ config PPC
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS			if GCC_VERSION >= 50200   # plugin support on gcc <= 5.1 is buggy on PPC
+	select HAVE_GENERIC_VDSO
 	select HAVE_HW_BREAKPOINT		if PERF_EVENTS && (PPC_BOOK3S || PPC_8xx)
 	select HAVE_IDE
 	select HAVE_IOREMAP_PROT
diff --git a/arch/powerpc/include/asm/vdso/vsyscall.h b/arch/powerpc/include/asm/vdso/vsyscall.h
new file mode 100644
index 000000000000..c56a030c0623
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/vsyscall.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_VSYSCALL_H
+#define __ASM_VDSO_VSYSCALL_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/timekeeper_internal.h>
+#include <asm/vdso_datapage.h>
+
+/*
+ * Update the vDSO data page to keep in sync with kernel timekeeping.
+ */
+static __always_inline
+struct vdso_data *__arch_get_k_vdso_data(void)
+{
+	return vdso_data->data;
+}
+#define __arch_get_k_vdso_data __arch_get_k_vdso_data
+
+/* The asm-generic header needs to be included after the definitions above */
+#include <asm-generic/vdso/vsyscall.h>
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_VSYSCALL_H */
diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index b9ef6cf50ea5..c4d320504d26 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -36,6 +36,7 @@
 
 #include <linux/unistd.h>
 #include <linux/time.h>
+#include <vdso/datapage.h>
 
 #define SYSCALL_MAP_SIZE      ((NR_syscalls + 31) / 32)
 
@@ -45,7 +46,7 @@
 
 #ifdef CONFIG_PPC64
 
-struct vdso_data {
+struct vdso_arch_data {
 	__u8  eye_catcher[16];		/* Eyecatcher: SYSTEMCFG:PPC64	0x00 */
 	struct {			/* Systemcfg version numbers	     */
 		__u32 major;		/* Major number			0x10 */
@@ -59,13 +60,13 @@ struct vdso_data {
 	__u32 processor;		/* Processor type		0x1C */
 	__u64 processorCount;		/* # of physical processors	0x20 */
 	__u64 physicalMemorySize;	/* Size of real memory(B)	0x28 */
-	__u64 tb_orig_stamp;		/* Timebase at boot		0x30 */
+	__u64 tb_orig_stamp;		/* (NU) Timebase at boot	0x30 */
 	__u64 tb_ticks_per_sec;		/* Timebase tics / sec		0x38 */
-	__u64 tb_to_xs;			/* Inverse of TB to 2^20	0x40 */
-	__u64 stamp_xsec;		/*				0x48 */
-	__u64 tb_update_count;		/* Timebase atomicity ctr	0x50 */
-	__u32 tz_minuteswest;		/* Minutes west of Greenwich	0x58 */
-	__u32 tz_dsttime;		/* Type of dst correction	0x5C */
+	__u64 tb_to_xs;			/* (NU) Inverse of TB to 2^20	0x40 */
+	__u64 stamp_xsec;		/* (NU)				0x48 */
+	__u64 tb_update_count;		/* (NU) Timebase atomicity ctr	0x50 */
+	__u32 tz_minuteswest;		/* (NU) Min. west of Greenwich	0x58 */
+	__u32 tz_dsttime;		/* (NU) Type of dst correction	0x5C */
 	__u32 dcache_size;		/* L1 d-cache size		0x60 */
 	__u32 dcache_line_size;		/* L1 d-cache line size		0x64 */
 	__u32 icache_size;		/* L1 i-cache size		0x68 */
@@ -78,14 +79,10 @@ struct vdso_data {
 	__u32 icache_block_size;		/* L1 i-cache block size     */
 	__u32 dcache_log_block_size;		/* L1 d-cache log block size */
 	__u32 icache_log_block_size;		/* L1 i-cache log block size */
-	__u32 stamp_sec_fraction;		/* fractional seconds of stamp_xtime */
-	__s32 wtom_clock_nsec;			/* Wall to monotonic clock nsec */
-	__s64 wtom_clock_sec;			/* Wall to monotonic clock sec */
-	__s64 stamp_xtime_sec;			/* xtime secs as at tb_orig_stamp */
-	__s64 stamp_xtime_nsec;			/* xtime nsecs as at tb_orig_stamp */
-	__u32 hrtimer_res;			/* hrtimer resolution */
    	__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls  */
    	__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
+
+	struct vdso_data data[CS_BASES];
 };
 
 #else /* CONFIG_PPC64 */
@@ -93,26 +90,15 @@ struct vdso_data {
 /*
  * And here is the simpler 32 bits version
  */
-struct vdso_data {
-	__u64 tb_orig_stamp;		/* Timebase at boot		0x30 */
+struct vdso_arch_data {
 	__u64 tb_ticks_per_sec;		/* Timebase tics / sec		0x38 */
-	__u64 tb_to_xs;			/* Inverse of TB to 2^20	0x40 */
-	__u64 stamp_xsec;		/*				0x48 */
-	__u32 tb_update_count;		/* Timebase atomicity ctr	0x50 */
-	__u32 tz_minuteswest;		/* Minutes west of Greenwich	0x58 */
-	__u32 tz_dsttime;		/* Type of dst correction	0x5C */
-	__s32 wtom_clock_sec;			/* Wall to monotonic clock */
-	__s32 wtom_clock_nsec;
-	__s32 stamp_xtime_sec;		/* xtime seconds as at tb_orig_stamp */
-	__s32 stamp_xtime_nsec;		/* xtime nsecs as at tb_orig_stamp */
-	__u32 stamp_sec_fraction;	/* fractional seconds of stamp_xtime */
-	__u32 hrtimer_res;		/* hrtimer resolution */
    	__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
+	struct vdso_data data[CS_BASES];
 };
 
 #endif /* CONFIG_PPC64 */
 
-extern struct vdso_data *vdso_data;
+extern struct vdso_arch_data *vdso_data;
 
 #else /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..684260186dbf 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -397,47 +397,16 @@ int main(void)
 #endif /* ! CONFIG_PPC64 */
 
 	/* datapage offsets for use by vdso */
-	OFFSET(CFG_TB_ORIG_STAMP, vdso_data, tb_orig_stamp);
-	OFFSET(CFG_TB_TICKS_PER_SEC, vdso_data, tb_ticks_per_sec);
-	OFFSET(CFG_TB_TO_XS, vdso_data, tb_to_xs);
-	OFFSET(CFG_TB_UPDATE_COUNT, vdso_data, tb_update_count);
-	OFFSET(CFG_TZ_MINUTEWEST, vdso_data, tz_minuteswest);
-	OFFSET(CFG_TZ_DSTTIME, vdso_data, tz_dsttime);
-	OFFSET(CFG_SYSCALL_MAP32, vdso_data, syscall_map_32);
-	OFFSET(WTOM_CLOCK_SEC, vdso_data, wtom_clock_sec);
-	OFFSET(WTOM_CLOCK_NSEC, vdso_data, wtom_clock_nsec);
-	OFFSET(STAMP_XTIME_SEC, vdso_data, stamp_xtime_sec);
-	OFFSET(STAMP_XTIME_NSEC, vdso_data, stamp_xtime_nsec);
-	OFFSET(STAMP_SEC_FRAC, vdso_data, stamp_sec_fraction);
-	OFFSET(CLOCK_HRTIMER_RES, vdso_data, hrtimer_res);
+	OFFSET(VDSO_DATA_OFFSET, vdso_arch_data, data);
+	OFFSET(CFG_TB_TICKS_PER_SEC, vdso_arch_data, tb_ticks_per_sec);
+	OFFSET(CFG_SYSCALL_MAP32, vdso_arch_data, syscall_map_32);
 #ifdef CONFIG_PPC64
-	OFFSET(CFG_ICACHE_BLOCKSZ, vdso_data, icache_block_size);
-	OFFSET(CFG_DCACHE_BLOCKSZ, vdso_data, dcache_block_size);
-	OFFSET(CFG_ICACHE_LOGBLOCKSZ, vdso_data, icache_log_block_size);
-	OFFSET(CFG_DCACHE_LOGBLOCKSZ, vdso_data, dcache_log_block_size);
-	OFFSET(CFG_SYSCALL_MAP64, vdso_data, syscall_map_64);
-	OFFSET(TVAL64_TV_SEC, __kernel_old_timeval, tv_sec);
-	OFFSET(TVAL64_TV_USEC, __kernel_old_timeval, tv_usec);
-#endif
-	OFFSET(TSPC64_TV_SEC, __kernel_timespec, tv_sec);
-	OFFSET(TSPC64_TV_NSEC, __kernel_timespec, tv_nsec);
-	OFFSET(TVAL32_TV_SEC, old_timeval32, tv_sec);
-	OFFSET(TVAL32_TV_USEC, old_timeval32, tv_usec);
-	OFFSET(TSPC32_TV_SEC, old_timespec32, tv_sec);
-	OFFSET(TSPC32_TV_NSEC, old_timespec32, tv_nsec);
-	/* timeval/timezone offsets for use by vdso */
-	OFFSET(TZONE_TZ_MINWEST, timezone, tz_minuteswest);
-	OFFSET(TZONE_TZ_DSTTIME, timezone, tz_dsttime);
-
-	/* Other bits used by the vdso */
-	DEFINE(CLOCK_REALTIME, CLOCK_REALTIME);
-	DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC);
-	DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
-	DEFINE(CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE);
-	DEFINE(CLOCK_MAX, CLOCK_TAI);
-	DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
-	DEFINE(EINVAL, EINVAL);
-	DEFINE(KTIME_LOW_RES, KTIME_LOW_RES);
+	OFFSET(CFG_ICACHE_BLOCKSZ, vdso_arch_data, icache_block_size);
+	OFFSET(CFG_DCACHE_BLOCKSZ, vdso_arch_data, dcache_block_size);
+	OFFSET(CFG_ICACHE_LOGBLOCKSZ, vdso_arch_data, icache_log_block_size);
+	OFFSET(CFG_DCACHE_LOGBLOCKSZ, vdso_arch_data, dcache_log_block_size);
+	OFFSET(CFG_SYSCALL_MAP64, vdso_arch_data, syscall_map_64);
+#endif
 
 #ifdef CONFIG_BUG
 	DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 6fcae436ae51..b63b1f97a1b3 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -91,6 +91,7 @@ static struct clocksource clocksource_timebase = {
 	.flags        = CLOCK_SOURCE_IS_CONTINUOUS,
 	.mask         = CLOCKSOURCE_MASK(64),
 	.read         = timebase_read,
+	.vdso_clock_mode	= VDSO_CLOCKMODE_ARCHTIMER,
 };
 
 #define DECREMENTER_DEFAULT_MAX 0x7FFFFFFF
@@ -855,95 +856,6 @@ static notrace u64 timebase_read(struct clocksource *cs)
 	return (u64)get_tb();
 }
 
-
-void update_vsyscall(struct timekeeper *tk)
-{
-	struct timespec64 xt;
-	struct clocksource *clock = tk->tkr_mono.clock;
-	u32 mult = tk->tkr_mono.mult;
-	u32 shift = tk->tkr_mono.shift;
-	u64 cycle_last = tk->tkr_mono.cycle_last;
-	u64 new_tb_to_xs, new_stamp_xsec;
-	u64 frac_sec;
-
-	if (clock != &clocksource_timebase)
-		return;
-
-	xt.tv_sec = tk->xtime_sec;
-	xt.tv_nsec = (long)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift);
-
-	/* Make userspace gettimeofday spin until we're done. */
-	++vdso_data->tb_update_count;
-	smp_mb();
-
-	/*
-	 * This computes ((2^20 / 1e9) * mult) >> shift as a
-	 * 0.64 fixed-point fraction.
-	 * The computation in the else clause below won't overflow
-	 * (as long as the timebase frequency is >= 1.049 MHz)
-	 * but loses precision because we lose the low bits of the constant
-	 * in the shift.  Note that 19342813113834067 ~= 2^(20+64) / 1e9.
-	 * For a shift of 24 the error is about 0.5e-9, or about 0.5ns
-	 * over a second.  (Shift values are usually 22, 23 or 24.)
-	 * For high frequency clocks such as the 512MHz timebase clock
-	 * on POWER[6789], the mult value is small (e.g. 32768000)
-	 * and so we can shift the constant by 16 initially
-	 * (295147905179 ~= 2^(20+64-16) / 1e9) and then do the
-	 * remaining shifts after the multiplication, which gives a
-	 * more accurate result (e.g. with mult = 32768000, shift = 24,
-	 * the error is only about 1.2e-12, or 0.7ns over 10 minutes).
-	 */
-	if (mult <= 62500000 && clock->shift >= 16)
-		new_tb_to_xs = ((u64) mult * 295147905179ULL) >> (clock->shift - 16);
-	else
-		new_tb_to_xs = (u64) mult * (19342813113834067ULL >> clock->shift);
-
-	/*
-	 * Compute the fractional second in units of 2^-32 seconds.
-	 * The fractional second is tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift
-	 * in nanoseconds, so multiplying that by 2^32 / 1e9 gives
-	 * it in units of 2^-32 seconds.
-	 * We assume shift <= 32 because clocks_calc_mult_shift()
-	 * generates shift values in the range 0 - 32.
-	 */
-	frac_sec = tk->tkr_mono.xtime_nsec << (32 - shift);
-	do_div(frac_sec, NSEC_PER_SEC);
-
-	/*
-	 * Work out new stamp_xsec value for any legacy users of systemcfg.
-	 * stamp_xsec is in units of 2^-20 seconds.
-	 */
-	new_stamp_xsec = frac_sec >> 12;
-	new_stamp_xsec += tk->xtime_sec * XSEC_PER_SEC;
-
-	/*
-	 * tb_update_count is used to allow the userspace gettimeofday code
-	 * to assure itself that it sees a consistent view of the tb_to_xs and
-	 * stamp_xsec variables.  It reads the tb_update_count, then reads
-	 * tb_to_xs and stamp_xsec and then reads tb_update_count again.  If
-	 * the two values of tb_update_count match and are even then the
-	 * tb_to_xs and stamp_xsec values are consistent.  If not, then it
-	 * loops back and reads them again until this criteria is met.
-	 */
-	vdso_data->tb_orig_stamp = cycle_last;
-	vdso_data->stamp_xsec = new_stamp_xsec;
-	vdso_data->tb_to_xs = new_tb_to_xs;
-	vdso_data->wtom_clock_sec = tk->wall_to_monotonic.tv_sec;
-	vdso_data->wtom_clock_nsec = tk->wall_to_monotonic.tv_nsec;
-	vdso_data->stamp_xtime_sec = xt.tv_sec;
-	vdso_data->stamp_xtime_nsec = xt.tv_nsec;
-	vdso_data->stamp_sec_fraction = frac_sec;
-	vdso_data->hrtimer_res = hrtimer_resolution;
-	smp_wmb();
-	++(vdso_data->tb_update_count);
-}
-
-void update_vsyscall_tz(void)
-{
-	vdso_data->tz_minuteswest = sys_tz.tz_minuteswest;
-	vdso_data->tz_dsttime = sys_tz.tz_dsttime;
-}
-
 static void __init clocksource_init(void)
 {
 	struct clocksource *clock;
@@ -1113,7 +1025,6 @@ void __init time_init(void)
 		sys_tz.tz_dsttime = 0;
 	}
 
-	vdso_data->tb_update_count = 0;
 	vdso_data->tb_ticks_per_sec = tb_ticks_per_sec;
 
 	/* initialise and enable the large decrementer (if we have one) */
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 8dad44262e75..23208a051af5 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -17,6 +17,7 @@
 #include <linux/elf.h>
 #include <linux/security.h>
 #include <linux/memblock.h>
+#include <vdso/datapage.h>
 
 #include <asm/processor.h>
 #include <asm/mmu.h>
@@ -70,10 +71,10 @@ static int vdso_ready;
  * with it, it will become dynamically allocated
  */
 static union {
-	struct vdso_data	data;
+	struct vdso_arch_data	data;
 	u8			page[PAGE_SIZE];
 } vdso_data_store __page_aligned_data;
-struct vdso_data *vdso_data = &vdso_data_store.data;
+struct vdso_arch_data *vdso_data = &vdso_data_store.data;
 
 /* Format of the patch table */
 struct vdso_patch_def
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 87ab1152d5ce..b922044236dd 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -2,7 +2,23 @@
 
 # List of files in the vdso, has to be asm only for now
 
-obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
+ARCH_REL_TYPE_ABS := R_PPC_JUMP_SLOT|R_PPC_GLOB_DAT|R_PPC_ADDR32|R_PPC_ADDR24|R_PPC_ADDR16|R_PPC_ADDR16_LO|R_PPC_ADDR16_HI|R_PPC_ADDR16_HA|R_PPC_ADDR14|R_PPC_ADDR14_BRTAKEN|R_PPC_ADDR14_BRNTAKEN
+include $(srctree)/lib/vdso/Makefile
+
+obj-vdso32 = sigtramp.o datapage.o cacheflush.o note.o getcpu.o $(obj-vdso32-y)
+obj-vdso32 += gettimeofday.o
+
+ifneq ($(c-gettimeofday-y),)
+  ifdef CONFIG_PPC64
+    CFLAGS_vgettimeofday.o += -include $(srctree)/$(src)/config-fake32.h
+  endif
+  CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
+  CFLAGS_vgettimeofday.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
+  CFLAGS_vgettimeofday.o += $(call cc-option, -fno-stack-protector)
+  CFLAGS_vgettimeofday.o += -DDISABLE_BRANCH_PROFILING
+  CFLAGS_vgettimeofday.o += -ffreestanding -fasynchronous-unwind-tables
+  CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE)
+endif
 
 # Build rules
 
@@ -15,6 +31,7 @@ endif
 CC32FLAGS :=
 ifdef CONFIG_PPC64
 CC32FLAGS += -m32
+KBUILD_CFLAGS := $(filter-out -mcmodel=medium,$(KBUILD_CFLAGS))
 endif
 
 targets := $(obj-vdso32) vdso32.so vdso32.so.dbg
@@ -23,6 +40,7 @@ obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
 GCOV_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
+KASAN_SANITIZE := n
 
 ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
 	-Wl,-soname=linux-vdso32.so.1 -Wl,--hash-style=both
@@ -36,8 +54,8 @@ CPPFLAGS_vdso32.lds += -P -C -Upowerpc
 $(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
 
 # link rule for the .so file, .lds has to be first
-$(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) FORCE
-	$(call if_changed,vdso32ld)
+$(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) $(obj)/vgettimeofday.o FORCE
+	$(call if_changed,vdso32ld_and_check)
 
 # strip rule for the .so file
 $(obj)/%.so: OBJCOPYFLAGS := -S
@@ -47,12 +65,16 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE
 # assembly rules for the .S files
 $(obj-vdso32): %.o: %.S FORCE
 	$(call if_changed_dep,vdso32as)
+$(obj)/vgettimeofday.o: %.o: %.c FORCE
+	$(call if_changed_dep,vdso32cc)
 
 # actual build commands
-quiet_cmd_vdso32ld = VDSO32L $@
-      cmd_vdso32ld = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ $(call cc-ldoption, -Wl$(comma)--orphan-handling=warn) -Wl,-T$(filter %.lds,$^) $(filter %.o,$^)
+quiet_cmd_vdso32ld_and_check = VDSO32L $@
+      cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ $(call cc-ldoption, -Wl$(comma)--orphan-handling=warn) -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) ; $(cmd_vdso_check)
 quiet_cmd_vdso32as = VDSO32A $@
       cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) -c -o $@ $<
+quiet_cmd_vdso32cc = VDSO32C $@
+      cmd_vdso32cc = $(VDSOCC) $(c_flags) $(CC32FLAGS) -c -o $@ $<
 
 # install commands for the unstripped file
 quiet_cmd_vdso_install = INSTALL $@
diff --git a/arch/powerpc/kernel/vdso32/config-fake32.h b/arch/powerpc/kernel/vdso32/config-fake32.h
new file mode 100644
index 000000000000..e16041fc15c9
--- /dev/null
+++ b/arch/powerpc/kernel/vdso32/config-fake32.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * In case of a 32 bit VDSO for a 64 bit kernel fake a 32 bit kernel
+ * configuration.
+ */
+
+#undef CONFIG_PPC64
+#undef CONFIG_64BIT
+#define CONFIG_PPC32
+#define CONFIG_32BIT 1
+#define CONFIG_GENERIC_ATOMIC64 1
+
+#ifdef CONFIG_PPC_BOOK3S_64
+#undef CONFIG_PPC_BOOK3S_64
+#undef CONFIG_PPC_PSERIES
+#define CONFIG_PPC_BOOK3S_32
+#else
+#define CONFIG_PPC_MMU_NOHASH_32
+#define CONFIG_FSL_BOOKE
+#endif
+
+#define CONFIG_TASK_SIZE	0
+#undef CONFIG_MMIOWB
+#undef CONFIG_PPC_SPLPAR
+#undef CONFIG_SPARSEMEM
+#undef CONFIG_PGTABLE_LEVELS
+#define CONFIG_PGTABLE_LEVELS	2
+#undef CONFIG_TRANSPARENT_HUGEPAGE
+#undef CONFIG_SPARSEMEM_VMEMMAP
+#undef CONFIG_FLATMEM
+#define CONFIG_FLATMEM
+#undef CONFIG_PPC_INDIRECT_MMIO
+#undef CONFIG_PPC_INDIRECT_PIO
+#undef CONFIG_EEH
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index e7f8f9f1b3f4..fd7b01c51281 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -12,13 +12,7 @@
 #include <asm/vdso_datapage.h>
 #include <asm/asm-offsets.h>
 #include <asm/unistd.h>
-
-/* Offset for the low 32-bit part of a field of long type */
-#ifdef CONFIG_PPC64
-#define LOPART	4
-#else
-#define LOPART	0
-#endif
+#include <asm/vdso/gettimeofday.h>
 
 	.text
 /*
@@ -28,32 +22,7 @@
  *
  */
 V_FUNCTION_BEGIN(__kernel_gettimeofday)
-  .cfi_startproc
-	mflr	r12
-  .cfi_register lr,r12
-
-	mr.	r10,r3			/* r10 saves tv */
-	mr	r11,r4			/* r11 saves tz */
-	get_datapage	r9, r0
-	beq	3f
-	LOAD_REG_IMMEDIATE(r7, 1000000)	/* load up USEC_PER_SEC */
-	bl	__do_get_tspec@local	/* get sec/usec from tb & kernel */
-	stw	r3,TVAL32_TV_SEC(r10)
-	stw	r4,TVAL32_TV_USEC(r10)
-
-3:	cmplwi	r11,0			/* check if tz is NULL */
-	mtlr	r12
-	crclr	cr0*4+so
-	li	r3,0
-	beqlr
-
-	lwz	r4,CFG_TZ_MINUTEWEST(r9)/* fill tz */
-	lwz	r5,CFG_TZ_DSTTIME(r9)
-	stw	r4,TZONE_TZ_MINWEST(r11)
-	stw	r5,TZONE_TZ_DSTTIME(r11)
-
-	blr
-  .cfi_endproc
+	cvdso_call __c_kernel_gettimeofday
 V_FUNCTION_END(__kernel_gettimeofday)
 
 /*
@@ -63,127 +32,7 @@ V_FUNCTION_END(__kernel_gettimeofday)
  *
  */
 V_FUNCTION_BEGIN(__kernel_clock_gettime)
-  .cfi_startproc
-	/* Check for supported clock IDs */
-	cmpli	cr0,r3,CLOCK_REALTIME
-	cmpli	cr1,r3,CLOCK_MONOTONIC
-	cror	cr0*4+eq,cr0*4+eq,cr1*4+eq
-
-	cmpli	cr5,r3,CLOCK_REALTIME_COARSE
-	cmpli	cr6,r3,CLOCK_MONOTONIC_COARSE
-	cror	cr5*4+eq,cr5*4+eq,cr6*4+eq
-
-	cror	cr0*4+eq,cr0*4+eq,cr5*4+eq
-	bne	cr0, .Lgettime_fallback
-
-	mflr	r12			/* r12 saves lr */
-  .cfi_register lr,r12
-	mr	r11,r4			/* r11 saves tp */
-	get_datapage	r9, r0
-	LOAD_REG_IMMEDIATE(r7, NSEC_PER_SEC)	/* load up NSEC_PER_SEC */
-	beq	cr5, .Lcoarse_clocks
-.Lprecise_clocks:
-	bl	__do_get_tspec@local	/* get sec/nsec from tb & kernel */
-	bne	cr1, .Lfinish		/* not monotonic -> all done */
-
-	/*
-	 * CLOCK_MONOTONIC
-	 */
-
-	/* now we must fixup using wall to monotonic. We need to snapshot
-	 * that value and do the counter trick again. Fortunately, we still
-	 * have the counter value in r8 that was returned by __do_get_xsec.
-	 * At this point, r3,r4 contain our sec/nsec values, r5 and r6
-	 * can be used, r7 contains NSEC_PER_SEC.
-	 */
-
-	lwz	r5,(WTOM_CLOCK_SEC+LOPART)(r9)
-	lwz	r6,WTOM_CLOCK_NSEC(r9)
-
-	/* We now have our offset in r5,r6. We create a fake dependency
-	 * on that value and re-check the counter
-	 */
-	or	r0,r6,r5
-	xor	r0,r0,r0
-	add	r9,r9,r0
-	lwz	r0,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
-        cmpl    cr0,r8,r0		/* check if updated */
-	bne-	.Lprecise_clocks
-	b	.Lfinish_monotonic
-
-	/*
-	 * For coarse clocks we get data directly from the vdso data page, so
-	 * we don't need to call __do_get_tspec, but we still need to do the
-	 * counter trick.
-	 */
-.Lcoarse_clocks:
-	lwz	r8,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
-	andi.	r0,r8,1                 /* pending update ? loop */
-	bne-	.Lcoarse_clocks
-	add	r9,r9,r0		/* r0 is already 0 */
-
-	/*
-	 * CLOCK_REALTIME_COARSE, below values are needed for MONOTONIC_COARSE
-	 * too
-	 */
-	lwz	r3,STAMP_XTIME_SEC+LOPART(r9)
-	lwz	r4,STAMP_XTIME_NSEC+LOPART(r9)
-	bne	cr6,1f
-
-	/* CLOCK_MONOTONIC_COARSE */
-	lwz	r5,(WTOM_CLOCK_SEC+LOPART)(r9)
-	lwz	r6,WTOM_CLOCK_NSEC(r9)
-
-	/* check if counter has updated */
-	or	r0,r6,r5
-1:	or	r0,r0,r3
-	or	r0,r0,r4
-	xor	r0,r0,r0
-	add	r3,r3,r0
-	lwz	r0,CFG_TB_UPDATE_COUNT+LOPART(r9)
-	cmpl	cr0,r0,r8               /* check if updated */
-	bne-	.Lcoarse_clocks
-
-	/* Counter has not updated, so continue calculating proper values for
-	 * sec and nsec if monotonic coarse, or just return with the proper
-	 * values for realtime.
-	 */
-	bne	cr6, .Lfinish
-
-	/* Calculate and store result. Note that this mimics the C code,
-	 * which may cause funny results if nsec goes negative... is that
-	 * possible at all ?
-	 */
-.Lfinish_monotonic:
-	add	r3,r3,r5
-	add	r4,r4,r6
-	cmpw	cr0,r4,r7
-	cmpwi	cr1,r4,0
-	blt	1f
-	subf	r4,r7,r4
-	addi	r3,r3,1
-1:	bge	cr1, .Lfinish
-	addi	r3,r3,-1
-	add	r4,r4,r7
-
-.Lfinish:
-	stw	r3,TSPC32_TV_SEC(r11)
-	stw	r4,TSPC32_TV_NSEC(r11)
-
-	mtlr	r12
-	crclr	cr0*4+so
-	li	r3,0
-	blr
-
-	/*
-	 * syscall fallback
-	 */
-.Lgettime_fallback:
-	li	r0,__NR_clock_gettime
-  .cfi_restore lr
-	sc
-	blr
-  .cfi_endproc
+	cvdso_call __c_kernel_clock_gettime
 V_FUNCTION_END(__kernel_clock_gettime)
 
 
@@ -194,37 +43,7 @@ V_FUNCTION_END(__kernel_clock_gettime)
  *
  */
 V_FUNCTION_BEGIN(__kernel_clock_getres)
-  .cfi_startproc
-	/* Check for supported clock IDs */
-	cmplwi	cr0, r3, CLOCK_MAX
-	cmpwi	cr1, r3, CLOCK_REALTIME_COARSE
-	cmpwi	cr7, r3, CLOCK_MONOTONIC_COARSE
-	bgt	cr0, 99f
-	LOAD_REG_IMMEDIATE(r5, KTIME_LOW_RES)
-	beq	cr1, 1f
-	beq	cr7, 1f
-
-	mflr	r12
-  .cfi_register lr,r12
-	get_datapage	r3, r0
-	lwz	r5, CLOCK_HRTIMER_RES(r3)
-	mtlr	r12
-1:	li	r3,0
-	cmpli	cr0,r4,0
-	crclr	cr0*4+so
-	beqlr
-	stw	r3,TSPC32_TV_SEC(r4)
-	stw	r5,TSPC32_TV_NSEC(r4)
-	blr
-
-	/*
-	 * syscall fallback
-	 */
-99:
-	li	r0,__NR_clock_getres
-	sc
-	blr
-  .cfi_endproc
+	cvdso_call __c_kernel_clock_getres
 V_FUNCTION_END(__kernel_clock_getres)
 
 
@@ -235,105 +54,5 @@ V_FUNCTION_END(__kernel_clock_getres)
  *
  */
 V_FUNCTION_BEGIN(__kernel_time)
-  .cfi_startproc
-	mflr	r12
-  .cfi_register lr,r12
-
-	mr	r11,r3			/* r11 holds t */
-	get_datapage	r9, r0
-
-	lwz	r3,STAMP_XTIME_SEC+LOPART(r9)
-
-	cmplwi	r11,0			/* check if t is NULL */
-	mtlr	r12
-	crclr	cr0*4+so
-	beqlr
-	stw	r3,0(r11)		/* store result at *t */
-	blr
-  .cfi_endproc
+	cvdso_call_time __c_kernel_time
 V_FUNCTION_END(__kernel_time)
-
-/*
- * This is the core of clock_gettime() and gettimeofday(),
- * it returns the current time in r3 (seconds) and r4.
- * On entry, r7 gives the resolution of r4, either USEC_PER_SEC
- * or NSEC_PER_SEC, giving r4 in microseconds or nanoseconds.
- * It expects the datapage ptr in r9 and doesn't clobber it.
- * It clobbers r0, r5 and r6.
- * On return, r8 contains the counter value that can be reused.
- * This clobbers cr0 but not any other cr field.
- */
-__do_get_tspec:
-  .cfi_startproc
-	/* Check for update count & load values. We use the low
-	 * order 32 bits of the update count
-	 */
-1:	lwz	r8,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
-	andi.	r0,r8,1			/* pending update ? loop */
-	bne-	1b
-	xor	r0,r8,r8		/* create dependency */
-	add	r9,r9,r0
-
-	/* Load orig stamp (offset to TB) */
-	lwz	r5,CFG_TB_ORIG_STAMP(r9)
-	lwz	r6,(CFG_TB_ORIG_STAMP+4)(r9)
-
-	/* Get a stable TB value */
-2:	MFTBU(r3)
-	MFTBL(r4)
-	MFTBU(r0)
-	cmplw	cr0,r3,r0
-	bne-	2b
-
-	/* Subtract tb orig stamp and shift left 12 bits.
-	 */
-	subfc	r4,r6,r4
-	subfe	r0,r5,r3
-	slwi	r0,r0,12
-	rlwimi.	r0,r4,12,20,31
-	slwi	r4,r4,12
-
-	/*
-	 * Load scale factor & do multiplication.
-	 * We only use the high 32 bits of the tb_to_xs value.
-	 * Even with a 1GHz timebase clock, the high 32 bits of
-	 * tb_to_xs will be at least 4 million, so the error from
-	 * ignoring the low 32 bits will be no more than 0.25ppm.
-	 * The error will just make the clock run very very slightly
-	 * slow until the next time the kernel updates the VDSO data,
-	 * at which point the clock will catch up to the kernel's value,
-	 * so there is no long-term error accumulation.
-	 */
-	lwz	r5,CFG_TB_TO_XS(r9)	/* load values */
-	mulhwu	r4,r4,r5
-	li	r3,0
-
-	beq+	4f			/* skip high part computation if 0 */
-	mulhwu	r3,r0,r5
-	mullw	r5,r0,r5
-	addc	r4,r4,r5
-	addze	r3,r3
-4:
-	/* At this point, we have seconds since the xtime stamp
-	 * as a 32.32 fixed-point number in r3 and r4.
-	 * Load & add the xtime stamp.
-	 */
-	lwz	r5,STAMP_XTIME_SEC+LOPART(r9)
-	lwz	r6,STAMP_SEC_FRAC(r9)
-	addc	r4,r4,r6
-	adde	r3,r3,r5
-
-	/* We create a fake dependency on the result in r3/r4
-	 * and re-check the counter
-	 */
-	or	r6,r4,r3
-	xor	r0,r6,r6
-	add	r9,r9,r0
-	lwz	r0,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
-        cmplw	cr0,r8,r0		/* check if updated */
-	bne-	1b
-
-	mulhwu	r4,r4,r7		/* convert to micro or nanoseconds */
-
-	blr
-  .cfi_endproc
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index 38c317f25141..7890d889f9c5 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -1,8 +1,20 @@
 # SPDX-License-Identifier: GPL-2.0
 # List of files in the vdso, has to be asm only for now
 
+ARCH_REL_TYPE_ABS := R_PPC_JUMP_SLOT|R_PPC_GLOB_DAT|R_PPC_ADDR32|R_PPC_ADDR24|R_PPC_ADDR16|R_PPC_ADDR16_LO|R_PPC_ADDR16_HI|R_PPC_ADDR16_HA|R_PPC_ADDR14|R_PPC_ADDR14_BRTAKEN|R_PPC_ADDR14_BRNTAKEN
+include $(srctree)/lib/vdso/Makefile
+
 obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
 
+ifneq ($(c-gettimeofday-y),)
+  CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
+  CFLAGS_vgettimeofday.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
+  CFLAGS_vgettimeofday.o += $(call cc-option, -fno-stack-protector)
+  CFLAGS_vgettimeofday.o += -DDISABLE_BRANCH_PROFILING
+  CFLAGS_vgettimeofday.o += -ffreestanding -fasynchronous-unwind-tables
+  CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE)
+endif
+
 # Build rules
 
 targets := $(obj-vdso64) vdso64.so vdso64.so.dbg
@@ -11,6 +23,7 @@ obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
 GCOV_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
+KASAN_SANITIZE := n
 
 ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
 	-Wl,-soname=linux-vdso64.so.1 -Wl,--hash-style=both
@@ -20,12 +33,14 @@ obj-y += vdso64_wrapper.o
 extra-y += vdso64.lds
 CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
 
+$(obj)/vgettimeofday.o: %.o: %.c FORCE
+
 # Force dependency (incbin is bad)
 $(obj)/vdso64_wrapper.o : $(obj)/vdso64.so
 
 # link rule for the .so file, .lds has to be first
-$(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) FORCE
-	$(call if_changed,vdso64ld)
+$(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE
+	$(call if_changed,vdso64ld_and_check)
 
 # strip rule for the .so file
 $(obj)/%.so: OBJCOPYFLAGS := -S
@@ -33,8 +48,8 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE
 	$(call if_changed,objcopy)
 
 # actual build commands
-quiet_cmd_vdso64ld = VDSO64L $@
-      cmd_vdso64ld = $(CC) $(c_flags) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) $(call cc-ldoption, -Wl$(comma)--orphan-handling=warn)
+quiet_cmd_vdso64ld_and_check = VDSO64L $@
+      cmd_vdso64ld_and_check = $(CC) $(c_flags) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) $(call cc-ldoption, -Wl$(comma)--orphan-handling=warn); $(cmd_vdso_check)
 
 # install commands for the unstripped file
 quiet_cmd_vdso_install = INSTALL $@
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S
index 20f8be40c653..d7a7bfb51081 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -12,6 +12,7 @@
 #include <asm/vdso_datapage.h>
 #include <asm/asm-offsets.h>
 #include <asm/unistd.h>
+#include <asm/vdso/gettimeofday.h>
 
 	.text
 /*
@@ -21,31 +22,7 @@
  *
  */
 V_FUNCTION_BEGIN(__kernel_gettimeofday)
-  .cfi_startproc
-	mflr	r12
-  .cfi_register lr,r12
-
-	mr	r11,r3			/* r11 holds tv */
-	mr	r10,r4			/* r10 holds tz */
-	get_datapage	r3, r0
-	cmpldi	r11,0			/* check if tv is NULL */
-	beq	2f
-	lis	r7,1000000@ha		/* load up USEC_PER_SEC */
-	addi	r7,r7,1000000@l
-	bl	V_LOCAL_FUNC(__do_get_tspec) /* get sec/us from tb & kernel */
-	std	r4,TVAL64_TV_SEC(r11)	/* store sec in tv */
-	std	r5,TVAL64_TV_USEC(r11)	/* store usec in tv */
-2:	cmpldi	r10,0			/* check if tz is NULL */
-	beq	1f
-	lwz	r4,CFG_TZ_MINUTEWEST(r3)/* fill tz */
-	lwz	r5,CFG_TZ_DSTTIME(r3)
-	stw	r4,TZONE_TZ_MINWEST(r10)
-	stw	r5,TZONE_TZ_DSTTIME(r10)
-1:	mtlr	r12
-	crclr	cr0*4+so
-	li	r3,0			/* always success */
-	blr
-  .cfi_endproc
+	cvdso_call __c_kernel_gettimeofday
 V_FUNCTION_END(__kernel_gettimeofday)
 
 
@@ -56,120 +33,7 @@ V_FUNCTION_END(__kernel_gettimeofday)
  *
  */
 V_FUNCTION_BEGIN(__kernel_clock_gettime)
-  .cfi_startproc
-	/* Check for supported clock IDs */
-	cmpwi	cr0,r3,CLOCK_REALTIME
-	cmpwi	cr1,r3,CLOCK_MONOTONIC
-	cror	cr0*4+eq,cr0*4+eq,cr1*4+eq
-
-	cmpwi	cr5,r3,CLOCK_REALTIME_COARSE
-	cmpwi	cr6,r3,CLOCK_MONOTONIC_COARSE
-	cror	cr5*4+eq,cr5*4+eq,cr6*4+eq
-
-	cror	cr0*4+eq,cr0*4+eq,cr5*4+eq
-	bne	cr0,99f
-
-	mflr	r12			/* r12 saves lr */
-  .cfi_register lr,r12
-	mr	r11,r4			/* r11 saves tp */
-	get_datapage	r3, r0
-	lis	r7,NSEC_PER_SEC@h	/* want nanoseconds */
-	ori	r7,r7,NSEC_PER_SEC@l
-	beq	cr5,70f
-50:	bl	V_LOCAL_FUNC(__do_get_tspec)	/* get time from tb & kernel */
-	bne	cr1,80f			/* if not monotonic, all done */
-
-	/*
-	 * CLOCK_MONOTONIC
-	 */
-
-	/* now we must fixup using wall to monotonic. We need to snapshot
-	 * that value and do the counter trick again. Fortunately, we still
-	 * have the counter value in r8 that was returned by __do_get_tspec.
-	 * At this point, r4,r5 contain our sec/nsec values.
-	 */
-
-	ld	r6,WTOM_CLOCK_SEC(r3)
-	lwa	r9,WTOM_CLOCK_NSEC(r3)
-
-	/* We now have our result in r6,r9. We create a fake dependency
-	 * on that result and re-check the counter
-	 */
-	or	r0,r6,r9
-	xor	r0,r0,r0
-	add	r3,r3,r0
-	ld	r0,CFG_TB_UPDATE_COUNT(r3)
-        cmpld   cr0,r0,r8		/* check if updated */
-	bne-	50b
-	b	78f
-
-	/*
-	 * For coarse clocks we get data directly from the vdso data page, so
-	 * we don't need to call __do_get_tspec, but we still need to do the
-	 * counter trick.
-	 */
-70:	ld      r8,CFG_TB_UPDATE_COUNT(r3)
-	andi.   r0,r8,1                 /* pending update ? loop */
-	bne-    70b
-	add     r3,r3,r0		/* r0 is already 0 */
-
-	/*
-	 * CLOCK_REALTIME_COARSE, below values are needed for MONOTONIC_COARSE
-	 * too
-	 */
-	ld      r4,STAMP_XTIME_SEC(r3)
-	ld      r5,STAMP_XTIME_NSEC(r3)
-	bne     cr6,75f
-
-	/* CLOCK_MONOTONIC_COARSE */
-	ld	r6,WTOM_CLOCK_SEC(r3)
-	lwa     r9,WTOM_CLOCK_NSEC(r3)
-
-	/* check if counter has updated */
-	or      r0,r6,r9
-75:	or	r0,r0,r4
-	or	r0,r0,r5
-	xor     r0,r0,r0
-	add     r3,r3,r0
-	ld      r0,CFG_TB_UPDATE_COUNT(r3)
-	cmpld   cr0,r0,r8               /* check if updated */
-	bne-    70b
-
-	/* Counter has not updated, so continue calculating proper values for
-	 * sec and nsec if monotonic coarse, or just return with the proper
-	 * values for realtime.
-	 */
-	bne     cr6,80f
-
-	/* Add wall->monotonic offset and check for overflow or underflow */
-78:	add     r4,r4,r6
-	add     r5,r5,r9
-	cmpd    cr0,r5,r7
-	cmpdi   cr1,r5,0
-	blt     79f
-	subf    r5,r7,r5
-	addi    r4,r4,1
-79:	bge     cr1,80f
-	addi    r4,r4,-1
-	add     r5,r5,r7
-
-80:	std	r4,TSPC64_TV_SEC(r11)
-	std	r5,TSPC64_TV_NSEC(r11)
-
-	mtlr	r12
-	crclr	cr0*4+so
-	li	r3,0
-	blr
-
-	/*
-	 * syscall fallback
-	 */
-99:
-	li	r0,__NR_clock_gettime
-  .cfi_restore lr
-	sc
-	blr
-  .cfi_endproc
+	cvdso_call __c_kernel_clock_gettime
 V_FUNCTION_END(__kernel_clock_gettime)
 
 
@@ -180,34 +44,7 @@ V_FUNCTION_END(__kernel_clock_gettime)
  *
  */
 V_FUNCTION_BEGIN(__kernel_clock_getres)
-  .cfi_startproc
-	/* Check for supported clock IDs */
-	cmpwi	cr0,r3,CLOCK_REALTIME
-	cmpwi	cr1,r3,CLOCK_MONOTONIC
-	cror	cr0*4+eq,cr0*4+eq,cr1*4+eq
-	bne	cr0,99f
-
-	mflr	r12
-  .cfi_register lr,r12
-	get_datapage	r3, r0
-	lwz	r5, CLOCK_HRTIMER_RES(r3)
-	mtlr	r12
-	li	r3,0
-	cmpldi	cr0,r4,0
-	crclr	cr0*4+so
-	beqlr
-	std	r3,TSPC64_TV_SEC(r4)
-	std	r5,TSPC64_TV_NSEC(r4)
-	blr
-
-	/*
-	 * syscall fallback
-	 */
-99:
-	li	r0,__NR_clock_getres
-	sc
-	blr
-  .cfi_endproc
+	cvdso_call __c_kernel_clock_getres
 V_FUNCTION_END(__kernel_clock_getres)
 
 /*
@@ -217,74 +54,5 @@ V_FUNCTION_END(__kernel_clock_getres)
  *
  */
 V_FUNCTION_BEGIN(__kernel_time)
-  .cfi_startproc
-	mflr	r12
-  .cfi_register lr,r12
-
-	mr	r11,r3			/* r11 holds t */
-	get_datapage	r3, r0
-
-	ld	r4,STAMP_XTIME_SEC(r3)
-
-	cmpldi	r11,0			/* check if t is NULL */
-	beq	2f
-	std	r4,0(r11)		/* store result at *t */
-2:	mtlr	r12
-	crclr	cr0*4+so
-	mr	r3,r4
-	blr
-  .cfi_endproc
+	cvdso_call_time __c_kernel_time
 V_FUNCTION_END(__kernel_time)
-
-
-/*
- * This is the core of clock_gettime() and gettimeofday(),
- * it returns the current time in r4 (seconds) and r5.
- * On entry, r7 gives the resolution of r5, either USEC_PER_SEC
- * or NSEC_PER_SEC, giving r5 in microseconds or nanoseconds.
- * It expects the datapage ptr in r3 and doesn't clobber it.
- * It clobbers r0, r6 and r9.
- * On return, r8 contains the counter value that can be reused.
- * This clobbers cr0 but not any other cr field.
- */
-V_FUNCTION_BEGIN(__do_get_tspec)
-  .cfi_startproc
-	/* check for update count & load values */
-1:	ld	r8,CFG_TB_UPDATE_COUNT(r3)
-	andi.	r0,r8,1			/* pending update ? loop */
-	bne-	1b
-	xor	r0,r8,r8		/* create dependency */
-	add	r3,r3,r0
-
-	/* Get TB & offset it. We use the MFTB macro which will generate
-	 * workaround code for Cell.
-	 */
-	MFTB(r6)
-	ld	r9,CFG_TB_ORIG_STAMP(r3)
-	subf	r6,r9,r6
-
-	/* Scale result */
-	ld	r5,CFG_TB_TO_XS(r3)
-	sldi	r6,r6,12		/* compute time since stamp_xtime */
-	mulhdu	r6,r6,r5		/* in units of 2^-32 seconds */
-
-	/* Add stamp since epoch */
-	ld	r4,STAMP_XTIME_SEC(r3)
-	lwz	r5,STAMP_SEC_FRAC(r3)
-	or	r0,r4,r5
-	or	r0,r0,r6
-	xor	r0,r0,r0
-	add	r3,r3,r0
-	ld	r0,CFG_TB_UPDATE_COUNT(r3)
-	cmpld   r0,r8			/* check if updated */
-	bne-	1b			/* reload if so */
-
-	/* convert to seconds & nanoseconds and add to stamp */
-	add	r6,r6,r5		/* add on fractional seconds of xtime */
-	mulhwu	r5,r6,r7		/* compute micro or nanoseconds and */
-	srdi	r6,r6,32		/* seconds since stamp_xtime */
-	clrldi	r5,r5,32
-	add	r4,r4,r6
-	blr
-  .cfi_endproc
-V_FUNCTION_END(__do_get_tspec)
-- 
2.25.0




^ permalink raw reply related

* [PATCH v9 0/5] powerpc: switch VDSO to C implementation
From: Christophe Leroy @ 2020-08-04 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
	anton
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev

This is the nineth version of a series to switch powerpc VDSO to
generic C implementation.

Main changes since v8 are:
- Dropped the patches which put the VDSO datapage in front of VDSO text in the mapping
- Adds a second stack frame because the caller doesn't set one, at least on PPC64
- Saving the TOC pointer on PPC64 (is that really needed ?)

This series applies on today's powerpc/merge branch.

See the last patches for details on changes and performance.

Christophe Leroy (5):
  powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
  powerpc/vdso: Prepare for switching VDSO to generic C implementation.
  powerpc/vdso: Save and restore TOC pointer on PPC64
  powerpc/vdso: Switch VDSO to generic C implementation.
  powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

 arch/powerpc/Kconfig                         |   2 +
 arch/powerpc/include/asm/clocksource.h       |   7 +
 arch/powerpc/include/asm/processor.h         |  13 +-
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 191 ++++++++++++
 arch/powerpc/include/asm/vdso/processor.h    |  23 ++
 arch/powerpc/include/asm/vdso/vsyscall.h     |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h     |  40 +--
 arch/powerpc/kernel/asm-offsets.c            |  49 +--
 arch/powerpc/kernel/time.c                   |  91 +-----
 arch/powerpc/kernel/vdso.c                   |   5 +-
 arch/powerpc/kernel/vdso32/Makefile          |  32 +-
 arch/powerpc/kernel/vdso32/config-fake32.h   |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S    | 300 +------------------
 arch/powerpc/kernel/vdso32/vdso32.lds.S      |   1 +
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  35 +++
 arch/powerpc/kernel/vdso64/Makefile          |  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S    | 242 +--------------
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 ++
 19 files changed, 447 insertions(+), 702 deletions(-)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

-- 
2.25.0


^ permalink raw reply

* [PATCH v9 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-04 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
	anton
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev
In-Reply-To: <cover.1596541334.git.christophe.leroy@csgroup.eu>

Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and:
- When the timebase is used, make it always return true.
- When the RTC clock is used, make it always return false.

Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:

  18:	35 25 ff e0 	addic.  r9,r5,-32
  1c:	41 80 00 10 	blt     2c <shift+0x14>
  20:	7c 64 4c 30 	srw     r4,r3,r9
  24:	38 60 00 00 	li      r3,0
...
  2c:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
  30:	21 45 00 1f 	subfic  r10,r5,31
  34:	7c 84 2c 30 	srw     r4,r4,r5
  38:	7d 29 50 30 	slw     r9,r9,r10
  3c:	7c 63 2c 30 	srw     r3,r3,r5
  40:	7d 24 23 78 	or      r4,r9,r4

In our case the shift is always <= 32. In addition,  the upper 32 bits
of the result are likely nul. Lets GCC know it, it also optimises the
following calculations.

With the patch, we get:
   0:	21 25 00 20 	subfic  r9,r5,32
   4:	7c 69 48 30 	slw     r9,r3,r9
   8:	7c 84 2c 30 	srw     r4,r4,r5
   c:	7d 24 23 78 	or      r4,r9,r4
  10:	7c 63 2c 30 	srw     r3,r3,r5

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v9:
- No more modification of __get_datapage(). Offset is added after.
- Adding a second stack frame because the PPC VDSO ABI doesn't force
the caller to set one.

v8:
- New, splitted out of last patch of the series

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/clocksource.h       |   7 +
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 179 +++++++++++++++++++
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  29 +++
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 +++
 5 files changed, 251 insertions(+)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

diff --git a/arch/powerpc/include/asm/clocksource.h b/arch/powerpc/include/asm/clocksource.h
new file mode 100644
index 000000000000..482185566b0c
--- /dev/null
+++ b/arch/powerpc/include/asm/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLOCKSOURCE_H
+#define _ASM_CLOCKSOURCE_H
+
+#include <asm/vdso/clocksource.h>
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/clocksource.h b/arch/powerpc/include/asm/vdso/clocksource.h
new file mode 100644
index 000000000000..ec5d672d2569
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSOCLOCKSOURCE_H
+#define __ASM_VDSOCLOCKSOURCE_H
+
+#define VDSO_ARCH_CLOCKMODES	VDSO_CLOCKMODE_ARCHTIMER
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index 000000000000..0b6fa245d54e
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include <asm/ptrace.h>
+
+#ifdef __ASSEMBLY__
+
+.macro cvdso_call funct
+  .cfi_startproc
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
+	mflr		r0
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
+  .cfi_register lr, r0
+	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+	get_datapage	r5, r0
+	addi		r5, r5, VDSO_DATA_OFFSET
+	bl		\funct
+	PPC_LL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+	cmpwi		r3, 0
+	mtlr		r0
+  .cfi_restore lr
+	addi		r1, r1, 2 * STACK_FRAME_OVERHEAD
+	crclr		so
+	beqlr+
+	crset		so
+	neg		r3, r3
+	blr
+  .cfi_endproc
+.endm
+
+.macro cvdso_call_time funct
+  .cfi_startproc
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
+	mflr		r0
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
+  .cfi_register lr, r0
+	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+	get_datapage	r4, r0
+	addi		r4, r4, VDSO_DATA_OFFSET
+	bl		\funct
+	PPC_LL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+	crclr		so
+	mtlr		r0
+  .cfi_restore lr
+	addi		r1, r1, 2 * STACK_FRAME_OVERHEAD
+	blr
+  .cfi_endproc
+.endm
+
+#else
+
+#include <asm/time.h>
+#include <asm/unistd.h>
+#include <uapi/linux/time.h>
+
+#define VDSO_HAS_CLOCK_GETRES		1
+
+#define VDSO_HAS_TIME			1
+
+static __always_inline int do_syscall_2(const unsigned long _r0, const unsigned long _r3,
+					const unsigned long _r4)
+{
+	register long r0 asm("r0") = _r0;
+	register unsigned long r3 asm("r3") = _r3;
+	register unsigned long r4 asm("r4") = _r4;
+	register int ret asm ("r3");
+
+	asm volatile(
+		"       sc\n"
+		"	bns+	1f\n"
+		"	neg	%0, %0\n"
+		"1:\n"
+	: "=r" (ret), "+r" (r4), "+r" (r0)
+	: "r" (r3)
+	: "memory", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "cr0", "ctr");
+
+	return ret;
+}
+
+static __always_inline
+int gettimeofday_fallback(struct __kernel_old_timeval *_tv, struct timezone *_tz)
+{
+	return do_syscall_2(__NR_gettimeofday, (unsigned long)_tv, (unsigned long)_tz);
+}
+
+static __always_inline
+int clock_gettime_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
+{
+	return do_syscall_2(__NR_clock_gettime, _clkid, (unsigned long)_ts);
+}
+
+static __always_inline
+int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
+{
+	return do_syscall_2(__NR_clock_getres, _clkid, (unsigned long)_ts);
+}
+
+#ifdef CONFIG_VDSO32
+
+#define BUILD_VDSO32		1
+
+static __always_inline
+int clock_gettime32_fallback(clockid_t _clkid, struct old_timespec32 *_ts)
+{
+	return do_syscall_2(__NR_clock_gettime, _clkid, (unsigned long)_ts);
+}
+
+static __always_inline
+int clock_getres32_fallback(clockid_t _clkid, struct old_timespec32 *_ts)
+{
+	return do_syscall_2(__NR_clock_getres, _clkid, (unsigned long)_ts);
+}
+#endif
+
+static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
+{
+	return get_tb();
+}
+
+const struct vdso_data *__arch_get_vdso_data(void);
+
+static inline bool vdso_clocksource_ok(const struct vdso_data *vd)
+{
+	return !__USE_RTC();
+}
+#define vdso_clocksource_ok vdso_clocksource_ok
+
+/*
+ * powerpc specific delta calculation.
+ *
+ * This variant removes the masking of the subtraction because the
+ * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
+ * which would result in a pointless operation. The compiler cannot
+ * optimize it away as the mask comes from the vdso data and is not compile
+ * time constant.
+ */
+static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
+{
+	return (cycles - last) * mult;
+}
+#define vdso_calc_delta vdso_calc_delta
+
+#ifndef __powerpc64__
+static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
+{
+	u32 hi = ns >> 32;
+	u32 lo = ns;
+
+	lo >>= shift;
+	lo |= hi << (32 - shift);
+	hi >>= shift;
+
+	if (likely(hi == 0))
+		return lo;
+
+	return ((u64)hi << 32) | lo;
+}
+#define vdso_shift_ns vdso_shift_ns
+#endif
+
+#ifdef __powerpc64__
+int __c_kernel_clock_gettime(clockid_t clock, struct __kernel_timespec *ts,
+			     const struct vdso_data *vd);
+int __c_kernel_clock_getres(clockid_t clock_id, struct __kernel_timespec *res,
+			    const struct vdso_data *vd);
+#else
+int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
+			     const struct vdso_data *vd);
+int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
+			    const struct vdso_data *vd);
+#endif
+int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
+			    const struct vdso_data *vd);
+__kernel_old_time_t __c_kernel_time(__kernel_old_time_t *time,
+				    const struct vdso_data *vd);
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c b/arch/powerpc/kernel/vdso32/vgettimeofday.c
new file mode 100644
index 000000000000..0b9ab4c22ef2
--- /dev/null
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Powerpc userspace implementations of gettimeofday() and similar.
+ */
+#include <linux/time.h>
+#include <linux/types.h>
+
+int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
+			     const struct vdso_data *vd)
+{
+	return __cvdso_clock_gettime32_data(vd, clock, ts);
+}
+
+int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
+			    const struct vdso_data *vd)
+{
+	return __cvdso_gettimeofday_data(vd, tv, tz);
+}
+
+int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
+			    const struct vdso_data *vd)
+{
+	return __cvdso_clock_getres_time32_data(vd, clock_id, res);
+}
+
+__kernel_old_time_t __c_kernel_time(__kernel_old_time_t *time, const struct vdso_data *vd)
+{
+	return __cvdso_time_data(vd, time);
+}
diff --git a/arch/powerpc/kernel/vdso64/vgettimeofday.c b/arch/powerpc/kernel/vdso64/vgettimeofday.c
new file mode 100644
index 000000000000..5b5500058344
--- /dev/null
+++ b/arch/powerpc/kernel/vdso64/vgettimeofday.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Powerpc userspace implementations of gettimeofday() and similar.
+ */
+#include <linux/time.h>
+#include <linux/types.h>
+
+int __c_kernel_clock_gettime(clockid_t clock, struct __kernel_timespec *ts,
+			     const struct vdso_data *vd)
+{
+	return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
+int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
+			    const struct vdso_data *vd)
+{
+	return __cvdso_gettimeofday_data(vd, tv, tz);
+}
+
+int __c_kernel_clock_getres(clockid_t clock_id, struct __kernel_timespec *res,
+			    const struct vdso_data *vd)
+{
+	return __cvdso_clock_getres_data(vd, clock_id, res);
+}
+
+__kernel_old_time_t __c_kernel_time(__kernel_old_time_t *time, const struct vdso_data *vd)
+{
+	return __cvdso_time_data(vd, time);
+}
-- 
2.25.0


^ permalink raw reply related

* [PATCH v9 3/5] powerpc/vdso: Save and restore TOC pointer on PPC64
From: Christophe Leroy @ 2020-08-04 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
	anton
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev
In-Reply-To: <cover.1596541334.git.christophe.leroy@csgroup.eu>

On PPC64, the TOC pointer needs to be saved and restored.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v9: New.

I'm not sure this is really needed, I can't see the VDSO C code doing
anything with r2, at least on ppc64_defconfig.

So I let you decide whether you take it or not.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/vdso/gettimeofday.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
index 0b6fa245d54e..fa774086173b 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -13,10 +13,16 @@
 	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
   .cfi_register lr, r0
 	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_STL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
 	get_datapage	r5, r0
 	addi		r5, r5, VDSO_DATA_OFFSET
 	bl		\funct
 	PPC_LL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_LL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
 	cmpwi		r3, 0
 	mtlr		r0
   .cfi_restore lr
@@ -36,10 +42,16 @@
 	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
   .cfi_register lr, r0
 	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_STL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
 	get_datapage	r4, r0
 	addi		r4, r4, VDSO_DATA_OFFSET
 	bl		\funct
 	PPC_LL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_LL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
 	crclr		so
 	mtlr		r0
   .cfi_restore lr
-- 
2.25.0


^ permalink raw reply related

* [PATCH v9 1/5] powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
From: Christophe Leroy @ 2020-08-04 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
	anton
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev
In-Reply-To: <cover.1596541334.git.christophe.leroy@csgroup.eu>

cpu_relax() need to be in asm/vdso/processor.h to be used by
the C VDSO generic library.

Move it there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v9: Forgot to remove cpu_relax() from processor.h in v8
---
 arch/powerpc/include/asm/processor.h      | 13 ++-----------
 arch/powerpc/include/asm/vdso/processor.h | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..c1ba9c8d9b90 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -6,6 +6,8 @@
  * Copyright (C) 2001 PPC 64 Team, IBM Corp
  */
 
+#include <vdso/processor.h>
+
 #include <asm/reg.h>
 
 #ifdef CONFIG_VSX
@@ -63,14 +65,6 @@ extern int _chrp_type;
 
 #endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */
 
-/* Macros for adjusting thread priority (hardware multi-threading) */
-#define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")
-#define HMT_low()	 asm volatile("or 1,1,1	     # low priority")
-#define HMT_medium_low() asm volatile("or 6,6,6      # medium low priority")
-#define HMT_medium()	 asm volatile("or 2,2,2	     # medium priority")
-#define HMT_medium_high() asm volatile("or 5,5,5      # medium high priority")
-#define HMT_high()	 asm volatile("or 3,3,3	     # high priority")
-
 #ifdef __KERNEL__
 
 #ifdef CONFIG_PPC64
@@ -350,7 +344,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 }
 
 #ifdef CONFIG_PPC64
-#define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)
 
 #define spin_begin()	HMT_low()
 
@@ -369,8 +362,6 @@ do {								\
 	}							\
 } while (0)
 
-#else
-#define cpu_relax()	barrier()
 #endif
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/powerpc/include/asm/vdso/processor.h b/arch/powerpc/include/asm/vdso/processor.h
new file mode 100644
index 000000000000..39b9beace9ca
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/processor.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_VDSO_PROCESSOR_H
+#define __ASM_VDSO_PROCESSOR_H
+
+#ifndef __ASSEMBLY__
+
+/* Macros for adjusting thread priority (hardware multi-threading) */
+#define HMT_very_low()		asm volatile("or 31, 31, 31	# very low priority")
+#define HMT_low()		asm volatile("or 1, 1, 1	# low priority")
+#define HMT_medium_low()	asm volatile("or 6, 6, 6	# medium low priority")
+#define HMT_medium()		asm volatile("or 2, 2, 2	# medium priority")
+#define HMT_medium_high()	asm volatile("or 5, 5, 5	# medium high priority")
+#define HMT_high()		asm volatile("or 3, 3, 3	# high priority")
+
+#ifdef CONFIG_PPC64
+#define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)
+#else
+#define cpu_relax()	barrier()
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_PROCESSOR_H */
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()
From: Christophe Leroy @ 2020-08-04 11:17 UTC (permalink / raw)
  To: Michael Ellerman, Christophe Leroy, Benjamin Herrenschmidt,
	Paul Mackerras, nathanl
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev
In-Reply-To: <87wo34tbas.fsf@mpe.ellerman.id.au>



On 07/16/2020 02:59 AM, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>> The VDSO datapage and the text pages are always located immediately
>> next to each other, so it can be hardcoded without an indirection
>> through __kernel_datapage_offset
>>
>> In order to ease things, move the data page in front like other
>> arches, that way there is no need to know the size of the library
>> to locate the data page.
>>
>> Before:
>> clock-getres-realtime-coarse:    vdso: 714 nsec/call
>> clock-gettime-realtime-coarse:    vdso: 792 nsec/call
>> clock-gettime-realtime:    vdso: 1243 nsec/call
>>
>> After:
>> clock-getres-realtime-coarse:    vdso: 699 nsec/call
>> clock-gettime-realtime-coarse:    vdso: 784 nsec/call
>> clock-gettime-realtime:    vdso: 1231 nsec/call
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> v7:
>> - Moved the removal of the tmp param of __get_datapage()
>> in a subsequent patch
>> - Included the addition of the offset param to __get_datapage()
>> in the further preparatory patch
>> ---
>>   arch/powerpc/include/asm/vdso_datapage.h |  8 ++--
>>   arch/powerpc/kernel/vdso.c               | 53 ++++--------------------
>>   arch/powerpc/kernel/vdso32/datapage.S    |  3 --
>>   arch/powerpc/kernel/vdso32/vdso32.lds.S  |  7 +---
>>   arch/powerpc/kernel/vdso64/datapage.S    |  3 --
>>   arch/powerpc/kernel/vdso64/vdso64.lds.S  |  7 +---
>>   6 files changed, 16 insertions(+), 65 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
>> index b9ef6cf50ea5..11886467dfdf 100644
>> --- a/arch/powerpc/include/asm/vdso_datapage.h
>> +++ b/arch/powerpc/include/asm/vdso_datapage.h
>> @@ -118,10 +118,12 @@ extern struct vdso_data *vdso_data;
>>   
>>   .macro get_datapage ptr, tmp
>>   	bcl	20, 31, .+4
>> +999:
>>   	mflr	\ptr
>> -	addi	\ptr, \ptr, (__kernel_datapage_offset - (.-4))@l
>> -	lwz	\tmp, 0(\ptr)
>> -	add	\ptr, \tmp, \ptr
>> +#if CONFIG_PPC_PAGE_SHIFT > 14
>> +	addis	\ptr, \ptr, (_vdso_datapage - 999b)@ha
>> +#endif
>> +	addi	\ptr, \ptr, (_vdso_datapage - 999b)@l
>>   .endm
>>   
>>   #endif /* __ASSEMBLY__ */
>> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
>> index f38f26e844b6..d33fa22ddbed 100644
>> --- a/arch/powerpc/kernel/vdso.c
>> +++ b/arch/powerpc/kernel/vdso.c
>> @@ -190,7 +190,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>>   	 * install_special_mapping or the perf counter mmap tracking code
>>   	 * will fail to recognise it as a vDSO (since arch_vma_name fails).
>>   	 */
>> -	current->mm->context.vdso_base = vdso_base;
>> +	current->mm->context.vdso_base = vdso_base + PAGE_SIZE;
> 
> I merged this but then realised it breaks the display of the vdso in /proc/self/maps.
> 
> ie. the vdso vma gets no name:
> 
>    # cat /proc/self/maps
>    110f90000-110fa0000 r-xp 00000000 08:03 17021844                         /usr/bin/cat
>    110fa0000-110fb0000 r--p 00000000 08:03 17021844                         /usr/bin/cat
>    110fb0000-110fc0000 rw-p 00010000 08:03 17021844                         /usr/bin/cat
>    126000000-126030000 rw-p 00000000 00:00 0                                [heap]
>    7fffa8790000-7fffa87d0000 rw-p 00000000 00:00 0
>    7fffa87d0000-7fffa8830000 r--p 00000000 08:03 17521786                   /usr/lib/locale/en_AU.utf8/LC_CTYPE
>    7fffa8830000-7fffa8840000 r--p 00000000 08:03 16958337                   /usr/lib/locale/en_AU.utf8/LC_NUMERIC
>    7fffa8840000-7fffa8850000 r--p 00000000 08:03 8501358                    /usr/lib/locale/en_AU.utf8/LC_TIME
>    7fffa8850000-7fffa8ad0000 r--p 00000000 08:03 16870886                   /usr/lib/locale/en_AU.utf8/LC_COLLATE
>    7fffa8ad0000-7fffa8ae0000 r--p 00000000 08:03 8509433                    /usr/lib/locale/en_AU.utf8/LC_MONETARY
>    7fffa8ae0000-7fffa8af0000 r--p 00000000 08:03 25383753                   /usr/lib/locale/en_AU.utf8/LC_MESSAGES/SYS_LC_MESSAGES
>    7fffa8af0000-7fffa8b00000 r--p 00000000 08:03 17521790                   /usr/lib/locale/en_AU.utf8/LC_PAPER
>    7fffa8b00000-7fffa8b10000 r--p 00000000 08:03 8501354                    /usr/lib/locale/en_AU.utf8/LC_NAME
>    7fffa8b10000-7fffa8b20000 r--p 00000000 08:03 8509431                    /usr/lib/locale/en_AU.utf8/LC_ADDRESS
>    7fffa8b20000-7fffa8b30000 r--p 00000000 08:03 8509434                    /usr/lib/locale/en_AU.utf8/LC_TELEPHONE
>    7fffa8b30000-7fffa8b40000 r--p 00000000 08:03 17521787                   /usr/lib/locale/en_AU.utf8/LC_MEASUREMENT
>    7fffa8b40000-7fffa8b50000 r--s 00000000 08:03 25623315                   /usr/lib64/gconv/gconv-modules.cache
>    7fffa8b50000-7fffa8d40000 r-xp 00000000 08:03 25383789                   /usr/lib64/libc-2.30.so
>    7fffa8d40000-7fffa8d50000 r--p 001e0000 08:03 25383789                   /usr/lib64/libc-2.30.so
>    7fffa8d50000-7fffa8d60000 rw-p 001f0000 08:03 25383789                   /usr/lib64/libc-2.30.so
>    7fffa8d60000-7fffa8d70000 r--p 00000000 08:03 8509432                    /usr/lib/locale/en_AU.utf8/LC_IDENTIFICATION
>    7fffa8d70000-7fffa8d90000 r-xp 00000000 00:00 0						<--- missing
>    7fffa8d90000-7fffa8dc0000 r-xp 00000000 08:03 25383781                   /usr/lib64/ld-2.30.so
>    7fffa8dc0000-7fffa8dd0000 r--p 00020000 08:03 25383781                   /usr/lib64/ld-2.30.so
>    7fffa8dd0000-7fffa8de0000 rw-p 00030000 08:03 25383781                   /usr/lib64/ld-2.30.so
>    7fffc31c0000-7fffc31f0000 rw-p 00000000 00:00 0                          [stack]
> 
> 
> And it's also going to break the logic in arch_unmap() to detect if
> we're unmapping (part of) the VDSO. And it will break arch_remap() too.
> 
> And the logic to recognise the signal trampoline in
> arch/powerpc/perf/callchain_*.c as well.

I don't think it breaks that one, because ->vdsobase is still the start 
of text.

> 
> So I'm going to rebase and drop this for now.
> 
> Basically we have a bunch of places that assume that vdso_base is == the
> start of the VDSO vma, and also that the code starts there. So that will
> need some work to tease out all those assumptions and make them work
> with this change.

Ok, one day I need to look at it in more details and see how other 
architectures handle it etc ...

As the benefit is only 2 CPU cycles, I'll drop it for now.

Christophe

^ permalink raw reply

* Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-04 11:14 UTC (permalink / raw)
  To: Michael Ellerman, Christophe Leroy, Benjamin Herrenschmidt,
	Paul Mackerras, nathanl
  Cc: linux-arch, arnd, linux-kernel, Tulio Magno Quites Machado Filho,
	luto, tglx, vincenzo.frascino, linuxppc-dev
In-Reply-To: <878sflvbad.fsf@mpe.ellerman.id.au>



On 07/15/2020 01:04 AM, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>> Prepare for switching VDSO to generic C implementation in following
>> patch. Here, we:
>> - Modify __get_datapage() to take an offset
>> - Prepare the helpers to call the C VDSO functions
>> - Prepare the required callbacks for the C VDSO functions
>> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
>> - Add the C trampolines to the generic C VDSO functions
>>
>> powerpc is a bit special for VDSO as well as system calls in the
>> way that it requires setting CR SO bit which cannot be done in C.
>> Therefore, entry/exit needs to be performed in ASM.
>>
>> Implementing __arch_get_vdso_data() would clobber the link register,
>> requiring the caller to save it. As the ASM calling function already
>> has to set a stack frame and saves the link register before calling
>> the C vdso function, retriving the vdso data pointer there is lighter.
> ...
> 
>> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
>> new file mode 100644
>> index 000000000000..4452897f9bd8
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
>> @@ -0,0 +1,175 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H
>> +#define __ASM_VDSO_GETTIMEOFDAY_H
>> +
>> +#include <asm/ptrace.h>
>> +
>> +#ifdef __ASSEMBLY__
>> +
>> +.macro cvdso_call funct
>> +  .cfi_startproc
>> +	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
>> +	mflr		r0
>> +  .cfi_register lr, r0
>> +	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
> 
> This doesn't work for me on ppc64(le) with glibc.
> 
> glibc doesn't create a stack frame before making the VDSO call, so the
> store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
> leading to an infinite loop.
> 
> This is an example from a statically built program that calls
> clock_gettime():
> 
> 0000000010030cb0 <__clock_gettime>:
>      10030cb0:   0e 10 40 3c     lis     r2,4110
>      10030cb4:   00 7a 42 38     addi    r2,r2,31232
>      10030cb8:   a6 02 08 7c     mflr    r0
>      10030cbc:   ff ff 22 3d     addis   r9,r2,-1
>      10030cc0:   58 6d 29 39     addi    r9,r9,27992
>      10030cc4:   f0 ff c1 fb     std     r30,-16(r1)			<-- redzone store
>      10030cc8:   78 23 9e 7c     mr      r30,r4
>      10030ccc:   f8 ff e1 fb     std     r31,-8(r1)			<-- redzone store
>      10030cd0:   78 1b 7f 7c     mr      r31,r3
>      10030cd4:   10 00 01 f8     std     r0,16(r1)			<-- save LR to caller's frame
>      10030cd8:   00 00 09 e8     ld      r0,0(r9)
>      10030cdc:   00 00 20 2c     cmpdi   r0,0
>      10030ce0:   50 00 82 41     beq     10030d30 <__clock_gettime+0x80>
>      10030ce4:   a6 03 09 7c     mtctr   r0
>      10030ce8:   21 04 80 4e     bctrl					<-- vdso call
>      10030cec:   26 00 00 7c     mfcr    r0
>      10030cf0:   00 10 09 74     andis.  r9,r0,4096
>      10030cf4:   78 1b 69 7c     mr      r9,r3
>      10030cf8:   28 00 82 40     bne     10030d20 <__clock_gettime+0x70>
>      10030cfc:   b4 07 23 7d     extsw   r3,r9
>      10030d00:   10 00 01 e8     ld      r0,16(r1)			<-- load saved LR, since clobbered by the VDSO
>      10030d04:   f0 ff c1 eb     ld      r30,-16(r1)
>      10030d08:   f8 ff e1 eb     ld      r31,-8(r1)
>      10030d0c:   a6 03 08 7c     mtlr    r0				<-- restore LR
>      10030d10:   20 00 80 4e     blr					<-- jumps to 10030cec
> 
> 
> I'm kind of confused how it worked for you on 32-bit.

Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
frame whenever it has anything to same. Here is what I have in libc.so:

000fbb60 <__clock_gettime>:
    fbb60:	94 21 ff e0 	stwu    r1,-32(r1)
    fbb64:	7c 08 02 a6 	mflr    r0
    fbb68:	48 09 75 c9 	bl      193130 <_IO_stdout_+0x24b0>
    fbb6c:	93 c1 00 18 	stw     r30,24(r1)
    fbb70:	7f c8 02 a6 	mflr    r30
    fbb74:	90 01 00 24 	stw     r0,36(r1)
    fbb78:	93 81 00 10 	stw     r28,16(r1)
    fbb7c:	93 a1 00 14 	stw     r29,20(r1)
    fbb80:	83 9e fc 98 	lwz     r28,-872(r30)
    fbb84:	93 e1 00 1c 	stw     r31,28(r1)
    fbb88:	80 1c 00 00 	lwz     r0,0(r28)
    fbb8c:	83 82 8f f4 	lwz     r28,-28684(r2)
    fbb90:	7c 7f 1b 78 	mr      r31,r3
    fbb94:	7c 00 e2 79 	xor.    r0,r0,r28
    fbb98:	7c 9d 23 78 	mr      r29,r4
    fbb9c:	41 82 00 40 	beq     fbbdc <__clock_gettime+0x7c>
    fbba0:	7c 09 03 a6 	mtctr   r0
    fbba4:	4e 80 04 21 	bctrl
    fbba8:	7c 00 00 26 	mfcr    r0
    fbbac:	74 1c 10 00 	andis.  r28,r0,4096
    fbbb0:	40 82 00 24 	bne     fbbd4 <__clock_gettime+0x74>
    fbbb4:	80 01 00 24 	lwz     r0,36(r1)
    fbbb8:	83 81 00 10 	lwz     r28,16(r1)
    fbbbc:	7c 08 03 a6 	mtlr    r0
    fbbc0:	83 a1 00 14 	lwz     r29,20(r1)
    fbbc4:	83 c1 00 18 	lwz     r30,24(r1)
    fbbc8:	83 e1 00 1c 	lwz     r31,28(r1)
    fbbcc:	38 21 00 20 	addi    r1,r1,32
    fbbd0:	4e 80 00 20 	blr
...
   193130:	4e 80 00 21 	blrl

But I guess if a prog has a way to avoid having anything to save, we may 
face the same issue. So lets create two frames:

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
index a0712a6e80d9..0b6fa245d54e 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -10,6 +10,7 @@
    .cfi_startproc
  	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
  	mflr		r0
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
    .cfi_register lr, r0
  	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
  	get_datapage	r5, r0
@@ -19,7 +20,7 @@
  	cmpwi		r3, 0
  	mtlr		r0
    .cfi_restore lr
-	addi		r1, r1, STACK_FRAME_OVERHEAD
+	addi		r1, r1, 2 * STACK_FRAME_OVERHEAD
  	crclr		so
  	beqlr+
  	crset		so
@@ -32,6 +33,7 @@
    .cfi_startproc
  	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
  	mflr		r0
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
    .cfi_register lr, r0
  	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
  	get_datapage	r4, r0
@@ -41,7 +43,7 @@
  	crclr		so
  	mtlr		r0
    .cfi_restore lr
-	addi		r1, r1, STACK_FRAME_OVERHEAD
+	addi		r1, r1, 2 * STACK_FRAME_OVERHEAD
  	blr
    .cfi_endproc
  .endm



> 
> There's also no code to load/restore the TOC pointer on BE, which I
> think we'll need to handle.

I see no code in the generated vdso64.so doing anything with r2, but if 
you think that's needed, just let's do it:

commit 5a704d89bd5d7aac39194fb4c775f406905bf0a4
Author: Christophe Leroy <christophe.leroy@csgroup.eu>
Date:   Tue Aug 4 06:31:48 2020 +0000

     Save and restore GOT

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
index 0b6fa245d54e..fa774086173b 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -13,10 +13,16 @@
  	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
    .cfi_register lr, r0
  	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_STL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
  	get_datapage	r5, r0
  	addi		r5, r5, VDSO_DATA_OFFSET
  	bl		\funct
  	PPC_LL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_LL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
  	cmpwi		r3, 0
  	mtlr		r0
    .cfi_restore lr
@@ -36,10 +42,16 @@
  	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
    .cfi_register lr, r0
  	PPC_STL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_STL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
  	get_datapage	r4, r0
  	addi		r4, r4, VDSO_DATA_OFFSET
  	bl		\funct
  	PPC_LL		r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+	PPC_LL		r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
  	crclr		so
  	mtlr		r0
    .cfi_restore lr


Christophe

^ permalink raw reply related

* Re: [PATCH 2/2] powerpc/topology: Override cpu_smt_mask
From: Valentin Schneider @ 2020-08-04 11:02 UTC (permalink / raw)
  To: peterz
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot,
	Srikar Dronamraju, Rik van Riel, linuxppc-dev, LKML,
	Dietmar Eggemann, Thomas Gleixner, Mel Gorman, Ingo Molnar
In-Reply-To: <20200804104642.GC2657@hirez.programming.kicks-ass.net>


On 04/08/20 11:46, peterz@infradead.org wrote:
> On Tue, Aug 04, 2020 at 09:03:07AM +0530, Srikar Dronamraju wrote:
>> On Power9 a pair of cores can be presented by the firmware as a big-core
>> for backward compatibility reasons, with 4 threads per (small) core and 8
>> threads per big-core. cpu_smt_mask() should generally point to the cpu mask
>> of the (small)core.
>>
>> In order to maintain userspace backward compatibility (with Power8 chips in
>> case of Power9) in enterprise Linux systems, the topology_sibling_cpumask
>> has to be set to big-core. Hence override the default cpu_smt_mask() to be
>> powerpc specific allowing for better scheduling behaviour on Power.
>
> Why does Linux userspace care about this?

Ditto; from [1], a core contains CPUs that all share the same L1 (and capacity,
as per SD_SHARE_CPUCAPACITY). So IMO it makes perfect sense to have a first
domain spanning L1, and its parent spanning L2 - that means
topology_sibling_cpumask *itself* should span a single core rather than a
pair.

[1]: https://lkml.kernel.org/r/jhjr1sviswg.mognet@arm.com

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/topology: Override cpu_smt_mask
From: peterz @ 2020-08-04 10:46 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot, Rik van Riel,
	linuxppc-dev, LKML, Ingo Molnar, Thomas Gleixner, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann
In-Reply-To: <20200804033307.76111-2-srikar@linux.vnet.ibm.com>

On Tue, Aug 04, 2020 at 09:03:07AM +0530, Srikar Dronamraju wrote:
> On Power9 a pair of cores can be presented by the firmware as a big-core
> for backward compatibility reasons, with 4 threads per (small) core and 8
> threads per big-core. cpu_smt_mask() should generally point to the cpu mask
> of the (small)core.
> 
> In order to maintain userspace backward compatibility (with Power8 chips in
> case of Power9) in enterprise Linux systems, the topology_sibling_cpumask
> has to be set to big-core. Hence override the default cpu_smt_mask() to be
> powerpc specific allowing for better scheduling behaviour on Power.

Why does Linux userspace care about this?

^ permalink raw reply

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
From: peterz @ 2020-08-04 10:45 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot, Rik van Riel,
	linuxppc-dev, LKML, Ingo Molnar, Thomas Gleixner, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann
In-Reply-To: <20200804033307.76111-1-srikar@linux.vnet.ibm.com>

On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> most architectures. One of the users of cpu_smt_mask(), would be to
> identify idle-cores. On Power9, a pair of cores can be presented by the
> firmware as a big-core for backward compatibility reasons.
> 
> In order to maintain userspace backward compatibility with previous
> versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> is option to the firmware to advertise a pair of SMT4 cores as a fused
> cores (referred to as the big_core mode in the Linux Kernel). On Power9
> this pair shares the L2 cache as well. However, from the scheduler's point
> of view, a core should be determined by SMT4. The load-balancer already
> does this. Hence allow PowerPc architecture to override the default
> cpu_smt_mask() to point to the SMT4 cores in a big_core mode.

I'm utterly confused.

Why can't you set your regular siblings mask to the smt4 thing? Who
cares about the compat stuff, I thought that was an LPAR/AIX thing.

^ permalink raw reply

* Re: [PATCH 2/2] lockdep: warn on redundant or incorrect irq state changes
From: Alexey Kardashevskiy @ 2020-08-04 10:00 UTC (permalink / raw)
  To: Nicholas Piggin, linux-kernel
  Cc: linux-arch, Peter Zijlstra, Ingo Molnar, linuxppc-dev,
	Will Deacon
In-Reply-To: <20200723105615.1268126-2-npiggin@gmail.com>



On 23/07/2020 20:56, Nicholas Piggin wrote:
> With the previous patch, lockdep hardirq state changes should not be
> redundant. Softirq state changes already follow that pattern.
> 
> So warn on unexpected enable-when-enabled or disable-when-disabled
> conditions, to catch possible errors or sloppy patterns that could
> lead to similar bad behavior due to NMIs etc.


This one does not apply anymore as Peter's changes went in already but
this one is not really a fix so I can live with that. Did 1/2 go
anywhere? Thanks,


> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  kernel/locking/lockdep.c           | 80 +++++++++++++-----------------
>  kernel/locking/lockdep_internals.h |  4 --
>  kernel/locking/lockdep_proc.c      | 10 +---
>  3 files changed, 35 insertions(+), 59 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 29a8de4c50b9..138458fb2234 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3649,15 +3649,8 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> -	if (unlikely(current->hardirqs_enabled)) {
> -		/*
> -		 * Neither irq nor preemption are disabled here
> -		 * so this is racy by nature but losing one hit
> -		 * in a stat is not a big deal.
> -		 */
> -		__debug_atomic_inc(redundant_hardirqs_on);
> +	if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled))
>  		return;
> -	}
>  
>  	/*
>  	 * We're enabling irqs and according to our state above irqs weren't
> @@ -3695,15 +3688,8 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
>  	if (unlikely(!debug_locks || curr->lockdep_recursion))
>  		return;
>  
> -	if (curr->hardirqs_enabled) {
> -		/*
> -		 * Neither irq nor preemption are disabled here
> -		 * so this is racy by nature but losing one hit
> -		 * in a stat is not a big deal.
> -		 */
> -		__debug_atomic_inc(redundant_hardirqs_on);
> +	if (DEBUG_LOCKS_WARN_ON(curr->hardirqs_enabled))
>  		return;
> -	}
>  
>  	/*
>  	 * We're enabling irqs and according to our state above irqs weren't
> @@ -3738,6 +3724,9 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
>  	if (unlikely(!debug_locks || curr->lockdep_recursion))
>  		return;
>  
> +	if (DEBUG_LOCKS_WARN_ON(!curr->hardirqs_enabled))
> +		return;
> +
>  	/*
>  	 * So we're supposed to get called after you mask local IRQs, but for
>  	 * some reason the hardware doesn't quite think you did a proper job.
> @@ -3745,17 +3734,13 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
>  	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>  		return;
>  
> -	if (curr->hardirqs_enabled) {
> -		/*
> -		 * We have done an ON -> OFF transition:
> -		 */
> -		curr->hardirqs_enabled = 0;
> -		curr->hardirq_disable_ip = ip;
> -		curr->hardirq_disable_event = ++curr->irq_events;
> -		debug_atomic_inc(hardirqs_off_events);
> -	} else {
> -		debug_atomic_inc(redundant_hardirqs_off);
> -	}
> +	/*
> +	 * We have done an ON -> OFF transition:
> +	 */
> +	curr->hardirqs_enabled = 0;
> +	curr->hardirq_disable_ip = ip;
> +	curr->hardirq_disable_event = ++curr->irq_events;
> +	debug_atomic_inc(hardirqs_off_events);
>  }
>  EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
>  
> @@ -3769,6 +3754,9 @@ void lockdep_softirqs_on(unsigned long ip)
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> +	if (DEBUG_LOCKS_WARN_ON(curr->softirqs_enabled))
> +		return;
> +
>  	/*
>  	 * We fancy IRQs being disabled here, see softirq.c, avoids
>  	 * funny state and nesting things.
> @@ -3776,11 +3764,6 @@ void lockdep_softirqs_on(unsigned long ip)
>  	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>  		return;
>  
> -	if (curr->softirqs_enabled) {
> -		debug_atomic_inc(redundant_softirqs_on);
> -		return;
> -	}
> -
>  	current->lockdep_recursion++;
>  	/*
>  	 * We'll do an OFF -> ON transition:
> @@ -3809,26 +3792,26 @@ void lockdep_softirqs_off(unsigned long ip)
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> +	if (DEBUG_LOCKS_WARN_ON(!curr->softirqs_enabled))
> +		return;
> +
>  	/*
>  	 * We fancy IRQs being disabled here, see softirq.c
>  	 */
>  	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>  		return;
>  
> -	if (curr->softirqs_enabled) {
> -		/*
> -		 * We have done an ON -> OFF transition:
> -		 */
> -		curr->softirqs_enabled = 0;
> -		curr->softirq_disable_ip = ip;
> -		curr->softirq_disable_event = ++curr->irq_events;
> -		debug_atomic_inc(softirqs_off_events);
> -		/*
> -		 * Whoops, we wanted softirqs off, so why aren't they?
> -		 */
> -		DEBUG_LOCKS_WARN_ON(!softirq_count());
> -	} else
> -		debug_atomic_inc(redundant_softirqs_off);
> +	/*
> +	 * We have done an ON -> OFF transition:
> +	 */
> +	curr->softirqs_enabled = 0;
> +	curr->softirq_disable_ip = ip;
> +	curr->softirq_disable_event = ++curr->irq_events;
> +	debug_atomic_inc(softirqs_off_events);
> +	/*
> +	 * Whoops, we wanted softirqs off, so why aren't they?
> +	 */
> +	DEBUG_LOCKS_WARN_ON(!softirq_count());
>  }
>  
>  static int
> @@ -5684,6 +5667,11 @@ void __init lockdep_init(void)
>  
>  	printk(" per task-struct memory footprint: %zu bytes\n",
>  	       sizeof(((struct task_struct *)NULL)->held_locks));
> +
> +	WARN_ON(irqs_disabled());
> +
> +	current->hardirqs_enabled = 1;
> +	current->softirqs_enabled = 1;
>  }
>  
>  static void
> diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
> index baca699b94e9..6dd8b1f06dc4 100644
> --- a/kernel/locking/lockdep_internals.h
> +++ b/kernel/locking/lockdep_internals.h
> @@ -180,12 +180,8 @@ struct lockdep_stats {
>  	unsigned int   chain_lookup_misses;
>  	unsigned long  hardirqs_on_events;
>  	unsigned long  hardirqs_off_events;
> -	unsigned long  redundant_hardirqs_on;
> -	unsigned long  redundant_hardirqs_off;
>  	unsigned long  softirqs_on_events;
>  	unsigned long  softirqs_off_events;
> -	unsigned long  redundant_softirqs_on;
> -	unsigned long  redundant_softirqs_off;
>  	int            nr_unused_locks;
>  	unsigned int   nr_redundant_checks;
>  	unsigned int   nr_redundant;
> diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
> index 5525cd3ba0c8..98f204220ed9 100644
> --- a/kernel/locking/lockdep_proc.c
> +++ b/kernel/locking/lockdep_proc.c
> @@ -172,12 +172,8 @@ static void lockdep_stats_debug_show(struct seq_file *m)
>  #ifdef CONFIG_DEBUG_LOCKDEP
>  	unsigned long long hi1 = debug_atomic_read(hardirqs_on_events),
>  			   hi2 = debug_atomic_read(hardirqs_off_events),
> -			   hr1 = debug_atomic_read(redundant_hardirqs_on),
> -			   hr2 = debug_atomic_read(redundant_hardirqs_off),
>  			   si1 = debug_atomic_read(softirqs_on_events),
> -			   si2 = debug_atomic_read(softirqs_off_events),
> -			   sr1 = debug_atomic_read(redundant_softirqs_on),
> -			   sr2 = debug_atomic_read(redundant_softirqs_off);
> +			   si2 = debug_atomic_read(softirqs_off_events);
>  
>  	seq_printf(m, " chain lookup misses:           %11llu\n",
>  		debug_atomic_read(chain_lookup_misses));
> @@ -196,12 +192,8 @@ static void lockdep_stats_debug_show(struct seq_file *m)
>  
>  	seq_printf(m, " hardirq on events:             %11llu\n", hi1);
>  	seq_printf(m, " hardirq off events:            %11llu\n", hi2);
> -	seq_printf(m, " redundant hardirq ons:         %11llu\n", hr1);
> -	seq_printf(m, " redundant hardirq offs:        %11llu\n", hr2);
>  	seq_printf(m, " softirq on events:             %11llu\n", si1);
>  	seq_printf(m, " softirq off events:            %11llu\n", si2);
> -	seq_printf(m, " redundant softirq ons:         %11llu\n", sr1);
> -	seq_printf(m, " redundant softirq offs:        %11llu\n", sr2);
>  #endif
>  }
>  
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_sai: Clean code for synchronize mode
From: Nicolin Chen @ 2020-08-04  8:13 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: Linux-ALSA, Timur Tabi, Xiubo Li, Fabio Estevam, Shengjiu Wang,
	Takashi Iwai, Liam Girdwood, Mark Brown, linuxppc-dev,
	linux-kernel
In-Reply-To: <CAA+D8ANodghXDbUVOqpf9uq8A5FVbDFEFkf4dWdyMUNDTPaJ7A@mail.gmail.com>

On Tue, Aug 04, 2020 at 03:53:51PM +0800, Shengjiu Wang wrote:

> > >                 /* Check if the opposite FRDE is also disabled */
> > >                 regmap_read(sai->regmap, FSL_SAI_xCSR(!tx, ofs), &xcsr);
> > > +               if (sai->synchronous[tx] && !sai->synchronous[!tx] && !(xcsr & FSL_SAI_CSR_FRDE))
> > > +                       fsl_sai_config_disable(sai, !tx);
> >
> > > +               if (sai->synchronous[tx] || !sai->synchronous[!tx] || !(xcsr & FSL_SAI_CSR_FRDE))
> > > +                       fsl_sai_config_disable(sai, tx);
> >
> > The first "||" should probably be "&&".
> 
> No. it is !(!sai->synchronous[tx] && sai->synchronous[!tx] && (xcsr &
> FSL_SAI_CSR_FRDE))
> so then convert to
> (sai->synchronous[tx] || !sai->synchronous[!tx] || !(xcsr & FSL_SAI_CSR_FRDE))
> 
> if change to &&, then it won't work for:
> sai->synchronous[tx] = false, sai->synchronous[!tx]=false.

Ahh..probably should be
if (!(sync[dir] && !sync[adir]) || !frde)

I have a (seemingly) correct version in my sample code.

And...please untangle the logic using the given example -- adding
helper function(s) and comments. And, though the driver does have
places using array[tx] and array[!tx], better not to use any more
boolean type variable as an array index, as it's hard to read.

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_sai: Clean code for synchronize mode
From: Shengjiu Wang @ 2020-08-04  7:53 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Linux-ALSA, Timur Tabi, Xiubo Li, Fabio Estevam, Shengjiu Wang,
	Takashi Iwai, Liam Girdwood, Mark Brown, linuxppc-dev,
	linux-kernel
In-Reply-To: <20200804070345.GA27658@Asurada-Nvidia>

On Tue, Aug 4, 2020 at 3:04 PM Nicolin Chen <nicoleotsuka@gmail.com> wrote:
>
> On Tue, Aug 04, 2020 at 12:22:53PM +0800, Shengjiu Wang wrote:
>
> > > > Btw, do we need similar change for TRIGGER_STOP?
> > >
> > > This is a good question. It is better to do change for STOP,
> > > but I am afraid that there is no much test to guarantee the result.
>
> > Maybe we can do this change for STOP.
>
> The idea looks good to me...(check inline comments)
>
> > diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
> > index 1c0e06bb3783..6e4be398eaee 100644
> > --- a/sound/soc/fsl/fsl_sai.c
> > +++ b/sound/soc/fsl/fsl_sai.c
> > @@ -517,6 +517,37 @@ static int fsl_sai_hw_free(struct
> > snd_pcm_substream *substream,
> >         return 0;
> >  }
> >
> > +static void fsl_sai_config_disable(struct fsl_sai *sai, bool tx)
> > +{
> > +       unsigned int ofs = sai->soc_data->reg_offset;
> > +       u32 xcsr, count = 100;
> > +
> > +       regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
> > +                          FSL_SAI_CSR_TERE, 0);
> > +
> > +       /* TERE will remain set till the end of current frame */
> > +       do {
> > +               udelay(10);
> > +               regmap_read(sai->regmap, FSL_SAI_xCSR(tx, ofs), &xcsr);
> > +       } while (--count && xcsr & FSL_SAI_CSR_TERE);
> > +
> > +       regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
> > +                          FSL_SAI_CSR_FR, FSL_SAI_CSR_FR);
> > +
> > +       /*
> > +        * For sai master mode, after several open/close sai,
> > +        * there will be no frame clock, and can't recover
> > +        * anymore. Add software reset to fix this issue.
> > +        * This is a hardware bug, and will be fix in the
> > +        * next sai version.
> > +        */
> > +       if (!sai->is_slave_mode) {
> > +               /* Software Reset for both Tx and Rx */
>
> Remove "for both Tx and Rx"
>
> >                 /* Check if the opposite FRDE is also disabled */
> >                 regmap_read(sai->regmap, FSL_SAI_xCSR(!tx, ofs), &xcsr);
> > +               if (sai->synchronous[tx] && !sai->synchronous[!tx] && !(xcsr & FSL_SAI_CSR_FRDE))
> > +                       fsl_sai_config_disable(sai, !tx);
>
> > +               if (sai->synchronous[tx] || !sai->synchronous[!tx] || !(xcsr & FSL_SAI_CSR_FRDE))
> > +                       fsl_sai_config_disable(sai, tx);
>
> The first "||" should probably be "&&".

No. it is !(!sai->synchronous[tx] && sai->synchronous[!tx] && (xcsr &
FSL_SAI_CSR_FRDE))
so then convert to
(sai->synchronous[tx] || !sai->synchronous[!tx] || !(xcsr & FSL_SAI_CSR_FRDE))

if change to &&, then it won't work for:
sai->synchronous[tx] = false, sai->synchronous[!tx]=false.
or
sai->synchronous[tx] = true, sai->synchronous[!tx]=true.
(even we don't have such a case).

>
> The trigger() logic is way more complicated than a simple cleanup
> in my opinion. Would suggest to split RMR part out of this change.
>
> And for conditions like "sync[tx] && !sync[!tx]", it'd be better
> to have a helper function to improve readability:
>
> /**
>  * fsl_sai_dir_is_synced - Check if stream is synced by the opposite stream
>  *
>  * SAI supports synchronous mode using bit/frame clocks of either Transmitter's
>  * or Receiver's for both streams. This function is used to check if clocks of
>  * current stream's are synced by the opposite stream.
>  *
>  * @sai: SAI context
>  * @dir: direction of current stream
>  */
> static inline bool fsl_sai_dir_is_synced(struct fsl_sai *sai, int dir)
> {
>         int adir = (dir == TX) ? RX : TX;
>
>         /* current dir in async mode while opposite dir in sync mode */
>         return !sai->synchronous[dir] && sai->synchronous[adir];
> }
>
> Then add more comments in trigger:
>
> static ...trigger()
> {
>         bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
>         int adir = tx ? RX : TX;
>         int dir = tx ? TX : RX;
>
>         // ....
>         {
>                 // ...
>
>                 /* Check if the opposite FRDE is also disabled */
>                 regmap_read(sai->regmap, FSL_SAI_xCSR(!tx, ofs), &xcsr);
>
>                 /*
>                  * If opposite stream provides clocks for synchronous mode and
>                  * it is inactive, disable it before disabling the current one
>                  */
>                 if (fsl_sai_dir_is_synced(adir) && !(xcsr & FSL_SAI_CSR_FRDE))
>                         fsl_sai_config_disable(sai, adir);
>
>                 /*
>                  * Disable current stream if either of:
>                  * 1. current stream doesn't provide clocks for synchronous mode
>                  * 2. current stream provides clocks for synchronous mode but no
>                  *    more stream is active.
>                  */
>                 if (!fsl_sai_dir_is_synced(dir) || !(xcsr & FSL_SAI_CSR_FRDE))
>                         fsl_sai_config_disable(sai, dir);
>
>                 // ...
>         }
>         // ....
> }
>
> Note, in fsl_sai_config_disable(sai, dir):
>         bool tx = dir == TX;
>
> Above all, I am just drafting, so please test it thoroughly :)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox