LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 05/13] cpufreq/arch: powerpc: pasemi: Move prototypes to shared header
From: Lee Jones @ 2020-07-15  6:33 UTC (permalink / raw)
  To: Olof Johansson
  Cc: linux-pm, Rafael J. Wysocki, Linux Kernel Mailing List,
	Paul Mackerras, viresh kumar, linuxppc-dev,
	Linux ARM Mailing List
In-Reply-To: <CAOesGMjCZMyWLe+tpNFstC88odeSCKS8bM6Oj9cpaj6j7U94rQ@mail.gmail.com>

On Tue, 14 Jul 2020, Olof Johansson wrote:

> On Tue, Jul 14, 2020 at 7:50 AM Lee Jones <lee.jones@linaro.org> wrote:
> >
> > If function callers and providers do not share the same prototypes the
> > compiler complains of missing prototypes.  Fix this by moving the
> > already existing prototypes out to a mutually convenient location.
> >
> > Fixes the following W=1 kernel build warning(s):
> >
> >  drivers/cpufreq/pasemi-cpufreq.c:109:5: warning: no previous prototype for ‘check_astate’ [-Wmissing-prototypes]
> >  109 | int check_astate(void)
> >  | ^~~~~~~~~~~~
> >  drivers/cpufreq/pasemi-cpufreq.c:114:6: warning: no previous prototype for ‘restore_astate’ [-Wmissing-prototypes]
> >  114 | void restore_astate(int cpu)
> >  | ^~~~~~~~~~~~~~
> >
> > Cc: Olof Johansson <olof@lixom.net>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > ---
> >  arch/powerpc/platforms/pasemi/pasemi.h    | 15 ------------
> >  arch/powerpc/platforms/pasemi/powersave.S |  2 ++
> >  drivers/cpufreq/pasemi-cpufreq.c          |  1 +
> >  include/linux/platform_data/pasemi.h      | 28 +++++++++++++++++++++++
> >  4 files changed, 31 insertions(+), 15 deletions(-)
> >  create mode 100644 include/linux/platform_data/pasemi.h
> >
> > diff --git a/arch/powerpc/platforms/pasemi/pasemi.h b/arch/powerpc/platforms/pasemi/pasemi.h
> > index 70b56048ed1be..528d81ef748ad 100644
> > --- a/arch/powerpc/platforms/pasemi/pasemi.h
> > +++ b/arch/powerpc/platforms/pasemi/pasemi.h
> > @@ -15,21 +15,6 @@ extern void __init pasemi_map_registers(void);
> >  extern void idle_spin(void);
> >  extern void idle_doze(void);
> >
> > -/* Restore astate to last set */
> > -#ifdef CONFIG_PPC_PASEMI_CPUFREQ
> > -extern int check_astate(void);
> > -extern void restore_astate(int cpu);
> > -#else
> > -static inline int check_astate(void)
> > -{
> > -       /* Always return >0 so we never power save */
> > -       return 1;
> > -}
> > -static inline void restore_astate(int cpu)
> > -{
> > -}
> > -#endif
> > -
> >  extern struct pci_controller_ops pasemi_pci_controller_ops;
> >
> >  #endif /* _PASEMI_PASEMI_H */
> > diff --git a/arch/powerpc/platforms/pasemi/powersave.S b/arch/powerpc/platforms/pasemi/powersave.S
> > index d0215d5329ca7..7747b48963286 100644
> > --- a/arch/powerpc/platforms/pasemi/powersave.S
> > +++ b/arch/powerpc/platforms/pasemi/powersave.S
> > @@ -5,6 +5,8 @@
> >   * Maintained by: Olof Johansson <olof@lixom.net>
> >   */
> >
> > +#include <linux/platform_data/pasemi.h>
> > +
> >  #include <asm/processor.h>
> >  #include <asm/page.h>
> >  #include <asm/ppc_asm.h>
> > diff --git a/drivers/cpufreq/pasemi-cpufreq.c b/drivers/cpufreq/pasemi-cpufreq.c
> > index c66f566a854cb..c6bb3ecc90ef3 100644
> > --- a/drivers/cpufreq/pasemi-cpufreq.c
> > +++ b/drivers/cpufreq/pasemi-cpufreq.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/timer.h>
> >  #include <linux/module.h>
> >  #include <linux/of_address.h>
> > +#include <linux/platform_data/pasemi.h>
> >
> >  #include <asm/hw_irq.h>
> >  #include <asm/io.h>
> > diff --git a/include/linux/platform_data/pasemi.h b/include/linux/platform_data/pasemi.h
> > new file mode 100644
> > index 0000000000000..3fed0687fcc9a
> > --- /dev/null
> > +++ b/include/linux/platform_data/pasemi.h
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2020 Linaro Ltd.
> > + *
> > + * Author: Lee Jones <lee.jones@linaro.org>
> > + */
> 
> Absolutely not. It's neither your copyright, nor your authorship.

The file was new.  Anyway, the point is now moot.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* [powerpc:merge] BUILD SUCCESS 93c756af92e11df5c1a8be7a8360f28a97f7ad3f
From: kernel test robot @ 2020-07-15  6:31 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  merge
branch HEAD: 93c756af92e11df5c1a8be7a8360f28a97f7ad3f  Automatic merge of 'master', 'next' and 'fixes' (2020-07-15 00:25)

elapsed time: 939m

configs tested: 104
configs skipped: 6

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm                              allyesconfig
arm                              allmodconfig
arm                               allnoconfig
arm64                            allyesconfig
arm64                               defconfig
arm64                            allmodconfig
arm64                             allnoconfig
sh                           se7724_defconfig
sh                          sdk7786_defconfig
arm                          tango4_defconfig
sh                            migor_defconfig
arm                         s3c6400_defconfig
m68k                         amcore_defconfig
i386                             allyesconfig
c6x                               allnoconfig
xtensa                generic_kc705_defconfig
mips                           ip27_defconfig
mips                  mips_paravirt_defconfig
c6x                        evmc6474_defconfig
powerpc                      pasemi_defconfig
sh                        dreamcast_defconfig
openrisc                         alldefconfig
m68k                        m5407c3_defconfig
arm                          ep93xx_defconfig
arm                          moxart_defconfig
sparc                               defconfig
mips                        maltaup_defconfig
mips                          rb532_defconfig
powerpc                mpc7448_hpc2_defconfig
mips                        nlm_xlp_defconfig
sh                 kfr2r09-romimage_defconfig
openrisc                            defconfig
xtensa                          iss_defconfig
arm                  colibri_pxa270_defconfig
ia64                      gensparse_defconfig
i386                              allnoconfig
i386                                defconfig
i386                              debian-10.3
ia64                             allmodconfig
ia64                                defconfig
ia64                              allnoconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                              allnoconfig
m68k                           sun3_defconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nios2                            allyesconfig
c6x                              allyesconfig
openrisc                         allyesconfig
nds32                               defconfig
nds32                             allnoconfig
csky                             allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
h8300                            allmodconfig
xtensa                              defconfig
arc                                 defconfig
arc                              allyesconfig
sh                               allmodconfig
sh                                allnoconfig
microblaze                        allnoconfig
mips                             allyesconfig
mips                              allnoconfig
mips                             allmodconfig
parisc                            allnoconfig
parisc                              defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          rhel-kconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
i386                 randconfig-a016-20200714
i386                 randconfig-a011-20200714
i386                 randconfig-a015-20200714
i386                 randconfig-a012-20200714
i386                 randconfig-a013-20200714
i386                 randconfig-a014-20200714
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                                defconfig
sparc                            allyesconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
x86_64                    rhel-7.6-kselftests
x86_64                               rhel-8.3
x86_64                                  kexec
x86_64                                   rhel
x86_64                                    lkp
x86_64                              fedora-25

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: [PATCH v3] powerpc/perf: Use SIER_USER_MASK while updating SPRN_SIER for EBB events
From: Athira Rajeev @ 2020-07-15  6:13 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: maddy, linuxppc-dev
In-Reply-To: <87ft9uvdad.fsf@mpe.ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 3073 bytes --]



> On 14-Jul-2020, at 11:38 AM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> Athira Rajeev <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> writes:
>>> On 19-Mar-2020, at 4:22 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>>> 
>>> Hi Athira,
>>> 
>>> Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
>>>> Sampled Instruction Event Register (SIER), is a PMU register,
>>>                                                              ^
>>>                                                              that
>>>> captures architecture state for a given sample. And sier_user_mask
>>>          ^                                          ^
>>>          don't think we need "architecture"         SIER_USER_MASK
>>> 
>>>> defined in commit 330a1eb7775b ("powerpc/perf: Core EBB support for 64-bit
>>>> book3s") defines the architected bits that needs to be saved from the SPR.
>>> 
>>> Not quite, it defines the bits that are visible to userspace.
>>> 
>>> And I think it's true that for EBB events the bits we need/want to save
>>> are only the user visible bits.
>>> 
>>>> Currently all of the bits from SIER are saved for EBB events. Patch fixes
>>>> this by ANDing the "sier_user_mask" to data from SIER in ebb_switch_out().
>>>> This will force save only architected bits from the SIER.
>>> 
>>> s/architected/user visible/
>>> 
>>> 
>>> But, why does it matter? The kernel saves the user visible bits, as well
>>> as the kernel-only bits into the thread struct. And then later the
>>> kernel restores that value into the hardware before returning to
>>> userspace.
>>> 
>>> But the hardware enforces the visibility of the bits, so userspace can't
>>> observe any bits that it shouldn't.
>>> 
>>> Or is there some other mechanism whereby userspace can see those bits? ;)
>>> 
>>> If there was, what would the security implications of that be?
>> 
>> Hi Michael,
>> 
>> Thanks for your comments. 
>> 
>> In ebb_switch_in, we set PMCC bit [MMCR0 44:45 ] to 10 which means
>> SIER ( Group B ) register is readable in problem state. Hence the
>> intention of the patch was to make sure we are not exposing the bits
>> which the userspace shouldn't be reading.
>> 
>> But following your comment about "hardware enforcing the visibility of
>> bits", I did try an "ebb" experiment which showed that reading
>> SPRN_SIER didn't expose any bits other than the user visible bits.
>> Sorry for the confusion here.
> 
> That's OK. Thanks for following my trail of clues :)
> 

Sure, Thanks for your inputs. I will come back on this after checking on information exposed via Ptrace

>> In that case, Can we drop the existing definition of SIER_USER_MASK if
>> it is no more needed ?
> 
> I think it is still needed, and I think this change to use it is good, because
> SIER is visible via ptrace.
> 
> What we need to do, is look at what information in SIER we are currently
> exposing to userspace via ptrace, and what the security implications (if
> any) of that are.
> 
> cheers


[-- Attachment #2: Type: text/html, Size: 13939 bytes --]

^ permalink raw reply

* Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting
From: Oliver O'Halloran @ 2020-07-15  6:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev
In-Reply-To: <c89dfd04-afc2-4d69-00ab-2e743d5bb844@ozlabs.ru>

On Wed, Jul 15, 2020 at 3:24 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
>
> > @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> >               goto disable_iov;
> >       pdev->dev.archdata.iov_data = iov;
> >
> > +     /* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem */
>
>
> WARN_ON_ONCE() then?

can't hurt

> > @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> >                       goto disable_iov;
> >               }
> >
> > -             total_vf_bar_sz += pci_iov_resource_size(pdev,
> > -                             i + PCI_IOV_RESOURCES);
> > +             vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> >
> >               /*
> > -              * If bigger than quarter of M64 segment size, just round up
> > -              * power of two.
> > +              * Generally, one segmented M64 BAR maps one IOV BAR. However,
> > +              * if a VF BAR is too large we end up wasting a lot of space.
> > +              * If we've got a BAR that's bigger than greater than 1/4 of the
>
>
> bigger, greater, huger? :)
>
> Also, a nit: s/got a BAR/got a VF BAR/

whatever, it's just words

> > +              * default window's segment size then switch to using single PE
> > +              * windows. This limits the total number of VFs we can support.
>
> Just to get idea about absolute numbers here.
>
> On my P9:
>
> ./pciex@600c3c0300000/ibm,opal-m64-window
>                  00060200 00000000 00060200 00000000 00000040 00000000
>
> so that default window's segment size is 0x40.0000.0000/512 = 512MB?

Yeah. It'll vary a bit since PHB3 and some PHB4s have 256.

> >                *
> > -              * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
> > -              * with other devices, IOV BAR size is expanded to be
> > -              * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
> > -              * segment size , the expanded size would equal to half of the
> > -              * whole M64 space size, which will exhaust the M64 Space and
> > -              * limit the system flexibility.  This is a design decision to
> > -              * set the boundary to quarter of the M64 segment size.
> > +              * The 1/4 limit is arbitrary and can be tweaked.
> >                */
> > -             if (total_vf_bar_sz > gate) {
> > -                     mul = roundup_pow_of_two(total_vfs);
> > -                     dev_info(&pdev->dev,
> > -                             "VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
> > -                             total_vf_bar_sz, gate, mul);
> > -                     iov->m64_single_mode = true;
> > -                     break;
> > -             }
> > -     }
> > +             if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> > +                     /*
> > +                      * On PHB3, the minimum size alignment of M64 BAR in
> > +                      * single mode is 32MB. If this VF BAR is smaller than
> > +                      * 32MB, but still too large for a segmented window
> > +                      * then we can't map it and need to disable SR-IOV for
> > +                      * this device.
>
>
> Why not use single PE mode for such BAR? Better than nothing.

Suppose you could, but I figured VFs were mainly interesting since you
could give each VF to a separate guest. If there's multiple VFs under
the same single PE BAR then they'd have to be assigned to the same
guest in order to retain the freeze/unfreeze behaviour that PAPR
requires. I guess that's how it used to work, but it seems better just
to disable them rather than having VFs which sort of work.

^ permalink raw reply

* Re: [PATCH v2 01/10] powerpc/perf: Add support for ISA3.1 PMU SPRs
From: Athira Rajeev @ 2020-07-15  6:07 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Michael Neuling, maddy, linuxppc-dev
In-Reply-To: <87mu43vate.fsf@mpe.ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]



> On 13-Jul-2020, at 6:20 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> Athira Rajeev <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> writes:
>>> On 08-Jul-2020, at 4:32 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>>> 
>>> Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
>>> ...
>>>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>>>> index cd6a742..5c64bd3 100644
>>>> --- a/arch/powerpc/perf/core-book3s.c
>>>> +++ b/arch/powerpc/perf/core-book3s.c
>>>> @@ -39,10 +39,10 @@ struct cpu_hw_events {
>>>> 	unsigned int flags[MAX_HWEVENTS];
>>>> 	/*
>>>> 	 * The order of the MMCR array is:
>>>> -	 *  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2
>>>> +	 *  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2, MMCR3
>>>> 	 *  - 32-bit, MMCR0, MMCR1, MMCR2
>>>> 	 */
>>>> -	unsigned long mmcr[4];
>>>> +	unsigned long mmcr[5];
>>>> 	struct perf_event *limited_counter[MAX_LIMITED_HWCOUNTERS];
>>>> 	u8  limited_hwidx[MAX_LIMITED_HWCOUNTERS];
>>>> 	u64 alternatives[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES];
>>> ...
>>>> @@ -1310,6 +1326,10 @@ static void power_pmu_enable(struct pmu *pmu)
>>>> 	if (!cpuhw->n_added) {
>>>> 		mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
>>>> 		mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
>>>> +#ifdef CONFIG_PPC64
>>>> +		if (ppmu->flags & PPMU_ARCH_310S)
>>>> +			mtspr(SPRN_MMCR3, cpuhw->mmcr[4]);
>>>> +#endif /* CONFIG_PPC64 */
>>>> 		goto out_enable;
>>>> 	}
>>>> 
>>>> @@ -1353,6 +1373,11 @@ static void power_pmu_enable(struct pmu *pmu)
>>>> 	if (ppmu->flags & PPMU_ARCH_207S)
>>>> 		mtspr(SPRN_MMCR2, cpuhw->mmcr[3]);
>>>> 
>>>> +#ifdef CONFIG_PPC64
>>>> +	if (ppmu->flags & PPMU_ARCH_310S)
>>>> +		mtspr(SPRN_MMCR3, cpuhw->mmcr[4]);
>>>> +#endif /* CONFIG_PPC64 */
>>> 
>>> I don't think you need the #ifdef CONFIG_PPC64?
>> 
>> Hi Michael
>> 
>> Thanks for reviewing this series.
>> 
>> SPRN_MMCR3 is not defined for PPC32 and we hit build failure for pmac32_defconfig.
>> The #ifdef CONFIG_PPC64 is to address this.
> 
> We like to avoid #ifdefs in the body of the code like that.
> 
> There's a bunch of existing #defines near the top of the file to make
> 32-bit work, I think you should just add another for this, so eg:
> 
> #ifdef CONFIG_PPC32
> ...
> #define SPRN_MMCR3	0

Ok Ok. Found that currently we do same way as you mentioned for MMCRA which is not supported for 32-bit
I will work on this change

Thanks
Athira 
> 
> cheers


[-- Attachment #2: Type: text/html, Size: 14484 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Masami Hiramatsu @ 2020-07-15  5:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, Omar Sandoval, open list:S390,
	Joe Lawrence, Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Yonghong Song, Iurii Zaikin, Andrii Nakryiko, Thomas Huth,
	Vasily Gorbik, moderated list:ARM PORT, Daniel Axtens,
	Damien Le Moal, Martin KaFai Lau, Song Liu, Paul Walmsley,
	Heiko Carstens, Alexei Starovoitov, Jarkko Sakkinen, Atish Patra,
	Will Deacon, Daniel Borkmann, Masahiro Yamada, Nayna Jain,
	Ley Foon Tan, Christian Borntraeger, Sami Tolvanen, Naveen N. Rao,
	Mao Han, Marco Elver, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra, David Howells,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
	Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
	KP Singh, Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714100345.38354010@oasis.local.home>

On Tue, 14 Jul 2020 10:03:45 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 14 Jul 2020 14:33:14 +0100
> Mark Rutland <mark.rutland@arm.com> wrote:
> 
> > > Should work for all cases. Yes, we might then want something like a per
> > > arch:
> > > 
> > > 	{BPF,FTRACE,KPROBE}_TEXT_TYPE  
> > 
> > ... at that point why not:
> > 
> >   text_alloc_ftrace();
> >   text_alloc_module();
> >   text_alloc_bpf();
> >   text_alloc_kprobe();
> 
> I don't know about bpf and kprobes, but for ftrace, the only place that
> it allocates text happens to be in arch specific code.
> 
> If you want something special for ftrace, you could just add your own
> function. But for x86, a text_alloc_immediate() would work.
> (BTW, I like the function names over the enums)

kprobes will need the text_alloc_immediate() too, since it will use
the trampoline buffer where jumps to/from kernel code/modules.

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting
From: Alexey Kardashevskiy @ 2020-07-15  5:24 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
In-Reply-To: <20200710052340.737567-16-oohall@gmail.com>



On 10/07/2020 15:23, Oliver O'Halloran wrote:
> Using single PE BARs to map an SR-IOV BAR is really a choice about what
> strategy to use when mapping a BAR. It doesn't make much sense for this to
> be a global setting since a device might have one large BAR which needs to
> be mapped with single PE windows and another smaller BAR that can be mapped
> with a regular segmented window. Make the segmented vs single decision a
> per-BAR setting and clean up the logic that decides which mode to use.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 131 +++++++++++----------
>  arch/powerpc/platforms/powernv/pci.h       |  10 +-
>  2 files changed, 75 insertions(+), 66 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index 8de03636888a..87377d95d648 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -146,10 +146,9 @@
>  static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  {
>  	struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
> -	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
>  	struct resource *res;
>  	int i;
> -	resource_size_t size, total_vf_bar_sz;
> +	resource_size_t vf_bar_sz;
>  	struct pnv_iov_data *iov;
>  	int mul, total_vfs;
>  
> @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  		goto disable_iov;
>  	pdev->dev.archdata.iov_data = iov;
>  
> +	/* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem */


WARN_ON_ONCE() then?


>  	total_vfs = pci_sriov_get_totalvfs(pdev);
>  	mul = phb->ioda.total_pe_num;
> -	total_vf_bar_sz = 0;
>  
>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  			goto disable_iov;
>  		}
>  
> -		total_vf_bar_sz += pci_iov_resource_size(pdev,
> -				i + PCI_IOV_RESOURCES);
> +		vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>  
>  		/*
> -		 * If bigger than quarter of M64 segment size, just round up
> -		 * power of two.
> +		 * Generally, one segmented M64 BAR maps one IOV BAR. However,
> +		 * if a VF BAR is too large we end up wasting a lot of space.
> +		 * If we've got a BAR that's bigger than greater than 1/4 of the


bigger, greater, huger? :)

Also, a nit: s/got a BAR/got a VF BAR/


> +		 * default window's segment size then switch to using single PE
> +		 * windows. This limits the total number of VFs we can support.

Just to get idea about absolute numbers here.

On my P9:

./pciex@600c3c0300000/ibm,opal-m64-window
                 00060200 00000000 00060200 00000000 00000040 00000000

so that default window's segment size is 0x40.0000.0000/512 = 512MB?


>  		 *
> -		 * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
> -		 * with other devices, IOV BAR size is expanded to be
> -		 * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
> -		 * segment size , the expanded size would equal to half of the
> -		 * whole M64 space size, which will exhaust the M64 Space and
> -		 * limit the system flexibility.  This is a design decision to
> -		 * set the boundary to quarter of the M64 segment size.
> +		 * The 1/4 limit is arbitrary and can be tweaked.
>  		 */
> -		if (total_vf_bar_sz > gate) {
> -			mul = roundup_pow_of_two(total_vfs);
> -			dev_info(&pdev->dev,
> -				"VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
> -				total_vf_bar_sz, gate, mul);
> -			iov->m64_single_mode = true;
> -			break;
> -		}
> -	}
> +		if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> +			/*
> +			 * On PHB3, the minimum size alignment of M64 BAR in
> +			 * single mode is 32MB. If this VF BAR is smaller than
> +			 * 32MB, but still too large for a segmented window
> +			 * then we can't map it and need to disable SR-IOV for
> +			 * this device.


Why not use single PE mode for such BAR? Better than nothing.


> +			 */
> +			if (vf_bar_sz < SZ_32M) {
> +				pci_err(pdev, "VF BAR%d: %pR can't be mapped in single PE mode\n",
> +					i, res);
> +				goto disable_iov;
> +			}
>  
> -	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> -		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> -		if (!res->flags || res->parent)
> +			iov->m64_single_mode[i] = true;
>  			continue;
> +		}
> +
>  
> -		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>  		/*
> -		 * On PHB3, the minimum size alignment of M64 BAR in single
> -		 * mode is 32MB.
> +		 * This BAR can be mapped with one segmented window, so adjust
> +		 * te resource size to accommodate.
>  		 */
> -		if (iov->m64_single_mode && (size < SZ_32M))
> -			goto disable_iov;
> +		pci_dbg(pdev, " Fixing VF BAR%d: %pR to\n", i, res);
> +		res->end = res->start + vf_bar_sz * mul - 1;
> +		pci_dbg(pdev, "                       %pR\n", res);
>  
> -		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
> -		res->end = res->start + size * mul - 1;
> -		dev_dbg(&pdev->dev, "                       %pR\n", res);
> -		dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
> +		pci_info(pdev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
>  			 i, res, mul);
> +
> +		iov->need_shift = true;
>  	}
> +
> +	// what should this be?
>  	iov->vfs_expanded = mul;
>  
>  	return;
> @@ -260,42 +260,42 @@ void pnv_pci_ioda_fixup_iov(struct pci_dev *pdev)
>  resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>  						      int resno)
>  {
> -	struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
>  	struct pnv_iov_data *iov = pnv_iov_get(pdev);
>  	resource_size_t align;
>  
> -	/*
> -	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
> -	 * SR-IOV. While from hardware perspective, the range mapped by M64
> -	 * BAR should be size aligned.
> -	 *
> -	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
> -	 * powernv-specific hardware restriction is gone. But if just use the
> -	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
> -	 * in one segment of M64 #15, which introduces the PE conflict between
> -	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
> -	 * m64_segsize.
> -	 *
> -	 * This function returns the total IOV BAR size if M64 BAR is in
> -	 * Shared PE mode or just VF BAR size if not.
> -	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
> -	 * M64 segment size if IOV BAR size is less.
> -	 */
> -	align = pci_iov_resource_size(pdev, resno);
> +	int bar_no = resno - PCI_IOV_RESOURCES;
>  
>  	/*
>  	 * iov can be null if we have an SR-IOV device with IOV BAR that can't
>  	 * be placed in the m64 space (i.e. The BAR is 32bit or non-prefetch).
> -	 * In that case we don't allow VFs to be enabled so just return the
> -	 * default alignment.
> +	 * In that case we don't allow VFs to be enabled since one of their
> +	 * BARs would not be placed in the correct PE.
>  	 */
>  	if (!iov)
>  		return align;
>  	if (!iov->vfs_expanded)
>  		return align;
> -	if (iov->m64_single_mode)
> -		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>  
> +	align = pci_iov_resource_size(pdev, resno);
> +
> +	/*
> +	 * If we're using single mode then we can just use the native VF BAR
> +	 * alignment. We validated that it's possible to use a single PE
> +	 * window above when we did the fixup.
> +	 */
> +	if (iov->m64_single_mode[bar_no])
> +		return align;
> +
> +	/*
> +	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
> +	 * SR-IOV. While from hardware perspective, the range mapped by M64
> +	 * BAR should be size aligned.
> +	 *
> +	 * This function returns the total IOV BAR size if M64 BAR is in
> +	 * Shared PE mode or just VF BAR size if not.
> +	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
> +	 * M64 segment size if IOV BAR size is less.
> +	 */
>  	return iov->vfs_expanded * align;
>  }
>  
> @@ -453,7 +453,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>  			continue;
>  
>  		/* don't need single mode? map everything in one go! */
> -		if (!iov->m64_single_mode) {
> +		if (!iov->m64_single_mode[i]) {
>  			win = pnv_pci_alloc_m64_bar(phb, iov);
>  			if (win < 0)
>  				goto m64_failed;
> @@ -546,6 +546,8 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  		res = &dev->resource[i + PCI_IOV_RESOURCES];
>  		if (!res->flags || !res->parent)
>  			continue;
> +		if (iov->m64_single_mode[i])
> +			continue;
>  
>  		/*
>  		 * The actual IOV BAR range is determined by the start address
> @@ -577,6 +579,8 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  		res = &dev->resource[i + PCI_IOV_RESOURCES];
>  		if (!res->flags || !res->parent)
>  			continue;
> +		if (iov->m64_single_mode[i])
> +			continue;
>  
>  		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>  		res2 = *res;
> @@ -622,8 +626,8 @@ static void pnv_pci_sriov_disable(struct pci_dev *pdev)
>  	/* Release VF PEs */
>  	pnv_ioda_release_vf_PE(pdev);
>  
> -	/* Un-shift the IOV BAR resources */
> -	if (!iov->m64_single_mode)
> +	/* Un-shift the IOV BARs if we need to */
> +	if (iov->need_shift)
>  		pnv_pci_vf_resource_shift(pdev, -base_pe);
>  
>  	/* Release M64 windows */
> @@ -741,9 +745,8 @@ static int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>  	 * the IOV BAR according to the PE# allocated to the VFs.
>  	 * Otherwise, the PE# for the VF will conflict with others.
>  	 */
> -	if (!iov->m64_single_mode) {
> -		ret = pnv_pci_vf_resource_shift(pdev,
> -						base_pe->pe_number);
> +	if (iov->need_shift) {
> +		ret = pnv_pci_vf_resource_shift(pdev, base_pe->pe_number);
>  		if (ret)
>  			goto shift_failed;
>  	}
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 13555bc549f4..a78d1feb8fb8 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -236,14 +236,20 @@ struct pnv_iov_data {
>  	/* number of VFs IOV BAR expanded. FIXME: rename this to something less bad */
>  	u16     vfs_expanded;
>  
> +	/*
> +	 * indicates if we need to move our IOV BAR to account for our
> +	 * allocated PE number when enabling VFs.
> +	 */
> +	bool    need_shift;
> +
>  	/* number of VFs enabled */
>  	u16     num_vfs;
>  
>  	/* pointer to the array of VF PEs. num_vfs long*/
>  	struct pnv_ioda_pe *vf_pe_arr;
>  
> -	/* Did we map the VF BARs with single-PE IODA BARs? */
> -	bool    m64_single_mode;
> +	/* Did we map the VF BAR with single-PE IODA BARs? */
> +	bool    m64_single_mode[PCI_SRIOV_NUM_BARS];
>  
>  	/*
>  	 * Bit mask used to track which m64 windows that we used to map the
> 

-- 
Alexey

^ permalink raw reply

* Re: [v3 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
From: Ram Pai @ 2020-07-15  5:16 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <20200713052941.GF7902@in.ibm.com>

On Mon, Jul 13, 2020 at 10:59:41AM +0530, Bharata B Rao wrote:
> On Sat, Jul 11, 2020 at 02:13:43AM -0700, Ram Pai wrote:
> > Merging of pages associated with each memslot of a SVM is
> > disabled the page is migrated in H_SVM_PAGE_IN handler.
> > 
> > This operation should have been done much earlier; the moment the VM
> > is initiated for secure-transition. Delaying this operation, increases
> > the probability for those pages to acquire new references , making it
> > impossible to migrate those pages in H_SVM_PAGE_IN handler.
> > 
> > Disable page-migration in H_SVM_INIT_START handling.
> 
> While it is a good idea to disable KSM merging for all VMAs during
> H_SVM_INIT_START, I am curious if you did observe an actual case of
> ksm_madvise() failing which resulted in subsequent H_SVM_PAGE_IN
> failing to migrate?

No. I did not find any ksm_madvise() failing.  But it did not make sense
to ksm_madvise() everytime a page_in was requested. Hence i proposed
this patch. H_SVM_INIT_START is the right place for ksm_advise().

> 
> > 
> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> > ---
> >  arch/powerpc/kvm/book3s_hv_uvmem.c | 96 +++++++++++++++++++++++++++++---------
> >  1 file changed, 74 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > index 3d987b1..bfc3841 100644
> > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > @@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
> >  	return false;
> >  }
> >  
> > +static int kvmppc_memslot_page_merge(struct kvm *kvm,
> > +		struct kvm_memory_slot *memslot, bool merge)
> > +{
> > +	unsigned long gfn = memslot->base_gfn;
> > +	unsigned long end, start = gfn_to_hva(kvm, gfn);
> > +	int ret = 0;
> > +	struct vm_area_struct *vma;
> > +	int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> > +
> > +	if (kvm_is_error_hva(start))
> > +		return H_STATE;
> 
> This and other cases below seem to be a new return value from
> H_SVM_INIT_START. May be update the documentation too along with
> this patch?

ok.

> 
> > +
> > +	end = start + (memslot->npages << PAGE_SHIFT);
> > +
> > +	down_write(&kvm->mm->mmap_sem);
> 
> When you rebase the patches against latest upstream you may want to
> replace the above and other instances by mmap_write/read_lock().

ok.

> 
> > +	do {
> > +		vma = find_vma_intersection(kvm->mm, start, end);
> > +		if (!vma) {
> > +			ret = H_STATE;
> > +			break;
> > +		}
> > +		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > +			  merge_flag, &vma->vm_flags);
> > +		if (ret) {
> > +			ret = H_STATE;
> > +			break;
> > +		}
> > +		start = vma->vm_end + 1;
> > +	} while (end > vma->vm_end);
> > +
> > +	up_write(&kvm->mm->mmap_sem);
> > +	return ret;
> > +}
> > +
> > +static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
> > +{
> > +	struct kvm_memslots *slots;
> > +	struct kvm_memory_slot *memslot;
> > +	int ret = 0;
> > +
> > +	slots = kvm_memslots(kvm);
> > +	kvm_for_each_memslot(memslot, slots) {
> > +		ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
> > +		if (ret)
> > +			break;
> > +	}
> > +	return ret;
> > +}
> > +
> > +static inline int kvmppc_disable_page_merge(struct kvm *kvm)
> > +{
> > +	return __kvmppc_page_merge(kvm, false);
> > +}
> > +
> > +static inline int kvmppc_enable_page_merge(struct kvm *kvm)
> > +{
> > +	return __kvmppc_page_merge(kvm, true);
> > +}
> > +
> >  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
> >  {
> >  	struct kvm_memslots *slots;
> > @@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
> >  		return H_AUTHORITY;
> >  
> >  	srcu_idx = srcu_read_lock(&kvm->srcu);
> > +
> > +	/* disable page-merging for all memslot */
> > +	ret = kvmppc_disable_page_merge(kvm);
> > +	if (ret)
> > +		goto out;
> > +
> > +	/* register the memslot */
> >  	slots = kvm_memslots(kvm);
> >  	kvm_for_each_memslot(memslot, slots) {
> >  		if (kvmppc_uvmem_slot_init(kvm, memslot)) {
> >  			ret = H_PARAMETER;
> > -			goto out;
> > +			break;
> >  		}
> >  		ret = uv_register_mem_slot(kvm->arch.lpid,
> >  					   memslot->base_gfn << PAGE_SHIFT,
> > @@ -245,9 +311,12 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
> >  		if (ret < 0) {
> >  			kvmppc_uvmem_slot_free(kvm, memslot);
> >  			ret = H_PARAMETER;
> > -			goto out;
> > +			break;
> >  		}
> >  	}
> > +
> > +	if (ret)
> > +		kvmppc_enable_page_merge(kvm);
> 
> Is there any use of enabling KSM merging in the failure path here?
> Won't UV terminate the VM if H_SVM_INIT_START fails? If there is no need,
> you can do away with some extra routines above.

UV will terminate it. But I did not want to tie that assumption into
this function.

> 
> >  out:
> >  	srcu_read_unlock(&kvm->srcu, srcu_idx);
> >  	return ret;
> > @@ -384,7 +453,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
> >   */
> >  static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
> >  		   unsigned long end, unsigned long gpa, struct kvm *kvm,
> > -		   unsigned long page_shift, bool *downgrade)
> > +		   unsigned long page_shift)
> >  {
> >  	unsigned long src_pfn, dst_pfn = 0;
> >  	struct migrate_vma mig;
> > @@ -400,18 +469,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
> >  	mig.src = &src_pfn;
> >  	mig.dst = &dst_pfn;
> >  
> > -	/*
> > -	 * We come here with mmap_sem write lock held just for
> > -	 * ksm_madvise(), otherwise we only need read mmap_sem.
> > -	 * Hence downgrade to read lock once ksm_madvise() is done.
> > -	 */
> > -	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > -			  MADV_UNMERGEABLE, &vma->vm_flags);
> 
> I haven't seen the subsequent patches yet, but guess you are
> taking care of disabling KSM mering for hot-plugged memory too.

No. This is a good catch. The hotplugged memory patch needs to disable
KSM aswell.

RP

^ permalink raw reply

* Re: [v3 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out H_SVM_PAGE_IN
From: Ram Pai @ 2020-07-15  5:09 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <20200713095043.GH7902@in.ibm.com>

On Mon, Jul 13, 2020 at 03:20:43PM +0530, Bharata B Rao wrote:
> On Sat, Jul 11, 2020 at 02:13:46AM -0700, Ram Pai wrote:
> > The page requested for page-in; sometimes, can have transient
> > references, and hence cannot migrate immediately. Retry a few times
> > before returning error.
> 
> As I noted in the previous patch, we need to understand what are these
> transient errors and they occur on what type of pages?

Its not clear when they occur. But they occur quite irregularly and they 
do occur on pages that were shared in the previous boot.

> 
> The previous patch also introduced a bit of retry logic in the
> page-in path. Can you consolidate the retry logic into a separate
> patch?

yes. having the retry logic in a seperate patch gives us the flexibility
to drop it, if needed.  The retry patch is not the core element of this
patch series.

RP

^ permalink raw reply

* Re: [v3 3/5] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE
From: Ram Pai @ 2020-07-15  5:05 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <20200713094506.GG7902@in.ibm.com>

On Mon, Jul 13, 2020 at 03:15:06PM +0530, Bharata B Rao wrote:
> On Sat, Jul 11, 2020 at 02:13:45AM -0700, Ram Pai wrote:
> > The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the pages
> >  
> >  	if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
> > -		ret = -1;
> > +		ret = -2;
> 
> migrate_vma_setup() has marked that this pfn can't be migrated. What
> transient errors are you observing which will disappear within 10
> retries?
> 
> Also till now when UV used to pull in all the pages, we never seemed to
> have hit these transient errors. But now when HV is pushing the same
> pages, we see these errors which are disappearing after 10 retries.
> Can you explain this more please? What sort of pages are these?

We did see them even before this patch. The retry alleviates the
problem, but does not entirely eliminate it. If the chance of seeing
the issue without the patch is 1%,  the chance of seeing this issue
with this patch becomes 0.25%.

> 
> >  		goto out_finalize;
> >  	}
> > +	bool retry = 0;
...snip...
> > +
> > +	*ret = 0;
> > +	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
> > +
> > +		down_write(&kvm->mm->mmap_sem);
> 
> Acquiring and releasing mmap_sem in a loop? Any reason?
> 
> Now that you have moved ksm_madvise() calls to init time, any specific
> reason to take write mmap_sem here?

The semaphore protects the vma. right?

And its acquired/released in the loop, to provide the ability to relinquish
the cpu without getting into a tight loop and cause softlockups.

> 
> > +		start = gfn_to_hva(kvm, gfn);
> > +		if (kvm_is_error_hva(start)) {
> > +			*ret = H_STATE;
> > +			goto next;
> > +		}
> > +
> > +		end = start + (1UL << PAGE_SHIFT);
> > +		vma = find_vma_intersection(kvm->mm, start, end);
> > +		if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +			*ret = H_STATE;
> > +			goto next;
> > +		}
> > +
> > +		mutex_lock(&kvm->arch.uvmem_lock);
> > +		*ret = kvmppc_svm_migrate_page(vma, start, end,
> > +				(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> > +		mutex_unlock(&kvm->arch.uvmem_lock);
> > +
> > +next:
> > +		up_write(&kvm->mm->mmap_sem);
> > +
> > +		if (*ret == -2) {
> > +			retry = 1;
> 
> Using true or false assignment makes it easy to understand the intention
> of this 'retry' variable.

ok. next version will have that change.

Thanks for the comments,
RP

^ permalink raw reply

* Re: [PATCH 12/15] powerpc/powernv/sriov: De-indent setup and teardown
From: Alexey Kardashevskiy @ 2020-07-15  4:58 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev
In-Reply-To: <CAOSf1CGdGQ0cFNmfL=gVNcqX+JhT_W=73UWYd6aVourvtEs4ow@mail.gmail.com>



On 15/07/2020 14:46, Oliver O'Halloran wrote:
> On Wed, Jul 15, 2020 at 2:41 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>
>>
>> On 15/07/2020 14:21, Oliver O'Halloran wrote:
>>> On Wed, Jul 15, 2020 at 2:00 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>
>>>>
>>>> or we could just skip setting
>>>>
>>>> ppc_md.pcibios_sriov_enable = pnv_pcibios_sriov_enable;
>>>>
>>>> for uninteresting platforms in pnv_pci_init_ioda_phb().
>>>
>>> I don't think so. ppc_md is per-platform, not per-PHB andw e still
>>> have to deal with a mixture of IODA/NVLink/OpenCAPI PHBs on a single
>>> system.
>>
>> NVLink/OpenCAPI won't have SRIOV devices.
> 
> ...OR WILL THEY?

NO!


>> Other types won't appear on
>> the same platform simultaneously. It is not too clean, yes.
> 
> Sure, my point is that's a per-PHB setting rather than a per-platform
> one so we should set it up like that.

and my point is that you did too good job getting rid of IODA1 vs IODA2
checks to keep this check. But ok.

> 
>>> We could make it a callback in pnv_phb, but  it seemed like
>>> more indirection than it's worth.
>>
>> I genuinely dislike how we use ppc_md so removing things from it is
>> definitely a good thing.
> 
> you wouldn't be able to get rid of it. We'd have something like what
> we have for the existing pcibios calls where there's a "generic" one
> that bounces it to a member of pci_controller_ops, which then bounces
> it to the pnv_phb method. It's bad and I hate it.

No argument here...


-- 
Alexey

^ permalink raw reply

* Re: [PATCH 14/15] powerpc/powernv/sriov: Refactor M64 BAR setup
From: Alexey Kardashevskiy @ 2020-07-15  4:50 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
In-Reply-To: <20200710052340.737567-15-oohall@gmail.com>



On 10/07/2020 15:23, Oliver O'Halloran wrote:
> Split up the logic so that we have one branch that handles setting up a
> segmented window and another that handles setting up single PE windows for
> each VF.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
> This patch could be folded into the previous one. I've kept it
> seperate mainly because the diff is *horrific* when they're merged.
> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 57 ++++++++++------------
>  1 file changed, 27 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index 2f967aa4fbf5..8de03636888a 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -441,52 +441,49 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>  	struct resource       *res;
>  	int                    i, j;
>  	int64_t                rc;
> -	int                    total_vfs;
>  	resource_size_t        size, start;
> -	int                    m64_bars;
> +	int                    base_pe_num;
>  
>  	phb = pci_bus_to_pnvhb(pdev->bus);
>  	iov = pnv_iov_get(pdev);
> -	total_vfs = pci_sriov_get_totalvfs(pdev);
> -
> -	if (iov->m64_single_mode)
> -		m64_bars = num_vfs;
> -	else
> -		m64_bars = 1;
>  
>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>  		if (!res->flags || !res->parent)
>  			continue;
>  
> -		for (j = 0; j < m64_bars; j++) {
> +		/* don't need single mode? map everything in one go! */
> +		if (!iov->m64_single_mode) {
>  			win = pnv_pci_alloc_m64_bar(phb, iov);
>  			if (win < 0)
>  				goto m64_failed;
>  
> -			if (iov->m64_single_mode) {
> -				int pe_num = iov->vf_pe_arr[j].pe_number;
> -
> -				size = pci_iov_resource_size(pdev,
> -							PCI_IOV_RESOURCES + i);
> -				start = res->start + size * j;
> -				rc = pnv_ioda_map_m64_single(phb, win,
> -							     pe_num,
> -							     start,
> -							     size);
> -			} else {
> -				size = resource_size(res);
> -				start = res->start;
> -
> -				rc = pnv_ioda_map_m64_accordion(phb, win, start,
> -								size);
> -			}
> +			size = resource_size(res);
> +			start = res->start;
>  
> -			if (rc != OPAL_SUCCESS) {
> -				dev_err(&pdev->dev, "Failed to map M64 window #%d: %lld\n",
> -					win, rc);
> +			rc = pnv_ioda_map_m64_accordion(phb, win, start, size);
> +			if (rc)
> +				goto m64_failed;
> +
> +			continue;
> +		}
> +
> +		/* otherwise map each VF with single PE BARs */
> +		size = pci_iov_resource_size(pdev, PCI_IOV_RESOURCES + i);
> +		base_pe_num = iov->vf_pe_arr[0].pe_number;
> +
> +		for (j = 0; j < num_vfs; j++) {
> +			win = pnv_pci_alloc_m64_bar(phb, iov);
> +			if (win < 0)
> +				goto m64_failed;
> +
> +			start = res->start + size * j;
> +			rc = pnv_ioda_map_m64_single(phb, win,
> +						     base_pe_num + j,
> +						     start,
> +						     size);
> +			if (rc)
>  				goto m64_failed;
> -			}
>  		}
>  	}
>  	return 0;
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH 12/15] powerpc/powernv/sriov: De-indent setup and teardown
From: Oliver O'Halloran @ 2020-07-15  4:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev
In-Reply-To: <f97edf44-0029-5a33-0cff-7067a81f10ba@ozlabs.ru>

On Wed, Jul 15, 2020 at 2:41 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
>
>
> On 15/07/2020 14:21, Oliver O'Halloran wrote:
> > On Wed, Jul 15, 2020 at 2:00 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>
> >>
> >> or we could just skip setting
> >>
> >> ppc_md.pcibios_sriov_enable = pnv_pcibios_sriov_enable;
> >>
> >> for uninteresting platforms in pnv_pci_init_ioda_phb().
> >
> > I don't think so. ppc_md is per-platform, not per-PHB andw e still
> > have to deal with a mixture of IODA/NVLink/OpenCAPI PHBs on a single
> > system.
>
> NVLink/OpenCAPI won't have SRIOV devices.

...OR WILL THEY?

> Other types won't appear on
> the same platform simultaneously. It is not too clean, yes.

Sure, my point is that's a per-PHB setting rather than a per-platform
one so we should set it up like that.

> > We could make it a callback in pnv_phb, but  it seemed like
> > more indirection than it's worth.
>
> I genuinely dislike how we use ppc_md so removing things from it is
> definitely a good thing.

you wouldn't be able to get rid of it. We'd have something like what
we have for the existing pcibios calls where there's a "generic" one
that bounces it to a member of pci_controller_ops, which then bounces
it to the pnv_phb method. It's bad and I hate it.

>
>
> --
> Alexey

^ permalink raw reply

* Re: [PATCH 12/15] powerpc/powernv/sriov: De-indent setup and teardown
From: Alexey Kardashevskiy @ 2020-07-15  4:41 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev
In-Reply-To: <CAOSf1CGbDWFzuh5pD3CYayTXOFsw18E0kFn052xtxrdes_2_zw@mail.gmail.com>



On 15/07/2020 14:21, Oliver O'Halloran wrote:
> On Wed, Jul 15, 2020 at 2:00 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>
>> or we could just skip setting
>>
>> ppc_md.pcibios_sriov_enable = pnv_pcibios_sriov_enable;
>>
>> for uninteresting platforms in pnv_pci_init_ioda_phb().
> 
> I don't think so. ppc_md is per-platform, not per-PHB andw e still
> have to deal with a mixture of IODA/NVLink/OpenCAPI PHBs on a single
> system.

NVLink/OpenCAPI won't have SRIOV devices. Other types won't appear on
the same platform simultaneously. It is not too clean, yes.


> We could make it a callback in pnv_phb, but  it seemed like
> more indirection than it's worth.

I genuinely dislike how we use ppc_md so removing things from it is
definitely a good thing.


-- 
Alexey

^ permalink raw reply

* Re: [PATCH 12/15] powerpc/powernv/sriov: De-indent setup and teardown
From: Oliver O'Halloran @ 2020-07-15  4:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev
In-Reply-To: <ece39d29-6ab0-dcf4-0561-3c488c7921f9@ozlabs.ru>

On Wed, Jul 15, 2020 at 2:00 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
>
> or we could just skip setting
>
> ppc_md.pcibios_sriov_enable = pnv_pcibios_sriov_enable;
>
> for uninteresting platforms in pnv_pci_init_ioda_phb().

I don't think so. ppc_md is per-platform, not per-PHB andw e still
have to deal with a mixture of IODA/NVLink/OpenCAPI PHBs on a single
system. We could make it a callback in pnv_phb, but  it seemed like
more indirection than it's worth.

^ permalink raw reply

* Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
From: Oliver O'Halloran @ 2020-07-15  4:18 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Greg Kroah-Hartman, linux-pci, bjorn, Paul Mackerras, sparclinux,
	Toan Le, Christoph Hellwig, Marek Vasut, Rob Herring,
	Lorenzo Pieralisi, Sagi Grimberg, Kevin Hilman, Russell King,
	Ley Foon Tan, Greg Ungerer, Geert Uytterhoeven, Jakub Kicinski,
	Matt Turner, linux-kernel-mentees, Guenter Roeck, Bjorn Helgaas,
	Ray Jui, linuxppc-dev, Jens Axboe, Ivan Kokshaysky, Shuah Khan,
	Keith Busch, Boris Ostrovsky, Richard Henderson, Juergen Gross,
	Thomas Bogendoerfer, Scott Branden, Jingoo Han,
	linux-kernel@vger.kernel.org, Philipp Zabel, Saheed O. Bolarinwa,
	Gustavo Pimentel, Bjorn Helgaas, David S. Miller, Heiner Kallweit
In-Reply-To: <CAK8P3a3EZX8=649R9cYF6_=ivh1Xyrgsc5mUtS=d5yvQ3doZaQ@mail.gmail.com>

On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> - Most error checking is static: PCIBIOS_BAD_REGISTER_NUMBER
>   only happens if you pass an invalid register number, but most
>   callers pass a compile-time constant register number that is
>   known to be correct, or the driver would never work. Similarly,
>   PCIBIOS_DEVICE_NOT_FOUND wouldn't normally happen
>   since you pass a valid pci_device pointer that was already
>   probed.

Having some feedback about obvious programming errors is still useful
when doing driver development. Reporting those via printk() would
probably be more useful to those who care though.

> - config space accesses are very rare compared to memory
>   space access and on the hardware side the error handling
>   would be similar, but readl/writel don't return errors, they just
>   access wrong registers or return 0xffffffff.
>   arch/powerpc/kernel/eeh.c has a ton extra code written to
>   deal with it, but no other architectures do.

TBH the EEH MMIO hooks were probably a mistake to begin with. Errors
detected via MMIO are almost always asynchronous to the error itself
so you usually just wind up with a misleading stack trace rather than
any kind of useful synchronous error reporting. It seems like most
drivers don't bother checking for 0xFFs either and rely on the
asynchronous reporting via .error_detected() instead, so I have to
wonder what the point is. I've been thinking of removing the MMIO
hooks and using a background poller to check for errors on each PHB
periodically (assuming we don't have an EEH interrupt) instead. That
would remove the requirement for eeh_dev_check_failure() to be
interrupt safe too, so it might even let us fix all the godawful races
in EEH.

> - If we add code to detect errors in pci_read_config_*
>   and do some of the stuff from powerpc's
>   eeh_dev_check_failure(), we are more likely to catch
>   intermittent failures when drivers don't check, or bugs
>   with invalid arguments in device drivers than relying on
>   drivers to get their error handling right when those code
>   paths don't ever get covered in normal testing.

Adding some kind of error detection to the generic config accessors
wouldn't hurt, but detection is only half the problem. The main job of
eeh_dev_check_failure() is waking up the EEH recovery thread which
actually handles notifying drivers, device resets, etc and you'd want
something in the PCI core. Right now there's two implementations of
that reporting logic: one for EEH in arch/powerpc/eeh_driver.c and one
for AER/DPC in drivers/pci/pcie/err.c. I think the latter could be
moved into the PCI core easily enough since there's not much about it
that's really specific to PCIe. Ideally we could drop the EEH specific
one too, but I'm not sure how to implement that without it devolving
into callback spaghetti.

Oliver

^ permalink raw reply

* Re: [PATCH 3/3] ASoC: fsl-asoc-card: Support Headphone and Microphone Jack detection
From: Shengjiu Wang @ 2020-07-15  4:14 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux-ALSA, kuninori.morimoto.gx, Timur Tabi, samuel, katsuhiro,
	Fabio Estevam, Shengjiu Wang, Xiubo Li, linux-kernel,
	Liam Girdwood, Rob Herring, Mark Brown, Takashi Iwai,
	linuxppc-dev
In-Reply-To: <20200714211432.GA10818@Asurada-Nvidia>

On Wed, Jul 15, 2020 at 5:16 AM Nicolin Chen <nicoleotsuka@gmail.com> wrote:
>
> Hi Shengjiu,
>
> The whole series looks good to me. Just a couple of small
> questions inline:
>
> On Tue, Jul 14, 2020 at 05:05:36PM +0800, Shengjiu Wang wrote:
> > Use asoc_simple_init_jack function from simple card to implement
> > the Headphone and Microphone detection.
> > Register notifier to disable Speaker when Headphone is plugged in
> > and enable Speaker when Headphone is unplugged.
> > Register notifier to disable Digital Microphone when Analog Microphone
> > is plugged in and enable DMIC when Analog Microphone is unplugged.
> >
> > Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
> > ---
> >  sound/soc/fsl/Kconfig         |  1 +
> >  sound/soc/fsl/fsl-asoc-card.c | 69 ++++++++++++++++++++++++++++++++++-
> >  2 files changed, 68 insertions(+), 2 deletions(-)
>
> >  static int fsl_asoc_card_late_probe(struct snd_soc_card *card)
> >  {
> >       struct fsl_asoc_card_priv *priv = snd_soc_card_get_drvdata(card);
> > @@ -745,8 +789,29 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
> >       snd_soc_card_set_drvdata(&priv->card, priv);
> >
> >       ret = devm_snd_soc_register_card(&pdev->dev, &priv->card);
> > -     if (ret && ret != -EPROBE_DEFER)
> > -             dev_err(&pdev->dev, "snd_soc_register_card failed (%d)\n", ret);
> > +     if (ret) {
> > +             if (ret != -EPROBE_DEFER)
> > +                     dev_err(&pdev->dev, "snd_soc_register_card failed (%d)\n", ret);
>
> I think we may move this EPROBE_DEFER to the asrc_fail label.

If we move this to asrc_fail label, then it will be hard to define the
error message.
There are many places that goto asrc_fail.

>
> > +             goto asrc_fail;
> > +     }
> > +
> > +     if (of_property_read_bool(np, "hp-det-gpio")) {
>
> Could we move this check inside asoc_simple_init_jack? There's no
> problem with doing it here though, yet I got a bit confused by it
> as I thought it's a boolean type property, which would be against
> the DT bindings until I saw asoc_simple_init_jack() uses the same
> string to get the GPIO. Just it probably would be a bit tricky as
> we need it to be optional here.
>
> Otherwise, I think we may add a line of comments to indicate that
> the API would use the same string to get the GPIO.

In asoc_simple_init_jack, gpio_is_valid() will be invalid when there is
no "hp-det-gpio" property, and asoc_simple_init_jack will return 0.

The reason why I add a check here is mostly for
snd_soc_jack_notifier_register().
when there is no jack created, there will be a kernel dump.

or I can use this code:

-       if (of_property_read_bool(np, "hp-det-gpio")) {
-               ret = asoc_simple_init_jack(&priv->card, &priv->hp_jack,
-                                           1, NULL, "Headphone Jack");
-               if (ret)
-                       goto asrc_fail;
+       ret = asoc_simple_init_jack(&priv->card, &priv->hp_jack,
+                                   1, NULL, "Headphone Jack");
+       if (ret)
+               goto asrc_fail;

+       if (priv->hp_jack.jack.jack)
                snd_soc_jack_notifier_register(&priv->hp_jack.jack,
&hp_jack_nb);
-       }

what do you think?

best regards
wang shengjiu

^ permalink raw reply

* Re: [PATCH v3 2/9] powerpc/watchpoint: Fix DAWR exception constraint
From: Ravi Bangoria @ 2020-07-15  4:08 UTC (permalink / raw)
  To: Jordan Niethe
  Cc: Christophe Leroy, apopple, mikey, miltonm, peterz, oleg,
	Nicholas Piggin, linux-kernel, Paul Mackerras, jolsa, fweisbec,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo, Ravi Bangoria
In-Reply-To: <CACzsE9qSjWKWGDQPGSk-c5f5pxUyWWtUFW+AzzB5M76qFcQ-Cw@mail.gmail.com>

Hi Jordan,

>> @@ -536,7 +538,12 @@ static bool check_dawrx_constraints(struct pt_regs *regs, int type,
>>          if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ))
>>                  return false;
>>
>> -       if (OP_IS_STORE(type) && !(info->type & HW_BRK_TYPE_WRITE))
>> +       /*
>> +        * The Cache Management instructions other than dcbz never
>> +        * cause a match. i.e. if type is CACHEOP, the instruction
>> +        * is dcbz, and dcbz is treated as Store.
>> +        */
>> +       if ((OP_IS_STORE(type) || type == CACHEOP) && !(info->type & HW_BRK_TYPE_WRITE))
>>                  return false;
> This change seems seperate to this commit?

I also thought about it but was not sure. See below ...

>>
>>          if (is_kernel_addr(regs->nip) && !(info->type & HW_BRK_TYPE_KERNEL))
>> @@ -553,7 +560,8 @@ static bool check_dawrx_constraints(struct pt_regs *regs, int type,
>>    * including extraneous exception. Otherwise return false.
>>    */
>>   static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
>> -                             int type, int size, struct arch_hw_breakpoint *info)
>> +                             unsigned long ea, int type, int size,
>> +                             struct arch_hw_breakpoint *info)
>>   {
>>          bool in_user_range = dar_in_user_range(regs->dar, info);
>>          bool dawrx_constraints;
>> @@ -569,11 +577,10 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
>>          }
>>
>>          if (unlikely(ppc_inst_equal(instr, ppc_inst(0)))) {
>> -               if (in_user_range)
>> -                       return true;
>> -
>> -               if (dar_in_hw_range(regs->dar, info)) {
>> -                       info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
>> +               if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +                       if (dar_in_hw_range(regs->dar, info))
>> +                               return true;
>> +               } else {
>>                          return true;
> I think this would be clearer as:
>          if (cpu_has_feature(CPU_FTR_ARCH_31) &&
> !(dar_in_hw_range(regs->dar, info)))
>              return false;
>          else
>              return true;

ok

> 
>>                  }
>>                  return false;
>> @@ -581,10 +588,20 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
>>
>>          dawrx_constraints = check_dawrx_constraints(regs, type, info);
>>
>> -       if (dar_user_range_overlaps(regs->dar, size, info))
>> +       if (type == UNKNOWN) {
>> +               if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +                       if (dar_in_hw_range(regs->dar, info))
>> +                               return dawrx_constraints;
>> +               } else {
>> +                       return dawrx_constraints;
>> +               }
>> +               return false;
>> +       }
> Similar thing here, it could be:
>          if ((cpu_has_feature(CPU_FTR_ARCH_31)) &&
> !(dar_in_hw_range(regs->dar, info)))
>              return false;
>          else
>              return dawrx_constraints;

ok

>> +
>> +       if (ea_user_range_overlaps(ea, size, info))
>>                  return dawrx_constraints;
>>
>> -       if (dar_hw_range_overlaps(regs->dar, size, info)) {
>> +       if (ea_hw_range_overlaps(ea, size, info)) {
>>                  if (dawrx_constraints) {
>>                          info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
>>                          return true;
>> @@ -593,8 +610,17 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
>>          return false;
>>   }
>>
>> +static int cache_op_size(void)
>> +{
>> +#ifdef __powerpc64__
>> +       return ppc64_caches.l1d.block_size;
>> +#else
>> +       return L1_CACHE_BYTES;
>> +#endif
>> +}
>> +
>>   static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
>> -                            int *type, int *size, bool *larx_stcx)
>> +                            int *type, int *size, unsigned long *ea)
>>   {
>>          struct instruction_op op;
>>
>> @@ -602,16 +628,23 @@ static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
>>                  return;
>>
>>          analyse_instr(&op, regs, *instr);
>> -
>> -       /*
>> -        * Set size = 8 if analyse_instr() fails. If it's a userspace
>> -        * watchpoint(valid or extraneous), we can notify user about it.
>> -        * If it's a kernel watchpoint, instruction  emulation will fail
>> -        * in stepping_handler() and watchpoint will be disabled.
>> -        */
>>          *type = GETTYPE(op.type);
>> -       *size = !(*type == UNKNOWN) ? GETSIZE(op.type) : 8;
>> -       *larx_stcx = (*type == LARX || *type == STCX);
>> +       *ea = op.ea;
>> +#ifdef __powerpc64__
>> +       if (!(regs->msr & MSR_64BIT))
>> +               *ea &= 0xffffffffUL;
>> +#endif
>> +
>> +       *size = GETSIZE(op.type);
>> +       if (*type == CACHEOP) {
>> +               *size = cache_op_size();
>> +               *ea &= ~(*size - 1);
>> +       }
> Again related to CACHEOP, should these changes be mentioned in the
> commit message?

For CACHEOP, ea returned by analyse_instr() needs to be aligned down to cache
block size manually. Also, for CACHEOP, size returned by analyse_instr() is 0
and thus size also needs to be calculated manually. This was missed in
27985b2a640e. So it kind of relates to other changes of the patch but needs
special treatment as well. Will see if I can split it.

Thanks for the review,
Ravi

^ permalink raw reply

* Re: [PATCH 13/15] powerpc/powernv/sriov: Move M64 BAR allocation into a helper
From: Alexey Kardashevskiy @ 2020-07-15  4:02 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
In-Reply-To: <20200710052340.737567-14-oohall@gmail.com>



On 10/07/2020 15:23, Oliver O'Halloran wrote:
> I want to refactor the loop this code is currently inside of. Hoist it on
> out.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 31 ++++++++++++++--------
>  1 file changed, 20 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index d5699cd2ab7a..2f967aa4fbf5 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -416,6 +416,23 @@ static int64_t pnv_ioda_map_m64_single(struct pnv_phb *phb,
>  	return rc;
>  }
>  
> +static int pnv_pci_alloc_m64_bar(struct pnv_phb *phb, struct pnv_iov_data *iov)
> +{
> +	int win;
> +
> +	do {
> +		win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
> +				phb->ioda.m64_bar_idx + 1, 0);
> +
> +		if (win >= phb->ioda.m64_bar_idx + 1)
> +			return -1;
> +	} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
> +
> +	set_bit(win, iov->used_m64_bar_mask);
> +
> +	return win;
> +}
> +
>  static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>  {
>  	struct pnv_iov_data   *iov;
> @@ -443,17 +460,9 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>  			continue;
>  
>  		for (j = 0; j < m64_bars; j++) {
> -
> -			/* allocate a window ID for this BAR */
> -			do {
> -				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
> -						phb->ioda.m64_bar_idx + 1, 0);
> -
> -				if (win >= phb->ioda.m64_bar_idx + 1)
> -					goto m64_failed;
> -			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
> -			set_bit(win, iov->used_m64_bar_mask);
> -
> +			win = pnv_pci_alloc_m64_bar(phb, iov);
> +			if (win < 0)
> +				goto m64_failed;
>  
>  			if (iov->m64_single_mode) {
>  				int pe_num = iov->vf_pe_arr[j].pe_number;
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH 12/15] powerpc/powernv/sriov: De-indent setup and teardown
From: Alexey Kardashevskiy @ 2020-07-15  4:00 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
In-Reply-To: <20200710052340.737567-13-oohall@gmail.com>



On 10/07/2020 15:23, Oliver O'Halloran wrote:
> Remove the IODA2 PHB checks. We already assume IODA2 in several places so
> there's not much point in wrapping most of the setup and teardown process
> in an if block.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 86 ++++++++++++----------
>  1 file changed, 49 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index 08f88187d65a..d5699cd2ab7a 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -610,16 +610,18 @@ static void pnv_pci_sriov_disable(struct pci_dev *pdev)
>  	num_vfs = iov->num_vfs;
>  	base_pe = iov->vf_pe_arr[0].pe_number;
>  
> +	if (WARN_ON(!iov))
> +		return;
> +
>  	/* Release VF PEs */
>  	pnv_ioda_release_vf_PE(pdev);
>  
> -	if (phb->type == PNV_PHB_IODA2) {
> -		if (!iov->m64_single_mode)
> -			pnv_pci_vf_resource_shift(pdev, -base_pe);
> +	/* Un-shift the IOV BAR resources */
> +	if (!iov->m64_single_mode)
> +		pnv_pci_vf_resource_shift(pdev, -base_pe);
>  
> -		/* Release M64 windows */
> -		pnv_pci_vf_release_m64(pdev, num_vfs);
> -	}
> +	/* Release M64 windows */
> +	pnv_pci_vf_release_m64(pdev, num_vfs);
>  }
>  
>  static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> @@ -693,41 +695,51 @@ static int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>  	phb = pci_bus_to_pnvhb(pdev->bus);
>  	iov = pnv_iov_get(pdev);
>  
> -	if (phb->type == PNV_PHB_IODA2) {
> -		if (!iov->vfs_expanded) {
> -			dev_info(&pdev->dev, "don't support this SRIOV device"
> -				" with non 64bit-prefetchable IOV BAR\n");
> -			return -ENOSPC;
> -		}
> +	/*
> +	 * There's a calls to IODA2 PE setup code littered throughout. We could
> +	 * probably fix that, but we'd still have problems due to the
> +	 * restriction inherent on IODA1 PHBs.
> +	 *
> +	 * NB: We class IODA3 as IODA2 since they're very similar.
> +	 */
> +	if (phb->type != PNV_PHB_IODA2) {
> +		pci_err(pdev, "SR-IOV is not supported on this PHB\n");
> +		return -ENXIO;
> +	}

or we could just skip setting

ppc_md.pcibios_sriov_enable = pnv_pcibios_sriov_enable;

for uninteresting platforms in pnv_pci_init_ioda_phb().


>  
> -		/* allocate a contigious block of PEs for our VFs */
> -		base_pe = pnv_ioda_alloc_pe(phb, num_vfs);
> -		if (!base_pe) {
> -			pci_err(pdev, "Unable to allocate PEs for %d VFs\n", num_vfs);
> -			return -EBUSY;
> -		}
> +	if (!iov->vfs_expanded) {
> +		dev_info(&pdev->dev, "don't support this SRIOV device"
> +			" with non 64bit-prefetchable IOV BAR\n");
> +		return -ENOSPC;
> +	}
>  
> -		iov->vf_pe_arr = base_pe;
> -		iov->num_vfs = num_vfs;
> +	/* allocate a contigious block of PEs for our VFs */
> +	base_pe = pnv_ioda_alloc_pe(phb, num_vfs);
> +	if (!base_pe) {
> +		pci_err(pdev, "Unable to allocate PEs for %d VFs\n", num_vfs);
> +		return -EBUSY;
> +	}
>  
> -		/* Assign M64 window accordingly */
> -		ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
> -		if (ret) {
> -			dev_info(&pdev->dev, "Not enough M64 window resources\n");
> -			goto m64_failed;
> -		}
> +	iov->vf_pe_arr = base_pe;
> +	iov->num_vfs = num_vfs;
>  
> -		/*
> -		 * When using one M64 BAR to map one IOV BAR, we need to shift
> -		 * the IOV BAR according to the PE# allocated to the VFs.
> -		 * Otherwise, the PE# for the VF will conflict with others.
> -		 */
> -		if (!iov->m64_single_mode) {
> -			ret = pnv_pci_vf_resource_shift(pdev,
> -							base_pe->pe_number);
> -			if (ret)
> -				goto shift_failed;
> -		}
> +	/* Assign M64 window accordingly */
> +	ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
> +	if (ret) {
> +		dev_info(&pdev->dev, "Not enough M64 window resources\n");
> +		goto m64_failed;
> +	}
> +
> +	/*
> +	 * When using one M64 BAR to map one IOV BAR, we need to shift
> +	 * the IOV BAR according to the PE# allocated to the VFs.
> +	 * Otherwise, the PE# for the VF will conflict with others.
> +	 */
> +	if (!iov->m64_single_mode) {
> +		ret = pnv_pci_vf_resource_shift(pdev,
> +						base_pe->pe_number);

This can be a single line now. Thanks,


> +		if (ret)
> +			goto shift_failed;
>  	}
>  
>  	/* Setup VF PEs */
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH] powerpc: Fix inconsistent function names
From: Michael Ellerman @ 2020-07-15  3:54 UTC (permalink / raw)
  To: YueHaibing, benh, paulus, haren, dave.hansen, npiggin
  Cc: YueHaibing, linuxppc-dev, linux-kernel
In-Reply-To: <20200715031939.31968-1-yuehaibing@huawei.com>

YueHaibing <yuehaibing@huawei.com> writes:

> The stub helpers name should be consistent with prototypes.
>
> mm_context_add_vas_windows() --> mm_context_add_vas_window()
> mm_context_remove_vas_windows() --> mm_context_remove_vas_window()
>
> Fixes: c420644c0a8f ("powerpc: Use mm_context vas_windows counter to issue CP_ABORT")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  arch/powerpc/include/asm/mmu_context.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 1a474f6b1992..00fd1d44731a 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -218,8 +218,8 @@ static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
>  static inline void dec_mm_active_cpus(struct mm_struct *mm) { }
>  static inline void mm_context_add_copro(struct mm_struct *mm) { }
>  static inline void mm_context_remove_copro(struct mm_struct *mm) { }
> -static inline void mm_context_add_vas_windows(struct mm_struct *mm) { }
> -static inline void mm_context_remove_vas_windows(struct mm_struct *mm) { }
> +static inline void mm_context_add_vas_window(struct mm_struct *mm) { }
> +static inline void mm_context_remove_vas_window(struct mm_struct *mm) { }
>  #endif

Both of those functions are only called from 64-bit only code, so the
stubs should not be needed at all. Which explains why we haven't seen a
build break.

So just dropping them would be better IMO.

cheers

^ permalink raw reply

* Re: [PATCH 05/13] cpufreq/arch: powerpc: pasemi: Move prototypes to shared header
From: Viresh Kumar @ 2020-07-15  3:51 UTC (permalink / raw)
  To: Olof Johansson
  Cc: linux-pm, linuxppc-dev, Rafael J. Wysocki,
	Linux Kernel Mailing List, Paul Mackerras, Lee Jones,
	Linux ARM Mailing List
In-Reply-To: <CAOesGMi1dfqPbFJ8YoUoJ75NdU1=XiNoYx+6+JLu44a4LuuYGA@mail.gmail.com>

On 14-07-20, 20:49, Olof Johansson wrote:
> On Tue, Jul 14, 2020 at 8:07 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 14-07-20, 15:50, Lee Jones wrote:
> > > If function callers and providers do not share the same prototypes the
> > > compiler complains of missing prototypes.  Fix this by moving the
> > > already existing prototypes out to a mutually convenient location.
> > >
> > > Fixes the following W=1 kernel build warning(s):
> > >
> > >  drivers/cpufreq/pasemi-cpufreq.c:109:5: warning: no previous prototype for ‘check_astate’ [-Wmissing-prototypes]
> > >  109 | int check_astate(void)
> > >  | ^~~~~~~~~~~~
> > >  drivers/cpufreq/pasemi-cpufreq.c:114:6: warning: no previous prototype for ‘restore_astate’ [-Wmissing-prototypes]
> > >  114 | void restore_astate(int cpu)
> > >  | ^~~~~~~~~~~~~~
> > >
> > > Cc: Olof Johansson <olof@lixom.net>
> > > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > Cc: Paul Mackerras <paulus@samba.org>
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > ---
> > >  arch/powerpc/platforms/pasemi/pasemi.h    | 15 ------------
> >
> > Is there no sane way we can include this file directly to the cpufreq
> > file ?
> 
> Yep. arch/powerpc seems to be in the search path for modules on powerpc, so:
> 
> diff --git a/drivers/cpufreq/pasemi-cpufreq.c b/drivers/cpufreq/pasemi-cpufreq.c
> index c66f566a854cb..815645170c4de 100644
> --- a/drivers/cpufreq/pasemi-cpufreq.c
> +++ b/drivers/cpufreq/pasemi-cpufreq.c
> @@ -22,6 +22,8 @@
>  #include <asm/time.h>
>  #include <asm/smp.h>
> 
> +#include <platforms/pasemi/pasemi.h>
> +
>  #define SDCASR_REG             0x0100
>  #define SDCASR_REG_STRIDE      0x1000
>  #define SDCPWR_CFGA0_REG       0x0100

Fantastic. Thanks.

-- 
viresh

^ permalink raw reply

* Re: [PATCH v3 05/12] powerpc/drmem: make lmb walk a bit more flexible
From: Thiago Jung Bauermann @ 2020-07-15  3:50 UTC (permalink / raw)
  To: Hari Bathini
  Cc: Pingfan Liu, Nayna Jain, Kexec-ml, Mahesh J Salgaonkar,
	Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain, Petr Tesarik,
	Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <159466090332.24747.9255471295044653085.stgit@hbathini.in.ibm.com>


Hari Bathini <hbathini@linux.ibm.com> writes:

> @@ -534,7 +537,7 @@ static int __init early_init_dt_scan_memory_ppc(unsigned long node,
>  #ifdef CONFIG_PPC_PSERIES
>  	if (depth == 1 &&
>  	    strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) {
> -		walk_drmem_lmbs_early(node, early_init_drmem_lmb);
> +		walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb);

walk_drmem_lmbs_early() can now fail. Should this failure be propagated
as a return value of early_init_dt_scan_memory_ppc()?

>  		return 0;
>  	}
>  #endif
<snip>

> @@ -787,7 +790,7 @@ static int __init parse_numa_properties(void)
>  	 */
>  	memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>  	if (memory) {
> -		walk_drmem_lmbs(memory, numa_setup_drmem_lmb);
> +		walk_drmem_lmbs(memory, NULL, numa_setup_drmem_lmb);

Similarly here. Now that this call can fail, should
parse_numa_properties() handle or propagate the failure?

>  		of_node_put(memory);
>  	}
>


--
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH 05/13] cpufreq/arch: powerpc: pasemi: Move prototypes to shared header
From: Olof Johansson @ 2020-07-15  3:49 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-pm, linuxppc-dev, Rafael J. Wysocki,
	Linux Kernel Mailing List, Paul Mackerras, Lee Jones,
	Linux ARM Mailing List
In-Reply-To: <20200715030706.prxya7fyylscoy25@vireshk-i7>

On Tue, Jul 14, 2020 at 8:07 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 14-07-20, 15:50, Lee Jones wrote:
> > If function callers and providers do not share the same prototypes the
> > compiler complains of missing prototypes.  Fix this by moving the
> > already existing prototypes out to a mutually convenient location.
> >
> > Fixes the following W=1 kernel build warning(s):
> >
> >  drivers/cpufreq/pasemi-cpufreq.c:109:5: warning: no previous prototype for ‘check_astate’ [-Wmissing-prototypes]
> >  109 | int check_astate(void)
> >  | ^~~~~~~~~~~~
> >  drivers/cpufreq/pasemi-cpufreq.c:114:6: warning: no previous prototype for ‘restore_astate’ [-Wmissing-prototypes]
> >  114 | void restore_astate(int cpu)
> >  | ^~~~~~~~~~~~~~
> >
> > Cc: Olof Johansson <olof@lixom.net>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > ---
> >  arch/powerpc/platforms/pasemi/pasemi.h    | 15 ------------
>
> Is there no sane way we can include this file directly to the cpufreq
> file ?

Yep. arch/powerpc seems to be in the search path for modules on powerpc, so:

diff --git a/drivers/cpufreq/pasemi-cpufreq.c b/drivers/cpufreq/pasemi-cpufreq.c
index c66f566a854cb..815645170c4de 100644
--- a/drivers/cpufreq/pasemi-cpufreq.c
+++ b/drivers/cpufreq/pasemi-cpufreq.c
@@ -22,6 +22,8 @@
 #include <asm/time.h>
 #include <asm/smp.h>

+#include <platforms/pasemi/pasemi.h>
+
 #define SDCASR_REG             0x0100
 #define SDCASR_REG_STRIDE      0x1000
 #define SDCPWR_CFGA0_REG       0x0100


-Olof

^ permalink raw reply related

* Re: [PATCH -next] cpufreq: powernv: Make some symbols static
From: Viresh Kumar @ 2020-07-15  3:41 UTC (permalink / raw)
  To: Wei Yongjun, lee.jones
  Cc: Hulk Robot, linux-pm, Rafael J. Wysocki, linuxppc-dev
In-Reply-To: <20200714142355.29819-1-weiyongjun1@huawei.com>

On 14-07-20, 22:23, Wei Yongjun wrote:
> The sparse tool complains as follows:
> 
> drivers/cpufreq/powernv-cpufreq.c:88:1: warning:
>  symbol 'pstate_revmap' was not declared. Should it be static?
> drivers/cpufreq/powernv-cpufreq.c:383:18: warning:
>  symbol 'cpufreq_freq_attr_cpuinfo_nominal_freq' was not declared. Should it be static?
> drivers/cpufreq/powernv-cpufreq.c:669:6: warning:
>  symbol 'gpstate_timer_handler' was not declared. Should it be static?
> drivers/cpufreq/powernv-cpufreq.c:902:6: warning:
>  symbol 'powernv_cpufreq_work_fn' was not declared. Should it be static?
> 
> Those symbols are not used outside of this file, so mark
> them static.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Lee also sent a fix for this, but yours look complete :)

https://lore.kernel.org/lkml/20200714145049.2496163-7-lee.jones@linaro.org/

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox