LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ASoC: fsl_esai: add IRQF_SHARED for devm_request_irq
From: Shengjiu Wang @ 2020-07-23  4:00 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, perex, tiwai,
	alsa-devel
  Cc: linuxppc-dev, linux-kernel

ESAI interfaces may share same interrupt line with EDMA on
some platforms (e.g. i.MX8QXP, i.MX8QM).
Add IRQF_SHARED flag to allow sharing the irq among several
devices

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Viorel Suman <viorel.suman@nxp.com>
---
 sound/soc/fsl/fsl_esai.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index b8fbd7ba94af..4ae36099ae82 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -1012,7 +1012,7 @@ static int fsl_esai_probe(struct platform_device *pdev)
 	if (irq < 0)
 		return irq;
 
-	ret = devm_request_irq(&pdev->dev, irq, esai_isr, 0,
+	ret = devm_request_irq(&pdev->dev, irq, esai_isr, IRQF_SHARED,
 			       esai_priv->name, esai_priv);
 	if (ret) {
 		dev_err(&pdev->dev, "failed to claim irq %u\n", irq);
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping
From: Bharata B Rao @ 2020-07-23  3:36 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: linuxram, linux-kernel, kvm-ppc, paulus, sukadev, linuxppc-dev,
	bauerman
In-Reply-To: <20200721104202.15727-3-ldufour@linux.ibm.com>

On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote:
> When a secure memslot is dropped, all the pages backed in the secure device
> (aka really backed by secure memory by the Ultravisor) should be paged out
> to a normal page. Previously, this was achieved by triggering the page
> fault mechanism which is calling kvmppc_svm_page_out() on each pages.
> 
> This can't work when hot unplugging a memory slot because the memory slot
> is flagged as invalid and gfn_to_pfn() is then not trying to access the
> page, so the page fault mechanism is not triggered.
> 
> Since the final goal is to make a call to kvmppc_svm_page_out() it seems
> simpler to directly calling it instead of triggering such a mechanism. This
> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> memslot.
> 
> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> the call to __kvmppc_svm_page_out() is made.
> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
> addition, the mmap_sem is help in read mode during that time, not in write
> mode since the virual memory layout is not impacted, and
> kvm->arch.uvmem_lock prevents concurrent operation on the secure device.
> 
> Cc: Ram Pai <linuxram@us.ibm.com>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++----------
>  1 file changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 5a4b02d3f651..ba5c7c77cc3a 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>   * fault on them, do fault time migration to replace the device PTEs in
>   * QEMU page table with normal PTEs from newly allocated pages.
>   */
> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
>  			     struct kvm *kvm, bool skip_page_out)
>  {
>  	int i;
>  	struct kvmppc_uvmem_page_pvt *pvt;
> -	unsigned long pfn, uvmem_pfn;
> -	unsigned long gfn = free->base_gfn;
> +	struct page *uvmem_page;
> +	struct vm_area_struct *vma = NULL;
> +	unsigned long uvmem_pfn, gfn;
> +	unsigned long addr, end;
> +
> +	mmap_read_lock(kvm->mm);
> +
> +	addr = slot->userspace_addr;

We typically use gfn_to_hva() for that, but that won't work for a
memslot that is already marked INVALID which is the case here.
I think it is ok to access slot->userspace_addr here of an INVALID
memslot, but just thought of explictly bringing this up.

> +	end = addr + (slot->npages * PAGE_SIZE);
>  
> -	for (i = free->npages; i; --i, ++gfn) {
> -		struct page *uvmem_page;
> +	gfn = slot->base_gfn;
> +	for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
> +
> +		/* Fetch the VMA if addr is not in the latest fetched one */
> +		if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
> +			vma = find_vma_intersection(kvm->mm, addr, end);
> +			if (!vma ||
> +			    vma->vm_start > addr || vma->vm_end < end) {
> +				pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
> +				break;
> +			}
> +		}

In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning
the memslot, but it uses a different logic for the same. Why can't these
two cases use the same method to walk the VMAs? Is there anything subtly
different between the two cases?

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
From: Pingfan Liu @ 2020-07-23  2:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Kexec Mailing List, linuxppc-dev, Hari Bathini
In-Reply-To: <87pn8oqh86.fsf@mpe.ellerman.id.au>

On Wed, Jul 22, 2020 at 12:57 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Pingfan Liu <kernelfans@gmail.com> writes:
> > A bug is observed on pseries by taking the following steps on rhel:
>                                                                 ^
>                                                                 RHEL
>
> I assume it happens on mainline too?
Yes, it does.
>
[...]
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > index 1a3ac3b..def8cb3f 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -372,6 +372,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> >       invalidate_lmb_associativity_index(lmb);
> >       lmb_clear_nid(lmb);
> >       lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> > +     drmem_update_dt();
>
> No error checking?
Hmm, here should be a more careful design. Please see the comment at the end.
>
> >       __remove_memory(nid, base_addr, block_sz);
> >
> > @@ -607,6 +608,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >
> >       lmb_set_nid(lmb);
> >       lmb->flags |= DRCONF_MEM_ASSIGNED;
> > +     drmem_update_dt();
>
> And here ..
> >
> >       block_sz = memory_block_size_bytes();
> >
> > @@ -625,6 +627,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >               invalidate_lmb_associativity_index(lmb);
> >               lmb_clear_nid(lmb);
> >               lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> > +             drmem_update_dt();
>
>
> And here ..
>
> >               __remove_memory(nid, base_addr, block_sz);
> >       }
> > @@ -877,9 +880,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> >               break;
> >       }
> >
> > -     if (!rc)
> > -             rc = drmem_update_dt();
> > -
> >       unlock_device_hotplug();
> >       return rc;
>
> Whereas previously we did check it.

drmem_update_dt() fails iff allocating memory fail. And in the failed
case, even the original code does not roll back the effect of
__add_memory()/__remove_memory().

And I plan to do the following in V4: if drmem_update_dt() fails in
dlpar_add_lmb(), then bails out immediately.

Thanks,
Pingfan

^ permalink raw reply

* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-23  1:51 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
	Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
	Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
	John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
	Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Martin KaFai Lau, Song Liu, Josh Poimboeuf, Heiko Carstens,
	Alexei Starovoitov, Atish Patra, Will Deacon, Daniel Borkmann,
	Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS 32-BIT AND 64-BIT, David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra, David Howells,
	Amit Daniel Kachhap, Sandipan Das, H. Peter Anvin,
	open list:SPARC + UltraSPARC sparc/sparc64, Tiezhu Yang,
	Miroslav Benes, Sven Schnelle, Ard Biesheuvel, Vincenzo Frascino,
	Anders Roxell, Jiri Olsa,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, Russell King,
	open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
	Albert Ou, Paul E. McKenney, Paul Walmsley, KP Singh,
	Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS 32-BIT AND 64-BIT, open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC 32-BIT AND 64-BIT
In-Reply-To: <20200716184909.Horde.JVRLLcKix_jhrJfiQYRbbQ1@messagerie.si.c-s.fr>

On Thu, Jul 16, 2020 at 06:49:09PM +0200, Christophe Leroy wrote:
> Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> a écrit :
> 
> > Rename module_alloc() to text_alloc() and module_memfree() to
> > text_memfree(), and move them to kernel/text.c, which is unconditionally
> > compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> > allocate space for executable code without requiring to compile the modules
> > support (CONFIG_MODULES=y) in.
> 
> You are not changing enough in powerpc to have this work.
> On powerpc 32 bits (6xx), when STRICT_KERNEL_RWX is selected, the vmalloc
> space is set to NX (no exec) at segment level (ie by 256Mbytes zone) unless
> CONFIG_MODULES is selected.
> 
> Christophe

This has been deduced down to:

https://lore.kernel.org/lkml/20200717030422.679972-1-jarkko.sakkinen@linux.intel.com/

I.e. not intruding PPC anymore :-)

/Jarkko

^ permalink raw reply

* Re: [v3 11/15] powerpc/perf: BHRB control to disable BHRB logic when not used
From: Jordan Niethe @ 2020-07-23  1:28 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
	acme, jolsa, linuxppc-dev
In-Reply-To: <CACzsE9q-oOhABFWUWH5Jc3BuePpUSmtyrzaJt0x7iJSVpeXH0g@mail.gmail.com>

On Thu, Jul 23, 2020 at 11:26 AM Jordan Niethe <jniethe5@gmail.com> wrote:
>
> On Sat, Jul 18, 2020 at 1:26 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
> >
> > PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB).
> >
> > BHRB disable is controlled via Monitor Mode Control Register A (MMCRA)
> > bit, namely "BHRB Recording Disable (BHRBRD)". This field controls
> > whether BHRB entries are written when BHRB recording is enabled by other
> > bits. This patch implements support for this BHRB disable bit.
> >
> > By setting 0b1 to this bit will disable the BHRB and by setting 0b0
> > to this bit will have BHRB enabled. This addresses backward
> > compatibility (for older OS), since this bit will be cleared and
> > hardware will be writing to BHRB by default.
> >
> > This patch addresses changes to set MMCRA (BHRBRD) at boot for power10
> > ( there by the core will run faster) and enable this feature only on
> > runtime ie, on explicit need from user. Also save/restore MMCRA in the
> > restore path of state-loss idle state to make sure we keep BHRB disabled
> > if it was not enabled on request at runtime.
> >
> > Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/perf/core-book3s.c       | 20 ++++++++++++++++----
> >  arch/powerpc/perf/isa207-common.c     | 12 ++++++++++++
> >  arch/powerpc/platforms/powernv/idle.c | 22 ++++++++++++++++++++--
> >  3 files changed, 48 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> > index bd125fe..31c0535 100644
> > --- a/arch/powerpc/perf/core-book3s.c
> > +++ b/arch/powerpc/perf/core-book3s.c
> > @@ -1218,7 +1218,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, unsigned long mmcr0)
> >  static void power_pmu_disable(struct pmu *pmu)
> >  {
> >         struct cpu_hw_events *cpuhw;
> > -       unsigned long flags, mmcr0, val;
> > +       unsigned long flags, mmcr0, val, mmcra;
> >
> >         if (!ppmu)
> >                 return;
> > @@ -1251,12 +1251,24 @@ static void power_pmu_disable(struct pmu *pmu)
> >                 mb();
> >                 isync();
> >
> > +               val = mmcra = cpuhw->mmcr.mmcra;
> > +
> >                 /*
> >                  * Disable instruction sampling if it was enabled
> >                  */
> > -               if (cpuhw->mmcr.mmcra & MMCRA_SAMPLE_ENABLE) {
> > -                       mtspr(SPRN_MMCRA,
> > -                             cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
> > +               if (cpuhw->mmcr.mmcra & MMCRA_SAMPLE_ENABLE)
> > +                       val &= ~MMCRA_SAMPLE_ENABLE;
> > +
> > +               /* Disable BHRB via mmcra (BHRBRD) for p10 */
> > +               if (ppmu->flags & PPMU_ARCH_310S)
> > +                       val |= MMCRA_BHRB_DISABLE;
> > +
> > +               /*
> > +                * Write SPRN_MMCRA if mmcra has either disabled
> > +                * instruction sampling or BHRB.
> > +                */
> > +               if (val != mmcra) {
> > +                       mtspr(SPRN_MMCRA, mmcra);
> >                         mb();
> >                         isync();
> >                 }
> > diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> > index 77643f3..964437a 100644
> > --- a/arch/powerpc/perf/isa207-common.c
> > +++ b/arch/powerpc/perf/isa207-common.c
> > @@ -404,6 +404,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
> >
> >         mmcra = mmcr1 = mmcr2 = mmcr3 = 0;
> >
> > +       /*
> > +        * Disable bhrb unless explicitly requested
> > +        * by setting MMCRA (BHRBRD) bit.
> > +        */
> > +       if (cpu_has_feature(CPU_FTR_ARCH_31))
> > +               mmcra |= MMCRA_BHRB_DISABLE;
> > +
> >         /* Second pass: assign PMCs, set all MMCR1 fields */
> >         for (i = 0; i < n_ev; ++i) {
> >                 pmc     = (event[i] >> EVENT_PMC_SHIFT) & EVENT_PMC_MASK;
> > @@ -479,6 +486,11 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
> >                         mmcra |= val << MMCRA_IFM_SHIFT;
> >                 }
> >
> > +               /* set MMCRA (BHRBRD) to 0 if there is user request for BHRB */
> > +               if (cpu_has_feature(CPU_FTR_ARCH_31) &&
> > +                               (has_branch_stack(pevents[i]) || (event[i] & EVENT_WANTS_BHRB)))
> > +                       mmcra &= ~MMCRA_BHRB_DISABLE;
> > +
> >                 if (pevents[i]->attr.exclude_user)
> >                         mmcr2 |= MMCR2_FCP(pmc);
> >
> > diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> > index 2dd4673..1c9d0a9 100644
> > --- a/arch/powerpc/platforms/powernv/idle.c
> > +++ b/arch/powerpc/platforms/powernv/idle.c
> > @@ -611,6 +611,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> >         unsigned long srr1;
> >         unsigned long pls;
> >         unsigned long mmcr0 = 0;
> > +       unsigned long mmcra = 0;
> >         struct p9_sprs sprs = {}; /* avoid false used-uninitialised */
> >         bool sprs_saved = false;
> >
> > @@ -657,6 +658,21 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> >                   */
> >                 mmcr0           = mfspr(SPRN_MMCR0);
> >         }
> > +
> > +       if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> > +               /*
> > +                * POWER10 uses MMCRA (BHRBRD) as BHRB disable bit.
> > +                * If the user hasn't asked for the BHRB to be
> > +                * written, the value of MMCRA[BHRBRD] is 1.
> > +                * On wakeup from stop, MMCRA[BHRBD] will be 0,
> > +                * since it is previleged resource and will be lost.
> > +                * Thus, if we do not save and restore the MMCRA[BHRBD],
> > +                * hardware will be needlessly writing to the BHRB
> > +                * in problem mode.
> > +                */
> > +               mmcra           = mfspr(SPRN_MMCRA);
> If the only thing that needs to happen is set MMCRA[BHRBRD], why save the MMCRA?
> > +       }
> > +
> >         if ((psscr & PSSCR_RL_MASK) >= pnv_first_spr_loss_level) {
> >                 sprs.lpcr       = mfspr(SPRN_LPCR);
> >                 sprs.hfscr      = mfspr(SPRN_HFSCR);
> > @@ -700,8 +716,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> >         WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
> >
> >         if ((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS) {
> > -               unsigned long mmcra;
> > -
> >                 /*
> >                  * We don't need an isync after the mtsprs here because the
> >                  * upcoming mtmsrd is execution synchronizing.
> > @@ -721,6 +735,10 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> >                         mtspr(SPRN_MMCR0, mmcr0);
> >                 }
> >
> > +               /* Reload MMCRA to restore BHRB disable bit for POWER10 */
> > +               if (cpu_has_feature(CPU_FTR_ARCH_31))
> > +                       mtspr(SPRN_MMCRA, mmcra);
> > +
> >                 /*
> >                  * DD2.2 and earlier need to set then clear bit 60 in MMCRA
> >                  * to ensure the PMU starts running.
> Just below here we have:
>         mmcra = mfspr(SPRN_MMCRA);
>         mmcra |= PPC_BIT(60);
>         mtspr(SPRN_MMCRA, mmcra);
>         mmcra &= ~PPC_BIT(60);
>         mtspr(SPRN_MMCRA, mmcra);
> It seems MMCRA[BHRBRD] could just be OR'd in someway similar to this
> for ISA v3.1?
Oh wait the user could have cleared it, just ignore me.
> > --
> > 1.8.3.1
> >

^ permalink raw reply

* Re: [v3 11/15] powerpc/perf: BHRB control to disable BHRB logic when not used
From: Jordan Niethe @ 2020-07-23  1:26 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
	acme, jolsa, linuxppc-dev
In-Reply-To: <1594996707-3727-12-git-send-email-atrajeev@linux.vnet.ibm.com>

On Sat, Jul 18, 2020 at 1:26 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB).
>
> BHRB disable is controlled via Monitor Mode Control Register A (MMCRA)
> bit, namely "BHRB Recording Disable (BHRBRD)". This field controls
> whether BHRB entries are written when BHRB recording is enabled by other
> bits. This patch implements support for this BHRB disable bit.
>
> By setting 0b1 to this bit will disable the BHRB and by setting 0b0
> to this bit will have BHRB enabled. This addresses backward
> compatibility (for older OS), since this bit will be cleared and
> hardware will be writing to BHRB by default.
>
> This patch addresses changes to set MMCRA (BHRBRD) at boot for power10
> ( there by the core will run faster) and enable this feature only on
> runtime ie, on explicit need from user. Also save/restore MMCRA in the
> restore path of state-loss idle state to make sure we keep BHRB disabled
> if it was not enabled on request at runtime.
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  arch/powerpc/perf/core-book3s.c       | 20 ++++++++++++++++----
>  arch/powerpc/perf/isa207-common.c     | 12 ++++++++++++
>  arch/powerpc/platforms/powernv/idle.c | 22 ++++++++++++++++++++--
>  3 files changed, 48 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index bd125fe..31c0535 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -1218,7 +1218,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, unsigned long mmcr0)
>  static void power_pmu_disable(struct pmu *pmu)
>  {
>         struct cpu_hw_events *cpuhw;
> -       unsigned long flags, mmcr0, val;
> +       unsigned long flags, mmcr0, val, mmcra;
>
>         if (!ppmu)
>                 return;
> @@ -1251,12 +1251,24 @@ static void power_pmu_disable(struct pmu *pmu)
>                 mb();
>                 isync();
>
> +               val = mmcra = cpuhw->mmcr.mmcra;
> +
>                 /*
>                  * Disable instruction sampling if it was enabled
>                  */
> -               if (cpuhw->mmcr.mmcra & MMCRA_SAMPLE_ENABLE) {
> -                       mtspr(SPRN_MMCRA,
> -                             cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
> +               if (cpuhw->mmcr.mmcra & MMCRA_SAMPLE_ENABLE)
> +                       val &= ~MMCRA_SAMPLE_ENABLE;
> +
> +               /* Disable BHRB via mmcra (BHRBRD) for p10 */
> +               if (ppmu->flags & PPMU_ARCH_310S)
> +                       val |= MMCRA_BHRB_DISABLE;
> +
> +               /*
> +                * Write SPRN_MMCRA if mmcra has either disabled
> +                * instruction sampling or BHRB.
> +                */
> +               if (val != mmcra) {
> +                       mtspr(SPRN_MMCRA, mmcra);
>                         mb();
>                         isync();
>                 }
> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> index 77643f3..964437a 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -404,6 +404,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
>
>         mmcra = mmcr1 = mmcr2 = mmcr3 = 0;
>
> +       /*
> +        * Disable bhrb unless explicitly requested
> +        * by setting MMCRA (BHRBRD) bit.
> +        */
> +       if (cpu_has_feature(CPU_FTR_ARCH_31))
> +               mmcra |= MMCRA_BHRB_DISABLE;
> +
>         /* Second pass: assign PMCs, set all MMCR1 fields */
>         for (i = 0; i < n_ev; ++i) {
>                 pmc     = (event[i] >> EVENT_PMC_SHIFT) & EVENT_PMC_MASK;
> @@ -479,6 +486,11 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
>                         mmcra |= val << MMCRA_IFM_SHIFT;
>                 }
>
> +               /* set MMCRA (BHRBRD) to 0 if there is user request for BHRB */
> +               if (cpu_has_feature(CPU_FTR_ARCH_31) &&
> +                               (has_branch_stack(pevents[i]) || (event[i] & EVENT_WANTS_BHRB)))
> +                       mmcra &= ~MMCRA_BHRB_DISABLE;
> +
>                 if (pevents[i]->attr.exclude_user)
>                         mmcr2 |= MMCR2_FCP(pmc);
>
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 2dd4673..1c9d0a9 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -611,6 +611,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
>         unsigned long srr1;
>         unsigned long pls;
>         unsigned long mmcr0 = 0;
> +       unsigned long mmcra = 0;
>         struct p9_sprs sprs = {}; /* avoid false used-uninitialised */
>         bool sprs_saved = false;
>
> @@ -657,6 +658,21 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
>                   */
>                 mmcr0           = mfspr(SPRN_MMCR0);
>         }
> +
> +       if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> +               /*
> +                * POWER10 uses MMCRA (BHRBRD) as BHRB disable bit.
> +                * If the user hasn't asked for the BHRB to be
> +                * written, the value of MMCRA[BHRBRD] is 1.
> +                * On wakeup from stop, MMCRA[BHRBD] will be 0,
> +                * since it is previleged resource and will be lost.
> +                * Thus, if we do not save and restore the MMCRA[BHRBD],
> +                * hardware will be needlessly writing to the BHRB
> +                * in problem mode.
> +                */
> +               mmcra           = mfspr(SPRN_MMCRA);
If the only thing that needs to happen is set MMCRA[BHRBRD], why save the MMCRA?
> +       }
> +
>         if ((psscr & PSSCR_RL_MASK) >= pnv_first_spr_loss_level) {
>                 sprs.lpcr       = mfspr(SPRN_LPCR);
>                 sprs.hfscr      = mfspr(SPRN_HFSCR);
> @@ -700,8 +716,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
>         WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
>
>         if ((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS) {
> -               unsigned long mmcra;
> -
>                 /*
>                  * We don't need an isync after the mtsprs here because the
>                  * upcoming mtmsrd is execution synchronizing.
> @@ -721,6 +735,10 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
>                         mtspr(SPRN_MMCR0, mmcr0);
>                 }
>
> +               /* Reload MMCRA to restore BHRB disable bit for POWER10 */
> +               if (cpu_has_feature(CPU_FTR_ARCH_31))
> +                       mtspr(SPRN_MMCRA, mmcra);
> +
>                 /*
>                  * DD2.2 and earlier need to set then clear bit 60 in MMCRA
>                  * to ensure the PMU starts running.
Just below here we have:
        mmcra = mfspr(SPRN_MMCRA);
        mmcra |= PPC_BIT(60);
        mtspr(SPRN_MMCRA, mmcra);
        mmcra &= ~PPC_BIT(60);
        mtspr(SPRN_MMCRA, mmcra);
It seems MMCRA[BHRBRD] could just be OR'd in someway similar to this
for ISA v3.1?
> --
> 1.8.3.1
>

^ permalink raw reply

* Re: [PATCH v4 5/7] powerpc/iommu: Move iommu_table cleaning routine to iommu_table_clean
From: Leonardo Bras @ 2020-07-22 23:45 UTC (permalink / raw)
  To: Brian King, Alexey Kardashevskiy, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Joel Stanley,
	Christophe Leroy, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <d6078fce-bb5f-f829-5de2-5bce3cee2bd5@linux.vnet.ibm.com>

On Tue, 2020-07-21 at 19:52 -0500, Brian King wrote:
> > 
> > As of today, there seems to be nothing like that happening in the
> > driver I am testing. 
> > I spoke to Brian King on slack, and he mentioned that at the point DDW
> > is created there should be no allocations in place.
> 
> I think there are a couple of scenarios here. One is where there is a DMA
> allocation prior to a call to set the DMA mask. Second scenario is if the
> driver makes multiple calls to set the DMA mask. I would argue that a properly
> written driver should tell the IOMMU subsystem what DMA mask it supports prior
> to allocating DMA memroy. Documentation/core-api/dma-api-howto.rst should
> describe what is legal and what is not.
> 
> It might be reasonable to declare its not allowed to allocate DMA memory
> and then later change the DMA mask and clearly call this out in the documentation
> if its not already.
> 
> -Brian

Thank you for the feedback Brian!

That makes sense to me. I will try to have this in mind for the next
patchset. 

Best regards,


^ permalink raw reply

* Re: [PATCH v4 5/7] powerpc/iommu: Move iommu_table cleaning routine to iommu_table_clean
From: Leonardo Bras @ 2020-07-22 23:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <dd26d682-f013-763d-3a92-6d99633c6175@ozlabs.ru>

On Wed, 2020-07-22 at 11:28 +1000, Alexey Kardashevskiy wrote:
> 
> On 22/07/2020 08:13, Leonardo Bras wrote:
> > On Tue, 2020-07-21 at 14:59 +1000, Alexey Kardashevskiy wrote:
> > > On 16/07/2020 17:16, Leonardo Bras wrote:
> > > > Move the part of iommu_table_free() that does struct iommu_table cleaning
> > > > into iommu_table_clean, so we can invoke it separately.
> > > > 
> > > > This new function is useful for cleaning struct iommu_table before
> > > > initializing it again with a new DMA window, without having it freed and
> > > > allocated again.
> > > > 
> > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > > > ---
> > > >  arch/powerpc/kernel/iommu.c | 30 ++++++++++++++++++------------
> > > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > > > index 9704f3f76e63..c3242253a4e7 100644
> > > > --- a/arch/powerpc/kernel/iommu.c
> > > > +++ b/arch/powerpc/kernel/iommu.c
> > > > @@ -735,21 +735,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
> > > >  	return tbl;
> > > >  }
> > > >  
> > > > -static void iommu_table_free(struct kref *kref)
> > > > +static void iommu_table_clean(struct iommu_table *tbl)
> > > 
> > > iommu_table_free() + iommu_init_table() + set_iommu_table_base() should
> > > work too, why new helper?
> > 
> > iommu_table_free() also frees the tbl, which would need allocate it
> > again (new address) and to fill it up again, unnecessarily. 
> 
> It is a new table in fact, everything is new there. You are only saving
> kfree+kzalloc which does not seem a huge win.
> 
> Also, iommu_table_update() simply assumes 64bit window by passing
> res_start=res_end=0 to iommu_init_table() which is not horribly robust
> either. Yeah, I know, iommu_init_table() is always called with zeroes in
> pseries but this is somewhat ok as those tables are from the device tree
> and those windows don't overlap with 32bit MMIO but under KVM they will
> (well, if we hack QEMU to advertise a single window).
> 
> I suggest removing iommu_pseries_table_update() from 6/7 and do
> iommu_table_free() + iommu_init_table() + set_iommu_table_base() with a
> WARN_ON(pdev->dev.archdata.dma_offset>=SZ_4G), may be even do this all
> in enable_ddw() where we know for sure if it is 1:1 mapping or just a
> big window.

Sure, I have yet to understand the full impact of this change, but I
will implement this and give it a try.

> 
> Out of curiosity - what page sizes does pHyp advertise in "query"?

64kB (page shift 0x10)

> 
> 
> > I think it's a better approach to only change what is needed.
> > 
> > > There is also iommu_table_clear() which does a different thing so you
> > > need a better name.
> > 
> > I agree.
> > I had not noticed this other function before sending the patchset. What
> > would be a better name though? __iommu_table_free()? 
> > 
> > > Second, iommu_table_free
> > > use and it would be ok as we would only see this when hot-unplugging a
> > > PE because we always kept the default window.
> > > Btw you must be seeing these warnings now every time you create DDW with
> > > these patches as at least the first page is reserved, do not you?
> > 
> > It does not print a warning.
> > I noticed other warnings,
> 
> And what are these?

tce_freemulti_pSeriesLP: plpar_tce_stuff failed
[...]

It's regarding the change in pagesize. 
Some places have the tceshift hardcoded as 12, tce_freemulti_pSeriesLP
is one of them, and that is causing some errors.

I wrote a patch fixing this, and I will include it in the next series.

> 
> > but not this one from iommu_table_free():
> > /* verify that table contains no entries */
> > if (!bitmap_empty(tbl->it_ma
> > p, tbl->it_size))
> > 	pr_warn("%s: Unexpected TCEs\n", __func__);
> > 
> > Before that, iommu_table_release_pages(tbl) is supposed to clear the 
> > bitmap, so this only tests for a tce that is created in this short period.
> 
> iommu_table_release_pages() only clears reserved pages - page 0 (just a
> protection against NULL DMA pointers) and 32bit MMIO (these should not
> be set for 64bit window). The "%s: Unexpected TCEs\n" is what checks for
> actual mapped TCEs.
> 

Oh, I haven't noticed that. Thanks for pointing!

> > > Since we are replacing a table for a device which is still in the
> > > system, we should not try messing with its DMA if it already has
> > > mappings so the warning should become an error preventing DDW. It is
> > > rather hard to trigger in practice but I could hack a driver to ask for
> > > 32bit DMA mask first, map few pages and then ask for 64bit DMA mask, it
> > > is not illegal, I think. So this needs a new helper - "bool
> > > iommu_table_in_use(tbl)" - to use in enable_ddw(). Or I am overthinking
> > > this?... Thanks,
> > 
> > As of today, there seems to be nothing like that happening in the
> > driver I am testing. 
> > I spoke to Brian King on slack, and he mentioned that at the point DDW
> > is created there should be no allocations in place.
> 
> Correct, there should not be. But it is also not a huge effort to fall
> back if there are.

True.

> 
> > But I suppose some driver could try to do this.
> > 
> > Maybe a better approach would be removing the mapping only if the
> > default window is removed (at the end of enable_ddw, as an else to
> > resetting the default DMA window), and having a way to add more
> > mappings to those pools. But this last part doesn't look so simple, and
> > it would be better to understand if it's necessary investing work in
> > this.
> > 
> > What do you think?
> 
> Add iommu_table_in_use(tbl) and fail DDW if that says "yes".

Seems good, I will include that on the next patchset.

Still, I will try to implement that more complex approach in a future
patchset, as it may come to be useful.

Thank you for the feedback!


^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/drmem: accelerate memory_add_physaddr_to_nid() with LMB xarray
From: Anton Blanchard @ 2020-07-22 23:00 UTC (permalink / raw)
  To: Scott Cheloha
  Cc: Nathan Lynch, Nathan Fontenont, Michal Hocko, Michal Suchanek,
	David Hildenbrand, linux-kernel, linuxppc-dev, Rick Lindley
In-Reply-To: <20200221172901.1596249-2-cheloha@linux.ibm.com>

Hi Scott,

I'm hitting this issue and Rick just pointed my at your patch. Any
chance we could get it upstream?

Thanks,
Anton

> On PowerPC, memory_add_physaddr_to_nid() uses a linear search to find
> an LMB matching the given address.  This scales very poorly when there
> are many LMBs.  The poor scaling cripples drmem_init() during boot:
> lmb_set_nid(), which calls memory_add_physaddr_to_nid(), is called for
> each LMB.
> 
> If we index each LMB in an xarray by its base address we can achieve
> O(log n) search during memory_add_physaddr_to_nid(), which scales much
> better.
> 
> For example, in the lab we have a 64TB P9 machine with 256MB LMBs.
> So, suring drmem_init() we instantiate 249854 LMBs.  On a vanilla
> kernel it completes drmem_init() in ~35 seconds with a soft lockup
> trace.  On the patched kernel it completes drmem_init() in ~0.5
> seconds.
> 
> Before:
> [   53.721639] drmem: initializing drmem v2
> [   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s!
> [swapper/0:1] [   80.604377] Modules linked in:
> [   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+
> #4 [   80.604397] NIP:  c0000000000a4980 LR: c0000000000a4940 CTR:
> 0000000000000000 [   80.604407] REGS: c0002dbff8493830 TRAP: 0901
> Not tainted  (5.6.0-rc2+) [   80.604412] MSR:  8000000002009033
> <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000248  XER: 0000000d [
> 80.604431] CFAR: c0000000000a4a38 IRQMASK: 0 [   80.604431] GPR00:
> c0000000000a4940 c0002dbff8493ac0 c000000001904400 c0003cfffffede30 [
>   80.604431] GPR04: 0000000000000000 c000000000f4095a
> 000000000000002f 0000000010000000 [   80.604431] GPR08:
> c0000bf7ecdb7fb8 c0000bf7ecc2d3c8 0000000000000008 c00c0002fdfb2001 [
>   80.604431] GPR12: 0000000000000000 c00000001e8ec200 [   80.604477]
> NIP [c0000000000a4980] hot_add_scn_to_nid+0xa0/0x3e0 [   80.604486]
> LR [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0 [   80.604492]
> Call Trace: [   80.604498] [c0002dbff8493ac0] [c0000000000a4940]
> hot_add_scn_to_nid+0x60/0x3e0 (unreliable) [   80.604509]
> [c0002dbff8493b20] [c000000000087c10]
> memory_add_physaddr_to_nid+0x20/0x60 [   80.604521]
> [c0002dbff8493b40] [c0000000010d4880] drmem_init+0x25c/0x2f0 [
> 80.604530] [c0002dbff8493c10] [c000000000010154]
> do_one_initcall+0x64/0x2c0 [   80.604540] [c0002dbff8493ce0]
> [c0000000010c4aa0] kernel_init_freeable+0x2d8/0x3a0 [   80.604550]
> [c0002dbff8493db0] [c000000000010824] kernel_init+0x2c/0x148 [
> 80.604560] [c0002dbff8493e20] [c00000000000b648]
> ret_from_kernel_thread+0x5c/0x74 [   80.604567] Instruction dump: [
> 80.604574] 392918e8 e9490000 e90a000a e92a0000 80ea000c 1d080018
> 3908ffe8 7d094214 [   80.604586] 7fa94040 419d00dc e9490010 714a0088
> <2faa0008> 409e00ac e9490000 7fbe5040 [   89.047390] drmem: 249854
> LMB(s)
> 
> After:
> [   53.424702] drmem: initializing drmem v2
> [   53.898813] drmem: 249854 LMB(s)
> 
> lmb_set_nid() is called from dlpar_lmb_add() so this patch will also
> improve memory hot-add speeds on big machines.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/drmem.h |  1 +
>  arch/powerpc/mm/drmem.c          | 24 ++++++++++++++++++++++++
>  arch/powerpc/mm/numa.c           | 29 ++++++++++-------------------
>  3 files changed, 35 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/drmem.h
> b/arch/powerpc/include/asm/drmem.h index 3d76e1c388c2..90a5a9ad872b
> 100644 --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -88,6 +88,7 @@ static inline bool drmem_lmb_reserved(struct
> drmem_lmb *lmb) return lmb->flags & DRMEM_LMB_RESERVED;
>  }
>  
> +struct drmem_lmb *drmem_find_lmb_by_base_addr(u64 base_addr);
>  u64 drmem_lmb_memory_max(void);
>  void __init walk_drmem_lmbs(struct device_node *dn,
>  			void (*func)(struct drmem_lmb *, const
> __be32 **)); diff --git a/arch/powerpc/mm/drmem.c
> b/arch/powerpc/mm/drmem.c index 44bfbdae920c..62cbe79e3860 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -11,12 +11,31 @@
>  #include <linux/of.h>
>  #include <linux/of_fdt.h>
>  #include <linux/memblock.h>
> +#include <linux/xarray.h>
>  #include <asm/prom.h>
>  #include <asm/drmem.h>
>  
> +static DEFINE_XARRAY(drmem_lmb_xa_base_addr);
>  static struct drmem_lmb_info __drmem_info;
>  struct drmem_lmb_info *drmem_info = &__drmem_info;
>  
> +static int drmem_cache_lmb_for_lookup(struct drmem_lmb *lmb)
> +{
> +	void *ret;
> +
> +	ret = xa_store(&drmem_lmb_xa_base_addr, lmb->base_addr, lmb,
> +		       GFP_KERNEL);
> +	if (xa_is_err(ret))
> +		return xa_err(ret);
> +
> +	return 0;
> +}
> +
> +struct drmem_lmb *drmem_find_lmb_by_base_addr(u64 base_addr)
> +{
> +	return xa_load(&drmem_lmb_xa_base_addr, base_addr);
> +}
> +
>  u64 drmem_lmb_memory_max(void)
>  {
>  	struct drmem_lmb *last_lmb;
> @@ -364,6 +383,8 @@ static void __init init_drmem_v1_lmbs(const
> __be32 *prop) 
>  	for_each_drmem_lmb(lmb) {
>  		read_drconf_v1_cell(lmb, &prop);
> +		if (drmem_cache_lmb_for_lookup(lmb) != 0)
> +			return;
>  		lmb_set_nid(lmb);
>  	}
>  }
> @@ -411,6 +432,9 @@ static void __init init_drmem_v2_lmbs(const
> __be32 *prop) lmb->aa_index = dr_cell.aa_index;
>  			lmb->flags = dr_cell.flags;
>  
> +			if (drmem_cache_lmb_for_lookup(lmb) != 0)
> +				return;
> +
>  			lmb_set_nid(lmb);
>  		}
>  	}
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 3c7dec70cda0..0fd7963a991e 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -958,27 +958,18 @@ early_param("topology_updates",
> early_topology_updates); static int
> hot_add_drconf_scn_to_nid(unsigned long scn_addr) {
>  	struct drmem_lmb *lmb;
> -	unsigned long lmb_size;
> -	int nid = NUMA_NO_NODE;
> -
> -	lmb_size = drmem_lmb_size();
> -
> -	for_each_drmem_lmb(lmb) {
> -		/* skip this block if it is reserved or not assigned
> to
> -		 * this partition */
> -		if ((lmb->flags & DRCONF_MEM_RESERVED)
> -		    || !(lmb->flags & DRCONF_MEM_ASSIGNED))
> -			continue;
> -
> -		if ((scn_addr < lmb->base_addr)
> -		    || (scn_addr >= (lmb->base_addr + lmb_size)))
> -			continue;
>  
> -		nid = of_drconf_to_nid_single(lmb);
> -		break;
> -	}
> +	lmb = drmem_find_lmb_by_base_addr(scn_addr);
> +	if (lmb == NULL)
> +		return NUMA_NO_NODE;
> +	
> +	/* can't use this block if it is reserved or not assigned to
> +	 * this partition */
> +	if ((lmb->flags & DRCONF_MEM_RESERVED)
> +	    || !(lmb->flags & DRCONF_MEM_ASSIGNED))
> +		return NUMA_NO_NODE;
>  
> -	return nid;
> +	return of_drconf_to_nid_single(lmb);
>  }
>  
>  /*


^ permalink raw reply

* Re: OF: Can't handle multiple dma-ranges with different offsets
From: Chris Packham @ 2020-07-22 22:11 UTC (permalink / raw)
  To: robh+dt@kernel.org, frowand.list@gmail.com, mpe@ellerman.id.au,
	benh@kernel.crashing.org, paulus@samba.org,
	christophe.leroy@c-s.fr
  Cc: devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <5cb3aaa7-e05e-5fbc-db42-60e07acdaf05@alliedtelesis.co.nz>


On 22/07/20 4:19 pm, Chris Packham wrote:
> Hi,
>
> I've just fired up linux kernel v5.7 on a p2040 based system and I'm 
> getting the following new warning
>
> OF: Can't handle multiple dma-ranges with different offsets on 
> node(/pcie@ffe202000)
> OF: Can't handle multiple dma-ranges with different offsets on 
> node(/pcie@ffe202000)
>
> The warning itself was added in commit 9d55bebd9816 ("of/address: 
> Support multiple 'dma-ranges' entries") but I gather it's pointing out 
> something about the dts. My boards dts is based heavily on 
> p2041rdb.dts and the relevant pci2 section is identical (reproduced 
> below for reference).
>
>     pci2: pcie@ffe202000 {
>         reg = <0xf 0xfe202000 0 0x1000>;
>         ranges = <0x02000000 0 0xe0000000 0xc 0x40000000 0 0x20000000
>               0x01000000 0 0x00000000 0xf 0xf8020000 0 0x00010000>;
>         pcie@0 {
>             ranges = <0x02000000 0 0xe0000000
>                   0x02000000 0 0xe0000000
>                   0 0x20000000
>
>                   0x01000000 0 0x00000000
>                   0x01000000 0 0x00000000
>                   0 0x00010000>;
>         };
>     };
>
> I haven't noticed any ill effect (aside from the scary message). I'm 
> not sure if there's something missing in the dts or in the code that 
> checks the ranges. Any guidance would be appreciated.

I've also just checked the T2080RDB on v5.7.9 which shows a similar issue

OF: Can't handle multiple dma-ranges with different offsets on 
node(/pcie@ffe250000)
OF: Can't handle multiple dma-ranges with different offsets on 
node(/pcie@ffe250000)
pcieport 0000:00:00.0: Invalid size 0xfffff9 for dma-range
pcieport 0000:00:00.0: AER: enabled with IRQ 21
OF: Can't handle multiple dma-ranges with different offsets on 
node(/pcie@ffe270000)
OF: Can't handle multiple dma-ranges with different offsets on 
node(/pcie@ffe270000)
pcieport 0001:00:00.0: Invalid size 0xfffff9 for dma-range
pcieport 0001:00:00.0: AER: enabled with IRQ 23



^ permalink raw reply

* [PATCH devicetree 4/4] powerpc: dts: t1040rdb: add ports for Seville Ethernet switch
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
  To: robh+dt, shawnguo, mpe, devicetree
  Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
	netdev, linuxppc-dev
In-Reply-To: <20200722172422.2590489-1-olteanv@gmail.com>

Define the network interface names for the switch ports and hook them up
to the 2 QSGMII PHYs that are onboard.

A conscious decision was taken to go along with the numbers that are
written on the front panel of the board and not with the hardware
numbers of the switch chip ports. The 2 are shifted by 4.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 arch/powerpc/boot/dts/fsl/t1040rdb.dts | 111 +++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t1040rdb.dts b/arch/powerpc/boot/dts/fsl/t1040rdb.dts
index 40d7126dbe90..28ee06a1706d 100644
--- a/arch/powerpc/boot/dts/fsl/t1040rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t1040rdb.dts
@@ -75,4 +75,115 @@ &mdio0 {
 	phy_sgmii_2: ethernet-phy@3 {
 		reg = <0x3>;
 	};
+
+	/* VSC8514 QSGMII PHY */
+	phy_qsgmii_0: ethernet-phy@4 {
+		reg = <0x4>;
+	};
+
+	phy_qsgmii_1: ethernet-phy@5 {
+		reg = <0x5>;
+	};
+
+	phy_qsgmii_2: ethernet-phy@6 {
+		reg = <0x6>;
+	};
+
+	phy_qsgmii_3: ethernet-phy@7 {
+		reg = <0x7>;
+	};
+
+	/* VSC8514 QSGMII PHY */
+	phy_qsgmii_4: ethernet-phy@8 {
+		reg = <0x8>;
+	};
+
+	phy_qsgmii_5: ethernet-phy@9 {
+		reg = <0x9>;
+	};
+
+	phy_qsgmii_6: ethernet-phy@a {
+		reg = <0xa>;
+	};
+
+	phy_qsgmii_7: ethernet-phy@b {
+		reg = <0xb>;
+	};
+};
+
+&seville_port0 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_0>;
+	phy-mode = "qsgmii";
+	/* ETH4 written on chassis */
+	label = "swp4";
+	status = "okay";
+};
+
+&seville_port1 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_1>;
+	phy-mode = "qsgmii";
+	/* ETH5 written on chassis */
+	label = "swp5";
+	status = "okay";
+};
+
+&seville_port2 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_2>;
+	phy-mode = "qsgmii";
+	/* ETH6 written on chassis */
+	label = "swp6";
+	status = "okay";
+};
+
+&seville_port3 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_3>;
+	phy-mode = "qsgmii";
+	/* ETH7 written on chassis */
+	label = "swp7";
+	status = "okay";
+};
+
+&seville_port4 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_4>;
+	phy-mode = "qsgmii";
+	/* ETH8 written on chassis */
+	label = "swp8";
+	status = "okay";
+};
+
+&seville_port5 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_5>;
+	phy-mode = "qsgmii";
+	/* ETH9 written on chassis */
+	label = "swp9";
+	status = "okay";
+};
+
+&seville_port6 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_6>;
+	phy-mode = "qsgmii";
+	/* ETH10 written on chassis */
+	label = "swp10";
+	status = "okay";
+};
+
+&seville_port7 {
+	managed = "in-band-status";
+	phy-handle = <&phy_qsgmii_7>;
+	phy-mode = "qsgmii";
+	/* ETH11 written on chassis */
+	label = "swp11";
+	status = "okay";
+};
+
+&seville_port8 {
+	ethernet = <&enet0>;
+	status = "okay";
 };
-- 
2.25.1


^ permalink raw reply related

* [PATCH devicetree 2/4] powerpc: dts: t1040: label the 2 MDIO controllers
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
  To: robh+dt, shawnguo, mpe, devicetree
  Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
	netdev, linuxppc-dev
In-Reply-To: <20200722172422.2590489-1-olteanv@gmail.com>

In preparation of referencing the MDIO nodes from board DTS files (so
that we can add PHY nodes easier), add labels to mdio0 and mdio1.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
index 4af856dcc6a3..e1b138b3c714 100644
--- a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
@@ -620,11 +620,11 @@ enet3: ethernet@e6000 {
 		enet4: ethernet@e8000 {
 		};
 
-		mdio@fc000 {
+		mdio0: mdio@fc000 {
 			interrupts = <100 1 0 0>;
 		};
 
-		mdio@fd000 {
+		mdio1: mdio@fd000 {
 			status = "disabled";
 		};
 	};
-- 
2.25.1


^ permalink raw reply related

* [PATCH devicetree 3/4] powerpc: dts: t1040rdb: put SGMII PHY under &mdio0 label
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
  To: robh+dt, shawnguo, mpe, devicetree
  Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
	netdev, linuxppc-dev
In-Reply-To: <20200722172422.2590489-1-olteanv@gmail.com>

We're going to add 8 more PHYs in a future patch. It is easier to follow
the hardware description if we don't need to fish for the path of the
MDIO controllers inside the SoC and just use the labels.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 arch/powerpc/boot/dts/fsl/t1040rdb.dts | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/t1040rdb.dts b/arch/powerpc/boot/dts/fsl/t1040rdb.dts
index 65ff34c49025..40d7126dbe90 100644
--- a/arch/powerpc/boot/dts/fsl/t1040rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t1040rdb.dts
@@ -59,12 +59,6 @@ ethernet@e4000 {
 				phy-handle = <&phy_sgmii_2>;
 				phy-connection-type = "sgmii";
 			};
-
-			mdio@fc000 {
-				phy_sgmii_2: ethernet-phy@3 {
-					reg = <0x03>;
-				};
-			};
 		};
 	};
 
@@ -76,3 +70,9 @@ cpld@3,0 {
 };
 
 #include "t1040si-post.dtsi"
+
+&mdio0 {
+	phy_sgmii_2: ethernet-phy@3 {
+		reg = <0x3>;
+	};
+};
-- 
2.25.1


^ permalink raw reply related

* [PATCH devicetree 0/4] Add Seville Ethernet switch to T1040RDB
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
  To: robh+dt, shawnguo, mpe, devicetree
  Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
	netdev, linuxppc-dev

Seville is a DSA switch that is embedded inside the T1040 SoC, and
supported by the mscc_seville DSA driver. The driver has been accepted
this release cycle and is currently available in net-next (and
therefore, in linux-next).

This series adds this switch to the SoC's dtsi files and to the T1040RDB
board file.

Vladimir Oltean (4):
  powerpc: dts: t1040: add bindings for Seville Ethernet switch
  powerpc: dts: t1040: label the 2 MDIO controllers
  powerpc: dts: t1040rdb: put SGMII PHY under &mdio0 label
  powerpc: dts: t1040rdb: add ports for Seville Ethernet switch

 arch/powerpc/boot/dts/fsl/t1040rdb.dts      | 123 +++++++++++++++++++-
 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi |  79 ++++++++++++-
 2 files changed, 194 insertions(+), 8 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH devicetree 1/4] powerpc: dts: t1040: add bindings for Seville Ethernet switch
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
  To: robh+dt, shawnguo, mpe, devicetree
  Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
	netdev, linuxppc-dev
In-Reply-To: <20200722172422.2590489-1-olteanv@gmail.com>

Add the description of the embedded L2 switch inside the SoC dtsi file
for NXP T1040.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi | 75 +++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
index 315d0557eefc..4af856dcc6a3 100644
--- a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
@@ -628,6 +628,81 @@ mdio@fd000 {
 			status = "disabled";
 		};
 	};
+
+	seville_switch: ethernet-switch@800000 {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "mscc,vsc9953-switch";
+		little-endian;
+		reg = <0x800000 0x290000>;
+
+		ports {
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			seville_port0: port@0 {
+				reg = <0>;
+				status = "disabled";
+			};
+
+			seville_port1: port@1 {
+				reg = <1>;
+				status = "disabled";
+			};
+
+			seville_port2: port@2 {
+				reg = <2>;
+				status = "disabled";
+			};
+
+			seville_port3: port@3 {
+				reg = <3>;
+				status = "disabled";
+			};
+
+			seville_port4: port@4 {
+				reg = <4>;
+				status = "disabled";
+			};
+
+			seville_port5: port@5 {
+				reg = <5>;
+				status = "disabled";
+			};
+
+			seville_port6: port@6 {
+				reg = <6>;
+				status = "disabled";
+			};
+
+			seville_port7: port@7 {
+				reg = <7>;
+				status = "disabled";
+			};
+
+			seville_port8: port@8 {
+				reg = <8>;
+				phy-mode = "internal";
+				status = "disabled";
+
+				fixed-link {
+					speed = <2500>;
+					full-duplex;
+				};
+			};
+
+			seville_port9: port@9 {
+				reg = <9>;
+				phy-mode = "internal";
+				status = "disabled";
+
+				fixed-link {
+					speed = <2500>;
+					full-duplex;
+				};
+			};
+		};
+	};
 };
 
 &qe {
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Atish Patra @ 2020-07-22 21:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Albert Ou, Alexandre Ghiti, Linux-MM, Anup Patel,
	linux-kernel@vger.kernel.org, Atish Patra, Palmer Dabbelt,
	Zong Li, Paul Walmsley, Paul Mackerras, linux-riscv, linuxppc-dev
In-Reply-To: <CAK8P3a2VHXDLK6iba=NxSQ-t=9P7LSwzwx3XrK=N=M+qoX_oeQ@mail.gmail.com>

On Wed, Jul 22, 2020 at 1:23 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Jul 22, 2020 at 9:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > On Wed, 22 Jul 2020 02:43:50 PDT (-0700), Arnd Bergmann wrote:
> > > On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > > The eventual goal is to have a split of 3840MB for either user or linear map
> > > plus and 256MB for vmalloc, including the kernel. Switching between linear
> > > and user has a noticeable runtime overhead, but it relaxes both the limits
> > > for user memory and lowmem, and it provides a somewhat stronger
> > > address space isolation.
> >
> > Ya, I think we decided not to do that, at least for now.  I guess the right
> > answer there will depend on what 32-bit systems look like, and since we don't
> > have any I'm inclined to just stick to the fast option.
>
> Makes sense. Actually on 32-bit Arm we see fewer large-memory
> configurations in new machines than we had in the past before 64-bit
> machines were widely available at low cost, so I expect not to see a
> lot new hardware with more than 1GB of DDR3 (two 256Mbit x16 chips)
> for cost reasons, and rv32 is likely going to be similar, so you may never
> really see a need for highmem or the above hack to increase the
> size of the linear mapping.
>
> I just noticed that rv32 allows 2GB of lowmem rather than just the usual
> 768MB or 1GB, at the expense of addressable user memory. This seems
> like an unusual choice, but I also don't see any reason to change this
> or make it more flexible unless actual users appear.
>

I am a bit confused here. As per my understanding, RV32 supports 1GB
of lowmem only
as the page offset is set to 0xC0000000. The config option
MAXPHYSMEM_2GB is misleading
as RV32 actually allows 1GB of physical memory only. Any memory blocks beyond
DRAM + 1GB are removed in setup_bootmem. IMHO, The current config
should clarify that.

Moreover, we should add 2G split under a separate configuration if we
want to support that.

>        Arnd
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv



-- 
Regards,
Atish

^ permalink raw reply

* Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Arnd Bergmann @ 2020-07-22 20:22 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Albert Ou, Alexandre Ghiti, Atish Patra, Anup Patel,
	linux-kernel@vger.kernel.org, Paul Walmsley, Linux-MM,
	Paul Mackerras, Zong Li, linux-riscv, linuxppc-dev
In-Reply-To: <mhng-820ebe55-b4a3-4ab3-b848-6d3551b43091@palmerdabbelt-glaptop1>

On Wed, Jul 22, 2020 at 9:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> On Wed, 22 Jul 2020 02:43:50 PDT (-0700), Arnd Bergmann wrote:
> > On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > The eventual goal is to have a split of 3840MB for either user or linear map
> > plus and 256MB for vmalloc, including the kernel. Switching between linear
> > and user has a noticeable runtime overhead, but it relaxes both the limits
> > for user memory and lowmem, and it provides a somewhat stronger
> > address space isolation.
>
> Ya, I think we decided not to do that, at least for now.  I guess the right
> answer there will depend on what 32-bit systems look like, and since we don't
> have any I'm inclined to just stick to the fast option.

Makes sense. Actually on 32-bit Arm we see fewer large-memory
configurations in new machines than we had in the past before 64-bit
machines were widely available at low cost, so I expect not to see a
lot new hardware with more than 1GB of DDR3 (two 256Mbit x16 chips)
for cost reasons, and rv32 is likely going to be similar, so you may never
really see a need for highmem or the above hack to increase the
size of the linear mapping.

I just noticed that rv32 allows 2GB of lowmem rather than just the usual
768MB or 1GB, at the expense of addressable user memory. This seems
like an unusual choice, but I also don't see any reason to change this
or make it more flexible unless actual users appear.

       Arnd

^ permalink raw reply

* Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Palmer Dabbelt @ 2020-07-22 19:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: aou, alex, Atish Patra, Anup Patel, linux-kernel, Paul Walmsley,
	linux-mm, paulus, zong.li, linux-riscv, linuxppc-dev
In-Reply-To: <CAK8P3a34sT2bQbkZUjaxaShzCkn+s35pXxS0UNhqGFu+t2hZYw@mail.gmail.com>

On Wed, 22 Jul 2020 02:43:50 PDT (-0700), Arnd Bergmann wrote:
> On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>
>> On Tue, 21 Jul 2020 11:36:10 PDT (-0700), alex@ghiti.fr wrote:
>> > Let's try to make progress here: I add linux-mm in CC to get feedback on
>> > this patch as it blocks sv48 support too.
>>
>> Sorry for being slow here.  I haven't replied because I hadn't really fleshed
>> out the design yet, but just so everyone's on the same page my problems with
>> this are:
>>
>> * We waste vmalloc space on 32-bit systems, where there isn't a lot of it.
>
> There is actually an ongoing work to make 32-bit Arm kernels move
> vmlinux into the vmalloc space, as part of the move to avoid highmem.
>
> Overall, a 32-bit system would waste about 0.1% of its virtual address space
> by having the kernel be located in both the linear map and the vmalloc area.
> It's not zero, but not that bad either. With the typical split of 3072 MB user,
> 768MB linear and 256MB vmalloc, it's also around 1.5% of the available
> vmalloc area (assuming a 4MB vmlinux in a typical 32-bit kernel), but the
> boundaries can be changed arbitrarily if needed.

OK, I guess maybe it's not so bad.  Our 32-bit defconfig is 10MiB, but I
wouldn't really put much weight behind that number as it's just a 64-bit
defconfig built for 32-bit.  We don't have any 32-bit hardware anyway, so if
this becomes an issue later I guess we can just deal with it then.

> The eventual goal is to have a split of 3840MB for either user or linear map
> plus and 256MB for vmalloc, including the kernel. Switching between linear
> and user has a noticeable runtime overhead, but it relaxes both the limits
> for user memory and lowmem, and it provides a somewhat stronger
> address space isolation.

Ya, I think we decided not to do that, at least for now.  I guess the right
answer there will depend on what 32-bit systems look like, and since we don't
have any I'm inclined to just stick to the fast option.

> Another potential idea would be to completely randomize the physical
> addresses underneath the kernel by using a random permutation of the
> pages in the kernel image. This adds even more overhead (virt_to_phys
> may need to call vmalloc_to_page or similar) and may cause problems
> with DMA into kernel .data across page boundaries,
>
>> * Sort out how to maintain a linear map as the canonical hole moves around
>>   between the VA widths without adding a bunch of overhead to the virt2phys and
>>   friends.  This is probably going to be the trickiest part, but I think if we
>>   just change the page table code to essentially lie about VAs when an sv39
>>   system runs an sv48+sv39 kernel we could make it work -- there'd be some
>>   logical complexity involved, but it would remain fast.
>
> I assume you can't use the trick that x86 has where all kernel addresses
> are at the top of the 64-bit address space and user addresses are at the
> bottom, regardless of the size of the page tables?

They have the load in their mapping functions, as far as I can tell that's
required to do this sort of thing.  We do as well to handle some of the
implicit boot stuff for now, but I was assuming that we'd want to get rid of
that for performance reasons.  That said, maybe it just doesn't matter?  

^ permalink raw reply

* Re: [PATCH v4 07/12] ppc64/kexec_file: add support to relocate purgatory
From: Hari Bathini @ 2020-07-22 17:33 UTC (permalink / raw)
  To: Michael Ellerman, Andrew Morton
  Cc: kernel test robot, Pingfan Liu, Kexec-ml, Nayna Jain,
	Petr Tesarik, Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev,
	Sourabh Jain, Vivek Goyal, Dave Young, Thiago Jung Bauermann,
	Eric Biederman
In-Reply-To: <871rl4rxao.fsf@mpe.ellerman.id.au>



On 22/07/20 9:55 am, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Right now purgatory implementation is only minimal. But if purgatory
>> code is to be enhanced to copy memory to the backup region and verify
>> sha256 digest, relocations may have to be applied to the purgatory.
>> So, add support to relocate purgatory in kexec_file_load system call
>> by setting up TOC pointer and applying RELA relocations as needed.
>>
>> Reported-by: kernel test robot <lkp@intel.com>
>> [lkp: In v1, 'struct mem_sym' was declared in parameter list]
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>
>> * Michael, can you share your opinion on the below:
>>     - https://lore.kernel.org/patchwork/patch/1272027/
>>     - My intention in cover note.
> 
> It seems like a lot of complexity for little benefit.
> 
> AFAICS your final purgatory_64.c is only 36 lines, and all it does is a
> single (open coded) memcpy().
> 
> It seems like we could write that in not many more lines of assembler
> and avoid all this code.

Hi Michael,

I am not sure if you would agree with me on this, but I am looking at the
purgatory code as work in progress. As mentioned in the cover note, I intend
to add log messaging, sha256 verification into purgatory. And also change it
to position independent executable after moving common purgatory code (sha256
verification) to arch-independent code.

When I initially took this up, I wanted to add all the above changes too, but
cut down on it, in the interest of time, first to get kdump (kexec -s -p)
working in v5.9 merge window.

But as the logic in patches 07/12 & 08/12 has been tested in kexec-tools code
a lot of times and there are unlikely to be any changes to them except for
__kexec_do_relocs() function (afaics), when -PIE would be used, I submitted them.
With patch 09/12, I tried for a change that uses relocations while is minimal
for now.

Would you prefer it to be absolutely minimal by dropping patches 7 & 8 for
now and writing the backup data copy code in assembler?

Thanks
Hari

^ permalink raw reply

* Re: [PATCH v3 0/2] Selftest for cpuidle latency measurement
From: Pratik Sampat @ 2020-07-22 15:33 UTC (permalink / raw)
  To: Daniel Lezcano, rjw, mpe, benh, paulus, srivatsa, shuah, npiggin,
	ego, svaidy, pratik.r.sampat, linux-pm, linuxppc-dev,
	linux-kernel, linux-kselftest
In-Reply-To: <17e884b8-09d8-98a8-3890-bf506d2cdfca@linaro.org>

Hello Daniel,

On 21/07/20 8:27 pm, Daniel Lezcano wrote:
> On 21/07/2020 14:42, Pratik Rajesh Sampat wrote:
>> v2: https://lkml.org/lkml/2020/7/17/369
>> Changelog v2-->v3
>> Based on comments from Gautham R. Shenoy adding the following in the
>> selftest,
>> 1. Grepping modules to determine if already loaded
>> 2. Wrapper to enable/disable states
>> 3. Preventing any operation/test on offlined CPUs
>> ---
>>
>> The patch series introduces a mechanism to measure wakeup latency for
>> IPI and timer based interrupts
>> The motivation behind this series is to find significant deviations
>> behind advertised latency and resisdency values
> Why do you want to measure for the timer and the IPI ? Whatever the
> source of the wakeup, the exit latency remains the same, no ?
>
> Is all this kernel-ish code really needed ?
>
> What about using a highres periodic timer and make it expires every eg.
> 50ms x 2400, so it is 120 secondes and measure the deviation. Repeat the
> operation for each idle states.
>
> And in order to make it as much accurate as possible, set the program
> affinity on a CPU and isolate this one by preventing other processes to
> be scheduled on and migrate the interrupts on the other CPUs.
>
> That will be all userspace code, no?
>
>
The kernel module may not needed now that you mention it.
IPI latencies could be measured using pipes and threads using
pthread_attr_setaffinity_np to control the experiment, as you
suggested. This should internally fire a smp_call_function_single.

The original idea was to essentially measure it as closely as possible
in the kernel without involving the kernel-->userspace overhead.
However, the user-space approach may not be too much of a problem as
we are collecting a baseline and the delta of the latency is what we
would be concerned about anyways!

With respect to measuring both timers and IPI latencies: In principle
yes, the exit latency should remain the same but if there is a
deviation in reality we may want to measure it.

I'll implement this experiment in the userspace and get back with the
numbers to confirm.

Thanks for your comments!
Best,
Pratik

>
>


^ permalink raw reply

* Re: [PATCH] spi: ppc4xx: Convert to use GPIO descriptors
From: Mark Brown @ 2020-07-22 13:45 UTC (permalink / raw)
  To: linux-spi, Linus Walleij; +Cc: linuxppc-dev
In-Reply-To: <20200714072226.26071-1-linus.walleij@linaro.org>

On Tue, 14 Jul 2020 09:22:26 +0200, Linus Walleij wrote:
> This converts the PPC4xx SPI driver to use GPIO descriptors.
> 
> The driver is already just picking some GPIOs from the device
> tree so the conversion is pretty straight forward. However
> this driver is looking form a pure "gpios" property rather
> than the standard binding "cs-gpios" so we need to add a new
> exception to the gpiolib OF parser to allow this for this
> driver's compatibles.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/1] spi: ppc4xx: Convert to use GPIO descriptors
      commit: 4726773292bfdb2917a0b4d369ddccd5e2f30991

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply

* RE: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register
From: Michael Ellerman @ 2020-07-22 12:45 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, aik, kvm-ppc, bharata, sathnaga, sukadev, linuxppc-dev,
	bauerman, david
In-Reply-To: <20200722074929.GI7339@oc0525413822.ibm.com>

Ram Pai <linuxram@us.ibm.com> writes:
> On Wed, Jul 22, 2020 at 12:06:06PM +1000, Michael Ellerman wrote:
>> Ram Pai <linuxram@us.ibm.com> writes:
>> > An instruction accessing a mmio address, generates a HDSI fault.  This fault is
>> > appropriately handled by the Hypervisor.  However in the case of secureVMs, the
>> > fault is delivered to the ultravisor.
>> >
>> > Unfortunately the Ultravisor has no correct-way to fetch the faulting
>> > instruction. The PEF architecture does not allow Ultravisor to enable MMU
>> > translation. Walking the two level page table to read the instruction can race
>> > with other vcpus modifying the SVM's process scoped page table.
>> 
>> You're trying to read the guest's kernel text IIUC, that mapping should
>> be stable. Possibly permissions on it could change over time, but the
>> virtual -> real mapping should not.
>
> Actually the code does not capture the address of the instruction in the
> sprg0 register. It captures the instruction itself. So should the mapping
> matter?

Sorry that was talking about reading the instruction by doing the page
walk, not with this patch applied.

cheers

^ permalink raw reply

* Re: [v3 07/15] powerpc/perf: Add power10_feat to dt_cpu_ftrs
From: Athira Rajeev @ 2020-07-22 12:28 UTC (permalink / raw)
  To: Jordan Niethe
  Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
	acme, jolsa, linuxppc-dev
In-Reply-To: <CACzsE9p-mNGptKi_+RSOOhvmW5gfLcKEbtoS6ah_5ZmVCThfpQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8112 bytes --]



> On 22-Jul-2020, at 4:19 PM, Jordan Niethe <jniethe5@gmail.com> wrote:
> 
> On Wed, Jul 22, 2020 at 5:55 PM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> wrote:
>> 
>> 
>> 
>> On 22-Jul-2020, at 10:11 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
>> 
>> On Sat, Jul 18, 2020 at 1:13 AM Athira Rajeev
>> <atrajeev@linux.vnet.ibm.com> wrote:
>> 
>> 
>> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>> 
>> Add power10 feature function to dt_cpu_ftrs.c along
>> with a power10 specific init() to initialize pmu sprs,
>> sets the oprofile_cpu_type and cpu_features. This will
>> enable performance monitoring unit(PMU) for Power10
>> in CPU features with "performance-monitor-power10".
>> 
>> For PowerISA v3.1, BHRB disable is controlled via Monitor Mode
>> Control Register A (MMCRA) bit, namely "BHRB Recording Disable
>> (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB
>> feature at boot for power10.
>> 
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>> ---
>> arch/powerpc/include/asm/reg.h        |  3 +++
>> arch/powerpc/kernel/cpu_setup_power.S |  8 ++++++++
>> arch/powerpc/kernel/dt_cpu_ftrs.c     | 26 ++++++++++++++++++++++++++
>> 3 files changed, 37 insertions(+)
>> 
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 21a1b2d..900ada1 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -1068,6 +1068,9 @@
>> #define MMCR0_PMC2_LOADMISSTIME        0x5
>> #endif
>> 
>> +/* BHRB disable bit for PowerISA v3.10 */
>> +#define MMCRA_BHRB_DISABLE     0x0000002000000000
>> 
>> Shouldn't this go under SPRN_MMCRA with the other MMCRA_*.
>> 
>> 
>> 
>> Hi Jordan
>> 
>> Ok, the definition of MMCRA is under #ifdef for 64 bit .  if I move definition of MMCRA_BHRB_DISABLE along with other SPR's, I also
>> need to define this for 32-bit to satisfy core-book3s to compile as below:
>> 
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 900ada10762c..7e271657b412 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -888,6 +888,8 @@
>> #define   MMCRA_SLOT   0x07000000UL /* SLOT bits (37-39) */
>> #define   MMCRA_SLOT_SHIFT     24
>> #define   MMCRA_SAMPLE_ENABLE 0x00000001UL /* enable sampling */
>> +/* BHRB disable bit for PowerISA v3.10 */
>> +#define   MMCRA_BHRB_DISABLE  0x0000002000000000
>> #define   POWER6_MMCRA_SDSYNC 0x0000080000000000ULL    /* SDAR/SIAR synced */
>> #define   POWER6_MMCRA_SIHV   0x0000040000000000ULL
>> #define   POWER6_MMCRA_SIPR   0x0000020000000000ULL
>> @@ -1068,9 +1070,6 @@
>> #define MMCR0_PMC2_LOADMISSTIME        0x5
>> #endif
>> 
>> 
>> 
>> -/* BHRB disable bit for PowerISA v3.10 */
>> -#define MMCRA_BHRB_DISABLE     0x0000002000000000
>> -
>> /*
>>  * SPRG usage:
>>  *
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 36baae666387..88068f20827c 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -94,6 +94,7 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
>> #define SPRN_SIER2             0
>> #define SPRN_SIER3             0
>> #define MMCRA_SAMPLE_ENABLE    0
>> +#define MMCRA_BHRB_DISABLE     0
>> 
>> 
>> 
>> static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
>> {
>> 
>> 
>> 
>> +
>> /*
>> * SPRG usage:
>> *
>> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
>> index efdcfa7..b8e0d1e 100644
>> --- a/arch/powerpc/kernel/cpu_setup_power.S
>> +++ b/arch/powerpc/kernel/cpu_setup_power.S
>> @@ -94,6 +94,7 @@ _GLOBAL(__restore_cpu_power8)
>> _GLOBAL(__setup_cpu_power10)
>>       mflr    r11
>>       bl      __init_FSCR_power10
>> +       bl      __init_PMU_ISA31
>> 
>> So we set MMCRA here but then aren't we still going to call __init_PMU
>> which will overwrite that?
>> Would this setting MMCRA also need to be handled in __restore_cpu_power10?
>> 
>> 
>> Thanks for this nice catch !  When I rebased code initial phase, we didn’t had power10 part filled in.
>> It was a miss from my side in adding PMu init functions and thanks for pointing this out.
>> Below patch will call __init_PMU functions in setup and restore. Please check if this looks good
>> 
>> --
>> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
>> index efdcfa714106..e672a6c5fd7c 100644
>> --- a/arch/powerpc/kernel/cpu_setup_power.S
>> +++ b/arch/powerpc/kernel/cpu_setup_power.S
>> @@ -94,6 +94,9 @@ _GLOBAL(__restore_cpu_power8)
>> _GLOBAL(__setup_cpu_power10)
>>  mflr r11
>>  bl __init_FSCR_power10
>> + bl __init_PMU
>> + bl __init_PMU_ISA31
>> + bl __init_PMU_HV
>>  b 1f
>> 
>> _GLOBAL(__setup_cpu_power9)
> 
> Won't you also need to change where the label 1 is:
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -100,8 +100,8 @@ _GLOBAL(__setup_cpu_power10)
> _GLOBAL(__setup_cpu_power9)
>        mflr    r11
>        bl      __init_FSCR
> -1:     bl      __init_PMU
> -       bl      __init_hvmode_206
> +       bl      __init_PMU
> +1:     bl      __init_hvmode_206
>        mtlr    r11
>        beqlr
>        li      r0,0

HI Jordan

I will address these comments and include changes for cpu_setup_power.S in a separate patch as suggested by Michael Ellerman

Thanks
Athira
> 
>> @@ -124,6 +127,9 @@ _GLOBAL(__setup_cpu_power9)
>> _GLOBAL(__restore_cpu_power10)
>>  mflr r11
>>  bl __init_FSCR_power10
>> + bl __init_PMU
>> + bl __init_PMU_ISA31
>> + bl __init_PMU_HV
>>  b 1f
>> 
>> _GLOBAL(__restore_cpu_power9)
>> @@ -233,3 +239,10 @@ __init_PMU_ISA207:
>>  li r5,0
>>  mtspr SPRN_MMCRS,r5
>>  blr
>> +
>> +__init_PMU_ISA31:
>> + li r5,0
>> + mtspr SPRN_MMCR3,r5
>> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
>> + mtspr SPRN_MMCRA,r5
>> + blr
>> 
>> —
>> 
>>       b       1f
>> 
>> _GLOBAL(__setup_cpu_power9)
>> @@ -233,3 +234,10 @@ __init_PMU_ISA207:
>>       li      r5,0
>>       mtspr   SPRN_MMCRS,r5
>>       blr
>> +
>> +__init_PMU_ISA31:
>> +       li      r5,0
>> +       mtspr   SPRN_MMCR3,r5
>> +       LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
>> +       mtspr   SPRN_MMCRA,r5
>> +       blr
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> index 3a40951..f482286 100644
>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> @@ -450,6 +450,31 @@ static int __init feat_enable_pmu_power9(struct dt_cpu_feature *f)
>>       return 1;
>> }
>> 
>> +static void init_pmu_power10(void)
>> +{
>> +       init_pmu_power9();
>> +
>> +       mtspr(SPRN_MMCR3, 0);
>> +       mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
>> +}
>> +
>> +static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f)
>> +{
>> +       hfscr_pmu_enable();
>> +
>> +       init_pmu_power10();
>> +       init_pmu_registers = init_pmu_power10;
>> +
>> +       cur_cpu_spec->cpu_features |= CPU_FTR_MMCRA;
>> +       cur_cpu_spec->cpu_user_features |= PPC_FEATURE_PSERIES_PERFMON_COMPAT;
>> +
>> +       cur_cpu_spec->num_pmcs          = 6;
>> +       cur_cpu_spec->pmc_type          = PPC_PMC_IBM;
>> +       cur_cpu_spec->oprofile_cpu_type = "ppc64/power10";
>> +
>> +       return 1;
>> +}
>> +
>> static int __init feat_enable_tm(struct dt_cpu_feature *f)
>> {
>> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> @@ -639,6 +664,7 @@ struct dt_cpu_feature_match {
>>       {"pc-relative-addressing", feat_enable, 0},
>>       {"machine-check-power9", feat_enable_mce_power9, 0},
>>       {"performance-monitor-power9", feat_enable_pmu_power9, 0},
>> +       {"performance-monitor-power10", feat_enable_pmu_power10, 0},
>>       {"event-based-branch-v3", feat_enable, 0},
>>       {"random-number-generator", feat_enable, 0},
>>       {"system-call-vectored", feat_disable, 0},
>> --
>> 1.8.3.1


[-- Attachment #2: Type: text/html, Size: 26517 bytes --]

^ permalink raw reply

* Re: [PATCH v3] powerpc/pseries: Avoid using addr_to_pfn in realmode
From: Ganesh @ 2020-07-22 10:37 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, mpe; +Cc: aneesh.kumar, mahesh
In-Reply-To: <1595325962.zikec6nkw7.astroid@bobo.none>

[-- Attachment #1: Type: text/plain, Size: 9104 bytes --]



On 7/21/20 3:38 PM, Nicholas Piggin wrote:
> Excerpts from Ganesh Goudar's message of July 20, 2020 6:03 pm:
>> When an UE or memory error exception is encountered the MCE handler
>> tries to find the pfn using addr_to_pfn() which takes effective
>> address as an argument, later pfn is used to poison the page where
>> memory error occurred, recent rework in this area made addr_to_pfn
>> to run in realmode, which can be fatal as it may try to access
>> memory outside RMO region.
>>
>> To fix this have separate functions for realmode and virtual mode
>> handling and let addr_to_pfn to run in virtual mode.
> You didn't really explain what you moved around. You added some
> helper functions, but what does it actually do differently now? Can you
> explain that in the changelog?

Sure, ill rephrase the changelog, here I have moved all that we can and we must
do in virtual mode to new helper function which runs in virtual mode, like filling
mce error info, using addr_to_pfn and calling save_mce_event().

>
> Thanks,
> Nick
>
>> Without this fix following kernel crash is seen on hitting UE.
>>
>> [  485.128036] Oops: Kernel access of bad area, sig: 11 [#1]
>> [  485.128040] LE SMP NR_CPUS=2048 NUMA pSeries
>> [  485.128047] Modules linked in:
>> [  485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22
>> [  485.128074] NIP:  c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0
>> [  485.128078] REGS: c000000003f1f970 TRAP: 0300   Tainted: G OE (5.7.0)
>> [  485.128082] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 28008284  XER: 00000001
>> [  485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1
>> [  485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000
>> [  485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022
>> [  485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14
>> [  485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000
>> [  485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100
>> [  485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50
>> [  485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a
>> [  485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a
>> [  485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310
>> [  485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170
>> [  485.128133] Call Trace:
>> [  485.128135] Instruction dump:
>> [  485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff
>> [  485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040
>> [  485.128152] ---[ end trace d34b27e29ae0e340 ]---
>>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>> V2: Leave bare metal code and save_mce_event as is.
>>
>> V3: Have separate functions for realmode and virtual mode handling.
>> ---
>>   arch/powerpc/platforms/pseries/ras.c | 119 ++++++++++++++++-----------
>>   1 file changed, 70 insertions(+), 49 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index f3736fcd98fc..32fe3fad86b8 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -522,18 +522,55 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>   	return 0; /* need to perform reset */
>>   }
>>   
>> +static int mce_handle_err_realmode(int disposition, u8 error_type)
>> +{
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +	if (disposition == RTAS_DISP_NOT_RECOVERED) {
>> +		switch (error_type) {
>> +		case	MC_ERROR_TYPE_SLB:
>> +		case	MC_ERROR_TYPE_ERAT:
>> +			/*
>> +			 * Store the old slb content in paca before flushing.
>> +			 * Print this when we go to virtual mode.
>> +			 * There are chances that we may hit MCE again if there
>> +			 * is a parity error on the SLB entry we trying to read
>> +			 * for saving. Hence limit the slb saving to single
>> +			 * level of recursion.
>> +			 */
>> +			if (local_paca->in_mce == 1)
>> +				slb_save_contents(local_paca->mce_faulty_slbs);
>> +			flush_and_reload_slb();
>> +			disposition = RTAS_DISP_FULLY_RECOVERED;
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	} else if (disposition == RTAS_DISP_LIMITED_RECOVERY) {
>> +		/* Platform corrected itself but could be degraded */
>> +		pr_err("MCE: limited recovery, system may be degraded\n");
>> +		disposition = RTAS_DISP_FULLY_RECOVERED;
>> +	}
>> +#endif
>> +	return disposition;
>> +}
>>   
>> -static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>> +static int mce_handle_err_virtmode(struct pt_regs *regs,
>> +				   struct rtas_error_log *errp,
>> +				   struct pseries_mc_errorlog *mce_log,
>> +				   int disposition)
>>   {
>>   	struct mce_error_info mce_err = { 0 };
>> -	unsigned long eaddr = 0, paddr = 0;
>> -	struct pseries_errorlog *pseries_log;
>> -	struct pseries_mc_errorlog *mce_log;
>> -	int disposition = rtas_error_disposition(errp);
>>   	int initiator = rtas_error_initiator(errp);
>>   	int severity = rtas_error_severity(errp);
>> +	unsigned long eaddr = 0, paddr = 0;
>>   	u8 error_type, err_sub_type;
>>   
>> +	if (!mce_log)
>> +		goto out;
>> +
>> +	error_type = mce_log->error_type;
>> +	err_sub_type = rtas_mc_error_sub_type(mce_log);
>> +
>>   	if (initiator == RTAS_INITIATOR_UNKNOWN)
>>   		mce_err.initiator = MCE_INITIATOR_UNKNOWN;
>>   	else if (initiator == RTAS_INITIATOR_CPU)
>> @@ -572,18 +609,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>>   	mce_err.error_type = MCE_ERROR_TYPE_UNKNOWN;
>>   	mce_err.error_class = MCE_ECLASS_UNKNOWN;
>>   
>> -	if (!rtas_error_extended(errp))
>> -		goto out;
>> -
>> -	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> -	if (pseries_log == NULL)
>> -		goto out;
>> -
>> -	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> -	error_type = mce_log->error_type;
>> -	err_sub_type = rtas_mc_error_sub_type(mce_log);
>> -
>> -	switch (mce_log->error_type) {
>> +	switch (error_type) {
>>   	case MC_ERROR_TYPE_UE:
>>   		mce_err.error_type = MCE_ERROR_TYPE_UE;
>>   		mce_common_process_ue(regs, &mce_err);
>> @@ -683,37 +709,32 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>>   		mce_err.error_type = MCE_ERROR_TYPE_UNKNOWN;
>>   		break;
>>   	}
>> +out:
>> +	save_mce_event(regs, disposition == RTAS_DISP_FULLY_RECOVERED,
>> +		       &mce_err, regs->nip, eaddr, paddr);
>> +	return disposition;
>> +}
>>   
>> -#ifdef CONFIG_PPC_BOOK3S_64
>> -	if (disposition == RTAS_DISP_NOT_RECOVERED) {
>> -		switch (error_type) {
>> -		case	MC_ERROR_TYPE_SLB:
>> -		case	MC_ERROR_TYPE_ERAT:
>> -			/*
>> -			 * Store the old slb content in paca before flushing.
>> -			 * Print this when we go to virtual mode.
>> -			 * There are chances that we may hit MCE again if there
>> -			 * is a parity error on the SLB entry we trying to read
>> -			 * for saving. Hence limit the slb saving to single
>> -			 * level of recursion.
>> -			 */
>> -			if (local_paca->in_mce == 1)
>> -				slb_save_contents(local_paca->mce_faulty_slbs);
>> -			flush_and_reload_slb();
>> -			disposition = RTAS_DISP_FULLY_RECOVERED;
>> -			break;
>> -		default:
>> -			break;
>> -		}
>> -	} else if (disposition == RTAS_DISP_LIMITED_RECOVERY) {
>> -		/* Platform corrected itself but could be degraded */
>> -		printk(KERN_ERR "MCE: limited recovery, system may "
>> -		       "be degraded\n");
>> -		disposition = RTAS_DISP_FULLY_RECOVERED;
>> -	}
>> -#endif
>> +static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_mc_errorlog *mce_log = NULL;
>> +	int disposition = rtas_error_disposition(errp);
>> +	u8 error_type, err_sub_type;
>> +
>> +	if (!rtas_error_extended(errp))
>> +		goto out;
>> +
>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> +	if (!pseries_log)
>> +		goto out;
>> +
>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> +	error_type = mce_log->error_type;
>> +	err_sub_type = rtas_mc_error_sub_type(mce_log);
>> +
>> +	disposition = mce_handle_err_realmode(disposition, error_type);
>>   
>> -out:
>>   	/*
>>   	 * Enable translation as we will be accessing per-cpu variables
>>   	 * in save_mce_event() which may fall outside RMO region, also
>> @@ -724,10 +745,10 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>>   	 * Note: All the realmode handling like flushing SLB entries for
>>   	 *       SLB multihit is done by now.
>>   	 */
>> +out:
>>   	mtmsr(mfmsr() | MSR_IR | MSR_DR);
>> -	save_mce_event(regs, disposition == RTAS_DISP_FULLY_RECOVERED,
>> -			&mce_err, regs->nip, eaddr, paddr);
>> -
>> +	disposition = mce_handle_err_virtmode(regs, errp, mce_log,
>> +					      disposition);
>>   	return disposition;
>>   }
>>   
>> -- 
>> 2.17.2
>>
>>


[-- Attachment #2: Type: text/html, Size: 9381 bytes --]

^ permalink raw reply

* Re: [v3 04/15] powerpc/perf: Add support for ISA3.1 PMU SPRs
From: Athira Rajeev @ 2020-07-22  8:07 UTC (permalink / raw)
  To: Jordan Niethe
  Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
	acme, jolsa, linuxppc-dev
In-Reply-To: <CACzsE9r9fy22hScRm7yz5OeZH9jXA+97hEfAOo-Nk_EPwW-_Dw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9478 bytes --]



> On 22-Jul-2020, at 9:48 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
> 
> On Sat, Jul 18, 2020 at 1:02 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> wrote:
>> 
>> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>> 
>> PowerISA v3.1 includes new performance monitoring unit(PMU)
>> special purpose registers (SPRs). They are
>> 
>> Monitor Mode Control Register 3 (MMCR3)
>> Sampled Instruction Event Register 2 (SIER2)
>> Sampled Instruction Event Register 3 (SIER3)
>> 
>> MMCR3 is added for further sampling related configuration
>> control. SIER2/SIER3 are added to provide additional
>> information about the sampled instruction.
>> 
>> Patch adds new PPMU flag called "PPMU_ARCH_310S" to support
>> handling of these new SPRs, updates the struct thread_struct
>> to include these new SPRs, include MMCR3 in struct mmcr_regs.
>> This is needed to support programming of MMCR3 SPR during
>> event_[enable/disable]. Patch also adds the sysfs support
>> for the MMCR3 SPR along with SPRN_ macros for these new pmu sprs.
>> 
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>> ---
>> arch/powerpc/include/asm/perf_event_server.h |  2 ++
>> arch/powerpc/include/asm/processor.h         |  4 ++++
>> arch/powerpc/include/asm/reg.h               |  6 ++++++
>> arch/powerpc/kernel/sysfs.c                  |  8 ++++++++
>> arch/powerpc/perf/core-book3s.c              | 29 ++++++++++++++++++++++++++++
>> 5 files changed, 49 insertions(+)
>> 
>> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
>> index 14b8dc1..832450a 100644
>> --- a/arch/powerpc/include/asm/perf_event_server.h
>> +++ b/arch/powerpc/include/asm/perf_event_server.h
>> @@ -22,6 +22,7 @@ struct mmcr_regs {
>>        unsigned long mmcr1;
>>        unsigned long mmcr2;
>>        unsigned long mmcra;
>> +       unsigned long mmcr3;
>> };
>> /*
>>  * This struct provides the constants and functions needed to
>> @@ -75,6 +76,7 @@ struct power_pmu {
>> #define PPMU_HAS_SIER          0x00000040 /* Has SIER */
>> #define PPMU_ARCH_207S         0x00000080 /* PMC is architecture v2.07S */
>> #define PPMU_NO_SIAR           0x00000100 /* Do not use SIAR */
>> +#define PPMU_ARCH_310S         0x00000200 /* Has MMCR3, SIER2 and SIER3 */
> We elsewhere have CPU_FTR_ARCH_31, so should this be PPMU_ARCH_31S to
> be consistent.
>> 

Ok,
This change will need to be done in all places which are currently using PPMU_ARCH_310S

>> /*
>>  * Values for flags to get_alternatives()
>> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
>> index 52a6783..a466e94 100644
>> --- a/arch/powerpc/include/asm/processor.h
>> +++ b/arch/powerpc/include/asm/processor.h
>> @@ -272,6 +272,10 @@ struct thread_struct {
>>        unsigned        mmcr0;
>> 
>>        unsigned        used_ebb;
>> +       unsigned long   mmcr3;
>> +       unsigned long   sier2;
>> +       unsigned long   sier3;
>> +
>> #endif
>> };
>> 
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 88e6c78..21a1b2d 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -876,7 +876,9 @@
>> #define   MMCR0_FCHV   0x00000001UL /* freeze conditions in hypervisor mode */
>> #define SPRN_MMCR1     798
>> #define SPRN_MMCR2     785
>> +#define SPRN_MMCR3     754
>> #define SPRN_UMMCR2    769
>> +#define SPRN_UMMCR3    738
>> #define SPRN_MMCRA     0x312
>> #define   MMCRA_SDSYNC 0x80000000UL /* SDAR synced with SIAR */
>> #define   MMCRA_SDAR_DCACHE_MISS 0x40000000UL
>> @@ -918,6 +920,10 @@
>> #define   SIER_SIHV            0x1000000       /* Sampled MSR_HV */
>> #define   SIER_SIAR_VALID      0x0400000       /* SIAR contents valid */
>> #define   SIER_SDAR_VALID      0x0200000       /* SDAR contents valid */
>> +#define SPRN_SIER2     752
>> +#define SPRN_SIER3     753
>> +#define SPRN_USIER2    736
>> +#define SPRN_USIER3    737
>> #define SPRN_SIAR      796
>> #define SPRN_SDAR      797
>> #define SPRN_TACR      888
>> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
>> index 571b325..46b4ebc 100644
>> --- a/arch/powerpc/kernel/sysfs.c
>> +++ b/arch/powerpc/kernel/sysfs.c
>> @@ -622,8 +622,10 @@ void ppc_enable_pmcs(void)
>> SYSFS_PMCSETUP(pmc8, SPRN_PMC8);
>> 
>> SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
>> +SYSFS_PMCSETUP(mmcr3, SPRN_MMCR3);
>> 
>> static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
>> +static DEVICE_ATTR(mmcr3, 0600, show_mmcr3, store_mmcr3);
>> #endif /* HAS_PPC_PMC56 */
>> 
>> 
>> @@ -886,6 +888,9 @@ static int register_cpu_online(unsigned int cpu)
>> #ifdef CONFIG_PMU_SYSFS
>>        if (cpu_has_feature(CPU_FTR_MMCRA))
>>                device_create_file(s, &dev_attr_mmcra);
>> +
>> +       if (cpu_has_feature(CPU_FTR_ARCH_31))
>> +               device_create_file(s, &dev_attr_mmcr3);
>> #endif /* CONFIG_PMU_SYSFS */
>> 
>>        if (cpu_has_feature(CPU_FTR_PURR)) {
>> @@ -980,6 +985,9 @@ static int unregister_cpu_online(unsigned int cpu)
>> #ifdef CONFIG_PMU_SYSFS
>>        if (cpu_has_feature(CPU_FTR_MMCRA))
>>                device_remove_file(s, &dev_attr_mmcra);
>> +
>> +       if (cpu_has_feature(CPU_FTR_ARCH_31))
>> +               device_remove_file(s, &dev_attr_mmcr3);
>> #endif /* CONFIG_PMU_SYSFS */
>> 
>>        if (cpu_has_feature(CPU_FTR_PURR)) {
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index f4d07b5..ca32fc0 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -72,6 +72,11 @@ struct cpu_hw_events {
>> /*
>>  * 32-bit doesn't have MMCRA but does have an MMCR2,
>>  * and a few other names are different.
>> + * Also 32-bit doesn't have MMCR3, SIER2 and SIER3.
>> + * Define them as zero knowing that any code path accessing
>> + * these registers (via mtspr/mfspr) are done under ppmu flag
>> + * check for PPMU_ARCH_310S and we will not enter that code path
>> + * for 32-bit.
>>  */
>> #ifdef CONFIG_PPC32
>> 
>> @@ -85,6 +90,9 @@ struct cpu_hw_events {
>> #define MMCR0_PMCC_U6          0
>> 
>> #define SPRN_MMCRA             SPRN_MMCR2
>> +#define SPRN_MMCR3             0
>> +#define SPRN_SIER2             0
>> +#define SPRN_SIER3             0
>> #define MMCRA_SAMPLE_ENABLE    0
>> 
>> static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
>> @@ -581,6 +589,11 @@ static void ebb_switch_out(unsigned long mmcr0)
>>        current->thread.sdar  = mfspr(SPRN_SDAR);
>>        current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK;
>>        current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK;
>> +       if (ppmu->flags & PPMU_ARCH_310S) {
>> +               current->thread.mmcr3 = mfspr(SPRN_MMCR3);
> Like MMCR0_USER_MASK and MMCR2_USER_MASK do we need a MMCR3_USER_MASK
> here, or is there no need?

Jordan

We don’t need user mask for MMCR3 and other new SPRs ( SIER2/3) . Incase of MMCR3, we dont have any Freeze control bits and incase of SIER2/3, it is similar to SIER (where HW handles the masking of the bits), hence we didn't add any user_mask for these SPRs

Thanks
Athira

>> +               current->thread.sier2 = mfspr(SPRN_SIER2);
>> +               current->thread.sier3 = mfspr(SPRN_SIER3);
>> +       }
>> }
>> 
>> static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
>> @@ -620,6 +633,12 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
>>         * instead manage the MMCR2 entirely by itself.
>>         */
>>        mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2 | current->thread.mmcr2);
>> +
>> +       if (ppmu->flags & PPMU_ARCH_310S) {
>> +               mtspr(SPRN_MMCR3, current->thread.mmcr3);
>> +               mtspr(SPRN_SIER2, current->thread.sier2);
>> +               mtspr(SPRN_SIER3, current->thread.sier3);
>> +       }
>> out:
>>        return mmcr0;
>> }
>> @@ -840,6 +859,11 @@ void perf_event_print_debug(void)
>>                pr_info("EBBRR: %016lx BESCR: %016lx\n",
>>                        mfspr(SPRN_EBBRR), mfspr(SPRN_BESCR));
>>        }
>> +
>> +       if (ppmu->flags & PPMU_ARCH_310S) {
>> +               pr_info("MMCR3: %016lx SIER2: %016lx SIER3: %016lx\n",
>> +                       mfspr(SPRN_MMCR3), mfspr(SPRN_SIER2), mfspr(SPRN_SIER3));
>> +       }
>> #endif
>>        pr_info("SIAR:  %016lx SDAR:  %016lx SIER:  %016lx\n",
>>                mfspr(SPRN_SIAR), sdar, sier);
>> @@ -1305,6 +1329,8 @@ static void power_pmu_enable(struct pmu *pmu)
>>        if (!cpuhw->n_added) {
>>                mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
>>                mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
>> +               if (ppmu->flags & PPMU_ARCH_310S)
>> +                       mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
>>                goto out_enable;
>>        }
>> 
>> @@ -1348,6 +1374,9 @@ static void power_pmu_enable(struct pmu *pmu)
>>        if (ppmu->flags & PPMU_ARCH_207S)
>>                mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2);
>> 
>> +       if (ppmu->flags & PPMU_ARCH_310S)
>> +               mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
>> +
>>        /*
>>         * Read off any pre-existing events that need to move
>>         * to another PMC.
>> --
>> 1.8.3.1


[-- Attachment #2: Type: text/html, Size: 23566 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox